
An end-to-end fault diagnostics method based on convolutional neural network for rotating machinery with multiple case studies

Authors:
  • Beihang University

Abstract

The fault diagnostics of rotating components is crucial for most mechanical systems, since faults of rotating components are the main form of failure in many of them. In traditional diagnostics approaches, extracting features from the raw input is an important prerequisite and normally requires manual extraction based on signal processing techniques. This suffers from several drawbacks, such as strong dependence on domain expertise, high sensitivity to different mechanical systems, poor flexibility and generalization ability, and limitations in mining new features. In this paper, we propose an end-to-end fault diagnostics model based on a convolutional neural network for rotating machinery using vibration signals. The model learns features directly from the one-dimensional raw vibration signals without any manual feature extraction. To fully validate its effectiveness and robustness, the proposed model is tested on four datasets, including two public ones and two datasets of our own, covering the applications of ball screw, bearing and gearbox. Manual, signal processing based feature extraction combined with a classifier is also explored for comparison. The results show that the manually extracted features are sensitive to the application and thus need fine-tuning, while the proposed framework is robust for rotating machinery fault diagnostics, achieving high accuracy in all four applications without any application-specific manual fine-tuning.
Journal of Intelligent Manufacturing
https://doi.org/10.1007/s10845-020-01671-1
Yiwei Wang 1 · Jian Zhou 1 · Lianyu Zheng 1 · Christian Gogu 2
Received: 29 November 2019 / Accepted: 15 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Keywords: Fault diagnostics · Rotating machinery · Vibration signals · Convolutional neural network
Introduction
Rotating machinery is essential equipment that plays a crucial role in modern industry. As indispensable key transmission devices of rotating machinery, typical rotating components such as ball screws, bearings, and gears are the leading cause of failure in essential industrial equipment such as induction motors, wheelsets of high-speed railway bogies, aero-engines, and wind turbines. According to statistics, 30–51% of rotating machinery failures are caused by these key components (Islam and Kim 2019a; Zhao et al. 2020). Failure of the rotating components results in machine performance degradation, unwanted downtime, economic losses and even human casualties. Normally, the rotating components are installed deep inside the machine and undergo a long degradation process from healthy to failed. It is not practical to frequently shut down and disassemble the machines to examine their health state. If damaged rotating components are left unattended, they may cause secondary damage to the machines. On the other hand, due to different working conditions and other uncertainties, even rotating components of the same type may each follow their own degradation process, making it difficult to accurately estimate their health states from large-sample statistics. Therefore, online monitoring and real-time fault diagnostics of individual rotating components based on manufacturing big data are urgently needed.

Corresponding author: Lianyu Zheng (lyzheng@buaa.edu.cn)
1 School of Mechanical Engineering and Automation, Beihang University, Beijing 100191, China
2 Institut Clément Ader (UMR CNRS 5312), INSA/UPS/ISAE/Mines Albi, Université de Toulouse, 31400 Toulouse, France
Smart manufacturing, which is characterized by the integration of Artificial Intelligence (AI) with recently emerging technologies (Lee et al. 2018), enables online monitoring and massive manufacturing data acquisition from sensors and terminals installed in equipment. However, the data must be converted into useful information before it can be of value to industry. Prognostics and health management (PHM) is such a bridge, converting manufacturing big data into useful information. As an emerging discipline receiving great attention from both academia and various industries, PHM has been listed as a part of the “standard architecture of smart manufacturing” proposed by China. PHM deeply fuses AI into manufacturing industries through a complete architecture containing functions such as intelligent fault diagnostics, prognostics, and predictive maintenance (Vogl et al. 2019; Xia and Xi 2019). This fusion enables timely online fault diagnostics of devices as well as prediction of their future state, and consequently improves the maintainability, supportability, reliability and safety of essential industrial equipment. As an important part of PHM, intelligent fault diagnostics provides solutions for real-time fault diagnostics of individual rotating components.

Fig. 1 Diagnostics methods: traditional versus deep learning based. (a) Traditional flow: raw vibration data, data segmentation, signal processing based feature extraction (time domain, frequency and time-frequency features), fault classification (SVM, FNN, LSTM), results. (b) Deep learning flow: raw vibration data, data segmentation, results.
For rotating machinery, the vibration signal is widely used for fault diagnostics due to various advantages, such as continuous monitoring without stopping the machines, ease of use, and sensitivity to faults. Traditional intelligent fault diagnostics normally consists of two sequential steps: manually extracting features from raw vibration signals, and then establishing the mapping between the extracted features and the corresponding states using classification techniques such as the support vector machine (SVM) (Goyal et al. 2019) or the feedforward neural network (FNN), as shown in Fig. 1a. Whether a fault-sensitive feature can be extracted significantly affects the performance of the diagnostics model, and hence much effort is devoted to extracting suitable features before a classification algorithm can be employed. The features are normally extracted from the time domain (Park et al. 2018), frequency domain, or time-frequency domain using various signal processing techniques such as the fast Fourier transform, Hilbert-Huang transform (Feng and Pan 2012), empirical mode decomposition (Liu et al. 2018), variational mode decomposition (Yan and Jia 2018), wavelet transform (Dhamande and Chaudhari 2018; Wang et al. 2018a), intrinsic time-scale decomposition (Feng et al. 2016), and local mean decomposition (Wang et al. 2018b). Manual feature extraction, while having led to satisfying results in the past, also exhibits some drawbacks. The complex signal processing techniques required for feature extraction depend heavily on expertise and prior knowledge, and also require a great deal of human labour. In addition, manually extracted features are normally empirical and thus sensitive to changes. These empirical features reduce the flexibility and generalization ability of the diagnostics model, i.e., the model performs highly accurately for one particular diagnostics task but much less accurately for another. Therefore, significant human labour and expertise are required for exploring and designing suitable features for different diagnostics tasks (Jing et al. 2017). These difficulties in feature extraction seriously hinder fault diagnostics from evolving into a mature technology that can be widely deployed in industry.
The strong feature-learning ability of deep learning methods such as the auto-encoder and the convolutional neural network (CNN) provides a potential solution to the aforementioned drawbacks (Hamadache et al. 2019; Zhao et al. 2019; Li et al. 2019a; Jia et al. 2018). The hierarchical structure of multiple neural layers enables deep learning networks to mine information directly from raw data, layer by layer (Fig. 1b). Compared with other deep learning methods, CNN significantly reduces the number of parameters to be optimized through weight sharing and sub-sampling. CNN also has strong anti-noise ability because the convolution process makes it insensitive to local changes. Inspired by the successful use of CNN in image classification, a natural idea is to convert waveform signals into images and then use CNN for fault diagnostics. Hoang and Kang (2019) converted vibration signals into grayscale images through a simple method proposed by Nguyen et al. (2013), and then fed the images into a CNN for bearing diagnostics. Chen et al. (2019) proposed a scheme combining the discrete wavelet transform (DWT) with CNN for planetary gearbox fault diagnostics, in which a series of sets of DWT wavelet coefficients were used as the input of the CNN. Wang et al. (2019) proposed a method for converting vibration signals from multiple sensors into images, and used a bottleneck-layer-optimized CNN for rotating machinery diagnostics. Islam and Kim (2019b) used a 2D representation of acoustic emission signals processed by the wavelet packet transform as the input of an adaptive deep CNN for bearing fault diagnostics. Wang et al. (2017) converted time-sequence signals of a gearbox into time-frequency images using continuous wavelet analysis and then fed the images into a deep CNN. Zhu et al. (2019a) transformed multiple vibration signals of a rotor into symmetrized dot pattern (SDP) images before classifying them with a CNN. Zhu et al. (2019b) employed the short-time Fourier transform to convert one-dimensional bearing signals into a time-frequency graph and then proposed a novel capsule network for diagnosis. Liang et al. (2020) employed the wavelet transform to extract time-frequency image features from raw signals, used Generative Adversarial Networks (GANs) to generate additional synthetic training images for data augmentation, and built a CNN model for fault mode classification; the method was validated on a gearbox application. Chen et al. (2020) used cyclic spectral analysis to obtain two-dimensional Cyclic Spectral Coherence maps of vibration signals and constructed a CNN model to learn high-level feature representations and conduct fault classification; the method was validated on a public dataset of bearing faults published by Case Western Reserve University (CWRU). Zhang et al. (2020) converted the raw vibration signals into gray-scale images without any predetermined parameters and then fed them into a CNN with two dropout layers and two fully-connected layers for fault classification; the CWRU bearing dataset was used for validation.
It can be seen that most studies require an additional step that converts the 1D vibration signal into 2D representations before using the CNN model, which circumvents some drawbacks of manual feature extraction but still needs application-specific adaptation. Recently, researchers have begun to propose extracting features directly from one-dimensional raw vibration data without any signal processing techniques. This provides an end-to-end solution for fault diagnostics, which reduces the dependence on expertise and prior knowledge, and hence facilitates the use and deployment of diagnostics models. Wu et al. (2019) adapted the 2D CNN into a one-dimensional CNN suitable for processing vibration signals, and validated the proposed model on a gearbox application. Li et al. (2018a) proposed a 1D CNN model with the residual learning algorithm for bearing fault diagnostics, in which the raw data were fed into the model without any pre-processing. Li et al. (2020) developed an adaptive 1D separable convolution network with residual connections for diagnosing gear pitting. Peng et al. (2019) proposed a deeper 1D CNN based on a 1D residual block for the fault diagnostics of wheelset bearings in high-speed trains. Wide convolution kernels and dropout were used in the CNN to enhance the network’s generalization performance. The traditional fault diagnostics, 1D CNN and 2D CNN methods employed in the literature reviewed above are summarized in Table 1.
The above studies each focus on only one specific application. Specifically, bearings and gearboxes are more extensively studied than ball screws, for which fault diagnostics studies are very limited due to the lack of public datasets. The generalizability of CNN to fault diagnostics of rotating components still needs to be fully investigated. In this paper, we propose an end-to-end fault diagnostics method based on CNN using raw vibration signals. A CNN model consisting of three stacks of convolutional and pooling layers, a dropout layer and a fully connected layer is proposed. The alternating convolution and pooling layers of the CNN model automatically extract feature maps from raw data layer by layer. The softmax function is used as the activation function of the last fully connected layer to handle multi-class classification problems. No manually extracted feature is necessary. To fully validate the effectiveness and the generalizability of the proposed model for fault diagnostics of rotating components, we tested it on four datasets, including two public ones and two of our own, covering the applications of ball screw, bearing and gearbox. These three types of rotating components are typical ones that are widely used as key components in essential industrial equipment such as machine tools, high-speed trains, aero-engines, wind turbines, and gas turbines. To the best of our knowledge, our work is the first to validate the CNN model for fault diagnostics across such a wide range of applications. Moreover, signal processing based feature extraction combined with a long short-term memory (LSTM) network (this combination is referred to as the traditional method here) is also explored and compared with the proposed CNN model. Specifically, three typical engineered features, i.e., (a) wavelet packet energy (WPE) based on wavelet packet decomposition, (b) instantaneous frequency (IF), and (c) instantaneous spectral entropy (ISE) based on the power spectrogram, are constructed from the raw vibration data and then used as the input of an LSTM network. The proposed CNN model is compared with the traditional method in terms of accuracy and robustness in various applications.
The remainder of the paper is organized as follows. “The CNN-based diagnostics framework for rotating machinery” section details the structure and the feature learning mechanism of the proposed model. In “Case studies and discussions” section, the generalization ability of the proposed model is verified by four case studies covering the commonly used rotating components of ball screw, bearing and gear. The generalizability and robustness of the proposed model are further discussed through comparison with traditional methods. Finally, conclusions and perspectives are given in “Conclusions and future work” section.
Table 1 Summary of different categories of fault diagnostics

| Category | Method | Advantage/disadvantage | References |
| --- | --- | --- | --- |
| Traditional fault diagnostics | First manually extract features based on signal processing techniques such as Fourier transform, Hilbert-Huang transform, empirical mode decomposition, wavelet transform, etc. Then feed the features to a classifier such as support vector machine, shallow neural networks, etc. | Signal processing techniques depend heavily on expertise and prior knowledge. Manually extracted features are application-specific and quite sensitive to environment or working conditions. Requires a lot of skilled labour to explore and design suitable features for a new diagnostics task. | Goyal et al. (2019), Feng and Pan (2012), Liu et al. (2018), Yan and Jia (2018), Dhamande and Chaudhari (2018), Wang et al. (2018a, b), Feng et al. (2016), and Jing et al. (2017) |
| 2D convolutional neural network | First convert the raw vibration signal to 2D representations such as grayscale images, time-frequency images, symmetrized dot pattern images, cyclic spectral coherence maps, etc. Then feed the 2D representations to a 2D convolutional neural network for classification. | Circumvents some drawbacks of manual feature extraction but still needs application-specific adaptation. | Hoang and Kang (2019), Nguyen et al. (2013), Wang et al. (2017, 2019), Islam and Kim (2019b), Zhu et al. (2019a, b), Chen et al. (2020), Zhu et al. (2019), Chen et al. (2019), Liang et al. (2020) and Zhang et al. (2020) |
| 1D convolutional neural network | Use a 1D convolutional neural network to accomplish direct feature extraction from the raw vibration signal and classification. | Provides end-to-end solutions for fault diagnostics. Reduces the dependence on expertise and prior knowledge. Reduces the sensitivity to environment or working conditions. Facilitates the use and deployment of diagnostics models. | Wu et al. (2019), Li et al. (2018b, 2020) and Peng et al. (2019) |
Fig. 2 A typical architecture of CNN (Jing et al. 2017)
The CNN-based diagnostics framework for rotating machinery
Convolutional neural networks (CNNs), first proposed by LeCun for image processing, have two characteristics, i.e., spatially shared weights and spatial pooling (Goodfellow et al. 2019). The architecture of a typical CNN is illustrated in Fig. 2 and is structured as a series of stages (Jing et al. 2017). The convolutional layer convolves multiple filters with the raw input data and generates feature maps. A pooling layer often follows the convolutional layer to reduce the size of the feature map and extract the most significant local features (Li et al. 2019b). The last stage of the architecture consists of a fully-connected layer, which normally acts as a multi-class classification model.
The schematic diagram of the proposed framework is illustrated in Fig. 3. The sliding window method is used to segment the raw time-sequence vibration data of each health state, and each segment is then reshaped to a matrix before being fed into the neural network. The one-hot encoding method is used to create the labels of the samples, which serve as the output of the network. For example, if there are three classes of data, the first class is encoded as (1, 0, 0), the second as (0, 1, 0), and the third as (0, 0, 1). The self-learning of features is realized by the hidden layers, which comprise stacks of alternating convolutional and pooling layers. One-dimensional convolution kernels and pooling kernels are used in the network since the input is a one-dimensional time series signal. The structure of the CNN model and the feature learning process are detailed below.
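The segmentation and labelling steps described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names `segment_signal` and `one_hot`, the synthetic signal, and the window, step and shape values are all assumptions chosen for the example.

```python
import numpy as np

def segment_signal(signal, window, step, shape):
    """Cut a 1D signal into windows of `window` points and reshape each one."""
    starts = range(0, len(signal) - window + 1, step)
    return np.stack([signal[s:s + window].reshape(shape) for s in starts])

def one_hot(labels, num_classes):
    """Encode integer class indices as one-hot rows."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# Synthetic stand-in for the vibration record of one health state (assumption).
rng = np.random.default_rng(0)
signal = rng.standard_normal(1000)

# Non-overlapping windows of 200 points, each reshaped to a (20, 10) matrix.
samples = segment_signal(signal, window=200, step=200, shape=(20, 10))

# Three classes encoded as (1, 0, 0), (0, 1, 0) and (0, 0, 1).
labels = one_hot(np.array([0, 1, 2]), num_classes=3)
```

Setting `step` smaller than `window` would give overlapping windows; whether the paper uses overlap is not stated, so the sketch uses none.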
Structure of the proposed CNN model
The structure of the proposed model is illustrated in Fig. 4. It includes three stacks of convolution-pooling layers and a fully connected layer. In the convolutional layer, multiple filters are convolved with the raw input data to generate translation-invariant features. In the subsequent pooling layer, the features are compressed by sliding a fixed-length window and applying a rule such as average or max. The first two stacks use a max pooling layer while the last stack uses an average pooling layer. The data flow from the input of the network to the final output in Fig. 4 is detailed below by explaining the entities (denoted by Greek letters) and the actions (denoted by arrows).
Fig. 3 Framework of proposed diagnostics model

Fig. 4 Structure of proposed CNN network (Stack 1: 1st layer 1D convolution, 2nd layer max pooling; Stack 2: 3rd layer 1D convolution, 4th layer max pooling; Stack 3: 5th layer 1D convolution, 6th layer global average pooling; followed by the fully connected layer, whose output is the label with the maximum probability among $p_1, p_2, \ldots, p_k, \ldots$)

1. $\alpha$ is the input matrix of the network, which has the shape $(m_1, n_1)$. Note that we use the form $(m, n)$ to represent an $m$-by-$n$ matrix. The subscripts of $m$ and $n$, as well as those of $f$ and $s$ introduced later, denote the index of the layer.
2. $\beta_i$ is a filter with shape $(h, n)$, where $i = 1, 2, \ldots, f_1$, $f_1$ is the number of filters in the 1st layer, and $h$ is the kernel size of the convolution.
3. $\gamma$ is the output matrix of the 1st convolution layer, having the shape $(m_1, f_1)$.
4. From $\alpha$ to $\gamma$, the convolution operation is carried out as follows. The dot product between filter $\beta_i$ and a concatenation vector $\alpha_{k:k+h-1}$ defines the convolution operation:

   $$c_j = \phi(\beta_i \cdot \alpha_{k:k+h-1} + b) \tag{1}$$

   in which $\cdot$ represents the dot product, $b$ the bias term and $\phi$ the non-linear activation function. $\alpha_{k:k+h-1}$ is an $h$-length window running from the $k$-th row to the $(k+h-1)$-th row, defined as:

   $$\alpha_{k:k+h-1} = \alpha_k \oplus \alpha_{k+1} \oplus \cdots \oplus \alpha_{k+h-1} \tag{2}$$

   where $\oplus$ is the concatenation operation of two vectors. As defined in Eq. 1, the output scalar $c_j$ can be regarded as the activation of the filter $\beta_i$ on the corresponding concatenation vector $\alpha_{k:k+h-1}$. By sliding the filter $\beta_i$ through $\alpha$ and applying zero padding, $m_1$ output scalars $c_j$ are obtained, forming a column vector $c_i$, also known as a feature map:

   $$c_i = [c_1, c_2, \ldots, c_j, \ldots, c_{m_1}] \tag{3}$$

   One filter corresponds to one column vector. Since there are $f_1$ filters in the first layer, the output matrix $\gamma$ is an $(m_1, f_1)$ matrix. From the above operation it can be seen that one filter performs multiple convolution operations, during which the weights of the filter are shared. The feature map $c_i$, obtained by convolving one filter $\beta_i$ over the input data, represents the features of the input data extracted at a certain level. By convolving the input data with multiple filters, a high-dimensional feature map is extracted, containing multiple column vectors that reflect the input data from different perspectives.
5. $\mu$ is the output matrix of the 2nd layer, having the shape $(m_2/s_2, n_2)$, where $s_2$ is the pooling length of the 2nd layer. Note that $m_2$ and $n_2$ denote the input size of the 2nd layer. Since the output of the current layer is the input of the next layer, $m_2 = m_1$ and $n_2 = f_1$.
6. From $\gamma$ to $\mu$, the max pooling operation is carried out as follows. The max operation is taken over $s_2$ consecutive values in $c_i$. The compressed column vector $h_i$ is then obtained as:

   $$h_i = [h_1, h_2, \ldots, h_l, \ldots, h_{m/s}] \tag{4}$$

   where $h_l = \max[c_{(l-1)s+1}, c_{(l-1)s+2}, \cdots, c_{ls}]$ (here $m = m_2$ and $s = s_2$). From the above we see that when a matrix goes through one convolution layer, its number of rows remains unchanged and its number of columns equals the number of filters. For a pooling layer, the number of columns remains unchanged while the number of rows is compressed according to the pooling length.
7. In the 2nd and 3rd stacks, the convolution and pooling operations are repeated in the same way. The only differences are the number of filters and the pooling length.
8. The output of the 7th layer is flattened and connected to a fully connected layer, which is similar to a traditional multilayer neural network and can be applied to different classification tasks. The dropout technique is employed to prevent overfitting. The softmax function (Behley et al. 2013) is used as the last layer, which gives the probability of each label. Specifically, assuming a $K$-label classification task, the output of the softmax function is calculated as in Eq. 5, in which $W_k$ and $b_k$ are the weight matrix and bias, and $P(y = k \mid x; W_k, b_k)$ is the probability of the $k$-th label (denoted as $p_k$ in Fig. 4) given the input $x$ and the corresponding weight and bias. Here $x$ is the vector after dropout in the fully connected layer. The final output of the network is the health state label with the highest probability.

   $$\begin{bmatrix} P(y = 1 \mid x; W_1, b_1) \\ \vdots \\ P(y = k \mid x; W_k, b_k) \\ \vdots \\ P(y = K \mid x; W_K, b_K) \end{bmatrix} = \frac{1}{\sum_{k=1}^{K} \exp(W_k x + b_k)} \begin{bmatrix} \exp(W_1 x + b_1) \\ \vdots \\ \exp(W_k x + b_k) \\ \vdots \\ \exp(W_K x + b_K) \end{bmatrix} \tag{5}$$
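The operations in Eqs. 1 to 5 can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the zero padding is assumed to be symmetric, each filter has a single scalar bias, ReLU is used for $\phi$, and rows that do not fill a complete pooling window are dropped.

```python
import numpy as np

def conv1d_same(alpha, filters, bias):
    """Eqs. 1-3: slide each (h, n) filter down the rows of alpha with zero padding.

    alpha: (m, n) input; filters: (f, h, n); bias: (f,). Returns gamma, shape (m, f).
    """
    m, _ = alpha.shape
    f, h, _ = filters.shape
    pad_top = (h - 1) // 2                      # symmetric padding (assumption)
    padded = np.pad(alpha, ((pad_top, h - 1 - pad_top), (0, 0)))
    gamma = np.empty((m, f))
    for j in range(m):
        window = padded[j:j + h]                # h consecutive rows of the input
        pre = np.tensordot(filters, window, axes=([1, 2], [0, 1])) + bias
        gamma[j] = np.maximum(pre, 0.0)         # ReLU as the activation phi
    return gamma

def max_pool(gamma, s):
    """Eq. 4: maximum over s consecutive rows of each column."""
    rows = gamma.shape[0] // s                  # incomplete last window is dropped
    return gamma[:rows * s].reshape(rows, s, -1).max(axis=1)

def softmax(z):
    """Eq. 5: exponentials normalised by their sum (shifted for stability)."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
alpha = rng.standard_normal((64, 100))           # one (m1, n1) input sample
filters = 0.1 * rng.standard_normal((8, 3, 100)) # f1 = 8 filters, kernel size h = 3
gamma = conv1d_same(alpha, filters, np.zeros(8)) # rows kept, columns = f1
mu = max_pool(gamma, s=3)                        # rows compressed by s2 = 3
probs = softmax(rng.standard_normal(3))          # probabilities of K = 3 labels
```

The shapes follow the rule stated in item 6: the convolution keeps the 64 rows and produces one column per filter, while the pooling keeps the columns and reduces the rows to 21.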
Hyperparameters
The activation function of all the convolutional layers is the ReLU function, chosen for its ability to mitigate vanishing gradients and its fast convergence. The loss function of the CNN model is cross-entropy and the evaluation metric is categorical accuracy. An L2 regularization term is applied to the first and third convolution layers to reduce overfitting. The coefficient of the L2 term is a trade-off between the effectiveness of training and overfitting: too large a value leads to inadequate training, while too small a value does not sufficiently reduce the risk of overfitting. We set this value to 0.001 based on a prior study (Ng 2004). Dropout is applied to the fully connected layer to reduce overfitting by setting a given proportion of the neurons to zero. Following the original study of the dropout technique (Srivastava et al. 2014), we set this proportion to 0.5, which is a typical value in deep learning.
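As an illustration of these settings, the loss, metric, L2 term and dropout can be written out in NumPy. This is only a sketch: the function names are hypothetical, and the rescaling of the retained activations by 1/(1 - rate) (so-called inverted dropout) is an assumption, since the text only states that a proportion of neurons is set to zero.

```python
import numpy as np

def cross_entropy(probs, one_hot_labels, eps=1e-12):
    """Mean categorical cross-entropy over a batch of predicted probabilities."""
    return float(-np.mean(np.sum(one_hot_labels * np.log(probs + eps), axis=1)))

def categorical_accuracy(probs, one_hot_labels):
    """Fraction of samples whose most probable label matches the true label."""
    return float(np.mean(probs.argmax(axis=1) == one_hot_labels.argmax(axis=1)))

def l2_penalty(weights, lam=0.001):
    """L2 regularization term: lam times the sum of squared weights."""
    return lam * float(np.sum(weights ** 2))

def dropout(x, rate=0.5, rng=None):
    """Zero a fraction `rate` of the activations and rescale the rest (assumption)."""
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
targets = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
loss = cross_entropy(probs, targets) + l2_penalty(np.array([1.0, -2.0]))
accuracy = categorical_accuracy(probs, targets)
dropped = dropout(np.ones(1000), rate=0.5, rng=np.random.default_rng(0))
```

During training the L2 term is added to the cross-entropy, so a larger `lam` pushes the weights towards zero at the expense of fitting the training data.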
The initial weights of the network are set by the Glorot uniform initializer, and the biases are set to 0. The weights are optimized by the adaptive moment estimation (Adam) solver with an initial learning rate of 0.001 and an exponential decay rate of 0.1. The Adam solver combines the Momentum and RMSProp optimization algorithms. It computes an independent adaptive learning rate for each parameter from estimates of the first-order and second-order moments of the gradient, which typically gives better optimization performance than the alternative stochastic gradient descent with momentum (SGDM) solver (Kingma and Ba 2015). Adam is currently one of the most widely used optimization algorithms in machine learning and deep learning.
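For reference, the Glorot uniform initializer and a single Adam update can be sketched as follows. The moment coefficients `beta1 = 0.9`, `beta2 = 0.999` and `eps = 1e-8` are the defaults suggested by Kingma and Ba and are assumptions here, since the paper does not list them; the learning rate decay mentioned above is omitted, and the quadratic toy objective is purely illustrative.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Sample weights uniformly in [-limit, limit], limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new parameters and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad           # first-order moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-order moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

weights = glorot_uniform(128, 3, np.random.default_rng(0))

# Toy problem: minimise f(theta) = theta^2, whose gradient is 2 * theta.
theta = np.array([1.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.01)
```

Because the update divides by the square root of the second-moment estimate, the first step has a magnitude close to `lr` regardless of the gradient scale, which is the per-parameter adaptivity described above.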
The mini-batch training strategy is adopted here. Specifically, the training examples are divided into small batches, and the model parameters are updated after each batch passes through the network. One pass of a batch is called one iteration. When the entire training set has passed through the network once, so that each example has had the opportunity to update the model parameters, one epoch is complete. The execution environment is an Intel E5-2620 v4 CPU and a GeForce RTX 2080 Ti GPU. The above network settings and execution environment are used in all the following cases.
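The relation between batches, iterations and epochs can be made concrete with a short sketch. The batch size of 32 and the shuffling of the examples are assumptions for illustration; the paper does not state either.

```python
import math
import random

def minibatches(num_examples, batch_size, rng):
    """Yield index lists that cover every example exactly once per epoch."""
    order = list(range(num_examples))
    rng.shuffle(order)                       # shuffling is an assumption
    for start in range(0, num_examples, batch_size):
        yield order[start:start + batch_size]

num_examples = 240                           # size of a training set
batch_size = 32                              # hypothetical value
iterations_per_epoch = math.ceil(num_examples / batch_size)

# One epoch: each batch corresponds to one iteration (one parameter update).
batches = list(minibatches(num_examples, batch_size, random.Random(0)))
```

With these numbers one epoch consists of eight iterations, the last of which uses a smaller batch of 16 examples.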
Case studies and discussions
Case 1: Ball screw lubrication states diagnostics
Experiment and data preparation
In this case study, the proposed model is validated for diagnosing the lubrication states of a ball screw. Ball screws are crucial mechanical components used intensively in many engineering systems that require precise positioning, such as the feed systems of machine tools and high-precision leveling systems for aircraft and missiles (Li et al. 2018a). The growing demand for high speed and large lead in ball screws makes it increasingly important to maintain good lubrication in order to reduce friction. Indeed, correct lubrication is vital to ball screws since lubrication significantly affects their performance. Poor lubrication may increase friction and impair the positioning accuracy of ball screws. In addition, abnormal vibration caused by poor lubrication accelerates damage to the machine tool and affects machining quality. Therefore, monitoring and online diagnosis of the lubrication state of the ball screw is important for improving the positioning accuracy and lifetime of ball screws.
Very few reports are available regarding ball screw lubrication state diagnostics. Motivated by this, we designed an experiment that simulates the different lubrication states of ball screws. The experiment is carried out on a test bench that was originally designed for measuring the friction torque of a ball screw, as shown in Fig. 5. The drive system moves the nut back and forth along the screw. Three states labeled “Grease”, “Oil”, and “Absent” are simulated by (1) lubricating the ball screw with grease, (2) lubricating it with oil, and (3) removing the original lubricant, respectively. These three health states simulate the typical lubrication states that ball screws may encounter in a real working environment. The vibration signals corresponding to the three states are acquired at a sampling rate of 5 kHz with the Prosig P8020 data acquisition system, as shown in Fig. 6. 128 s of data are acquired for each lubrication state.
The raw time domain signal of one round trip (forward and reverse motion) of the nut under “Absent” lubrication is shown in Fig. 7. Two parts can be clearly seen, which correspond to the forward and reverse motions of the nut, respectively. The abrupt “peaks” are due to the sharp slowdown and stop of the nut near the end of each motion. The data near the beginning and end of the motions are discarded, and only the “steady state” data in the middle stage of the motions are retained. For conciseness, the full raw signals under “Oil” and “Grease” lubrication are not presented. Instead, the retained segments of the forward motion under the three lubrication conditions are given in Fig. 8a. It can be seen that the differences among the three states are quite small, so we further transform the signals into the frequency domain using the FFT, as shown in Fig. 8b. The differences among the three cases are still not obvious and it is hard to see distinguishing patterns, which makes it challenging to correctly distinguish the different lubrication conditions.

Fig. 5 Ball screw test bench

Fig. 6 Data acquisition set-up

Fig. 7 Raw signal under “Absent” lubrication (annotations: forward motion, reverse motion, retained data, peak due to sharp slowdown)
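The frequency-domain view of a retained segment can be computed with the FFT as sketched below. The test signal is synthetic (two sinusoids at 120 Hz and 800 Hz, chosen arbitrarily) because the measured data are not reproduced here; only the 5 kHz sampling rate is taken from the text, and the function name is hypothetical.

```python
import numpy as np

def amplitude_spectrum(segment, fs):
    """One-sided amplitude spectrum of a real signal sampled at fs Hz.

    The 2/n scaling gives the sinusoid amplitude for interior bins; the DC
    and Nyquist bins are not treated specially in this sketch.
    """
    n = len(segment)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    amplitude = np.abs(np.fft.rfft(segment)) * 2.0 / n
    return freqs, amplitude

fs = 5000                                    # sampling rate in Hz (from the text)
t = np.arange(5000) / fs                     # one second of synthetic data
segment = np.sin(2 * np.pi * 120.0 * t) + 0.5 * np.sin(2 * np.pi * 800.0 * t)

freqs, amplitude = amplitude_spectrum(segment, fs)
peak_hz = freqs[np.argmax(amplitude)]        # frequency of the dominant component
```

With one second of data at 5 kHz the frequency resolution is 1 Hz, so both synthetic components fall exactly on a frequency bin.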
The raw vibration signal is divided into segments to form the input
samples of the network. For each state, there are 128 × 5000 =
6.4 × 10⁵ data points. Every 6400 consecutive data points are taken as
one segment, which is further reshaped to a (64, 100) matrix. It is
worth pointing out that the sample length is a trade-off between the
number of samples and the feature information that one sample
contains. A time window that is too short may carry incomplete feature
information, making diagnostics difficult, while a time window that is
too long leaves insufficient training data. Based on the sampling rate
of the data used in this paper as well as other related research
works, we take 6400 data points as one sample, which gives 100 samples
per state. 80% of the data (80 samples) are reserved for training and
the remaining 20% (20 samples) for testing. Finally, the
training/testing samples taken from each lubrication state form the
overall training set (80 × 3 = 240 samples) and the testing set
(20 × 3 = 60 samples). The input/output shape, the kernel size, stride
and number of filters of each layer during the training process are
reported in Table 2. Note that the above hyperparameters (the kernel
size, stride and number of filters of each layer) remain unchanged in
all the following case studies.
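The data preparation described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `segment_signal` and `split_train_test` are hypothetical, the signal is synthetic Gaussian noise standing in for one lubrication state, the row-major reshape is an assumption, and the random shuffle before the 80/20 split is also an assumption (the text does not say how the split is drawn in this case study).

```python
import random

def segment_signal(signal, sample_len=6400, rows=64, cols=100):
    """Cut a 1-D signal into non-overlapping samples and reshape each to rows x cols."""
    assert rows * cols == sample_len
    n_samples = len(signal) // sample_len  # an incomplete tail would be dropped
    samples = []
    for i in range(n_samples):
        seg = signal[i * sample_len:(i + 1) * sample_len]
        # Row-major reshape (assumed): row r holds points r*cols .. (r+1)*cols - 1
        samples.append([seg[r * cols:(r + 1) * cols] for r in range(rows)])
    return samples

def split_train_test(samples, train_ratio=0.8, seed=0):
    """Shuffle the samples (assumed) and split them into training and testing subsets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(round(len(shuffled) * train_ratio))
    return shuffled[:n_train], shuffled[n_train:]

# Synthetic stand-in for one lubrication state: 128 s sampled at 5 kHz
fs, duration = 5000, 128
signal = [random.gauss(0.0, 1.0) for _ in range(fs * duration)]

samples = segment_signal(signal)
train, test = split_train_test(samples)
print(len(samples), len(train), len(test))  # 100 80 20
```

Repeating this for the three lubrication states and concatenating the results gives the 240 training samples and 60 testing samples quoted in the text.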
Results and discussions
The diagnostics accuracy on the test set is 100% and all
the three states are correctly classified (thus the confusion
matrix is not given). In order to better illustrate the fea-
ture learning process of the CNN model, the t-distributed
Fig. 8 Retained segment of the forward motion of the three lubrication conditions: (a) in the time domain, (b) in the frequency domain
Table 2 Parameters of the proposed model
No. of layer  Layer            Input shape      Kernel size/stride/number of filters  Output shape
1             Convolution      (240, 64, 100)   (3, 100)/1/64                         (240, 64, 64)
2             Maxpooling       (240, 64, 64)    3/3                                   (240, 21, 64)
3             Convolution      (240, 21, 64)    3 × 64/1/128                          (240, 21, 128)
4             Maxpooling       (240, 21, 128)   3/3                                   (240, 7, 128)
5             Convolution      (240, 7, 128)    3 × 128/1/128                         (240, 7, 128)
6             Average pooling  (240, 7, 128)    /                                     (240, 128)
7             Dropout          (240, 128)       /                                     (240, 64)
8             Fully connected  (240, 64)        /                                     (240, 3)
stochastic neighbour embedding (t-SNE) technique (Maaten and Hinton
2008) is used to illustrate the output of each layer. t-SNE is a
machine learning algorithm that visualizes high-dimensional data
through nonlinear dimensionality reduction. For the current case
study, the feature output by each layer is high-dimensional, with the
shape given in Table 2 (e.g., after the dropout layer, the feature
that is fed into the fully connected layer for classification is a
1-by-64 vector). We use t-SNE to reduce the feature after each layer
to a two-dimensional space in order to show how the data “flow” from
input to output, and thus how the features belonging to the same state
aggregate. Figure 9 shows this process during testing, in which the
distance between points represents the similarity of different
samples. The symbols “1” (red), “2” (blue) and “3” (green) in the
figure represent the “Absent”, “Oil”, and “Grease” lubrication states,
respectively. We see that in the input layer, the dots of the three
states are completely mixed and no pattern can be observed to
distinguish the different fault modes. As the convolutional and
pooling operations are applied, the mixed dots gradually separate. In
the output layer of Fig. 9, all the features belonging to the same
state are clustered and the features belonging to different states are
completely separated. There is no confusion, corresponding to 100%
accuracy.
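The per-layer feature shapes quoted from Table 2 (without the leading batch dimension) can be traced with simple length arithmetic. The sketch below assumes that the convolutions use "same" zero padding and that the pooling layers use no padding with floor rounding; both are inferred from the reported shapes rather than stated in the text, and the helper names are hypothetical.

```python
def conv_same_len(n, stride=1):
    """Output length of a 1-D convolution with 'same' zero padding (assumed)."""
    return -(-n // stride)  # ceil(n / stride)

def pool_len(n, size=3, stride=3):
    """Output length of a pooling layer without padding, floor mode (assumed)."""
    return (n - size) // stride + 1

length, channels = 64, 100  # one input sample of shape (64, 100)
shapes = [(length, channels)]
layers = [("conv", 64), ("pool", None), ("conv", 128), ("pool", None), ("conv", 128)]
for kind, filters in layers:
    if kind == "conv":
        length, channels = conv_same_len(length), filters
    else:
        length = pool_len(length)
    shapes.append((length, channels))
shapes.append((channels,))  # average pooling collapses the length axis

print(shapes)
# [(64, 100), (64, 64), (21, 64), (21, 128), (7, 128), (7, 128), (128,)]
```

Under these assumptions the lengths 64, 21 and 7 of layers 1 to 5 and the 128-element vector after average pooling are reproduced.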
Fig. 9 Model testing process visualization by t-SNE, case study 1 (panels: input layer, 1st to 6th layer, output layer)
Two well-known and widely used machine learning methods, i.e., the
feedforward backpropagation (BP) network and the support vector
machine (SVM), are used here as comparison methods. The sample length
remains the same as that used by the CNN model, and accordingly, the
numbers of training samples and testing samples are unchanged. Since
these two methods normally accept low-dimensional or
moderate-dimensional data as input, each raw sample is pre-processed
by wavelet packet decomposition (WPD) to extract a feature vector.
Specifically, based on our prior knowledge of fault diagnostics using
vibration signals, a five-level WPD is applied to each raw sample and
accordingly 2⁵ = 32 frequency sub-bands of the raw sample are
obtained. The energy of each sub-band is calculated, and the energies
are concatenated to form a 1-by-32 feature vector, which is the input
of the BP neural network and SVM.
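The construction of the 1-by-32 energy vector can be sketched with a full five-level wavelet packet tree. The mother wavelet is not named in the text, so the orthonormal Haar wavelet is used here purely as an assumption because it needs no external library; the function names and the synthetic input are hypothetical as well. The terminal nodes are returned in natural (tree) order, which is not the same as increasing frequency order.

```python
import math
import random

def haar_step(x):
    """One orthonormal Haar split into approximation and detail coefficients."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def wavelet_packet_energy(x, levels=5):
    """Energies of the 2**levels terminal nodes of a full Haar wavelet packet tree."""
    nodes = [list(x)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)  # every node is split, not only the approximation
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return [sum(c * c for c in node) for node in nodes]

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(6400)]  # synthetic stand-in for one raw sample
features = wavelet_packet_energy(sample)
print(len(features))  # 32
```

Because the Haar transform is orthonormal, the 32 sub-band energies sum to the energy of the raw sample, which is a convenient sanity check for any implementation.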
A typical three-layer BP neural network is used. The numbers of
neurons in the input layer and output layer are 32 and 3, which equal
the length of the feature vector and the number of lubrication states,
respectively. A hidden layer containing 10 neurons is adopted. Note
that the number of neurons in the hidden layer is typically an
empirical value. Too many or too few neurons may reduce the
classification accuracy of the network. We gradually increased the
number of neurons from 5 to 20, and finally set this value to 10,
where the network achieved its highest accuracy.
The basic SVM for binary classification is employed and is turned into
a multi-class classifier by the “one-vs-all” strategy. The strategy
involves training a single SVM classifier for each class, with the
samples of that class as positive samples and all other samples as
negatives. Specifically, for the current case study of the ball screw,
we trained three basic SVM classifiers that are able to diagnose
“Absent”, “Oil”, and “Grease”, respectively. The Radial Basis Function
(RBF) is used as the kernel function.
The diagnostics accuracies of the BP neural network and SVM are 95%
and 90%, respectively. The corresponding confusion matrices are given
in Fig. 10a, b, respectively. The last column in the matrix shows the
percentage of examples predicted to belong to each label that are
correctly classified (also called precision, or positive predictive
value). For example, in the 1st row of Fig. 10a, 19 samples are
classified by the BP neural network as lubrication state “Absent”, and
18 out of these 19 are correctly classified. One sample that should
belong to label “Oil” is incorrectly classified as “Absent”. The
precision for label “Absent” hence equals 18/19 = 94.7%. The row at
the bottom of the matrix shows the percentage of all the samples
belonging to each class that are correctly classified (also called
sensitivity, true positive rate, recall, or probability of detection).
For example, in the 1st column of Fig. 10a there are 20 samples of
“Absent”, of which 18 are correctly classified and two are incorrectly
identified. The sensitivity for label “Absent” is thus 18/20 = 90%.
The value in the bottom right corner is the overall classification
accuracy of the network, which equals the number of correctly
classified samples divided by the total number of testing samples, in
this case 95%.
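The precision, sensitivity and overall accuracy defined above can be recomputed directly from the counts of a confusion matrix. The sketch below uses the counts read from Fig. 10a for the BP neural network, with rows as the output of the network and columns as the target; the function names are illustrative only.

```python
# Rows: output of the network; columns: target (order: Absent, Oil, Grease).
# Counts as read from Fig. 10a (BP neural network).
cm = [
    [18, 1, 0],
    [2, 19, 0],
    [0, 0, 20],
]
labels = ["Absent", "Oil", "Grease"]

def precision(cm, k):
    """Correct predictions of class k divided by all predictions of class k (row sum)."""
    return cm[k][k] / sum(cm[k])

def sensitivity(cm, k):
    """Correct predictions of class k divided by all true samples of class k (column sum)."""
    return cm[k][k] / sum(row[k] for row in cm)

def accuracy(cm):
    """Trace of the matrix divided by the total number of samples."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total

for k, name in enumerate(labels):
    print(f"{name}: precision={precision(cm, k):.3f}, sensitivity={sensitivity(cm, k):.3f}")
print(f"accuracy={accuracy(cm):.3f}")
```

For the “Absent” label this gives a precision of 18/19 and a sensitivity of 18/20, and the overall accuracy is 57/60 = 95%, in agreement with the values discussed in the text.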
Fig. 10 Confusion matrices of case study 1, given by (a) BP neural network and (b) SVM (rows: output of network; columns: target)
(a) BP neural network
Output \ Target  Absent  Oil    Grease  Precision
Absent           18      1      0       94.7%
Oil              2       19     0       90.5%
Grease           0       0      20      100%
Sensitivity      90.0%   95.0%  100%    95.0% (overall accuracy)
(b) SVM
Output \ Target  Absent  Oil    Grease  Precision
Absent           18      4      0       81.8%
Oil              2       16     0       88.9%
Grease           0       0      20      100%
Sensitivity      90.0%   80.0%  100%    90.0% (overall accuracy)
Compared with the proposed CNN model, the overall accuracies of the BP
neural network and SVM are lower. Figure 10 indicates that confusions
occur between the lubrication states “Absent” and “Oil”. In a real
working environment, the differences between “Absent” and “Oil” are
indeed small. This is also reflected in Fig. 8, where the time-domain
and frequency-domain signals of these two states are very similar and
hard to distinguish visually. In contrast to the BP neural network and
SVM, the CNN classifies these two states correctly.
Case study 2: Bearing fault diagnostics using CWRU dataset
Data description and preparation
The public bearing fault dataset from Case Western Reserve University
(CWRU) (Case Western Reserve University Bearing Data Center Website,
https://csegroups.case.edu/bearingdatacenter/home) is used to validate
the proposed model in this section. A benchmark study of the CWRU
dataset was carried out by Smith and Randall (2015). As shown in
Fig. 11, the test bench consists of a 2 hp motor (1 hp ≈ 735 W), a
torque transducer/encoder, a dynamometer, and control electronics.
SKF-6202 deep groove ball bearings are used as the test bearings and
support the motor shaft. The experiments were performed under four
working conditions, as reported in Table 3. Four health states,
“Outer race fault”, “Inner race fault”, “Ball fault” and “Normal”, are
considered. For each of the three fault types, a single-point fault
was seeded at three severity levels, i.e., fault diameters of
0.007 mil, 0.014 mil, and 0.021 mil, each of which is regarded as a
different fault mode. Together with the normal state, there are
therefore 3 × 3 + 1 = 10 fault modes. Vibration data of each fault
mode under each working condition were collected using accelerometers,
which were attached to the housing with magnetic bases. The sampling
rate was 48 kHz.
Fig. 11 Test bench of case study 2 [Case Western Reserve University Bearing Data Center Website, https://csegroups.case.edu/bearingdatacenter/home]
Table 3 Description of working conditions
Working condition  Motor load (hp)  Motor speed (rpm)
1                  0                1797
2                  1                1772
3                  2                1750
4                  3                1730
The numbers of data points acquired for each fault mode under each
working condition are reported in Table 4. The amount of data provided
differs between working conditions. To accommodate this variation, the
data preparation is adjusted. The training/testing ratio is set to
4:1. For conditions 2–4, the number of data points for each label is
truncated to 4.8 × 10⁵. Every 6.4 × 10³ data points are segmented and
further reshaped to a (64, 100) matrix as one sample. Thus
4.8 × 10⁵ / 6.4 × 10³ = 75 samples are obtained for each fault mode.
For condition 1, the mode of inner race damage with 0.014 mil is left
out due to insufficient data. We are aware of data augmentation
techniques, such as adding noise or Generative Adversarial Networks,
that may mitigate the problem of
Table 4 Description of CWRU bearing fault data (number of data points available in each working condition)
Fault mode                                      Fault label  Condition 1  Condition 2  Condition 3  Condition 4
Normal                                          1            243,938      483,903      483,903      485,643
Inner race fault with fault diameter 0.007 mil  2            243,938      486,224      485,643      485,643
Inner race fault with fault diameter 0.014 mil  3            63,788       489,125      487,964      485,063
Inner race fault with fault diameter 0.021 mil  4            244,339      485,063      491,446      491,446
Outer race fault with fault diameter 0.007 mil  5            243,538      486,804      486,804      487,964
Outer race fault with fault diameter 0.014 mil  6            245,140      484,483      486,804      488,545
Outer race fault with fault diameter 0.021 mil  7            246,342      489,125      487,964      489,125
Ball fault with fault diameter 0.007 mil        8            243,938      487,384      486,804      488,545
Ball fault with fault diameter 0.014 mil        9            249,146      486,224      487,384      486,804
Ball fault with fault diameter 0.021 mil        10           243,938      486,804      487,384      486,804
Fig. 12 Model testing process visualization under condition 2, case study 2 (panels: input layer, 1st to 6th layer, output layer; label colours: 1 red, 2 yellow, 3 blue, 4 green, 5 verdigris, 6 magenta, 7 gray, 8 pink, 9 meadow, 10 purple)
insufficient data, but such an investigation is left for future work.
The data of the other modes are truncated to 2.4 × 10⁵. Every
2.4 × 10³ data points are segmented and further reshaped to a
(24, 100) matrix as one sample. 100 samples are thus obtained for each
fault label, and there are hence 80 × 9 = 720 samples for training and
20 × 9 = 180 samples for testing.
Results and discussions
For each condition the diagnostics accuracy on the test set is 100%.
Due to space limitations, only the feature learning process during
testing under condition 2 is visualized by t-SNE, as shown in Fig. 12.
The 10 symbols in different colours represent the 10 fault labels of
condition 2. It can be seen that from the 5th layer onward, features
of the same fault mode have already been well aggregated and the
features belonging to different modes have been well separated.
Case study 3: Bearing fault diagnostics with private dataset
Experiment and data preparation
In this case study, we validate the proposed model with the bearing
fault dataset acquired from our own test bench, as shown in Fig. 13.
Seven health states are considered, including the normal state, three
types of single-point faults (i.e., inner race, outer race and ball),
and three types of compound faults (i.e., inner race and ball, inner
race and outer race, outer race and ball). Based on the literature and
on our own observations and experience, these faults occur most
frequently. The vibration data are collected from an NSK-6308 deep
groove ball bearing in experiments performed under three motor speeds,
1500 rpm, 2000 rpm and 2500 rpm, at a sampling rate of 20 kHz. For
each health state under each motor speed, the data acquisition lasts
for 256 s, thus 5.12 × 10⁶ data points are acquired. Every 6.4 × 10³
data points are segmented and further reshaped to a (64, 100) matrix
as one sample. Thus 800 samples are obtained for each health state.
The train/test ratio is set to 4:1. Figure 14 illustrates the
vibration signal recorded in one second corresponding to the eight
health states. For confidentiality reasons, the raw data are
normalized to (−1, 1).
Fig. 13 Private test bench for bearing fault (labelled components: bearing, accelerometer, laser sensor, speed controller, motor)
Results and discussions
The diagnostics confusion matrices obtained on the test set for all
three motor speeds are shown in Fig. 15, where the accuracies are
nearly 100%. Labels 0–6 represent the following health states: 0-ball,
1-inner race, 2-outer race, 3-compound fault of inner race and ball,
4-compound fault of outer race and ball, 5-compound fault of outer
race and inner race, 6-normal.
Due to space limitations, only the feature learning process during
testing under the motor speed of 1500 rpm is visualized by t-SNE, as
shown in Fig. 16. It can be seen that in the output layer the features
of the same fault mode have been well aggregated and the features
belonging to different modes have been well separated. Note that in
the output layer, the samples belonging to “6” are completely
aggregated. Very few confusions occur between “3” and “4”, and between
“4” and “5”, which is consistent with the confusion matrix of
Fig. 15a.
Case study 4: PHM 2009 spur gearbox challenge data
Data description and preparation
The gearbox fault data of the 2009 PHM data challenge are used in this
case study. Readers are referred to “PHM data challenge 2009,
https://www.phmsociety.org/competition/PHM/09” for more information
about the experimental setting. An overview of the apparatus is shown
in Fig. 17a, including the drive system, a tachometer for providing
zero-crossing information, the tested gearbox, and two accelerometers
for collecting data. Two sets of gears, i.e., spur gears and helical
gears, were tested. We used the data of the spur gears since they
contain more fault modes than those of the helical gears. The spur
gearbox is a generic industrial one containing 3 shafts, 4 gears and
6 bearings, as shown in Fig. 17b. The numbers of teeth of the input
gear, 1st idler gear, 2nd idler gear and output gear are 32, 96, 48,
and 80, respectively. Therefore, from input to output the gear
reduction ratio is (32/96) × (48/80) = 1/5, i.e., a 5 to 1 reduction.
For the gearbox, instead of single-point faults, eight types of
compound faults caused by chipped gear, eccentric gear, bearing ball
fault, shaft imbalance, shaft keyway fault, etc. are considered.
Detailed descriptions of the eight fault types are reported in
Table 5. The faults were seeded in the experiments. These faults cover
the common failures of gearboxes in real cases.
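The 5 to 1 reduction follows from the tooth counts quoted above. The short check below uses exact fractions and assumes that the input gear meshes with the 1st idler gear and that the 2nd idler gear (on the same idler shaft) meshes with the output gear; the variable names are illustrative.

```python
from fractions import Fraction

# Tooth counts of the spur gear set as given in the text.
input_gear, idler_1, idler_2, output_gear = 32, 96, 48, 80

stage_1 = Fraction(input_gear, idler_1)   # input shaft -> idler shaft (assumed mesh)
stage_2 = Fraction(idler_2, output_gear)  # idler shaft -> output shaft (assumed mesh)
speed_ratio = stage_1 * stage_2           # output speed divided by input speed

print(speed_ratio)  # 1/5
```

With an input shaft speed of 2700 rpm, for instance, this ratio corresponds to an output shaft speed of 540 rpm.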
The experiments were carried out under 10 working conditions, i.e.,
shaft speeds of 1800, 2100, 2400, 2700 and 3000 rpm (revolutions per
minute), each under high and low loading. For each fault type under
each working condition, vibration signals were sampled synchronously
from accelerometers mounted on both the input and output shaft
retaining plates, as shown in Fig. 18. Data were acquired with a
sampling rate of 66.67 kHz and a sampling time of 4 s; thus 266,655
data points are obtained and further truncated to 2.56 × 10⁵ for each
fault type. Additionally, for each working condition, the experiment
was repeated twice.
For data preparation, every 6.4 × 10³ data points are segmented and
reshaped to (64, 100) as one sample. Therefore, 80 samples are
obtained for each label. We randomly draw 80% of the data (64 samples)
for training and the remaining 20% (16 samples) for testing. Finally,
the training/testing data taken from each label form the training set
(64 × 8 = 512 samples) and the testing set (16 × 8 = 128 samples).
Results and discussions
For all 10 working conditions, the testing accuracy for the eight
types of faults is 100%. Due to space limitations, we only show the
feature learning process during testing at the working condition of
2700 rpm under low loading as an example, as given in Fig. 19. The
labels 1–8 represent the fault labels listed in Table 5.
Fig. 14 Visualization of the raw vibration data of the eight health states
Fivefold cross validation is used to evaluate the model. All samples
and the corresponding labels are randomly divided into five groups
(the total number of samples in each group is the same). In each
round, four out of the five groups are used for training the model and
the remaining one is used for testing. Through cross validation, the
model is tested five times and every sample is used as both training
and testing data. Since the samples are divided completely at random,
the number of samples belonging to each label may be imbalanced within
a group. For all ten working conditions, the testing accuracies are
100%. We take the working condition of 2700 rpm shaft speed and low
loading as an example and show the confusion matrices of the five
testing results in Fig. 20.
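The fivefold splitting procedure described above can be sketched with index lists. This is an illustration under stated assumptions, not the authors' implementation: the function name `k_fold_indices` and the fixed seed are hypothetical, and 640 is the total number of samples per working condition implied by the text (8 labels × 80 samples).

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Randomly partition the sample indices into k equally sized groups."""
    assert n_samples % k == 0
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # purely random split, labels are not stratified
    size = n_samples // k
    return [idx[i * size:(i + 1) * size] for i in range(k)]

folds = k_fold_indices(640)  # 8 labels x 80 samples for one working condition
for i, test_idx in enumerate(folds):
    # The four remaining groups form the training set of round i
    train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
    print(i, len(train_idx), len(test_idx))  # 512 training and 128 testing samples per round
```

Because the partition ignores the labels, the number of samples of each label within a group generally differs from group to group, which is the imbalance mentioned in the text.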
Comparison with traditional diagnostics methods
The method of signal processing based feature extraction
combined with a long short-term memory (LSTM) network
Fig. 15 Confusion matrices of case study 3 under three motor speeds, given by the proposed CNN model: (a) motor speed 1500 rpm, (b) motor speed 2000 rpm, (c) motor speed 2500 rpm (axes: output health state label of network, actual health state label; labels 0–6)
Fig. 16 Model testing process visualization under motor speed 1500 rpm, case study 3 (panels: input layer, 1st to 6th layer, output layer; label colours: 0 red, 1 yellow, 2 blue, 3 green, 4 verdigris, 5 magenta, 6 gray)
as a classifier (referred to as the traditional method from now on) is
compared with the proposed CNN model in the above four case studies.
The flowchart of the traditional method is shown in Fig. 21. The
time-domain signal of each health state is first divided into data
segments. Then three manually extracted features, i.e., wavelet packet
energy (WPE) based on wavelet packet decomposition (Zhang et al.
2013), instantaneous frequency (IF) (Boashash 1992a, 1992b) and
instantaneous spectral entropy (ISE) (Pan et al. 2008) based on the
power spectrogram,
Fig. 17 Gearbox used in 2009 PHM data challenge [PHM data challenge 2009, https://www.phmsociety.org/competition/PHM/09] (labelled components: tested gearbox, drive system, tachometer, accelerometer)
Table 5 Fault modes description of 2009 PHM spur gears
Fault  Gear                              Bearing                                   Shaft
label  32T      96T   48T        80T     IS:IS  ID:IS  OS:IS  IS:OS  ID:OS  OS:OS  Input      Output
1      Good     Good  Good       Good    Good   Good   Good   Good   Good   Good   Good       Good
2      Chipped  Good  Eccentric  Good    Good   Good   Good   Good   Good   Good   Good       Good
3      Good     Good  Eccentric  Good    Good   Good   Good   Good   Good   Good   Good       Good
4      Good     Good  Eccentric  Broken  Ball   Good   Good   Good   Good   Good   Good       Good
5      Chipped  Good  Eccentric  Broken  Inner  Ball   Outer  Good   Good   Good   Good       Good
6      Good     Good  Good       Broken  Inner  Ball   Outer  Good   Good   Good   Imbalance  Good
7      Good     Good  Good       Good    Inner  Good   Good   Good   Good   Good   Good       Keyway sheared
8      Good     Good  Good       Good    Good   Ball   Outer  Good   Good   Good   Imbalance  Good
T teeth of the gear; IS input shaft; ID idler shaft; OS output shaft; :IS input side; :OS output side
Fig. 18 The location of input and output shaft accelerometers [PHM data challenge 2009, https://www.phmsociety.org/competition/PHM/09] (annotations: location of input shaft accelerometer, location of output shaft accelerometer)
are constructed from each data segment. The LSTM serves as the
classifier. Therefore, three traditional methods, i.e., WPE-LSTM,
IF-LSTM and ISE-LSTM, are compared.
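To give a feel for one of the engineered features, the sketch below computes a frame-wise spectral entropy, i.e., the Shannon entropy of the normalized power spectrum of successive frames. It is only an illustration under stated assumptions: the exact ISE definition of Pan et al. (2008) may differ, and the frame length, hop size, absence of a window function, normalization to [0, 1] and all function names are choices made here, not taken from the text. A direct DFT is used to keep the example free of external libraries.

```python
import cmath
import math

def power_spectrum(frame):
    """One-sided power spectrum of a frame via a direct DFT (O(n^2), fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def spectral_entropy(frame):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    p = power_spectrum(frame)
    total = sum(p)
    if total == 0.0:
        return 0.0
    probs = [v / total for v in p if v > 0.0]
    h = -sum(q * math.log2(q) for q in probs)
    return h / math.log2(len(p))

def instantaneous_spectral_entropy(signal, frame_len=64, hop=32):
    """Spectral entropy of successive frames: one value per spectrogram column."""
    return [spectral_entropy(signal[s:s + frame_len])
            for s in range(0, len(signal) - frame_len + 1, hop)]

# A pure tone concentrates its power in one frequency bin, so its entropy is close to 0.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(640)]
ise = instantaneous_spectral_entropy(tone)
print(len(ise), max(ise))
```

At the other extreme, a single impulse has a flat power spectrum and yields a normalized entropy of 1, so the feature measures how widely the signal power is spread over frequency at each instant.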
The architecture of the LSTM is composed of a sequence input, one LSTM
layer, a fully connected layer and a softmax layer. The fully
connected layer multiplies the input by the weight matrix and adds a
bias vector. The output is finally calculated by a softmax transfer
function. For the hyperparameters of the LSTM, through initial trials
we found that the number of LSTM units and the batch size are the two
parameters that most clearly affect the accuracy, given an appropriate
learning rate. Specifically, we varied the number of LSTM units from
2² to 2⁹ for the four case studies and found that a large number of
LSTM units is normally required when the number of training samples is
large, and vice versa. For instance, in case study 1, where 240
training samples are available, an LSTM network with 128 units (or
even fewer) performs better than one with 256 units, while in case
study 2, in which 720 training samples are available, an LSTM network
with 256 units is better than one with fewer units. In terms of batch
size, we find that a smaller batch size tends to result in a higher
accuracy but the training oscillation increases accordingly. In
addition, batch sizes that are too small run the risk of
non-convergence.
Fig. 19 Model testing process visualization under 2700 rpm and low loading, case study 4 (panels: input layer, 1st to 6th layer, output layer)
Specifically, the accuracies with all three features of WPE, IF and
ISE in case study 3 are high. Figure 22 illustrates the confusion
matrices in case study 3 under motor speed 1500 rpm as an example. The
accuracies of IF-LSTM and ISE-LSTM are acceptable in the gearbox
application under the low loading condition, but decrease dramatically
under high loading as well as in the ball screw case. WPE performs the
best among the three manually extracted features in many cases, but
suffers the risk of non-convergence in some working conditions of the
gearbox application. The confusion matrices given by the traditional
methods for the gearbox application under a speed of 2700 rpm and low
loading are shown in Fig. 23 as an example. The accuracy given by
WPE-LSTM is very low due to non-convergence. In contrast, the proposed
CNN model identifies the eight health states well under this working
condition, which can be clearly visualized in Fig. 19.
Through the comparison between the proposed CNN model and the
traditional methods in various applications under various working
conditions, it can be seen that the proposed CNN model is considerably
more robust, giving consistently high accuracies in all four case
studies. Moreover, the end-to-end structure of the CNN model reduces
the reliance on empirical expertise and advanced signal processing
techniques, which enables the proposed model to be easily adapted to
different diagnostics tasks.
Conclusions and future work
Manual feature extraction based on signal processing techniques is
normally required in traditional diagnostics for rotating machinery.
It has drawbacks such as strong dependence on expertise and prior
knowledge, the requirement for a lot of skilled human labour, and
sensitivity to changes, and thus requires extensive fine-tuning. Some
recent works based on deep learning convert the vibration signal to
images using time-frequency methods, which can circumvent some of
these drawbacks but still need application-specific adaptation. In
this paper, we proposed an end-to-end health state diagnostics model
based on a convolutional neural network (CNN), which learns feature
representations directly from the raw vibration signal, so that no
manually extracted feature is required. In addition, to fully validate
the effectiveness and the generalizability of the proposed model for
fault diagnostics of rotating components, we carried out tests on four
datasets, including two public ones and two datasets of our own,
covering the applications of ball screw, bearing and gearbox. The
results show high diagnostics accuracies for all four tasks. To the
best of our knowledge, our work is the first to validate a CNN model
across such a wide range of applications.
Moreover, the signal processing based feature extraction
combined with long short-term memory (LSTM) network
Fig. 20 Testing accuracies of fivefold cross validation under the working condition 2700 rpm shaft speed and low loading (five confusion matrices; axes: true label, predicted label)
(referred to here as the traditional method) is also explored and
compared with the proposed CNN model. Specifically, three
typical engineered features, i.e., (a) wavelet packet energy
(WPE) based on wavelet packet decomposition, (b) instanta-
Fig. 21 Flowchart of the implementation of the traditional methods (steps: data segmentation of the signals of health states 1 to n; wavelet packet decomposition and power spectrogram; extraction of wavelet packet energy (WPE), instantaneous frequency (IF) and instantaneous spectral entropy (ISE); split into training set and testing set; build LSTM network; train network; input test set to the trained model; diagnostics result)
Fig. 22 Confusion matrices of case study 3 under motor speed 1500 rpm, given by the traditional methods: (a) WPE-LSTM, (b) IF-LSTM, (c) ISE-LSTM. Horizontal axis: actual health state label (0 to 6)
Fig. 23 Confusion matrices of case study 4 under speed 2700 rpm and low loading, given by the traditional methods: (a) WPE-LSTM, (b) IF-LSTM, (c) ISE-LSTM. Axes: output health state label of network against actual health state label (1 to 8)
Table 6 Accuracies of the proposed CNN and the traditional methods in all case studies
Case study | Working conditions | Number of labels | Proposed CNN model (%) | WPE-LSTM (%) | IF-LSTM (%) | ISE-LSTM (%)
1 | 500 rpm, 0 load | 3 | 100 | 97.3 ± 3.2 | 86.8 ± 4.3 | 69.6 ± 5.6
2 | 1797 rpm, 0 load | 9 | 100 | 79.4 ± 5.0 | 89.2 ± 2.5 | 90.3 ± 3.5
  | 1772 rpm, 1 hp load | 10 | 100 | 92.7 ± 5.2 | 86.4 ± 6.0 | 82.2 ± 4.5
  | 1750 rpm, 2 hp load | 10 | 100 | 82.2 ± 7.4 | 86.8 ± 1.8 | 88.7 ± 3.6
  | 1730 rpm, 3 hp load | 10 | 100 | 92.4 ± 7.0 | 86.8 ± 3.6 | 79.4 ± 4.6
3 | 1500 rpm, 0 load | 7 | 99.5 ± 0.5 | 96.3 ± 2.1 | 97.5 ± 2.4 | 92.2 ± 2.5
  | 2000 rpm, 0 load | 7 | 99.8 ± 0.5 | 95.4 ± 5.9 | 99.1 ± 2.3 | 92.3 ± 2.1
  | 2500 rpm, 0 load | 7 | 99.5 ± 0.5 | 98.5 ± 3.4 | 94.6 ± 2.4 | 99.4 ± 2.1
4 | 1800 rpm, low load | 8 | 100 | 93.4 ± 3.6 | 90.1 ± 2.3 | 81.9 ± 4.1
  | 1800 rpm, high load | 8 | 100 | 91.0 ± 1.7 | 82.1 ± 7.3 | 84.5 ± 2.3
  | 2100 rpm, low load | 8 | 100 | 99.5 ± 0.5 | 88.0 ± 8.4 | 89.4 ± 5.4
  | 2100 rpm, high load | 8 | 100 | 83.5 ± 3.4 and 84.0 ± 2.1 (one of the three traditional methods is “–”; its column is not recoverable from the extracted text)
  | 2400 rpm, low load | 8 | 100 | 98.6 ± 1.7 | 89.9 ± 2.5 | 90.7 ± 1.8
  | 2400 rpm, high load | 8 | 100 | 86.8 ± 2.4 | 82.2 ± 2.9 | 81.1 ± 2.7
  | 2700 rpm, low load | 8 | 100 | 97.1 ± 1.9 and 89.7 ± 2.4 (one of the three traditional methods is “–”; its column is not recoverable from the extracted text)
  | 2700 rpm, high load | 8 | 100 | 92.2 ± 4.6 | 84.0 ± 1.6 | 76.9 ± 2.7
  | 3000 rpm, low load | 8 | 100 | 95.2 ± 5.9 | 90.9 ± 4.0 | 91.0 ± 1.7
  | 3000 rpm, high load | 8 | 100 | 96.1 ± 2.7 | 82.2 ± 1.6 | 76.8 ± 2.1
“–” represents non-convergence
neous frequency (IF), and (c) instantaneous spectral entropy
(ISE) based on the power spectrogram, are constructed from the
raw vibration data and then used as the input of a classifier
(an LSTM network). The results indicate that manually
extracted features based on signal processing techniques are
indeed sensitive to the diagnostics task: a feature that performs
well in one task may fail to give satisfactory accuracy,
or may even lead to non-convergence, in another task. The comparison
shows that the proposed CNN-based model has good
robustness and generalization ability, and adapts easily
to different diagnostics tasks without any manual tuning.
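To make the idea of an engineered feature concrete, the following is a minimal sketch of an instantaneous spectral entropy computation: the signal is cut into sliding frames, the power spectrum of each frame is normalized into a probability distribution, and its Shannon entropy is taken. It is an illustration in plain Python under stated assumptions, not the implementation used in this work; the function names and the parameters `frame_len` and `hop` are hypothetical, and a naive DFT stands in for the power spectrogram.

```python
import cmath
import math


def power_spectrum(frame):
    """One-sided power spectrum of a real-valued frame via a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_entropy(frame):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    power = power_spectrum(frame)
    total = sum(power)
    if len(power) < 2 or total == 0.0:
        return 0.0  # too short or silent frame: nothing to measure
    probs = [p / total for p in power]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return entropy / math.log2(len(power))


def instantaneous_spectral_entropy(signal, frame_len, hop):
    """Spectral entropy of each sliding frame (hypothetical frame_len, hop)."""
    return [spectral_entropy(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]
```

A pure tone concentrates its power in one frequency bin and gives an entropy close to 0, whereas an impulse spreads power evenly over all bins and gives an entropy close to 1. The resulting per-frame sequence is the kind of hand-crafted input that would then be passed to a sequence classifier such as an LSTM, and its usefulness depends on choices such as the frame length, which is one reason such features need task-specific tuning.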
The limitations of the current work and the corresponding
future work are summarized as follows. The current work
used data acquired from laboratory test benches.
Next, we will investigate the performance of the proposed
model in a real industrial environment. In the current work, the
high diagnostics accuracy of each application relies on the
assumptions that sufficient labeled data are available and that
the training and testing data come from the same distribution,
which may be a limiting factor in industrial applications.
To relax these assumptions, our future work will focus
on transfer learning methods, which are able to transfer
vibration-based diagnostics capabilities to new working conditions,
experimental protocols and instrumented devices
while avoiding the requirement for new labeled fault data.
In this way, diagnostics models trained with laboratory
data have the potential to be used in a real industrial
environment. In the current work, the fault data of each label are
balanced. In our future work, we will focus on building the
diagnostics model when the fault data are unbalanced, i.e.,
when few or even no fault data are available for
some specific fault labels, since in practice faults of high-stakes
industrial devices are rare. In addition to the single
fault type considered in the current work, we will study the
diagnostics of compound faults. The low signal-to-noise
ratio of the acquired vibration signal, caused by the
strong coupling of different components, is also of interest
for future work.
Acknowledgements The present work was funded by the National Natural
Science Foundation of China (No. 51805262) and the Graduate
Student Innovation Fund of Beihang University (YCSJ-03-2019-06).
The authors gratefully acknowledge the Key Laboratory of Performance
Test for CNC Machine Tool Components, affiliated with the Ministry of
Industry and Information Technology of China, for providing the ball screw
test bench and experiment materials.
References
Behley, J., Steinhage, V., & Cremers, A. B. (2013). Laser-based seg-
ment classification using a mixture of bag-of-words. In 2013
IEEE/RSJ international conference on intelligent robots and sys-
tems (pp. 4195–4200). https://doi.org/10.1109/IROS.2013.66969
57.
Boashash, B. (1992a). Estimating and interpreting the instantaneous
frequency of a signal. I. Fundamentals. Proceedings of the IEEE,
80(4), 520–538, doi:https://doi.org/10.1109/5.135376.
Boashash B (1992b). Estimating and interpreting the instantaneous
frequency of a signal. Proceedings of the IEEE 80(4), 540–568,
doi:https://doi.org/10.1109/5.135378.
Case Western Reserve University Bearing Data Center Website, Avail-
able: https://csegroups.case.edu/bearingdatacenter/home.
Chen, R., Huang, X., Yang, L., Xu, X., Zhang, X., & Zhang, Y. (2019).
Intelligent fault diagnosis method of planetary gearboxes based
on convolution neural network and discrete wavelet transform.
Computers in Industry, 106, 48–59. doi:https://doi.org/10.1016/j.
compind.2018.11.003.
Chen, Z., Mauricio, A., Li, W., & Gryllias, K. (2020). A deep learn-
ing method for bearing fault diagnosis based on Cyclic Spectral
Coherence and Convolutional Neural Networks. Mechanical Sys-
tems and Signal Processing, 140, 106683. doi:https://doi.org/10.
1016/j.ymssp.2020.106683.
Dhamande, L. S., & Chaudhari, M. B. (2018). Compound gear-bearing
fault feature extraction using statistical features based on time-
frequency method. Measurement, 125, 63–77. doi:https://doi.org/
10.1016/j.measurement.2018.04.059.
Feng, Z., Lin, X., & Zuo, M. J. (2016). Joint amplitude and frequency
demodulation analysis based on intrinsic time-scale decomposi-
tion for planetary gearbox fault diagnosis. Mechanical Systems
and Signal Processing, 72–73, 223–240. doi:https://doi.org/10.1
016/j.ymssp.2015.11.024.
Feng, G., & Pan, Y. (2012). Establishing a cost-effective sensing system
and signal processing method to diagnose preload levels of ball
screws. Mechanical Systems and Signal Processing, 28, 78–88.
doi:https://doi.org/10.1016/j.ymssp.2011.10.004.
Goodfellow, I., Bengio, Y., & Courville, A. (2019). Deep learning.
Cambridge, MIT Press.
Goyal, D., Choudhary, A., Pabla, B. S., & Dhami, S. S. (2019). Sup-
port vector machines based non-contact fault diagnosis system
for bearings. Journal of Intelligent Manufacturing. doi:https://doi.
org/10.1007/s10845-019-01511-x.
Hamadache, M., Jung, J. H., Park, J., & Youn, B. D. (2019). A com-
prehensive review of artificial intelligence-based approaches for
rolling element bearing PHM: shallow and deep learning. JMST
Advances, 1(1), 125–151. doi:https://doi.org/10.1007/s42791-01
9-0016-y.
Hoang, D. T., & Kang, H. J. (2019). Rolling element bearing fault diag-
nosis using convolutional neural network and vibration image.
Cognitive Systems Research, 53, 42–50. doi:https://doi.org/10.10
16/j.cogsys.2018.03.002.
Islam, M. M. M., & Kim, J. M. (2019a). Reliable multiple combined
fault diagnosis of bearings using heterogeneous feature models
and multiclass support vector Machines. Reliability Engineering
& System Safety, 184, 55–66. doi:https://doi.org/10.1016/j.ress.2
018.02.012.
Islam, M. M. M., & Kim, J. M. (2019b). Automated bearing fault diag-
nosis scheme using 2D representation of wavelet packet transform
and deep convolutional neural network. Computers in Industry,
106, 142–153. doi:https://doi.org/10.1016/j.compind.2019.01.00
8.
Jia, F., Lei, Y., Guo, L., Lin, J., & Xing, S. (2018). A neural net-
work constructed by deep learning technique and its application
to intelligent fault diagnosis of machines. Neurocomputing, 272,
619–628. doi:https://doi.org/10.1016/j.neucom.2017.07.032.
Jing, L., Zhao, M., Li, P., & Xu, X. (2017). A convolutional neural
network based feature learning and fault diagnosis method for
the condition monitoring of gearbox. Measurement, 111, 1–10.
doi:https://doi.org/10.1016/j.measurement.2017.07.017.
Kingma, D. P., & Ba, J. (2015). Adam: A method for Stochastic
Optimization. the 3rd International Conference for Learning Rep-
resentations, San Diego, 2015, arXiv preprint arXiv:1412.6980.
Lee, J., Davari, H., Singh, J., & Pandhare, V. (2018). Industrial Arti-
ficial Intelligence for industry 4.0-based manufacturing systems.
Manufacturing Letters, 18, 20–23. doi:https://doi.org/10.1016/j.
mfglet.2018.09.002.
Li, P., Jia, X., Feng, J., Davari, H., Qiao, G., Hwang, Y., et al. (2018a).
Prognosability study of ball screw degradation using systematic
methodology. Mechanical Systems and Signal Processing, 109,
45–57. doi:https://doi.org/10.1016/j.ymssp.2018.02.046.
Li, X., Li, J., Qu, Y., & He, D. (2019a). Semi-supervised gear fault
diagnosis using raw vibration signal based on deep learning. Chi-
nese Journal of Aeronautics. doi:https://doi.org/10.1016/j.cja.20
19.04.018.
Li, X., Li, J., Zhao, C., Qu, Y., & He, D. (2020). Gear pitting fault
diagnosis with mixed operating conditions based on adaptive 1D
separable convolution with residual connection. Mechanical Sys-
tems and Signal Processing, 142, 106740. doi:https://doi.org/10.
1016/j.ymssp.2020.106740.
Li, X., Zhang, W., & Ding, Q. (2019b). Deep learning-based remaining
useful life estimation of bearings using multi-scale feature extrac-
tion. Reliability Engineering & System Safety, 182, 208–218.
doi:https://doi.org/10.1016/j.ress.2018.11.011.
Li, X., Zhang, W., Ding, Q., & Sun, J. Q. (2018b). Intelligent rotating
machinery fault diagnosis based on deep learning using data aug-
mentation. Journal of Intelligent Manufacturing. doi:https://doi.
org/10.1007/s10845-018-1456-1.
123
Journal of Intelligent Manufacturing
Liang, P., Deng, C., Wu, J., & Yang, Z. (2020). Intelligent fault diag-
nosis of rotating machinery via wavelet transform, generative
adversarial nets and convolutional neural network. Measurement,
159, 107768. doi:https://doi.org/10.1016/j.measurement.2020.10
7768.
Liu, L., Liang, X., & Zuo, M. J. (2018). A dependence-based feature
vector and its application on planetary gearbox fault classification.
Journal of Sound and Vibration, 431, 192–211. doi:https://doi.org/
10.1016/j.jsv.2018.06.015.
Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal
of Machine Learning research, 9(Nov), 2579–2605.
Ng, A. Y. (2004). Feature selection, L1 vs L2 regularization, and
rotational invariance. In Proceedings of the 21th international
conference on machine learning.
Nguyen, D., Kang, M., Kim, C. H., & Kim, J.-M. (2013). Highly
reliable state monitoring system for induction motors using dom-
inant features in a two-dimension vibration signal. NewReviewof
Hypermedia and Multimedia, 19(3–4), 248–258. doi:https://doi.
org/10.1080/13614568.2013.832407.
PHM data challenge. (2009). Available from https://www.phmsociety.
org/competition/PHM/09.
Pan, Y. N., Chen, J., & Li, X. L. (2008). Spectral entropy: A complemen-
tary index for rolling element bearing performance degradation
assessment. Proceedings of the Institution of Mechanical Engi-
neers, Part C: Journal of Mechanical Engineering Science, 223(5),
1223–1231, doi:https://doi.org/10.1243/09544062JMES1224.
Park, S., Kim, S., & Choi, J. H. (2018). Gear fault diagnosis using
transmission error and ensemble empirical mode decomposi-
tion. Mechanical Systems and Signal Processing, 108, 262–275.
doi:https://doi.org/10.1016/j.ymssp.2018.02.028.
Peng, D., Liu, Z., Wang, H., Qin, Y., & Jia, L. (2019). A novel deeper
one-dimensional CNN with residual learning for fault diagno-
sis of wheelset bearings in high-speed trains. IEEE Access, 7,
10278–10293. doi:https://doi.org/10.1109/ACCESS.2018.28888
42.
Smith, W. A., & Randall, R. B. (2015). Rolling element bearing
diagnostics using the Case Western Reserve University data: A
benchmark study. Mechanical Systems and Signal Processing,
64–65, 100–131. doi:https://doi.org/10.1016/j.ymssp.2015.04.02
1.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutsjever, I., & Salakhut-
dinov, R. (2014). DropOut: A simple way to prevent neural
network from overfitting. Journal of Machine Learning research,
15, 1929–1958.
Vogl, G. W., Weiss, B. A., & Helu, M. (2019). A review of diagnostic
and prognostic capabilities and best practices for manufacturing.
Journal of Intelligent Manufacturing, 30(1), 79–95. doi:https://
doi.org/10.1007/s10845-016-1228-8.
Wang,P., Ananya,Yan, R., & Gao, R. X. (2017). Virtualizationand deep
recognition for system fault classification. Journal of Manufactur-
ing Systems, 44, 310–316. doi:https://doi.org/10.1016/j.jmsy.201
7.04.012.
Wang, C., Gan, M., & Zhu, C. a. (2018a). Fault feature extraction of
rolling element bearings based on wavelet packet transform and
sparse representation theory. Journal of Intelligent Manufactur-
ing, 29(4), 937–951. doi:https://doi.org/10.1007/s10845-015-115
3-2.
Wang, H., Li, S., Song, L., & Cui, L. (2019). A novel convolutional
neural network based fault recognition method via image fusion
of multi-vibration-signals. Computers in Industry, 105, 182–190.
doi:https://doi.org/10.1016/j.compind.2018.12.013.
Wang, L., Liu, Z., Miao, Q., & Zhang, X. (2018b). Complete ensemble
local mean decomposition with adaptive noise and its applica-
tion to fault diagnosis for rolling bearings. Mechanical Systems
and Signal Processing, 106, 24–39. doi:https://doi.org/10.1016/j.
ymssp.2017.12.031.
Wu, C., Jiang, P., Ding, C., Feng, F., & Chen, T. (2019). Intelligent
fault diagnosis of rotating machinery based on one-dimensional
convolutional neural network. Computers in Industry, 108, 53–61.
doi:https://doi.org/10.1016/j.compind.2018.12.001.
Xia, T., & Xi, L. (2019). Manufacturing paradigm-oriented PHM
methodologies for cyber-physical systems. Journal of Intelligent
Manufacturing, 30(4), 1659–1672. doi:https://doi.org/10.1007/s1
0845-017-1342-2.
Yan, X., & Jia, M. (2018). A novel optimized SVM classification
algorithm with multi-domain feature and its application to fault
diagnosis of rolling bearing. Neurocomputing. doi:https://doi.org/
10.1016/j.neucom.2018.05.002.
Zhang, J., Sun, Y., Guo, L., Gao, H., Hong, X., & Song, H. (2020).
A new bearing fault diagnosis method based on modified convo-
lutional neural networks. Chinese Journal of Aeronautics, 33(2),
439–447. doi:https://doi.org/10.1016/j.cja.2019.07.011.
Zhang, Z., Wang, Y., & Wang, K. (2013). Fault diagnosis and prognosis
using wavelet packet decomposition, Fourier transform and artifi-
cial neural network. Journal of Intelligent Manufacturing, 24(6),
1213–1227. doi:https://doi.org/10.1007/s10845-012-0657-2.
Zhao, X., Jia, M., & Lin, M. (2020). Deep Laplacian Auto-encoder
and its application into imbalanced fault diagnosis of rotating
machinery. Measurement, 152, 107320. doi:https://doi.org/10.1
016/j.measurement.2019.107320.
Zhao, R., Yan, R., Chen, Z., Mao, K., Wang, P., & Gao, R. X. (2019).
Deep learning and its applications to machine health monitor-
ing. Mechanical Systems and Signal Processing, 115, 213–237.
doi:https://doi.org/10.1016/j.ymssp.2018.05.050.
Zhu, X., Hou, D., Zhou, P., Han, Z., Yuan, Y., Zhou, W., et al. (2019a).
Rotor fault diagnosis using a convolutional neural network with
symmetrized dot pattern images. Measurement, 138, 526–535.
doi:https://doi.org/10.1016/j.measurement.2019.02.022.
Zhu, Z., Peng, G., Chen, Y., & Gao, H. (2019b). A convolutional neural
network based on a capsule network with strong generalization for
bearing fault diagnosis. Neurocomputing, 323, 62–75. doi:https://
doi.org/10.1016/j.neucom.2018.09.050.
... On the other hand, fault diagnosis methods based on deep learning (DL) are increasingly popular [20][21][22]. For example, Yin et al. [23] and Wang et al. [21] proposed ball screw fault diagnosis model based on CNN, respectively. ...
... On the other hand, fault diagnosis methods based on deep learning (DL) are increasingly popular [20][21][22]. For example, Yin et al. [23] and Wang et al. [21] proposed ball screw fault diagnosis model based on CNN, respectively. Vashisht et al. [24] established long short-term memory (LSTM) networks by current signals, which improved the fault diagnosis accuracy of numerical control machine tools. ...
Article
Vibration-based fault diagnosis methods of ball screw are susceptible to noise and transmission path. Moreover, the accuracy of supervised deep learning models depends on large amounts of labeled samples, which are not only difficult to obtain but also laborious to label. Therefore, to solve these problems, a zero-cost unsupervised transfer method based on non-vibration signals fusion is proposed to achieve ball screw fault diagnosis in this paper. Firstly, non-vibration signals, such as current and speed, are adopted and orderly fused together to constitute multi-source fusion signal samples, which are easier to obtain and contain fewer interferences than vibration signals. Secondly, by virtue of its excellent abnormal detection ability, isolation forest algorithm is circularly utilized to generate pseudo-labels of source domain samples without manual labeling, which further realizes zero-cost sample labeling and unsupervised process. Finally, large amount of generated pseudo-labeled samples of source domain is applied to pre-train the transfer model parameters, and fine-tuning strategy with small number of labeled samples of target domain is used to complete transfer fault diagnosis of ball screw. The effectiveness of the proposed method is verified by ball screw signals across three different operation conditions, ablation and comparison analysis are also studied to illustrate its advantages.
... In the past few years, the issue of health monitoring for ball screws has attracted the attention of scholars, and several fault diagnosis methods based on data-driven have been proposed. For example, Wang et al. [3] proposed an end-to-end method for fault diagnosis of ball screws, utilizing a convolutional neural network (CNN) to directly extract fault features from raw vibration signals, eliminating the need for manual feature extraction. Shan et al. [4] proposed a data fusion method based on multisensor for cross-domain diagnosis of ball screws, using CNN to extract fault features from weighted data. ...
Article
Full-text available
Due to the varying working conditions of SCARA (Selective Compliance Assembly Robot Arm) robots, there are significant differences in data distribution among different machines. As a result, it is challenging to apply unsupervised methods for cross-machine fault diagnosis. This paper proposes a method called Metric Learning-based Few-shot Adversarial Domain Adaptation (MLFADA) for cross-machine diagnosis of the SCARA robot’s ball screws. Firstly, MLFADA constructs data pairs by sampling a few shots of samples from the source domain (named SCARA A) and the target domain (named SCARA B). Subsequently, it integrates metric learning and adversarial learning theories to minimize the distance between data pairs that belong to the same class in both domains while maximizing the distance between data pairs from different classes. Secondly, to further enhance the performance of MLFADA, a strategy called Pseudo-label Self-correcting Maximum Mean Discrepancy (PSMMD) is proposed to reduce the conditional distribution differences between the two domains. Finally, a lightweight network is designed for feature extraction and fault classification to facilitate deployment on terminal devices. The experiment demonstrates that the challenge of cross-machine fault diagnosis for the ball screws of SCARA robots has been successfully resolved. This is a relatively understudied problem. Compared to mainstream domain adaptation and few-shot methods, the proposed method achieved the best diagnostic accuracy of 88.14%, even when there was only one labeled sample available in the target domain.
... It achieves this by capturing and analyzing data from sensors on the equipment, which is then used for further development into PdM. Several case studies on fault diagnostics of rotating machinery have been discovered in [14]. This algorithm enables manufacturers to reduce unexpected downtime by expecting occurrence of abnormalities or faults and swiftly identifying their sources. ...
Conference Paper
Full-text available
Having accurate prediction on the health of machines in manufacturing can lead to a profitable organization if the operations and maintenance decisions are appropriately performed. This hinges on making well-informed operational and maintenance decisions. Incorporating condition monitoring and predictive maintenance strategies can significantly contribute to achieving this goal. By continuously monitoring the real-time condition of machines, organizations can gather valuable data that offers insights into the performance and health of the equipment. However, dealing with a scarce dataset, which is common in real world applications, makes any prognostics on the maintenance system intricate. This is further exacerbated by the unavailability of failure data within the system which makes degradation model is best suited for the said situation. Since there is no extensive study discussing computational time under similar settings of two different networks for the degradation model in estimating RUL, this study investigates a simple Long Short-Term Model (LSTM) method for prognostics, which is compared to a two-dimensional Convolutional Neural Network (CNN) under the same training options. The networks are trained using the popular Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset from the National Aeronautics and Space Administration (NASA). The aim of this study is to estimate the remaining useful life (RUL) of a turbofan engine in the most effective way. With carefully designed and defined network architectures, better performance can be attained, enabling proper foreseen of the RUL of an engine as soon as it is more likely to be close to failure. Based on the comparison, it is noted that the simple LSTM method for RUL prediction outperforms the two-dimensional CNN with better RUL prediction, Root Mean Square Error (RMSE), and computational time. 
For future improvement, this study can be further explored for a more sophisticated hybrid model that might produce better prediction in various sectors such as manufacturing, automotive, and military applications.
... 2-D CNN usually analyzes 2-D images, which can be obtained via cropping raw signal images or combining time-frequency domain transformation. In terms of raw signal images, both time-domain and frequency-domain signal images can be utilized to train 2-D CNN models to detect BRB/ER, BD, WSC, and WOC (Liu et al, 2016;Janssens et al, 2016;Wang et al, 2020e). In terms of images from time-frequency domain transformation, Zhang et al (2020b) have employed 2-D CNN combined with STFT to detect BD. ...
Article
Full-text available
Induction machines (IMs) are utilized in different industrial sectors such as manufacturing, transportation, transmission, and energy due to their ruggedness, low cost, and high efficiency. If IMs fail without advanced warning, unscheduled maintenance needs to be performed, leading to downtime and maintenance costs for asset owners. To avoid these, conducting prognostics and health management (PHM) for IMs is indispensable. There are different PHM methods (expert knowledge, physics-based, and machine learning) to analyze the health and estimate the remaining useful life (RUL) of IMs. It is essential to select appropriate methods and algorithms to solve practical engineering problems by comparing their pros and cons. This paper will systematically summarize the application of the PHM framework to IMs and comprehensively present how to select appropriate general methods as well as specific algorithms Springer Nature 2021 L A T E X template 3 applied in the PHM for IMs to solve practical engineering problems, aiming to provide some guidance for future researchers and practitioners.
... Li et al. [26] proposed a modified CNN for fault diagnosis based on the LeNet-5 architecture by replacing the fully connected layer with a global average pooling layer. Wang et al. [27] proposed an end-to-end health state diagnostics model based on a CNN with multiscale feature extraction modules, which can directly learn feature maps from the raw vibration signal. Kim et al. [28] propose a direct-connection-CNN-based fault diagnosis method for rotor systems by improving the connectivity between various layers within the CNN. ...
Article
Full-text available
More layers in a convolution neural network (CNN) means more computational burden and longer training time, resulting in poor performance of pattern recognition. In this work, a simplified global information fusion convolution neural network (SGIF-CNN) is proposed to improve computational efficiency and diagnostic accuracy. In the improved CNN architecture, the feature maps of all the convolutional and pooling layers are globally convoluted into a corresponding one-dimensional feature sequence, and then all the feature sequences are concatenated into the fully connected layer. On this basis, this paper further proposes a novel fault diagnosis method for a rotor–journal bearing system based on SGIF-CNN. Firstly, the time-frequency distributions of samples are obtained using the Adaptive Optimal-Kernel Time–Frequency Representation algorithm (AOK-TFR). Secondly, the time–frequency diagrams of the training samples are utilized to train the SGIF-CNN model using a shallow information fusion method, and the trained SGIF-CNN model can be tested using the time–frequency diagrams of the testing samples. Finally, the trained SGIF-CNN model is transplanted to the equipment’s online monitoring system to monitor the equipment’s operating conditions in real time. The proposed method is verified using the data from a rotor test rig and an ultra-scale air separator, and the analysis results show that the proposed SGIF-CNN improves the computing efficiency compared to the traditional CNN while ensuring the accuracy of the fault diagnosis.
... Rolling-element bearings are basic mechanical parts responsible for transferring the rotational movement in a wide range of mechanical engineering systems [1][2][3][4][5]. The global trend on sustainability and constant product development have improved the bearing's performance through its high-speed operation [6], reduced friction [7], and reduced noise [8]. ...
Article
Full-text available
This paper focuses on the influence of radial internal clearance on the dynamics of a rolling-element bearing. In the beginning, the 2—Degree of Freedom (DOF) model was studied, in which the clearance was treated as a bifurcation parameter. The derived nonlinear mathematical model is based on Hertzian contact theory and takes into consideration shape errors of rolling surfaces and eccentricity reflecting real operating conditions. The analysis showed characteristic dynamical behavior by specific clearance range, which reflects others in a low or high amplitude and can refer to the optimal clearance. The experimental validation was conducted with the use of a double row self-aligning ball bearing (SABB) NTN 2309SK in which the acceleration response was measured by various rotational velocities. The time series obtained from the mathematical model and the experiment were analyzed with the recurrence quantification analysis.
... As artificial intelligence continues to evolve, People started to use data-driven deep learning algorithms in industry [7][8][9][10][11]. The deep learning detection model has many advantages, such as more powerful complex feature extraction ability without manual design, and to different application scenarios. ...
Article
Full-text available
Surface defect detection is an important part of the process of product quality control in the industry. Automatic detection of surface defects based on machine learning is an up-and-coming research field, and there have been many successful cases. Deep learning has become the most suitable detection method for this task. Most algorithms require a large number of defect samples to achieve good results. However, defect samples in actual production are very limited. Although some unsupervised or semi-supervised methods can reduce training costs, their accuracy is difficult to guarantee, so they are difficult to be applied in industrial inspection. In this paper, we propose a detail-semantic guide network (DSGNet), which can achieve better result with fewer training samples. It is a two-stage neural network framework. In the first stage, we design a new semantic branch based on the modified residual shrinkage network and the proposed joined atrous spatial pyramid pooling (JASPP) module. This is the first time that residual shrinkage network is applied to defect detection and achieves good results. Also, we design a clear and efficient detail branch based on dense connection network. Specially, we propose a new detail-semantic guide module (DSGM), which can better integrate the feature information of the two branches. In the training phase, we propose a weight mask based on defect area to improve the ability of extracting small defects. We did experiments on four datasets and our method achieved excellent detection results even with only a small number of training samples.
Chapter
With the rapid development of power electronics technology and the proposal of intelligent ships, electric propulsion systems on ships are becoming more and more widespread. As the power source for ship navigation, timely and accurate diagnosis and prediction of faults of electric propulsion system play a vital role in the operation safety of ships. This paper summarises the common faults of electric propulsion systems, reviews the latest developments and applications of fault diagnosis techniques based on fault signal analysis in electric propulsion system fault diagnosis, and discusses the advantages and disadvantages of typical methods in the light of the latest literature and current research problems. The paper concludes by proposing future trends in fault diagnosis and prediction for ship electric propulsion systems.KeywordsMarine electric propulsion systemFault diagnosisSignal analysis
Article
Full-text available
Bearing defects have been accepted as one of the major causes of failure in rotating machinery. It is important to identify and diagnose the failure behavior of bearings for the reliable operation of equipment. In this paper, a low-cost non-contact vibration sensor has been developed for detecting the faults in bearings. The supervised learning method, support vector machine (SVM), has been employed as a tool to validate the effectiveness of the developed sensor. Experimental vibration data collected for different bearing defects under various loading and running conditions have been analyzed to develop a system for diagnosing the faults for machine health monitoring. Fault diagnosis has been accomplished using discrete wavelet transform for denoising the signal. Mahalanobis distance criteria has been employed for selecting the strongest feature on the extracted relevant features. Finally, these selected features have been passed to the SVM classifier for identifying and classifying the various bearing defects. The results reveal that the vibration signatures obtained from developed non-contact sensor compare well with the accelerometer data obtained under the same conditions. A developed sensor is a promising tool for detecting the bearing damage and identifying its class. SVM results have established the effectiveness of the developed non-contact sensor as a vibration measuring instrument which makes the developed sensor a cost-effective tool for the condition monitoring of rotating machines.
Article
Fault diagnosis is vital in manufacturing systems. However, traditional fault diagnosis methods first process the signal and extract features, and then feed the features into a selected classifier for classification. The process of feature extraction depends on the experimenters’ experience, and the classification rate of the shallow diagnostic model does not achieve satisfactory results. In view of these problems, this paper proposes a method of converting raw signals into two-dimensional images. This method can extract the features of the converted two-dimensional images and eliminate the impact of expert experience on the feature extraction process. The paper then proposes an intelligent diagnosis algorithm based on a Convolutional Neural Network (CNN), which can automatically accomplish feature extraction and fault diagnosis. The effect of this method is verified on bearing data. The influence of different sample sizes and different load conditions on the diagnostic capability of this method is analyzed. The results show that the proposed method is effective and can meet the timeliness requirements of fault diagnosis.
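The abstract above does not say how the raw signal is turned into an image. As a rough illustration only, one commonly used conversion takes a segment of size × size samples, rescales it to the 0 to 255 grayscale range, and reshapes it row by row. The function name `signal_to_image` and this particular scaling are assumptions for the sketch, not the cited authors' procedure.

```python
def signal_to_image(signal, size):
    """Turn the first size*size samples of a 1D signal into a size x size
    grayscale image (list of rows). Hypothetical helper, for illustration."""
    needed = size * size
    if len(signal) < needed:
        raise ValueError("signal too short for the requested image size")
    segment = signal[:needed]
    lo, hi = min(segment), max(segment)
    span = hi - lo
    # Min-max scale each sample to an integer pixel in [0, 255];
    # a constant segment is mapped to all zeros.
    pixels = [round((x - lo) / span * 255) if span else 0 for x in segment]
    # Fill the image row by row.
    return [pixels[r * size:(r + 1) * size] for r in range(size)]


if __name__ == "__main__":
    print(signal_to_image([0, 1, 2, 3], 2))
```

The resulting rows could then be passed to any 2D CNN as a single-channel image.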
Article
In the aerospace industry, gears are among the most common parts of a mechanical transmission system. Gear pitting faults could cause the transmission system to fail and lead to a safety disaster. Diagnosing the gear pitting condition directly from the raw vibration signal remains a challenging problem. In this paper, a novel method named augmented deep sparse autoencoder (ADSAE) is proposed. The method can be used to diagnose the gear pitting fault with relatively few raw vibration signal data. It is based on the theory of pitting fault diagnosis and combines data augmentation with the deep sparse autoencoder algorithm for the fault diagnosis of gear wear. The effectiveness of the proposed method is validated by experiments on six types of gear pitting conditions. The results show that the ADSAE method can effectively increase the network generalization ability and robustness with very high accuracy. This method can effectively diagnose different gear pitting conditions and shows a clear trend according to the severity of gear wear faults. The results obtained by the ADSAE method are compared with those obtained by other common deep learning methods. This paper provides an important insight into the field of gear fault diagnosis based on deep learning and has potential practical application value. Keywords: Deep learning, Gear pitting diagnosis, Gear teeth, Raw vibration signal, Semi-supervised learning, Sparse autoencoder
Article
Classification of spall and crack faults of gear teeth is studied by applying the ensemble empirical mode decomposition (EEMD) to the transmission error (TE) measured by the encoders of the input and output shafts. Finite element models of the gears with the two faults are built, and TE’s are obtained by simulation of the faulty gears under loaded contact to identify the different characteristics. A simple test bed for a pair of spur gears is prepared to illustrate the approach, in which the TE’s are measured for the gears with seeded spall and crack, respectively. EEMD is applied to extract fault features under the noise from the measured TE. The differences of the spall and crack are clearly identified by the selected features of the intrinsic mode functions based on the class separability criterion. The k-nearest neighbor method is applied for the classification of the faults and normal gears using the features. The proposed method is advantageous over the existing practices in the sense that the TE signal measures the gear faults more directly with less noise, enabling successful diagnosis.
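The final classification step mentioned above, the k-nearest neighbor method, can be sketched in a few lines. This is a generic illustration with made-up two-dimensional feature vectors; the feature values, the labels used in the example, and the function name `knn_predict` are assumptions and do not come from the cited study.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points closest
    to `query` in Euclidean distance."""
    # Pair every training point's distance to the query with its label.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    # Count the labels of the k nearest points and take the most frequent.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical feature vectors (e.g. two selected IMF features per sample).
    features = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
    labels = ["normal", "normal", "crack", "crack"]
    print(knn_predict(features, labels, (0.15, 0.15), k=3))
```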
Article
Fault detection for rotating machinery systems, especially for typical components such as bearings and gears, is of special importance for keeping machine systems working normally and safely. However, due to changing working conditions, environmental noise, the weakness of early fault features and various unseen compound failure modes, it is quite hard for existing intelligent fault diagnosis approaches to achieve high-accuracy failure monitoring of rotating machinery in real industrial applications. In this paper, a novel and high-accuracy fault detection approach named WT-GAN-CNN for rotating machinery is presented based on Wavelet Transform (WT), Generative Adversarial Nets (GANs) and convolutional neural network (CNN). The proposed WT-GAN-CNN approach includes three parts. First, WT is employed to extract time-frequency image features from one-dimensional raw time-domain signals. Secondly, GANs are used to generate more training image samples. Finally, the built CNN model is used to accomplish the fault detection of rotating machinery using the original training time-frequency images and the generated fake training time-frequency images. Two experimental studies are carried out to assess the effectiveness of the proposed approach, and the results demonstrate that its testing accuracy is higher than that of other intelligent failure detection approaches in the literature, even under strong environmental noise or when working conditions change. Furthermore, its testing accuracy is also highly stable.
Article
Gear pitting fault diagnosis has always been an important subject to industry and research community. In the past, the diagnosis of early gear pitting faults has usually been carried out under single gear health state. In order to diagnose the early gear pitting faults with mixed operating conditions and reduce the number of training parameters, a new method is proposed in this paper. The proposed method uses an adaptive 1D separable convolution with residual connection network to classify gear pitting faults with mixed operating conditions. Compared to the traditional convolutional neural network, the separable convolution with residual connection network can carry out the channel convolution with point-by-point convolution to effectively reduce the number of network parameters. The residual connection can solve the representational bottleneck problem of the features in the model. Moreover, the method proposed in this paper applies the search algorithm to select better hyperparameters of the model. The raw vibration signals of the gear pitting faults at different speeds collected in a gear test rig are used to validate the effectiveness of the proposed method. The results show that the proposed method can accurately diagnose the early gear pitting faults with mixed speeds. In comparison with other machine learning models, the proposed method has provided a better diagnostic accuracy with fewer model parameters.
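The parameter saving claimed for separable convolution can be checked with simple arithmetic. The sketch below counts weights (biases ignored) for a standard 1D convolution versus a depthwise convolution followed by a pointwise (1×1) convolution, assuming a depth multiplier of 1. The layer sizes in the example are arbitrary and are not taken from the cited paper.

```python
def conv1d_params(c_in, c_out, kernel_size):
    """Weights in a standard 1D convolution: every output channel has
    one kernel of length kernel_size per input channel."""
    return kernel_size * c_in * c_out


def separable_conv1d_params(c_in, c_out, kernel_size):
    """Weights in a depthwise separable 1D convolution:
    one length-kernel_size filter per input channel (depthwise),
    plus a 1x1 convolution mixing c_in channels into c_out (pointwise)."""
    depthwise = kernel_size * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Arbitrary example layer: 32 input channels, 64 output channels, kernel 9.
    print(conv1d_params(32, 64, 9))            # 18432
    print(separable_conv1d_params(32, 64, 9))  # 2336
```

For this example layer the separable form needs roughly one eighth of the weights; the ratio grows with kernel size and output channel count.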
Article
Accurate fault diagnosis is critical to ensure the safe and reliable operation of rotating machinery. Data-driven fault diagnosis techniques based on Deep Learning (DL) have recently gained increasing attention due to their powerful feature learning capacity. However, one of the critical challenges lies in how to embed domain diagnosis knowledge into DL to obtain suitable features that correlate well with the health conditions and to generate better predictors. In this paper, a novel DL-based fault diagnosis method, based on 2D map representations of Cyclic Spectral Coherence (CSCoh) and Convolutional Neural Networks (CNN), is proposed to improve the recognition performance of rolling element bearing faults. Firstly, the 2D CSCoh maps of vibration signals are estimated by cyclic spectral analysis to provide discriminative bearing patterns for specific types of faults. The motivation for using a CSCoh-based preprocessing scheme is that valuable health condition information can be revealed by exploiting the second-order cyclostationary behavior of bearing vibration signals. Thus, the difficulty of feature learning in the deep diagnosis model is reduced by leveraging domain-related diagnosis knowledge. Secondly, a CNN model is constructed to learn high-level feature representations and conduct fault classification. More specifically, Group Normalization (GN) is employed in the CNN to normalize the feature maps of the network, which can reduce the internal covariate shift induced by data distribution discrepancy. The proposed method is tested and evaluated on two experimental datasets, including data with category imbalances and data collected under different operating conditions. Experimental results demonstrate that the proposed method can achieve high diagnosis accuracy on different datasets and presents better generalization ability compared to state-of-the-art fault diagnosis techniques.
Article
The measured health condition data from mechanical systems often exhibit an imbalanced distribution in real-world cases. To enhance fault diagnostic accuracy on imbalanced data sets, a novel imbalanced fault diagnosis approach for rotating machinery based on a Deep Laplacian Auto-encoder (DLapAE) is developed in this paper. First, the collected vibration signals are fed directly into the constructed DLapAE algorithm for layer-by-layer feature extraction, and the extracted deep discriminative features are then passed to a Back Propagation (BP) classifier for health condition diagnosis. More specifically, a Laplacian regularization term is added to the original objective function of the Deep Auto-encoder (DAE) to smooth the manifold structure of the data in DLapAE. The proposed DLapAE algorithm with Laplacian regularization can thus improve the generalization performance of this fault diagnosis framework and make it more suitable for feature learning and classification of imbalanced data. Finally, two experimental bearing case studies demonstrate the effectiveness of the proposed methodology. Compared with other existing fault diagnosis methods based on deep learning, the proposed method can effectively achieve accurate fault diagnosis for rotating machinery on both balanced and imbalanced datasets.
Article
Fault diagnosis of rotating machinery plays a significant role in the reliability and safety of modern industrial systems. Traditional fault diagnosis methods usually require manual extraction of features from raw sensor data before classifying them with pattern recognition models. This requires considerable professional knowledge and complex feature extraction, and it results in a model with poor flexibility that applies only to the diagnosis of faults in particular equipment. In recent years, deep learning has developed rapidly, and great achievements have been made in image analysis, speech recognition and natural language processing. However, its application in fault diagnosis of rotating machinery is still at the initial stage. In order to solve the problem of end-to-end fault diagnosis, this paper focuses on developing a convolutional neural network to learn features directly from the original vibration signals and then diagnose faults. The effectiveness of the proposed method is validated on the PHM (Prognostics and Health Management) 2009 gearbox challenge data and a planetary gearbox test rig. The results show that the one-dimensional convolutional neural network (1-DCNN) model achieves higher accuracy for fixed-shaft gearbox and planetary gearbox fault diagnosis than three traditional diagnostic methods.
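To make the idea of learning from raw one-dimensional signals concrete, the sketch below runs the forward pass of a single 1D convolution, ReLU and max-pooling stage in plain Python. It is a minimal illustration of the building blocks of a 1D CNN, not the architecture of the cited paper; the kernel values are fixed by hand here, whereas a trained network would learn them.

```python
def conv1d(signal, kernel, stride=1):
    """Valid (no padding) 1D cross-correlation, as used in CNN layers."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


if __name__ == "__main__":
    # Toy "vibration" signal and a hand-picked difference kernel (assumed values).
    signal = [0, 1, 0, -1, 0, 1, 0, -1]
    kernel = [-1, 0, 1]
    features = max_pool1d(relu(conv1d(signal, kernel)), size=2)
    print(features)
```

Stacking several such stages and ending with a fully connected classifier gives the general shape of an end-to-end 1D CNN.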
Article
The objective of this paper is to present a comprehensive review of the contemporary techniques for fault detection, diagnosis, and prognosis of rolling element bearings (REBs). Data-driven approaches, as opposed to model-based approaches, are gaining in popularity due to the availability of low-cost sensors and big data. This paper first reviews the fundamentals of prognostics and health management (PHM) techniques for REBs. A brief description of the different bearing-failure modes is given, then, the paper presents a comprehensive representation of the different health features (indexes, criteria) used for REB fault diagnostics and prognostics. Thus, the paper provides an overall platform for researchers, system engineers, and experts to select and adopt the best fit for their applications. Second, the paper provides overviews of contemporary REB PHM techniques with a specific focus on modern artificial intelligence (AI) techniques (i.e., shallow learning algorithms). Finally, deep-learning approaches for fault detection, diagnosis, and prognosis for REB are comprehensively reviewed.