Measurement Science and
Technology
PAPER
Fault diagnosis of key components in the rotating
machinery based on Fourier transform multi-filter
decomposition and optimized LightGBM
To cite this article: Changhe Zhang et al 2021 Meas. Sci. Technol. 32 015004
Meas. Sci. Technol. 32 (2021) 015004 (13pp) https://doi.org/10.1088/1361-6501/aba93b
Changhe Zhang, Li Kong, Qi Xu, Kaibo Zhou and Hao Pan
Key Laboratory of Image Processing and Intelligent Control of Education Ministry, School of Artificial
Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, People’s
Republic of China
E-mail: xuqi@hust.edu.cn
Received 12 April 2020, revised 13 July 2020
Accepted for publication 24 July 2020
Published 23 October 2020
Abstract
Rotating machinery is a primary element of mechanical equipment, so fault diagnosis of its key
components is essential to the reliability and safety of modern industrial systems. The key to
diagnosing faults in these components is the effective extraction of hidden fault information.
However, real vibration signals from rotating machinery are nonlinear and non-stationary, so
traditional signal decomposition methods cannot extract the frequency components accurately,
which leads to spectral overlap among the decomposed sub-signals. This paper therefore proposes
a rotating machinery fault diagnosis approach based on Fourier transform multi-filter
decomposition (FTMFD), fuzzy entropy (FE), joint mutual information maximization (JMIM) and a
light gradient boosting machine (LightGBM). FTMFD is used to extract the frequency-domain
information of the raw vibration signals, whereas FE is used to calculate and extract the fault
information of the decomposed sub-signals. Feature selection is then carried out with JMIM to
reduce the influence of redundant features on data analysis and classification accuracy.
Furthermore, LightGBM is used to rank the candidate features and output the fault diagnosis
result. Experimental results on two real datasets show that the proposed method achieves higher
accuracy with fewer features than some existing methods for fault recognition. Various working
conditions are also considered and verified.
Keywords: Fourier transform multi-filter decomposition, fuzzy entropy, joint mutual information
maximization, LightGBM classifier, rotating machinery, fault diagnosis
(Some figures may appear in colour only in the online journal)
1. Introduction
Rotating machinery is widely used in industrial production [1–3]. However, its primary
components are easily damaged in service because of the complex and harsh working environment,
which severely affects the production safety of modern industrial systems [4,5]. It is
therefore important to investigate fault diagnosis for the key components of rotating
machinery. Generally speaking, fault diagnosis of rotating machinery based on vibration signal
analysis consists of three main steps: vibration signal extraction, fault feature extraction
and fault pattern recognition [1,6]. Feature extraction is the most important step and often
directly affects the final diagnosis result [7,8]. Commonly used signal analysis or fault
feature extraction methods include time-domain analysis, frequency-domain analysis and
time-frequency domain analysis [9–11]. However, the vibration
1361-6501/21/015004+13$33.00 © 2020 IOP Publishing Ltd Printed in the UK
monitoring signal of rotating machinery is often nonlinear and non-stationary, and several
studies have shown that it is difficult to extract fault features effectively from
non-stationary signals using time-domain or frequency-domain methods [8,11,12].
In recent years, entropy-based feature extraction methods have been widely used in signal
analysis, image processing, mechanical fault diagnosis and so on [13,14]. Entropy is a measure
of the randomness or disorder of a time series; common variants include approximate entropy
(AE) [15], sample entropy (SE) [16], fuzzy entropy (FE) [12,17], permutation entropy (PE)
[14,18], dispersion entropy (DE) [19] and symbol dynamics entropy (SDE) [7]. These entropies
measure the complexity of a time series on a single scale only, so the information on other
scales is ignored [20]. Costa et al combined a multiscale procedure with SE to obtain
multiscale entropy (MSE) [21]. However, the coarse-graining procedure used in multiscale
analysis shortens the data as the scale factor increases, leading to inaccurate estimation [7].
Even some improved multiscale analysis methods still coarse-grain the time series, which
amounts to low-pass filtering of the vibration signal and may result in the loss of
high-frequency information [12,13,22,23].
In addition, some adaptive time-frequency decomposition methods are widely used in the field of
fault diagnosis, such as empirical mode decomposition (EMD) [24], local mean decomposition
(LMD) [25] and variational mode decomposition (VMD) [26]. However, EMD suffers from the end
effect, mode mixing, and envelope overshoot and undershoot [27], whereas LMD has the defects of
the endpoint effect, mode aliasing and low computational efficiency [27,28]. VMD resists mode
aliasing and is robust to noise, but its optimization requires a large amount of computational
resources, and parameters such as the penalty factor α and the mode number K need to be defined
in advance [14,29]. Unlike these decomposition methods, which lose some frequency components of
the original signal, wavelet packet decomposition (WPD) retains low-frequency and
high-frequency information well, but its results depend largely on the selection of the wavelet
basis function (WBF) and the number of decomposition layers [30,31].
In this paper, Fourier transform multi-filter decomposition (FTMFD) is combined with FE to
obtain sufficient fault information from the time series for fault diagnosis. FTMFD is an
adaptive decomposition method that completely retains the fault information at low and high
frequencies [32]. FE replaces the Heaviside function of SE with a Gaussian function to avoid
the drawbacks of SE [17]; it extracts fault information from vibration signals robustly, is
highly sensitive to dynamical changes and is insensitive to background noise [6,12,20].
Therefore, FTMFD is used to decompose the vibration signal, and the FE values of the decomposed
sub-signals are then calculated to form fault feature vectors, exploiting the strength of the
information entropy method in measuring the dynamic characteristics of a time series.
To reduce redundant features and improve classification accuracy, feature selection is usually
used to find the optimal feature subset among the extracted fault features [33,34]. For filter
feature selection based on mutual information, Peng et al studied max-relevance and
min-redundancy (mRMR) based on the maximum-dependency, maximum-correlation and
minimum-redundancy criteria [35]. mRMR is widely used in the field of fault diagnosis; its only
disadvantage is that the size of the mutual information is considered after the addition of a
single feature [7,13]. In addition, feature selection methods based on joint mutual information
(JMI) are also widely used, such as JMI itself [36] and joint mutual information maximization
(JMIM) [37]. JMI ignores the case of single-feature correlation, so that the correlation
between two or more features is reduced, whereas JMIM considers the overall stability of JMI to
ensure the stability of the selected features.
Since fault recognition and diagnosis of rotating machinery are carried out on the optimal
feature subset, the classification algorithm used in the final stage directly affects the
performance of the diagnosis method. Commonly used classifiers include the support vector
machine (SVM) [38], random forests (RF) [34], stacked auto-encoders (SAEs) [7,39], the
convolutional neural network (CNN) [40], and gradient boosting decision tree (GBDT) frameworks
such as XGBoost [41], the light gradient boosting machine (LightGBM) [42] and CatBoost [43].
Compared with traditional methods, LightGBM is a fast and efficient algorithm that performs
well in many machine learning tasks, e.g. regression, classification and ranking [42,44].
Because LightGBM is based on the GBDT algorithm, the contribution of each feature during model
training can be obtained, which allows an embedded feature selection (EFS) method to be
developed.
In this paper, a feature extraction method combining FTMFD with FE is proposed to address the
frequency information loss and low computing efficiency of some traditional methods: FTMFD
decomposes the signal, and FE measures the dynamic characteristics of the resulting time
series. JMIM and LightGBM are then used to select the effective features, which reduces
redundant features and simplifies classifier modeling. A Bayesian optimization algorithm is
further introduced to optimize the hyperparameters of the LightGBM classifier and improve the
classification accuracy of fault diagnosis; it is also considered applicable to other
classification algorithms. This paper is organized as follows. Section 2 introduces the
proposed approach and the methodology. Section 3 shows the experimental results of the proposed
approach on an MFS dataset. Further experimental verification and comparison results using the
KAT bearing datasets are discussed in section 4. The conclusion and future research are
presented in section 5.
2. The proposed approach
The flowchart of the fault diagnosis approach for the key components of rotating machinery is
shown in figure 1. First, FTMFD is applied to the original time series signal to extract the
frequency-domain information of the vibration signal, and FE is used to calculate and extract
the fault information of the decomposed sub-signals. Then, JMIM is used to select the candidate
feature set F1. On this basis, LightGBM is used to rank the candidate features in F1; following
the ranking, the features are added one at a time, together with the labels, to form a new
dataset, so that the curve of classification accuracy against the number of features can be
obtained. The features in use when the classification accuracy reaches its maximum form the
final selected feature set F2. Finally, the LightGBM classifier is trained on these selected
fault features and used to classify them. To verify the effectiveness of the proposed approach,
two kinds of datasets are used in this paper.

Figure 1. Flowchart of the proposed fault diagnosis approach.
2.1. Signal preprocessing and feature extraction method
2.1.1. Fourier transform multi-filter decomposition. Given a time series {x(i)},
i = −∞, …, −1, 0, 1, …, +∞, the Fourier transform of {x(i)} is defined as

$$X(e^{j\omega}) = \sum_{i=-\infty}^{+\infty} x(i)\, e^{-j\omega i}, \tag{1}$$

where f is the frequency and ω = 2πf denotes the angular frequency. The inverse Fourier
transform of $X(e^{j\omega})$ is

$$x(i) = \frac{1}{2\pi} \int_{2\pi} X(e^{j\omega})\, e^{j\omega i}\, d\omega. \tag{2}$$

In practical engineering applications, most of the non-stationary vibration signals collected
are finite-length discrete digital signals. Assume the length of the time series {x(i)} is N;
then equations (1) and (2) can be rewritten as

$$X(k) = \sum_{i=1}^{N} x(i)\, e^{-j 2\pi k i / N}, \tag{3}$$

and

$$x(i) = \frac{1}{N} \sum_{k=1}^{N} X(k)\, e^{j 2\pi k i / N}, \tag{4}$$

where k denotes the frequency components.

An important application of the Fourier transform is signal filtering [45], which can be
summarized in three steps. First, the Fourier transform is performed to transform the signal
from the time domain to the frequency domain through equation (3). Second, with the filter
H(k), the required frequency components are retained and the unnecessary frequency components
are filtered out of the spectrum, that is

$$X^{*}(k) = X(k) H(k) = \begin{cases} X(k), & k_s \le k \le k_e \\ 0, & \text{otherwise}, \end{cases} \tag{5}$$

where $k_s$ and $k_e$ correspond to the start frequency and cutoff frequency of H(k),
respectively. Finally, the inverse Fourier transform is performed on the filtered spectrum
$X^{*}(k)$, and the filtered signal $x^{*}(i)$ is

$$x^{*}(i) = \frac{1}{N} \sum_{k=1}^{N} X^{*}(k)\, e^{j 2\pi k i / N}. \tag{6}$$

The idea of FTMFD is to filter the signal with filters of different passbands and then carry
out the inverse Fourier transform on the filtered spectra to obtain sub-signals with different
frequency components, as shown in figure 2.

Figure 2. Fourier transform multi-filter decomposition.

Let $\Delta f_i = [f_{is}, f_{ie}]$, i = 1, 2, …, n, where $\Delta f_i$ represents the
passband of filter $H_i(f)$, and $f_{is}$ and $f_{ie}$ represent the start frequency and cutoff
frequency of $H_i(f)$, respectively.
If FTMFD is designed to be adaptive and the number of filters is n, then according to a certain
strategy, the signal spectrum can be divided into n parts and n different sub-bands can be
obtained. The sub-bands satisfy

$$\begin{cases} \Delta f_1 \cup \Delta f_2 \cup \cdots \cup \Delta f_n = (0, F_s/2) \\ \Delta f_i \cap \Delta f_j = \varnothing, \quad i \ne j,\ 1 \le i, j \le n, \end{cases} \tag{7}$$

where $F_s$ is the sampling frequency, and the original signal s(t) is equal to the sum of all
sub-signals $s_i(t)$, i = 1, 2, …, n, namely

$$s(t) = \sum_{i=1}^{n} s_i(t). \tag{8}$$

In this paper, the spectrum is evenly divided according to logarithmic coordinates; then
$|\Delta f_i| = |\Delta f_j|$, i ≠ j, 1 ⩽ i, j ⩽ n. The spectrum can also be divided by other
strategies, such as energy, which requires certain prior knowledge such as the frequency
characteristics of the vibration signals. In general, FTMFD is a flexible decomposition
strategy whose details can be adjusted according to actual needs.
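
The decomposition in equations (5)–(8) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `ftmfd`, the use of ideal (brick-wall) filters and the equal-width split in linear frequency are assumptions of the sketch, and the DC bin is simply assigned to the first band.

```python
import numpy as np

def ftmfd(signal, n_filters):
    """Split a real signal into n_filters sub-signals with disjoint passbands."""
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x)  # one-sided spectrum, bins 0 .. N/2
    # Band edges as bin indices; equal-width bands are an assumption of this sketch.
    edges = np.linspace(0, spectrum.size, n_filters + 1).astype(int)
    sub_signals = np.empty((n_filters, x.size))
    for i in range(n_filters):
        band = np.zeros_like(spectrum)
        # Ideal band-pass filter H_i: keep the bins inside the passband, zero the rest.
        band[edges[i]:edges[i + 1]] = spectrum[edges[i]:edges[i + 1]]
        sub_signals[i] = np.fft.irfft(band, n=x.size)  # back to the time domain
    return sub_signals

# Hypothetical test signal: two tones sampled at 6 kHz, 1000 points per sample.
fs = 6000
t = np.arange(1000) / fs
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
sub_signals = ftmfd(x, 32)
```

Because the passbands are disjoint and together cover every frequency bin, summing the rows of `sub_signals` recovers `x` up to floating-point error, which is the property stated in equation (8).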
2.1.2. Fuzzy entropy. FE is an improvement on SE in which the Heaviside function is replaced by
a Gaussian function to measure the similarity between two vectors, which effectively overcomes
the shortcomings of SE in practical applications.

For a given time series x(i) of length N, i = 1, 2, …, N, the similarity of FE is defined as

$$D_{ij}^{m} = \mu(d_{ij}^{m}, n, r) = e^{-\ln 2\, (d_{ij}^{m}/r)^{n}}, \tag{9}$$

where r is the similarity tolerance and $d_{ij}^{m}$ represents the distance between the
vectors $X_i^m$ and $X_j^m$. Define the function $\varphi^{m}$ as

$$\varphi^{m}(n, r) = \frac{1}{N-m} \sum_{i=1}^{N-m} \left( \frac{1}{N-m-1} \sum_{j=1,\, j \ne i}^{N-m} D_{ij}^{m} \right). \tag{10}$$

Then, FE can be expressed as

$$FE(m, n, r, N) = \ln \varphi^{m}(n, r) - \ln \varphi^{m+1}(n, r). \tag{11}$$

The flowchart of the FE method is shown in figure 3.
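
Equations (9)–(11) translate fairly directly into NumPy. The sketch below is illustrative only. Several details are assumptions taken from the usual fuzzy entropy definition because the text does not spell them out: the vectors $X_i^m$ are m consecutive samples with their own mean removed, the distance is the Chebyshev (maximum absolute) distance, the time delay is 1, and r is given as a fraction of the signal's standard deviation. The function name and argument names are hypothetical.

```python
import numpy as np

def fuzzy_entropy(x, m=2, r_factor=0.15, n=2):
    """Fuzzy entropy following equations (9)-(11); requires a non-constant signal."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)  # similarity tolerance as a fraction of the standard deviation

    def phi(dim):
        count = x.size - dim  # number of embedded vectors, as in equation (10)
        idx = np.arange(count)[:, None] + np.arange(dim)[None, :]
        vectors = x[idx]
        # Assumption: remove each vector's own mean (baseline), as in the usual FE definition.
        vectors = vectors - vectors.mean(axis=1, keepdims=True)
        # Assumption: Chebyshev distance between every pair of vectors.
        dist = np.abs(vectors[:, None, :] - vectors[None, :, :]).max(axis=2)
        similarity = np.exp(-np.log(2) * (dist / r) ** n)  # equation (9)
        np.fill_diagonal(similarity, 0.0)                   # exclude j == i
        return similarity.sum() / (count * (count - 1))     # equation (10)

    return np.log(phi(m)) - np.log(phi(m + 1))              # equation (11)

# A regular signal should score lower than an irregular one.
rng = np.random.default_rng(0)
noise = rng.standard_normal(500)
sine = np.sin(2 * np.pi * 5 * np.arange(500) / 500)
fe_noise = fuzzy_entropy(noise)
fe_sine = fuzzy_entropy(sine)
```

The pairwise distance array grows with the square of the signal length, so this form suits short segments such as the 1000-point samples used later; longer signals would need a blockwise loop.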
Figure 3. Flowchart of the FE method.

2.2. Feature selection method

2.2.1. Joint mutual information maximization. JMIM uses the following iterative greedy search
algorithm to find the relevant feature subset of size k in the feature space.

(1) For a feature set $F = \{f_1, f_2, \ldots, f_N\}$, the feature selection process identifies
a feature subset S with dimension k, where k ≤ N and S ⊆ F. Theoretically, the selected feature
subset S should maximize the JMI between the class label C and the feature subset S with fixed
dimension k.

(2) Calculate the value of JMI between $f_i$, $f_s$ and C:

$$I(f_i, f_s; C) = I(f_s; C) + I(f_i; C \mid f_s), \tag{12}$$

where $I(f_s; C)$ represents the value of mutual information between $f_s$ and C. The larger it
is, the stronger the correlation between $f_s$ and C. $I(f_i; C \mid f_s)$ represents the value
of mutual information between $f_i$ and C under the condition $f_s$.

(3) JMIM selects features according to the following criterion:

$$f_{\mathrm{JMIM}} = \arg\max_{f_i \in S} I(f_i, f_s; C). \tag{13}$$

According to equation (13), JMIM considers the value of each $I(f_i, f_s; C)$. After the
addition of feature $f_i$, there is at least one feature $f_s$ in the subset that makes the
value of $I(f_i, f_s; C)$ larger than it would be if other features were added.
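
The greedy search in steps (1)–(3) can be illustrated on discrete data. The sketch below makes two loud assumptions. First, it uses the max-min form of the JMIM criterion commonly given in the feature selection literature, in which each candidate is scored by its smallest $I(f_i, f_s; C)$ over the already selected features $f_s$; equation (13) as printed shows only the outer argmax. Second, features are assumed to be already discretized into non-negative integer codes, so continuous values such as FE features would need binning first. All function names are hypothetical.

```python
import numpy as np

def mutual_information(a, b):
    """I(a; b) in nats for two 1-D arrays of non-negative integer codes."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)          # joint histogram
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)  # marginal of a
    pb = joint.sum(axis=0, keepdims=True)  # marginal of b
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())

def joint_code(a, b):
    """Encode the pair (a, b) as one integer variable."""
    return a * (b.max() + 1) + b

def jmim(features, labels, k):
    """Greedy JMIM on an (n_samples, n_features) array of integer codes."""
    n_features = features.shape[1]
    relevance = [mutual_information(features[:, j], labels) for j in range(n_features)]
    selected = [int(np.argmax(relevance))]  # start from the most relevant single feature
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # Worst-case joint information of candidate j with any selected feature.
            score = min(
                mutual_information(joint_code(features[:, j], features[:, s]), labels)
                for s in selected
            )
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

# Toy data: columns 0 and 1 jointly determine the label, column 2 duplicates
# column 0, and column 3 is noise.
rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=400)
X = np.column_stack([labels // 2, labels % 2, labels // 2, rng.integers(0, 2, size=400)])
selected = jmim(X, labels, k=2)
```

On this toy data the redundant copy in column 2 is passed over in favour of the complementary column, which is the behaviour the criterion is designed to produce.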
2.2.2. Embedded feature selection with LightGBM. EFS methods use the performance of the
learning algorithm to evaluate the quality of a feature subset. First, for a feature subset to
be evaluated, the EFS method needs to train the classifier in advance. After that, the weight
coefficient of each feature can be obtained according to certain indexes, such as the
regularization term or the loss function. Finally, the features are selected and ranked
according to the weight coefficients.

Most GBDT algorithms, such as XGBoost, use an inefficient decision tree growth strategy called
the level-wise method. LightGBM replaces it with the leaf-wise method to split the nodes of the
weak learner. When a tree model is selected as the base learner of LightGBM, the total
information gain contributed by each feature, or the number of times each feature is used
during splitting, can be obtained after training the model, and the features can be ranked
accordingly.
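
The rank-then-add procedure can be sketched independently of any particular classifier. In the sketch below, `importances` stands in for the per-feature gain or split counts of a trained model (for example the `feature_importances_` attribute of a fitted `lightgbm.LGBMClassifier`), and `score_subset` stands in for a cross-validated accuracy; both are replaced by toy values here, and the function names are hypothetical.

```python
import numpy as np

def select_by_ranking(importances, score_subset):
    """Add features in descending-importance order; keep the smallest best-scoring prefix."""
    order = np.argsort(importances)[::-1]  # most important feature first
    curve = [score_subset(order[:k]) for k in range(1, len(order) + 1)]
    best_k = int(np.argmax(curve)) + 1     # argmax returns the first (smallest) maximizer
    return order[:best_k].tolist(), curve

# Toy stand-ins: features 1 and 3 are the informative ones.
toy_importances = np.array([0.10, 0.70, 0.05, 0.15])

def toy_score(subset):
    informative = {1, 3}
    return 0.6 + 0.2 * len(informative.intersection(int(j) for j in subset))

selected, curve = select_by_ranking(toy_importances, toy_score)
```

Taking the first maximizer of the accuracy curve favours the smallest feature set among equally accurate ones, which matches the stated aim of higher accuracy with fewer features.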
2.3. Bayesian optimization
In this paper, a Bayesian optimization algorithm [46] is used to optimize the hyperparameters
of the classification model. The main idea of Bayesian optimization is that, for a given
objective function, the posterior distribution of the objective function is updated by
continually adding sample points until the posterior distribution essentially fits the real
distribution, so that the current parameters can be adjusted more effectively. There are two
core components in Bayesian optimization: the prior function (PF) and the acquisition function
(AC). To optimize the objective function, the balance between exploration and exploitation must
be considered.
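Suppose $\delta = \delta_1, \delta_2, \ldots, \delta_n$ represents the hyperparameters of
classifier C, and $D_{\mathrm{train}}$ and $D_{\mathrm{valid}}$ are the training set and
validation set, respectively. $A(C, \delta, D_{\mathrm{train}}, D_{\mathrm{valid}})$ and
$L(C, \delta, D_{\mathrm{train}}, D_{\mathrm{valid}})$ denote the classification accuracy and
validation loss of C, respectively. k-fold cross-validation is applied and the objective
function of optimization can be described as

$$f(\delta) = \arg\max \left( \frac{1}{k} \sum_{i=1}^{k} A(C, \delta, D_{\mathrm{train}}, D_{\mathrm{valid}}) \right), \tag{14}$$

or

$$f(\delta) = \arg\min \left( \frac{1}{k} \sum_{i=1}^{k} L(C, \delta, D_{\mathrm{train}}, D_{\mathrm{valid}}) \right). \tag{15}$$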
During parameter optimization the model is trained repeatedly, and the classification
performance of each parameter combination is evaluated by calculating the objective function.
Compared with grid search or random search [47], Bayesian optimization has the following
advantages: first, a Gaussian process is adopted to continuously update the prior using the
information from previously evaluated parameters; second, the number of iterations is small and
the search is fast; finally, it remains robust for non-convex problems, and the result is
globally optimal rather than locally optimal.
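
The quantity inside equation (14), the mean validation accuracy over k folds for one hyperparameter setting, can be sketched as follows. This is an illustration under stated assumptions: a tiny k-nearest-neighbour classifier stands in for LightGBM, its neighbour count plays the role of δ, and an exhaustive comparison of three candidate values stands in for the Bayesian search. All names are hypothetical.

```python
import numpy as np

def kfold_accuracy(fit_predict, X, y, k=10, seed=0):
    """Mean validation accuracy over k folds for one hyperparameter setting."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        valid = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        predictions = fit_predict(X[train], y[train], X[valid])
        scores.append(np.mean(predictions == y[valid]))
    return float(np.mean(scores))

def knn_fit_predict(n_neighbors):
    """Stand-in classifier: majority vote among the n_neighbors closest training points."""
    def fit_predict(X_train, y_train, X_valid):
        dist = np.linalg.norm(X_valid[:, None, :] - X_train[None, :, :], axis=2)
        nearest = np.argsort(dist, axis=1)[:, :n_neighbors]
        return np.array([np.bincount(votes).argmax() for votes in y_train[nearest]])
    return fit_predict

# Toy data: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
y = np.repeat([0, 1], 100)

score = kfold_accuracy(knn_fit_predict(5), X, y, k=10)
# Exhaustive comparison of three candidates, standing in for the Bayesian search.
best_n = max([1, 5, 15], key=lambda n: kfold_accuracy(knn_fit_predict(n), X, y, k=10))
```

A Bayesian optimizer would call the same objective, but would choose each new candidate from a surrogate model of the scores seen so far instead of from a fixed list.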
3. Case I: MFS dataset verification
3.1. Data description and experimental setup
First, a dataset of mixed rotor and bearing faults from the Machinery Fault Simulator (MFS)
platform was used to verify the effectiveness of the proposed approach [48,49]. As shown in
figure 4, the experimental platform (model number: MFS2010-PK3) was developed by the Spectra
Quest company in the United States [48] and is composed of an AC motor, coupling, acceleration
sensor, rotor, rolling bearing, centering adjustment plate, data acquisition box and inverter.
The data were collected under a single operational condition with a sampling frequency of
6 kHz and a motor speed of 2100 rpm. The dataset includes 10 fault types, detailed in table 1.

Table 1. Details of 10 types of faults.

| Fault type | Label | Fault type | Label |
|---|---|---|---|
| Central bent | 1 | Ball defect | 6 |
| Cocked rotor | 2 | Inner race defect | 7 |
| Couple bent | 3 | Outer race defect | 8 |
| Eccentric rotor | 4 | Normal | 9 |
| Unbalanced rotor | 5 | Combination defect of inner and outer race | 10 |

Figure 4. Experimental platform for rotor test [49].

These fault types have a total of 1600 samples, with 160 samples per type and 1000 data points
per sample. Python 3.7.3 is used for algorithm design and development in this paper, and the
experimental platform is configured with an Intel Core i5-6000hq CPU and 12 GB of RAM.
3.2. Parameter settings
Here, the number of filters of FTMFD is set to 32. That is, the frequency band of each sample
is evenly divided into 32 parts, and the FE value of each sub-signal is calculated separately,
so each sample corresponds to 32 fault features. The parameters of FE are set as follows: the
embedding dimension m = 2, the time delay λ = 1, the similarity tolerance r = 0.15δ (δ is the
standard deviation of the time series), and the gradient of the similarity tolerance n = 2. The
dataset, composed of the extracted features and the category labels, is divided into a training
set and a testing set. To eliminate the influence of chance in the sample division, the
training set is further divided into training and validation sets by 10-fold cross-validation,
and the trained model is then used for classification prediction on the testing set. The main
hyperparameters of LightGBM, which are determined by the Bayesian optimization algorithm, are
shown in table 2. The parameters of the Bayesian optimization are listed in table 3. The
objective of the optimization, i.e. the output of the algorithm, is to maximize f(δ) in
equation (14).
Table 2. The main hyperparameters of LightGBM.

| Parameter | Value |
|---|---|
| Objective | Multiclass |
| Number of classes | 10 |
| Learning rate | 0.545 |
| Number of boosting iterations | 827 |
| L1 regularization | 0.001 |
| L2 regularization | 0.268 |
| Max number of leaves in one tree | 5 |
| Limit on the max depth of the tree model | 4 |
| Seed used to generate other seeds | 50 |
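
For readers reproducing the setup, the values in table 2 could be written as a LightGBM parameter dictionary along the following lines. The mapping from the table's row descriptions to LightGBM parameter names is an interpretation, not something the paper states; in particular, reading the last row as the `seed` parameter is an assumption.

```python
# Table 2 expressed with LightGBM's parameter names (the mapping is an assumption).
lightgbm_params = {
    "objective": "multiclass",  # Objective
    "num_class": 10,            # Number of classes
    "learning_rate": 0.545,     # Learning rate
    "num_iterations": 827,      # Number of boosting iterations
    "lambda_l1": 0.001,         # L1 regularization
    "lambda_l2": 0.268,         # L2 regularization
    "num_leaves": 5,            # Max number of leaves in one tree
    "max_depth": 4,             # Limit on the max depth of the tree model
    "seed": 50,                 # Seed used to generate other seeds (assumed mapping)
}
```

With `max_depth` set to 4 a tree can hold at most 16 leaves, so the limit of 5 leaves is the binding constraint on tree size in this configuration.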
Table 3. Details of main hyperparameters setting of Bayesian optimization.

| Parameter | Value |
|---|---|
| Prior Function | Gaussian process regression |
| Acquisition Function | Probability of Improvement |
| Random State | 30 |
| Init_points (a) | 100 |
| N_iter (b) | 100 |

(a) Init_points is the number of steps of random exploration to be performed.
(b) N_iter is the number of steps of Bayesian optimization to be performed.
3.3. Discussion
3.3.1. Research on different proportions of training samples.
The number of training samples affects the classification accuracy. To illustrate the
advantages of the proposed feature extraction method and the LightGBM classifier, different
proportions of training and testing samples are set in this section. The classification results
and training time are shown in figure 5. The classification accuracy on the training sets
reaches 100% in all experiments. When the ratio of training to testing samples is 9:1, the
accuracy on the testing set reaches 100%. However, a small testing set is subject to chance
effects, whereas at a ratio of 3:2 the trained model performs well on both the validation set
and the testing set, so the results at this ratio are taken as the final classification result.
Model training time is positively correlated with the ratio, but owing to the speed of
LightGBM it varies only moderately, from 1.0 s to 3.1 s.
Figure 5. Different proportions of training and testing samples.

3.3.2. Comparison with different decomposition methods. To highlight the advantages of FTMFD,
four decomposition methods, EMD, LMD, VMD and WPD, are used for comparison. According to the
actual decomposition, the FEs of the first six sub-signal components of EMD and the first four
components of LMD are calculated as fault features. According to reference [29], the number of
IMFs of VMD can be determined with reference to EMD, and it is set to 6 in this paper.
Therefore, the FEs of the first six, four and six components of EMD, LMD and VMD, respectively,
are calculated as features. In addition, the WBF of WPD is selected based on the principle of
the ratio of maximum energy to Shannon entropy [30]. Considering the characteristics of the
vibration signals of rotating machinery and of different WBFs, the ‘db’, ‘sym’ and ‘coif’
wavelets are considered here, among which ‘coif3’ is selected as the optimal WBF by
calculation. If the signal is decomposed by a WPD with level layers, the frequency resolution
can be calculated as $df = f_s / 2^{\mathrm{level}+1}$, where the sampling frequency $f_s$ is
6 kHz. The more sub-bands the spectrum is divided into, the greater the computation and the
feature redundancy. Therefore, the number of decomposition layers is set to 5. That is, the
number of sub-signals of WPD is 32, the same as for FTMFD. The experimental results are shown
in figure 6 and table 4.

Table 4. Time consumption on model training and feature extraction of different decomposition
methods.

| Method | Feature extraction time (s)/sample | Training time (s) |
|---|---|---|
| FTMFD | 1.22 | 2.76 |
| EMD | 0.01 | 3.18 |
| LMD | 0.21 | 2.29 |
| VMD | 1.28 | 2.53 |
| WPD | 1.59 | 2.78 |

Table 5. Details of main parameters of the t-SNE setting.

| Parameter | Value |
|---|---|
| Algorithm | Exact |
| NumPCAComponents | 10 |
| Perplexity | 40 |
| NumDimensions | 2 |
| Standardize | False |
| LearnRate | 2000 |
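
As a worked check of the WPD sub-band arithmetic in section 3.3.2, using only the values stated there (a sampling frequency of 6 kHz and five decomposition layers):

```latex
\begin{align*}
df &= \frac{f_s}{2^{\mathrm{level}+1}} = \frac{6000\ \mathrm{Hz}}{2^{5+1}} = 93.75\ \mathrm{Hz}, \\
2^{\mathrm{level}} &= 2^{5} = 32\ \text{sub-bands}, \qquad
32 \times 93.75\ \mathrm{Hz} = 3000\ \mathrm{Hz} = \frac{f_s}{2}.
\end{align*}
```

The 32 sub-bands therefore tile the band from 0 to $f_s/2$ exactly, which is why a five-layer WPD yields the same number of sub-signals as the 32-filter FTMFD.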
As can be seen from figure 6, the training accuracy of all decomposition methods except EMD
reaches 100%. The classification accuracies of EMD, LMD and VMD on the validation and testing
sets are all lower than that of FTMFD, while the classification results of WPD and FTMFD are
similar. As can be seen from table 4, EMD consumes the least time in feature extraction, and
FTMFD is faster than WPD when the number of decomposed sub-signals is the same.

Table 6. Details of main parameters of the different settings of the entropy-based methods.

| Parameter | SE | PE | DE | SDE |
|---|---|---|---|---|
| Number of classes | / | / | 10 | |
| Time delay factor | 1 | 1 | 1 | 1 |
| Embedding dimension | 2 | 3 | 3 | 3 |
| Similarity tolerance | 0.15*STD (a) | / | / | |
| Symbol interval number | / | / | / | 8 |

(a) STD is the standard deviation of a signal.

Table 7. Time consumption of model training and feature extraction of different entropy-based
methods.

| Entropy | Feature extraction time (s)/sample | Training time (s) |
|---|---|---|
| FE | 1.22 | 2.76 |
| SE | 0.85 | 2.80 |
| PE | 0.15 | 4.94 |
| DE | 3.73 | 3.57 |
| SDE | 0.19 | 3.92 |
t-SNE is a common method used in data simplification and feature visualization [50].
Visualizing the feature samples with t-SNE shows the advantages and disadvantages of the
different feature extraction methods more intuitively. The main parameters of t-SNE are listed
in table 5. The results are shown in figure 7: the features extracted by FTMFD and WPD are
easier to distinguish than those extracted by EMD, LMD and VMD.
3.3.3. Comparison with different entropy-based methods.
The purpose of this section is to compare the fault feature extraction ability and the time
consumption of different entropy-based methods. The SEs, PEs, DEs and SDEs of the sub-signals
decomposed by FTMFD are calculated, and their classification results are compared with those of
the proposed approach using FE. The parameter settings of all entropy-based methods are shown
in table 6.
The experimental results are shown in figure 8 and table 7. The training accuracy of all the
methods reaches 100%. The classification accuracies of FTMFD-PE on the validation and testing
sets are the lowest, at only 87.14% and 85.42%, respectively. The testing accuracy of FTMFD-SE
is second only to that of FTMFD-FE. Table 7 shows that FTMFD-PE is the fastest in feature
extraction but the slowest in model training. Therefore, among these entropy-based methods,
although the feature extraction time of FE is relatively long, its classification accuracy is
the highest and its model training time is the shortest.
3.3.4. Comparison with different classification algorithms. To illustrate that the feature
extraction method proposed in this paper provides satisfactory fault diagnosis capability in
combination with different classifiers, SVM, RF, SAE, XGBoost and CatBoost are compared with
LightGBM in this section. The main hyperparameters of these classification algorithms are also
determined by the Bayesian optimization algorithm. The features extracted by FTMFD-FE are input
into the different classifiers, and the classification results are shown in figure 9. As can be
seen in figure 9, the training accuracy of all classifiers except SVM and SAE reaches 100%. RF
has the highest average accuracy on the validation set (98.75%). LightGBM has the highest
testing accuracy, followed by CatBoost. The model training times of the different classifiers
are shown in table 8; the fastest is SVM, at only 0.51 s. CatBoost and SAE are slower, and SAE
takes the longest (178.82 s). Therefore, if the hyperparameters are selected suitably, the
classification results differ little between classifiers, whereas the model training time
varies greatly, so an appropriate classifier should be selected according to actual needs. The
experimental results show that the fault features extracted by the proposed method are easy to
classify and recognize, and they reflect the classification advantages of LightGBM in the
fault diagnosis of rotating machinery.

Table 8. Time consumption of model training of different classification algorithms.

| Classifier | Training time (s) | Classifier | Training time (s) |
|---|---|---|---|
| LightGBM | 2.76 | XGBoost | 5.91 |
| SVM | 0.51 | CatBoost | 36.64 |
| RF | 47.01 | SAE | 178.82 |
3.3.5. Comparison with different feature selection methods.
The number of features affects the model training time and the classification accuracy of
classifiers. To illustrate the advantages of the proposed feature selection method, ReliefF
[51], JMI and mRMR are compared in this section. The first-round candidate features are
selected by each of these methods. LightGBM is then used to rank these features, and the
curves relating the number of features to model training time and classification accuracy are
obtained from the ranking results.
The experimental results are shown in figures 10(a) and (b). Figure 10(a) shows that the
training accuracy of JMIM-LightGBM is the highest when two features are used (99.79%). As can
be seen from figure 10(b), when the number of features is small, it is negatively correlated
with the model training time; the reason may be that having fewer features is not conducive to
model building. When the number of features is greater than 10, the relationship becomes
positive, which is consistent with engineering experience. The proposed method is clearly
superior to the other methods when fewer features are used: with 3 features the classification
accuracy exceeds 90%, whereas the other methods require 4 or more features; with 12 features
the classification accuracy reaches its maximum (99.22%), and the model training time is only
1.75 s. Therefore, by using the feature selection method, fewer features can be selected to
achieve a better classification result and a shorter model training time.

Figure 6. Classification results of different decomposition methods.

Figure 7. Feature visualization of different decomposition methods using t-SNE. (a) Combination
of FTMFD and FE; (b) combination of EMD and FE; (c) combination of LMD and FE; (d) combination
of VMD and FE; (e) combination of WPD and FE.
4. Case II: KAT bearing dataset verification
4.1. Data description and experimental setup
The KAT bearing damage dataset was provided by the KAT data center at Paderborn University
[52]. The hardware configuration and settings of the experimental platform are described in
[52]. There are 15 datasets, which fall into three classes as shown in table 9. The K0-series
(K001–K005) represents the healthy condition, the KA-series (KA04, KA15, KA16, KA22, KA30)
represents bearings with outer ring damage and the KI-series (KI04, KI14, KI16, KI18, KI21)
represents bearings with inner ring damage. The experiments are conducted with four different
sets of operating parameters, detailed in table 10. Each experiment is repeated 20 times, and
the sampling frequency is 64 kHz. It should be noted that the damage in these datasets is real
damage caused by accelerated lifetime tests [53].
Figure 8. Classification results of different entropy-based methods.
Figure 9. Classification results of different classification algorithms.
Figure 10. Comparison of different feature selection methods. (a) Classification results of the training set; (b) classification results of the
testing set and model training time.
The details of the datasets used for experimental verification
are shown in table 11. Each of the datasets D1 to D4 contains the
three fault types under a single working condition, and
dataset D5 contains all four working conditions. Each sample
contains 2560 non-overlapping data points, with a total of
1200 samples (100 samples for each fault type in each working
condition).
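As a rough check on the sample layout described above, the short sketch below works through the arithmetic: the duration of one 2560-point sample at 64 kHz, the number of shaft revolutions it covers at the two rotational speeds of table 10, and the sample counts of table 11. The `segment` helper and the 4 s record length are assumptions made for illustration only.

```python
FS_HZ = 64_000      # sampling frequency stated in the text
SAMPLE_LEN = 2560   # data points per sample


def segment(signal, length=SAMPLE_LEN):
    """Split a 1-D sequence into consecutive non-overlapping windows.

    Any trailing remainder shorter than `length` is dropped.
    """
    n = len(signal) // length
    return [signal[i * length:(i + 1) * length] for i in range(n)]


duration_s = SAMPLE_LEN / FS_HZ           # seconds of signal per sample
revs_at_1500 = 1500 / 60 * duration_s     # shaft revolutions per sample at 1500 rpm
revs_at_900 = 900 / 60 * duration_s       # shaft revolutions per sample at 900 rpm

classes, conditions, per_cell = 3, 4, 100
samples_per_condition = classes * per_cell           # size of each of D1 to D4
samples_total = samples_per_condition * conditions   # size of D5

# Hypothetical record: 4 s of signal at 64 kHz (the record length is an assumption).
record = list(range(4 * FS_HZ))
windows = segment(record)

print(duration_s, revs_at_1500, revs_at_900)
print(samples_per_condition, samples_total, len(windows))
```

Each sample therefore spans 40 ms, which is about one shaft revolution at 1500 rpm and less than one at 900 rpm.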
4.2. Feature visualization
The parameters of FTMFD and FE are set as in section 3.2;
the features extracted by FTMFD-FE are compressed
into two dimensions by t-SNE. The main parameters of t-SNE
are still those in table 5 in section 3.3.2 and the results are
shown in figure 11. It can be seen that the three fault types are
quite distinct, which indicates that the proposed feature
extraction method can effectively extract fault features of the rolling
bearing.
Table 9. Categorization of datasets.
Healthy (Class 1)   Outer ring damage (Class 2)   Inner ring damage (Class 3)
K001                KA04                          KI04
K002                KA15                          KI14
K003                KA16                          KI16
K004                KA22                          KI18
K005                KA30                          KI21
Table 10. Four operation parameters.
No.   Rotational speed (rpm)   Load torque (Nm)   Radial force (N)   Condition number
0     1500                     0.7                1000               C0
1     900                      0.7                1000               C1
2     1500                     0.1                1000               C2
3     1500                     0.7                400                C3
Table 11. Details of the datasets of different working conditions.
Dataset label   Working condition   Number of samples
D1              C0                  300
D2              C1                  300
D3              C2                  300
D4              C3                  300
D5              C0, C1, C2, C3      1200
Figure 11. t-SNE visualization of features extracted by
FTMFD-FE.
Table 12. The main hyperparameters of LightGBM.
Parameter                                           Value
Objective                                           Multiclass
Number of classes                                   3
Learning rate                                       0.071
Number of boosting iterations                       816
L1 regularization                                   0.957
L2 regularization                                   0.583
Max number of leaves in one tree                    4
Limit the max depth for tree model                  2
The number of seeds used to generate other seeds    50
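For readers who want to reproduce this configuration, the values of table 12 can be written with LightGBM's documented parameter names, as in the sketch below. The mapping is an assumption on our part: in particular, the last row of the table is read as the `seed` parameter, whose documentation describes it as the seed used to generate other seeds. The training call is shown only as a comment, with hypothetical `X_train` and `y_train` arrays.

```python
# Table 12 expressed with LightGBM's documented parameter names.
params = {
    "objective": "multiclass",
    "num_class": 3,
    "learning_rate": 0.071,
    "lambda_l1": 0.957,   # L1 regularization
    "lambda_l2": 0.583,   # L2 regularization
    "num_leaves": 4,      # max number of leaves in one tree
    "max_depth": 2,       # limit on tree depth
    "seed": 50,           # assumption: the table's last row corresponds to `seed`
}
num_boost_round = 816     # number of boosting iterations

# A binary tree of depth d has at most 2**d leaves, so the leaf limit and
# the depth limit in table 12 describe the same tree size.
max_leaves_for_depth = 2 ** params["max_depth"]
leaf_limit_is_consistent = params["num_leaves"] <= max_leaves_for_depth
print(leaf_limit_is_consistent)

# With lightgbm installed, training would look roughly like this
# (X_train, y_train are hypothetical; multiclass labels must be 0, 1, 2):
#   import lightgbm as lgb
#   booster = lgb.train(params, lgb.Dataset(X_train, label=y_train),
#                       num_boost_round=num_boost_round)
```

Note that table 9 numbers the classes from 1 to 3, so labels would need to be shifted to start at 0 before being passed to a multiclass objective.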
4.3. Discussion
For the data after feature extraction, the ratio of training to testing
samples is still set as 3:2. The main hyperparameters of
LightGBM are again optimized by the Bayesian optimization
algorithm, and the results are shown in table 12. The
classification results when all 32 features are used are shown in
figure 12. The training accuracy on all the datasets reaches 100%.
Among the single working condition datasets (D1 to D4), the
testing accuracy is 97.78% for D3 and 100% for the other
datasets, and the testing accuracy of D5, which covers the most
complex working conditions, is 99.72%. In addition, in reference [53],
the prediction accuracy of the negative correlation ensemble
transfer learning method (NCTE) on the KAT bearing dataset
is 98.73%, which is slightly lower than the accuracy achieved
by the method proposed in this paper.
4.3.1. Comparison with different feature selection methods.
Following the experiments in section 3.3.5, JMIM is again
compared with ReliefF, JMI and mRMR, and the results are
shown in figure 13. As can be seen from figure 13(a), the training
accuracy of JMIM-LightGBM reaches 100% when two
features are used, while the other methods need more features.
As can be seen from figure 13(b), when only one feature is
selected, the testing accuracy of JMIM-LightGBM is relatively
low, only 77.78%, while that of mRMR-LightGBM is the highest
(89.17%). However, when 2 features are selected, the testing
accuracy of the proposed method reaches 97.50%, and it
reaches its maximum (99.72%) when 14 features are used; meanwhile
the model training time is only 0.36 s. The results further
indicate the effectiveness of the feature extraction method.
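To make the JMIM criterion compared above concrete, the following minimal sketch implements the greedy rule of [37] for discrete features: the first feature maximizes I(f; C), and each later feature maximizes the minimum, over the already selected features s, of the joint mutual information I(f, s; C). It assumes features have already been discretized (the entropy features in this paper are continuous), and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """I(X;Y) in bits for two equal-length sequences of discrete symbols."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        c / n * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def min_joint_mi(f, selected, columns, labels):
    """min over s in `selected` of I(f, s; C), pairing columns element-wise."""
    return min(
        mutual_information(list(zip(columns[f], columns[s])), labels)
        for s in selected
    )


def jmim_select(columns, labels, k):
    """Greedy JMIM selection of k feature indices (ties go to the lowest index)."""
    remaining = list(range(len(columns)))
    first = max(remaining, key=lambda f: mutual_information(columns[f], labels))
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: min_joint_mi(f, selected, columns, labels))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: the class is encoded by two bits. Feature 1 duplicates feature 0,
# feature 2 carries the complementary bit, feature 3 is constant.
high = [0, 0, 1, 1, 0, 0, 1, 1]
low = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [2 * a + b for a, b in zip(high, low)]
columns = [high, list(high), low, [0] * 8]

chosen = jmim_select(columns, labels, 2)
print(chosen)
```

On this toy data the duplicate of feature 0 adds no joint information about the class, so the second pick is the complementary feature 2 rather than the redundant feature 1.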
4.3.2. Comparison with different classification algorithms.
In this section, LightGBM is compared with other classifiers
including SVM, RF and CatBoost. JMIM is used to select candidate
features, and the classification results are shown in figure 14.
As can be seen from figure 14(a), the training accuracy
of RF is the highest when one feature is used (99.52%),
while the accuracy of LightGBM reaches 100% when two or
more features are used. According to the classification accuracy
curve in figure 14(b), when one feature is used, the testing
accuracy of SVM is the highest (91.11%). When the number
of features is small, there is little difference in the classification
accuracy of the four classifiers. But when the number of
features is greater than seven, the classification accuracy of
LightGBM is slightly higher than that of the other classifiers. In terms
of model training time, LightGBM is second only to SVM,
while CatBoost takes the longest. The experimental results
further illustrate the speed and superiority of the LightGBM
classifier.
Figure 12. Classification accuracy on datasets of different working conditions.
Figure 13. Comparison of different feature selection methods. (a) Classification results of the training set; (b) classification results of the
testing set and model training time.
Figure 14. Comparison of different classifiers. (a) Classification results of training set; (b) classification results of testing set and model
training time.
5. Conclusion
In this paper, a new fault diagnosis approach for the key com-
ponents of rotating machinery based on FTMFD-FE, JMIM
and LightGBM is proposed. For non-linear and non-stationary
mechanical vibration signals, the combination of FTMFD and
FE for monitoring signal pretreatment and feature extraction
can effectively extract hidden mechanical fault features. While
retaining the advantages of information entropy, it overcomes
the problem that traditional multiscale analysis cannot effectively
extract high-frequency information and helps improve
classification accuracy. On this basis, fault feature selection
based on JMIM and LightGBM is used to effectively reduce
redundant features and simplify classifier model construction,
and thus the model training time can be reduced. Finally,
the effectiveness of the proposed approach is experimentally
verified on the MFS dataset and the KAT bearing dataset
by comparative experiments of signal decomposition, feature
extraction and feature selection, respectively. The experimental
results show that the proposed approach can effectively
identify the fault states for the key components of rotating
machinery. Moreover, the effectiveness of the proposed
approach under multiple working conditions is also verified
on the KAT bearing dataset, and the classification accuracy
reaches 99.72%.
The actual working environment for the key components of
rotating machinery is more complex and changeable, so future
research will focus on feature extraction and classification
of vibration signals under more complex working conditions.
In addition, besides the combination with information
entropy, the combination of FTMFD with dimensionless
time-domain indexes such as kurtosis for feature extraction
can also be investigated in future work.
Acknowledgments
The work here is supported by the National Key Research
and Development Program of China (No. 2018YFB2003303),
the Fundamental Research Funds for the Central Universities
(No. 2019kfyXJJS137), the research fund (No. 61400020401),
and the Nondestructive Detection and Monitoring Techno-
logy for High Speed Transportation Facilities, Key Laborat-
ory of Ministry of Industry and Information Technology (Nos.
KL2019W003 and KL2019W004).
ORCID iDs
Changhe Zhang https://orcid.org/0000-0001-7046-9240
Qi Xu https://orcid.org/0000-0002-9795-1616
Kaibo Zhou https://orcid.org/0000-0003-0055-3193
Hao Pan https://orcid.org/0000-0001-9324-0545
References
[1] Liu R, Yang B, Zio E and Chen X 2018 Artificial intelligence
for fault diagnosis of rotating machinery: a review Mech.
Syst. Signal Process. 108 33–47
[2] Liu J, Hu Y, Wang Y, Wu B, Fan J and Hu Z 2018 An
integrated multi-sensor fusion-based deep feature learning
approach for rotating machinery diagnosis Meas. Sci.
Technol. 29 055103
[3] Lei Y, Lin J, He Z and Zuo M J 2013 A review on empirical
mode decomposition in fault diagnosis of rotating
machinery Mech. Syst. Signal Process. 35 108–26
[4] Wei Y et al 2019 A review of early fault diagnosis approaches
and their applications in rotating machinery Entropy
21 409
[5] Li Y, Li G, Yang Y, Liang X and Xu M 2018 A fault diagnosis
scheme for planetary gearboxes using adaptive multi-scale
morphology filter and modified hierarchical permutation
entropy Mech. Syst. Signal Process. 105 319–37
[6] Li Y, Xu M, Wang R and Huang W 2016 A fault diagnosis
scheme for rolling bearing based on local mean
decomposition and improved multiscale fuzzy entropy J.
Sound Vib. 360 277–99
[7] Li Y, Yang Y, Li G, Xu M and Huang W 2017 A fault
diagnosis scheme for planetary gearboxes using modified
multi-scale symbolic dynamic entropy and mRMR feature
selection Mech. Syst. Signal Process. 91 295–312
[8] Gao Y and Yu D 2020 Total variation on horizontal visibility
graph and its application to rolling bearing fault diagnosis
Mech. Mach. Theory 147 103768
[9] Wang L and Shao Y 2020 Fault feature extraction of rotating
machinery using a reweighted complete ensemble empirical
mode decomposition with adaptive noise and demodulation
analysis Mech. Syst. Signal Process. 138 106545
[10] Medina R, Macancela J C, Lucero P, Cabrera D, Cerrada M,
Sánchez R-V and Vásquez R E 2019 Vibration signal
analysis using symbolic dynamics for gearbox fault
diagnosis Int. J. Adv. Manuf. Technol. 104 2195–214
[11] Wen X et al 2020 Graph modeling of singular values for early
fault detection and diagnosis of rolling element bearings
Mech. Syst. Signal Process. 145 106956
[12] Liu Q, Pan H, Zheng J, Tong J and Bao J 2019 Composite
interpolation-based multiscale fuzzy entropy and its
application to fault diagnosis of rolling bearing Entropy
21 292
[13] Yan X and Jia M 2019 Intelligent fault diagnosis of rotating
machinery using improved multiscale dispersion entropy
and mRMR feature selection Knowl. Based Syst.
163 450–71
[14] Chen L and Wan S 2020 Mechanical fault diagnosis of
high-voltage circuit breakers using multi-segment
permutation entropy and a density-weighted one-class
extreme learning machine Meas. Sci. Technol.
31 85107
[15] Pincus S 1995 Approximate entropy (ApEn) as a complexity
measure Chaos 5 110–7
[16] Richman J S and Moorman J R 2000 Physiological time-series
analysis using approximate entropy and sample entropy Am.
J. Physiol. Heart Circ. Physiol. 278 H2039–49
[17] Chen W, Zhuang J, Yu W and Wang Z 2009 Measuring
complexity using fuzzyen, apen, and sampen Med. Eng.
Phys. 31 61–68
[18] Bandt C and Pompe B 2002 Permutation entropy: a natural
complexity measure for time series Phys. Rev. Lett.
88 174102
[19] Rostaghi M and Azami H 2016 Dispersion entropy: A measure
for time-series analysis IEEE Signal Process. Lett. 23 610–4
[20] Li Y, Wang X, Liu Z, Liang X and Si S 2018 The entropy
algorithm and its variants in the fault diagnosis of rotating
machinery: A review IEEE Access 6 66723–41
[21] Costa M, Goldberger A L and Peng C K 2005 Multiscale
entropy analysis of biological signals Phys. Rev. E
71 021906
[22] Wu S D, Wu C W, Lin S G, Lee K-Y and Peng C-K 2014
Analysis of complex time series using refined composite
multi-scale entropy Phys. Lett. A 378 1369–74
[23] Wu S D, Wu C W, Lee K Y and Lin S-G 2013 Modified
multiscale entropy for short-term time series analysis
Physica A 392 5865–73
[24] Huang N E, Shen Z, Long S R, Wu M C, Shih H H, Zheng Q,
Yen N-C, Tung C C and Liu H H 1998 The empirical mode
decomposition and the Hilbert spectrum for nonlinear and
non-stationary time series analysis Proc. R. Soc. Lond. A
454 903–95
[25] Smith J S 2005 The local mean decomposition and its
application to EEG perception data J. R. Soc. Interface
2 443–54
[26] Dragomiretskiy K and Zosso D 2013 Variational mode
decomposition IEEE Trans. Signal Process.
62 531–44
[27] Wang Y, He Z and Zi Y 2010 A comparative study on the local
mean decomposition and empirical mode decomposition
and their applications to rotating machinery health
diagnosis J. Vib. Acoust. 132 2
[28] Liu W Y, Zhang W H, Han J G and Wang G F 2012 A new
wind turbine fault diagnosis method based on the local
mean decomposition Renew. Energy 48 411–5
[29] Li F, Li R, Tian L, Chen L and Liu J 2019 Data-driven
time-frequency analysis method based on variational mode
decomposition and its application to gear fault diagnosis in
variable working conditions Mech. Syst. Signal Process.
116 462–79
[30] Kankar P K, Sharma S C and Harsha S P 2013 Fault
diagnosis of rolling element bearing using cyclic
autocorrelation and wavelet transform Neurocomputing
110 9–17
[31] Eren L and Devaney M J 2004 Bearing damage detection via
wavelet packet decomposition of the stator current IEEE
Trans. Instrum. Meas.
53 431–6
[32] Pan H, Zhou K B and Liu J 2019 A fault diagnosis method for
rolling bearings based on Fourier transform multi-filter
decomposition and permutation entropy Proc. 13th
National Conf. Vibration Theory and Application (Chinese
Society of Vibration Engineering) pp 259–64 (in Chinese)
[33] Rauber T W, de Assis Boldt F and Varejão F M 2014
Heterogeneous feature models and feature selection applied
to bearing fault diagnosis IEEE Trans. Ind. Electron.
62 637–46
[34] Hu Q, Si X S, Zhang Q H and Qin A-S 2020 A rotating
machinery fault diagnosis method based on multi-scale
dimensionless indicators and random forests Mech. Syst.
Signal Process. 139 106609
[35] Peng H, Long F and Ding C 2005 Feature selection based on
mutual information criteria of max-dependency,
max-relevance, and min-redundancy IEEE Trans. Pattern
Anal. Mach. Intell. 27 1226–38
[36] Yang H and Moody J 1999 Feature selection based on joint
mutual information Proc. Int. ICSC Symp. Advances in
Intelligent Data Analysis pp 22–25
[37] Bennasar M, Hicks Y and Setchi R 2015 Feature selection
using joint mutual information maximisation Expert Syst.
Appl. 42 8520–32
[38] Fu W et al 2020 Fault diagnosis for rolling bearings based on
composite multiscale fine-sorted dispersion entropy and
SVM with hybrid mutation SCA-HHO algorithm
optimization IEEE Access 8 13086–104
[39] Zabalza J, Ren J, Zheng J, Zhao H, Qing C, Yang Z, Du P and
Marshall S 2016 Novel segmented stacked autoencoder for
effective dimensionality reduction and feature extraction in
hyperspectral imaging Neurocomputing 185 1–10
[40] Zhou Q, Li Y, Tian Y and Jiang L 2020 A novel method based
on nonlinear auto-regression neural network and
convolutional neural network for imbalanced fault diagnosis
of rotating machinery Measurement 161 107880
[41] Zhou K B, Zhang Z X, Liu J, Hu Z-X, Duan X-K and Xu Q
2018 Anode effect prediction based on a singular value
thresholding and extreme gradient boosting approach Meas.
Sci. Technol. 30 015104
[42] Ke G et al 2017 LightGBM: a highly efficient gradient boosting
decision tree Adv. Neural Inf. Process. Syst. 30 3146–54
[43] Prokhorenkova L et al 2018 CatBoost: unbiased boosting with
categorical features Adv. Neural Inf. Process. Syst. pp
6638–48
[44] Sun X, Liu M and Sima Z 2018 A novel cryptocurrency price
trend forecasting model based on LightGBM Finance Res.
Lett. 32 101084
[45] Zhang J, Wen H and Tang L 2019 Improved smoothing
frequency shifting and filtering algorithm for harmonic
analysis with systematic error compensation IEEE Trans.
Ind. Electron. 66 9500–9
[46] Snoek J, Larochelle H and Adams R P 2012 Practical bayesian
optimization of machine learning algorithms Adv. Neural
Inf. Process. Syst. pp 2951–9
[47] Bergstra J and Bengio Y 2012 Random search for
hyper-parameter optimization J. Mach. Learn. Res.
13 281–305
[48] Shan Y, Zhou J, Jiang W, Liu J, Xu Y and Zhao Y 2019 A fault
diagnosis method for rotating machinery based on improved
variational mode decomposition and a hybrid artificial
sheep algorithm Meas. Sci. Technol. 30 055002
[49] Ge M et al 2020 A deep condition feature learning approach
for rotating machinery based on MMSDE and optimized
SAEs Meas. Sci. Technol. (accepted)
(https://doi.org/10.1088/1361-6501/ab89e3)
[50] Maaten L and Hinton G 2008 Visualizing data using t-SNE J.
Mach. Learn. Res. 9 2579–605
[51] Robnik-Šikonja M and Kononenko I 2003 Theoretical and
empirical analysis of ReliefF and RReliefF Mach. Learn.
53 23–69
[52] Lessmeier C et al 2016 Condition monitoring of bearing
damage in electromechanical drive systems by using motor
current signals of electric motors: a benchmark data set for
data-driven classification Proc. Eur. Conf. Prognostics and
Health Management Society pp 05–08
[53] Wen L, Gao L, Dong Y and Zhu Z 2019 A negative correlation
ensemble transfer learning method for fault diagnosis based
on convolutional neural network Math. Biosci. Eng.
16 3311–30