Fault diagnosis of key components in the rotating machinery based on Fourier transform multi-filter decomposition and optimized LightGBM

IOP Publishing
Measurement Science and Technology

Abstract

Rotating machinery is a primary element of mechanical equipment, and thus fault diagnosis of its key components is very important to improve the reliability and safety of modern industrial systems. The key point to diagnose the faults of these components is to extract effectively the hidden fault information. However, the actual vibration signals of rotating machinery have nonlinear and non-stationary characteristics, so traditional signal decomposition methods are unable to extract the frequency components accurately, leading to spectrum overlap of the decomposed sub-signals. Therefore, a rotating machinery fault diagnosis approach based on Fourier transform multi-filter decomposition (FTMFD), fuzzy entropy (FE), joint mutual information maximization (JMIM), and a light gradient boosting machine (LightGBM), is proposed in this paper. FTMFD is used to extract the frequency domain information of the raw vibration signals, whereas FE is used to calculate and extract the fault information of the decomposed sub-signals. Then feature selection is carried out by using JMIM to reduce the influence of redundant features on data analysis and classification accuracy. Furthermore, LightGBM is used to rank the candidate features and outputs the fault diagnosis result. Experimental results from two real datasets show that the proposed method achieves higher accuracy with fewer features than some existing methods for fault recognition. Various working conditions are also considered and verified.
To cite this article: Changhe Zhang et al 2021 Meas. Sci. Technol. 32 015004
Meas. Sci. Technol. 32 (2021) 015004 (13pp) https://doi.org/10.1088/1361-6501/aba93b
Changhe Zhang, Li Kong, Qi Xu, Kaibo Zhou and Hao Pan
Key Laboratory of Image Processing and Intelligent Control of Education Ministry, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, People’s Republic of China
E-mail: xuqi@hust.edu.cn
Received 12 April 2020, revised 13 July 2020
Accepted for publication 24 July 2020
Published 23 October 2020
Keywords: Fourier transform multi-filter decomposition, fuzzy entropy, joint mutual information maximization, LightGBM classifier, rotating machinery, fault diagnosis
(Some figures may appear in colour only in the online journal)
1. Introduction
Rotating machinery is widely used in industrial production [1–3]. However, its primary components are likely to be damaged during use due to the complex and harsh working environment, which severely influences the production safety of modern industrial systems [4,5]. Therefore, it is of great importance to investigate fault diagnosis for the key components of rotating machinery. Generally speaking, the fault diagnosis of rotating machinery based on vibration signal analysis is composed of three main steps: vibration signal extraction, fault feature extraction and fault pattern recognition [1,6]. Feature extraction is the most important step and often directly affects the final diagnosis result [7,8]. It has been reported that the commonly used signal analysis or fault feature extraction methods include time-domain analysis, frequency-domain analysis and time-frequency domain analysis [9–11]. However, the vibration monitoring signal of rotating machinery is often nonlinear and non-stationary, and some investigators have shown that it is difficult to effectively extract the fault features from non-stationary signals by using time-domain or frequency-domain methods [8,11,12].
In recent years, entropy-based feature extraction methods have been widely used in signal analysis, image processing, mechanical fault diagnosis and so on [13,14]. Entropy is a measure of the randomness or disorder of a time series, and its variants mainly include approximate entropy (AE) [15], sample entropy (SE) [16], fuzzy entropy (FE) [12,17], permutation entropy (PE) [14,18], dispersion entropy (DE) [19] and symbol dynamics entropy (SDE) [7]. Although these entropies measure the complexity of a time series on a single scale, the information on other scales is ignored [20]. Costa et al combined a multiscale procedure with SE to obtain multiscale entropy (MSE) [21]. However, the multiscale analysis made use of a coarse-graining procedure in which the data length is reduced as the scale factor increases, leading to inaccurate estimation [7]. Even some improved multiscale analysis methods still coarse-grained the time series, which amounts to low-pass filtering of the vibration signal and may have resulted in the loss of high-frequency information [12,13,22,23].
In addition, some adaptive time-frequency decomposition methods are widely used in the field of fault diagnosis, such as empirical mode decomposition (EMD) [24], local mean decomposition (LMD) [25], variational mode decomposition (VMD) [26] and so on. However, EMD suffers from the end effect, mode mixing, and envelope overshoot and undershoot [27], whereas LMD has the defects of the endpoint effect, mode aliasing and low computational efficiency [27,28]. VMD resists mode aliasing and is robust to noise, but its optimization requires a large amount of computational resources, and parameters such as the penalty factor α and the mode number K need to be defined in advance [14,29]. Unlike these decomposition methods, which lose some frequency components of the original signal, wavelet packet decomposition (WPD) retains low-frequency and high-frequency information well, although its decomposition results largely depend on the selection of the wavelet basis function (WBF) and the number of decomposition layers [30,31].
In this paper, the Fourier transform multi-filter decomposition (FTMFD) is combined with FE to obtain sufficient fault information from the time series for fault diagnosis. FTMFD is an adaptive decomposition method that completely retains the fault information at both low and high frequencies [32]. FE replaces the Heaviside function with a Gaussian function in order to avoid the drawbacks of SE [17]; it is beneficial for extracting fault information from vibration signals with good robustness, and it is highly sensitive to dynamical changes and insensitive to background noise [6,12,20]. Therefore, FTMFD is used to decompose the vibration signal, and then the FE values of the decomposed sub-signals are calculated to form fault feature vectors, exploiting the advantage of the information entropy method in measuring the dynamic characteristics of the time series.
In order to reduce the redundant features and improve the classification accuracy, feature selection is usually used to find the optimal feature subset based on the extracted fault features [33,34]. For filter feature selection based on mutual information, Peng et al studied max-relevance and min-redundancy (mRMR) based on maximum-dependency, maximum-correlation and minimum-redundancy criteria [35]. mRMR is widely used in the field of fault diagnosis, and its only disadvantage is that the size of the mutual information is considered after the addition of a single feature [7,13]. In addition, feature selection methods based on joint mutual information are also widely used, such as joint mutual information (JMI) [36] and joint mutual information maximization (JMIM) [37]. JMI ignores the case of single feature correlation so that the correlation between two or more features is reduced, whereas JMIM considers the overall stability of JMI to ensure the stability of the selected features.
Since the fault recognition and diagnosis of rotating machinery are carried out based on the optimal feature subset, the classification algorithm used in the final stage directly affects the performance of diagnosis methods. The commonly used classifiers include support vector machine (SVM) [38], random forests (RF) [34], stacked auto-encoders (SAEs) [7,39], convolutional neural network (CNN) [40], and gradient boosting decision tree (GBDT) frameworks such as XGBoost [41], light gradient boosting machine (LightGBM) [42] and CatBoost [43]. Compared with the traditional methods, LightGBM is a fast and efficient classification algorithm, exhibiting good performance in many machine learning tasks, e.g. regression, classification, ranking and so on [42,44]. Based on the GBDT algorithm, it is possible to obtain the contribution of each feature in model training for the development of an embedded feature selection (EFS) method.
In this paper, a feature extraction method combining FTMFD with FE is proposed to solve the problems of frequency information loss and low computing efficiency that exist in some traditional methods, where FTMFD is used to decompose the signal and FE is used to measure the dynamic characteristics of the time series. Then the JMIM and LightGBM method is used to extract the effective features to reduce redundant features and simplify the classifier modeling. A Bayesian optimization algorithm is further introduced to optimize the hyperparameters of the LightGBM classifier to improve the classification accuracy of fault diagnosis; it is also considered applicable to other classification algorithms. This paper is organized as follows. Section 2 introduces the proposed approach and the methodology. Section 3 shows the experimental results of the proposed approach with an MFS dataset. Further experimental verification and comparison results using KAT bearing datasets are discussed in section 4. The conclusion and future research are presented in section 5.
2. The proposed approach
The flowchart of the fault diagnosis approach for the key components in the rotating machinery is shown in figure 1. Firstly, FTMFD is applied to the original time series signal to extract the frequency domain information of the vibration signal, and FE is used to calculate and extract the fault information of the decomposed sub-signals. Then, JMIM is used to select the candidate feature set F1. On this basis, LightGBM is used to rank the candidate features in F1; according to the ranking result, these features are added in turn with labels to form a new dataset, so that the curve of classification accuracy against the number of features can be obtained. The features in use when the classification accuracy reaches its maximum make up the final selected feature set F2. Finally, the LightGBM classifier is used to train and classify these selected fault features. To verify the effectiveness of the proposed approach, two kinds of datasets are used in this paper.
Figure 1. Flowchart of the proposed fault diagnosis approach.
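The incremental step that turns F1 into F2 can be sketched as follows. This is a minimal illustration rather than the authors' code: `select_best_prefix` and the `evaluate` callback are hypothetical names, and `evaluate` stands in for training and scoring a LightGBM classifier on a given feature subset.

```python
from typing import Callable, List, Sequence, Tuple


def select_best_prefix(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], List[float]]:
    """Add ranked features one at a time and keep the best-scoring prefix.

    ranked_features: candidate set F1, most important feature first.
    evaluate: hypothetical callback returning the classification accuracy
              obtained with the given feature subset.
    Returns the selected subset F2 and the accuracy-vs-feature-count curve.
    """
    curve = [evaluate(list(ranked_features[:n]))
             for n in range(1, len(ranked_features) + 1)]
    # max() returns the first maximum, so ties favour fewer features.
    best_n = max(range(len(curve)), key=curve.__getitem__) + 1
    return list(ranked_features[:best_n]), curve


if __name__ == "__main__":
    # Toy accuracy table standing in for a trained classifier (made-up numbers).
    toy_accuracy = {1: 0.70, 2: 0.88, 3: 0.95, 4: 0.95, 5: 0.93}
    f2, curve = select_best_prefix(
        ["f7", "f2", "f9", "f1", "f4"],
        lambda subset: toy_accuracy[len(subset)],
    )
    print(f2)     # ['f7', 'f2', 'f9']
    print(curve)  # [0.7, 0.88, 0.95, 0.95, 0.93]
```

Breaking ties in favour of the shorter prefix is a design choice of this sketch; it matches the stated aim of reaching the highest accuracy with fewer features.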
2.1. Signal preprocessing and feature extraction method
2.1.1. Fourier transform multi-filter decomposition. Given a time series {x(i)}, i = −∞, …, −1, 0, 1, …, +∞, the Fourier transform of {x(i)} is defined as
$$X(e^{j\omega}) = \sum_{i=-\infty}^{+\infty} x(i)\, e^{-j\omega i}, \tag{1}$$
where f is the frequency and ω = 2πf denotes the angular frequency. The inverse Fourier transform of $X(e^{j\omega})$ is
$$x(i) = \frac{1}{2\pi} \int_{2\pi} X(e^{j\omega})\, e^{j\omega i}\, d\omega. \tag{2}$$
In practical engineering applications, most of the non-stationary vibration signals collected are finite-length discrete digital signals. Assume the length of the time series {x(i)} is N; then equations (1) and (2) can be rewritten as
$$X(k) = \sum_{i=1}^{N} x(i)\, e^{-j2\pi ki/N}, \tag{3}$$
and
$$x(i) = \frac{1}{N} \sum_{k=1}^{N} X(k)\, e^{j2\pi ki/N}, \tag{4}$$
where k denotes the frequency components.
An important application of the Fourier transform is signal filtering [45], which can be summarized in three steps. First, the Fourier transform is performed to transform the signal from the time domain to the frequency domain through equation (3); second, with the filter H(k), the required frequency components are retained and the unnecessary frequency components are filtered out of the spectrum, that is
$$\hat{X}(k) = X(k)H(k) = \begin{cases} X(k), & k_s \le k \le k_e \\ 0, & \text{otherwise}, \end{cases} \tag{5}$$
where $k_s$ and $k_e$ correspond to the start frequency and cutoff frequency of H(k), respectively. Finally, the inverse Fourier transform is performed on the filtered spectrum $\hat{X}(k)$, and the filtered signal $\hat{x}(i)$ is
$$\hat{x}(i) = \frac{1}{N} \sum_{k=1}^{N} \hat{X}(k)\, e^{j2\pi ki/N}. \tag{6}$$
The idea of FTMFD is to filter the signal with filters of different passbands, and then carry out the inverse Fourier transform on the filtered spectra to obtain sub-signals with different frequency components, as shown in figure 2.
Figure 2. Fourier transform multi-filter decomposition.
Let $f_i = [f_{is}, f_{ie}]$, i = 1, 2, …, n, where $f_i$ represents the passband of filter $H_i(f)$, and $f_{is}$ and $f_{ie}$ represent the start frequency and cutoff frequency of $H_i(f)$, respectively.
If FTMFD is designed to be adaptive and the number of filters is n, then according to a certain strategy, the signal spectrum can be divided into n parts and n different sub-bands can be obtained. The sub-bands satisfy
$$\begin{cases} f_1 \cup f_2 \cup \cdots \cup f_n = (0, F_s/2) \\ f_i \cap f_j = \varnothing, \quad i \neq j,\ 1 \le i, j \le n, \end{cases} \tag{7}$$
where $F_s$ is the sampling frequency, and the original signal s(t) is equal to the sum of all sub-signals $s_i(t)$, i = 1, 2, …, n, namely
$$s(t) = \sum_{i=1}^{n} s_i(t), \quad i = 1, 2, \ldots, n. \tag{8}$$
In this paper, the spectrum is evenly divided according to logarithmic coordinates; then $|\Delta f_i| = |\Delta f_j|$, i ≠ j, 1 ≤ i, j ≤ n. Moreover, the spectrum can be divided by other strategies, such as energy, which requires certain prior knowledge such as the frequency characteristics of the vibration signals. In general, FTMFD is a flexible decomposition strategy, in which the details can be adjusted according to actual needs.
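The decomposition described by equations (5) to (8) can be sketched with NumPy's real FFT. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `ftmfd`, the lower band edge `f_min`, and the choice to let the first band absorb everything below `f_min` (including DC) are all assumptions made here so that the sub-signals sum back to the input.

```python
import numpy as np


def ftmfd(signal, fs, n_filters, f_min=1.0):
    """Split `signal` into `n_filters` sub-signals with disjoint passbands.

    Band edges are spaced evenly on a log10 frequency axis between `f_min`
    and fs/2. `f_min` is an assumed lower edge (log spacing cannot start at 0).
    Returns an array of shape (n_filters, len(signal)).
    """
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x)                     # forward transform, cf. eq. (3)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)   # bin frequencies in Hz
    edges = np.logspace(np.log10(f_min), np.log10(fs / 2), n_filters + 1)
    edges[0] = 0.0                                # first band absorbs DC .. f_min
    subs = np.empty((n_filters, x.size))
    for i in range(n_filters):
        lo, hi = edges[i], edges[i + 1]
        mask = (freqs >= lo) & (freqs < hi)       # ideal band-pass H_i, cf. eq. (5)
        if i == n_filters - 1:
            mask |= freqs >= hi                   # include the Nyquist bin
        subs[i] = np.fft.irfft(spectrum * mask, n=x.size)  # cf. eq. (6)
    return subs


if __name__ == "__main__":
    fs = 6000.0
    t = np.arange(1000) / fs
    x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
    subs = ftmfd(x, fs, n_filters=32)
    print(subs.shape)                         # (32, 1000)
    print(np.allclose(subs.sum(axis=0), x))   # True
```

Because the masks are disjoint and together cover every FFT bin, the sub-signals add up to the original signal, as equation (8) requires. With short records, the narrowest low-frequency bands may contain no FFT bin at all and then yield an all-zero sub-signal.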
2.1.2. Fuzzy entropy. FE is an improvement on SE, in which the Heaviside function is replaced by a Gaussian function to measure the similarity between two vectors, which can effectively overcome the shortcomings of SE in practical applications.
For a given time series x(i), i = 1, 2, …, N, of length N, the similarity of FE is defined as
$$D_{ij}^{m} = \mu(d_{ij}^{m}, n, r) = e^{-\ln 2\,(d_{ij}^{m}/r)^{n}}, \tag{9}$$
where r is the similarity tolerance, n is its gradient, and $d_{ij}^{m}$ represents the distance between the m-dimensional vectors $X_i^m$ and $X_j^m$ reconstructed from the time series. Define the function $\varphi^m$ as
$$\varphi^{m}(n, r) = \frac{1}{N-m} \sum_{i=1}^{N-m} \left( \frac{1}{N-m-1} \sum_{j=1,\, j \neq i}^{N-m} D_{ij}^{m} \right). \tag{10}$$
Then, FE can be expressed as
$$\mathrm{FE}(m, n, r, N) = \ln \varphi^{m}(n, r) - \ln \varphi^{m+1}(n, r). \tag{11}$$
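Equations (9) to (11) can be turned into a short NumPy routine. The sketch below is a hedged illustration: the text does not spell out how the vectors $X_i^m$ or the distance $d_{ij}^m$ are built, so the code assumes the usual FE construction (each embedded vector has its own mean removed, and the distance is the maximum absolute difference). The function name and the `r_factor` argument are hypothetical.

```python
import numpy as np


def fuzzy_entropy(x, m=2, n=2, r_factor=0.15):
    """Fuzzy entropy FE(m, n, r, N) following equations (9)-(11).

    Assumptions (standard FE construction, not spelled out in the text):
    each embedded vector has its own mean removed, and the distance is the
    Chebyshev (maximum absolute) distance. r = r_factor * std(x).
    """
    x = np.asarray(x, dtype=float)
    N = x.size
    r = r_factor * x.std()

    def phi(dim):
        # Use N - m vectors for both dim = m and dim = m + 1.
        idx = np.arange(N - m)[:, None] + np.arange(dim)[None, :]
        vecs = x[idx]
        vecs = vecs - vecs.mean(axis=1, keepdims=True)     # remove local baseline
        d = np.abs(vecs[:, None, :] - vecs[None, :, :]).max(axis=2)
        sim = np.exp(-np.log(2) * (d / r) ** n)            # equation (9)
        np.fill_diagonal(sim, 0.0)                         # exclude j == i
        return sim.sum() / ((N - m) * (N - m - 1))         # equation (10)

    return np.log(phi(m)) - np.log(phi(m + 1))             # equation (11)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(300)
    sine = np.sin(2 * np.pi * np.arange(300) / 50)
    # White noise is less regular than a sine, so its FE should be larger.
    print(fuzzy_entropy(noise) > fuzzy_entropy(sine))
```

The pairwise distance matrix needs memory proportional to (N − m)², which is acceptable for records of about 1000 points. A constant signal gives r = 0 and is not handled by this sketch.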
The flowchart of the FE method is shown in figure 3.
Figure 3. Flowchart of the FE method.
2.2. Feature selection method
2.2.1. Joint mutual information maximization. JMIM uses the following iterative greedy search algorithm to find the relevant feature subset of size k in the feature space.
(1) For a feature set $F = \{f_1, f_2, \ldots, f_N\}$, the feature selection process identifies a feature subset S with dimension k, where k ≤ N and S ⊆ F. Theoretically, the selected feature subset S should maximize the JMI between class label C and feature subset S with fixed dimension k.
(2) Calculate the value of JMI between $f_i$, $f_s$ and C:
$$I(f_i, f_s; C) = I(f_s; C) + I(f_i; C \mid f_s), \tag{12}$$
where $I(f_s; C)$ represents the value of mutual information between $f_s$ and C. The larger it is, the stronger the correlation between $f_s$ and C. $I(f_i; C \mid f_s)$ represents the value of mutual information between $f_i$ and C under condition $f_s$.
(3) JMIM selects features according to the following criterion:
$$f_{\mathrm{JMIM}} = \arg\max_{f_i \notin S} I(f_i, f_s; C). \tag{13}$$
According to equation (13), JMIM considers the value of each $I(f_i, f_s; C)$. After feature $f_i$ is added, there is at least one feature $f_s$ in the subset for which the value of $I(f_i, f_s; C)$ is larger than it would be if any other candidate feature were added.
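The greedy search in steps (1) to (3) can be sketched in plain Python for discretised features. Two points are assumptions of this sketch rather than statements from the text: mutual information is estimated from simple frequency counts, and each candidate is scored by the smallest $I(f_i, f_s; C)$ over the already selected features $f_s$, which is the maximin rule of the original JMIM formulation. The function names are hypothetical.

```python
from collections import Counter
from math import log
from typing import List, Sequence


def mutual_information(x: Sequence, c: Sequence) -> float:
    """Plug-in estimate of I(X; C) in nats for two discrete sequences."""
    n = len(x)
    px, pc, pxc = Counter(x), Counter(c), Counter(zip(x, c))
    return sum(k / n * log(k * n / (px[a] * pc[b])) for (a, b), k in pxc.items())


def jmim(features: List[Sequence], labels: Sequence, k: int) -> List[int]:
    """Greedy JMIM: return the indices of k selected features.

    features[i] is the discretised column of feature i.
    """
    remaining = set(range(len(features)))
    # Start with the single most relevant feature.
    first = max(sorted(remaining),
                key=lambda i: mutual_information(features[i], labels))
    selected = [first]
    remaining.discard(first)
    while len(selected) < k and remaining:
        def worst_joint(i):
            # I(f_i, f_s; C): treat the pair (f_i, f_s) as one joint variable
            # and keep the least favourable pairing (assumed maximin rule).
            return min(
                mutual_information(list(zip(features[i], features[s])), labels)
                for s in selected
            )
        best = max(sorted(remaining), key=worst_joint)
        selected.append(best)
        remaining.discard(best)
    return selected


if __name__ == "__main__":
    f0 = [0, 0, 1, 1, 0, 0, 1, 1]
    f1 = [0, 1, 0, 1, 0, 1, 0, 1]
    labels = [2 * a + b for a, b in zip(f0, f1)]   # four classes
    # The middle column duplicates f0 and adds nothing once f0 is chosen,
    # so the two selected indices skip it.
    print(jmim([f0, list(f0), f1], labels, k=2))
```

Continuous features such as FE values would need to be binned before being passed to this sketch; the binning scheme is left open here.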
2.2.2. Embedded feature selection with LightGBM. EFS methods use the performance of the learning algorithm to evaluate the quality of the feature subset. Firstly, for a feature subset to be evaluated, the EFS method needs to train the classifier in advance. After that, the weight coefficients of each feature can be obtained according to certain indexes, such as the regularization term or loss function. Finally, the features are selected and ranked according to the weight coefficients.
Most GBDT algorithms, such as XGBoost, use an inefficient decision tree growth strategy called the level-wise method. It is replaced by the leaf-wise method in LightGBM to split the nodes of the weak learner. When the tree model is selected as the basic learner of LightGBM, the sum of the information gain of each feature, or the number of times each feature is used, during the splitting process can be obtained after training the model, and accordingly the features used can be ranked.
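The two ranking criteria mentioned above (total information gain and usage frequency) can be illustrated without any library. In the sketch below, `splits` is a hypothetical list of (feature, gain) records, one per tree split, standing in for what a trained model would expose; in the LightGBM Python package the corresponding options are, to our understanding, `importance_type="gain"` and `importance_type="split"`.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def rank_features(splits: Iterable[Tuple[str, float]], by: str = "gain") -> List[str]:
    """Rank features by summed split gain ("gain") or by split count ("split")."""
    totals = defaultdict(float)
    for feature, gain in splits:
        totals[feature] += gain if by == "gain" else 1.0
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(totals, key=lambda f: (-totals[f], f))


if __name__ == "__main__":
    # Made-up split records for illustration only.
    splits = [("f3", 4.0), ("f1", 1.5), ("f3", 0.5),
              ("f1", 1.0), ("f1", 0.2), ("f2", 3.0)]
    print(rank_features(splits, by="gain"))   # ['f3', 'f2', 'f1']
    print(rank_features(splits, by="split"))  # ['f1', 'f3', 'f2']
```

The toy data shows that the two criteria can disagree: a feature used often in low-gain splits ranks high by frequency but low by gain.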
2.3. Bayesian optimization
In this paper, a Bayesian optimization [46] algorithm is used to optimize the hyperparameters of the classification model. The main idea of Bayesian optimization is that, for a given objective function, the posterior distribution of the objective function is updated by constantly adding sample points until the posterior distribution basically fits the real distribution, so as to better adjust the current parameters. There are two core components in Bayesian optimization: the prior function (PF) and the acquisition function (AC). To optimize the objective function, the balance between exploration and exploitation must be considered.
Suppose $\delta = \{\delta_1, \delta_2, \ldots, \delta_n\}$ represents the hyperparameters of classifier C, and $D_{\mathrm{train}}$ and $D_{\mathrm{valid}}$ are the training set and validation set, respectively. $A(C, \delta, D_{\mathrm{train}}, D_{\mathrm{valid}})$ and $L(C, \delta, D_{\mathrm{train}}, D_{\mathrm{valid}})$ denote the classification accuracy and validation loss of C, respectively. k-fold cross-validation is applied and the objective function of optimization can be described as
$$f(\delta) = \arg\max \left( \frac{1}{k} \sum_{i=1}^{k} A(C, \delta, D_{\mathrm{train}}, D_{\mathrm{valid}}) \right), \tag{14}$$
or
$$f(\delta) = \arg\min \left( \frac{1}{k} \sum_{i=1}^{k} L(C, \delta, D_{\mathrm{train}}, D_{\mathrm{valid}}) \right). \tag{15}$$
During the parameter optimization the model is trained repeatedly, and the classification performance for each parameter combination is evaluated by calculating the objective function. Compared with grid search or random search [47], the advantages of Bayesian optimization lie in the following: firstly, the Gaussian process is adopted to continuously update the prior by considering the information from previous parameters; secondly, the number of iterations is small and the speed is fast; finally, for non-convex problems, it remains robust and tends to reach a global rather than a local optimum.
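The quantity averaged in equation (14), the mean validation accuracy over k folds for one hyperparameter setting, can be sketched in plain Python. This is an illustration only: `cv_accuracy`, `kfold_indices` and the `fit` callback are hypothetical names, the folds are contiguous and unshuffled for simplicity, and the final candidate loop is a stand-in for the Bayesian optimizer, which is not implemented here.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]   # (feature vector, class label)


def kfold_indices(n_samples: int, k: int) -> List[List[int]]:
    """Split range(n_samples) into k contiguous, nearly equal folds."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_accuracy(
    fit: Callable[[List[Sample], dict], Callable[[Sequence[float]], int]],
    data: List[Sample],
    params: dict,
    k: int = 10,
) -> float:
    """Mean validation accuracy over k folds (assumes k <= len(data))."""
    scores = []
    for fold in kfold_indices(len(data), k):
        held_out = set(fold)
        train = [s for i, s in enumerate(data) if i not in held_out]
        valid = [data[i] for i in fold]
        predict = fit(train, params)   # train classifier C with hyperparameters delta
        scores.append(sum(predict(x) == y for x, y in valid) / len(valid))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy one-feature problem and a threshold "classifier" that ignores training.
    data = [([float(i)], int(i > 9.5)) for i in range(20)]
    fit = lambda train, p: (lambda x: int(x[0] > p["threshold"]))
    for threshold in (4.5, 9.5):
        print(threshold, cv_accuracy(fit, data, {"threshold": threshold}, k=5))
    # 4.5 0.75
    # 9.5 1.0
```

An optimizer would call `cv_accuracy` once per proposed hyperparameter vector and keep the setting with the highest value.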
3. Case I: MFS dataset verification
3.1. Data description and experimental setup
Firstly, a dataset of mixed rotor and bearing faults from the Machinery Fault Simulator (MFS) platform was used to verify the effectiveness of the proposed approach [48,49]. As shown in figure 4, the experimental platform (model number: MFS2010-PK3) adopted here was developed by the Spectra Quest company in the United States [48], and is composed of an AC motor, coupling, acceleration sensor, rotor, rolling bearing, centering adjustment plate, data acquisition box and inverter. The data were collected under a single operational condition with a 6 kHz sampling frequency and a motor speed of 2100 rpm. The dataset used includes 10 fault types, the details of which are shown in table 1.
Table 1. Details of 10 types of faults.
| Fault type | Label |
| --- | --- |
| Central bent | 1 |
| Cocked rotor | 2 |
| Couple bent | 3 |
| Eccentric rotor | 4 |
| Unbalanced rotor | 5 |
| Ball defect | 6 |
| Inner race defect | 7 |
| Outer race defect | 8 |
| Normal | 9 |
| Combination defect of inner and outer race | 10 |
Figure 4. Experimental platform for rotor test [49].
These fault types have a total of 1600 samples, with 160 samples per type and 1000 data points per sample. Python 3.7.3 is used for algorithm design and development in this paper, and the experimental platform is configured with an Intel Core i5-6000hq CPU and 12 GB of RAM.
3.2. Parameter settings
Here, the number of filters of FTMFD is set as 32. That is to say, the frequency band of each sample is evenly divided into 32 parts, and the FE value of each sub-signal is calculated separately. Therefore, each sample corresponds to 32 fault features. The parameters of FE are set as follows: the embedding dimension m = 2, the time delay λ = 1, the similarity tolerance r = 0.15δ (δ is the standard deviation of the time series), and the gradient of the similarity tolerance n = 2. The dataset, composed of the extracted features and the category labels, is divided into a training set and a testing set. In order to eliminate the influence of contingency in sample division, the training set is further divided into training and validation sets according to 10-fold cross-validation, and then the trained model is used for classification prediction on the testing set. The main hyperparameters of LightGBM are shown in table 2, which are determined by the Bayesian optimization algorithm. The details of the parameters of Bayesian optimization are listed in table 3. The objective of optimization, i.e. the output of the algorithm, is to maximize f(δ) in equation (14).
Table 2. The main hyperparameters of LightGBM.
| Parameter | Value |
| --- | --- |
| Objective | Multiclass |
| Number of classes | 10 |
| Learning rate | 0.545 |
| Number of boosting iterations | 827 |
| L1 regularization | 0.001 |
| L2 regularization | 0.268 |
| Max number of leaves in one tree | 5 |
| Limit the max depth for tree model | 4 |
| The number of seeds used to generate other seeds | 50 |
Table 3. Details of main hyperparameters setting of Bayesian optimization.
| Parameter | Value |
| --- | --- |
| Prior Function | Gaussian process regression |
| Acquisition Function | Probability of Improvement |
| Random State | 30 |
| Init_Points^a | 100 |
| N_iter^b | 100 |
a Init_points is the number of random exploration steps to be performed.
b N_iter is the number of Bayesian optimization steps to be performed.
3.3. Discussion
3.3.1. Research on different proportions of training samples. The number of training samples will affect the classification accuracy. In order to illustrate the advantages of the proposed feature extraction method and LightGBM classifier, different proportions of training and testing samples are set in this section. The classification results and training time are shown in figure 5. The classification accuracy on the training set reaches 100% in all experiments. When the ratio of training to testing samples is set as 9:1, the accuracy on the testing set reaches 100%. However, a small testing set is subject to accidental influence, whereas when the ratio is set as 3:2 the trained model performs well on both the validation set and the testing set, so we take the results at this ratio as the final classification result. There is a positive correlation between model training time and the ratio, but due to the speed of LightGBM, the training time varies only slightly, ranging from 1.0 s to 3.1 s.
Figure 5. Different proportions of training and testing samples.
3.3.2. Comparison with different decomposition methods. To highlight the advantages of FTMFD, four decomposition methods, EMD, LMD, VMD and WPD, are used for comparison. According to the actual decomposition, the FEs of the first six components of the decomposition sub-signals of EMD and the first four components of LMD are calculated as fault features. According to reference [29], the number of IMFs of VMD can be determined by referring to EMD, and is selected as 6 in this paper. Therefore, the FEs of the first six, four and six components corresponding to EMD, LMD and VMD, respectively, are calculated as features. In addition, the WBF of WPD is selected based on the principle of the ratio of maximum energy to Shannon entropy [30].
Table 4. Time consumption on model training and feature extraction of different decomposition methods.
| Method | Feature extraction time (s)/sample | Training time (s) |
| --- | --- | --- |
| FTMFD | 1.22 | 2.76 |
| EMD | 0.01 | 3.18 |
| LMD | 0.21 | 2.29 |
| VMD | 1.28 | 2.53 |
| WPD | 1.59 | 2.78 |
Table 5. Details of main parameters of the t-SNE setting.
| Parameter | Value |
| --- | --- |
| Algorithm | Exact |
| NumPCAComponents | 10 |
| Perplexity | 40 |
| NumDimensions | 2 |
| Standardize | False |
| LearnRate | 2000 |
Considering the characteristics of vibration signals of rotating machinery and different WBFs, the ‘db’, ‘sym’ and ‘coif’ wavelets are considered here, among which ‘coif3’ is selected as the optimal WBF by calculation. If the signal is decomposed by a WPD with level layers, the frequency resolution can be calculated as df = fs/2^(level+1), where the sampling frequency fs is 6 kHz. The more sub-bands the spectrum is divided into, the greater the computation and feature redundancy. Therefore, the number of decomposition layers is selected as 5. That is to say, the number of sub-signals of WPD is 32, which is the same as FTMFD.
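As a quick check of the WPD relation df = fs/2^(level+1), the following snippet evaluates the number of sub-bands and the bandwidth of each for a few decomposition levels at the 6 kHz sampling frequency used here. The helper name `wpd_bands` is hypothetical.

```python
def wpd_bands(fs_hz: float, level: int):
    """Return (number of sub-bands, bandwidth of each sub-band in Hz)."""
    n_bands = 2 ** level
    df = fs_hz / 2 ** (level + 1)
    return n_bands, df


if __name__ == "__main__":
    for level in range(3, 7):
        n_bands, df = wpd_bands(6000.0, level)
        print(level, n_bands, df)
    # Level 5 gives 32 sub-bands of 93.75 Hz each, together spanning 0 to 3 kHz.
```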
The experimental results are shown in figure 6 and table 4. As can be seen from figure 6, except for EMD, the training accuracy of all the other decomposition methods reaches 100%. The classification accuracies of EMD, LMD and VMD on the validation and testing sets are all lower than those of FTMFD, and the classification results of WPD and FTMFD are similar. As can be seen from table 4, EMD consumes the least time in feature extraction, and FTMFD is faster than WPD when the number of decomposed sub-signals is the same.
Table 6. Details of main parameters of the different settings of the entropy-based methods.
| Parameter | SE | PE | DE | SDE |
| --- | --- | --- | --- | --- |
| Number of classes | / | / | 10 | |
| Time delay factor | 1 | 1 | 1 | 1 |
| Embedding dimension | 2 | 3 | 3 | 3 |
| Similarity tolerance | 0.15 STD^a | / | / | |
| Symbol interval number | / | / | / | 8 |
a STD is the standard deviation of a signal.
Table 7. Time consumption of model training and feature extraction of different entropy-based methods.
| Entropy | Feature extraction time (s)/sample | Training time (s) |
| --- | --- | --- |
| FE | 1.22 | 2.76 |
| SE | 0.85 | 2.80 |
| PE | 0.15 | 4.94 |
| DE | 3.73 | 3.57 |
| SDE | 0.19 | 3.92 |
t-SNE is a common method used in data simplification and feature visualization [50]. Through the visualization of feature samples by t-SNE, the advantages and disadvantages of different methods in feature extraction can be seen more intuitively. The main parameters of t-SNE are listed in table 5. The results are shown in figure 7, where it is clear that the features extracted by FTMFD and WPD are easier to distinguish than those extracted by EMD, LMD and VMD.
3.3.3. Comparison with different entropy-based methods. The purpose of this section is to compare the fault feature extraction ability and time consumption of different entropy-based methods. In this part, the SEs, PEs, DEs and SDEs of the sub-signals decomposed by FTMFD are calculated respectively, and their classification results are compared with the proposed approach using FE. The parameter settings of all entropy-based methods are shown in table 6.
The experimental results are shown in figure 8 and table 7. The training accuracy of all the methods reaches 100%. The classification accuracies of FTMFD-PE on the validation and testing sets are the lowest, at only 87.14% and 85.42%, respectively. The testing accuracy of FTMFD-SE is second only to that of FTMFD-FE. It can be seen from table 7 that FTMFD-PE is the fastest in feature extraction but the slowest in model training. Therefore, among these entropy-based methods, although the feature extraction time of FE is relatively long, its classification accuracy is the highest and its model training time is the shortest.
3.3.4. Comparison with different classification algorithms. To illustrate that the feature extraction method proposed in this paper has satisfactory fault diagnosis capability in combination with different classifiers, SVM, RF, SAE, XGBoost and CatBoost are used in this section for comparison with LightGBM. The main hyperparameters of these classification algorithms are likewise determined by the Bayesian optimization algorithm. The features extracted by FTMFD-FE are input into these different classifiers, and the classification results are shown in figure 9. As can be seen in figure 9, except for SVM and SAE, the training accuracy of the other classifiers reaches 100%. The average accuracy on the validation set is highest for RF (98.75%). The testing accuracy of LightGBM is the highest, and the performance of CatBoost is second to it. The time consumption of model training for the different classifiers is shown in table 8; the fastest is SVM, at only 0.51 s. CatBoost and SAE are slower, and SAE consumes the longest time (178.82 s). Therefore, if the hyperparameters are selected suitably, there is little difference in the classification results with different classifiers, while the time consumption of model training varies greatly, and an appropriate classifier should be selected according to the actual needs. The experimental results show that the fault features extracted by the proposed method are easy to classify and recognize, and they reflect the classification advantages of LightGBM in the fault diagnosis task of rotating machinery.
Table 8. Time consumption of model training of different classification algorithms.
| Classifier | Training time (s) |
| --- | --- |
| LightGBM | 2.76 |
| SVM | 0.51 |
| RF | 47.01 |
| XGBoost | 5.91 |
| CatBoost | 36.64 |
| SAE | 178.82 |
3.3.5. Comparison with different feature selection methods.
The number of features affects the model training time
and the classification accuracy of classifiers. To illustrate
the advantages of the proposed feature selection method, it is
compared with ReliefF [51], JMI and mRMR in this section.
The first-round candidate features are selected by each of these
methods. LightGBM is then used to rank these features, and the
curves relating the number of features to the model training
time and the classification accuracy are obtained from the
ranking results.
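For readers unfamiliar with JMIM, the criterion of Bennasar et al [37] greedily adds the candidate feature f that maximizes the minimum, over the already selected features s, of the joint mutual information I(f, s; C) with the class label C. The sketch below illustrates this for discrete features only. It assumes the features have already been discretized (binning of the fuzzy entropy values is not covered), the function names are hypothetical, and the subsequent LightGBM ranking step is not included.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """I(X; Y) in bits for two equally long sequences of discrete symbols."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def jmim_select(features, labels, k):
    """Greedy JMIM selection; `features` maps a name to a list of discrete values."""
    remaining = set(features)
    # First pick: the single feature sharing the most information with the labels.
    first = max(sorted(remaining),
                key=lambda f: mutual_information(features[f], labels))
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < k:
        def score(f):
            # min over selected s of I(f, s; C), treating (f, s) as one joint variable
            return min(
                mutual_information(list(zip(features[f], features[s])), labels)
                for s in selected
            )
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],       # separates classes {0, 1} from {2, 3}
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # redundant duplicate of "a"
    "b": [0, 0, 1, 1, 0, 0, 1, 1],       # complementary to "a"
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],   # unrelated to the labels
}
print(jmim_select(features, labels, 2))
```

In this toy data the duplicate feature is as informative as the original on its own, but it adds nothing once "a" is selected, so the complementary feature "b" is preferred in the second round.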
Meas. Sci. Technol. 32 (2021) 015004 C Zhang et al
Figure 6. Classification results of different decomposition methods.
Figure 7. Feature visualization of different decomposition methods using t-SNE. (a) Combination of FTMFD and FE; (b) combination of
EMD and FE; (c) combination of LMD and FE; (d) combination of VMD and FE; (e) combination of WPD and FE.
The experimental results are shown in figures 10(a) and (b). It
can be seen from figure 10(a) that the training accuracy of
JMIM-LightGBM is the highest when two features are used
(99.79%). As can be seen from figure 10(b), when the number
of features is small, it is negatively correlated with the model
training time; the reason may be that having fewer features is
not conducive to model building. When the number of features
is greater than 10, the relationship becomes positive, which is
consistent with engineering experience. The proposed method
is clearly superior to the other methods when fewer features
are used: with 3 features, its classification accuracy exceeds
90%, while the other methods require 4 or more features; with
12 features, the classification accuracy reaches its maximum
(99.22%), and the model training time is only 1.75 s. Therefore,
by using the proposed feature selection method, fewer features
can be selected while achieving a better classification result
and a shorter model training time.
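The curves of accuracy and training time against the number of features can be produced with a simple sweep over the ranked feature list. The sketch below is illustrative only: `train_and_score` is a hypothetical caller-supplied function that trains a classifier on the given features and returns its testing accuracy, and the recorded time covers that whole call.

```python
import time


def sweep_feature_counts(ranked, train_and_score):
    """Train on the top-k ranked features for k = 1..len(ranked).

    ranked: feature names ordered from most to least important.
    train_and_score: callable taking a list of feature names and
        returning an accuracy (hypothetical, supplied by the caller).
    Returns a list of (k, accuracy, seconds) tuples.
    """
    curve = []
    for k in range(1, len(ranked) + 1):
        start = time.perf_counter()
        accuracy = train_and_score(ranked[:k])
        curve.append((k, accuracy, time.perf_counter() - start))
    return curve


def smallest_k_reaching(curve, threshold):
    """Smallest feature count whose accuracy meets the threshold, else None."""
    for k, accuracy, _ in curve:
        if accuracy >= threshold:
            return k
    return None


# Toy scorer standing in for a real classifier.
toy_scores = {1: 0.70, 2: 0.92, 3: 0.95}
curve = sweep_feature_counts(["f1", "f2", "f3"], lambda subset: toy_scores[len(subset)])
print(smallest_k_reaching(curve, 0.90))
```

Statements such as "more than 90% with 3 features" then correspond to calling `smallest_k_reaching` on the curve obtained for each feature selection method.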
4. Case II: KAT bearing dataset verification
4.1. Data description and experimental setup
The KAT bearing damage dataset was provided by the KAT
data center at Paderborn University [52]. The hardware
configuration and settings of the experimental platform are
described in [52]. There are 15 datasets, which can be categorized
into three classes as shown in table 9. The K0-series
(K001–K005) represents the healthy condition, the KA-series
(KA04, KA15, KA16, KA22, KA30) represents bearings with
outer ring damage and the KI-series (KI04, KI14, KI16,
KI18, KI21) represents bearings with inner ring damage. The
experiments are conducted under four different sets of operating
parameters, and the details are shown in table 10. Each
experiment is repeated 20 times, and the sampling frequency is
64 kHz. It should be noted that the damage in these datasets is
real damage caused by accelerated lifetime tests [53].
Figure 8. Classification results of different entropy-based methods.
Figure 9. Classification results of different classification algorithms.
Figure 10. Comparison of different feature selection methods. (a) Classification results of the training set; (b) classification results of the
testing set and model training time.
The details of the datasets used for experimental verification
are shown in table 11. Datasets D1 to D4 each contain the
different fault types under a single working condition, and
dataset D5 contains all four working conditions. Each sample
contains 2560 non-overlapping data points, with a total of
1200 samples (100 samples for each fault type in each working
condition).
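The sample construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `segment` is hypothetical, and discarding trailing points that do not fill a whole window is an assumption not stated in the text. At a sampling frequency of 64 kHz, one 2560-point sample spans 40 ms.

```python
def segment(signal, length=2560):
    """Split a sequence into consecutive non-overlapping windows of `length` points.

    Trailing points that do not fill a whole window are discarded (an assumption).
    """
    count = len(signal) // length
    return [signal[i * length:(i + 1) * length] for i in range(count)]


SAMPLING_RATE_HZ = 64_000
window_seconds = 2560 / SAMPLING_RATE_HZ   # duration of one sample in seconds
points_for_100_samples = 100 * 2560        # points needed per fault type and condition

samples = segment(list(range(6000)))
print(len(samples), window_seconds, points_for_100_samples)
```

With three classes, four working conditions and 100 samples each, this yields the 1200 samples listed for dataset D5.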
4.2. Feature visualization
The parameters of FTMFD and FE are set referring to sec-
tion 3.2; the extracted features by FTMFD-FE are compressed
Table 9. Categorization of datasets.
Healthy (Class 1)    Outer ring damage (Class 2)    Inner ring damage (Class 3)
K001                 KA04                           KI04
K002                 KA15                           KI14
K003                 KA16                           KI16
K004                 KA22                           KI18
K005                 KA30                           KI21
Table 10. Four operation parameters.
No.    Rotational speed (rpm)    Load torque (Nm)    Radial force (N)    Condition number
0      1500                      0.7                 1000                C0
1      900                       0.7                 1000                C1
2      1500                      0.1                 1000                C2
3      1500                      0.7                 400                 C3
Table 11. Details of the datasets of different working conditions.
Dataset label    Working condition    Number of samples
D1               C0                   300
D2               C1                   300
D3               C2                   300
D4               C3                   300
D5               C0, C1, C2, C3       1200
Figure 11. T-SNE visualization of features extracted by
FTMFD-FE.
into two dimensions by t-SNE. The main parameters of t-SNE
are still those in table 5 in section 3.3.2, and the results are
shown in figure 11. It can be seen that the three fault types are
quite distinct, which indicates that the proposed feature
extraction method can effectively extract fault features of the
rolling bearing.
Table 12. The main hyperparameters of LightGBM.
Parameter                                           Value
Objective                                           Multiclass
Number of classes                                   3
Learning rate                                       0.071
Number of boosting iterations                       816
L1 regularization                                   0.957
L2 regularization                                   0.583
Max number of leaves in one tree                    4
Limit the max depth for tree model                  2
The number of seeds used to generate other seeds    50
4.3. Discussion
For the data after feature extraction, the ratio of training to
testing samples is still set as 7:3. The main hyperparameters of
LightGBM are again optimized by the Bayesian optimization
algorithm, and the results are shown in table 12. The
classification results when all 32 features are used are shown in
figure 12. The training accuracy on all the datasets reaches 100%.
Among the single working condition datasets (D1 to D4), the
testing accuracy is 100% for all except D3, for which it is
97.78%. The testing accuracy on D5, which covers the most
complex working conditions, is 99.72%. In addition, in reference [53],
the prediction accuracy of the negative correlation ensemble
transfer learning method (NCTE) on the KAT bearing dataset
is 98.73%, which is slightly lower than the accuracy achieved
by the method proposed in this paper.
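For readers who want to reuse the settings of table 12, they can be written with LightGBM's documented parameter names. The mapping from the table's row labels to parameter names below is an interpretation rather than something the paper states, and the helper `leaves_fit_depth` is hypothetical; it only checks that the leaf limit is compatible with the depth limit.

```python
# Table 12 expressed with LightGBM parameter names (mapping is an assumption).
LGBM_PARAMS = {
    "objective": "multiclass",
    "num_class": 3,
    "learning_rate": 0.071,
    "lambda_l1": 0.957,   # L1 regularization
    "lambda_l2": 0.583,   # L2 regularization
    "num_leaves": 4,      # max number of leaves in one tree
    "max_depth": 2,       # limit on tree depth
    "seed": 50,           # seed used to generate other seeds
}
NUM_BOOST_ROUND = 816     # number of boosting iterations


def leaves_fit_depth(params):
    """A tree of depth d has at most 2**d leaves, so num_leaves should not exceed it."""
    return params["num_leaves"] <= 2 ** params["max_depth"]


print(leaves_fit_depth(LGBM_PARAMS))
```

With the `lightgbm` package installed, these values could be passed as `lightgbm.train(LGBM_PARAMS, lightgbm.Dataset(X_train, label=y_train), num_boost_round=NUM_BOOST_ROUND)`, where `X_train` and `y_train` are placeholder names for the training features and labels.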
4.3.1. Comparison with different feature selection methods.
Following the experiments in section 3.3.5, JMIM is again
compared with ReliefF, JMI and mRMR, and the results are
shown in figure 13. As can be seen from figure 13(a), the
training accuracy of JMIM-LightGBM reaches 100% when two
features are used, while the other methods need more features.
As can be seen from figure 13(b), when only one feature is
selected, the testing accuracy of JMIM-LightGBM is relatively
low, only 77.78%, while that of mRMR-LightGBM is the highest
(89.17%). However, when 2 features are selected, the testing
accuracy of the proposed method reaches 97.50%, and it
reaches its maximum (99.72%) when 14 features are used;
meanwhile, the model training time is only 0.36 s. The results
further indicate the effectiveness of the proposed feature
selection method.
4.3.2. Comparison with different classification algorithms.
In this section, LightGBM is compared with other classifiers
including SVM, RF and CatBoost. JMIM is used to select
candidate features, and the classification results are shown in
figure 14. As can be seen from figure 14(a), the training accuracy
of RF is the highest when one feature is used (99.52%),
while the accuracy of LightGBM reaches 100% when two or
more features are used. According to the classification accuracy
curve in figure 14(b), when one feature is used, the testing
accuracy of SVM is the highest (91.11%). When the number
of features is small, there is little difference in the classification
accuracy of the four classifiers. However, when the number of
features is greater than seven, the classification accuracy of
LightGBM is slightly higher than that of the other classifiers.
In terms of model training time, LightGBM is second only to
SVM, while CatBoost takes the longest. The experimental
results further illustrate the speed and superiority of the
LightGBM classifier.
Figure 12. Classification accuracy on datasets of different working conditions.
Figure 13. Comparison of different feature selection methods. (a) Classification results of the training set; (b) classification results of the
testing set and model training time.
Figure 14. Comparison of different classifiers. (a) Classification results of training set; (b) classification results of testing set and model
training time.
5. Conclusion
In this paper, a new fault diagnosis approach for the key
components of rotating machinery based on FTMFD-FE, JMIM
and LightGBM is proposed. For non-linear and non-stationary
mechanical vibration signals, the combination of FTMFD and
FE for signal preprocessing and feature extraction can
effectively extract hidden mechanical fault features. While
retaining the advantages of information entropy, it overcomes
the problem that traditional multiscale analysis cannot
effectively extract high-frequency information, and it helps
improve classification accuracy. On this basis, fault feature
selection based on JMIM and LightGBM is used to effectively
reduce redundant features and simplify classifier model
construction, and thus the model training time can be reduced.
Finally, the effectiveness of the proposed approach is
experimentally verified on the MFS dataset and the KAT bearing
dataset by comparative experiments on signal decomposition,
feature extraction and feature selection, respectively. The
experimental results show that the proposed approach can
effectively identify the fault states of the key components of
rotating machinery. Moreover, the effectiveness of the proposed
approach under multiple working conditions is also verified
on the KAT bearing dataset, where the classification accuracy
reaches 99.72%.
The actual working environment of the key components of
rotating machinery is more complex and changeable, so future
research will focus on feature extraction and classification
of vibration signals under more complex working conditions.
In addition, besides the combination with information
entropy, the combination of FTMFD with dimensionless
time-domain indexes such as kurtosis for feature extraction
can also be investigated in future work.
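As an illustration of the dimensionless index mentioned above, kurtosis can be computed per decomposed sub-signal as the fourth central moment divided by the squared variance. The sketch below is not part of the proposed method: the function names are hypothetical and the use of population moments (dividing by N) is an assumption.

```python
def kurtosis(signal):
    """Fourth central moment divided by the squared variance (population moments).

    The value is dimensionless: rescaling the signal leaves it unchanged.
    """
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((x - mean) ** 2 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant signal")
    return m4 / (m2 * m2)


def kurtosis_features(sub_signals):
    """One kurtosis value per sub-signal, e.g. per decomposed frequency band."""
    return [kurtosis(s) for s in sub_signals]


square_wave = [1, -1, 1, -1]
impulsive = [0] * 9 + [10]   # a single spike, as an impact-like pattern
print(kurtosis_features([square_wave, impulsive]))
```

Impulsive signals give much larger values than smooth periodic ones, which is why kurtosis is a common candidate feature for bearing impacts.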
Acknowledgments
The work here is supported by the National Key Research
and Development Program of China (No. 2018YFB2003303),
the Fundamental Research Funds for the Central Universities
(No. 2019kfyXJJS137), the research fund (No. 61400020401),
and the Nondestructive Detection and Monitoring Techno-
logy for High Speed Transportation Facilities, Key Laborat-
ory of Ministry of Industry and Information Technology (Nos.
KL2019W003 and KL2019W004).
ORCID iDs
Changhe Zhang https://orcid.org/0000-0001-7046-9240
Qi Xu https://orcid.org/0000-0002-9795-1616
Kaibo Zhou https://orcid.org/0000-0003-0055-3193
Hao Pan https://orcid.org/0000-0001-9324-0545
References
[1] Liu R, Yang B, Zio E and Chen X 2018 Artificial intelligence
for fault diagnosis of rotating machinery: a review Mech.
Syst. Signal Process. 108 33–47
[2] Liu J, Hu Y, Wang Y, Wu B, Fan J and Hu Z 2018 An
integrated multi-sensor fusion-based deep feature learning
approach for rotating machinery diagnosis Meas. Sci.
Technol. 29 055103
[3] Lei Y, Lin J, He Z and Zuo M J 2013 A review on empirical
mode decomposition in fault diagnosis of rotating
machinery Mech. Syst. Signal Process. 35 108–26
[4] Wei Y et al 2019 A review of early fault diagnosis approaches
and their applications in rotating machinery Entropy
21 409
[5] Li Y, Li G, Yang Y, Liang X and Xu M 2018 A fault diagnosis
scheme for planetary gearboxes using adaptive multi-scale
morphology filter and modified hierarchical permutation
entropy Mech. Syst. Signal Process. 105 319–37
[6] Li Y, Xu M, Wang R and Huang W 2016 A fault diagnosis
scheme for rolling bearing based on local mean
decomposition and improved multiscale fuzzy entropy J.
Sound Vib. 360 277–99
[7] Li Y, Yang Y, Li G, Xu M and Huang W 2017 A fault
diagnosis scheme for planetary gearboxes using modified
multi-scale symbolic dynamic entropy and mRMR feature
selection Mech. Syst. Signal Process. 91 295–312
[8] Gao Y and Yu D 2020 Total variation on horizontal visibility
graph and its application to rolling bearing fault diagnosis
Mech. Mach. Theory 147 103768
[9] Wang L and Shao Y 2020 Fault feature extraction of rotating
machinery using a reweighted complete ensemble empirical
mode decomposition with adaptive noise and demodulation
analysis Mech. Syst. Signal Process. 138 106545
[10] Medina R, Macancela J C, Lucero P, Cabrera D, Cerrada M,
Sánchez R-V and Vásquez R E 2019 Vibration signal
analysis using symbolic dynamics for gearbox fault
diagnosis Int. J. Adv. Manuf. Technol. 104 2195–214
[11] Wen X et al 2020 Graph modeling of singular values for early
fault detection and diagnosis of rolling element bearings
Mech. Syst. Signal Process. 145 106956
[12] Liu Q, Pan H, Zheng J, Tong J and Bao J 2019 Composite
interpolation-based multiscale fuzzy entropy and its
application to fault diagnosis of rolling bearing Entropy
21 292
[13] Yan X and Jia M 2019 Intelligent fault diagnosis of rotating
machinery using improved multiscale dispersion entropy
and mRMR feature selection Knowl. Based Syst.
163 450–71
[14] Chen L and Wan S 2020 Mechanical fault diagnosis of
high-voltage circuit breakers using multi-segment
permutation entropy and a density-weighted one-class
extreme learning machine Meas. Sci. Technol.
31 85107
[15] Pincus S 1995 Approximate entropy (ApEn) as a complexity
measure Chaos 5 110–7
[16] Richman J S and Moorman J R 2000 Physiological time-series
analysis using approximate entropy and sample entropy Am.
J. Physiol. Heart Circ. Physiol. 278 H2039–49
[17] Chen W, Zhuang J, Yu W and Wang Z 2009 Measuring
complexity using fuzzyen, apen, and sampen Med. Eng.
Phys. 31 61–68
[18] Bandt C and Pompe B 2002 Permutation entropy: a natural
complexity measure for time series Phys. Rev. Lett.
88 174102
[19] Rostaghi M and Azami H 2016 Dispersion entropy: A measure
for time-series analysis IEEE Signal Process. Lett. 23 610–4
[20] Li Y, Wang X, Liu Z, Liang X and Si S 2018 The entropy
algorithm and its variants in the fault diagnosis of rotating
machinery: A review IEEE Access 6 66723–41
[21] Costa M, Goldberger A L and Peng C K 2005 Multiscale
entropy analysis of biological signals Phys. Rev. E
71 021906
[22] Wu S D, Wu C W, Lin S G, Lee K-Y and Peng C-K 2014
Analysis of complex time series using refined composite
multi-scale entropy Phys. Lett. A 378 1369–74
[23] Wu S D, Wu C W, Lee K Y and Lin S-G 2013 Modified
multiscale entropy for short-term time series analysis
Physica A 392 5865–73
[24] Huang N E, Shen Z, Long S R, Wu M C, Shih H H, Zheng Q,
Yen N-C, Tung C C and Liu H H 1998 The empirical mode
decomposition and the Hilbert spectrum for nonlinear and
non-stationary time series analysis Proc. R. Soc. Lond. A
454 903–95
[25] Smith J S 2005 The local mean decomposition and its
application to EEG perception data J. R. Soc. Interface
2 443–54
[26] Dragomiretskiy K and Zosso D 2013 Variational mode
decomposition IEEE Trans. Signal Process.
62 531–44
[27] Wang Y, He Z and Zi Y 2010 A comparative study on the local
mean decomposition and empirical mode decomposition
and their applications to rotating machinery health
diagnosis J. Vib. Acoust. 132 2
[28] Liu W Y, Zhang W H, Han J G and Wang G F 2012 A new
wind turbine fault diagnosis method based on the local
mean decomposition Renew. Energy 48 411–5
[29] Li F, Li R, Tian L, Chen L and Liu J 2019 Data-driven
time-frequency analysis method based on variational mode
decomposition and its application to gear fault diagnosis in
variable working conditions Mech. Syst. Signal Process.
116 462–79
[30] Kankar P K, Sharma S C and Harsha S P 2013 Fault
diagnosis of rolling element bearing using cyclic
autocorrelation and wavelet transform Neurocomputing
110 9–17
[31] Eren L and Devaney M J 2004 Bearing damage detection via
wavelet packet decomposition of the stator current IEEE
Trans. Instrum. Meas.
53 431–6
[32] Pan H, Zhou K B and Liu J 2019 A fault diagnosis method for
rolling bearings based on Fourier transform multi-filter
decomposition and permutation entropy Proc. 13th
National Conf. Vibration Theory and Application (Chinese
Society of Vibration Engineering) pp 259–64 (in Chinese)
[33] Rauber T W, de Assis Boldt F and Varejão F M 2014
Heterogeneous feature models and feature selection applied
to bearing fault diagnosis IEEE Trans. Ind. Electron.
62 637–46
[34] Hu Q, Si X S, Zhang Q H and Qin A-S 2020 A rotating
machinery fault diagnosis method based on multi-scale
dimensionless indicators and random forests Mech. Syst.
Signal Process. 139 106609
[35] Peng H, Long F and Ding C 2005 Feature selection based on
mutual information criteria of max-dependency,
max-relevance, and min-redundancy IEEE Trans. Pattern
Anal. Mach. Intell. 27 1226–38
[36] Yang H and Moody J 1999 Feature selection based on joint
mutual information Proc. Int. ICSC Symp. Advances in
Intelligent Data Analysis pp 22–25
[37] Bennasar M, Hicks Y and Setchi R 2015 Feature selection
using joint mutual information maximisation Expert Syst.
Appl. 42 8520–32
[38] Fu W et al 2020 Fault diagnosis for rolling bearings based on
composite multiscale fine-sorted dispersion entropy and
SVM with hybrid mutation SCA-HHO algorithm
optimization IEEE Access 8 13086–104
[39] Zabalza J, Ren J, Zheng J, Zhao H, Qing C, Yang Z, Du P and
Marshall S 2016 Novel segmented stacked autoencoder for
effective dimensionality reduction and feature extraction in
hyperspectral imaging Neurocomputing 185 1–10
[40] Zhou Q, Li Y, Tian Y and Jiang L 2020 A novel method based
on nonlinear auto-regression neural network and
convolutional neural network for imbalanced fault diagnosis
of rotating machinery Measurement 161 107880
[41] Zhou K B, Zhang Z X, Liu J, Hu Z-X, Duan X-K and Xu Q
2018 Anode effect prediction based on a singular value
thresholding and extreme gradient boosting approach Meas.
Sci. Technol. 30 015104
[42] Ke G et al 2017 LightGBM: a highly efficient gradient boosting
decision tree Adv. Neural Inf. Process. Syst. 30 3146–54
[43] Prokhorenkova L et al 2018 CatBoost: unbiased boosting with
categorical features Adv. Neural Inf. Process. Syst. pp
6638–48
[44] Sun X, Liu M and Sima Z 2018 A novel cryptocurrency price
trend forecasting model based on LightGBM Finance Res.
Lett. 32 101084
[45] Zhang J, Wen H and Tang L 2019 Improved smoothing
frequency shifting and filtering algorithm for harmonic
analysis with systematic error compensation IEEE Trans.
Ind. Electron. 66 9500–9
[46] Snoek J, Larochelle H and Adams R P 2012 Practical bayesian
optimization of machine learning algorithms Adv. Neural
Inf. Process. Syst. pp 2951–9
[47] Bergstra J and Bengio Y 2012 Random search for
hyper-parameter optimization J. Mach. Learn. Res.
13 281–305
[48] Shan Y, Zhou J, Jiang W, Liu J, Xu Y and Zhao Y 2019 A fault
diagnosis method for rotating machinery based on improved
variational mode decomposition and a hybrid artificial
sheep algorithm Meas. Sci. Technol. 30 055002
[49] Ge M et al 2020 A deep condition feature learning approach
for rotating machinery based on MMSDE and optimized
SAEs Meas. Sci. Technol. (accepted)
(https://doi.org/10.1088/1361-6501/ab89e3)
[50] Maaten L and Hinton G 2008 Visualizing data using t-SNE J.
Mach. Learn. Res. 9 2579–605
[51] Robnik-Šikonja M and Kononenko I 2003 Theoretical and
empirical analysis of ReliefF and RReliefF Mach. Learn.
53 23–69
[52] Lessmeier C et al 2016 Condition monitoring of bearing
damage in electromechanical drive systems by using motor
current signals of electric motors: a benchmark data set for
data-driven classification Proc. Eur. Conf. Prognostics and
Health Management Society pp 05–08
[53] Wen L, Gao L, Dong Y and Zhu Z 2019 A negative correlation
ensemble transfer learning method for fault diagnosis based
on convolutional neural network Math. Biosci. Eng.
16 3311–30
13
... Moreover, LightGBM employs depth-limited leaf growth, prioritizing the discovery of leaf nodes with the highest split gain over unnecessary split points. This approach contrasts with conventional level-wise decision trees, which may be less effective in certain classification tasks [41]. Consider the training sample set T of the LightGBM algorithm to be {( , ) ( , ) … ( , )} and the prediction output . ...
... Moreover, LightGBM employs depthlimited leaf growth, prioritizing the discovery of leaf nodes with the highest split gain over unnecessary split points. This approach contrasts with conventional level-wise decision trees, which may be less effective in certain classification tasks [41]. ...
Article
Full-text available
This study introduces an innovative approach to diagnostics, employing a unique combination of techniques including a stratified group K-fold cross-validation method and a sparse stacked autoencoder (SSAE) alongside LightGBM. By examining signatures derived from motor current, voltage, speed, and torque, the framework aims to effectively detect and classify broken rotor bars (BRBs) within inverter-fed induction machines. In this kind of cross-validation method, class labels and grouping factors are spread out across folds by distributing motor operational data attributes equally over target label stratification and extra grouping information. By integrating SSAE and LightGBM, a gradient-boosting framework, we elevate the precision and efficacy of defect diagnosis. The SSAE feature extraction algorithm proves to be particularly effective in identifying small BRB signatures within motor operational data. Our approach relies on comprehensive datasets collected from motor systems operating under diverse loading conditions, ranging from 0% to 100%. Using a sparse stacked autoencoder, the model lowers the dimensionality and noise of the motor fault data. It then sends the cleaned data to the LightGBM network for fault diagnosis. LightGBM leverages the attributes of the sparse stacked autoencoder to showcase the distinctive qualities associated with BRBs. This integration offers the potential to improve defect identification by furnishing input representations that are both more precise and more concise. The proposed model (SSAE with LightGBM) was trained using 80% of the data, while the remaining 20% was used for testing. To validate the proposed architecture, we evaluate the accuracy, precision, recall, and F1-scores of the results using motor global signals, with the help of confusion matrices with receiver operating characteristic (ROC) curves. 
Following the training of a new LightGBM model with refined hyperparameters through Bayesian optimization, we proceed to conduct the final classification utilizing the optimal feature subset. Evaluation of the test dataset indicates that the BRBs diagnostic framework facilitates the detection and classification of issues with induction motor BRBs, achieving accuracy rates of up to 99% across all loading conditions.
... Machine learning algorithms that presume a somewhat well-balanced distribution will have a severe problem when the number of data across distinct classes is highly skewed [10,11]. Due to its powerful ability to establish a dense representation of the feature space, which makes it effective in learning high-order features from the raw data [12,13], gradient boosting decision tree might be a good candidate to discover the highly intricate relationship between characteristics and incident causes when dealing with categorical data [14,15]. Since the challenges in ASRS above, an improved cross-validation (CV) method to optimize the classification model with light gradient boosting machine (LGBM) is adopted in this paper. ...
Article
Aviation accidents are currently one of the leading causes of significant injuries and deaths worldwide. This entices researchers to investigate aircraft safety using data analysis approaches based on an advanced machine learning algorithm. To assess aviation safety and identify the causes of incidents, a classification model with light gradient boosting machine (LGBM) based on the aviation safety reporting system (ASRS) has been developed. It is improved by k-fold cross-validation with hybrid sampling model (HSCV), which may boost classification performance and maintain data balance. The results show that employing the LGBM-HSCV model can significantly improve accuracy while alleviating data imbalance. Vertical comparison with other cross-validation (CV) methods and lateral comparison with different fold times comprise the comparative approach. Aside from the comparison, two further CV approaches based on the improved method in this study are discussed: one with a different sampling and folding order, and the other with more CV. According to the assessment indices with different methods, the LGBM-HSCV model proposed here is effective at detecting incident causes. The improved model for imbalanced data categorization proposed may serve as a point of reference for similar data processing, and the model's accurate identification of civil aviation incident causes can assist to improve civil aviation safety.
... Some works have included combining Adaboost and EMD [53,54], while in [55], researchers used the DT classifier followed by Adaboost to compare it with SVM to achieve 96% and 92% maximum testing accuracy, respectively. One more classifier is the Light Gradient-Boosting Machine (LightGBM), which was implemented in bearing fault detection in [56][57][58] and for the same contribution but supported with CNN in [59][60][61][62]. ...
Article
Full-text available
This research introduces a groundbreaking method for bearing defect detection. It leverages ensemble machine learning (ML) models and conducts comprehensive feature importance analysis. The key innovation is the training and benchmarking of three tree ensemble models—Decision Tree (DT), Random Forest (RF), and Extreme Gradient Boosting (XGBoost)—on an extensive experimental dataset (QU-DMBF) collected from bearing tests with seeded defects of varying sizes on the inner and outer raceways under different operating conditions. The dataset was meticulously prepared with categorical variable encoding and Min–Max data normalization to ensure consistent class distribution and model accuracy. Implementing the ML models involved a grid search method for hyperparameter tuning, focusing on reporting the models’ accuracy. The study also explores applying ensemble methods and using supervised and unsupervised learning algorithms for bearing fault detection. It underscores the value of feature importance analysis in understanding the contributions of specific inputs to the model’s performance. The research compares the ML models to traditional methods and discusses their potential for advanced fault diagnosis in bearing systems. The XGBoost model, trained on data from actual bearing tests, outperformed the others, achieving 92% accuracy in detecting bearing health and fault location. However, a deeper analysis of feature importance reveals that the models weigh certain experimental conditions differently—such as sensor location and motor speed. This research’s primary novelties and contributions are comparative evaluation, experimental validation, accuracy benchmarking, and interpretable feature importance analysis. This comprehensive methodology advances the bearing health monitoring field and has significant practical implications for condition-based maintenance, potentially leading to substantial cost savings and improved operational efficiency.
... Rotating machinery has been widely employed in various fields, such as aerospace, wind turbine, and high-speed rail transportation [1,2]. Equipment in these fields typically operates under harsh conditions characterized by high speed, heavy load, and changing operating parameters [3], which may cause abnormal situations or even catastrophic failures [4]. Hence, it is crucial to timely detect the potential anomalies within rotating machinery and provide an early warning system [5]. ...
Article
Full-text available
Health condition assessment of rotating machinery has been a persistent challenge. Traditional condition assessment methods often rely on single features, limiting their application to comprehensively measure the health condition of rotating machinery. This study introduced a quantitative condition assessment method for rotating machinery using fuzzy neural network (FNN). Initially, multi-domain features of signals from rotating machinery are extracted to achieve comprehensive representation of signals in the feature space. To eliminate redundant information of various features, a feature dimensionality reduction method is explored based on variance variation and stacked auto-encoder. Afterward, a normalized health indicator is constructed by integrating the optimized features through FNN, and it can indicate the current conditions of rotating machinery. Furthermore, an early anomaly alarm strategy based on 3σ criterion is designed for rotating machinery. The abnormal signal will be recognized automatically when it exceeds the predetermined thresholds. Last, the effectiveness of the proposed method is validated on IMS bearing dataset and XJTU-SY bearing dataset. The results show that the proposed method can effectively obtain the quantitative indicators that reflect the operation conditions of rotating machinery and can accurately detect the early abnormal signals.
... Through a designed bandpass filter, periodic signal information can be provided by the frequency domain, which is also able to extract background noise and fault signals and improve the signalto-noise ratio [9,10]. Generally, the features that can be extracted from each frequency segment include power spectral density (PSD), standard deviation, peak value, and other statistical features. ...
Article
Full-text available
Stirred reactors are key equipment in production, and unpredictable failures will result in significant economic losses and safety issues. Therefore, it is necessary to monitor its health state. To achieve this goal, in this study, five states of the stirred reactor were firstly preset: normal, shaft bending, blade eccentricity, bearing wear, and bolt looseness. Vibration signals along x, y and z axes were collected and analyzed in both the time domain and frequency domain. Secondly, 93 statistical features were extracted and evaluated by ReliefF, Maximal Information Coefficient (MIC) and XGBoost. The above evaluation results were then fused by D-S evidence theory to extract the final 16 features that are most relevant to the state of the stirred reactor. Finally, the CatBoost algorithm was introduced to establish the stirred reactor health monitoring model. The validation results showed that the model achieves 100% accuracy in detecting the fault/normal state of the stirred reactor and 98% accuracy in diagnosing the type of fault.
... Xie et al used the eXtreme Gradient Boosting (XGBoost) algorithm to establish an intelligent bearing health condition evaluation, which realised the high accuracy of bearing health evaluation [32]. When juxtaposed with single classifiers, ensemble learning models, such as the Light Gradient Boosting Machine (LightGBM), exhibit the distinct advantages of heightened prediction precision, formidable generalisation capabilities and rapid convergence [33]. Liu et al established a LightGBM-based fault diagnosis model for rotating machinery. ...
Article
Full-text available
The accurate health condition evaluation of the functional components in computer numerical control (CNC) machine tools is an important prerequisite for predictive maintenance and fault warning. The vibration signals of the functional components in CNC machine tools often contain substantial noise, impeding the extraction of relevant health condition information from the vibration signals. This work presents an approach that leverages the variational mode decomposition (VMD) enhanced by the Artificial Hummingbird Algorithm (AHA) alongside the Light Gradient Boosting Machine (LightGBM) optimised through particle swarm optimisation (PSO) to evaluate the health condition of the functional components in CNC machine tools amidst pervasive noise. Initially, the AHA optimised the penalty factor (α) and the decomposition layer (K) within the VMD. This optimised VMD was subsequently applied to denoise the original vibration signals. After this denoising process, PSO was employed to optimise the learning rate and maximum tree depth within LightGBM. Health condition evaluation experiments were executed on the feed system and spindle of the CNC machine tool to validate the proposed methodology. Comparative analysis indicates that the proposed method attains paramount accuracy and computational efficiency, which are crucial for accurately evaluating the health condition of the functional components in CNC machine tools.
... Frequency domain analysis can provide periodic signal information, extract background noise and fault signals, and improve the signal-to-noise ratio through a designed band-pass filter [7] [8]. Generally, the features that can be extracted from each frequency segment include power spectral density (PSD), standard deviation, peak value and other statistical features. ...
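The per-band feature extraction described in this excerpt can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the function name `band_features`, the brick-wall FFT band-pass, and the choice of band power as a stand-in for integrated PSD are all assumptions made for the example.

```python
import numpy as np

def band_features(x, fs, bands):
    """Return one row of (band power, std, peak) per (low, high) band in Hz.

    Hypothetical helper: an ideal FFT band-pass is used here for brevity;
    a designed filter (e.g. Butterworth) would normally replace it.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    rows = []
    for low, high in bands:
        mask = (freqs >= low) & (freqs < high)
        # Zero every frequency bin outside the band, then go back to time domain.
        filtered = np.fft.irfft(np.where(mask, spectrum, 0.0), n=n)
        # Mean square of the band-limited signal equals the PSD integrated over the band.
        rows.append((np.mean(filtered ** 2), filtered.std(), np.abs(filtered).max()))
    return np.array(rows)

# Example: two tones, one per band.
fs = 1000
t = np.arange(1000) / fs
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
features = band_features(signal, fs, [(0, 100), (100, 300)])
```

Each row of `features` can then be concatenated into a feature vector for a downstream classifier.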
Preprint
Stirred reactors are key equipment in the production process, and unpredictable failures can cause large economic losses and safety issues. Therefore, it is necessary to monitor their health state. To this end, this study first presets five states of the stirred reactor: normal, shaft bending, blade eccentricity, bearing wear, and bolt looseness. Vibration signals along the x, y and z axes are collected and analyzed in the time and frequency domains. Secondly, 93 statistical features are extracted and evaluated by ReliefF, MIC and XGBoost. The evaluation results are then fused by D-S evidence theory to obtain the final 16 features that are most relevant to the state of the stirred reactor. Finally, the CatBoost algorithm is introduced to establish the health state monitoring model of the stirred reactor. The validation results show that the accuracy of the proposed model is 100% for state recognition and 98% for fault diagnosis.
... Fuzzy entropy was used to reflect a change of complexity of the intrinsic oscillation [16]. Zhang et al proposed a multi-method fusion fault detection approach for rotating machinery [17]. In this work, fuzzy entropy was used to calculate and extract fault information from processed signals. ...
Article
Complexity measures, typically represented by entropy, are capable of detecting and characterizing underlying dynamic changes in a system, and they have been studied extensively for machine condition monitoring and fault diagnosis. Various entropies have been developed based on Shannon entropy to meet practical demands. Nevertheless, existing research on complexity measures mainly focuses on experimental studies, and their theoretical properties are not yet fully explored. In previous studies, it was theoretically and experimentally proved that two complexity measures, correlation dimension and approximate entropy, have a “bilateral reduction” effect. Since sample entropy and fuzzy entropy are two more advanced complexity measures that were developed based on the concepts of correlation dimension and approximate entropy, this paper continues the theoretical and experimental investigation with sample entropy and fuzzy entropy, exploring their theoretical properties to enrich the domain of complexity measure analysis and its applications to machine condition monitoring. Specifically, this paper theoretically proves and verifies that sample entropy and fuzzy entropy have a “bilateral reduction” effect similar to that of correlation dimension and approximate entropy, and that they are indeed complexity measures. The relationships between sample entropy, fuzzy entropy, and their key calculation parameters are studied numerically and experimentally. Bearing and gear run-to-failure datasets are used to investigate the effectiveness of sample entropy and fuzzy entropy for bearing and gear condition monitoring, and the experimental results match the theoretical “bilateral reduction” effect well. Overall, this paper provides a guideline for the correct use of sample entropy and fuzzy entropy in engineering applications, especially machine condition monitoring.
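For readers unfamiliar with fuzzy entropy, a minimal sketch of one common formulation follows. The parameter names (`m`, `r`, `n`), their defaults, and the membership function `exp(-(d / tol) ** n)` are assumptions for illustration; published variants differ in how the tolerance enters the exponent, and this is not the code used in the abstract above.

```python
import numpy as np

def fuzzy_entropy(x, m=2, r=0.2, n=2):
    """Fuzzy entropy of a 1-D signal (illustrative variant).

    m: embedding dimension, r: tolerance as a fraction of the signal's
    standard deviation, n: exponent of the fuzzy membership function.
    """
    x = np.asarray(x, dtype=float)
    tol = r * x.std()
    N = x.size

    def phi(dim):
        # Use the same number of templates (N - m) for both dimensions.
        idx = np.arange(N - m)[:, None] + np.arange(dim)[None, :]
        vecs = x[idx]
        vecs = vecs - vecs.mean(axis=1, keepdims=True)  # remove each template's baseline
        # Chebyshev distance between every pair of templates.
        d = np.abs(vecs[:, None, :] - vecs[None, :, :]).max(axis=2)
        sim = np.exp(-(d / tol) ** n)  # fuzzy similarity degree in (0, 1]
        np.fill_diagonal(sim, 0.0)     # exclude self-matches
        return sim.sum() / ((N - m) * (N - m - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))

# Example: a regular signal versus white noise.
rng = np.random.default_rng(0)
t = np.arange(500)
fe_sine = fuzzy_entropy(np.sin(2 * np.pi * t / 50))
fe_noise = fuzzy_entropy(rng.standard_normal(500))
```

A periodic signal is expected to score lower than white noise, which is the property that makes the measure useful as a complexity indicator. The pairwise distance matrix makes memory grow quadratically with signal length, so long records would need to be processed in segments.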
Article
Under the strong vibration conditions of an inner curve radial plunger hydraulic motor, the bolt connection of the motor base loosens easily, and early loosening faults are difficult to identify online in time from vibration signals. In this article, we propose a bolt loosening fault diagnosis method that uses LightGBM to recognize sound signal features, enabling online monitoring of bolt loosening faults. On the vibration energy recovery test platform, sound signals were collected during normal operation of the equipment at four different bolt preload levels, from 0 $\text{N}\cdot \text{m}$ (completely loosened) to 60 $\text{N}\cdot \text{m}$ in increments of 20 $\text{N}\cdot \text{m}$, and the sound signals were denoised using the wavelet threshold denoising method. The LightGBM bolt loosening fault diagnosis model is constructed based on the gradient-based one-side sampling and exclusive feature bundling algorithms. By extracting the time- and frequency-domain features of the denoised sound signals, a dataset containing labels of normal and three faulty signals is generated for training and diagnosis. Finally, the diagnostic accuracy of this method is compared and verified. The results show that the LightGBM algorithm after wavelet threshold denoising improves the diagnostic accuracy by 2.17% over LightGBM without denoising, and by 5.47% and 2.21% over the XGBoost algorithm before and after denoising, respectively.
Article
The failure of rotating machinery affects product quality and the entire production process. However, fault diagnosis models usually suffer from the deficiency that their hyperparameters require constant debugging. This paper proposes a deep condition feature learning approach for rotating machinery based on modified multi-scale symbolic dynamic entropy (MMSDE) and optimized stacked auto-encoders (SAEs). Firstly, MMSDE is used to extract fault characteristics from the original vibration signal, because this method does not rely on prior knowledge and experience. MMSDE conducts multi-scale analysis on the original vibration signal and calculates the entropy value of each multi-scale signal to obtain the multi-scale fault characteristics. Then, Bayesian optimization-based SAEs are applied to select feature samples and classify the fault status in mechanical fault diagnosis without manual debugging. The effectiveness of the proposed method is verified on open-source data and experimental data. Multiple working conditions are also considered and investigated.
Article
Condition monitoring for high-voltage circuit breakers (HVCBs) is of great significance for the safety of power grids. Based on machine-learning methods, most relevant studies have contributed significantly to improving the classification accuracy of known states. However, these studies have neglected the detection of unknown faults. In this study, a new one-class classifier, called a density-weighted one-class extreme learning machine (DW-OCELM), was proposed to detect unknown faults of HVCBs. The DW-OCELM determines the classification boundary considering data distribution by introducing the notion of density weight, such that samples located in low-density regions are more likely to be separated, improving detection performance. On this basis, a multi-class classifier was developed based on the homogeneous combination of multiple DW-OCELMs to classify known states. In addition, the proposed classifiers were trained based on multi-segment permutation entropy calculated from vibration signals. Experiments on a 35 kV HVCB demonstrated that the proposed methods outperformed other state-of-the-art techniques.
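The entropy features mentioned in this abstract can be illustrated with a short sketch of normalized permutation entropy computed per segment. The function names, the defaults (`order=3`, `delay=1`), and the equal-length segmentation are assumptions for the example; the abstract does not specify how the segments are chosen.

```python
import math
import numpy as np

def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy in [0, 1] of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    n_vec = x.size - (order - 1) * delay
    idx = np.arange(n_vec)[:, None] + delay * np.arange(order)[None, :]
    # The ordinal pattern of each embedded vector is its argsort.
    patterns = np.argsort(x[idx], axis=1, kind="stable")
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / math.log(math.factorial(order)))

def multi_segment_pe(x, n_segments=4, order=3, delay=1):
    """Permutation entropy of each of n_segments roughly equal segments (assumed split)."""
    segments = np.array_split(np.asarray(x, dtype=float), n_segments)
    return np.array([permutation_entropy(s, order, delay) for s in segments])

# Example: a feature vector with one entropy value per segment.
rng = np.random.default_rng(1)
feature_vector = multi_segment_pe(rng.standard_normal(2000), n_segments=4)
```

Splitting the record lets the feature vector reflect how signal complexity changes over the course of an operation rather than averaging it away.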
Article
The health condition of rolling bearings has a significant impact on the safety and efficiency of rotating machinery. Accordingly, to diagnose faults in rolling bearings effectively and accurately, a novel hybrid approach is proposed in this paper that couples variational mode decomposition (VMD), composite multiscale fine-sorted dispersion entropy (CMFSDE) and a support vector machine (SVM) optimized by the mutation sine cosine algorithm and Harris hawks optimization (MSCAHHO). Firstly, VMD is employed to decompose raw vibration signals with various fault types into different sets of intrinsic mode functions (IMFs) to weaken the non-stationarity of the signals; the parameter K of VMD is determined beforehand through the central frequency observation method. Subsequently, CMFSDE is put forward to analyze the complexity of fault signals by fully considering the relationship between neighboring elements based on the composite multiscale technique, and the representative features of different fault samples are extracted to construct feature vectors. Later, an enhanced hybrid optimization approach called MSCAHHO is proposed by integrating the sine cosine algorithm (SCA) and a periodic mutation strategy to improve Harris hawks optimization (HHO). Then, MSCAHHO is employed to optimize the parameters of the SVM, after which the optimal SVM model is utilized for fault classification. Finally, the performance of the proposed methodology is evaluated with four validity indices through comparative experiments. The experimental results reveal that the proposed VMD-CMFSDE-MSCAHHO-SVM method achieves favorable diagnosis results compared with other relevant methods.
Article
This paper addresses the use of two algorithms based on symbolic dynamics analysis of vibration signal for fault diagnosis in gearboxes. The symbolic dynamics algorithm (SDA) works by subdividing the phase space described by the Poincaré plot into several angular regions; then, a symbol is assigned to each region. The probability distributions generated by the set of symbols are considered as features for classification of faults in a gearbox. The peak symbolic dynamics algorithm (PSDA) is a variant that extracts a sequence of peaks from the vibration signals and then performs the phase-space subdivision and symbol coding. A gearbox vibration signal dataset is analyzed for classifying 10 types of faults. Fault classification is attained using a multi-class support vector machine. The highest accuracy attained using k-fold cross-validation is 100.0% for load L3 with SDA and 100% with load L2 with PSDA. The accuracy considering all signals in the gearbox dataset is 99.2% with SDA and 99.8% with PSDA. The algorithms proposed have the advantage of being simple, accurate, and fast, and they could be adapted for online condition monitoring.
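The angular subdivision of the Poincaré plot described above can be sketched roughly as follows. This is an interpretation of the abstract rather than the authors' algorithm: the function name `sda_features`, the mean removal, the lag of one sample, and the eight equal angular regions are all assumptions.

```python
import numpy as np

def sda_features(x, n_regions=8, lag=1):
    """Symbol probability distribution from angular regions of the Poincaré plot.

    Each point (x[k], x[k + lag]) is coded by the angular sector it falls in;
    the normalized histogram of symbols is returned as the feature vector.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                           # centre the plot on the origin (assumed)
    angles = np.arctan2(x[lag:], x[:-lag])     # angle of each point, in [-pi, pi]
    symbols = np.floor((angles + np.pi) / (2 * np.pi) * n_regions).astype(int)
    symbols = np.minimum(symbols, n_regions - 1)  # angle == pi goes to the last region
    counts = np.bincount(symbols, minlength=n_regions)
    return counts / counts.sum()

# Example: an 8-dimensional feature vector for one vibration record.
rng = np.random.default_rng(2)
features = sda_features(rng.standard_normal(1000), n_regions=8)
```

The resulting probability vector has a fixed length regardless of record length, which makes it convenient as input to a multi-class classifier such as the support vector machine mentioned in the abstract.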
Article
Early fault detection and diagnosis play an important role in reducing maintenance cost and ensuring the reliability of rolling element bearings (REBs). Singular value decomposition (SVD) is considered a promising method to this end, but it does not consider the inter-correlation between the resulting singular values, which leads to the loss of weak fault information hidden in specific components. This paper, motivated by recent advances in graph modeling of highly noisy vibration signals, presents a novel method, called graph-modeled singular values (GMSVs), that integrates graph theory and SVD for the inspection of dynamic REB health conditions. The method uses the singular values as inputs to construct the graph; as such, it achieves a balance between sensitivity to early faults and robustness to noise, while also offering a more powerful fault discrimination ability. Building on the merits of GMSVs, a common null hypothesis test is performed to inspect whether a fault occurs during successive REB operations, and the KNN classifier is used to identify the fault type. Experiments are conducted on two publicly available data sets: the XJTU-SY data set and the CWRU data set. Comprehensive experimental results, along with comparisons against state-of-the-art methods, demonstrate the superiority and great potential of the method in real applications.
Article
The total variation on graph (TVG) is a powerful vertex domain index for measuring the smoothness of graph signals, but its performance is closely related to the underlying graph. Since the horizontal visibility graph reflects the dynamic characteristics of bearing vibration signals better than the path graph, the horizontal visibility graph is chosen as the underlying graph of TVG. The vertex domain index TVG defined on the horizontal visibility graph is referred to simply as TVHVG in this paper. To better distinguish the different states of rolling bearings, the bearing vibration signal is converted into a graph signal indexed by its horizontal visibility graph, and the vertex domain index TVHVG is extracted as the single fault feature. Based on TVHVG feature extraction and Mahalanobis distance classification, a novel fault diagnosis method for rolling bearings is proposed. The proposed method is applied to analyze two sets of experimental data containing normal and faulty rolling bearings. The results indicate that the proposed method can effectively diagnose bearing faults of different types and degrees, and that the vertex domain index TVHVG is superior to some classical time domain indexes in distinguishing the different states of rolling bearings.
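A rough sketch of the idea follows: build the horizontal visibility graph of a signal, then sum the squared differences of the signal across its edges. The function names are hypothetical, and the unnormalized Laplacian quadratic form used for total variation is an assumption; the paper may define or normalize the index differently.

```python
import numpy as np

def horizontal_visibility_edges(x):
    """Edges (i, j), i < j, of the horizontal visibility graph of x.

    Samples i and j are linked when every sample strictly between them
    is lower than both x[i] and x[j].
    """
    x = np.asarray(x, dtype=float)
    edges = []
    for i in range(x.size - 1):
        highest = -np.inf  # tallest sample seen strictly between i and j
        for j in range(i + 1, x.size):
            if highest < min(x[i], x[j]):
                edges.append((i, j))
            highest = max(highest, x[j])
            if highest >= x[i]:
                break  # no sample beyond j can see sample i
    return edges

def total_variation_hvg(x):
    """Sum of squared signal differences over the HVG edges (assumed definition)."""
    x = np.asarray(x, dtype=float)
    return float(sum((x[i] - x[j]) ** 2 for i, j in horizontal_visibility_edges(x)))

# Example on a four-sample signal.
tv = total_variation_hvg([3.0, 1.0, 2.0, 4.0])
```

For the four-sample example the graph has edges (0,1), (0,2), (0,3), (1,2) and (2,3), so the index is 4 + 1 + 1 + 1 + 4 = 11. The single scalar per record could then be fed to a distance-based classifier.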
Article
Fault diagnosis methods based on dimensionless indicators have long been studied for rotating machinery. However, traditional dimensionless indicators frequently suffer a low accuracy of fault diagnosis for nonlinear and non-stationary dynamic signals of rotating machinery. In this paper, we propose an effective fault diagnosis method based on multi-scale dimensionless indicator (MSDI) and random forests. In the proposed method, the real-time vibration signals are first processed by the variational mode decomposition and then six types of MSDI are constructed based on the decomposed signals. Through utilizing the Fisher criterion, several top ranked MSDIs are selected as fault features. Based on the selected MSDIs, the random forests model is applied to determine fault types. To verify the superiority of the proposed method, several experiments on fault diagnosis are conducted on a centrifugal multi-level impeller blower. The results demonstrate that the proposed method can successfully identify different fault types and the average accuracy can reach 95.58%. In contrast with traditional dimensionless indicators based methods, the proposed method can improve the fault diagnosis accuracy by 7.25% and outperforms other techniques such as back propagation neural network, support vector machine and extreme learning machine. These results indicate that the MSDI can effectively solve the deficiency of the traditional dimensionless indicator, and has stronger distinguishing ability for the fault types.
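The Fisher-criterion ranking step mentioned in this abstract can be sketched as below. The function name `fisher_scores` and the particular form of the score (between-class scatter divided by within-class scatter, computed per feature) are assumptions; the abstract does not give the exact formula used.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher score: between-class over within-class scatter.

    X: (n_samples, n_features) feature matrix, y: class label per sample.
    A higher score means the feature separates the classes better.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        between += Xc.shape[0] * (Xc.mean(axis=0) - overall_mean) ** 2
        within += Xc.shape[0] * Xc.var(axis=0)
    return between / within

# Example: feature 0 separates the two classes, feature 1 does not.
X = np.array([[0.0, 0.0], [0.1, 1.0], [1.0, 0.0], [1.1, 1.0]])
y = np.array([0, 0, 1, 1])
scores = fisher_scores(X, y)
ranking = np.argsort(scores)[::-1]  # indices of features, best first
```

Keeping only the top-ranked columns of `X` gives the reduced feature set that would be passed to the classifier. A feature that is constant within every class has zero within-class scatter and would need special handling.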
Article
Although diagnosis methods for rotating machinery based on convolutional neural networks (CNNs) have achieved great success, they generally assume that the numbers of normal and fault samples are the same. However, it is difficult to obtain adequate fault samples, and CNNs cannot handle imbalanced fault diagnosis well. The nonlinear auto-regressive neural network (NARNN) has strong prediction ability and can expand a small number of fault samples. Thus, a novel fault diagnosis approach combining CNN with NARNN is proposed. First, NARNN is applied to expand the small number of fault samples so that the sample sizes of different health conditions are equal. Subsequently, the continuous wavelet transform is employed to convert the 1-dimensional vibration signals into 2-dimensional time-frequency images. Finally, a CNN is established to automatically learn the characteristics and achieve fault identification. Comparative experiments on two datasets with different imbalance levels validate the superiority of the proposed method.
Article
Fault feature extraction is crucial for detecting failures as early as possible in fault diagnosis of rotating machinery. Due to the influence of environmental noise and interference, the signal-to-noise ratio (SNR) of the fault feature is relatively low in the measured signal. Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is an improved method based on EEMD, which has been extensively applied to signal de-noising. The key problem for CEEMDAN is to determine the fault-related degree of a decomposed intrinsic mode function (IMF), especially in the presence of both Gaussian and non-Gaussian noise or interference. However, most traditional assessment criteria are developed to describe the statistical parameters of IMFs, e.g. correlation coefficient and kurtosis, which ignore the specific characteristics of the fault and are easily affected by noise components. Therefore, a new criterion is proposed to quantify the fault-related degree of a vibration signal, defined as the ratio of the periodic modulation components caused by the fault to the generalized interference. Then, a reweighting and reconstruction strategy for the decomposed IMFs is presented to obtain the de-noised signal based on the new criterion. Furthermore, in order to detect the fault-related modulation features at multiple frequency scales, a time-frequency representation (TFR) based demodulation analysis is employed, which enables accurate extraction of the fault feature at the early stage of a fault. The effectiveness of the proposed fault diagnosis method compared with traditional methods is demonstrated by both numerical simulation and experimental studies. The results show that the proposed method achieves better performance in terms of SNR improvement and fault feature detection, and that it can successfully detect the fault features in the presence of Gaussian and non-Gaussian noise.