An interpretable data augmentation scheme for machine fault diagnosis based on a sparsity-constrained generative adversarial network

Liang Ma1,2,3, # (buaaml@buaa.edu.cn),
Yu Ding2,4, *, # (dingyu@buaa.edu.cn),
Zili Wang1,2,3 (wzl@buaa.edu.cn),
Chao Wang1,2,3 (wangchaowork@buaa.edu.cn),
Jian Ma1,2,3 (09977@buaa.edu.cn),
Chen Lu1,2,3 (luchen@buaa.edu.cn)
1 Institute of Reliability Engineering, Beihang University, Beijing, 100191, China.
2 Science and Technology on Reliability and Environmental Engineering Laboratory, Beijing, 100191, China.
3 School of Reliability and Systems Engineering, Beihang University, Beijing, 100191, China.
4 School of Aeronautic Science and Engineering, Beihang University, Beijing, 100191, China.
* Corresponding Author: Yu Ding (dingyu@buaa.edu.cn).
# These authors contributed equally to this work and should be considered co-first authors.
Highlights
A novel SC-GAN data augmentation scheme for stable raw vibration signal generation.
An interpretation method to open the black box of the proposed neural network model.
The model can learn frequency domain patterns from time domain vibration signals.
Data augmentation improves the performance of the machine fault diagnosis model.
Abstract
Vibration signal-based methods have been widely utilized in machine fault diagnosis. Usually, a lack of
sufficient training data can prevent these methods from achieving satisfactory performance. The generative
adversarial network (GAN) is a feasible solution to this problem. However, existing GAN-based methods
struggle to stably generate raw vibration signals. To achieve vibration signal generation, a novel
sparsity-constrained GAN (SC-GAN) method containing a two-stage training process is developed, which can perform
data augmentation for machine fault diagnosis with a simple structure. Autoencoder (AE)-based pretraining
and sparsity regularization constraints are implemented in the proposed method. Furthermore, to understand
the internal mechanisms of vibration signal generation, we propose a method for analyzing the network’s
weight matrix to interpret the generation mechanism. In a case study on rolling element bearings, the SC-GAN
is verified to be able to generate raw vibration signals under 10 different health conditions with a more stable
training process than other models. In a fault diagnosis task, the data augmentation by SC-GAN significantly
improves the diagnostic accuracy by 7.44%. An analysis of the well-trained SC-GAN shows that the model
captures key frequency components, which provides a credible interpretation for the generation mechanism.
Another case study on the gearbox illustrates the good generalization ability of SC-GAN to other machines
and more complicated signals.
Keywords: Generative adversarial networks, Data augmentation, Mechanism interpretation, Machine fault
diagnosis, Raw vibration signal
1 Introduction
Effective fault diagnosis techniques ensure the reliability and safety of industrial machines by avoiding
losses of life and property caused by failure. Vibration signals contain important information about the
operating status of machines. Therefore, vibration signal-based methods are widely applied in the fault
diagnosis of industrial machines (Chen et al., 2016). Signal processing methods perform well in understanding
the key information in vibration signals, which is effective for identifying the differences among fault modes.
These signal processing methods include signal decomposition methods (represented by empirical mode
decomposition (EMD) (Lei et al., 2011), variational mode decomposition (VMD) (Dibaj et al., 2020), wavelet
packet transform (WPT) (Wu and Liu, 2009), etc.) and feature extraction methods (represented by time-domain
features (Rauber et al., 2015), frequency-domain features (Z. Chen et al., 2019), etc.). Moreover, with
the rapid development of deep learning (DL), benefitting from the ability of DL-based methods to process
high-dimensional nonlinear signals without relying on prior expert knowledge, the use of DL-based methods
to process vibration signals and build fault diagnosis models is currently a popular research topic.
However, DL-based intelligent methods rely heavily on sufficient, balanced and labeled data, which are
difficult to obtain in actual industrial scenarios (He et al., 2020). Notably, machines usually work under healthy
conditions, and faulty conditions are infrequent. In addition, from an economic perspective, it is costly to
obtain complete, high-quality fault data for industrial machines. Therefore, data augmentation is a direct and
effective way to solve fault diagnosis problems under the circumstance of a lack of sufficient data. The
synthetic minority oversampling technique (SMOTE) (Chawla et al., 2002) and its variants (Bunkhumpornpat
et al., 2009; Zhang et al., 2018) are very popular algorithms in data augmentation. These methods utilize real
samples to synthesize new samples by linear interpolation, and their mechanism remains at the instance level
instead of capturing the intrinsic distribution of the data.
A generative adversarial network (GAN) is a framework that can learn a mapping between a prior noise
distribution and the real data distribution via an adversarial learning mechanism and then generate synthetic
samples that are similar to the real data. The best-known application of GANs is fake image
generation (Goodfellow et al., 2014; Radford et al., 2016). In addition, GANs have been applied to other
fields. Chen et al. (J. Chen et al., 2019) proposed an SAE-GAN method that trains the GAN to generate the
features extracted by the SAE to detect credit card fraud. Recently, GANs have also been investigated and
applied in data augmentation for machine fault diagnosis. Most of these studies focused on generating the
frequency spectra of vibration signals obtained by a fast Fourier transform (FFT). Wang et al. (Wang et al.,
2018) proposed a scheme that combined a GAN and stacked denoising autoencoders (SDAEs) to generate the
spectra of planetary gearbox vibration signals. The SDAE-GAN method performed well in anti-noise ability
and fault diagnosis under the condition of small samples. Guo et al. (Guo et al., 2020) developed a multilabel
1D generative adversarial network (ML1D-GAN) to generate realistic FFT spectra. The ML1D-GAN was
used to extend the training set, which improved the diagnostic accuracy of rolling bearings from 95% to 98%.
Gao et al. (Gao et al., 2020) utilized a GAN to enlarge a dataset consisting of finite element method-based
simulation signals and measured signals. Li et al. (Li et al., 2019) proposed a cross-domain fault diagnosis
method with data augmentation using a GAN. Studies by Wang et al. (Wang et al., 2019), Mao et al. (Mao et
al., 2019), Ding et al. (Ding et al., 2019) and Zheng et al. (Zheng et al., 2020) also involved the generation of
the spectra of vibration signals, which were then utilized to improve the performance of machine fault
diagnosis models under the conditions of insufficient or unbalanced data. In addition to using GANs to
generate vibration signal spectra, existing studies have developed methods for generating the vibration signal
features obtained by signal processing methods. Liu et al. (Liu et al., 2018) applied categorical adversarial
autoencoders (CatAAEs) to generate 10 handcrafted features extracted from the vibration signal. Cabrera et
al. (Cabrera et al., 2019) selected a WPT-based feature extraction result as the generation target for the GAN.
In the study of Zhou et al. (Zhou et al., 2020), a GAN was designed to generate fault features extracted via an
autoencoder (AE) instead of directly generating real fault data samples.
To a certain extent, the information in the original signals will inevitably be lost in the process of
decomposing signals using the aforementioned methods. Thus, directly generating the raw vibration signals
is a data augmentation method with no information loss. However, due to the high nonlinearity and noise
disturbances in signals collected from industrial scenarios, the generation procedure is undeniably difficult to
successfully implement, as stated by Mao et al. (Mao et al., 2019). Based on a literature survey, there have
been only two successful attempts involving the generation of raw vibration data. Shao et al. (Shao et al., 2019)
proposed an auxiliary classifier GAN (ACGAN)-based framework to generate realistic one-dimensional raw
data. To improve the stability of the adversarial training process, multiple 1D convolution layers were stacked
to form the generator and discriminator, and auxiliary labels were fed into the model to provide additional
information. Zhang et al. (Zhang et al., 2020) developed a complex GAN structure for the generation of raw
vibration signals under one specific health condition. The generator consisted of 6 layers, and the discriminator
consisted of 5 layers, including convolutional layers, flattened layers and fully connected layers. However, the
authors mentioned that further research on the optimization of the network structure is needed.
From the literature review, it can be concluded that the following knowledge gaps and key challenges are
still not solved by existing GAN-based data augmentation methods for machine fault diagnosis:
(1) The stable generation of raw vibration signals for lossless data augmentation is difficult. Although a
few methods have been successfully applied, they rely on complex network structures and some training tricks,
which make the GAN hard to train stably.
(2) The mechanism of realistic data generation using GANs needs to be further studied and explained.
Like other DL-based methods, a GAN is usually treated as a black box. Research on how the GAN works can
help us to better understand it and utilize it with more confidence, which is also mentioned by Pan et al. (Pan
et al., 2020).
To address these issues, the major efforts and main contributions of this paper are listed as follows:
(1) To achieve the stable generation of raw vibration signals for machine fault diagnosis, a novel
sparsity-constrained GAN (SC-GAN) model, which includes a two-stage training process, is proposed. Inspired by the
findings of Lei et al. (Lei et al., 2016) that the sparse filtering neural network model shares a physical
interpretation, a sparsity constraint is imposed on the model during the processes of AE-based pretraining and
GAN-based adversarial training, by which the model can learn useful and explainable representations of the
signal. With a simple fully connected structure with one hidden layer, the generator and discriminator can
stably converge and then realize the generation of raw time-domain vibration signals.
(2) To explain how the SC-GAN can stably generate raw vibration signals, a method for interpreting the
vibration signal generation mechanism is proposed from the aspect of the feed forward mechanism of neural
networks. The column vectors of the weight matrix of the hidden sparse layer are separated and regarded as
the learned representations by each neuron. The output of the generator is expressed as a linear combination
of several activated units. According to the results of two case studies, by analyzing the well-trained model
based on the proposed generating mechanism interpreting method, the learned representations of the SC-GAN
contain the key frequency components of the vibration signal, even if directly using the time-domain signals
during the training process. These results provide a clear and direct explanation of the proposed data
augmentation scheme.
The remainder of the paper is organized as follows: A brief introduction to the GAN and sparse
autoencoders (SAEs) is given in Section 2. The proposed interpretable data augmentation scheme for machine
fault diagnosis is described in Section 3, including the model structure and two-stage training process of the
SC-GAN, as well as the method of mechanism interpretation. Experimental validations are given in Sections
4 and 5, followed by the discussion in Section 6 and conclusions in Section 7.
2 Generative adversarial networks and sparse autoencoders
2.1 Generative adversarial networks
Figure 1 Architecture of a GAN.
As shown in Figure 1, a GAN consists of two competing parts, namely, the generator (G) and discriminator (D).
Both G and D are neural networks. G receives random vectors sampled from a certain prior noise distribution,
transforms them into a distribution as similar to the real data as possible, and attempts to confuse D into giving
the wrong distinguishing results. In contrast, D tries to determine whether the input sample is a synthesized
sample or a sample obtained from the real data. Mathematically, given the noise vector $z$, the generator
outputs the synthesized sample $G(z)$. The discriminator receives the generated sample $G(z)$ and real
sample $x$ and outputs the probability values $D(G(z))$ and $D(x)$ that reflect whether the input sample is
derived from the real data. G tries to learn the real data distribution and outputs samples that confuse D into predicting
$D(G(z))=1$, while D tries to predict $D(G(z))=0$ and $D(x)=1$.
Consequently, the loss function of G is given in Eq. (1).
$$L_G = \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{1}$$
The loss function of D is given in Eq. (2).
$$L_D = -\mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] - \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{2}$$
The generator and discriminator play a two-player zero-sum game. In the training process, the parameters
of G and D are alternately updated by adversarial training, as shown in Eq. (3).
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{3}$$
At the end of the adversarial training process, G and D reach a Nash equilibrium, where G can generate
realistic samples and D tends to predict equal probabilities for real and generated samples.
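To make the loss definitions concrete, the following minimal NumPy sketch evaluates the discriminator and generator losses from batches of discriminator outputs. It assumes the standard cross-entropy form of the GAN objective; the function names `discriminator_loss` and `generator_loss` and the small constant `eps` are illustrative choices and do not come from the paper.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Cross-entropy loss of D: push D(x) toward 1 and D(G(z)) toward 0."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-12):
    """Minimax loss of G: minimising log(1 - D(G(z))) pushes D(G(z)) toward 1."""
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(1.0 - d_fake + eps))

# At the equilibrium described above, D outputs 0.5 for every sample.
d_half = np.full(8, 0.5)
loss_d_eq = discriminator_loss(d_half, d_half)   # close to log(4)
loss_g_eq = generator_loss(d_half)               # close to -log(2)

# A confident, correct discriminator has a lower loss than at equilibrium.
loss_d_confident = discriminator_loss([0.9], [0.1])
```

Under this form, a discriminator that outputs 0.5 everywhere has a loss of about log 4, which is a convenient reference value when reading loss curves.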
2.2 Sparse autoencoders
An AE is a neural network that tries to learn a reconstruction mapping from the input data to itself. Given
the input $x$, the AE learns the nonlinear function $\hat{x} = f_{W,b}(x) \approx x$, where $\hat{x}$ represents the reconstructed data
and $W = \{W_e, W_d\}$ and $b = \{b_e, b_d\}$ are the weight matrices and bias vectors of the encoder and decoder,
respectively.
Within the encoder, the input data are transformed to a hidden space by a linear transformation and
nonlinear activation function:
$$h = \sigma_e(W_e x + b_e) \tag{4}$$
Here, $h$ is the encoded vector in the hidden space, $\theta_e$ denotes the parameters of the encoder, including $W_e$ and
$b_e$, and $\sigma_e$ is the nonlinear activation function.
Similar to the encoder, within the decoder, the encoded vector is transformed to the reconstructed vector:
$$\hat{x} = \sigma_d(W_d h + b_d) \tag{5}$$
Here, $\sigma_d$ is the activation function of the decoder.
The mean squared error (MSE) between $x$ and $\hat{x}$ is usually selected as the loss function, as shown in
Eq. (6), where $N$ is the number of samples.
$$L_{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left\| x^{(i)} - \hat{x}^{(i)} \right\|_2^2 \tag{6}$$
If a sparsity constraint is imposed on the encoded vector by the hidden layer, the features captured by the
AE will be more implementable and robust. Limiting the activation of hidden units is an effective solution.
We denote the activation value of the $j$th hidden unit corresponding to the input sample $x^{(i)}$ as $a_j(x^{(i)})$.
Thus, the average activation level of this hidden unit within $N$ samples is
$$\hat{\rho}_j = \frac{1}{N} \sum_{i=1}^{N} a_j(x^{(i)}).$$
It is expected that the average activation of each hidden unit remains at the low level $\rho$, where $\rho$ is the
sparsity parameter that is set to a positive number near 0. The Kullback-Leibler (KL) divergence between $\rho$ and $\hat{\rho}_j$ is considered
a part of the loss function that needs to be minimized, as shown in Eq. (7), where $\hat{\rho}$ is the vector composed
of the average activation values of all $n$ hidden units.
$$L_{sparse} = KL(\rho \,\|\, \hat{\rho}) = \sum_{j=1}^{n} \left[ \rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho) \log\frac{1-\rho}{1-\hat{\rho}_j} \right] \tag{7}$$
An AE with the sparsity constraint is referred to as an SAE. The final loss function of the SAE is a
combination of Eq. (6) and Eq. (7), where $\beta$ controls the strength of the sparsity penalty.
$$L_{SAE} = L_{MSE} + \beta L_{sparse} \tag{8}$$
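As a rough illustration of how the reconstruction error and the KL sparsity penalty combine into the SAE loss, the sketch below evaluates both terms with NumPy. The helper names `kl_sparsity` and `sae_loss`, the values `rho=0.05` and `beta=1.0`, and the choice to sum the squared error over signal points before averaging over samples are assumptions made for this example only.

```python
import numpy as np

def kl_sparsity(rho, rho_hat, eps=1e-12):
    """Sum over hidden units of KL(rho || rho_hat_j) for Bernoulli means."""
    rho_hat = np.clip(rho_hat, eps, 1.0 - eps)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))

def sae_loss(x, x_hat, h, rho=0.05, beta=1.0):
    """Reconstruction error plus weighted sparsity penalty.

    x, x_hat: arrays of shape (N, d); h: hidden activations of shape (N, n).
    """
    mse = np.mean(np.sum((x - x_hat) ** 2, axis=1))   # mean over the N samples
    rho_hat = h.mean(axis=0)                          # average activation per unit
    return mse + beta * kl_sparsity(rho, rho_hat)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(16, 8))
h_sparse = np.full((16, 4), 0.05)    # average activation equals rho
h_dense = np.full((16, 4), 0.5)      # average activation far from rho

# Perfect reconstruction in both cases, so only the sparsity term differs.
loss_sparse = sae_loss(x, x, h_sparse)
loss_dense = sae_loss(x, x, h_dense)
```

With a perfect reconstruction, the loss vanishes when the average activations match the target level and grows as the hidden units become more active on average.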
3 Sparsity-constrained generative adversarial networks for raw vibration
signal generation
In this section, the two-stage training process of the proposed SC-GAN is described. The two-stage
training method consists of an SAE-based pretraining stage and a GAN-based adversarial training stage. An
effective method for revealing the interpretable patterns captured by the SC-GAN will be established. This
method can also be adapted to analyze other neural networks with similar structures.
3.1 Two-stage training process of the SC-GAN
Figure 2 Schematic of SAE-based pretraining, GAN-based adversarial training and the connection between these processes.
As shown in Figure 2, in the pretraining stage, an SAE with one input layer, one hidden layer with
sparsity-constrained units and one output layer is trained to accurately reconstruct the raw vibration signals.
After pretraining, the decoder part of the SAE is separated and reconstructed to form the generator of the
SC-GAN by adding a fully connected layer before it. The encoder is separated and reconstructed to act as the
discriminator of the SC-GAN by adding an output unit after it.
Stage 1: SAE-based pretraining
In the first stage, an SAE is iteratively trained by the minibatch stochastic gradient descent (SGD)
algorithm until convergence is achieved. In this stage, the encoder is enabled to transform vibration signals to
sparse vectors, thereby compressing the critical information contained in the original data into a limited
number of activated units. Correspondingly, the decoder acquires the ability to reconstruct the original
vibration signals from the sparse vectors. The pseudocode is expressed as follows:
Algorithm 1 Minibatch SGD training for the SAE
for $T$ iterations do
    Shuffle the dataset $X$ that contains $N$ vibration signal samples of a health condition.
    Let $k = 0$.
    for $N/m$ iterations do
        Sample a minibatch $\{x^{(km+1)}, \dots, x^{(km+m)}\}$ from $X$ that contains $m$ samples.
        Calculate the reconstruction error loss $L_{MSE}$ based on Eq. (6).
        Calculate the sparsity loss $L_{sparse}$ based on Eq. (7).
        Update the parameters considering the SAE loss $L_{SAE}$ based on Eq. (8):
            $\theta \leftarrow \theta - \eta \nabla_{\theta} L_{SAE}$, where $\eta$ is the learning rate.
        Let $k = k + 1$.
    end for
end for
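The loop structure of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the gradients are derived by hand for a sigmoid hidden layer and a tanh output layer, the KL penalty of Eq. (7) is used, plain SGD replaces any adaptive optimizer, and every hyperparameter (`n_hidden`, `epochs`, `batch`, `lr`, `rho`, `beta`) as well as the synthetic sine data are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_sae(X, n_hidden=16, epochs=200, batch=8, lr=0.01,
              rho=0.05, beta=0.1, seed=0):
    """Minibatch SGD for a one-hidden-layer sparse autoencoder."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W1 = rng.normal(0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    history = []
    for _ in range(epochs):
        order = rng.permutation(N)                    # shuffle the dataset
        epoch_loss = 0.0
        for k in range(N // batch):
            xb = X[order[k * batch:(k + 1) * batch]]  # k-th minibatch
            # Forward pass: encoder, decoder, and both loss terms.
            h = sigmoid(xb @ W1 + b1)
            x_hat = np.tanh(h @ W2 + b2)
            rho_hat = np.clip(h.mean(axis=0), 1e-6, 1 - 1e-6)
            mse = np.mean(np.sum((x_hat - xb) ** 2, axis=1))
            kl = np.sum(rho * np.log(rho / rho_hat)
                        + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
            epoch_loss += mse + beta * kl
            # Backward pass (chain rule through tanh and sigmoid).
            dz2 = 2.0 * (x_hat - xb) / batch * (1.0 - x_hat ** 2)
            dkl = (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / batch
            dh = dz2 @ W2.T + beta * dkl
            dz1 = dh * h * (1.0 - h)
            # SGD update of all parameters.
            W2 -= lr * (h.T @ dz2)
            b2 -= lr * dz2.sum(axis=0)
            W1 -= lr * (xb.T @ dz1)
            b1 -= lr * dz1.sum(axis=0)
        history.append(epoch_loss / (N // batch))
    return (W1, b1, W2, b2), history

# Synthetic stand-in for vibration samples: sines with random phases.
rng = np.random.default_rng(1)
t = np.arange(32)
X = np.stack([np.sin(2 * np.pi * 3 * t / 32 + p)
              for p in rng.uniform(0, 2 * np.pi, 64)])
params, history = train_sae(X)
```

The returned `history` holds the mean SAE loss per epoch, which can be inspected to check convergence before the weights are reused in the second stage.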
Stage 2: GAN-based adversarial training
In the second stage, the pretrained encoder and decoder of the SAE are utilized as part of the discriminator
and generator, respectively, of the SC-GAN.
Specifically, a fully connected neural layer is added before the pretrained decoder. The function of this
layer is to transform the random noise into a sparse vector, which can activate the embedded information in
the decoder to generate a realistic vibration signal. To keep the outputs of the second layer of the generator
sparse, a sparsity constraint is imposed on the generator’s second layer to introduce an additional sparsity loss.
The final loss function for the sparsity-constrained generator is:
$$L_G^{SC} = L_G + \beta_G \, KL(\rho_G \,\|\, \hat{\rho}_G) \tag{9}$$
where $L_G$ is the generator loss in Eq. (1), $\beta_G$ controls the strength of the penalty, $\rho_G$ is the sparsity parameter of the generator, and
$\hat{\rho}_G$ is the average activation vector of the second layer of the generator.
A sigmoid-activated unit is added after the pretrained encoder to form the discriminator. Similar to the
function mentioned in the SAE, the first layer of the discriminator is expected to extract critical features from
real or synthetic vibration signals and transform them into sparse vectors. The second layer issues
discrimination results according to the sparse vectors that reflect whether the samples are real or fake.
Consequently, the second layer of the discriminator has a sparsity constraint. The final loss function is:
$$L_D^{SC} = L_D + \beta_D \, KL(\rho_D \,\|\, \hat{\rho}_D) \tag{10}$$
where $L_D$ is the discriminator loss in Eq. (2) and the other variables have meanings analogous to those in Eq. (9).
The pseudocode is expressed as follows:
Algorithm 2 Minibatch SGD training for the SC-GAN.
for $T$ iterations do
    for $k$ steps do
        Sample a minibatch of $m$ random noise samples $\{z^{(1)}, \dots, z^{(m)}\}$.
        Sample a minibatch of $m$ samples $\{x^{(1)}, \dots, x^{(m)}\}$ from the real dataset.
        Calculate the loss $L_D^{SC}$ of the sparsity-constrained discriminator based on Eq. (10).
        Update the parameters of the discriminator by gradient descent:
            $\theta_D \leftarrow \theta_D - \eta \nabla_{\theta_D} L_D^{SC}$
    end for
    Sample a minibatch of $m$ random noise samples $\{z^{(1)}, \dots, z^{(m)}\}$.
    Calculate the loss $L_G^{SC}$ of the sparsity-constrained generator based on Eq. (9).
    Update the parameters of the generator by gradient descent:
        $\theta_G \leftarrow \theta_G - \eta \nabla_{\theta_G} L_G^{SC}$
end for
In Algorithm 2, $k$ controls the ratio of the training time of D to the training time of G, $\eta$ is the learning rate, and $\theta_D$ and
$\theta_G$ are the parameter sets of the discriminator and generator, respectively.
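The way the pretrained SAE parts are reused in the second stage can be sketched as a forward pass only. In the NumPy example below, random matrices stand in for the pretrained encoder and decoder weights, the layer sizes follow Section 4.2, and the names `W_in`, `w_out`, `rho` and `beta`, the KL form of the penalty, and the pooling of real and generated samples when averaging the discriminator activations are all assumptions of this sketch. No parameter update is performed.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def kl_sparsity(rho, rho_hat, eps=1e-6):
    rho_hat = np.clip(rho_hat, eps, 1.0 - eps)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

rng = np.random.default_rng(0)
d, n_hidden, m = 320, 160, 64     # layer sizes from Section 4.2; m is the batch size

# Stand-ins for the pretrained SAE weights (random here, learned in stage 1).
W_enc, b_enc = rng.normal(0, 0.05, (d, n_hidden)), np.zeros(n_hidden)
W_dec, b_dec = rng.normal(0, 0.05, (n_hidden, d)), np.zeros(d)

# Layers added in stage 2.
W_in, b_in = rng.normal(0, 0.05, (d, n_hidden)), np.zeros(n_hidden)  # noise -> sparse vector
w_out, b_out = rng.normal(0, 0.05, n_hidden), 0.0                    # sparse vector -> probability

def generator(z):
    s = sigmoid(z @ W_in + b_in)            # new layer produces the sparse vector
    return np.tanh(s @ W_dec + b_dec), s    # pretrained decoder produces the signal

def discriminator(x):
    h = sigmoid(x @ W_enc + b_enc)          # pretrained encoder extracts features
    return sigmoid(h @ w_out + b_out), h    # new output unit gives a probability

z = rng.uniform(-1, 1, (m, d))
x_real = np.tanh(rng.normal(0, 1, (m, d)))  # placeholder for real signals
x_fake, s = generator(z)
p_real, h_real = discriminator(x_real)
p_fake, h_fake = discriminator(x_fake)

rho, beta = 0.05, 0.1                       # hypothetical sparsity settings
h_all = np.vstack([h_real, h_fake])
loss_d = (-np.mean(np.log(p_real)) - np.mean(np.log(1 - p_fake))
          + beta * kl_sparsity(rho, h_all.mean(axis=0)))
loss_g = np.mean(np.log(1 - p_fake)) + beta * kl_sparsity(rho, s.mean(axis=0))
```

In a full implementation, `loss_d` would be minimized for several steps with respect to the discriminator parameters before `loss_g` is minimized once with respect to the generator parameters, as in Algorithm 2.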
3.2 Method of interpreting the SC-GAN
Normally, DL methods, including GANs, are regarded as black boxes. This study proposes a method of
interpreting the SC-GAN to reveal the captured patterns that enable it to generate realistic synthetic vibration
signals.
After adversarial training, the generator with a three-layer structure of the SC-GAN can generate realistic
samples from random noise. Given the noise vector $z$, the output of the first layer is a sparse vector:
$$s = f_1(W_1 z + b_1) \tag{11}$$
where $W_1$ and $b_1$ are the weight matrix and bias vector, respectively, of the first layer of the
generator, and $f_1$ is its activation function.
The sparse vector is then transformed into the vibration signal, which is output by the generator:
$$G(z) = f_2(W_2 s + b_2) \tag{12}$$
where $W_2$, $b_2$ and $f_2$ are the weight matrix, bias vector and activation function of the output layer.
Here, we rewrite the weight matrix $W_2$ with its column vectors and the sparse vector $s$ with the
elements that it contains:
$$G(z) = f_2\left( [w_1, w_2, \dots, w_n] \, [s_1, s_2, \dots, s_n]^{T} + b_2 \right) = f_2\left( \sum_{j=1}^{n} s_j w_j + b_2 \right) \tag{13}$$
Since the activation function is monotonically increasing, it only affects the amplitude of the signal, not
the frequency, which is more critical to vibration-based fault diagnosis. In Eq. (13), regardless of the activation,
the generated vibration signals can be denoted as a linear combination of $w_1, w_2, \dots, w_n$ plus a bias
vector. The weights of the linear combination are the elements of the sparse vector. Due to the sparsity
restrictions, there are many close-to-zero values in the vector. Hence, the generator utilizes a sparse vector to
activate several columns that contain embedded discriminative information associated with the real vibration
signals. The generation mechanism of the SC-GAN can be interpreted by analyzing the column vectors of the
hidden layer’s weight matrix in the generator.
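The column-vector view can be checked numerically. The short NumPy sketch below uses random numbers in place of a trained weight matrix and an arbitrarily chosen set of three active units (both are assumptions for illustration), and confirms that the matrix product equals the weighted sum of only the activated columns plus the bias.

```python
import numpy as np

rng = np.random.default_rng(0)
n_out, n_hidden = 320, 160
W2 = rng.normal(size=(n_out, n_hidden))   # columns w_1, ..., w_n of the output layer
b2 = rng.normal(size=n_out)

s = np.zeros(n_hidden)                    # sparse vector with three active units
active = [3, 40, 97]                      # arbitrary indices, for illustration only
s[active] = [0.9, 0.4, 0.7]

pre_activation = W2 @ s + b2                                  # matrix form
as_combination = sum(s[j] * W2[:, j] for j in active) + b2    # active columns only
signal = np.tanh(pre_activation)                              # generator output
```

Because the inactive units contribute nothing to the sum, inspecting the columns selected by the nonzero entries of the sparse vector is enough to see which patterns make up a generated sample.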
4 Case study 1: Rolling element bearings
4.1 Data description
A rolling bearing dataset provided by Case Western Reserve University (CWRU) is employed in this
study (Smith and Randall, 2015). The data under a 1-hp load were applied and sampled with a frequency of
48 kHz. Data for 4 different kinds of health conditions, including normal (N), inner race fault (IR), rolling ball
fault (B), and outer race fault (OR) conditions, were collected. Additionally, the fault diameters were different:
0.007 inches (007), 0.014 inches (014) and 0.021 inches (021). Therefore, 10 health conditions, including
different fault modes and fault locations, were considered, as shown in Table 1. The vibration signals were
separated using a fixed-width window; as a result, each sample contained 320 data points.
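A fixed-width segmentation of this kind can be sketched as follows. The example assumes non-overlapping windows and drops any incomplete tail, neither of which is stated in the text; the function name `segment_signal` and the synthetic one-second record are placeholders.

```python
import numpy as np

def segment_signal(signal, width=320):
    """Split a 1-D signal into non-overlapping windows of `width` points."""
    signal = np.asarray(signal)
    n_windows = len(signal) // width            # drop the incomplete tail
    return signal[:n_windows * width].reshape(n_windows, width)

# One second of placeholder data at the 48 kHz sampling rate.
record = np.sin(2 * np.pi * 100 * np.arange(48000) / 48000)
samples = segment_signal(record)
```

At 48 kHz, each 320-point sample covers roughly 6.7 ms of signal, so one second of data yields 150 samples under these assumptions.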
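Table 1 Health conditions and their abbreviations in Case study 1.
| Health condition | Fault mode | Fault diameter | Abbreviation |
|---|---|---|---|
| 0 | normal | / | N |
| 1 | inner race fault | 0.007 inches | IR007 |
| 2 | inner race fault | 0.014 inches | IR014 |
| 3 | inner race fault | 0.021 inches | IR021 |
| 4 | rolling ball fault | 0.007 inches | B007 |
| 5 | rolling ball fault | 0.014 inches | B014 |
| 6 | rolling ball fault | 0.021 inches | B021 |
| 7 | outer race fault | 0.007 inches | OR007 |
| 8 | outer race fault | 0.014 inches | OR014 |
| 9 | outer race fault | 0.021 inches | OR021 |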
4.2 Parameter settings
Model parameters
In the SAE-based pretraining stage, the AE has a single-hidden-layer structure with 320-160-320 neurons
in each layer. The activation functions in the hidden layer and output layer are sigmoid and tanh, respectively.
The functional forms of the sigmoid activation function and tanh activation function are denoted as Eq. (14)
and Eq. (15), respectively.
$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}} \tag{14}$$
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{15}$$
An L1 sparsity constraint with the parameter  is imposed on the hidden layer. In the GAN-
based adversarial training stage, the generator of the SC-GAN has a single-hidden-layer structure with 320-
160-320 neurons in each layer, and the discriminator has a structure of 320-160-1. The activation function in
the hidden layers of the generator and discriminator, as well as the output layer of the discriminator, is sigmoid.
The activation function in the output layer of the generator is tanh. An L1 sparsity constraint with the parameter
 is imposed on the hidden layers of both the generator and discriminator. The noise vector is
sampled from the uniform distribution between -1 and 1.
Training parameters
Both the SAE-based pretraining stage and GAN-based adversarial training stage use the Adam optimizer
with recommended hyperparameters and involve 2000 training epochs. The batch size is 64. The training ratio
between D and G is . In addition, a soft-label training strategy is implemented, which means that the
label of a real sample is 0.95 instead of 1.
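The effect of the soft label can be illustrated with the binary cross-entropy of the discriminator on real samples. The sketch below is an assumption-laden example (the helper `bce` and the probability values are invented for illustration): with a hard label of 1 the loss keeps decreasing as the prediction approaches 1, whereas with a soft label of 0.95 it is smallest at a prediction of 0.95.

```python
import numpy as np

def bce(p, target, eps=1e-12):
    """Binary cross-entropy between predicted probabilities and a target label."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return -np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))

p_real = np.array([0.90, 0.95, 0.999])    # example discriminator outputs on real data
hard = [bce(p, 1.0) for p in p_real]      # keeps decreasing as p approaches 1
soft = [bce(p, 0.95) for p in p_real]     # smallest at p = 0.95
```

This discourages the discriminator from becoming overconfident on real samples, which is a common motivation for soft labels in adversarial training.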
4.3 Model training
After the SAE-based pretraining process, the training loss during the adversarial training process is shown
in Figure 3(d). To demonstrate the necessity of AE-based pretraining and the sparsity constraint, experiments
with three other comparative models are performed. These models include the basic GAN model (GAN,
Figure 3(a)), GAN model with AE-based pretraining (AE-GAN, Figure 3(b)) and GAN model with a sparsity
constraint but no AE-based pretraining (SGAN, Figure 3(c)). In addition, to more clearly show the
convergence of the model, the last 200 training epochs of the SGAN and SC-GAN are enlarged in Figure 3.
The GAN and AE-GAN are far from Nash equilibrium. For SGAN, although the G-loss and D-loss values
almost reach Nash equilibrium, there is a gap between them, indicating that equilibrium is still not reached.
Only the two losses of SC-GAN converge to near equality, indicating that SC-GAN can be stably trained.
Figure 3 Loss of the generator and discriminator during the adversarial training stage: (a) GAN, (b) AE-GAN, (c) SGAN, and (d)
SC-GAN.
Loss curves cannot reflect the variation in the generated sample quality at different training epochs. Thus,
we introduce the maximum mean discrepancy (MMD) to measure the quality of the generation. The definition of the MMD between the two
probability distributions $p$ and $q$ in space $\mathcal{X}$ is:
$$\mathrm{MMD}(\Phi, p, q) = \sup_{f \in \Phi} \left( \mathbb{E}_{x_1 \sim p}\left[f(x_1)\right] - \mathbb{E}_{x_2 \sim q}\left[f(x_2)\right] \right) \tag{16}$$
where $\Phi$ is the class of functions $f: \mathcal{X} \to \mathbb{R}$, and $\mathcal{H}$ is the reproducing kernel Hilbert space (RKHS).
Given two datasets $X_p = \{x_i^p\}_{i=1,\dots,n_p}$ and $X_q = \{x_j^q\}_{j=1,\dots,n_q}$, the estimation of MMD can be denoted as:
$$\mathrm{MMD}(X_p, X_q) = \left\| \frac{1}{n_p} \sum_{i=1}^{n_p} \phi(x_i^p) - \frac{1}{n_q} \sum_{j=1}^{n_q} \phi(x_j^q) \right\|_{\mathcal{H}} \tag{17}$$
where $\phi(\cdot): \mathcal{X} \to \mathcal{H}$ is a feature map. After employing the kernel method, Eq. (17) can be rewritten as:
$$\mathrm{MMD}(X_p, X_q) = \left[ \frac{1}{n_p^2} \sum_{i=1}^{n_p} \sum_{j=1}^{n_p} k(x_i^p, x_j^p) + \frac{1}{n_q^2} \sum_{i=1}^{n_q} \sum_{j=1}^{n_q} k(x_i^q, x_j^q) - \frac{2}{n_p n_q} \sum_{i=1}^{n_p} \sum_{j=1}^{n_q} k(x_i^p, x_j^q) \right]^{1/2} \tag{18}$$
where $k(\cdot, \cdot)$ is a kernel function. In this case, the Gaussian kernel is applied:
$$k(x, x') = \exp\left( -\left\| x - x' \right\|^2 / (2\sigma^2) \right).$$
The MMD distance between real samples and synthetic samples is calculated at the end of each training
epoch, as shown in Figure 4. The light noisy curve is the original MMD of each training epoch. The dark
smooth curve is the result of smoothing the original curve by locally weighted scatterplot smoothing
(LOWESS) (Cleveland, 1981) to better reflect the overall trend of MMD changes. In Figure 4, as adversarial
training continues, the discrepancy between real samples and synthetic samples gradually decreases,
demonstrating that the quality of the generated samples increases.
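A kernel-based MMD estimate of this kind can be sketched in a few lines of NumPy. The version below uses the biased estimator with a Gaussian kernel; the bandwidth `sigma=1.0`, the function names and the synthetic Gaussian data are assumptions for illustration, since the kernel bandwidth used in the experiments is not given here.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Pairwise Gaussian kernel matrix between rows of A and rows of B."""
    sq = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def mmd(Xp, Xq, sigma=1.0):
    """Biased kernel estimate of the MMD between two sample sets."""
    k_pp = gaussian_kernel(Xp, Xp, sigma).mean()
    k_qq = gaussian_kernel(Xq, Xq, sigma).mean()
    k_pq = gaussian_kernel(Xp, Xq, sigma).mean()
    return np.sqrt(max(k_pp + k_qq - 2.0 * k_pq, 0.0))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, (100, 5))
close = rng.normal(0.0, 1.0, (100, 5))   # same distribution as `real`
far = rng.normal(3.0, 1.0, (100, 5))     # shifted distribution

mmd_same = mmd(real, real)
mmd_close = mmd(real, close)
mmd_far = mmd(real, far)
```

Samples drawn from the same distribution give a small value, while a shifted distribution gives a clearly larger one, which is the behavior relied on when tracking generation quality over training epochs.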
Figure 4 Variation in the MMD distance between real samples and synthetic samples with the training epoch.
4.4 Generation quality evaluation
4.4.1 By a frequency spectra analysis
Figure 5 Raw vibration signals and frequency spectra of generated samples and the corresponding real samples: (a) NORMAL,
(b) IR007, (c) IR014, (d) IR021, (e) B007, (f) B014, (g) B021, (h) OR007, (i) OR014, and (j) OR021.
For ten health conditions, the generated raw vibration signals and real signals, as well as the generated
frequency spectra and real spectra, are shown in Figure 5. Beyond the time domain, the generated samples
show high similarities with the real samples in the frequency domain. The key frequencies of the generated
samples and real samples match, which indicates that the SC-GAN can capture useful patterns hidden in the
raw signals in the time and frequency domains, although the signal spectrum is not directly applied during the
training process.
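A spectrum comparison of this kind can be sketched with NumPy's FFT routines. In the example below, the sampling rate and sample length follow Section 4.1, while the two sine waves merely stand in for a real and a generated sample; the function name `amplitude_spectrum` and the 1500 Hz test tone are assumptions for illustration.

```python
import numpy as np

def amplitude_spectrum(sample, fs=48000):
    """One-sided FFT amplitude spectrum of a real-valued sample.

    The 2/n scaling is exact for interior bins; the DC and Nyquist bins
    would need a factor of 1/n instead.
    """
    n = len(sample)
    amp = np.abs(np.fft.rfft(sample)) * 2.0 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, amp

fs, n = 48000, 320                                  # values from Section 4.1
t = np.arange(n) / fs
real = np.sin(2 * np.pi * 1500 * t)                 # placeholder "real" sample
fake = 0.9 * np.sin(2 * np.pi * 1500 * t + 0.3)     # placeholder "generated" sample

freqs, amp_real = amplitude_spectrum(real, fs)
_, amp_fake = amplitude_spectrum(fake, fs)
peak_real = freqs[np.argmax(amp_real)]
peak_fake = freqs[np.argmax(amp_fake)]
```

With 320 points at 48 kHz, the frequency resolution is 150 Hz, so key frequencies of real and generated samples can only be compared to within that bin width.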
4.4.2 By a fault diagnosis task
Data augmentation is usually performed to improve the performance of machine fault diagnosis
algorithms. As a critical issue in health monitoring tasks, a fault diagnosis experiment is designed to evaluate
the quality of generated samples and their contributions to improvements in the diagnosis results. A stacked
AE-based deep neural network (DNN) is employed to build the diagnosis model in this study; this DNN was
proposed by Jia et al. (Jia et al., 2016). The model structure is 160-120-80-120-160. The activation function
in the hidden layers is a rectified linear unit (ReLU). The number of pretraining epochs is 100 for each AE,
and 200 epochs are used for fine-tuning. The batch size is 32, and the Adam optimizer is applied. The diagnosis
model needs to be trained on the training set, which contains vibration signal samples collected from the
bearings under the ten health conditions listed in Table 1. After training, the model needs to diagnose the health
condition of the bearing when given vibration signal samples in the testing set. To reflect a case with
insufficient training data, the DNN is trained on 10 real samples and tested on 50 samples for each
health condition. Different numbers of generated samples are added to the training set to
reflect different levels of data augmentation. Five independent trials were carried out for each experimental
setting. The mean diagnosis accuracy and standard deviation were calculated, as shown in Table 2. According
to the results, when the training data are insufficient, data augmentation by SC-GAN improves
the mean diagnosis accuracy by up to 7.44 percentage points, from 80.44% with no generated samples to
87.88% with 90 generated samples per health condition. Furthermore, as the number of generated samples
increases, the diagnosis accuracy also increases. It can be inferred that the quality of the generated samples
is sufficient for these samples to compensate for the information missing due to insufficient real data.
Table 2 Diagnosis results with different numbers of real samples and generated samples for training.
Real training samples (each health condition)        10     10     10     10     10     10
Generated training samples (each health condition)   0      5      10     20     40     90
Testing samples (each health condition)              50     50     50     50     50     50
Mean accuracy (%)                                    80.44  82.44  84.96  86.48  87.12  87.88
Standard deviation (%)                               2.14   2.04   1.23   1.75   1.34   0.74
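As a quick check of the reported improvement, the gain over the no-augmentation baseline can be recomputed from the mean accuracies in Table 2. The short sketch below only transcribes the table values; the variable names are illustrative and not part of the original study.

```python
# Values transcribed from Table 2 (10 real training samples per health condition).
generated = [0, 5, 10, 20, 40, 90]
mean_acc = [80.44, 82.44, 84.96, 86.48, 87.12, 87.88]

# Gain in percentage points relative to training on real samples only.
baseline = mean_acc[0]
gains = {n: round(acc - baseline, 2) for n, acc in zip(generated, mean_acc)}

for n, gain in gains.items():
    print(f"{n:>2} generated samples: +{gain:.2f} percentage points")
```

The largest gain, obtained with 90 generated samples per health condition, corresponds to the 7.44 figure quoted in the text; it is an absolute difference in accuracy rather than a relative one.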
4.5 Physical interpretation of the model and generation mechanism
Figure 6 Selected weight vectors of neurons in the hidden layer of the generator and the corresponding frequency spectrum
obtained by FFT: (a) #26, (b) #18, (c) #135, (d) #114, (e) #72, (f) #155, (g) #81, and (h) #148.
DNNs are typically regarded as black boxes, which limits their use because it is unclear how
they work. By applying the proposed interpretation method to the SC-GAN, the weight vectors of the
generator under the NORMAL condition are analyzed to reveal what the model learns. Eight neurons are
selected from the 160 neurons in the hidden layer; their raw weight vectors and corresponding frequency
spectra are shown in Figure 6. As Figure 5 shows, the spectrum of the NORMAL condition mainly contains
three key frequency components. As shown in Figure 6, these neurons have learned useful but diverse patterns
hidden in the raw vibration signals. Some neurons mainly learned one frequency component, such as neurons
#26 and #18 (the lowest frequency), neuron #135 (the middle frequency), and neuron #114 (the highest
frequency). Some neurons, such as neurons #72 and #155, learned a combination of two frequency
components. Other neurons, such as #81 and #148, learned all the key components.
Therefore, the generation mechanism of the proposed SC-GAN can be interpreted as follows: (1) the
random noise fed into the generator is embedded into a sparse vector by the first layer; (2) the few
nonzero values in the sparse vector activate the corresponding neurons and their weight vectors, which contain
different components of the vibration signal; (3) the activated weight vectors are linearly combined
using the weights provided by the sparse vector, which enables the output to contain the complete set of
frequency components found in real signals; and (4) the combined vector is transformed by the
nonlinear activation function, which adjusts the signal amplitude to obtain the final synthetic samples
output by the generator.
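The four steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the weights are random stand-ins for a trained generator, the layer sizes other than the 160 hidden neurons are hypothetical, and the ReLU and tanh activations and the negative bias used to induce sparsity are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# 160 hidden neurons as in the text; input and output sizes are hypothetical.
n_noise, n_hidden, n_out = 160, 160, 160

# Hypothetical parameters standing in for a trained generator.
W1 = rng.normal(scale=0.1, size=(n_noise, n_hidden))
b1 = -0.8 * np.ones(n_hidden)            # negative bias keeps most units at zero
W2 = rng.normal(scale=0.1, size=(n_hidden, n_out))  # row j: weight vector of neuron j


def generate(z):
    h = np.maximum(0.0, z @ W1 + b1)     # (1) embed the noise into a sparse vector
    active = np.flatnonzero(h)           # (2) neurons activated by nonzero entries
    combined = h[active] @ W2[active]    # (3) linear combination of their weight vectors
    return np.tanh(combined), h          # (4) nonlinearity adjusts the amplitude


z = rng.uniform(-1.0, 1.0, size=n_noise)
signal, h = generate(z)
print(f"{np.count_nonzero(h)} of {n_hidden} hidden neurons are active")
```

Because the inactive neurons contribute nothing, combining only the active weight vectors gives the same output as the full matrix product, which is what makes the per-neuron spectra in Figure 6 a meaningful description of the generated signal.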
4.6 Comparison with related methods
To understand the complexity of the data and overall performance of the method, the proposed SC-GAN
is compared with related methods from two aspects.
(1) To illustrate the improvement contributed by each component of the proposed method, three baseline
models are selected for comparison: the basic GAN (without pretraining and sparsity constraint), AE-GAN
(without sparsity constraint) and SGAN (without pretraining).
According to the experimental results, none of the three comparison methods succeeded in generating
vibration signals. As shown in Figure 3, the GAN and AE-GAN failed to converge to the Nash equilibrium
state during the adversarial training process. Although the SGAN shows a tendency to converge, it still
deviates considerably from the Nash equilibrium compared with SC-GAN. The SGAN also failed to stably
generate meaningful vibration signals.
(2) To illustrate the superiority of the proposed method compared to related methods in the literature, two
models proposed by studies cited in the introduction section, AC-GAN (Zhang et al., 2020) and Deep-GAN
(Zhang et al., 2020), are selected for comparison.
The AC-GAN was validated using an induction motor vibration dataset in the original paper. To eliminate
the interference caused by different datasets, we reproduced the AC-GAN and validated it on the CWRU
dataset with the same parameter settings. Since the validation of the Deep-GAN applied the CWRU dataset in
the original paper, the experiment was not repeated, and the results in the original paper were used for
comparative analysis.
Figure 7 Five randomly generated vibration signals of different health conditions: (a) IR014, (b) B014, and (c) OR014.
In the comparative experiment, AC-GAN converges near the Nash equilibrium state and can generate
vibration signals. However, the AC-GAN exhibits a typical GAN problem: mode collapse.
Using the trained model, 5 independent samples were generated for each of three health conditions: IR014,
B014 and OR014. As shown in Figure 7, although the input noise vectors are sampled randomly and
independently from a uniform distribution between -1 and 1, the
vibration signals generated by AC-GAN are almost identical. Once the model falls into mode collapse, it
cannot generate diversified samples, which limits its value for data augmentation and for improving
diagnostic capability.
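Mode collapse of this kind can be flagged numerically as well as visually. The sketch below is an assumption-laden illustration rather than a procedure from the paper: it uses the mean pairwise Pearson correlation between generated waveforms as a crude diversity indicator, and both sets of "generated" signals are synthetic placeholders with a hypothetical sampling rate and tone frequency.

```python
import numpy as np


def mean_pairwise_correlation(samples):
    """Average Pearson correlation over all distinct pairs of rows."""
    corr = np.corrcoef(samples)                  # (n, n) correlation matrix
    upper = np.triu_indices(len(samples), k=1)   # indices of distinct pairs
    return float(corr[upper].mean())


rng = np.random.default_rng(0)
t = np.arange(1024) / 12_000.0                   # hypothetical time axis

# Collapsed generator: the same waveform regardless of the input noise.
base = np.sin(2 * np.pi * 160.0 * t)
collapsed = np.stack([base + 1e-3 * rng.normal(size=t.size) for _ in range(5)])

# Diverse generator: same frequency content, but phase and noise vary per sample.
diverse = np.stack([
    np.sin(2 * np.pi * 160.0 * t + rng.uniform(0, 2 * np.pi))
    + 0.5 * rng.normal(size=t.size)
    for _ in range(5)
])

print("collapsed:", mean_pairwise_correlation(collapsed))
print("diverse:  ", mean_pairwise_correlation(diverse))
```

A value very close to 1 across independently drawn noise vectors is the numerical counterpart of the nearly identical traces in Figure 7. A low value alone does not prove good diversity, so the indicator is best read together with the spectra.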
The Deep-GAN has realized vibration signal generation on the CWRU dataset, but its model
structure is more complicated. According to the original paper, the generator consists of 3 convolutional layers,
3 fully connected layers, 5 batch normalization layers and 1 flatten layer, and the discriminator consists of 2
convolutional layers, 2 fully connected layers and 1 flatten layer. The complicated model structure makes
the hyperparameters difficult to tune and increases the required number of training epochs, which is set to
200,000 in the original paper.
Table 3 Comparison of the proposed method and related methods (N-No, Y-Yes, H-High, L-Low).
Model              Able to generate vibration signals?  Mode collapse  Explainable?  Model complexity
GAN                N                                    /              /             L
AE-GAN             N                                    /              /             L
SGAN               N                                    /              /             L
AC-GAN             Y                                    Y              N             H
Deep-GAN           Y                                    N              N             H
SC-GAN (proposed)  Y                                    N              Y             L
In summary, Table 3 compares the five comparison methods with the proposed method in several aspects.
It can be concluded that, compared to related methods, the proposed SC-GAN can realize
stable generation of vibration signals with no mode collapse. In addition, the mechanism of vibration
signal generation can be explained, and the model complexity is relatively low.
5 Case study 2: Gearbox
5.1 Experiments and data description
To validate the effectiveness and performance of the proposed method, a dataset collected from a
planetary gearbox is utilized. The dataset is available at
https://drive.google.com/file/d/1NWPbfAq52Wb9M0X5OoZF17eezUhPDZT5/view?usp=sharing. The test
bench employed for collecting the data in this case study is the power transmission fault prediction test bench
manufactured by Spectra Quest, USA. This test bench consists of a driver, a lubrication system, a driver motor,
four gearboxes, a load motor, torque sensors and force sensors, as shown in Figure 8. Four fault modes were
injected into the planetary gear of the gearbox: one missing tooth, broken, wear and tooth root crack. The gear
shapes corresponding to the four fault modes are shown in Figure 9. The vibration sensor was attached to the
outer end cap of the input shaft of the tested planetary gearbox via a thread connection.
The dataset contains 5 health conditions: normal, one missing tooth, wear, broken and tooth
root crack. The rotating speed of the gear was 60 Hz, and the load was 1.2 N·m. For each health condition,
10 seconds of data were collected at a sampling frequency of 12.8 kHz. The data were divided into samples
of 640 points, giving 200 samples for each health condition.
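The sample count follows directly from the acquisition settings: 12,800 points per second for 10 seconds gives 128,000 points, and 128,000 / 640 = 200. A minimal sketch of this segmentation is shown below. It assumes non-overlapping windows, which is consistent with the stated count but not spelled out in the text, and it uses a zero-filled placeholder in place of a recorded signal.

```python
import numpy as np

fs = 12_800        # sampling frequency in Hz (from the text)
duration_s = 10    # recording length per health condition (from the text)
sample_len = 640   # points per sample (from the text)

# Placeholder recording; a real signal would be loaded from the dataset instead.
recording = np.zeros(fs * duration_s)

# Non-overlapping segmentation (an assumption): drop any remainder, then reshape.
n_samples = recording.size // sample_len
samples = recording[: n_samples * sample_len].reshape(n_samples, sample_len)

print(samples.shape)  # (200, 640)
```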
Figure 8 Testbench used to collect the gearbox dataset.
Figure 9 Gear shape of the four fault modes: (a) one missing tooth, (b) broken, (c) wear, (d) tooth root crack.
5.2 Parameter settings
Model parameters
In the SAE-based pretraining stage, the AE has a 640-320-640 structure (the numbers of neurons in its
layers). In the GAN-based adversarial training stage, the structure of the generator is 640-320-640, and the
structure of the discriminator is 640-320-1. Other parameters, including the activation functions and the
sparsity constraint in the pretraining and adversarial training stages, are the same as those in the previous
case study. The noise vector is sampled from the uniform distribution between -1 and 1.
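To make the layer sizes concrete, the following sketch builds fully connected networks with the stated 640-320-640 and 640-320-1 structures and passes a batch of uniform noise through them. It is a shape check under assumptions, not the trained model: the weights are random, and the ReLU, tanh and sigmoid activations are placeholders, because this section only states that the activations match the previous case study.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense_params(sizes):
    """Randomly initialised (weight, bias) pairs for consecutive layer sizes."""
    return [(rng.normal(scale=0.05, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def relu(a):
    return np.maximum(0.0, a)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def forward(x, params, hidden_act, out_act):
    """Apply each dense layer, using out_act on the last layer only."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        x = out_act(x) if i == len(params) - 1 else hidden_act(x)
    return x


def count_params(params):
    return sum(W.size + b.size for W, b in params)


generator = dense_params([640, 320, 640])     # layer sizes from the text
discriminator = dense_params([640, 320, 1])   # layer sizes from the text

z = rng.uniform(-1.0, 1.0, size=(32, 640))    # batch of 32 noise vectors in [-1, 1]
fake = forward(z, generator, relu, np.tanh)   # activations are assumptions
score = forward(fake, discriminator, relu, sigmoid)

print(fake.shape, score.shape)
print(count_params(generator), count_params(discriminator))
```

With these sizes the generator has 410,560 trainable parameters and the discriminator 205,441, which gives a sense of the "simple structure" compared with the convolutional models discussed in Section 4.6.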
Training parameters
Except that training runs for 5000 epochs with a batch size of 32, all other parameters
are the same as those in the previous case study.
5.3 Vibration signal generation results and model physical interpretation
Figure 10 Raw vibration signals and frequency spectra of generated samples and the corresponding real samples: (a) Normal,
(b) One missing tooth, (c) Wear, (d) Broken, and (e) Tooth root crack.
The generated vibration signals and corresponding frequency spectra for the 5 health conditions of
the gearbox are shown in Figure 10. According to the generation results, the SC-GAN can generate realistic
vibration signals that are similar to real samples in the time and frequency domains, although the signals of
the gearbox are much more complicated than those of the rolling element bearings. In particular, the gearbox
vibration signals contain more noise and more high-frequency components than those of the rolling
bearings. The good performance of the SC-GAN on the gearbox dataset illustrates its generalization ability
to such signals.
Table 4 Diagnosis results with different numbers of real samples and generated samples for training.
Real training samples (each health condition)        5      5      5      5      5      5
Generated training samples (each health condition)   0      5      50     100    300    600
Testing samples (each health condition)              600    600    600    600    600    600
Mean accuracy (%)                                    94.92  96.51  100    100    100    100
Standard deviation (%)                               3.47   5.08   0      0      0      0
Figure 11 Variation in the mean testing accuracy with fine-tuning epoch of the fault diagnosis task using different numbers of
generated training samples. The accuracy is the average of five independent trials.
To validate the effectiveness of the proposed data augmentation scheme, a DNN model is established to
complete a fault diagnosis task using different numbers of generated training samples. Except for the number
of neurons in each layer and the number of fine-tuning epochs, all hyperparameters are the same as those
in the case of rolling element bearings. Here, the model structure is 320-160-80-160-320, and 100 fine-tuning
epochs are used. In this case study, a more severe small-sample condition is considered, with only 5 real
vibration signal samples available for training and 600 samples for testing for each health condition. Five
independent trials were carried out for each experimental setting. The diagnosis results are shown in Table 4.
According to the results, the fault diagnosis performance with insufficient real training samples can be
improved significantly, from a mean accuracy of 94.92% with no generated samples to 100% with 50 or more
generated samples per health condition. Although the diagnosis accuracy no longer increases beyond 50
generated samples, the number of fine-tuning epochs required decreases as the number of generated samples
increases, as shown in Figure 11. The
samples generated by SC-GAN have reasonable consistency with real samples, so the DNN is provided with
more information during the AE-based pretraining stage and the model parameters are located at better
positions at the beginning of the fine-tuning stage.
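The experimental setting in Table 4 amounts to building, for each trial, a training set with a fixed number of real samples and a varying number of generated samples per health condition. One way to assemble such a set is sketched below. The function name, the dictionary layout and the random placeholder data are all hypothetical; the paper does not describe its data-handling code.

```python
import numpy as np


def build_training_set(real, generated, n_real, n_generated, rng):
    """Draw n_real real and n_generated synthetic samples per health condition.

    `real` and `generated` map a class label to an array of shape
    (n_available, sample_len).
    """
    xs, ys = [], []
    for label in sorted(real):
        r_idx = rng.choice(len(real[label]), n_real, replace=False)
        g_idx = rng.choice(len(generated[label]), n_generated, replace=False)
        xs.append(np.concatenate([real[label][r_idx], generated[label][g_idx]]))
        ys.append(np.full(n_real + n_generated, label))
    return np.concatenate(xs), np.concatenate(ys)


rng = np.random.default_rng(0)

# Placeholder data for 5 health conditions with 640-point samples.
real = {c: rng.normal(size=(200, 640)) for c in range(5)}
generated = {c: rng.normal(size=(600, 640)) for c in range(5)}

X, y = build_training_set(real, generated, n_real=5, n_generated=50, rng=rng)
print(X.shape, np.bincount(y))
```

Repeating this with a different random seed for each of the five trials, and with `n_generated` set to each value in Table 4, reproduces the structure of the experiment; setting `n_generated=0` gives the real-only baseline.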
Figure 12 Selected weight vectors of neurons in the hidden layer of the generator and the corresponding frequency spectrum
obtained by FFT: (a) #315, (b) #100, (c) #0, (d) #184, (e) #83, (f) #84, (g) #272, and (h) #34.
The SC-GAN trained to generate vibration signals of the wear fault mode is selected to illustrate
the generation mechanism. Using the proposed model interpretation method, the weight vectors of
the generator and the corresponding FFT spectra are obtained, as shown in Figure 12. As in the case of rolling
element bearings, the neurons learned useful but diverse frequency domain patterns
contained in the time domain vibration signals. In addition, the frequency components learned by the model
cover all the key frequency bands of the real vibration signals. Therefore, the interpretation of the raw vibration
signal generation mechanism stated in Section 4.5 still holds.
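The interpretation step used for Figure 6 and Figure 12 treats each neuron's weight vector as a short time-domain signal and inspects its FFT spectrum. A minimal sketch of that idea follows. The weight matrix here is synthetic (two rows built from known sinusoids) so that the result can be checked, and the helper name `dominant_frequencies` is hypothetical; with a trained generator, the rows of its output-layer weight matrix would be used instead.

```python
import numpy as np


def dominant_frequencies(weights, fs, top_k=3):
    """Return the top_k strongest frequencies (Hz) for each weight vector (row)."""
    spectrum = np.abs(np.fft.rfft(weights, axis=1))
    freqs = np.fft.rfftfreq(weights.shape[1], d=1.0 / fs)
    spectrum[:, 0] = 0.0                              # ignore the DC component
    order = np.argsort(spectrum, axis=1)[:, ::-1][:, :top_k]
    return freqs[order]


fs = 12_800                                           # gearbox sampling frequency (Hz)
t = np.arange(640) / fs                               # 640-point weight vectors

# Synthetic stand-ins for learned weight vectors.
W = np.stack([
    np.sin(2 * np.pi * 1000.0 * t),                   # one frequency component
    np.sin(2 * np.pi * 400.0 * t) + 0.5 * np.sin(2 * np.pi * 3000.0 * t),  # two
])

print(dominant_frequencies(W, fs, top_k=2))
```

With 640 points at 12.8 kHz the frequency resolution is 20 Hz, so components closer together than that cannot be separated by this analysis.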
6 Discussion
As stated in the introduction, the main contributions of this research can be summarized in
two points: a data augmentation scheme that can stably generate raw vibration signals, and an
interpretation of the generation mechanism of the proposed model.
For the first point, unlike existing GAN-based methods, the proposed SC-GAN takes advantage
of AE-based pretraining and a sparsity constraint. In the pretraining stage, patterns of the
vibration signals are prestored in the network, providing strong prior information for GAN-based
adversarial training. This makes the GAN more stable and helps it reach the Nash equilibrium more easily,
as shown in Figure 3. From a frequency domain perspective, the energy of the vibration signals of industrial
machines in different health states is concentrated at several characteristic frequencies. Therefore, the
pretraining stage can guide the subsequent adversarial game, instead of letting the game
start as a random exploration from scratch. The sparsity constraint forces the model to learn more useful
patterns, as also stated by Lei et al. (Lei et al., 2016), so that it can reconstruct or generate vibration
signals with fewer activated neurons.
For the second point, the physical interpretation of the model and the generation mechanism illustrated
in Figure 6 and Figure 12 shows that the neurons have successfully mined the characteristic frequencies
hidden in the vibration signals instead of being confused by the noise. The trained SC-GAN
captures the characteristic frequencies and generates new vibration signals by combining the learned
components of each characteristic frequency rather than copying the time-domain signals.
The stable generation ability and clear generation mechanism interpretation make the proposed SC-GAN
a reliable data augmentation scheme for machine fault diagnosis.
The authors believe that this research provides new inspiration for vibration signal generation methods
applied to data augmentation for machine fault diagnosis. As described in the Introduction, most existing
methods focus on generating FFT spectra or handcrafted features of a raw vibration signal, which inevitably
causes information loss. The few studies that attempt to generate raw vibration signals rely on more
complex GAN architectures to improve the learning capability of the model. This research instead proposes
two other ingredients, namely, the pretraining mechanism and sparse regularization constraints.
Pretraining has been widely employed in deep learning and ensures that the
GAN has a good initial state. During AE-based pretraining, the network learns key features of the
vibration signal, thereby avoiding the instability caused by training from scratch. Sparse
regularization constraints have been shown to improve model performance; in particular, in
(Lei et al., 2016), sparse filtering feature learning enables the network to learn meaningful frequency
components in vibration signals. This research further illustrates the important role of sparsity in vibration
signal generation tasks. The authors suggest that future research could employ other GAN architectures
for vibration signal generation while fully utilizing these two ingredients, or could apply a GAN with
these two improvements to other data that are currently difficult to generate.
7 Conclusions
In this paper, a novel data augmentation scheme for machine fault diagnosis based on an SC-GAN that
contains a two-stage training process is developed. The scheme can adaptively capture the distribution of
vibration signals and stably generate high-quality raw vibration data without a complicated model structure.
Furthermore, a method of interpreting the patterns learned by the SC-GAN is proposed. To the best of our
knowledge, this is one of the early attempts to reveal the mechanism of a GAN-based data
augmentation method. In a case study on rolling element bearings, vibration signals under ten health
conditions are generated, and the quality of the signals is evaluated by a frequency spectra analysis and a fault
diagnosis task. The generated samples show high consistency with real samples in the time and frequency
domains and significantly contribute to improvements in diagnostic performance when real data are lacking.
The learned patterns are obtained from the model to support the interpretation of the generation mechanism.
Additionally, another case study on a gearbox validates the generation ability of the SC-GAN for other types
of industrial machines, whose vibration signals are noisier and more complicated.
In future research, the authors would like to focus on the combination of GAN-based data augmentation
and transfer learning methods. In the experiment performed to evaluate the generation quality by a fault
diagnosis task of rolling element bearings, the results show that although there is a significant increase in the
diagnosis accuracy, it is difficult to obtain an accuracy as high as that achieved using real data. This finding
may indicate that there is still a divergence between the distributions of real and generated data. Transfer
learning is probably a feasible way of narrowing this gap, so further investigation is warranted.
Acknowledgment
This study was supported by the National Natural Science Foundation of China (Grant Nos. 61973011
and 61803013), the Fundamental Research Funds for the Central Universities (Grant Nos. YWF-20-BJ-J-517
and YWF-20-BJ-J-723), the National Key Laboratory of Science and Technology on Reliability and
Environmental Engineering (Grant Nos. 6142004180501 and 6142004190501), the Research Fund (Grant No.
61400020401), the Capital Science & Technology Leading Talent Program (Grant No. Z191100006119029),
and the China Postdoctoral Science Foundation (Grant No. 2019M650438).
References
Bunkhumpornpat, C., Sinapiromsaran, K., Lursinsap, C., 2009. Safe-Level-SMOTE: Safe-Level-Synthetic
Minority Over-Sampling TEchnique for Handling the Class Imbalanced Problem, in: Theeramunkong, T.,
Kijsirikul, B., Cercone, N., Ho, T.-B. (Eds.), Advances in Knowledge Discovery and Data Mining.
Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 475–482.
Cabrera, D., Sancho, F., Long, J., Sanchez, R.V., Zhang, S., Cerrada, M., Li, C., 2019. Generative
Adversarial Networks Selection Approach for Extremely Imbalanced Fault Diagnosis of Reciprocating
Machinery. IEEE Access 7, 70643–70653. https://doi.org/10.1109/ACCESS.2019.2917604
Chawla, N. V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., 2002. SMOTE: Synthetic minority
over-sampling technique. J. Artif. Intell. Res. 16, 321–357. https://doi.org/10.1613/jair.953
Chen, J., Li, Z., Pan, J., Chen, G., Zi, Y., Yuan, J., Chen, B., He, Z., 2016. Wavelet transform based on inner
product in fault diagnosis of rotating machinery: A review. Mech. Syst. Signal Process.
https://doi.org/10.1016/j.ymssp.2015.08.023
Chen, J., Shen, Y., Ali, R., 2019. Credit Card Fraud Detection Using Sparse Autoencoder and Generative
Adversarial Network. 2018 IEEE 9th Annu. Inf. Technol. Electron. Mob. Commun. Conf. IEMCON
2018 1054–1059. https://doi.org/10.1109/IEMCON.2018.8614815
Chen, Z., Zhai, W., Wang, K., 2019. Vibration feature evolution of locomotive with tooth root crack
propagation of gear transmission system. Mech. Syst. Signal Process. 115, 29–44.
https://doi.org/10.1016/j.ymssp.2018.05.038
Cleveland, W.S., 1981. LOWESS: A program for smoothing scatterplots by robust locally weighted
regression. Am. Stat. 35, 54.
Dibaj, A., Ettefagh, M.M., Hassannejad, R., Ehghaghi, M.B., 2020. A hybrid fine-tuned VMD and CNN
scheme for untrained compound fault diagnosis of rotating machinery with unequal-severity faults.
Expert Syst. Appl. 114094. https://doi.org/10.1016/j.eswa.2020.114094
Ding, Y., Ma, L., Ma, J., Wang, C., Lu, C., 2019. A generative adversarial network-based intelligent fault
diagnosis method for rotating machinery under small sample size conditions. IEEE Access 7,
149736–149749. https://doi.org/10.1109/ACCESS.2019.2947194
Gao, Y., Liu, X., Xiang, J., 2020. FEM Simulation-Based Generative Adversarial Networks to Detect
Bearing Faults. IEEE Trans. Ind. Informatics 16, 4961–4971. https://doi.org/10.1109/TII.2020.2968370
Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio,
Y., 2014. Generative Adversarial Nets, in: Proceedings of the 27th International Conference on Neural
Information Processing Systems - Volume 2, NIPS’14. MIT Press, Cambridge, MA, USA, pp. 2672–
2680.
Guo, Q., Li, Y., Song, Y., Wang, D., Chen, W., 2020. Intelligent Fault Diagnosis Method Based on Full 1-D
Convolutional Generative Adversarial Network. IEEE Trans. Ind. Informatics 16, 2044–2053.
https://doi.org/10.1109/TII.2019.2934901
He, Z., Shao, H., Cheng, J., Zhao, X., Yang, Y., 2020. Support tensor machine with dynamic penalty factors
and its application to the fault diagnosis of rotating machinery with unbalanced data. Mech. Syst.
Signal Process. 141, 106441. https://doi.org/10.1016/j.ymssp.2019.106441
Jia, F., Lei, Y., Lin, J., Zhou, X., Lu, N., 2016. Deep neural networks: A promising tool for fault
characteristic mining and intelligent diagnosis of rotating machinery with massive data. Mech. Syst.
Signal Process. 72–73, 303–315. https://doi.org/10.1016/j.ymssp.2015.10.025
Lei, Y., He, Z., Zi, Y., 2011. EEMD method and WNN for fault diagnosis of locomotive roller bearings.
Expert Syst. Appl. 38, 7334–7341. https://doi.org/10.1016/j.eswa.2010.12.095
Lei, Y., Jia, F., Lin, J., Xing, S., Ding, S.X., 2016. An Intelligent Fault Diagnosis Method Using
Unsupervised Feature Learning Towards Mechanical Big Data. IEEE Trans. Ind. Electron. 63,
3137–3147. https://doi.org/10.1109/TIE.2016.2519325
Li, X., Zhang, W., Ding, Q., 2019. Cross-domain fault diagnosis of rolling element bearings using deep
generative neural networks. IEEE Trans. Ind. Electron. 66, 5525–5534.
https://doi.org/10.1109/TIE.2018.2868023
Liu, H., Zhou, J., Xu, Y., Zheng, Y., Peng, X., Jiang, W., 2018. Unsupervised fault diagnosis of rolling
bearings using a deep neural network based on generative adversarial networks. Neurocomputing 315,
412–424. https://doi.org/10.1016/j.neucom.2018.07.034
Mao, W., Liu, Y., Ding, L., Li, Y., 2019. Imbalanced fault diagnosis of rolling bearing based on generative
adversarial network: A comparative study. IEEE Access 7, 9515–9530.
https://doi.org/10.1109/ACCESS.2018.2890693
Pan, T., Chen, J., Xie, J., Chang, Y., Zhou, Z., 2020. Intelligent fault identification for industrial automation
system via multi-scale convolutional generative adversarial network with partially labeled samples. ISA
Trans. 101, 379–389. https://doi.org/10.1016/j.isatra.2020.01.014
Radford, A., Metz, L., Chintala, S., 2016. Unsupervised Representation Learning with Deep Convolutional
Generative Adversarial Networks.
Rauber, T.W., De Assis Boldt, F., Varejão, F.M., 2015. Heterogeneous feature models and feature selection
applied to bearing fault diagnosis. IEEE Trans. Ind. Electron. 62, 637–646.
https://doi.org/10.1109/TIE.2014.2327589
Shao, S., Wang, P., Yan, R., 2019. Generative adversarial networks for data augmentation in machine fault
diagnosis. Comput. Ind. 106, 85–93. https://doi.org/10.1016/j.compind.2019.01.001
Smith, W.A., Randall, R.B., 2015. Rolling element bearing diagnostics using the Case Western Reserve
University data: A benchmark study. Mech. Syst. Signal Process.
https://doi.org/10.1016/j.ymssp.2015.04.021
Wang, J., Li, S., Han, B., An, Z., Bao, H., Ji, S., 2019. Generalization of Deep Neural Networks for
Imbalanced Fault Classification of Machinery Using Generative Adversarial Networks. IEEE Access 7,
111168–111180. https://doi.org/10.1109/ACCESS.2019.2924003
Wang, Z., Wang, J., Wang, Y., 2018. An intelligent diagnosis scheme based on generative adversarial
learning deep neural networks and its application to planetary gearbox fault pattern recognition.
Neurocomputing 310, 213–222. https://doi.org/10.1016/j.neucom.2018.05.024
Wu, J. Da, Liu, C.H., 2009. An expert system for fault diagnosis in internal combustion engines using
wavelet packet transform and neural network. Expert Syst. Appl. 36, 4278–4286.
https://doi.org/10.1016/j.eswa.2008.03.008
Zhang, W., Li, Xiang, Jia, X.D., Ma, H., Luo, Z., Li, Xu, 2020. Machinery fault diagnosis with imbalanced
data using deep generative adversarial networks. Meas. J. Int. Meas. Confed. 152, 107377.
https://doi.org/10.1016/j.measurement.2019.107377
Zhang, Y., Li, X., Gao, L., Wang, L., Wen, L., 2018. Imbalanced data fault diagnosis of rotating machinery
using synthetic oversampling and feature learning. J. Manuf. Syst. 48, 34–50.
https://doi.org/10.1016/j.jmsy.2018.04.005
Zheng, T., Song, L., Wang, J., Teng, W., Xu, X., Ma, C., 2020. Data synthesis using dual discriminator
conditional generative adversarial networks for imbalanced fault diagnosis of rolling bearings. Meas. J.
Int. Meas. Confed. 158, 107741. https://doi.org/10.1016/j.measurement.2020.107741
Zhou, F., Yang, S., Fujita, H., Chen, D., Wen, C., 2020. Deep learning fault diagnosis method based on
global optimization GAN for unbalanced data. Knowledge-Based Syst. 187, 104837.
https://doi.org/10.1016/j.knosys.2019.07.008
... GAN can be utilized to generate pseudo data that is similar to real data. Ma et al [28] presented a sparse constrained GAN approach that can expand data for fault diagnosis. Liu et al [29] developed a conditional variational auto encoding GAN for fault diagnosis. ...
Article
Full-text available
Intelligent diagnosis of mechanical faults is an important means to guarantee the safe maintenance of equipment. Cross domain diagnosis may lack sufficient measurement data as support, and this bottleneck is particularly prominent in high-end manufacturing. This paper presents a few-shot fault diagnosis methodology based on meta transfer learning for gearbox. To be specific, firstly, the subtasks for transfer diagnosis are constructed, and then joint distribution adaptation is conducted to align the two domain distributions; secondly, through adaptive manifold regularization, the data of target working condition is further utilized to explore the potential geometric structure of the data distribution. Meta stochastic gradient descent is explored to dynamically adjust the model’s parameter based on the obtained task information to obtain better generalization performance, ultimately to achieve transfer diagnosis of gearbox faults with few samples. The effectiveness of the approach is supported by the experimental datasets of the gearbox.
... This method lowers high-frequency noise in the input signal while capturing distant relationships in time series data [28]. While variational auto-encoders (VAEs) were used for anomaly identification in wind turbines, generative adversarial networks (GANs) were used to generate training data in gearbox defect diagnostics [29,30]. The ML -based data-driven methodologies have attracted a lot of interest in this area. ...
Article
Full-text available
The rolling elements of the induction motor are highly susceptible to faults. The detection and diagnosis of rolling element faults are accurate and reliable only when the extracted features are accurate. The paper proposes an approach for bearing and rotor fault diagnosis using deep optimal feature extraction and selection based on vibration signal analysis. The deep feature extraction is done using an ensemble deep models features extraction approach in which features are extracted from seven pretrained models are fused serially using serial-based feature fusion technique. This leads to a solution for a higher efficacy model, but at the cost of high processing time as the feature data set gets large. A unique approach termed Ensemble Feature Selection has been developed to address this issue and limit the harmful impact of unwanted features in data-driven diagnostics. The processing time is further reduced using the shallow classifier at the fully connected layer. The proposed model is tested using the data acquired in the laboratory and validated using the available online benchmark data sets.
... • An inefficient and expensive feature of traditional GANs is that they can generate only one sample at a time. • According to Ma et al. (2021), raw vibration signals cannot be reliably generated by some GAN-based methods. ...
Preprint
Full-text available
In this era of advanced manufacturing, it's now more crucial than ever to diagnose machine faults as early as possible to guarantee their safe and efficient operation. With the massive surge in industrial big data and advancement in sensing and computational technologies, data-driven Machinery Fault Diagnosis (MFD) solutions based on machine/deep learning approaches have been used ubiquitously in manufacturing. Timely and accurately identifying faulty machine signals is vital in industrial applications for which many relevant solutions have been proposed and are reviewed in many articles. Despite the availability of numerous solutions and reviews on MFD, existing works often lack several aspects. Most of the available literature has limited applicability in a wide range of manufacturing settings due to their concentration on a particular type of equipment or method of analysis. Additionally, discussions regarding the challenges associated with implementing data-driven approaches, such as dealing with noisy data, selecting appropriate features, and adapting models to accommodate new or unforeseen faults, are often superficial or completely overlooked. Thus, this survey provides a comprehensive review of the articles using different types of machine learning approaches for the detection and diagnosis of various types of machinery faults, highlights their strengths and limitations, provides a review of the methods used for condition-based analyses, comprehensively discusses the available machinery fault datasets, introduces future researchers to the possible challenges they have to encounter while using these approaches for MFD and recommends the probable solutions to mitigate those problems. The future research prospects are also pointed out for a better understanding of the field. We believe this article will help researchers and contribute to the further development of the field.
Article
Recently, deep learning has been a predominantly used technique for intelligent fault diagnosis of industrial machines. It has accomplished satisfactory performance as well. However, noise is present in a real-life industrial working environment, and the operational load also constantly changes. This work proposes a Time-Frequency Fusion Network (TFFNet) for intelligent fault diagnosis. It is robust convolutional neural network based deep-learning algorithm and eliminates the signal processing required for denoising. The success of the developed model is verified in the presence of real-time noisy conditions and under a load-varying environment. The proposed model attained 99.98% accuracy in a noisy environment and 98.6% average accuracy under six cases of domain shift. Finally, the results are compared with past studies using accuracy as a performance indicator.
Article
Full-text available
Diagnosis of rolling bearings plays an important role in condition monitoring of industrial rotating machinery. In many actual applications, rolling bearings work in normal state at most time and faulty samples are difficult to be collected. Thus, it is easy to arise problem of imbalanced dataset which restricts accuracy and stability of fault diagnosis. Generative adversarial networks (GANs) have been proved to be effective to produce artificial data that are alike real data, and have been widely used in image fields. Data synthesis using deep generative model provide a promising methodology for imbalanced fault diagnosis of machinery. In this paper, we propose a novel framework named dual discriminator conditional generative adversarial networks (D2CGANs) to learn from sensor signals on multimodal fault samples and automatically synthesize realistic one-dimensional signals of each fault. The framework is designed to produce realistic multimodal samples with fault labels and dual-discriminator structure is benefit to enhance the quality and diversity of synthesized data without mode collapse. Then, synthesized data can be used for data augmentation to improve the accuracy of imbalanced fault diagnosis. In order to evaluate the performance of the generative model, we introduce a set of assessments to evaluate quality and diversity of synthesized data, including quantitative statistical metrics and qualitative visualization. Finally, experiments on rolling bearings datasets from Case Western Reserve University (CWRU) are implemented to verify the effectiveness of the proposed approach for imbalanced fault diagnosis. Results demonstrate our method outperforms other widely used synthesis techniques in terms of data synthesis quality and fault diagnosis accuracy, and timeliness analysis also denotes our method can meet requirement of online fault diagnosis.
Rolling bearings are widely used parts in most industrial automation systems. As a result, intelligent fault identification of rolling bearings is important for ensuring the stable operation of these systems. However, a major problem in intelligent fault identification is that it needs a large number of labeled samples to obtain a well-trained model. To address this problem, the paper proposes a semi-supervised multi-scale convolutional generative adversarial network for bearing fault identification, which uses partially labeled samples and sufficient unlabeled samples for training. The network adopts a one-dimensional multi-scale convolutional neural network as the discriminator and a multi-scale deconvolutional neural network as the generator, and the model is trained through an adversarial process. Because it makes full use of unlabeled samples, the proposed semi-supervised model can detect faults in bearings with limited labeled samples. The proposed method is tested on three datasets, and the average classification accuracy reaches 100%, 99.28% and 96.58%, respectively. The results indicate that the proposed semi-supervised convolutional generative adversarial network achieves satisfactory performance in bearing fault identification when the labeled data are insufficient.
Rotating machinery plays a key role in mechanical equipment, and the fault diagnosis of rotating machinery is a popular research topic. To overcome the dependency on expert knowledge regarding conventional time-frequency analysis diagnosis methods, machine learning (ML) and artificial intelligence (AI)-based methods are commonly studied. Although these methods can achieve high-accuracy diagnosis results, they are based on a large number of training samples. A generative adversarial network (GAN) is an algorithm with the capability of generating realistic samples that are similar to the real samples, and it can be applied to solve fault diagnosis problems with insufficient training data, which is called the small sample size condition in this study. However, a single-GAN model cannot achieve a good diagnostic result. To achieve adaptive feature extraction and high diagnosis accuracy, this study proposes an intelligent fault diagnosis method for rotating machinery based on GANs under small sample size conditions. The effectiveness and performance of the proposed method are validated using rolling bearing and gearbox datasets. In these datasets, only 10% and 20% of the samples are selected as the training data. Samples associated with different health conditions and various working conditions are included in the datasets. Compared with those of other diagnosis methods, the high-accuracy and low-volatility diagnosis results indicate that the proposed method can stably distinguish fault modes under different working conditions in an adaptive way, even though few training samples are available.
Mechanical fault datasets are always highly imbalanced, with abundant common mechanical fault samples but a paucity of samples from rare fault conditions. To overcome this weakness, simulation of rare fault signals is proposed in this paper. Specifically, frequency spectra are employed as model signals, and a Wasserstein generative adversarial network (WGAN) is then implemented to generate simulated signals based on a labeled dataset. Finally, the real and artificial signals are combined to train stacked autoencoders (SAE) to detect mechanical health conditions. To validate the effectiveness of the proposed WGAN-SAE method, two specially designed experiments are carried out, and some traditional methods are adopted for comparison. The diagnosis results show that the proposed method can deal with the imbalanced fault classification problem much more effectively. The improved performance is mainly due to the artificial fault signals generated by the WGAN to balance the dataset, in which the signals that are lacking in the training dataset are effectively augmented. Furthermore, the learned features in each layer of the generator network are also analyzed via visualization, which may help us understand the working process of the WGAN.
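The abstract above uses frequency spectra, rather than raw waveforms, as the signals to be modelled. As a minimal sketch of what such a spectrum is, the following computes a single-sided amplitude spectrum with a direct discrete Fourier transform. It is an illustration only: the function name, the 256 Hz sampling rate, and the 16 Hz test tone are assumptions, and the cited work's actual preprocessing is not described here. A real pipeline would normally use an FFT routine instead of this O(n²) loop.

```python
import cmath
import math


def amplitude_spectrum(signal):
    """Single-sided amplitude spectrum of a real signal via a direct DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        # The DC bin is scaled by 1/n; other bins are doubled to account
        # for the mirrored negative-frequency half of a real signal.
        scale = 1.0 / n if k == 0 else 2.0 / n
        spectrum.append(abs(coeff) * scale)
    return spectrum


# Hypothetical test signal: a unit-amplitude 16 Hz tone sampled at 256 Hz for 1 second.
fs = 256
n = 256
signal = [math.sin(2 * math.pi * 16 * i / fs) for i in range(n)]

spectrum = amplitude_spectrum(signal)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * fs / n  # frequency resolution is fs / n
```

Here `peak_hz` recovers the 16 Hz tone, and the spectrum vector has half the length of the time-domain signal.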
In the case of a compound fault diagnosis of rotating machinery, when two failures with unequal severity occur in distinct parts of the system, the detection of a minor fault is a complicated and challenging task. In this case, the minor fault is overshadowed by the more severe one, and the characteristics of the compound fault are prone to the more severe one. Generally, the proposed methods in the literature consider compound failure as an individual fault type and unrelated to the corresponding single faults, either at the different locations of a sensitive component or in two separate parts, such as the bearing and gear, with approximately the same fault severity. Considering these issues, this study proposes a novel end-to-end fault diagnosis method based on fine-tuned VMD and convolutional neural network (CNN). The main idea is that CNN is trained only on a healthy and single fault dataset, without the use of compound fault data in training. In the test stage of the CNN model, the intelligent method alarms an untrained compound fault state if acquired probabilities of CNN output satisfy a set of probabilistic conditions. The performance of the fine-tuned VMD and the proposed hybrid method is evaluated by the decomposition of a simulated vibration signal and the analysis of a gearbox system with a compound fault scenario in such a way that one fault is minor and the other severe. The results obtained show the high accuracy of the proposed method in compound fault diagnosis and the feature extraction and classification of a minor fault in the presence of a more severe one.
Despite recent advances in intelligent data-driven fault diagnosis methods for rotating machines, most studies assume balanced training data for different machine health conditions. However, signals from faulty machine states are usually difficult and expensive to collect, resulting in an imbalanced training dataset in most cases. This significantly reduces the effectiveness of existing data-driven approaches. This paper proposes a deep learning-based fault diagnosis method to address the imbalanced data problem by explicitly creating additional training data. Generative adversarial networks are first used to learn the mapping between the noise distribution and the distribution of real temporal machinery vibration data, and additional realistic fake samples can then be generated to balance and further expand the available dataset. Experiments on two rotating machinery datasets validate that data-driven methods can benefit significantly from this data augmentation and that the proposed method offers a promising tool for fault diagnosis with imbalanced training data.
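To make the balancing step concrete, the sketch below counts how many synthetic samples a generator would need to supply per class so that every class matches the largest one. This is only one possible balancing policy, assumed for illustration; the class names and counts are hypothetical, and the generator itself is outside the scope of the sketch.

```python
from collections import Counter


def synthetic_samples_needed(labels):
    """Number of generated samples per class needed to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - count for cls, count in counts.items()}


# Hypothetical imbalanced label set: many normal samples, few faulty ones.
labels = ["normal"] * 500 + ["inner_race"] * 40 + ["outer_race"] * 25
plan = synthetic_samples_needed(labels)
```

With these counts the plan asks for 460 generated inner-race samples and 475 generated outer-race samples, and none for the normal class.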
Complete fault samples are essential for training artificial intelligence (AI) models. A novel fault detection scheme is proposed to build a bridge between AI and real-world running mechanical systems. First, finite element method (FEM) simulation is used to produce samples with different faults, overcoming the shortage of fault samples. Second, to enlarge the datasets, new samples similar to the simulated and measured fault samples are generated by generative adversarial networks (GANs) and combined with the original simulated and measured samples to obtain synthetic samples. Finally, the synthetic samples and the unknown fault samples serve as the training and test samples, respectively, for the classifiers of the AI models, and the unknown fault types are then determined. A public bearing dataset has been used to verify the effectiveness of the proposed scheme. It is expected that the proposed scheme can be extended to complex mechanical systems.
Fault diagnosis methods for rotating machinery based on machine learning, such as the support vector machine (SVM) and the convolutional neural network (CNN), have been developed in recent years. An SVM can only classify data in a vector space, where the features extracted from raw signals are input in vector form, so it is not applicable when the input features are high-order tensors, which can contain rich feature information about rotating machinery. Moreover, a CNN needs a large amount of data, but it is hard to obtain large numbers of fault samples of rotating machinery under different conditions. A recently developed kind of tensor classifier, the support tensor machine (STM), can solve these problems. However, when the input samples of an STM are unbalanced, the hyperplane obtained by training may not be optimal, which may reduce the overall classification rate. Therefore, in this paper, a novel tensor classifier called the support tensor machine with dynamic penalty factors (DC-STM) is proposed and applied to the fault diagnosis of rotating machinery. In this method, for the linearly separable case, a linear support tensor model with dynamic penalty factors (DC-LSTM) is proposed, which does not ignore the impact on the structural risk of the rare support vectors of a class with fewer training samples. Subsequently, for the nonlinearly separable case, a tensor kernel function is introduced into DC-LSTM, and a nonlinear support tensor model with dynamic penalty factors (DC-NSTM) is proposed. To verify the performance of DC-STM in unbalanced data classification, it is applied to fault classification of rotating machinery with unbalanced data. The experimental results show that the proposed method achieves better classification results when the training samples of rotating machinery are unbalanced.
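The abstract above does not give the formula for its dynamic penalty factors. As a rough illustration of the general idea of class-dependent penalties, the sketch below uses a widely used heuristic that weights each class inversely to its frequency, so that errors on a rare class cost more. This heuristic is an assumption and is not claimed to be the DC-STM rule; the function name and the label counts are hypothetical.

```python
from collections import Counter


def class_penalties(labels, base_penalty=1.0):
    """Per-class penalty factors, inversely proportional to class frequency.

    Uses the common heuristic C_c = C * n_samples / (n_classes * n_c),
    which is an illustrative stand-in, not the cited method's formula.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: base_penalty * n_samples / (n_classes * count)
            for cls, count in counts.items()}


# Hypothetical unbalanced training labels: 90 normal samples, 10 fault samples.
labels = ["normal"] * 90 + ["fault"] * 10
penalties = class_penalties(labels)
```

With a 9:1 class ratio, the fault class receives a penalty nine times larger than the normal class, which pushes the separating hyperplane away from the rare class's support vectors.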
Data-driven fault diagnosis is essential for the reliability and safety of industrial equipment. However, the lack of real labeled fault data makes machine learning based diagnosis methods difficult to carry out. To solve this problem, this paper proposes a new fault diagnosis framework called multi-label 1D generation adversarial network (ML1D-GAN). In our method, an Auxiliary Classifier GAN (AC-GAN) is first used to generate real damage data. Then the generated and real damage data are both used to train the fault classifier. Experimental results reveal that the generated data are applicable and that ML1D-GAN improves the diagnostic accuracy for real bearing faults from 95% to 98% when trained with the generated data. The scalability of the learning model is also demonstrated in the experiment.
Deep learning can be applied to fault diagnosis because of its powerful feature representation capabilities. When the available samples of a certain fault class are very limited, the dataset is inevitably unbalanced. The fault features extracted from unbalanced data via deep learning are inaccurate, which can lead to a high misclassification rate. To solve this problem, a new generator and discriminator for the generative adversarial network (GAN) are designed in this paper to generate more discriminative fault samples using a global optimization scheme. The generator is designed to generate the fault features extracted from a few fault samples via an autoencoder (AE) instead of raw fault data samples. The training of the generator is guided by the fault features and the fault diagnosis error instead of the statistical coincidence used in a traditional GAN. The discriminator is designed to filter out unqualified generated samples, in the sense that qualified samples help achieve more accurate fault diagnosis. Experimental results on rolling bearings verify the effectiveness of the proposed algorithm.