IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 69, NO. 10, OCTOBER 2022 10735
DSTED: A Denoising Spatial–Temporal Encoder–Decoder Framework for Multistep Prediction of Burn-Through Point in Sintering Process
Feng Yan, Chunjie Yang, Senior Member, IEEE, and Xinmin Zhang, Member, IEEE
Abstract: Sinter ore is the main raw material of the blast furnace, and the burn-through point (BTP) has a direct influence on the yield, quality, and energy consumption of the ironmaking process. Since iron ore sintering is a very complex industrial process with strong nonlinearity, multivariable coupling, random noise, and time variation, it is hard for traditional soft-sensor models to learn the knowledge of the sintering process. In this article, a multistep prediction model, called denoising spatial–temporal encoder–decoder, is developed to predict BTP in advance. First, mechanism analysis is carried out to determine the BTP-relevant variables, and the BTP prediction is defined as a sequence-to-sequence modeling problem. Second, motivated by the random noise of industrial data, a denoising gated recurrent unit (DGRU) is designed to alleviate the impact of noise by adding a denoising gate into the GRU. In this way, the encoder with DGRU can better extract the latent variables of the original sequence data. Then, spatial–temporal attention is embedded into the decoder to simultaneously capture the time-wise and variable-wise correlations between the latent variables and the target variable BTP. Finally, experimental results on a real-world dataset of a sintering process demonstrate that the integrated multistep prediction model is effective and feasible.
Index Terms: Burn-through point (BTP), denoising gated recurrent unit (DGRU), multistep prediction, soft-sensor, spatial–temporal attention.
Manuscript received December 3, 2021; revised January 23, 2022; accepted February 3, 2022. Date of publication February 23, 2022; date of current version May 2, 2022. This work was supported by the National Natural Science Foundation of China under Grant 61933015. (Corresponding author: Chunjie Yang.)
The authors are with the State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou 310000, China (e-mail: yanfeng555@zju.edu.cn; cjyang999@zju.edu.cn; xinminzhang@zju.edu.cn).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TIE.2022.3151960.
Digital Object Identifier 10.1109/TIE.2022.3151960
I. INTRODUCTION
The sintering process, a primary operation in the iron and steel industry, has received increasing attention in the past few years [1], [2]. Sinter ore is the main iron-bearing raw material of most blast furnaces, and its quality is mainly affected by operation and state parameters. The burn-through point (BTP), one of the most important state parameters, represents the position where the sintering process is completed. However, sintering is a very complicated physical and chemical process, and the whole process is nonlinear and dynamic. These characteristics make it quite hard to establish a precise mathematical model to predict BTP in the sintering process [3], [4]. The position of BTP directly influences the quality of the sintered product. For example, if BTP is located in front of the optimal position, the iron ore is not fully burned; if BTP is behind the desired position, the quality of iron ore cannot meet the requirements of blast furnace ironmaking [5]. Therefore, the accurate prediction of BTP is of great significance to the normal operation of the sintering process and the improvement of product quality.
According to previous investigations, two types of BTP prediction approaches have usually been examined: mechanism-based mathematical models and data-driven models. For example, a mathematical model was established by Cao et al. [6] to directly predict BTP using pallet velocity, bed depth, and other state parameters. However, the mechanism-based model is very complicated and time-consuming, and cannot meet the requirements of real-time prediction. Thus, an increasing number of scholars have begun to examine the internal properties and patterns of the sintering process using artificial intelligence. Toktassynova et al. [7] used the gray theory model GM(1, n), optimized by the particle swarm algorithm, to predict BTP using a small amount of data from the beginning of the sintering process. Afterward, Liu et al.
[8] established a prediction system of BTP based on the gradient
boosting decision tree algorithm with the combination of process
knowledge and feature selection method. Subsequently, a hybrid
BTP prediction model was presented based on an artificial neural
network and multilinear regression error compensation algo-
rithm [9]. Moreover, a dynamic subspace model was developed
to predict BTP based on pallet velocity, the thickness of the
material layer, and other operation variables [10]. In recent years, fuzzy neural networks have been widely used in intelligent prediction systems. For instance, Wang et al. [11] presented a fuzzy neural network that can deal with fuzzy information and has the ability of self-learning in the sintering process. In another related study, Du et al. [12] designed a hybrid fuzzy time-series prediction model of BTP with the fuzzy c-means clustering. Besides, a fluctuation interval prediction model of
BTP was proposed based on principal component analysis, the fuzzy information granulation method, and the Elman neural network [13]. Factory experiments indicated that it can effectively predict the fluctuation interval of the BTP and lay a solid foundation for the stable operation of the sintering process.
These classical modeling methods can hardly extract knowledge of complex industrial processes from historical data. Fortunately, with the emergence of deep learning [14], [15], great breakthroughs have been made in many research fields, such as computer vision [16], natural language processing [17], and speech recognition [18]. As a result, more and more scholars have started to pay attention to deep learning modeling methods and apply them to industrial processes [19]–[21]. For
instance, Sun et al. [22] proposed a novel ensemble semisuper-
vised gated stacked autoencoder for key performance indicators
prediction to deal with those excessive unlabeled samples. Fur-
thermore, the memory-gated-based autoencoder was adopted to
detect and diagnose indoor air quality and numerical experi-
ments demonstrated that the measurement model was effective
[23]. Apart from autoencoder, the recurrent neural network was
also applied to industrial processes. Yuan et al. [24] designed a
supervised long short-term memory (LSTM) network to learn
quality-relevant variables in the penicillin fermentation process
and an industrial debutanizer column. In addition, a specific type of recurrent network, called the echo-state network (ESN), was adopted to estimate key process variables of the sulfur recovery unit (SRU) [25]. However, the traditional ESN cannot capture long-term dependencies in soft-sensor modeling. To solve this problem, the asynchronously deep ESN and the singular value decomposition-based ESN (SVD-ESN) were proposed one after another, and their validity was demonstrated on modeling real-life soft sensors [26], [27]. The studies above show that deep learning has been used extensively in modern industrial processes.
However, data-driven methods for the sintering process based on deep learning are rare due to the difficulty of obtaining data and the complexity of the process. Existing research works have not achieved the multistep prediction of BTP, which is very important for sintering process control. Given this situation, how to make full use of deep learning models, with their strong ability to map complex nonlinear relations, to explore the BTP multistep prediction issue has become a challenging and urgent problem. In addition, through detailed field investigation and mechanism analysis, we have identified several problems. First, the sintering process is dynamic, and the BTP multistep prediction can be regarded as a sequence-to-sequence learning task, which traditional recurrent neural networks such as LSTM [28] and the gated recurrent unit (GRU) [29] struggle to handle. Second, the industrial environment is extremely complex, and the time-series data collected from electronic sensors are usually mixed with noise, which also brings challenges to sequence modeling. Finally, the sintering process is nonlinear with multivariable coupling, and it is difficult for existing models to capture the target-relevant hidden dynamics.
To address these challenging problems, an end-to-end framework called the denoising spatial–temporal encoder–decoder (DSTED) is developed in this article, which treats the BTP multistep prediction as a sequence-to-sequence task.
Fig. 1. Process flowchart of iron ore sintering.
The specific contributions of this article are summarized as
follows.
1) According to the sintering mechanism, the BTP-relevant
variables are selected as the input features, and the label
BTP is calculated by fitting the exhaust gas temperature
in the bellows.
2) To the best of our knowledge, this is the first work to define the BTP multistep prediction as a sequence-to-sequence problem. Besides, in the encoder network, to alleviate the effect of noisy industrial data, a denoising GRU (DGRU) is proposed to obtain the latent variable representation of the original input variables.
3) In the decoder network, spatial–temporal attention is
designed to model dynamic spatial–temporal correlations
of data. Specifically, the spatial attention is applied to
capture the complex spatial correlations between different
latent variables and the target variable. The temporal
attention is used to obtain the dynamic temporal corre-
lations of the time series.
4) A series of comparative experiments are conducted using
the actual industrial data of the sintering plant, and the re-
sults confirmed that the proposed model is more effective
and feasible as compared to the existing baselines.
The rest of this article is organized as follows. Section II
introduces the mechanism of the sintering process and problem
definition. Section III outlines the procedures of the proposed
method. The results of the case study are discussed in Section IV.
Finally, Section V concludes this article.
II. MECHANISM ANALYSIS AND PROBLEM DEFINITION
In this section, the sintering process and BTP are described
in detail, and the sintering process parameters are classified
into five types systematically. Besides, the data characteristics
are briefly analyzed, and the BTP prediction problem is also
defined.
A. Description of the Sintering Process
The sintering process is a continuous and complex production process with a long process flow and large-scale control equipment. It can be seen as a process in which the powdered material mixture is agglomerated into a massive solid under high-temperature heating conditions. At present, more than 90% of iron and steel
enterprises use the Dwight–Lloyd sintering machine with 24
bellows, as shown in Fig. 1. It is clear that the sintering process
includes five steps: proportioning, mixing, ignition, sintering with ventilation, and cooling and screening.
To study the sintering process more conveniently, the whole
production process is approximately regarded as a dynamic
system. All process parameters are divided into five categories:
raw material parameters, equipment parameters, state parame-
ters, manipulated parameters, and index parameters. From the
perspective of system theory, the index and state parameters
of sintering are produced when raw material and manipulated
parameters act on the equipment.
B. Exploration of Data Analysis
Based on the mechanism description above, we can find that the sintering process is nonlinear, time-varying, and subject to multivariable coupling. Besides, the sintering environment is very complex, and thus some noisy data will be generated. To illustrate the motivation of this study, the following key characteristics need to be described in detail.
1) Time Varying: The whole sintering process is dynamic, and it takes about 40 min to finish the sintering task. During this process, all kinds of process parameters change irregularly as the trolley moves. For example, the bellows negative pressure is constantly adjusted according to the sintering process.
2) Random Noise: The whole sintering process is complex and dynamic, and actual industrial process data usually contain random noise, which makes it difficult to establish a data-driven model. To provide a sufficient basis for the prediction model, random noise analysis was conducted on typical variables using industrial data collected from the sintering factory. The analysis confirms that random noise is usually present in the process data and has an adverse impact on BTP modeling. Hence, it is essential to improve the antinoise performance of the model, and the details are given in Section III.
C. Problem Definition
The BTP multistep prediction can be regarded as a sequence-to-sequence task. Suppose there are $m$ variables in the sintering process, each of which generates a time series. Among these variables, the BTP time series is used as the target variable for making predictions, while the other variables are used as the input features. For the input sequence, the input length is set to $T_h$, so the sequence $X = (x_{t-T_h+1}, x_{t-T_h+2}, \ldots, x_t) \in \mathbb{R}^{T_h \times m}$ is regarded as the whole input sequence at time $t$. Similarly, given a time window of length $T_f$, we use $Y = (y_{t+1}, y_{t+2}, \ldots, y_{t+T_f}) \in \mathbb{R}^{T_f}$ to represent the BTP prediction series.
III. METHODOLOGY
In this section, a DSTED framework is developed for the BTP multistep prediction. The established encoder–decoder model consists of two parts: an encoder with denoising GRU and a decoder with spatial–temporal attention. In the encoder network, the denoising GRU is designed to alleviate the industrial noise and improve the ability of the latent variable extractor. In the decoder network, the temporal attention module is used to learn the temporal dynamics, and the spatial attention module is used to capture the relevance between the latent variables and the target variable. The specific content of the proposed method is as follows.
Fig. 2. Basic encoder–decoder sequence-to-sequence model.
Fig. 3. Structure of DGRU.
A. Basic Encoder–Decoder Framework
The BTP multistep prediction task is a typical sequence-to-sequence problem, so the basic encoder–decoder framework is well suited to it. This framework was proposed by Cho et al. [30] and is made up of two parts: an encoder and a decoder. The core idea is simple: the encoder network reads the input sentence and compresses the information of the whole sentence into a context vector; then, a decoder network decodes the context vector into a sentence of the target language, as shown in Fig. 2. Training maximizes the conditional probability of the target sequence given the source sequence.
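For reference, the training criterion of the encoder–decoder in [30] is usually written as a conditional log-likelihood. In the sketch below, the parameter vector $\theta$ and the number of training pairs $N$ are notation introduced here for illustration; $X_n$ and $Y_n$ denote a source and target sequence pair.

```latex
\max_{\theta} \; \frac{1}{N} \sum_{n=1}^{N} \log p_{\theta}\!\left(Y_n \mid X_n\right)
```

For a real-valued target such as BTP, minimizing a squared-error loss is equivalent to this criterion under a Gaussian output assumption with fixed variance.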
B. Denoising GRU
Essentially, the encoder–decoder framework consists of a series of recurrent neural network (RNN) units. Considering the characteristics of the sintering process and the need for efficiency, the GRU network is selected as the basic module of the encoder to capture the mapping relationships in the time-varying data. However, process data are often corrupted by random noise caused by multiple factors, such as environmental disturbances, human interventions, and faulty sensors. Therefore, it is necessary and meaningful to enhance the performance of the encoder network.
Motivated by this, a new gate called the denoising gate is added to the GRU to alleviate the impact of noise. The resulting DGRU is used in the encoder, as shown in Fig. 3.
Fig. 4. Structure of spatial–temporal attention.
As can be seen from the structure of DGRU, the new unit is composed of three gates: the denoising gate, the reset gate, and the update gate. The denoising gate ($d_t$) adaptively extracts more useful information from the original data. The reset gate ($r_t$) controls how much information of the previous state ($h_{t-1}$) is retained and passed to the next time step. The update gate ($z_t$) determines how much of the current hidden state ($h_t$) is memorized and updated according to the intermediate hidden state ($\tilde{h}_t$).
Suppose the $i$th input variable sequence is $X = \{x^i_{t-T_h+1}, x^i_{t-T_h+2}, \ldots, x^i_t\} \in \mathbb{R}^{T_h}$, and $\mu^i$ and $s^i$ denote the mean and variance of the variable $x^i$. According to the theory of statistics, the variance represents the degree of deviation from the center and is used to measure the volatility of data. Then, a guidance matrix is defined to guide the learning process of the denoising gate:
$$g_t = \left[\mu^1_t, \mu^2_t, \ldots, \mu^m_t;\ s^1_t, s^2_t, \ldots, s^m_t\right] \in \mathbb{R}^{2m}. \tag{1}$$
Next, the deviation factor $e \in \mathbb{R}^m$ and the fluctuation factor $f \in \mathbb{R}^m$ are defined to reflect the deviation and fluctuation range of each variable
$$e = k \odot (1 - \mu) \tag{2}$$
$$f = (1 - k) \odot s \tag{3}$$
where $k$ is the weight coefficient connecting the input layer, and it is used to balance the rate of the two factors $e$ and $f$
$$k = \sigma([x_t, h_{t-1}, g_t] \cdot W_d). \tag{4}$$
Furthermore, the denoising gate $d_t \in \mathbb{R}^m$ is constructed by the denoising function $D(x_t)$, as shown in Fig. 5
$$D(x_t) = x_t + k \odot (1 - \mu) + (1 - k) \odot s \tag{5}$$
$$d_t = D(x_t). \tag{6}$$
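Then, the reset gate ($r_t \in \mathbb{R}^h$) and the update gate ($z_t \in \mathbb{R}^h$) are calculated as follows:
$$r_t = \sigma([h_{t-1}, d_t] \cdot W_r) \tag{7}$$
$$z_t = \sigma([h_{t-1}, d_t] \cdot W_z) \tag{8}$$
$$\tilde{h}_t = \tanh([r_t \odot h_{t-1}, d_t] \cdot W_{\tilde{h}}) \tag{9}$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{10}$$
where $W_d \in \mathbb{R}^{3m}$, $W_r \in \mathbb{R}^{m \times h}$, $W_z \in \mathbb{R}^{m \times h}$, and $W_{\tilde{h}} \in \mathbb{R}^{m \times h}$ are the weights of the denoising gate, reset gate, update gate, and intermediate hidden state, respectively, and $h$ is the number of hidden neurons.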
Finally, the classic backpropagation through time (BPTT) training algorithm is used to update the weight parameters $W_d$, $W_r$, $W_z$, $W_{\tilde{h}}$, and $W_o$ with the Adam optimizer. Note that the network parameters are updated only after the samples have passed through the whole encoder–decoder model. Thus, the specific loss function is given in Section III-D.
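The DGRU update in (1)–(10) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`dgru_step`, `matvec`, `random_matrix`), the random weights, the toy values of `mu` and `s`, and the weight shapes (chosen here so that every matrix product is well defined, which differs from the shapes stated above) are all hypothetical. The denoising function is assumed to read $D(x_t) = x_t + k(1-\mu) + (1-k)s$ with element-wise products.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def dgru_step(x_t, h_prev, mu, s, W_d, W_r, W_z, W_h):
    """One DGRU step for m input variables and a hidden state of size h."""
    g_t = mu + s                       # list concatenation: guidance vector, length 2m
    # Balance coefficient k from the concatenation [x_t, h_{t-1}, g_t], as in (4).
    k = [sigmoid(a) for a in matvec(W_d, x_t + h_prev + g_t)]
    # Denoising gate, assumed form of (5) and (6).
    d_t = [x + k_i * (1.0 - mu_i) + (1.0 - k_i) * s_i
           for x, k_i, mu_i, s_i in zip(x_t, k, mu, s)]
    u = h_prev + d_t                   # concatenation [h_{t-1}, d_t]
    r = [sigmoid(a) for a in matvec(W_r, u)]      # reset gate (7)
    z = [sigmoid(a) for a in matvec(W_z, u)]      # update gate (8)
    gated = [r_i * h_i for r_i, h_i in zip(r, h_prev)] + d_t
    h_tilde = [math.tanh(a) for a in matvec(W_h, gated)]   # intermediate state (9)
    # New hidden state (10): blend of the previous and intermediate states.
    return [(1.0 - z_i) * h_p + z_i * h_c
            for z_i, h_p, h_c in zip(z, h_prev, h_tilde)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
m, h = 3, 4                            # toy sizes, not the paper's settings
W_d = random_matrix(m, m + h + 2 * m, rng)
W_r = random_matrix(h, h + m, rng)
W_z = random_matrix(h, h + m, rng)
W_h = random_matrix(h, h + m, rng)
mu = [0.2, 0.5, 0.1]                   # hypothetical per-variable means
s = [0.05, 0.02, 0.01]                 # hypothetical per-variable variances

h_t = [0.0] * h
for x_t in ([0.3, 0.4, 0.2], [0.1, 0.6, 0.0]):
    h_t = dgru_step(x_t, h_t, mu, s, W_d, W_r, W_z, W_h)
```

Because the new state is a convex combination of the previous state and a tanh output, every component of `h_t` stays inside (-1, 1) when the initial state is zero.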
C. Spatial–Temporal Attention
The dynamic features are extracted by the encoder network
composed of a series of DGRU. It is noted that the dynamic
features are also usually called latent variables in the industrial
field. Considering the complex multivariable coupling and dy-
namic of the sintering process, a novel spatial–temporal attention
mechanism is embedded in the decoder network to capture
the dynamic spatial and temporal relationships between latent
variables and target variables in the industrial process. It contains
two modules, i.e., the temporal attention module and the spatial
attention module, as shown in Fig. 4.
1) Temporal Attention: In the temporal dimension, there exists dynamic relevance between different time slices in the sintering process. Furthermore, the performance of the conventional encoder–decoder network degrades as the length of the input sequence increases. That is to say, each target variable in the output sequence needs different input information, so a single fixed-length context vector fails to provide the output target variable with the pertinent information required in the decoding process. To solve this problem, the latent variable correlations at different time steps are calculated by the temporal attention [31] to adaptively learn the different importance of time-varying samples. In this way, each encoder hidden state is assigned a temporal attention value. Then, an adaptively weighted context vector is obtained as the input for the decoder network. Specifically, each encoder hidden state is assigned an attention value according to the similarity between the current encoder output and the hidden state of the previous decoder. The specific calculation method is as follows:
$$e^j_t = \mathrm{score}(h_{t-T_h+j}, s_t) = V^j_l \tanh\left(W^j_l [h_{t-T_h+j}; s_t] + b^j_l\right) \tag{11}$$
$$\alpha^j_t = \frac{\exp(e^j_t)}{\sum_{j=1}^{T_h} \exp(e^j_t)} \tag{12}$$
where $s_t$ denotes the current decoder hidden state, $h_{t-T_h+j}$ denotes the $j$th encoder hidden output at time step $t$, $T_h$ and $T_f$ are the lengths of the encoder and decoder sequences, respectively, $W^j_l \in \mathbb{R}^{T_h+T_f}$, $V^j_l \in \mathbb{R}^{T_h+T_f}$, and $b^j_l \in \mathbb{R}$ are all learnable parameters, $e^j_t$ is used to compute the similarity between $s_t$ and $h_{t-T_h+j}$, and $\alpha^j_t$ is the attention value at time $t$.
Fig. 5. Multistep deep learning prediction model for burning-through point in the sintering process.
Then, a weighted average of all encoder hidden states is calculated as follows:
$$x_t = \sum_j \alpha^j_t h_{t-T_h+j}. \tag{13}$$
Finally, the latent variables (context vector) are obtained by the nonlinear mapping of the concatenation of $x_t$ and $s_t$ through the hyperbolic tangent activation function (tanh)
$$c_t = \tanh(x_t, s_t) \tag{14}$$
where $n$ is the dimension of the latent variables. Based on the temporal attention, the latent variables $c_t = [c^1_t, c^2_t, \ldots, c^n_t] \in \mathbb{R}^n$ extracted in the decoder network carry more historical information and act as the input of the decoder network.
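The temporal attention steps (11)–(13) can be illustrated with a short plain-Python sketch. It is only an illustration under assumptions: the function names, the toy numbers, and the parameter shapes are hypothetical, and a single set of parameters `W`, `V`, `b` is shared across all encoder steps for brevity, whereas the text indexes the parameters by $j$.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores, as in (12)."""
    top = max(scores)
    exps = [math.exp(v - top) for v in scores]
    total = sum(exps)
    return [v / total for v in exps]


def temporal_attention(enc_states, s_t, W, V, b):
    """Return the attention weights (12) and the weighted encoder state (13)."""
    scores = []
    for h_j in enc_states:
        concat = h_j + s_t                       # concatenation [h_j; s_t]
        hidden = [math.tanh(sum(w * c for w, c in zip(row, concat)) + b_i)
                  for row, b_i in zip(W, b)]
        scores.append(sum(v * a for v, a in zip(V, hidden)))   # scalar score (11)
    alpha = softmax(scores)
    dim = len(enc_states[0])
    x_t = [sum(a * h_j[d] for a, h_j in zip(alpha, enc_states))
           for d in range(dim)]
    return alpha, x_t


# Toy example: three encoder steps with two hidden units each.
enc_states = [[0.1, 0.2], [0.4, -0.3], [0.0, 0.5]]
s_t = [0.2, -0.1]                                # decoder hidden state
W = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]]
V = [0.7, -0.5]
b = [0.0, 0.1]
alpha, x_t = temporal_attention(enc_states, s_t, W, V, b)
```

The weights in `alpha` are positive and sum to one, so `x_t` is a convex combination of the encoder hidden states; identical encoder states would receive equal weights.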
2) Spatial Attention: In the spatial dimension, these latent variables are seen as an advanced representation of the original input variables. Different latent variables can have different effects on the target series $y_t$. Thus, exploring the target-relevant hidden dynamics lays a foundation for predicting the multistep series. However, the impact weights change dynamically over time. Inspired by this fact, spatial attention is designed for the decoder network to capture the correlations between these variables and the target variable, as depicted in Fig. 4. Let $e^k_t$ be the spatial attention score of the $k$th latent variable at time $t$ ($e_t = [e^1_t, e^2_t, \ldots, e^n_t] \in \mathbb{R}^n$); the attention weight (i.e., impact weight) is calculated as follows:
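$$e_t = \mathrm{score}(c_t, y_t) = V^k_l \tanh\left(W^k_l [c_t, y_t] + b^k_l\right) \tag{15}$$
$$\beta^k_t = \frac{\exp(e^k_t)}{\sum_{k=1}^{n} \exp(e^k_t)} \tag{16}$$
$$\tilde{c}_t = \left(\beta^1_t c^1_t, \beta^2_t c^2_t, \beta^3_t c^3_t, \ldots, \beta^n_t c^n_t\right) \tag{17}$$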
where $W^k_l \in \mathbb{R}^{(n+1) \times n}$, $V^k_l \in \mathbb{R}^{(n+1) \times n}$, and $b^k_l \in \mathbb{R}^n$ are the parameters to be learned, $\beta^k_t$ is the spatial attention value, and $\tilde{c}_t = [\tilde{c}^1_t, \tilde{c}^2_t, \ldots, \tilde{c}^n_t] \in \mathbb{R}^n$ is the final latent variables after the spatial attention operation. By exploiting the internal relevance between the target series and the latent variables, this attention mechanism can adaptively learn more potential correlations across different variables and thus make better sequence predictions.
Fig. 6. Schematic diagram of experimental system implementation.
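Unlike temporal attention, which pools the encoder states into one vector, spatial attention rescales each latent variable separately. The sketch below illustrates the normalization in (16) and the reweighting in (17) only; the scores are given directly instead of being computed from (15), and the function name and toy values are hypothetical.

```python
import math


def spatial_reweight(c_t, scores):
    """Softmax the spatial scores (16), then scale each latent variable (17)."""
    top = max(scores)
    exps = [math.exp(e - top) for e in scores]
    total = sum(exps)
    beta = [v / total for v in exps]
    c_tilde = [b_k * c_k for b_k, c_k in zip(beta, c_t)]
    return beta, c_tilde


# Toy example: the first latent variable has the largest score.
beta, c_tilde = spatial_reweight([0.8, -0.2, 0.5], [2.0, 0.0, 0.0])
```

Here the first latent variable keeps most of its magnitude, while the two variables with equal, lower scores are damped by the same factor.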
D. Multistep Prediction Model of BTP Based on DSTED
In summary, a BTP multistep prediction model is developed based on the DSTED. The whole model consists of three parts: data generation, an encoder with denoising GRU, and a decoder with spatial–temporal attention, as shown in Fig. 5. First, the BTP-relevant variables are selected through mechanism analysis, and data are collected from the sinter plant and preprocessed. Second, the dynamic latent variables are extracted by the encoder with DGRU. The initial hidden state of the decoder network is the last hidden output of the encoder network. Next, a spatial–temporal attention module is embedded into the decoder to capture the dynamic correlations between latent variables and the target variable. Then, the extracted latent variables, as well as the target variable output at the previous time step, are both fed into the decoder network. Also, a dropout layer is added to reduce the overfitting caused by the complex structure. Finally, a fully connected layer with a ReLU activation function is appended to forecast the target BTP. After the forward pass is completed, the whole model can be trained via the backpropagation algorithm. During the training process, an Adam optimizer is adopted to train our model by minimizing the following loss function:
$$L = \frac{1}{T_f} \sum_{i=1}^{T_f} \left(y_{t+i} - \hat{y}_{t+i}\right)^2 + \frac{\lambda}{2T_f} \left(\sum_{W} \|W\|_2^2 + \sum_{V} \|V\|_2^2\right) \tag{18}$$
where $W$ and $V$ represent the weights of the encoder and decoder, respectively, and $\lambda$ is the regularization coefficient used to reduce the overfitting of the model. Finally, the development of the proposed multistep prediction model is given in detail in Algorithm 1.

Algorithm 1: Denoising spatial–temporal encoder–decoder model.
Input: The historical sintering data $D = (X, Y)$; hidden size $H$; hidden layers Num; batch size $B$; the length of input $T_h$; the length of output $T_f$; learning rate $\eta$; dropout $p$.
Output: BTP predictions $Y$
for all available time $t$ ($1 \le t \le$ num_samples) do
    $X = (x_{t-T_h+1}, x_{t-T_h+2}, \ldots, x_t) \in \mathbb{R}^{T_h \times m}$
    $Y = (y_{t+1}, y_{t+2}, \ldots, y_{t+T_f}) \in \mathbb{R}^{T_f}$
end for
for epoch $p$ ($1 \le p \le$ num_epoch) do
    for batch $b$ ($1 \le b \le B$) do
        Encoder hidden state: $h_t = \mathrm{encoder}(x_1, x_2, x_3, \ldots, x_{T_h})$
        Temporal attention: $e^j_t = V^j_l \tanh(W^j_l [h_{t-T_h+j}; s_t] + b^j_l)$
        Normalization: $\alpha^j_t = \exp(e^j_t) / \sum_{j=1}^{T_h} \exp(e^j_t)$, $x_t = \sum_j \alpha^j_t h_{t-T_h+j}$
        Latent variable: $c_t = \tanh(x_t, s_t)$
        Spatial attention: $e^k_t = V^k_l \tanh(W^k_l [c_t, y_t] + b^k_l)$
        Normalization: $\beta^k_t = \exp(e^k_t) / \sum_{k=1}^{n} \exp(e^k_t)$
        Final latent variable: $\tilde{c}_t = (\beta^1_t c^1_t, \beta^2_t c^2_t, \beta^3_t c^3_t, \ldots, \beta^n_t c^n_t)$
        BTP prediction output: $Y = \mathrm{decoder}(\tilde{c}_t, y_{t+1}, y_{t+2}, \ldots, y_{t+T_f})$
        The parameters are updated after the whole encoder–decoder by BPTT
    end for
end for
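The loss in (18) combines a mean squared error over the $T_f$ predicted steps with an L2 penalty on the encoder and decoder weights. A minimal plain-Python sketch is given below; the function name `dsted_loss`, the representation of each weight matrix as a list of rows, and all numeric values are assumptions made for illustration.

```python
def dsted_loss(y_true, y_pred, encoder_weights, decoder_weights, lam):
    """Mean squared error over T_f steps plus the L2 penalty of (18)."""
    t_f = len(y_true)
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / t_f
    # Sum of squared entries over every encoder and decoder weight matrix.
    l2 = sum(w * w
             for matrix in encoder_weights + decoder_weights
             for row in matrix
             for w in row)
    return mse + lam / (2.0 * t_f) * l2


# Toy example with T_f = 3 and one small matrix per network.
y_true = [21.0, 21.5, 22.0]
y_pred = [21.0, 21.0, 23.0]
encoder_weights = [[[1.0, 2.0]]]
decoder_weights = [[[0.5], [0.5]]]
loss = dsted_loss(y_true, y_pred, encoder_weights, decoder_weights, lam=0.06)
```

In a PyTorch training loop, a comparable penalty is often obtained through the optimizer's weight decay setting instead of an explicit term, although the scaling differs from (18).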
IV. EXPERIMENTAL STUDIES
In this section, based on the mechanism analysis and data collected from a real industrial process, extensive experiments are carried out to demonstrate the effectiveness of the proposed BTP multistep prediction model.
TABLE I
LIST OF VARIABLES
Fig. 7. BTP soft-sensor method.
A. Experimental System and Dataset Generation
In this experiment, the raw data were collected in real time from the sintering plant of a steel company every 1 min from Oct. 12, 2021 to Oct. 20, 2021. A workshop of the sintering plant contains a 360 m² belt-type sintering machine, as well as silo, conveying, cooling, and other equipment. The sintering intelligent control system in this plant consists of industrial computers, application software, a dynamic data exchange communication interface, and a distributed control system (DCS), as shown in Fig. 6. The DCS is made up of five programmable logic controllers (PLCs) to achieve the basic automation, including material control, mixing control, cooling control, igniting control, and desulphurizing control [32]. All PLC modules send their data to the central control room, which controls the drive motor of the strand. Meanwhile, the actual strand velocity is measured by sensors, and the results calculated by the established models are sent back to the intelligent controller. All raw data are recorded and transmitted to the time-series database (InfluxDB). According to the mechanism analysis of Section II, the input variables are determined and described in Table I.
Fig. 8. Schematic diagram of sliding window fragment extraction.
Due to the complexity of the sintering process, there is no instrument to measure BTP directly. Here, we used the classic soft-sensor method, called temperature fitting of the exhaust gas in the bellows, to calculate BTP. As shown in Fig. 7, the curve is fitted by a quadratic or higher-order polynomial, and the position of the highest temperature on the fitted curve is the BTP. Similarly, the BRP is calculated as the position where the temperature is 180 °C, a value determined through investigations. According to the theory of the sintering process, only the last ten bellows are needed to calculate the BTP. The specific calculation procedures are as follows:
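$$T(\delta_i) = \omega_0 + \omega_1 \delta_i^1 + \omega_2 \delta_i^2 + \cdots + \omega_p \delta_i^p \tag{19}$$
$$L(\omega) = \frac{1}{2} \sum_{i=15}^{24} \left(T(\delta_i) - T_i\right)^2 \tag{20}$$
where $\delta_i$ and $T_i$ are the position and the temperature of the $i$th bellows, respectively ($i = 15, 16, 17, \ldots, 24$), $\omega_0, \omega_1, \ldots, \omega_p$ are the coefficients to be calculated, and $L(\omega)$ is the loss function.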
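The soft-sensor calculation in (19) and (20) amounts to a least-squares polynomial fit followed by a search for the maximum of the fitted curve. The sketch below assumes a quadratic fit ($p = 2$), uses the bellows index as the position $\delta_i$, and works on synthetic temperatures; the function names and data are hypothetical, and positions are centered before fitting only to keep the normal equations well conditioned.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit that minimizes (20) via the normal equations."""
    n = degree + 1
    # A^T A and A^T y for the Vandermonde matrix A[i][j] = xs[i] ** j.
    ata = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    aty = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for r in range(col + 1, n):
            factor = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= factor * ata[col][c]
            aty[r] -= factor * aty[col]
    # Back substitution.
    coeffs = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(ata[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aty[r] - tail) / ata[r][r]
    return coeffs                      # [w0, w1, ..., wp]


def btp_from_temperatures(positions, temperatures):
    """Position of the maximum of a fitted quadratic temperature curve."""
    center = sum(positions) / len(positions)
    shifted = [p - center for p in positions]
    w0, w1, w2 = fit_polynomial(shifted, temperatures, 2)
    if w2 >= 0:
        raise ValueError("fitted curve has no interior maximum")
    return center - w1 / (2.0 * w2)    # vertex of the parabola


# Synthetic temperatures for bellows 15..24 with a peak at position 21.3.
positions = list(range(15, 25))
temperatures = [400.0 - 5.0 * (p - 21.3) ** 2 for p in positions]
btp = btp_from_temperatures(positions, temperatures)
```

For a higher-order polynomial the maximum has no closed form, so it would have to be located numerically, for example by evaluating the fitted curve on a fine grid of positions.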
Then, all samples are segmented by the sliding window
method, as shown in Fig. 8. In this way, the BTP sequence
fragments are completely constructed.
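The sliding window segmentation can be sketched as follows, using the notation of Section II-C: each fragment pairs the last $T_h$ rows of input features with the next $T_f$ values of BTP. The function name, the stride of one sample, and the toy series are assumptions for illustration.

```python
def sliding_windows(features, target, t_h, t_f):
    """Split aligned series into (X, Y) fragments with a stride of one sample.

    features: list of rows, one row of input variables per time step.
    target:   list of BTP values, aligned with the rows of `features`.
    """
    fragments = []
    for t in range(t_h - 1, len(target) - t_f):
        x = features[t - t_h + 1:t + 1]     # T_h rows ending at time t
        y = target[t + 1:t + 1 + t_f]       # the next T_f BTP values
        fragments.append((x, y))
    return fragments


# Toy series with 10 time steps and two input variables.
features = [[i, 10 * i] for i in range(10)]
target = [float(i) for i in range(10)]
fragments = sliding_windows(features, target, t_h=4, t_f=3)
```

A series of $N$ aligned samples yields $N - T_h - T_f + 1$ fragments under these assumptions.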
B. Experimental Settings
After data preprocessing, 7780 fragments are used for evaluating the models. Here, we use the first 6000 fragments as the training set, the next 1000 fragments as the validation set, and the remaining 780 fragments as the test set. The lengths of the input and output sequences are 40 and 3, respectively ($T_h = 40$, $T_f = 3$).
In our experiments, all the comparison models are implemented using the PyTorch framework in Python. The test platform is a laptop equipped with a Core i5-4210H CPU and 8 GB of RAM. Without loss of generality, the accuracy of the proposed model is compared with other typical time-series baselines: vector autoregressive (VAR), autoregressive integrated moving average (ARIMA), LSTM, and GRU. Three commonly employed statistical indicators ($R^2$, MAE, RMSE) are used to evaluate the performance of these models. To ensure the stability of the model prediction, all evaluation indicators of each method are the average results of 20 trials on the test set.
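For reference, the three indicators can be computed as in the following plain-Python sketch, which uses their standard definitions. The function name and the toy values are illustrative and are not taken from the experiments.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, MAE, RMSE) for two equally long lists of values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_true) ** 2 for y in y_true)
    r2 = 1.0 - ss_res / ss_tot          # coefficient of determination
    mae = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


r2, mae, rmse = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

A higher $R^2$ and lower MAE and RMSE indicate a better fit; $R^2$ is undefined when the true values are constant.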
C. Model Comparison and Results Analysis
TABLE II
COMPARISON OF DIFFERENT METHODS FOR BTP PREDICTION
Fig. 9. Performance comparison with other models.
The comprehensive performance comparisons of the methods are
given in Table II, which reports the mean accuracy over all
time steps. The two traditional statistical time-series models
perform very poorly: their accuracies are both below 0.5, at only
0.4547 and 0.4895, respectively. Because VAR and ARIMA are both
linear models, they have difficulty capturing the nonlinear
relationships of the sintering process.
Encouragingly, the two deep learning models, LSTM and GRU,
perform better, with $R^2$ values exceeding 0.7, which indicates
that recurrent neural networks can learn the complex nonlinear
and dynamic characteristics of the industrial process. Although
LSTM and GRU use memory cells to retain useful past
information and model time-varying data for BTP prediction,
they still struggle with long-term dependencies. That is to say,
recurrent neural networks cannot explicitly model periodic and
trend information because of their limited ability to describe
temporal dependencies. Hence, single RNN units cannot achieve
satisfactory results on multistep tasks. By contrast, the
proposed DSTED demonstrates its feasibility and superiority
in BTP prediction, with an accuracy above 0.9. The
detailed predictions on the testing dataset are further presented
in Fig. 9. The error between the actual BTP and the BTP predicted
by the proposed model is visibly very small. These findings
indicate that DSTED has greater advantages in
sequence-to-sequence modeling of BTP than the traditional
deep learning models.
Fig. 10. Performance on different output time steps.
Fig. 11. Performance on different lengths of input sequence. (a) Model
accuracy of different lengths of input sequence. (b) Train time of different
lengths of input sequence.
To evaluate the long-term stability of our developed model,
we illustrate the stepwise accuracy over the following seven
time steps in Fig. 10. The results indicate that the prediction
performance drops gradually as the output sequence becomes
longer because of the long-term dependence. In particular,
the accuracy of our model decreases dramatically when the
output horizon exceeds 5 min, because the prediction error
gradually accumulates as the length of the output sequence
increases. Therefore, there is still a certain gap between the
theoretical level and the actual situation in multistep
prediction. However, after detailed communication with the onsite
operators of the sintering factory, we found that the adjustment
of BTP at present depends entirely on the experience of the
workers. In fact, even short-term forecasting of BTP in advance can
provide constructive guidance for operators to adjust the process
parameters for the normal operation of the sintering process.
Thus, it is still meaningful and essential to achieve BTP
multistep prediction in advance.
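A stepwise evaluation of this kind can be sketched by computing $R^2$ separately for each output step. The snippet below is an illustrative sketch; the function name `stepwise_r2`, the array layout, and the toy numbers are assumptions and do not reproduce the values behind Fig. 10.

```python
import numpy as np


def stepwise_r2(y_true, y_pred):
    """Return one R^2 value per output step.

    y_true, y_pred: arrays of shape (n_samples, n_steps).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Residual and total sums of squares are taken column by column,
    # so each prediction step is scored independently.
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot


# Toy example with two output steps: the first step is predicted exactly,
# the second has one error, so its score is lower.
toy_true = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
toy_pred = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 4.0]])
toy_scores = stepwise_r2(toy_true, toy_pred)
```

Plotting such scores against the step index gives a curve of the kind described for Fig. 10, where accumulated error shows up as a declining score at longer horizons.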
Then, the length of the input sequence is also validated for
DSTED. Here, the length of the input sequence is varied from
10 to 80. The three evaluation metrics and the computational
efficiency are used together to evaluate the performance of our
model. In Fig. 11(a), the $R^2$ curve first rises, then drops
dramatically at an input length of 50, and then rises slightly.
On the other hand, the average training time per epoch increases
significantly as the input length increases, as shown in
Fig. 11(b). After comprehensive comparison and consideration,
the best performance of DSTED is obtained when the input
length is set to 40, which is more suitable for actual industrial
applications.
TABLE III
ABLATIVE VARIANTS PERFORMANCE ON BTP PREDICTION
D. Ablation Study
To further verify the effectiveness of each component in
our DSTED, we also conducted ablation studies on three
components: DGRU, temporal attention (TAtt), and spatial
attention (SAtt). We then combined these components to build
the ablative variants. The typical variants of our model are as
follows.
1) Encoder–Decoder: We remove all components;
2) Encoder–Decoder+DGRU: The denoising GRU is em-
bedded into the encoder;
3) Encoder–Decoder+DGRU+TAtt: We omit spatial atten-
tion;
4) Encoder–Decoder+DGRU+TAtt+SAtt (DSTED): Our
integrated model.
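The four variants listed above differ only in which components are switched on, which can be recorded as a small set of flags. The sketch below is a hypothetical bookkeeping structure for running such an ablation; the class and field names are illustrative and are not taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AblationVariant:
    """Which DSTED components are enabled in one ablative variant."""
    name: str
    use_dgru: bool
    use_temporal_attention: bool
    use_spatial_attention: bool


# The four variants in the order listed in the text.
ABLATION_VARIANTS = [
    AblationVariant("Encoder-Decoder", False, False, False),
    AblationVariant("Encoder-Decoder+DGRU", True, False, False),
    AblationVariant("Encoder-Decoder+DGRU+TAtt", True, True, False),
    AblationVariant("Encoder-Decoder+DGRU+TAtt+SAtt (DSTED)", True, True, True),
]


def enabled_components(variant):
    """List the short names of the components a variant switches on."""
    flags = [
        ("DGRU", variant.use_dgru),
        ("TAtt", variant.use_temporal_attention),
        ("SAtt", variant.use_spatial_attention),
    ]
    return [label for label, enabled in flags if enabled]
```

Each variant adds exactly one component to the previous one, so the difference between consecutive rows of the ablation table can be attributed to that component.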
As can be seen from Table III, our integrated DSTED outperforms
all its ablative variants in terms of all evaluation metrics
on BTP prediction. More specifically, DSTED achieves a higher
$R^2$ and lower RMSE and MAE than the basic encoder–decoder
without any components. These results also indicate that
existing deep learning models from NLP are generally difficult to
apply directly to industrial data modeling, because there are
differences between industrial fields and NLP. That is to say,
it is necessary to adapt the network structure to the
characteristics of industrial process data. We also find that the
encoder–decoder with the denoising GRU obtains better results
than the original encoder–decoder, because DGRU can alleviate
the impact of noisy data in the sintering process. In addition,
the temporal attention mechanism is employed in the decoder
network to determine the discriminative encoder hidden states and
capture the time trend, which improves $R^2$ from 85.28%
to 88.43%. The reason is that the temporal attention mechanism
can better learn the correlation between samples. Finally, our
integrated model with the spatial–temporal attention mechanism
obtains the best result since it can not only adaptively capture
the dynamics but also learn the relevance between the latent
variables and the target variable BTP. From the results of the
ablation study, we can conclude that all the well-designed
components of DSTED indeed play important roles in the BTP
multistep prediction task.
Fig. 12. Prediction performance on several typical parameter settings.
(a) Different learning rates. (b) Different hidden layers. (c) Different
hidden neurons. (d) Different dropouts.
E. Hyperparameter Tuning
To investigate how different hyperparameters influence the
BTP prediction performance, we conduct a sensitivity analysis
of several key hyperparameters on the BTP dataset. First, we
vary the learning rate from 0.0001 to 0.05. From Fig. 12(a),
it can be seen that the RMSE of the model shows a downward trend
as the learning rate increases. If the learning rate is too
large, it is hard for the neural network to learn the internal
knowledge, but too small a learning rate also has an adverse
impact on model training. Notably, our model reaches the best
performance when the learning rate is 0.001. Besides, the number
of hidden layers of the GRU also plays an important role in
model training. From Fig. 12(b), it is clear that our model
overfits when the number of hidden layers reaches 4. Thus, two
hidden layers are suitable for the network. Similarly, by trial
and error, the RMSE is at its minimum when the number of hidden
neurons is set to 20, according to Fig. 12(c). Because the number
of input variables is 12, the number of hidden neurons should be
neither too small nor too large; otherwise, the prediction model
may overfit. For simplicity, the numbers of hidden layers and
hidden neurons of the encoder are set equal to those of the
decoder. Moreover, to avoid overfitting, a dropout layer is
embedded into the decoder, and the optimal dropout value found by
trial and error is 0.1, as shown in Fig. 12(d). In addition, the
RMSE under different numbers of iterations is also investigated
for training and testing, with the number of training iterations
selected from the set {10, 15, 20, 35, 40}. The experimental
findings indicate that the RMSE converges when the number of
iterations is about 20, so the number of iterations is set to
20 in this study. For the remaining hyperparameter, the batch
size, the optimal value of 20 is selected from the set
{10, 20, 30, 40}. In addition, the number of input neurons of the
encoder is equal to the dimension of the input variables.
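The selected settings can be collected in a single configuration object. The sketch below only records the values stated in the text; the class name `DSTEDConfig`, the field names, and the helper are hypothetical, and the field `iterations` keeps the paper's wording because the text does not say whether an iteration is a full pass over the training set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DSTEDConfig:
    """Hyperparameter values reported in the text (names are illustrative)."""
    input_dim: int = 12          # number of input variables
    history_len: int = 40        # input sequence length T_h
    horizon: int = 3             # output sequence length T_f
    hidden_layers: int = 2       # same for encoder and decoder
    hidden_units: int = 20       # same for encoder and decoder
    dropout: float = 0.1         # dropout in the decoder
    learning_rate: float = 0.001
    iterations: int = 20
    batch_size: int = 20


# Candidate sets that the text lists explicitly.
CANDIDATES = {
    "iterations": (10, 15, 20, 35, 40),
    "batch_size": (10, 20, 30, 40),
}


def check_against_candidates(config, candidates=CANDIDATES):
    """Return True if every listed setting is one of its stated candidates."""
    return all(getattr(config, name) in values
               for name, values in candidates.items())


DEFAULT_CONFIG = DSTEDConfig()
```

Keeping the values in one frozen object makes it easy to rerun the sensitivity analysis by overriding a single field at a time, for example `DSTEDConfig(dropout=0.2)`.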
V. CONCLUSION
In this article, an end-to-end approach to sequence learning
was proposed and successfully applied to the BTP multistep
prediction of the sintering process. The integrated model was
composed of an encoder with denoising GRU and a decoder with
spatial–temporal attention mechanisms. Specifically, inspired
by the random noises in the industrial data, we designed a
denoising GRU to reduce the interference of noises and enhance
the ability of latent variables extraction. In addition, the spatial
and temporal attention modules were simultaneously embedded
into the decoder to capture the dynamic relevance of samples
and the correlation between the latent variables and the target
variable. Experimental results on the real-world dataset show
that the multistep prediction accuracy of our proposed model is
superior to the existing models. In the future, we will improve
the structure of deep learning models to solve the problem of
long-term prediction.
REFERENCES
[1] Z. Yuan and B. Wang, “Application of deep belief network in prediction
of secondary chemical components of sinter,” in Proc. 13th IEEE Conf.
Ind. Electron. Appl., 2018, pp. 2746–2751.
[2] W. Chen, B. Wang, Y. Chen, H. Zhang, and X. Li, “Using BP neural
network to predict the sinter comprehensive performance: Feo and sin-
ter yield,” Adv. Mater. Res., vol. 771, pp. 209–212, 2013, doi: 10.4028/
www.scientific.net/AMR.771.209.
[3] W. Yan, R. Xu, K. Wang, T. Di, and Z. Jiang, “Soft sensor modeling method
based on semisupervised deep learning and its application to wastewater
treatment plant,” Ind. Eng. Chem. Res., vol. 59, no. 10, pp. 4589–4601,
2020, doi: 10.1021/acs.iecr.9b05087.
[4] W. Yan, D. Tang, and Y. Lin, “A data-driven soft sensor modeling method
based on deep learning and its application,” IEEE Trans. Ind. Electron.,
vol. 64, no. 5, pp. 4237–4245, May 2017.
[5] S. Du, M. Wu, X. Chen, J. Hu, and W. Cao, “Intelligent integrated
control for burn-through point to carbon efficiency optimization in iron
ore sintering process,” IEEE Trans. Control Syst. Technol., vol. 28, no. 6,
pp. 2497–2505, Nov. 2020.
[6] W. Cao, Y. Zhang, J. She, M. Wu, and Y. Cao, “A dynamic subspace
model for predicting burn-through point in iron sintering process,” Inf.
Sci., vol. 466, pp. 1–12, 2018, doi: 10.1016/j.ins.2018.06.069.
[7] N. Toktassynova et al., “Modelling and control structure of a phosphorite
sinter process with grey system theory, J. Grey Syst., vol. 32, no. 2,
pp. 150–166, 2020.
[8] S. Liu, Q. Lyu, X. Liu, Y. Sun, and X. Zhang, “A prediction system of
burn through point based on gradient boosting decision tree and decision
rules,” ISIJ Int., vol. 59, no. 12, pp. 2156–2164, 2019.
[9] B. Wang, Y. Fang, J. Sheng, and W. Gui, “BTP prediction model based on
ANN and regression analysis,” in Proc. 2nd Int. Workshop Knowl. Discov.
Data Mining., 2009, pp. 108–111, doi: 10.1109/WKDD.2009.179.
[10] Z. Zhu, G. Geng, and Q. Jiang, “Power system dynamic model reduction
based on extended krylov subspace method, IEEE Trans. Power Syst.,
vol. 31, no. 6, pp. 4483–4494, Nov. 2016.
[11] J. Wang, X. Li, Y. Li, and K. Wang, “BTP prediction of sintering process
by using multiple models,” in Proc. 26th Chin. Control Decis. Conf., 2014,
pp. 4008–4012.
[12] S. Du, M. Wu, L. Chen, and W. Pedrycz, “Prediction model of
burn-through point with fuzzy time series for iron ore sintering pro-
cess,” Eng. Appl. Artif. Intell., vol. 102, 2021, Art. no. 104259,
doi: 10.1016/j.engappai.2021.104259.
[13] S. Du et al., “Operating mode recognition based on fluctuation interval
prediction for iron ore sintering process,” IEEE/ASME Trans. Mechatron.,
vol. 25, no. 5, pp. 2297–2308, Oct. 2020.
[14] Y. Lecun, Y. Bengio, and G. Hinton, “Deep learning, Nature, vol. 521,
no. 7553, pp. 436–444, 2015, doi: 10.1038/nature14539.
[15] A. J. Holden et al., “Reducing the dimensionality of data with neural
networks,” Neural Comput., vol. 18, no. 7, pp. 1527–1554, 2006.
Authorized licensed use limited to: Zhejiang University. Downloaded on May 05,2022 at 04:13:15 UTC from IEEE Xplore. Restrictions apply.
10744 IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 69, NO. 10, OCTOBER 2022
[16] X. Wu, D. Sahoo, and S. C. H. Hoi, “Recent advances in deep learn-
ing for object detection,” Neurocomputing, vol. 396, pp. 39–64, 2020,
doi: 10.1016/j.neucom.2020.01.085.
[17] J. Chen, X. Qiu, P. Liu, and X. Huang, “Meta multi-task learning for
sequence modeling,” in Proc. 32nd AAAI Conf. Artif. Intell., 2018, vol. 32,
no. 1, pp. 5070–5077.
[18] X. Shi, Z. Chen, H. Wang, D. Y. Yeung, W.K. Wong, and W.C. Woo, “Con-
volutional LSTM network: A machine learning approach for precipitation
nowcasting, in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 802–810.
[19] L. Feng, C. Zhao, Y. Li, M. Zhou, H. Qiao, and C. Fu, “Multichannel
diffusion graph convolutional network for the prediction of endpoint
composition in the converter steelmaking process, IEEE Trans. Instrum.
Meas., vol. 70, 2021, Art no. 3000413.
[20] W. K. Tsinghua, D. Huang, F. Yang, and Y. Jiang, “Soft sensor de-
velopment and applications based on LSTM in deep neural networks,”
in Proc. IEEE Symp. Ser. Comput. Intell., 2017, vol. 2017, pp. 1–6,
doi: 10.1109/SSCI.2017.8280954.
[21] Q. Sun and Z. Ge, “A survey on deep learning for data-driven soft sensors,”
IEEE Trans. Ind. Inform., vol. 17, no. 9, pp. 5853–5866, Sep. 2021.
[22] Q. Sun and Z. Ge, “Deep learning for industrial KPI prediction: When
ensemble learning meets semi-supervised data,” IEEE Trans. Ind. Inform.,
vol. 17, no. 1, pp. 260–269, Jan. 2021.
[23] J. Loy-benitez, S. Heo, and C. Yoo, “Control engineering practice soft
sensor validation for monitoring and resilient control of sequential subway
indoor air quality through memory-gated recurrent neural networks-based
autoencoders,” Control Eng. Pract., vol. 97, 2020, Art. no. 104330,
doi: 10.1016/j.conengprac.2020.104330.
[24] X. Yuan, L. Li, and Y. Wang, “Nonlinear dynamic soft sensor modeling
with supervised long short-term memory network,” IEEE Trans. Ind.
Inform., vol. 16, no. 5, pp. 3168–3176, May 2020.
[25] L. Patanè and M. G. Xibilia, “Echo-state networks for soft sensor
design in an SRU process, Inf. Sci., vol. 566, pp. 195–214, 2021,
doi: 10.1016/j.ins.2021.03.013.
[26] Y. C. Bo, P. Wang, X. Zhang, and B. Liu, “Modeling data-driven sensor
with a novel deep echo state network, Chemom. Intell. Lab. Syst., vol. 206,
2020, Art. no. 104062, doi: 10.1016/j.chemolab.2020.104062.
[27] Y. L. He, Y. Tian, Y. Xu, and Q. X. Zhu, “Novel soft sensor development
using echo state network integrated with singular value decomposition:
Application to complex chemical processes,” Chemom. Intell. Lab. Syst.,
vol. 200, 2020, Art. no. 103981, doi: 10.1016/j.chemolab.2020.103981.
[28] S. Hochreiter, “Long short-term memory, Neural Comput., vol. 9, no. 8,
pp. 1735–1780, 1997.
[29] A. M. Dai, “Semi-supervised sequence learning, Adv. Neural Inf.Process.
Syst., vol. 28, pp. 3079–3087, 2015.
[30] K. Cho et al., “Learning phrase representations using RNN encoder-
decoder for statistical machine translation,” in Proc. Conf. Empirical
Methods Natural Lang. Process., 2014, pp. 1724–1734, doi: 10.3115/v1/
d14-1179.
[31] D. Bahdanau, K. H. Cho, and Y. Bengio, “Neural machine translation
by jointly learning to align and translate,” in Proc. 3rd Int. Conf. Learn.
Representations, Conf. Track Proc., 2015, pp. 1–15.
[32] C. S. Wang and M. Wu, “Hierarchical intelligent control system and its
application to the sintering process,”IEEE Trans. Ind. Inform., vol. 9, no. 1,
pp. 190–197, Feb. 2013.
Feng Yan received the B.S. degree in vehicle
engineering from the College of Automotive and
Traffic Engineering, Jiangsu University, Zhen-
jiang, China, in 2018, and the M.S. degree in
vehicle engineering from the College of Me-
chanical Vehicle Engineering, Hunan University,
Changsha, China, in 2021. He is currently work-
ing toward the Ph.D. degree in control science
and engineering with the College of Control Sci-
ence and Engineering, Zhejiang University.
His current research interests include deep
learning, data mining, and intelligent optimization in the industrial pro-
cess applications.
Chunjie Yang (Senior Member, IEEE) received
the B.S. degree in machine design, the M.S.
degree in fluid transmission and control, and
the Ph.D. degree in industrial automation from
Zhejiang University, Hangzhou, China, in 1992,
1995, and 1998, respectively.
He is currently a Professor with the College of
Control Science and Engineering, as well as a
Qiushi Distinguished Professor of Zhejiang Uni-
versity. His current research interests include ar-
tificial intelligence, machine learning modeling,
control, and fault diagnosis for industrial process.
Xinmin Zhang (Member, IEEE) received the
Ph.D. degree in system science from Kyoto Uni-
versity, Kyoto, Japan, in 2019.
From April 2019 to December 2019, he was
a Postdoctoral Research Fellow with the De-
partment of Systems Science, Kyoto University.
He is currently an Associate Professor with the
College of Control Science and Engineering,
Zhejiang University, Hangzhou, China. His re-
search interests include process control, pro-
cess data analysis, machine learning and in-
dustrial big data, and virtual sensing technology with applications to
industrial processes.
... However, it is sometimes difficult to explicitly align the effect of process variables at each time step with the responses of process outputs due to the dynamic nature of continuous processes and time delay. Therefore, Recurrent Neural Network (RNN), as a special class of NN for modeling time series measurements, is suitable for the prediction of future process output values in MPC, and have received much research attention (Bonassi et al., 2022;Zarzycki & Ławryńczuk, 2022;Zhao et al., 2023;Schwedersky & Flesch, 2022;Norouzi et al., 2022;Jung et al., 2023;Bonassi et al., 2024;Alhajeri et al., 2022;Zheng et al., 2023;Giuli et al., 2024;Núñez et al., 2020;Yan et al., 2022;Zhang et al., 2024;Liu et al., , 2020. Despite the huge progress of model design, the simulated processes with simplifying assumptions are mostly considered as the case study. ...
... An attention RNN-based encoder-decoder model was proposed for an MPC scheme to control an industrial paste thickener with real-world experiments based on an Industrial Internet of Things (IIoT) platform (Núñez et al., 2020). A denoising GRU-based encoder-decoder model with spatial-temporal attention was developed to predict burn-through point of the sintering process, which was validated through the plant of a steel company (Yan et al., 2022). An attention-based Convolutional Neural Network (CNN) combined with a bidirectional LSTM network were proposed for the heat load prediction in a blast furnace ironmaking process . ...
... While these applications are beneficial for the numerical evaluations of developed models, a visible gap exists between them and the real industrial cases with potential deviations from the nominal conditions (Bonassi et al., 2022). The efforts of establishing RNN-based predictive models for the real-world manufacturing processes are still limited (Núñez et al., 2020;Yan et al., 2022), which is particularly reflected on the time period and variability of the processes under study, e.g., 8 days in a single workshop without obvious changes in the process dynamics (Yan et al., 2022). Therefore, this paper aims to design a novel mechanism for the general predictive models to facilitate the applications in realworld continuous processes without simplifying assumptions over a period of several months and multiple production lines. ...
Article
Full-text available
Real-time prediction of future process outputs is critical for the model predictive control of continuous manufacturing processes. It helps identify when and how to adjust the process variables under the disturbances. A lot of recurrent neural network-based predictive models have been developed and validated on the simulated processes. Based on that, some works further consider the existence of multiple operating conditions that can be unforeseen and transitive in real-world manufacturing. However, their designed online learning mechanisms mostly focus on fast local tracking without preserving important old knowledge. Besides, the proactive input data adaptation is largely unexplored. To bridge this gap, a novel self-adaptation mechanism is proposed in this paper. This mechanism can be easily integrated into different choices of predictive model to improve the stability of performance towards various changes in a long period of manufacturing. In the proposed mechanism, the components of adaptive sequence filtering and adaptive input normalization first extract the compact and properly scaled features from the growing multivariate time series subject to delayed output response and non-stationarity. Based on the encoder-decoder network as an exemplary predictive model, the component of adaptive model update consists of a non-Euclidean loss for evaluating sequential predictions and a task-free knowledge consolidation strategy for continual learning-based regularization. The application to an industrial rotary drying process is demonstrated, where data streams are collected from four production lines over 14 months. Extensive comparative study shows the superior performance of proposed mechanism and ablation study further verifies the effectiveness of each individual component.
... BTP multistep prediction based on denoising spatial-temporal encoder-decoder framework. 101 In summary, these soft sensing methods have not explicitly modeled the coupling characteristics among the process variables. These MIQ models are based on fully connected methods, such as LSTM-based methods. ...
... In order to alleviate the disturbance of industrial noise, a denoising spatial-temporal encoder-decoder (DSTED) framework was proposed to achieve the BTP multistep prediction in advance, which can provide sufficient time for site-workers to adjust the trolley speed to maintain the normal operation of sintering process. 101 DSTED is made up of three modules: the data generation module, encoder module, and decoder module, as depicted in Figure 6. First, we determine the process variables that are related to BTP and collect raw data from sintering process. ...
Article
Full-text available
Data-driven soft sensing modeling is becoming a powerful tool in the ironmaking process due to the rapid development of machine learning and data mining. Although various soft sensing techniques have been successfully used in both the sintering process and blast furnace, they have not been comprehensively reviewed. In this work, we provide an overview of recent advances on soft sensing in the ironmaking process, with a special focus on data-driven techniques. First, we present a general soft sensing development framework of the ironmaking process based on the mechanism analysis and process characteristics. Second, we provide a detailed taxonomy of current soft sensing methods categorized by their predictive tasks (i.e., quality indicators prediction, state parameters prediction, etc.). Finally, we outline several insightful and promising directions, such as self-supervised learning and digital twins in the ironmaking process, for future research.
... For this, techniques such as support vector machines [6], neural networks [7][8][9], genetic programming [10], and decision trees [11] have been used. Recently, studies have been on BTP prediction through multi-step prediction with a spatial-temporal encoder [12,13]. There are also studies in which machine learning methods such as artificial neural networks and support vector machines are used to manage the machine speed after finding the BTP points [14][15][16][17]. ...
Article
Full-text available
Intelligent control systems developed for production facilities significantly contribute to production efficiency and quality. Using intelligent control systems has now become a necessity in iron and steel sintering plants that produce millions of tonnes annually. Automatic control of the sinter machine speed, which directly affects production efficiency and quality, is one of the first issues to be addressed. The complexity of the sintering process, being affected by many variables, and the nonlinearity of these variables make it difficult to control the machine speed. This study demonstrates that we have overcome this challenge using a fuzzy logic controller (FLC), which is optimized with an adaptive neuro-fuzzy inference system (ANFIS). The FLC we have designed operates with the characteristic point of the thermal state, the mixture level, the vacuum average, and the current speed parameters. We achieved an average success rate of 95%. The developed system automatically controls the speed of the sinter machine with high accuracy, independent of the operator. The system we have developed is used continuously at the Iskenderun Iron & Steel Co. sinter plant. The results obtained from the production facility show that the developed system captures the thermal change in the sinter pallet and manages the machine accordingly, increases the sintering efficiency by at least 10%, and ensures process safety. These results revealed that the developed system can be used effectively in the iron and steel industry and the use of the system will increase efficiency.
... However, the disadvantages of the traditional methods lie in the time-consuming feature extraction and the unwarranted accuracy and robustness. With the advent of industrial Internet of Things and Big Data [26], [27], the condition monitoring methods based on deep learning has become a research hotspot [28], [29], [30], [31], [32], by which researchers can evaluate the condition of complex systems without considering their coupling characteristics. Since deep learning models have strong depth feature extraction capability, the information obtained by multisensor is of more significance than that of single-sensor, which can enhance the reliability of the system with more information [33], [34], [35]. ...
Article
Full-text available
This article explores the corresponding relationship between the equipment fault and grinding quality in a robotic grinding system, and establishes a unified and lightweight monitoring and matching framework, providing a perceptual basis for accurate tracking and effective control of grinding quality. Firstly, a multi-channel vibration imaging method named Wavetrizorn is developed based on vibration signals, and the images generated were used to train a fault diagnosis model for the equipment. Particularly, a lossy reconstruction algorithm based on wavelet packet and convolutional autoencoder (WPCAE) is proposed for vibration signals with strong noise, which can help networks to extract the fault information. Then a regression model mapping from vibration signal to force signal is established based on the reconstructed signal graphs to monitor the grinding quality. Finally, to match the fault type and the grinding quality, a unique canonical correlation feature (CCF) is proposed and calculated, which can achieve precise quality traceability. Consequently, during the online monitoring, it is only necessary to use vibration signals to regress the CCF to accurately match the fault type and grinding quality with significant efficiency. The effectiveness of the framework is verified on a robotic grinding system in the laboratory.
Article
The iron oxide (FeO) content had a significant impact on both the metallurgical properties of sintered ores and the economic indicators of the sintering process. Precisely predicting FeO content possessed substantial potential for enhancing the quality of sintered ore and optimizing the sintering process. A multi-model integrated prediction framework for FeO content during the iron ore sintering process was presented. By applying the affinity propagation clustering algorithm, different working conditions were efficiently classified and the support vector machine algorithm was utilized to identify these conditions. Comparison of several models under different working conditions was carried out. The regression prediction model characterized by high precision and robust stability was selected. The model was integrated into the comprehensive multi-model framework. The precision, reliability and credibility of the model were validated through actual production data, yielding an impressive accuracy of 94.57% and a minimal absolute error of 0.13 in FeO content prediction. The real-time prediction of FeO content provided excellent guidance for on-site sinter production.
Article
Full-text available
In the wake of the era of big data, the techniques of deep learning have become an essential research direction in the machine learning field and are beginning to be applied in the steel industry. The sintering process is an extremely complex industrial scene. As the main process of the blast furnace ironmaking industry, it has great economic value and environmental protection significance for iron and steel enterprises. It is also one of the fields where deep learning is still in the exploration stage. In order to explore the application prospects of deep learning techniques in iron ore sintering, a comprehensive summary and conclusion of deep learning models for intelligent sintering were presented after reviewing the sintering process and deep learning models in a large number of research literatures. Firstly, the mechanisms and characteristics of parameters in sintering processes were introduced and analysed in detail, and then, the development of iron ore sintering simulation techniques was introduced. Secondly, deep learning techniques were introduced, including commonly used models of deep learning and their applications. Thirdly, the current status of applications of various types of deep learning models in sintering processes was elaborated in detail from the aspects of prediction, controlling, and optimisation of key parameters. Generally speaking, deep learning models that could be more effectively implemented in more situations of the sintering and even steel industry chain will promote the intelligent development of the metallurgical industry.
Article
Burn-through point (BTP) is an essential thermal state parameter in a sintering process, which is a direct reflection of the stability of this process. However, it cannot be measured online. Soft-sensing technology offers a reliable method for estimating unmeasurable variables in industrial processes. Here, a soft-sensing model for BTP based on weighted kernel just-in-time learning (WKJITL) and fuzzy broad-learning system (FBLS) is built. First, an abnormal production data detection and correction strategy is employed to process the production data, and the mechanism analysis and mutual information analysis are utilized to specify the detectable process variables that are directly related to BTP. Then, the WKJITL method is proposed to obtain historical production data similar to the query data of BTP for local learning modeling, and the FBLS is utilized as an efficient modeling method for the soft-sensing prediction of BTP. Finally, the results of simulation experiments based on actual sintering production data reveal that the developed soft-sensing model of BTP exhibits better prediction accuracy and efficiency compared with some advanced modeling methods. Furthermore, the proposed method is of general nature and can also be easily applied to other industrial processes.
Article
Methanol-to-olefins, as a promising non-oil pathway for the synthesis of light olefins, has been successfully industrialized. The accurate prediction of process variables can yield significant benefits for advanced process control and optimization. The challenge of this task is underscored by the failure of traditional methods in capturing the complex characteristics of industrial processes, such as high nonlinearities, dynamics, and data distribution shift caused by diverse operating conditions. In this paper, we propose a novel hybrid spatial-temporal deep learning prediction model to address these issues. Firstly, a unique data normalization technique called reversible instance normalization is employed to solve the problem of different data distributions. Subsequently, convolutional neural networks integrated with the self-attention mechanism are utilized to extract the temporal patterns. Meanwhile, a multi-graph convolutional network is leveraged to model the spatial interactions. Afterward, the extracted temporal and spatial features are fused as input into a fully connected neural network to complete the prediction. Finally, the outputs are denormalized to obtain the ultimate results. The monitoring results of the dynamic trends of process variables in an actual industrial methanol-to-olefins process demonstrate that our model not only achieves superior prediction performance but also can reveal complex spatial-temporal relationships using the learned attention matrices and adjacency matrices, making the model more interpretable. Lastly, this model is deployed onto an end-to-end Industrial Internet Platform, which achieves effective practical results.
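The normalize, predict, denormalize flow described above can be sketched as follows. This is a simplified illustration of instance-wise normalization only: it omits the learnable affine parameters that reversible instance normalization usually includes, and the forecasting model is a hypothetical placeholder that repeats the last value, not the hybrid spatial-temporal network from the article.

```python
import math

def instance_normalize(window, eps=1e-8):
    """Normalize one input window by its own mean and standard deviation."""
    mean = sum(window) / len(window)
    variance = sum((x - mean) ** 2 for x in window) / len(window)
    std = math.sqrt(variance + eps)
    normalized = [(x - mean) / std for x in window]
    # Keep the statistics so the model output can be mapped back later.
    return normalized, (mean, std)

def instance_denormalize(values, stats):
    """Map values from normalized space back to the original scale."""
    mean, std = stats
    return [v * std + mean for v in values]

def naive_forecast(normalized_window, horizon):
    """Placeholder model (assumption): repeat the last normalized value."""
    return [normalized_window[-1]] * horizon

# Toy window (hypothetical values) from one operating condition.
window = [100.0, 102.0, 104.0, 106.0]
normalized, stats = instance_normalize(window)
prediction = instance_denormalize(naive_forecast(normalized, 2), stats)
```

Because each window is normalized with its own statistics, windows drawn from different operating conditions reach the model on a comparable scale, and the stored statistics restore the original units at the output.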
Article
Evenness of filament yarn is a crucial indicator that significantly impacts the quality of downstream textile products. Therefore, accurate real-time prediction and classification of the coefficient of variation (CV) value, which serves as an indicator of evenness, are of utmost importance. However, current detection methods predominantly rely on offline evenness testing devices, compromising the real-time capability and accuracy of evenness detection. To address this challenge, a semisupervised sequence Gaussian mixture variational autoencoder (VAE) model is developed for predicting and classifying the CV value. This model combines a mix VAE and a sequence-to-sequence structure, integrating a classifier to achieve semisupervised classification of time-series data. To validate the effectiveness of the proposed method, both software and hardware enhancements were implemented on the existing capacitance-based yarn evenness testing device, enabling uninterrupted measurement of yarn evenness and length. The collected data were then used to train the model. Experimental results demonstrate that the proposed model achieves an accuracy rate of 85% in classifying the CV value of the filament yarn.
Article
Soft sensors are widely constructed in the process industry to realize process monitoring, quality prediction, and many other important applications. With the development of hardware and software, industrial processes have taken on new characteristics that lead to poor performance of traditional soft sensor modeling methods. Deep learning, as a data-driven approach, has shown great potential in many fields, including soft sensing scenarios. After a period of development, especially in the last five years, many new issues have arisen that need to be investigated. Therefore, in this paper, the necessity and significance of deep learning for soft sensor applications are first demonstrated by analyzing the merits of deep learning and the trends of industrial processes. Next, mainstream deep learning models, tricks, and frameworks/toolkits are summarized and discussed to help designers advance the development of soft sensors. Then, existing works are reviewed and analyzed to discuss the demands and problems that occur in practical applications. Finally, conclusions and prospects are given.
Article
Burn-through point (BTP) is an essential parameter in the iron ore sintering process. Operators usually judge whether the current production is stable by monitoring the BTP, so accurate prediction of the BTP has significant application prospects. A prediction model of the BTP with fuzzy time series is designed in this paper. First, the fuzzy time series prediction method with Fuzzy C-Means clustering is presented as the core modeling method. A prediction model of the response is constructed to obtain a timely response to the current BTP. A prediction model of the difference is established to estimate the present unmeasurable disturbance on the BTP. Then, a hybrid prediction model is built, which combines these two models through an adjustment factor. Finally, a series of experiments is carried out using the raw time series data from an iron and steel plant. The experimental results show that the designed model has better prediction performance for the BTP than existing models, an advantage resulting from the hybrid structure and the fuzzy time series prediction model with Fuzzy C-Means clustering. This prediction model of the BTP lays the foundation for the stable control of the iron ore sintering process.
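The Fuzzy C-Means clustering named above as the core of the fuzzy time series method can be illustrated with a minimal one-dimensional sketch. It covers the clustering step only, not the response, difference, or hybrid prediction models; the deterministic initialization, the fuzzifier `m = 2`, the iteration count, and the toy data are assumptions made for illustration.

```python
def fuzzy_c_means_1d(data, n_clusters=2, m=2.0, n_iter=50, eps=1e-12):
    """Standard Fuzzy C-Means updates on scalar data (requires n_clusters >= 2)."""
    lo, hi = min(data), max(data)
    # Deterministic start (assumption): centers evenly spaced over the data range.
    centers = [lo + (hi - lo) * j / (n_clusters - 1) for j in range(n_clusters)]
    memberships = []
    for _ in range(n_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        memberships = []
        for x in data:
            dists = [abs(x - c) + eps for c in centers]  # eps avoids division by zero
            row = []
            for d_j in dists:
                denom = sum((d_j / d_k) ** (2.0 / (m - 1.0)) for d_k in dists)
                row.append(1.0 / denom)
            memberships.append(row)
        # Center update: mean of the data weighted by u_ij^m.
        centers = [
            sum((u[j] ** m) * x for u, x in zip(memberships, data))
            / sum(u[j] ** m for u in memberships)
            for j in range(n_clusters)
        ]
    # Memberships are those computed in the last iteration, before the final center update.
    return centers, memberships

# Toy series (hypothetical values) with two clearly separated levels.
data = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
centers, memberships = fuzzy_c_means_1d(data)
```

Each sample receives a graded membership in every cluster rather than a hard label, which is what allows a fuzzy time series model to describe a value lying between two intervals.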
Article
Soft sensors for industrial processes are increasingly being implemented with recent machine learning techniques. In this work, strategies based on reservoir computing are applied to develop dynamical models of target variables in a sulfur recovery unit (SRU) of a refinery plant in Italy. In particular, a specific type of recurrent network, namely an echo-state network (ESN), is adopted to estimate key process variables of the SRU. Two process lines are considered to evaluate the proposed algorithm on different datasets in terms of estimation performance and computational effort of the learning process. The obtained results are compared with those of other recurrent networks based on long short-term memory and with other techniques reported in the literature, demonstrating the feasibility of the proposed approach. Furthermore, intrinsic plasticity (IP) is introduced to adapt the reservoir parameters to the provided inputs, achieving a significant improvement in the statistical distribution of the results obtained for the pool of learned networks. The reported results show that ESN-IP is a suitable solution for identifying dynamical models of industrial processes, avoiding the time-consuming regressor selection procedure that is needed when a static network is adopted to design a dynamical model.
Article
The converter steelmaking process smelts hot metal to liquid steel and occupies an important position in industry. The composition of liquid steel at the endpoint is an essential quality index, including the concentrations of multiple elements, such as carbon, silicon, and manganese. Accurately predicting endpoint composition is the basis of production optimization. Hence, a multichannel diffusion graph convolutional network (MCDGCN) is presented in this article. Unlike conventional models, the developed MCDGCN describes the converter steelmaking process as a graph to exploit the correlations among element concentrations for an accurate endpoint composition prediction. We also develop a unique $K$-hop diffusion method to extract the globally consistent information over the graph for predicting each element. The proposed method addresses the composition prediction task for a realistic converter steelmaking process. To the best of our knowledge, this is the first time that up to 15 elements of liquid steel are covered and predicted to present a comprehensive process model. Compared with six benchmark models, MCDGCN presents state-of-the-art results, i.e., an average $R^{2}$ of 0.8475 and an average MAE of 0.0189, which shows that the correlation mining of graph deep learning can indeed improve the prediction performance for endpoint composition.
Article
Data-driven approaches have been widely used to model soft sensors for predicting key quality variables in process engineering. A soft sensor is generally a time-dependent dynamical model between the input and the output. The echo state network (ESN) is a typical data-driven modeling tool that has exhibited excellent performance in temporal data processing. However, the memory mode in the traditional ESN lacks flexibility. It is sometimes hard to preserve sufficient input features in the states, especially when modeling soft sensors with long-term dependencies. To solve this problem, this paper proposes an asynchronously deep echo state network (ADESN), which is composed of a number of sub-reservoirs that are connected one by one in sequence. Additionally, time delay modules are inserted between every two adjacent layers. The ADESN scheme preserves more input history in the states. Moreover, it can realize a selective memory. The validity of the ADESN is demonstrated by modeling a number of numerical and real-life soft sensors.
Article
The operating mode is an essential factor affecting the product quality and yield of the sinter ore, which motivates operating mode recognition. Taking burn-through point (BTP) as the decision parameter of the operating mode, an operating mode recognition method based on fluctuation interval prediction is presented. Firstly, combining principal component analysis and the fuzzy information granulation method, a fluctuation interval prediction model of the BTP is established using the Elman neural network. Then, the operating mode classification rules are built according to the data distribution of the BTP in the fluctuation interval. Finally, experiments are executed with the data collected from a factory. The results indicate that the method can effectively predict the fluctuation interval of the BTP and then successfully recognize the operating mode. The proposed method provides a valid reference for controlling the stable operation of the iron ore sintering process.
Article
It is of great importance to develop advanced soft sensors for ensuring the safety and stability of complex industrial processes. Unfortunately, with the increasing scale of chemical processes, it becomes more and more demanding to develop soft sensors with high accuracy. In addition, most industrial processes are dynamic. As a result, soft sensors developed using static models cannot achieve acceptable performance. To handle this problem, the echo state network (ESN), a kind of recurrent neural network, is selected. However, the output weights of the ESN are calculated linearly. On one hand, collinearity in the reserve layer outputs may decrease the performance; on the other hand, the over-fitting problem may occur. To improve the ESN performance, a singular value decomposition based ESN (SVD-ESN) is presented. In the SVD-ESN method, singular value decomposition instead of the traditional least squares is adopted to calculate the weights between the output layer and the reserve layer. Through singular value analysis of the outputs of the reserve layer, appropriate defining parameters are selected to enhance the accuracy and ensure the computing speed. As a result, the collinearity and over-fitting problems are solved, and the performance of the ESN is enhanced. To test and validate its performance, the proposed SVD-ESN is developed as a soft sensor for the High Density Polyethylene (HDPE) production process and the Purified Terephthalic Acid (PTA) production process. Compared with the conventional ESN, Extreme Learning Machine (ELM), Dynamic Window based ELM (DW-ELM), and Long Short-Term Memory (LSTM), the simulation results show that the proposed SVD-ESN model obtains better performance in terms of prediction accuracy, which confirms that the proposed SVD-ESN can be used as an effective dynamic model for developing accurate soft sensors.
Article
Indoor air quality (IAQ) measurements play an important role in subway ventilation system control, influencing crucial factors such as ventilation energy consumption and commuters’ health. Therefore, faulty sensors may result in misinterpreting the IAQ conditions and misoperating the air delivery rate level in subway stations. However, because IAQ data are dynamic and non-Gaussian, linear and fixed structures are not sufficient to extract essential features from them. This paper presents a machine learning-based soft sensor validation technique to detect, diagnose, identify, and reconstruct faulty measurements of the multivariate IAQ data in subway stations. The proposed method is memory-gated recurrent neural networks-based autoencoders (MG-RNN-AE), which are capable of processing sequential and dynamic IAQ information. The performance of the sensor validation was evaluated through several metrics and compared among different methods; the batch normalization-based gated recurrent unit (BN-GRU) method was the most effective at detecting ( = 100%) and reconstructing ( = 0.45-0.79) faulty IAQ sensors. Additionally, the effects of the faulty and repaired measurements on the ventilation system were evaluated, showing that the proposed method is capable of finding a sustainable balance between energy demand and commuters’ health level.
Article
Soft sensors have been widely used in industrial processes to improve product quality and ensure safety during production. This paper proposes a semi-supervised deep neural regression network with embedding manifold (called SSE-DNN) for soft sensor modeling that integrates manifold embedding into deep neural regression networks. Manifold embedding is imposed on the hidden layer of the deep neural regression network to form a semi-supervised deep neural regression network. Manifold embedding exploits the local neighbor relationships among industrial data and utilizes unlabeled data effectively to improve the performance of the deep neural regression model. The SSE-DNN model exploits the global information and the local manifold of large-scale industrial data simultaneously and implicitly implements multi-modal models of the industrial process. The soft sensor model based on the SSE-DNN is applied to the estimation of total Kjeldahl nitrogen (TKN) in a long-term complicated wastewater treatment process. The experimental results demonstrate that the SSE-DNN model performs better than other soft sensors and provides an effective method for soft sensor modeling of complex industrial processes.