
DSTED: A Denoising Spatial-Temporal Encoder-Decoder Framework for Multistep Prediction of Burn-Through Point in Sintering Process

October 2022
IEEE Transactions on Industrial Electronics 69(10):1-1


DOI:10.1109/TIE.2022.3151960

Authors:

Zhejiang University



IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 69, NO. 10, OCTOBER 2022 10735

DSTED: A Denoising Spatial–Temporal Encoder–Decoder Framework for Multistep Prediction of Burn-Through Point in Sintering Process

Feng Yan, Chunjie Yang, Senior Member, IEEE, and Xinmin Zhang, Member, IEEE

Abstract—Sinter ore is the main raw material of the blast furnace, and the burn-through point (BTP) has a direct influence on the yield, quality, and energy consumption of the ironmaking process. Because iron ore sintering is a very complex industrial process with strong nonlinearity, multivariable coupling, random noise, and time variation, traditional soft-sensor models struggle to learn the knowledge of the sintering process. In this article, a multistep prediction model, called the denoising spatial–temporal encoder–decoder, is developed to predict BTP in advance. First, mechanism analysis is carried out to determine the BTP-relevant variables, and BTP prediction is defined as a sequence-to-sequence modeling problem. Second, motivated by the random noise in industrial data, a denoising gated recurrent unit (DGRU) is designed to alleviate the impact of noise by adding a denoising gate to the GRU. As a result, the encoder with DGRU can better extract the latent variables of the original sequence data. Then, spatial–temporal attention is embedded into the decoder to simultaneously capture the time-wise and variable-wise correlations between the latent variables and the target variable BTP. Finally, experimental results on a real-world dataset from a sintering process demonstrate that the integrated multistep prediction model is effective and feasible.

Index Terms—Burn-through point (BTP), denoising gated recurrent unit (DGRU), multistep prediction, soft-sensor, spatial–temporal attention.

I. INTRODUCTION

SINTERING, as a primary process in the iron and steel industry, has received increasing attention from researchers in the past few years [1], [2]. Sinter ore is the main iron-bearing raw material of most blast furnaces, and its quality is mainly affected by operation and state parameters. The burn-through point (BTP), one of the most important state parameters, represents the position where the sintering process is completed. However, sintering is a very complicated physical and chemical process, and the whole process is nonlinear and dynamic. These characteristics make it quite hard to establish a precise mathematical model to predict BTP in the sintering process [3], [4]. The position of BTP directly influences the quality of the sintered product. For example, if BTP is located in front of the optimal position, the iron ore is not fully burned; if BTP is behind the desired position, the quality of the iron ore cannot meet the requirements of blast furnace ironmaking [5]. Therefore, accurate prediction of BTP is of great significance to the normal operation of the sintering process and the improvement of product quality.

Manuscript received December 3, 2021; revised January 23, 2022; accepted February 3, 2022. Date of publication February 23, 2022; date of current version May 2, 2022. This work was supported by the National Natural Science Foundation of China under Grant 61933015. (Corresponding author: Chunjie Yang.)

The authors are with the State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou 310000, China (e-mail: yanfeng555@zju.edu.cn; cjyang999@zju.edu.cn; xinminzhang@zju.edu.cn).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TIE.2022.3151960.

Digital Object Identifier 10.1109/TIE.2022.3151960

According to previous investigations, two types of BTP prediction approaches have usually been examined: mechanism-based mathematical models and data-driven models. For example, a mathematical model was established by Cao et al. [6] to directly predict BTP using pallet velocity, bed depth, and other state parameters. However, the mechanism-based model is complicated and time-consuming, and it cannot meet the requirements of real-time prediction. Thus, an increasing number of scholars have begun to examine the internal properties and patterns of the sintering process using artificial intelligence. Toktassynova et al. [7] used the gray model GM(1, n), optimized by the particle swarm algorithm, to predict BTP from a small amount of data at the beginning of the sintering process. Afterward, Liu et al. [8] established a BTP prediction system based on the gradient boosting decision tree algorithm, combining process knowledge with a feature selection method. Subsequently, a hybrid BTP prediction model was presented based on an artificial neural network and a multilinear regression error compensation algorithm [9]. Moreover, a dynamic subspace model was developed to predict BTP based on pallet velocity, the thickness of the material layer, and other operation variables [10]. In recent years, fuzzy neural networks have been widely used in intelligent prediction systems. For instance, Wang et al. [11] presented a fuzzy neural network that can deal with fuzzy information and has the ability of self-learning in the sintering process. In another related study, Du et al. [12] designed a hybrid fuzzy time-series prediction model of BTP with fuzzy c-means clustering. Besides, a fluctuation interval prediction model of BTP was proposed based on principal component analysis, the fuzzy information granulation method, and the Elman neural network [13]. The factory experiments indicated that it can effectively predict the fluctuation interval of the BTP and lay a solid foundation for the stable operation of the sintering process.

See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Zhejiang University. Downloaded on May 05,2022 at 04:13:15 UTC from IEEE Xplore. Restrictions apply.

These classical modeling methods can hardly extract knowledge of complex industrial processes from historical data. Fortunately, the emergence of deep learning [14], [15] has brought great breakthroughs in many research fields, such as computer vision [16], natural language processing [17], and speech recognition [18]. As a result, more and more scholars have started to pay attention to deep learning modeling methods and apply them to industrial processes [19]–[21]. For instance, Sun et al. [22] proposed a novel ensemble semisupervised gated stacked autoencoder for key performance indicator prediction to deal with excessive unlabeled samples. Furthermore, a memory-gated autoencoder was adopted to detect and diagnose indoor air quality, and numerical experiments demonstrated that the measurement model was effective [23]. Apart from autoencoders, recurrent neural networks have also been applied to industrial processes. Yuan et al. [24] designed a supervised long short-term memory (LSTM) network to learn quality-relevant variables in the penicillin fermentation process and an industrial debutanizer column. In addition, a specific type of recurrent network called the echo-state network (ESN) was adopted to estimate key process variables on the sulfur recovery unit (SRU) [25]. However, the traditional ESN cannot model long-term dependencies in soft sensors. To solve this problem, the asynchronously deep ESN and the singular value decomposition-based ESN (SVD-ESN) were proposed one after another, and their validity was demonstrated on modeling real-life soft sensors [26], [27]. The analysis above shows that deep learning has been extensively used in modern industrial processes.

However, data-driven methods for the sintering process based on deep learning are rare due to the difficulty of obtaining data and the complexity of the process. Existing research has not achieved multistep prediction of BTP, which is nevertheless very important for sintering process control. In this situation, how to make full use of deep learning models, with their strong ability to map complex nonlinear relations, to explore the BTP multistep prediction issue has become a challenging and urgent problem. In addition, through detailed field investigation and mechanism analysis, we have identified several problems. First, the sintering process is dynamic, and BTP multistep prediction can be viewed as a sequence-to-sequence learning task, which traditional recurrent neural networks such as LSTM [28] and the gated recurrent unit (GRU) [29] struggle to handle. Second, the industrial environment is extremely complex, and the time-series data collected from electronic sensors are usually mixed with noise, which also brings challenges to sequence modeling. Finally, the sintering process is nonlinear with multivariable coupling, and it is difficult for existing models to capture the target-relevant hidden dynamics.

To address these challenging problems, an end-to-end framework called the denoising spatial–temporal encoder–decoder (DSTED) is developed in this article, which treats BTP multistep prediction as a sequence-to-sequence task.

Fig. 1. Process flowchart of iron ore sintering.

The specific contributions of this article are summarized as follows.

1) According to the sintering mechanism, the BTP-relevant variables are selected as the input features, and the label BTP is calculated by fitting the exhaust gas temperature in the bellows.

2) To the best of our knowledge, BTP multistep prediction is defined as a sequence-to-sequence problem for the first time. In addition, in the encoder network, a denoising GRU (DGRU) is proposed to alleviate the effect of noisy industrial data and obtain the latent variable representation of the original input variables.

3) In the decoder network, spatial–temporal attention is designed to model the dynamic spatial–temporal correlations of the data. Specifically, the spatial attention captures the complex spatial correlations between different latent variables and the target variable, and the temporal attention captures the dynamic temporal correlations of the time series.

4) A series of comparative experiments are conducted using actual industrial data from a sintering plant, and the results confirm that the proposed model is more effective and feasible than the existing baselines.

The rest of this article is organized as follows. Section II introduces the mechanism of the sintering process and the problem definition. Section III outlines the procedures of the proposed method. The results of the case study are discussed in Section IV. Finally, Section V concludes this article.

II. MECHANISM ANALYSIS AND PROBLEM DEFINITION

In this section, the sintering process and BTP are described in detail, and the sintering process parameters are systematically classified into five types. In addition, the data characteristics are briefly analyzed, and the BTP prediction problem is defined.

A. Description of the Sintering Process

The sintering process is a continuous and complex production process with a long process flow and large-scale control equipment. It can be seen as a process in which a powdery material mixture is agglomerated into a massive solid under high-temperature heating. At present, more than 90% of iron and steel enterprises use the Dwight–Lloyd sintering machine with 24 bellows, as shown in Fig. 1. The sintering process includes five steps: proportioning, mixing, ignition, sintering with ventilation, and cooling and screening.

To study the sintering process more conveniently, the whole production process is approximately regarded as a dynamic system. All process parameters are divided into five categories: raw material parameters, equipment parameters, state parameters, manipulated parameters, and index parameters. From the perspective of system theory, the index and state parameters of sintering are produced when the raw material and manipulated parameters act on the equipment.

B. Exploration of Data Analysis

Based on the mechanism description above, the sintering process is nonlinear, multivariable-coupled, and time-varying. In addition, the sintering environment is very complex, so noisy data are generated. To illustrate the motivation of this study, the following key characteristics need to be described in detail.

1) Time Varying: Sintering is a dynamic process, and it takes about 40 min to finish the sintering task. During this time, all kinds of process parameters change irregularly as the trolley moves. For example, the bellows negative pressure is constantly adjusted according to the sintering process.

2) Random Noise: The analysis above shows that the whole sintering process is complex and dynamic. Data from the actual industrial process usually contain random noise, which makes it difficult to establish a data-driven model. To provide a sufficient basis for the prediction model, it is necessary to conduct random noise analysis using industrial data collected from the sintering factory. Exploration of typical variables shows that random noise is usually present in industrial process data and has an adverse impact on BTP modeling. Hence, it is essential to improve the antinoise performance of the model; the details are given in Section III.

C. Problem Deﬁnition

To our knowledge, BTP multistep prediction can be regarded as a sequence-to-sequence task. Suppose there are $m$ variables in the sintering process, each of which generates a time series. Among these variables, the BTP time series is used as the target variable for making predictions, while the other variables are used as the input features. For the input sequence, the input length is set to $T_h$, so the sequence $X = (x_{t-T_h+1}, x_{t-T_h+2}, \ldots, x_t) \in \mathbb{R}^{T_h \times m}$ is regarded as the whole input sequence at time $t$. Similarly, given a time window of length $T_f$, we use $Y = (y_{t+1}, y_{t+2}, \ldots, y_{t+T_f}) \in \mathbb{R}^{T_f}$ to represent the BTP prediction series.
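The pairing of an input window $X$ with a prediction window $Y$ can be sketched as follows. This is a minimal illustration in plain Python rather than the authors' code; the function name `make_windows` and the toy values are assumptions introduced here.

```python
def make_windows(features, btp, t_h, t_f):
    """Build (X, Y) pairs: X holds the last t_h feature rows up to time t,
    Y holds the next t_f BTP values after time t."""
    if len(features) != len(btp):
        raise ValueError("features and btp must have the same length")
    samples = []
    # t is the index of the last observed step in the input window.
    for t in range(t_h - 1, len(features) - t_f):
        x = features[t - t_h + 1 : t + 1]   # t_h rows, each with m values
        y = btp[t + 1 : t + 1 + t_f]        # t_f future BTP values
        samples.append((x, y))
    return samples


# Toy data: 6 time steps, m = 2 input variables (values are made up).
features = [[float(i), float(10 * i)] for i in range(6)]
btp = [20.0 + i for i in range(6)]
pairs = make_windows(features, btp, t_h=3, t_f=2)
```

With six time steps, $T_h = 3$ and $T_f = 2$, only two windows fit, since each sample needs three past rows and two future targets.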

III. METHODOLOGY

In this section, a DSTED framework is developed for BTP multistep prediction. The established encoder–decoder model consists of two parts: an encoder with denoising GRU and a decoder with spatial–temporal attention. In the encoder network, the denoising GRU is designed to alleviate the industrial noise and improve the ability of the latent variable extractor. In the decoder network, the temporal attention module is used to learn the temporal dynamics, and the spatial attention module is used to capture the relevance between the latent variables and the target variable. The details of the proposed method are as follows.

Fig. 2. Basic encoder–decoder sequence-to-sequence model.

Fig. 3. Structure of DGRU.

A. Basic Encoder–Decoder Framework

The BTP multistep prediction task is a typical sequence-to-sequence problem, so the basic encoder–decoder framework is well suited to it. This framework was proposed by Cho et al. [30] and is made up of two parts: an encoder and a decoder. The core idea is simple: the encoder network reads the input sentence and compresses the information of the whole sentence into a context vector; then, the decoder network decodes the context vector into a sentence of the target language, as shown in Fig. 2. Training maximizes the conditional probability of the target sequence given the source sequence.
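Written out, the training objective of the basic encoder–decoder is a conditional log-likelihood. The notation below ($\theta$ for the model parameters and $N$ training pairs $(X_n, Y_n)$) is introduced here for illustration only. In the BTP setting, which is a regression task, the squared-error loss defined in Section III-D plays this role.

```latex
\max_{\theta} \; \frac{1}{N} \sum_{n=1}^{N} \log p_{\theta}\left(Y_n \mid X_n\right)
```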

B. Denoising GRU

Essentially, the encoder–decoder framework consists of a series of recurrent neural network (RNN) units. Considering the characteristics of the sintering process and the need for efficiency, the GRU network is selected as the basic module of the encoder to capture the mapping relationships in the time-varying data. However, process data are often corrupted by random noise caused by multiple factors such as environmental disturbances, human interventions, and faulty sensors. Therefore, it is necessary and meaningful to enhance the performance of the encoder network under noise.

Motivated by this, a new gate called the denoising gate is added to the GRU to alleviate the impact of noise. The resulting DGRU is used in the encoder, as shown in Fig. 3. As can be seen from the structure of DGRU, the new unit is composed of three gates: the denoising gate, the reset gate, and the update gate. The denoising gate ($d_t$) adaptively extracts more useful information from the original data. The reset gate ($r_t$) controls how much information of the previous state ($h_{t-1}$) is retained and passed to the next time step. The update gate ($z_t$) determines how much of the current hidden state ($h_t$) is kept from the previous state and how much is updated from the intermediate hidden state ($\tilde{h}_t$).

Fig. 4. Structure of spatial–temporal attention.

Suppose the $i$th input variable sequence is $X^i = \{x^i_{t-T_h+1}, x^i_{t-T_h+2}, \ldots, x^i_t\} \in \mathbb{R}^{T_h}$, and $\mu^i$ and $s^i$ denote the mean and variance of the variable $x^i$. In statistics, the variance represents the degree of deviation from the center and is used to measure the volatility of data. A guidance matrix is then defined to guide the learning process of the denoising gate:

$$g_t = \left[\mu^1_t, \mu^2_t, \ldots, \mu^m_t;\; s^1_t, s^2_t, \ldots, s^m_t\right] \in \mathbb{R}^{2m}. \tag{1}$$

Next, the deviation factor $e \in \mathbb{R}^m$ and the fluctuation factor $f \in \mathbb{R}^m$ are defined to reflect the deviation and fluctuation range of each variable:

$$e = k(1 - \mu) \tag{2}$$

$$f = (1 - k)s \tag{3}$$

where $k$ is the weight coefficient connecting the input layer, which balances the two factors $e$ and $f$:

$$k = \sigma([x_t, h_{t-1}, g_t] \cdot W_d). \tag{4}$$

Furthermore, the denoising gate $d_t \in \mathbb{R}^m$ is constructed by the denoising function $D(x_t)$, as shown in Fig. 5:

$$D(x_t) = x_t + k(1 - \mu) + (1 - k)s \tag{5}$$

$$d_t = D(x_t). \tag{6}$$

Then, the reset gate ($r_t \in \mathbb{R}^h$) and the update gate ($z_t \in \mathbb{R}^h$) are calculated as follows:

$$r_t = \sigma([h_{t-1}, d_t] \cdot W_r) \tag{7}$$

$$z_t = \sigma([h_{t-1}, d_t] \cdot W_z) \tag{8}$$

$$\tilde{h}_t = \tanh([r_t \odot h_{t-1}, d_t] \cdot W_{\tilde{h}}) \tag{9}$$

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{10}$$

where $W_d \in \mathbb{R}^{3m}$, $W_r \in \mathbb{R}^{m \times h}$, $W_z \in \mathbb{R}^{m \times h}$, and $W_{\tilde{h}} \in \mathbb{R}^{m \times h}$ are the weights of the denoising gate, reset gate, update gate, and hidden layer, respectively, and $h$ is the number of hidden neurons.

Finally, the classic backpropagation through time (BPTT) training algorithm is used to update the weight parameters $W_d$, $W_r$, $W_z$, $W_{\tilde{h}}$, and $W_o$ (the last fully connected layer) with the Adam optimizer. Note that the network parameters are updated after the samples go through the whole encoder–decoder model; the specific loss function is therefore given in Section III-D.
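One forward step of (1)–(10) can be sketched in plain Python to make the data flow concrete. This is an illustrative sketch, not the authors' implementation. It assumes a scalar $k$ and products that are elementwise. It also assumes weight shapes chosen to fit the concatenations: `w_d` has length $m + h + 2m$, and the gate matrices have $h + m$ rows, which differs from the shapes stated in the text. All function names are hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def mean_and_variance(series):
    """Per-variable statistics that fill the guidance vector g_t in (1)."""
    mu = sum(series) / len(series)
    var = sum((v - mu) ** 2 for v in series) / len(series)
    return mu, var


def denoising_gate(x_t, h_prev, mu, s, w_d):
    """Eqs. (2)-(6): d_t = x_t + k(1 - mu) + (1 - k)s, with scalar k from (4)."""
    g_t = mu + s                          # list concatenation [mu; s]
    joined = x_t + h_prev + g_t           # [x_t, h_{t-1}, g_t]
    k = sigmoid(sum(a * b for a, b in zip(joined, w_d)))
    d_t = [x + k * (1.0 - m_i) + (1.0 - k) * s_i
           for x, m_i, s_i in zip(x_t, mu, s)]
    return d_t, k


def linear(vec, weight):
    """Row vector times matrix; weight is a list of len(vec) rows."""
    cols = len(weight[0])
    return [sum(vec[i] * weight[i][j] for i in range(len(vec)))
            for j in range(cols)]


def dgru_step(x_t, h_prev, mu, s, w_d, w_r, w_z, w_h):
    """One DGRU step following (7)-(10)."""
    d_t, _ = denoising_gate(x_t, h_prev, mu, s, w_d)
    hd = h_prev + d_t
    r_t = [sigmoid(v) for v in linear(hd, w_r)]
    z_t = [sigmoid(v) for v in linear(hd, w_z)]
    gated = [r * h for r, h in zip(r_t, h_prev)] + d_t
    h_cand = [math.tanh(v) for v in linear(gated, w_h)]
    return [(1.0 - z) * h + z * c for z, h, c in zip(z_t, h_prev, h_cand)]


def random_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
m, h = 2, 3                               # toy sizes, not the paper's settings
x_t, h_prev = [1.0, 2.0], [0.0] * h
mu, s = [0.5, 1.0], [0.2, 0.4]
w_d = [0.0] * (m + h + 2 * m)             # zero weights give k = 0.5
w_r, w_z, w_h = (random_matrix(h + m, h) for _ in range(3))
d_t, k = denoising_gate(x_t, h_prev, mu, s, w_d)
h_next = dgru_step(x_t, h_prev, mu, s, w_d, w_r, w_z, w_h)
```

With zero denoising weights, $k = 0.5$, so the deviation and fluctuation terms contribute equally to $d_t$. Training would move $k$ away from this balance.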

C. Spatial–Temporal Attention

The dynamic features are extracted by the encoder network composed of a series of DGRU units. These dynamic features are usually called latent variables in the industrial field. Considering the complex multivariable coupling and dynamics of the sintering process, a novel spatial–temporal attention mechanism is embedded in the decoder network to capture the dynamic spatial and temporal relationships between the latent variables and the target variable. It contains two modules, i.e., the temporal attention module and the spatial attention module, as shown in Fig. 4.

1) Temporal Attention: In the temporal dimension, there exists dynamic relevance between different time slices in the sintering process. Furthermore, the performance of the conventional encoder–decoder network degrades as the length of the input sequence increases. That is to say, each target variable in the output sequence needs different input information, so a single fixed-length context vector fails to provide the output target variable with the pertinent information required in the decoding process. To solve this problem, the latent variable correlations at different time steps are calculated by temporal attention [31] to adaptively learn the different importance of time-varying samples. In this way, each encoder hidden state is assigned a temporal attention value, and an adaptively weighted context vector is obtained as the input for the decoder network. Specifically, each encoder hidden state is assigned an attention value according to the similarity between the current encoder output and the hidden state of the previous decoder. The calculation is as follows:

$$e^j_t = \mathrm{score}(h_{t-T_h+j}, s_t) = V^j_l \tanh\left(W^j_l [h_{t-T_h+j}; s_t] + b^j_l\right) \tag{11}$$

$$\alpha^j_t = \frac{\exp(e^j_t)}{\sum_{j=1}^{T_h} \exp(e^j_t)} \tag{12}$$

where $s_t$ denotes the current decoder hidden state, $h_{t-T_h+j}$ denotes the $j$th encoder hidden output at time step $t$, $T_h$ and $T_f$ are the lengths of the encoder and decoder sequences, respectively, $W^j_l \in \mathbb{R}^{T_h+T_f}$, $V^j_l \in \mathbb{R}^{T_h+T_f}$, and $b^j_l \in \mathbb{R}$ are all learnable parameters, $e^j_t$ measures the similarity between $s_t$ and $h_{t-T_h+j}$, and $\alpha^j_t$ is the attention value at time $t$.


Fig. 5. Multistep deep learning prediction model for burning-through point in the sintering process.

Then, a weighted average of all encoder hidden states is calculated as follows:

$$x_t = \sum_{j=1}^{T_h} \alpha^j_t h_{t-T_h+j}. \tag{13}$$

Finally, the latent variables (context vector) are obtained by a nonlinear mapping of the concatenation of $x_t$ and $s_t$ through the hyperbolic tangent activation function (tanh):

$$c_t = \tanh([x_t; s_t]) \tag{14}$$

where $n$ is the dimension of the latent variables. Based on the temporal attention, the latent variables $c_t = [c^1_t, c^2_t, \ldots, c^n_t] \in \mathbb{R}^n$ extracted by the decoder network can learn more historical information, acting as the input of the decoder network.
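The scoring, normalization, and weighted average in (11)–(13) can be sketched as below. This is a minimal plain-Python sketch under stated assumptions: a single set of parameters `w`, `v`, `b` is shared across all positions $j$ (the text indexes them by $j$), and the function names are hypothetical.

```python
import math


def softmax(scores):
    peak = max(scores)                      # subtract the max for stability
    exps = [math.exp(v - peak) for v in scores]
    total = sum(exps)
    return [v / total for v in exps]


def additive_score(h_j, s_t, w, v, b):
    """Eq. (11): v . tanh(W [h_j; s_t] + b), with W given as a list of rows."""
    joined = h_j + s_t
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, joined)) + bi)
              for row, bi in zip(w, b)]
    return sum(vi * hi for vi, hi in zip(v, hidden))


def temporal_attention(encoder_states, s_t, w, v, b):
    """Eqs. (12)-(13): softmax over time steps, then a weighted average."""
    scores = [additive_score(h_j, s_t, w, v, b) for h_j in encoder_states]
    alphas = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(a * h_j[d] for a, h_j in zip(alphas, encoder_states))
               for d in range(dim)]
    return alphas, context


# Toy example: T_h = 3 encoder states of size 2 and a decoder state of size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
s_t = [0.5, -0.5]
w = [[0.0] * 4, [0.0] * 4]                  # zero weights: every score is equal
v, b = [1.0, 1.0], [0.0, 0.0]
alphas, context = temporal_attention(encoder_states, s_t, w, v, b)
```

With zero weights every score is identical, so the attention weights are uniform and the context reduces to the plain mean of the encoder states. Learned weights shift mass toward the more relevant time steps.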

2) Spatial Attention: In the spatial dimension, the latent variables are seen as an advanced representation of the original input variables. Different latent variables can have different effects on the target series $y_t$. Thus, exploring the target-relevant hidden dynamics lays a foundation for multistep prediction. However, the impacting weights change dynamically over time. Inspired by this fact, spatial attention is designed for the decoder network to capture the correlations between these variables and the target variable, as depicted in Fig. 4. Let $e^k_t$ be the spatial attention score of the $k$th latent variable at time $t$ ($e_t = [e^1_t, e^2_t, \ldots, e^n_t] \in \mathbb{R}^n$); the attention weight (i.e., impacting weight) is calculated as follows:

$$e^k_t = \mathrm{score}(c_t, y_t) = V^k_l \tanh\left(W^k_l [c_t, y_t] + b^k_l\right) \tag{15}$$

$$\beta^k_t = \frac{\exp(e^k_t)}{\sum_{k=1}^{n} \exp(e^k_t)} \tag{16}$$

$$\tilde{c}_t = \left(\beta^1_t c^1_t, \beta^2_t c^2_t, \beta^3_t c^3_t, \ldots, \beta^n_t c^n_t\right) \tag{17}$$

where $W^k_l \in \mathbb{R}^{(n+1) \times n}$, $V^k_l \in \mathbb{R}^{(n+1) \times n}$, and $b^k_l \in \mathbb{R}^n$ are the parameters to be learned, $\beta^k_t$ is the spatial attention value, and $\tilde{c}_t = [\tilde{c}^1_t, \tilde{c}^2_t, \ldots, \tilde{c}^n_t] \in \mathbb{R}^n$ is the final latent variable vector after the spatial attention operation. By exploiting the internal relevance between the target series and the latent variables, this attention mechanism can adaptively learn more potential correlations across different variables and thus make better sequence predictions.

Fig. 6. Schematic diagram of experimental system implementation.
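The per-variable rescaling in (15)–(17) can be sketched in the same way. This is an illustrative sketch only. Because the stated shape of $V^k_l$ does not yield one score per latent variable directly, the code assumes `v` is a length-$n$ vector applied elementwise. That choice and the function names are assumptions introduced here.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(v - peak) for v in scores]
    total = sum(exps)
    return [v / total for v in exps]


def spatial_attention(c_t, y_t, w, v, b):
    """Score each latent variable against [c_t, y_t], normalize, rescale c_t."""
    n = len(c_t)
    joined = c_t + [y_t]                              # length n + 1
    hidden = [math.tanh(sum(joined[i] * w[i][k] for i in range(n + 1)) + b[k])
              for k in range(n)]
    scores = [v[k] * hidden[k] for k in range(n)]     # assumed elementwise v
    betas = softmax(scores)                           # Eq. (16)
    c_tilde = [bk * ck for bk, ck in zip(betas, c_t)] # Eq. (17)
    return betas, c_tilde


c_t, y_t = [0.3, -0.6, 0.9], 21.0                     # toy values
w = [[0.0] * 3 for _ in range(4)]                     # (n + 1) x n, all zeros
v = [1.0, 1.0, 1.0]

# Zero bias: all scores equal, so every latent variable is scaled by 1/n.
betas_flat, c_flat = spatial_attention(c_t, y_t, w, v, [0.0, 0.0, 0.0])

# A positive bias on the third score raises the weight of the third variable.
betas_peak, c_peak = spatial_attention(c_t, y_t, w, v, [0.0, 0.0, 2.0])
```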

D. Multistep Prediction Model of BTP Based on DSTED

Algorithm 1: Denoising spatial–temporal encoder–decoder model.
Input: The historical sintering data $D = (X, Y)$; hidden size $H$; hidden layers $Num$; batch size $B$; the length of input $T_h$; the length of output $T_f$; learning rate $\eta$; dropout $p$.
Output: BTP predictions $Y$
for all available time $t$ ($1 \le t \le$ num_samples) do
    $X = (x_{t-T_h+1}, x_{t-T_h+2}, \ldots, x_t) \in \mathbb{R}^{T_h \times m}$
    $Y = (y_{t+1}, y_{t+2}, \ldots, y_{t+T_f}) \in \mathbb{R}^{T_f}$
end for
for epoch $p$ ($1 \le p \le$ num_epoch) do
    for batch $b$ ($1 \le b \le B$) do
        Encoder hidden state: $h_t = \mathrm{encoder}(x_1, x_2, x_3, \ldots, x_{T_h})$
        Temporal attention: $e^j_t = V^j_l \tanh(W^j_l [h_{t-T_h+j}; s_t] + b^j_l)$
        Normalization: $\alpha^j_t = \exp(e^j_t) / \sum_{j=1}^{T_h} \exp(e^j_t)$, $x_t = \sum_{j=1}^{T_h} \alpha^j_t h_{t-T_h+j}$
        Latent variable: $c_t = \tanh([x_t; s_t])$
        Spatial attention: $e^k_t = V^k_l \tanh(W^k_l [c_t, y_t] + b^k_l)$
        Normalization: $\beta^k_t = \exp(e^k_t) / \sum_{k=1}^{n} \exp(e^k_t)$
        Final latent variable: $\tilde{c}_t = (\beta^1_t c^1_t, \beta^2_t c^2_t, \beta^3_t c^3_t, \ldots, \beta^n_t c^n_t)$
        BTP prediction output: $Y = \mathrm{decoder}(\tilde{c}_t, y_{t+1}, y_{t+2}, \ldots, y_{t+T_f})$
        The parameters are updated after the whole encoder–decoder by BPTT
    end for
end for

In conclusion, a BTP multistep prediction model is developed based on DSTED. The whole model consists of three parts: data generation, an encoder with denoising GRU, and a decoder with spatial–temporal attention, as shown in Fig. 5. First, the BTP-relevant variables are selected through mechanism analysis, and data are collected from the sinter plant and preprocessed. Second, the dynamic latent variables are extracted by the encoder with DGRU. The initial hidden state of the decoder network is the last hidden output of the encoder network. Next, a spatial–temporal attention module is embedded into the decoder to capture the dynamic correlations between latent variables and the target variable. Then, the extracted latent variables, as well
as the target variable output at the previous time step, are both fed into the decoder network. Also, a dropout layer is added to reduce the overfitting caused by the complex structure. Finally, a fully connected layer with a ReLU activation function is appended to forecast the target BTP. After the forward pass is completed, the whole model can be trained via the backpropagation algorithm. During the training process, an Adam optimizer is adopted to train our model by minimizing the following loss function:

$$L = \frac{1}{T_f} \sum_{i=1}^{T_f} \left(y_{t+i} - \hat{y}_{t+i}\right)^2 + \frac{\lambda}{2 T_f} \left(\|W\|_2^2 + \|V\|_2^2\right) \tag{18}$$

where $W$ and $V$ represent the weights of the encoder and decoder, respectively, and $\lambda$ is the regularization coefficient to reduce the overfitting of the model. Finally, the procedure of the proposed multistep prediction model is given in Algorithm 1.
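The loss in (18) can be sketched for a single sample as follows. This is a minimal sketch, not the authors' training code. It assumes the sum runs over the $T_f$ predicted steps and that the encoder and decoder weights are passed as flat lists. The function name `dsted_loss` and the toy numbers are hypothetical.

```python
def dsted_loss(y_true, y_pred, encoder_weights, decoder_weights, lam):
    """Mean squared error over the T_f steps plus an L2 penalty on all weights."""
    t_f = len(y_true)
    mse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / t_f
    l2 = sum(w * w for w in encoder_weights) + sum(v * v for v in decoder_weights)
    return mse + lam / (2.0 * t_f) * l2


# Toy values: T_f = 3, one prediction off by 2, three scalar weights in total.
loss = dsted_loss(
    y_true=[1.0, 2.0, 3.0],
    y_pred=[1.0, 2.0, 5.0],
    encoder_weights=[1.0, 2.0],
    decoder_weights=[3.0],
    lam=0.3,
)
```

In a PyTorch implementation, the penalty term is often delegated to the optimizer's weight-decay setting instead of being added to the loss by hand. The text does not say which route the authors took.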

IV. EXPERIMENTAL STUDIES

In this section, based on the mechanism analysis and data collected from a real industrial process, extensive experiments are carried out to demonstrate the effectiveness of the proposed BTP multistep prediction model.

TABLE I
LIST OF VARIABLES

Fig. 7. BTP soft-sensor method.

A. Experimental System and Dataset Generation

In this experiment, the raw data were collected in real time from the sintering plant of a steel company every 1 min from Oct. 12, 2021 to Oct. 20, 2021. A workshop of the sintering plant contains a 360 m² belt-type sintering machine, as well as silo, conveying, cooling, and other equipment. The sintering intelligent control system in this plant consists of industrial computers, application software, a dynamic data exchange communication interface, and a distributed control system (DCS), as shown in Fig. 6. The DCS is made up of five programmable logic controllers (PLCs) to achieve the basic automation, including material control, mixing control, cooling control, igniting control, and desulphurizing control [32]. All PLC modules send their data to the central control room, which controls the drive motor of the strand. Meanwhile, the actual strand velocity is measured by sensors, and the results calculated by the established models are sent back to the intelligent controller. All raw data are recorded and transmitted to the time-series database (InfluxDB). According to the mechanism analysis of Section II, the input variables are determined and described in Table I.

Due to the complexity of the sintering process, there is no instrument to measure BTP directly. Here, we used the classic soft-sensor method, called temperature fitting of the exhaust gas in the bellows, to calculate BTP. As shown in Fig. 7, the temperature curve is fitted by a quadratic or higher-order polynomial, and the position of the highest temperature on the fitted curve is taken as the BTP. Similarly, based on investigations, the BRP is calculated as the point where the temperature is 180 °C. According to the theory of the sintering process, the last ten bellows are used to calculate the BTP. The specific calculation procedures are as follows:

$$T(\delta_i, \omega) = \omega_0 + \omega_1 \delta_i^1 + \omega_2 \delta_i^2 + \cdots + \omega_p \delta_i^p \tag{19}$$

$$L(\omega) = \frac{1}{10} \sum_{i=15}^{24} \left(T(\delta_i, \omega) - T_i\right)^2 \tag{20}$$

where $\delta_i$ and $T_i$ are the position and the temperature of the $i$th bellow, respectively ($i = 15, 16, 17, \ldots, 24$), $\omega_0, \omega_1, \ldots, \omega_p$ are the coefficients to be calculated, and $L(\omega)$ is the loss function.

Then, all samples are segmented by the sliding window method, as shown in Fig. 8. In this way, the BTP sequence fragments are constructed.

Fig. 8. Schematic diagram of sliding window fragment extraction.
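For the quadratic case ($p = 2$), the soft-sensor calculation in (19)–(20) reduces to a least-squares fit followed by locating the vertex of the fitted parabola. The sketch below uses only the standard library and solves the normal equations directly. It is an illustration under assumptions rather than the plant's procedure: the bellows index stands in for the position $\delta_i$, the temperatures are synthetic, and `solve_linear` and `fit_btp` are hypothetical names.

```python
def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_btp(positions, temps):
    """Least-squares quadratic fit; returns the position of the fitted peak."""
    center = sum(positions) / len(positions)
    u = [p - center for p in positions]        # centering improves conditioning
    rows = [[1.0, ui, ui * ui] for ui in u]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * t for r, t in zip(rows, temps)) for i in range(3)]
    w0, w1, w2 = solve_linear(ata, aty)
    if w2 >= 0:
        raise ValueError("fitted curve has no interior maximum")
    return center - w1 / (2.0 * w2)            # vertex of the parabola


# Synthetic exhaust-gas temperatures for bellows 15..24, peaking at position 21.
positions = [float(i) for i in range(15, 25)]
temps = [300.0 - 5.0 * (p - 21.0) ** 2 for p in positions]
btp = fit_btp(positions, temps)
```

The constant factor in front of the sum in (20) does not affect the fitted coefficients, so the sketch omits it.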

B. Experimental Settings

After data preprocessing, 7780 fragments are used for evaluating the models. Here, we use the first 6000 fragments as the training set, the next 1000 fragments as the validation set, and the remaining 780 fragments as the test set. The lengths of the input and output sequences are 40 and 3, respectively ($T_h = 40$, $T_f = 3$).
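The split described above keeps the fragments in time order. A minimal sketch is given below; the helper name `chronological_split` is hypothetical, and the absence of shuffling is an assumption drawn from the "first", "next", and "remaining" wording.

```python
def chronological_split(fragments, n_train=6000, n_val=1000):
    """Split in time order without shuffling: train, then validation, then test."""
    train = fragments[:n_train]
    val = fragments[n_train:n_train + n_val]
    test = fragments[n_train + n_val:]
    return train, val, test

# Stand-in for the 7780 fragments: fragment indices only.
train, val, test = chronological_split(list(range(7780)))
```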

In our experiments, all the comparison models are implemented using the PyTorch framework in Python. The test platform is a laptop equipped with a Core i5-4210H CPU and 8 GB of RAM. Without loss of generality, the accuracy of the proposed model is compared with other typical time-series baselines: vector autoregressive (VAR), autoregressive integrated moving average (ARIMA), LSTM, and GRU. Three commonly employed statistical indicators ($R^2$, MAE, and RMSE) are used to evaluate the performance of these models. To ensure stable results, each evaluation indicator of each method is reported as the average of 20 trials on the test set.
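For reference, the three indicators can be computed as in the sketch below, which uses their standard definitions and then averages them over repeated trials. The function names are hypothetical, and the numbers are toy values rather than results from the article.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, and RMSE for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    ss_res = float(np.sum(err ** 2))                          # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))     # total sum of squares
    return {"R2": 1.0 - ss_res / ss_tot, "MAE": mae, "RMSE": rmse}

def average_over_trials(trial_metrics):
    """Average each indicator over a list of per-trial metric dictionaries."""
    keys = trial_metrics[0].keys()
    return {k: float(np.mean([m[k] for m in trial_metrics])) for k in keys}

# Toy example with two identical "trials".
m = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 3.0, 3.5])
avg = average_over_trials([m, m])
```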

C. Model Comparison and Results Analysis

TABLE II

COMPARISON OF DIFFERENT METHODS FOR BTP PREDICTION

Fig. 9. Performance comparison with other models.

The comprehensive performance of each method is compared in Table II, where each value is the mean over all output time steps. The two traditional statistical time-series models perform very poorly: their accuracies ($R^2$) are both below 0.5, at only 0.4547 and 0.4895, respectively. Because both VAR and ARIMA are linear models, it is difficult for them to capture the nonlinear relationships of the sintering process.

Encouragingly, the two deep learning models, LSTM and GRU, perform better, with $R^2$ values exceeding 0.7, which indicates that recurrent neural networks can learn the complex nonlinear and dynamic characteristics of the industrial process. Although LSTM and GRU use memory cells to retain useful past information and model time-varying data for BTP prediction, they still struggle with long-term dependencies. That is to say, recurrent neural networks cannot explicitly model periodic and trend information because of their limited ability to describe temporal dependencies. Hence, single RNN units cannot achieve satisfactory results on multistep tasks. By contrast, the proposed DSTED demonstrates its feasibility and superiority in BTP prediction, with an accuracy ($R^2$) above 0.9. The detailed predictions on the test set are further presented in Fig. 9. It can be seen that the error between the actual BTP and the BTP predicted by the proposed model is very

Fig. 10. Performance on different output time steps.

Fig. 11. Performance on different lengths of input sequence. (a) Model

accuracy of different lengths of input sequence. (b) Train time of different

lengths of input sequence.

small. Such ﬁndings reveal that DSTED has greater advantages

in sequence-to-sequence modeling of BTP than the traditional

deep learning models.

To evaluate the long-term stability of our model, we illustrate the stepwise accuracy over the following seven time steps in Fig. 10. The results indicate that the prediction performance drops gradually as the output sequence becomes longer because of the long-term dependence. In particular, the accuracy of our model decreases dramatically when the prediction horizon exceeds 5 min, because the prediction error gradually accumulates as the length of the output sequence increases. Therefore, there is still a certain gap between the theoretical level and the actual situation in multistep prediction. However, after detailed communication with the onsite operators of the sintering factory, we found that the adjustment of BTP currently depends entirely on the experience of the workers. In fact, short-term forecasting of BTP can also provide constructive guidance for operators to adjust the process parameters and maintain the normal operation of the sintering process. Thus, it is still meaningful and essential to achieve multistep prediction of BTP in advance.
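The stepwise accuracy discussed above can be evaluated by computing one $R^2$ value per output step. The sketch below shows only this evaluation, under assumptions: the function name `stepwise_r2` is hypothetical, and the data are synthetic, with an error that is made to grow with the step index in order to mimic error accumulation. No values from the article are reproduced.

```python
import numpy as np

def stepwise_r2(y_true, y_pred):
    """y_true, y_pred: arrays of shape (n_fragments, n_steps).
    Returns one R^2 value per output step."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot

# Synthetic seven-step forecasts whose error grows with the step index.
rng = np.random.default_rng(0)
truth = rng.normal(size=200)
base_noise = rng.normal(size=200)
scales = 0.1 * np.arange(1, 8)                 # larger error at later steps
y_true = np.repeat(truth[:, None], 7, axis=1)
y_pred = y_true + base_noise[:, None] * scales
r2 = stepwise_r2(y_true, y_pred)
```

Because the synthetic error is scaled up at each step, the resulting `r2` values decrease monotonically with the output step.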

Then, the effect of the input sequence length is also validated for DSTED. Here, the length of the input sequence is varied from 10 to 80. The three evaluation metrics and the computational efficiency are used together to evaluate the performance of our model. In Fig. 11(a), the $R^2$ curve first rises, then drops dramatically at an input length of 50, and then rises slightly. On the other hand, the average training time per epoch increases significantly as the input length increases, as shown in Fig. 11(b). After comprehensive comparison and consideration, the ideal performance of DSTED is obtained when the input length is set to 40, which is more suitable for actual industrial applications.

TABLE III

ABLATIVE VARIANTS PERFORMANCE ON BTP PREDICTION

D. Ablation Study

To further verify the effectiveness of each component in our DSTED, we also conducted ablation studies from three aspects: DGRU, temporal attention (TAtt), and spatial attention (SAtt). We constructed the ablative variants by adding these components one at a time. The typical variants of our model are as follows.

1) Encoder–Decoder: All three components are removed;

2) Encoder–Decoder+DGRU: The denoising GRU is embedded into the encoder;

3) Encoder–Decoder+DGRU+TAtt: Spatial attention is omitted;

4) Encoder–Decoder+DGRU+TAtt+SAtt (DSTED): Our integrated model.

As can be seen from Table III, our integrated DSTED outperforms all its ablative variants in terms of all evaluation metrics on BTP prediction. More specifically, DSTED has a higher $R^2$ and lower RMSE and MAE than the basic encoder–decoder without any components. These results also indicate that existing deep learning models from NLP are generally difficult to apply directly to industrial data modeling, because there are differences between the industrial field and NLP. That is to say, it is necessary to improve the network structure according to the characteristics of industrial process data. We also find that the encoder–decoder with the denoising GRU obtains better results than the original encoder–decoder, because DGRU can alleviate the impact of noisy data in the sintering process. In addition, the temporal attention mechanism is employed in the decoder network to determine the discriminative encoder hidden states and capture the time trend, which improves $R^2$ from 85.28% to 88.43%. The reason is that the temporal attention mechanism can better learn the correlation between samples. Finally, our integrated model with the spatial–temporal attention mechanism obtains the best result, since it can not only adaptively capture the dynamics but also learn the relevance between the latent variables and the target variable BTP. From the results of the ablation study, we can conclude that all the components designed in DSTED play important roles in the BTP multistep prediction task.
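To make the roles of the two attention components more concrete, the following sketch weights encoder hidden states along the time axis (temporal) and along the latent-variable axis (spatial). It is only an illustration under strong assumptions: a generic dot-product score stands in for the scoring functions defined earlier in the article, and the function names, shapes (40 time steps, 20 hidden units), and random inputs are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def temporal_attention(enc_states, dec_state):
    """enc_states: (T_h, H) encoder hidden states; dec_state: (H,).
    Returns weights over time steps and the weighted context vector."""
    scores = enc_states @ dec_state          # one score per time step
    weights = softmax(scores)                # (T_h,), sums to 1
    context = weights @ enc_states           # (H,)
    return weights, context

def spatial_attention(enc_states, dec_state):
    """Weights over the H latent variables instead of over time steps."""
    scores = enc_states.mean(axis=0) * dec_state    # one score per latent variable
    weights = softmax(scores)                        # (H,), sums to 1
    return weights, enc_states * weights             # reweighted states, (T_h, H)

rng = np.random.default_rng(0)
enc = rng.normal(size=(40, 20))
dec = rng.normal(size=20)
w_t, context = temporal_attention(enc, dec)
w_s, reweighted = spatial_attention(enc, dec)
```

Dropping the call to `spatial_attention` corresponds loosely to the third ablative variant, in which only the time-wise weighting remains.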

Fig. 12. Prediction performance on several typical parameter settings.

(a) Different learning rates. (b) Different hidden layers. (c) Different

hidden neurons. (d) Different dropouts.

E. Hyperparameter Tuning

To investigate how different hyperparameters influence the BTP prediction performance, we conduct a sensitivity analysis of several key hyperparameters on the BTP dataset. First, we adjust the learning rate from 0.0001 to 0.05. From Fig. 12(a), it can be seen that the RMSE of the model shows a downward trend as the learning rate increases. If the learning rate is too large, it is hard for the neural network to learn internal knowledge, but too small a learning rate also has an adverse impact on model training. Notably, our model reaches the best performance when the learning rate is 0.001. Besides, the number of hidden layers of the GRU also plays an important role in model training. From Fig. 12(b), it is clear that our model overfits when the number of hidden layers reaches 4. Thus, two hidden layers are suitable for the network. Similarly, by trial and error, the RMSE is minimal when the number of hidden neurons is set to 20, according to Fig. 12(c). Because the number of input variables is 12, the number of hidden neurons should be neither too small nor too large; otherwise, the prediction model may overfit. For simplicity, the number of hidden layers and hidden neurons of the encoder is set equal to that of the decoder. Moreover, to avoid overfitting, a dropout layer is embedded into the decoder, and the optimal dropout value, found by trial and error, is 0.1, as shown in Fig. 12(d). In addition, the RMSE under different numbers of training iterations is also investigated for training and testing, with the number of iterations selected from the set {10, 15, 20, 35, 40}. The experimental findings indicate that the RMSE converges when the number of iterations is about 20, so the number of iterations is set to 20 in this study. For the batch size, the optimal value of 20 is selected from the set {10, 20, 30, 40}. Finally, the number of input neurons of the encoder is equal to the dimension of the input variables.
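The selected settings can be collected in one configuration object, and a small search over the candidate sets can be sketched as below. The names `DstedConfig`, `grid_search`, and `fake_evaluate` are hypothetical, and the joint grid is an assumption made for brevity; the analysis above varies one hyperparameter at a time. The placeholder objective stands in for training the model and returning a validation RMSE.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class DstedConfig:
    # Defaults follow the values reported in the text.
    input_dim: int = 12
    input_len: int = 40
    output_len: int = 3
    hidden_layers: int = 2
    hidden_size: int = 20
    dropout: float = 0.1
    learning_rate: float = 0.001
    iterations: int = 20
    batch_size: int = 20

def grid_search(evaluate, batch_sizes=(10, 20, 30, 40),
                iteration_options=(10, 15, 20, 35, 40)):
    """Return the configuration with the lowest score over a small grid."""
    best_cfg, best_score = None, float("inf")
    for bs, it in product(batch_sizes, iteration_options):
        cfg = DstedConfig(batch_size=bs, iterations=it)
        score = evaluate(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

def fake_evaluate(cfg):
    """Placeholder objective; a real one would train and validate the model."""
    return abs(cfg.batch_size - 20) + abs(cfg.iterations - 20)

best, score = grid_search(fake_evaluate)
```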

V. CONCLUSION

In this article, an end-to-end approach to sequence learning was proposed and successfully applied to the multistep prediction of BTP in the sintering process. The integrated model was composed of an encoder with a denoising GRU and a decoder with spatial–temporal attention mechanisms. Specifically, motivated by the random noise in industrial data, we designed a denoising GRU to reduce the interference of noise and enhance the extraction of latent variables. In addition, the spatial and temporal attention modules were simultaneously embedded into the decoder to capture the dynamic relevance of samples and the correlation between the latent variables and the target variable. Experimental results on the real-world dataset show that the multistep prediction accuracy of our proposed model is superior to that of the existing models. In the future, we will improve the structure of deep learning models to solve the problem of long-term prediction.

REFERENCES

[1] Z. Yuan and B. Wang, “Application of deep belief network in prediction

of secondary chemical components of sinter,” in Proc. 13th IEEE Conf.

Ind. Electron. Appl., 2018, pp. 2746–2751.

[2] W. Chen, B. Wang, Y. Chen, H. Zhang, and X. Li, “Using BP neural network to predict the sinter comprehensive performance: FeO and sinter yield,” Adv. Mater. Res., vol. 771, pp. 209–212, 2013, doi: 10.4028/www.scientific.net/AMR.771.209.

[3] W. Yan, R. Xu, K. Wang, T. Di, and Z. Jiang, “Soft sensor modeling method

based on semisupervised deep learning and its application to wastewater

treatment plant,” Ind. Eng. Chem. Res., vol. 59, no. 10, pp. 4589–4601,

2020, doi: 10.1021/acs.iecr.9b05087.

[4] W. Yan, D. Tang, and Y. Lin, “A data-driven soft sensor modeling method

based on deep learning and its application,” IEEE Trans. Ind. Electron.,

vol. 64, no. 5, pp. 4237–4245, May 2017.

[5] S. Du, M. Wu, X. Chen, J. Hu, and W. Cao, “Intelligent integrated

control for burn-through point to carbon efficiency optimization in iron

ore sintering process,” IEEE Trans. Control Syst. Technol., vol. 28, no. 6,

pp. 2497–2505, Nov. 2020.

[6] W. Cao, Y. Zhang, J. She, M. Wu, and Y. Cao, “A dynamic subspace

model for predicting burn-through point in iron sintering process,” Inf.

Sci., vol. 466, pp. 1–12, 2018, doi: 10.1016/j.ins.2018.06.069.

[7] N. Toktassynova et al., “Modelling and control structure of a phosphorite

sinter process with grey system theory,” J. Grey Syst., vol. 32, no. 2,

pp. 150–166, 2020.

[8] S. Liu, Q. Lyu, X. Liu, Y. Sun, and X. Zhang, “A prediction system of

burn through point based on gradient boosting decision tree and decision

rules,” ISIJ Int., vol. 59, no. 12, pp. 2156–2164, 2019.

[9] B. Wang, Y. Fang, J. Sheng, and W. Gui, “BTP prediction model based on

ANN and regression analysis,” in Proc. 2nd Int. Workshop Knowl. Discov.

Data Mining., 2009, pp. 108–111, doi: 10.1109/WKDD.2009.179.

[10] Z. Zhu, G. Geng, and Q. Jiang, “Power system dynamic model reduction

based on extended Krylov subspace method,” IEEE Trans. Power Syst.,

vol. 31, no. 6, pp. 4483–4494, Nov. 2016.

[11] J. Wang, X. Li, Y. Li, and K. Wang, “BTP prediction of sintering process

by using multiple models,” in Proc. 26th Chin. Control Decis. Conf., 2014,

pp. 4008–4012.

[12] S. Du, M. Wu, L. Chen, and W. Pedrycz, “Prediction model of

burn-through point with fuzzy time series for iron ore sintering pro-

cess,” Eng. Appl. Artif. Intell., vol. 102, 2021, Art. no. 104259,

doi: 10.1016/j.engappai.2021.104259.

[13] S. Du et al., “Operating mode recognition based on fluctuation interval

prediction for iron ore sintering process,” IEEE/ASME Trans. Mechatron.,

vol. 25, no. 5, pp. 2297–2308, Oct. 2020.

[14] Y. Lecun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521,

no. 7553, pp. 436–444, 2015, doi: 10.1038/nature14539.

[15] A. J. Holden et al., “Reducing the dimensionality of data with neural

networks,” Neural Comput., vol. 18, no. 7, pp. 1527–1554, 2006.

[16] X. Wu, D. Sahoo, and S. C. H. Hoi, “Recent advances in deep learn-

ing for object detection,” Neurocomputing, vol. 396, pp. 39–64, 2020,

doi: 10.1016/j.neucom.2020.01.085.

[17] J. Chen, X. Qiu, P. Liu, and X. Huang, “Meta multi-task learning for

sequence modeling,” in Proc. 32nd AAAI Conf. Artif. Intell., 2018, vol. 32,

no. 1, pp. 5070–5077.

[18] X. Shi, Z. Chen, H. Wang, D. Y. Yeung, W. K. Wong, and W. C. Woo, “Convolutional

LSTM network: A machine learning approach for precipitation

nowcasting,” in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 802–810.

[19] L. Feng, C. Zhao, Y. Li, M. Zhou, H. Qiao, and C. Fu, “Multichannel

diffusion graph convolutional network for the prediction of endpoint

composition in the converter steelmaking process,” IEEE Trans. Instrum.

Meas., vol. 70, 2021, Art. no. 3000413.

[20] W. K. Tsinghua, D. Huang, F. Yang, and Y. Jiang, “Soft sensor de-

velopment and applications based on LSTM in deep neural networks,”

in Proc. IEEE Symp. Ser. Comput. Intell., 2017, vol. 2017, pp. 1–6,

doi: 10.1109/SSCI.2017.8280954.

[21] Q. Sun and Z. Ge, “A survey on deep learning for data-driven soft sensors,”

IEEE Trans. Ind. Inform., vol. 17, no. 9, pp. 5853–5866, Sep. 2021.

[22] Q. Sun and Z. Ge, “Deep learning for industrial KPI prediction: When

ensemble learning meets semi-supervised data,” IEEE Trans. Ind. Inform.,

vol. 17, no. 1, pp. 260–269, Jan. 2021.

[23] J. Loy-Benitez, S. Heo, and C. Yoo, “Control engineering practice soft

sensor validation for monitoring and resilient control of sequential subway

indoor air quality through memory-gated recurrent neural networks-based

autoencoders,” Control Eng. Pract., vol. 97, 2020, Art. no. 104330,

doi: 10.1016/j.conengprac.2020.104330.

[24] X. Yuan, L. Li, and Y. Wang, “Nonlinear dynamic soft sensor modeling

with supervised long short-term memory network,” IEEE Trans. Ind.

Inform., vol. 16, no. 5, pp. 3168–3176, May 2020.

[25] L. Patanè and M. G. Xibilia, “Echo-state networks for soft sensor

design in an SRU process,” Inf. Sci., vol. 566, pp. 195–214, 2021,

doi: 10.1016/j.ins.2021.03.013.

[26] Y. C. Bo, P. Wang, X. Zhang, and B. Liu, “Modeling data-driven sensor

with a novel deep echo state network,” Chemom. Intell. Lab. Syst., vol. 206,

2020, Art. no. 104062, doi: 10.1016/j.chemolab.2020.104062.

[27] Y. L. He, Y. Tian, Y. Xu, and Q. X. Zhu, “Novel soft sensor development

using echo state network integrated with singular value decomposition:

Application to complex chemical processes,” Chemom. Intell. Lab. Syst.,

vol. 200, 2020, Art. no. 103981, doi: 10.1016/j.chemolab.2020.103981.

[28] S. Hochreiter, “Long short-term memory,” Neural Comput., vol. 9, no. 8,

pp. 1735–1780, 1997.

[29] A. M. Dai, “Semi-supervised sequence learning,” Adv. Neural Inf. Process.

Syst., vol. 28, pp. 3079–3087, 2015.

[30] K. Cho et al., “Learning phrase representations using RNN encoder-

decoder for statistical machine translation,” in Proc. Conf. Empirical

Methods Natural Lang. Process., 2014, pp. 1724–1734, doi: 10.3115/v1/d14-1179.

[31] D. Bahdanau, K. H. Cho, and Y. Bengio, “Neural machine translation

by jointly learning to align and translate,” in Proc. 3rd Int. Conf. Learn.

Representations, Conf. Track Proc., 2015, pp. 1–15.

[32] C. S. Wang and M. Wu, “Hierarchical intelligent control system and its

application to the sintering process,” IEEE Trans. Ind. Inform., vol. 9, no. 1,

pp. 190–197, Feb. 2013.

Feng Yan received the B.S. degree in vehicle engineering from the College of Automotive and Traffic Engineering, Jiangsu University, Zhenjiang, China, in 2018, and the M.S. degree in vehicle engineering from the College of Mechanical Vehicle Engineering, Hunan University, Changsha, China, in 2021. He is currently working toward the Ph.D. degree in control science and engineering with the College of Control Science and Engineering, Zhejiang University.

His current research interests include deep learning, data mining, and intelligent optimization in industrial process applications.

Chunjie Yang (Senior Member, IEEE) received the B.S. degree in machine design, the M.S. degree in fluid transmission and control, and the Ph.D. degree in industrial automation from Zhejiang University, Hangzhou, China, in 1992, 1995, and 1998, respectively.

He is currently a Professor with the College of Control Science and Engineering, as well as a Qiushi Distinguished Professor of Zhejiang University. His current research interests include artificial intelligence, machine learning modeling, control, and fault diagnosis for industrial processes.

Xinmin Zhang (Member, IEEE) received the Ph.D. degree in system science from Kyoto University, Kyoto, Japan, in 2019.

From April 2019 to December 2019, he was a Postdoctoral Research Fellow with the Department of Systems Science, Kyoto University. He is currently an Associate Professor with the College of Control Science and Engineering, Zhejiang University, Hangzhou, China. His research interests include process control, process data analysis, machine learning and industrial big data, and virtual sensing technology with applications to industrial processes.

Deep learning based self-adaptive modeling of multimode continuous manufacturing processes and its application to rotary drying process

Article

Full-text available

Jun 2024
J INTELL MANUF

Real-time prediction of future process outputs is critical for the model predictive control of continuous manufacturing processes. It helps identify when and how to adjust the process variables under the disturbances. A lot of recurrent neural network-based predictive models have been developed and validated on the simulated processes. Based on that, some works further consider the existence of multiple operating conditions that can be unforeseen and transitive in real-world manufacturing. However, their designed online learning mechanisms mostly focus on fast local tracking without preserving important old knowledge. Besides, the proactive input data adaptation is largely unexplored. To bridge this gap, a novel self-adaptation mechanism is proposed in this paper. This mechanism can be easily integrated into different choices of predictive model to improve the stability of performance towards various changes in a long period of manufacturing. In the proposed mechanism, the components of adaptive sequence filtering and adaptive input normalization first extract the compact and properly scaled features from the growing multivariate time series subject to delayed output response and non-stationarity. Based on the encoder-decoder network as an exemplary predictive model, the component of adaptive model update consists of a non-Euclidean loss for evaluating sequential predictions and a task-free knowledge consolidation strategy for continual learning-based regularization. The application to an industrial rotary drying process is demonstrated, where data streams are collected from four production lines over 14 months. Extensive comparative study shows the superior performance of proposed mechanism and ablation study further verifies the effectiveness of each individual component.

A Survey of Data-Driven Soft Sensing in Ironmaking System: Research Status and Opportunities

Article

Full-text available

Jun 2024

Data-driven soft sensing modeling is becoming a powerful tool in the ironmaking process due to the rapid development of machine learning and data mining. Although various soft sensing techniques have been successfully used in both the sintering process and blast furnace, they have not been comprehensively reviewed. In this work, we provide an overview of recent advances on soft sensing in the ironmaking process, with a special focus on data-driven techniques. First, we present a general soft sensing development framework of the ironmaking process based on the mechanism analysis and process characteristics. Second, we provide a detailed taxonomy of current soft sensing methods categorized by their predictive tasks (i.e., quality indicators prediction, state parameters prediction, etc.). Finally, we outline several insightful and promising directions, such as self-supervised learning and digital twins in the ironmaking process, for future research.

Intelligent Sinter Machine Speed Control System Using Optimized Fuzzy Logic Controller: An Experimental Study in Iron and Steel Plant

Article

Full-text available

Apr 2024

Intelligent control systems developed for production facilities significantly contribute to production efficiency and quality. Using intelligent control systems has now become a necessity in iron and steel sintering plants that produce millions of tonnes annually. Automatic control of the sinter machine speed, which directly affects production efficiency and quality, is one of the first issues to be addressed. The complexity of the sintering process, being affected by many variables, and the nonlinearity of these variables make it difficult to control the machine speed. This study demonstrates that we have overcome this challenge using a fuzzy logic controller (FLC), which is optimized with an adaptive neuro-fuzzy inference system (ANFIS). The FLC we have designed operates with the characteristic point of the thermal state, the mixture level, the vacuum average, and the current speed parameters. We achieved an average success rate of 95%. The developed system automatically controls the speed of the sinter machine with high accuracy, independent of the operator. The system we have developed is used continuously at the Iskenderun Iron & Steel Co. sinter plant. The results obtained from the production facility show that the developed system captures the thermal change in the sinter pallet and manages the machine accordingly, increases the sintering efficiency by at least 10%, and ensures process safety. These results revealed that the developed system can be used effectively in the iron and steel industry and the use of the system will increase efficiency.

Unified Diagnostic and Matching Framework of Fault and Quality for Robotic Grinding System

Article

Full-text available

Mar 2024

This article explores the corresponding relationship between the equipment fault and grinding quality in a robotic grinding system, and establishes a unified and lightweight monitoring and matching framework, providing a perceptual basis for accurate tracking and effective control of grinding quality. Firstly, a multi-channel vibration imaging method named Wavetrizorn is developed based on vibration signals, and the images generated were used to train a fault diagnosis model for the equipment. Particularly, a lossy reconstruction algorithm based on wavelet packet and convolutional autoencoder (WPCAE) is proposed for vibration signals with strong noise, which can help networks to extract the fault information. Then a regression model mapping from vibration signal to force signal is established based on the reconstructed signal graphs to monitor the grinding quality. Finally, to match the fault type and the grinding quality, a unique canonical correlation feature (CCF) is proposed and calculated, which can achieve precise quality traceability. Consequently, during the online monitoring, it is only necessary to use vibration signals to regress the CCF to accurately match the fault type and grinding quality with significant efficiency. The effectiveness of the framework is verified on a robotic grinding system in the laboratory.

Identification of working conditions and prediction of FeO content in sintering process of iron ore fines

Article

Jun 2024

The iron oxide (FeO) content had a significant impact on both the metallurgical properties of sintered ores and the economic indicators of the sintering process. Precisely predicting FeO content possessed substantial potential for enhancing the quality of sintered ore and optimizing the sintering process. A multi-model integrated prediction framework for FeO content during the iron ore sintering process was presented. By applying the affinity propagation clustering algorithm, different working conditions were efficiently classified and the support vector machine algorithm was utilized to identify these conditions. Comparison of several models under different working conditions was carried out. The regression prediction model characterized by high precision and robust stability was selected. The model was integrated into the comprehensive multi-model framework. The precision, reliability and credibility of the model were validated through actual production data, yielding an impressive accuracy of 94.57% and a minimal absolute error of 0.13 in FeO content prediction. The real-time prediction of FeO content provided excellent guidance for on-site sinter production.

Unveiling dynamics changes: Singular spectrum analysis-based method for detecting concept drift in industrial data streams

Article

Mar 2024
KNOWL-BASED SYST

Application of deep learning in iron ore sintering process: a review

Article

Full-text available

Mar 2024
J IRON STEEL RES INT

In the wake of the era of big data, the techniques of deep learning have become an essential research direction in the machine learning field and are beginning to be applied in the steel industry. The sintering process is an extremely complex industrial scene. As the main process of the blast furnace ironmaking industry, it has great economic value and environmental protection significance for iron and steel enterprises. It is also one of the fields where deep learning is still in the exploration stage. In order to explore the application prospects of deep learning techniques in iron ore sintering, a comprehensive summary and conclusion of deep learning models for intelligent sintering were presented after reviewing the sintering process and deep learning models in a large number of research literatures. Firstly, the mechanisms and characteristics of parameters in sintering processes were introduced and analysed in detail, and then, the development of iron ore sintering simulation techniques was introduced. Secondly, deep learning techniques were introduced, including commonly used models of deep learning and their applications. Thirdly, the current status of applications of various types of deep learning models in sintering processes was elaborated in detail from the aspects of prediction, controlling, and optimisation of key parameters. Generally speaking, deep learning models that could be more effectively implemented in more situations of the sintering and even steel industry chain will promote the intelligent development of the metallurgical industry.

Soft-Sensing of Burn-Through Point Based on Weighted Kernel Just-in-Time Learning and Fuzzy Broad-Learning System in Sintering Process

Article

May 2024

Burn-through point (BTP) is an essential thermal state parameter in a sintering process, which is a direct reflection of the stability of this process. However, it cannot be measured online. Soft-sensing technology offers a reliable method for estimating unmeasurable variables in industrial processes. Here, a soft-sensing model for BTP based on weighted kernel just-in-time learning (WKJITL) and fuzzy broad-learning system (FBLS) is built. First, an abnormal production data detection and correction strategy is employed to process the production data, and the mechanism analysis and mutual information analysis are utilized to specify the detectable process variables that are directly related to BTP. Then, the WKJITL method is proposed to obtain historical production data similar to the query data of BTP for local learning modeling, and the FBLS is utilized as an efficient modeling method for the soft-sensing prediction of BTP. Finally, the results of simulation experiments based on actual sintering production data reveal that the developed soft-sensing model of BTP exhibits better prediction accuracy and efficiency compared with some advanced modeling methods. Furthermore, the proposed method is of general nature and can also be easily applied to other industrial processes.

A Hybrid Spatial-temporal Deep Learning Prediction Model of Industrial Methanol-to-Olefins Process

Article

Dec 2023

Methanol-to-olefins, as a promising non-oil pathway for the synthesis of light olefins, has been successfully industrialized. The accurate prediction of process variables can yield significant benefits for advanced process control and optimization. The challenge of this task is underscored by the failure of traditional methods in capturing the complex characteristics of industrial processes, such as high nonlinearities, dynamics, and data distribution shift caused by diverse operating conditions. In this paper, we propose a novel hybrid spatial-temporal deep learning prediction model to address these issues. Firstly, a unique data normalization technique called reversible instance normalization is employed to solve the problem of different data distributions. Subsequently, convolutional neural networks integrated with the self-attention mechanism are utilized to extract the temporal patterns. Meanwhile, a multi-graph convolutional network is leveraged to model the spatial interactions. Afterward, the extracted temporal and spatial features are fused as input into a fully connected neural network to complete the prediction. Finally, the outputs are denormalized to obtain the ultimate results. The monitoring results of the dynamic trends of process variables in an actual industrial methanol-to-olefins process demonstrate that our model not only achieves superior prediction performance but also can reveal complex spatial-temporal relationships using the learned attention matrices and adjacency matrices, making the model more interpretable. Lastly, this model is deployed onto an end-to-end Industrial Internet Platform, which achieves effective practical results.

Semisupervised Classification With Sequence Gaussian Mixture Variational Autoencoder

Article

Jan 2023

Evenness of filament yarn is a crucial indicator that significantly impacts the quality of downstream textile products. Therefore, accurate real-time prediction and classification of the coefficient of variation (CV) value, which serves as an indicator of evenness, are of utmost importance. However, current detection methods predominantly rely on offline evenness testing devices, compromising the real-time capability and accuracy of evenness detection. To address this challenge, a semisupervised sequence Gaussian mixture variational autoencoder (VAE) model is developed for predicting and classifying the CV value. This model combines a mix VAE and a sequence-to-sequence structure, integrating a classifier to achieve semisupervised classification of time-series data. To validate the effectiveness of the proposed method, both software and hardware enhancements were implemented on the existing capacitance-based yarn evenness testing device, enabling uninterrupted measurement of yarn evenness and length. The collected data were then used to train the model. Experimental results demonstrate that the proposed model achieves an accuracy rate of 85% in classifying the CV value of the filament yarn.

A Survey on Deep Learning for Data-Driven Soft Sensors

Article

Jan 2021

Soft sensors are widely constructed in the process industry to realize process monitoring, quality prediction, and many other important applications. With the development of hardware and software, industrial processes have taken on new characteristics that lead to poor performance of traditional soft sensor modeling methods. Deep learning, as a kind of data-driven approach, shows great potential in many fields, including soft sensing scenarios. After a period of development, especially in the last five years, many new issues have arisen that need to be investigated. Therefore, in this paper, the necessity and significance of deep learning for soft sensor applications are first demonstrated by analyzing the merits of deep learning and the trends of industrial processes. Next, mainstream deep learning models, tricks, and frameworks/toolkits are summarized and discussed to help designers advance the development of soft sensors. Then, existing works are reviewed and analyzed to discuss the demands and problems that occur in practical applications. Finally, conclusions and prospects are given.

Prediction model of burn-through point with fuzzy time series for iron ore sintering process

Article

Apr 2021
ENG APPL ARTIF INTEL

Burn-through point (BTP) is an essential parameter in the iron ore sintering process. Operators usually judge whether the current production is stable by monitoring the BTP, so accurate prediction of the BTP has significant application prospects. A prediction model of the BTP with fuzzy time series is designed in this paper. First, the fuzzy time series prediction method with Fuzzy C-Means clustering is presented as the core modeling method. A prediction model of the response is constructed to obtain a timely response to the current BTP. A prediction model of the difference is established to estimate the present unmeasurable disturbance on the BTP. Then, a hybrid prediction model is built, which combines these two models through an adjustment factor. Finally, a series of experiments is carried out using the raw time series data from an iron and steel plant. The experimental results show that the designed model has better prediction performance for the BTP than existing models, an advantage resulting from the hybrid structure and the fuzzy time series prediction model with Fuzzy C-Means clustering. This prediction model of the BTP lays the foundation for the stable control of the iron ore sintering process.

Echo-state networks for soft sensor design in an SRU process

Article

Mar 2021
INFORM SCIENCES

The implementation of soft sensors for industrial processes is expanding in applications for recent machine learning techniques. In this work, strategies based on reservoir computing are applied to developing dynamical models of target variables in a sulfur recovery unit (SRU) of a refinery plant in Italy. In particular, a specific type of recurrent network, namely an echo-state network (ESN), is adopted to estimate key process variables on the SRU. Two process lines are considered to evaluate the proposed algorithm on different datasets in terms of estimation performance and computational effort of the learning process. The obtained results are evaluated in comparison with other recurrent networks, based on long short-term memory, and with other techniques reported in the literature, demonstrating the feasibility of the proposed approach. Furthermore, the introduction of intrinsic plasticity (IP) is also considered to adapt the reservoir parameters to the provided inputs, achieving a significant improvement in the statistical distribution of the results obtained for the pool of learned networks. The reported results show that ESN-IP represents a suitable solution for identifying dynamical models of the industrial processes, avoiding the time-consuming regressor selection procedure, which is needed when a static network is adopted to design a dynamical model.

Multichannel Diffusion Graph Convolutional Network for the Prediction of End-Point Composition in the Converter Steelmaking Process

Article

Nov 2020

The converter steelmaking process smelts hot metal to liquid steel and occupies an important position in industry. The composition of liquid steel at the endpoint is an essential quality index, including the concentrations of multiple elements, such as carbon, silicon, and manganese. Accurately predicting endpoint composition is the basis of production optimization. Hence, a multichannel diffusion graph convolutional network (MCDGCN) is presented in this article. Unlike conventional models, the developed MCDGCN describes the converter steelmaking process as a graph to exploit the correlations among element concentrations for an accurate endpoint composition prediction. We also develop a unique $K$-hop diffusion method to extract the globally consistent information over the graph for predicting each element. The proposed method addresses the composition prediction task for a realistic converter steelmaking process. To the best of our knowledge, this is the first time that up to 15 elements of liquid steel are covered and predicted to present a comprehensive process model. Compared with six benchmark models, MCDGCN presents state-of-the-art results, i.e., an average $R^{2}$ of 0.8475 and an average MAE of 0.0189, which shows that the correlation mining of graph deep learning can indeed improve the prediction performance for endpoint composition.

Modelling and Control Structure of a Phosphorite Sinter Process with Grey System Theory

Article

Oct 2020

Modeling data-driven sensor with a novel deep echo state network

Article

Jun 2020
CHEMOMETR INTELL LAB

Data-driven approaches have been widely used to model soft sensors for predicting key quality variables in process engineering. A soft sensor is generally a time-dependent dynamical model between the input and the output. The echo state network (ESN) is a typical data-driven modeling tool that has exhibited excellent performance in temporal data processing. However, the memory mode in the traditional ESN lacks flexibility. It is sometimes hard to preserve sufficient input features in the states, especially for modeling long-term dependent soft sensors. To solve this problem, this paper proposes an asynchronously deep echo state network (ADESN), which is composed of a number of sub-reservoirs that are connected one by one in sequence. Additionally, time delay modules are inserted between every two adjacent layers. The ADESN scheme preserves more input history in the states. Moreover, it can realize a selective memory. The validity of the ADESN is demonstrated by modeling a number of numerical and real-life soft sensors.

Operating Mode Recognition Based on Fluctuation Interval Prediction for Iron Ore Sintering Process

Article

Oct 2020

The operating mode is an essential factor affecting product quality and yield of the sinter ore, which inspires the realization of operating mode recognition. Taking burn-through point (BTP) as the decision parameter of operating mode, an operating mode recognition method based on the fluctuation interval prediction is presented. Firstly, combining the principal component analysis and the fuzzy information granulation method, a fluctuation interval prediction model of the BTP is established through utilizing the Elman neural network. Then, the operating mode classification rules are built according to the data distribution of the BTP in the fluctuation interval. Finally, experiments are executed with the data collected from a factory. The results indicate that it can effectively predict the fluctuation interval of the BTP, and then successfully recognize the operating mode. The proposed method provides a valid reference to control the stable operation of the iron ore sintering process.

Novel soft sensor development using echo state network integrated with singular value decomposition: Application to complex chemical processes

Article

Feb 2020
CHEMOMETR INTELL LAB

It is of great importance to develop advanced soft sensors for ensuring the safety and stability of complex industrial processes. Unfortunately, with the increasing scale of chemical processes, it becomes more and more demanding to develop soft sensors with high accuracy. In addition, most industrial processes are dynamic. As a result, soft sensors developed using static models cannot achieve acceptable performance. To handle this problem, the echo state network (ESN), a kind of recurrent neural network, is selected. However, the output weights of the ESN are calculated linearly. On the one hand, collinearity in the reserve layer outputs may decrease the performance; on the other hand, over-fitting may occur. To improve the ESN performance, a singular value decomposition based ESN (SVD-ESN) is presented. In the SVD-ESN method, singular value decomposition, instead of traditional least squares, is adopted to calculate the weights between the output layer and the reserve layer. Through singular value analysis of the outputs of the reserve layer, appropriate defining parameters are selected to enhance the accuracy and ensure the computing speed. As a result, the collinearity and over-fitting problems are solved, and the performance of the ESN is enhanced. To test and validate its performance, the proposed SVD-ESN is developed as a soft sensor for the High Density Polyethylene (HDPE) production process and the Purified Terephthalic Acid (PTA) production process. Compared with the conventional ESN, Extreme Learning Machine (ELM), Dynamic Window based ELM (DW-ELM), and Long Short-Term Memory (LSTM), the simulation results show that the proposed SVD-ESN model obtains better prediction accuracy, which confirms that the proposed SVD-ESN can be used as an effective dynamic model for developing accurate soft sensors.
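The idea of computing readout weights through a singular value decomposition can be sketched as a truncated-SVD pseudoinverse, shown below. This is an illustrative assumption about how such a readout might be computed, not the SVD-ESN algorithm from the paper: the function name `svd_readout`, the relative cutoff `rel_tol`, and the random matrix standing in for collected reservoir states are all hypothetical.

```python
import numpy as np

def svd_readout(states, targets, rel_tol=1e-6):
    """Solve states @ W ~= targets with a truncated-SVD pseudoinverse.

    states:  (n_samples, n_reservoir) collected layer outputs
    targets: (n_samples, n_outputs) desired outputs
    Singular values below rel_tol * largest are discarded, which removes
    the near-collinear directions that destabilize plain least squares.
    """
    u, s, vt = np.linalg.svd(states, full_matrices=False)
    keep = s > rel_tol * s[0]
    s_inv = np.where(keep, 1.0 / np.where(keep, s, 1.0), 0.0)
    # W = V diag(1/s) U^T Y, restricted to the retained singular values.
    return vt.T @ (s_inv[:, None] * (u.T @ targets))

# Stand-in "states" with a deliberately duplicated (collinear) column.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 4))
states = np.hstack([base, base[:, :1]])
targets = base @ rng.normal(size=(4, 1))

w_out = svd_readout(states, targets)
pred = states @ w_out
```

Because the zero singular value caused by the duplicated column is dropped, the solution stays finite and splits the weight evenly between the two identical columns instead of producing large cancelling coefficients.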

Soft sensor validation for monitoring and resilient control of sequential subway indoor air quality through memory-gated recurrent neural networks-based autoencoders

Article

Feb 2020
CONTROL ENG PRACT

Indoor air quality (IAQ) measurements play an important role in subway ventilation system control, influencing crucial factors such as ventilation energy consumption and commuters’ health. Therefore, faulty sensors may result in misinterpreting the IAQ conditions and misoperating the air delivery rate level in subway stations. However, because IAQ data are dynamic and non-Gaussian, linear and fixed structures are not sufficient to extract essential features from them. This paper presents a machine learning-based soft sensor validation technique to detect, diagnose, identify, and reconstruct faulty measurements of the multivariate IAQ data in subway stations. The proposed method is memory-gated recurrent neural networks-based autoencoders (MG-RNN-AE), which are capable of processing sequential and dynamic IAQ information. The performance of the sensor validation was evaluated through several metrics and compared among different methods, with the batch normalization-based gated recurrent unit (BN-GRU) method being the most effective at detecting ( = 100%) and reconstructing faulty IAQ sensors ( = 0.45-0.79). Additionally, the effects of the faulty and repaired measurements on the ventilation system were evaluated, showing that the proposed method is capable of finding a sustainable balance between energy demand and commuters’ health level.

A Soft Sensor Modeling Method Based on Semi-Supervised Deep Learning and its Application to Wastewater Treatment Plant

Article

Feb 2020

Soft sensors have been widely used in industrial processes to improve product quality and ensure safety during production. This paper proposes a semi-supervised deep neural regression network with embedding manifold (called SSE-DNN) for soft sensor modeling that integrates manifold embedding into deep neural regression networks. Manifold embedding is imposed on the hidden layer of the deep neural regression network to form a semi-supervised deep neural regression network. Manifold embedding exploits the local neighbor relationship among industrial data and utilizes unlabeled data effectively to improve the performance of the deep neural regression model. The SSE-DNN model simultaneously exploits the global information and the local manifold in large industrial data and implicitly implements multi-modal models of the industrial process. The soft sensor model based on the SSE-DNN is applied to the estimation of total Kjeldahl nitrogen (TKN) in a long-term, complicated wastewater treatment process. The experimental results demonstrate that the SSE-DNN model performs better than other soft sensors and provides an effective method for soft sensor modeling of complex industrial processes.
