
A Real-Time Steel Surface Defect Detection Approach With High Accuracy

Abstract

Surface defect inspection is a key step in ensuring the quality of hot rolled steel surfaces. However, current advanced detection methods achieve high precision at low detection speed, which hinders their application in actual production. In this work, a real-time detection network (RDN) focusing on both speed and accuracy is proposed to solve the problem of steel surface defect detection. RDN takes ResNet-dcn, a lightweight modular encoding and decoding network, as the basic convolutional architecture, whose backbone is pretrained on ImageNet. To improve the detection accuracy, a skip layer connection module (SCM) and a pyramid feature fusion module (PFM) are incorporated into RDN. On the standard dataset NEU-DET, the proposed method achieves a state-of-the-art recognition speed of 64 frames per second (FPS) and a mean average precision of 80.0% on a single GPU, which fully meets the accuracy and speed requirements of an actual production line.
IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 71, 2022 5005610
Wenyan Wang, Chunfeng Mi, Ziheng Wu, Kun Lu, Hongming Long, Baigen Pan,
Dan Li, Jun Zhang, Peng Chen, and Bing Wang, Senior Member, IEEE
Index Terms—Detection (DET) speed and accuracy, hot rolled
steel surface, pyramid feature fusion module (PFM), skip layer
connection module (SCM), surface defect detection.
I. INTRODUCTION
Hot rolled strip is a basic raw material of the iron and steel
industry, and its surface quality seriously affects
the physical and chemical properties of downstream and
final products [1]–[3]. However, traditional detection (DET)
methods based on manual inspection are time-consuming
Manuscript received July 28, 2021; revised October 9, 2021; accepted Octo-
ber 19, 2021. Date of publication November 15, 2021; date of current version
March 3, 2022. This work was supported by the National Natural Science
Foundation of China under Grant 62172004, Grant 61672035, and Grant
61872004; and in part by the Educational Commission of Anhui Province
under Grant KJ2019ZD05. The Associate Editor coordinating the review
process for this article was Dr. Jing Lei. (Corresponding authors: Bing Wang;
Peng Chen.)
Wenyan Wang and Hongming Long are with the School of Metallurgical
Engineering, Anhui University of Technology, Ma’anshan, Anhui 243002,
China, and also with the Key Laboratory of Metallurgical Emission Reduction
and Resources Recycling, Ministry of Education, Anhui University of Tech-
nology, Ma’anshan 243002, China (e-mail: wenyanwang9203@gmail.com;
13956233905@126.com).
Chunfeng Mi, Ziheng Wu, Kun Lu, Dan Li, and Bing Wang are
with the School of Electrical and Information Engineering, Anhui
University of Technology, Ma’anshan, Anhui 243032, China (e-mail:
michunfeng64@gmail.com; wziheng@ahut.edu.cn; kunlu0819@gmail.com;
lanldok@163.com; wangbing@ustc.edu).
Baigen Pan is with the Anhui Provincial Quality Supervision and Inspection
Center for Motor Products and Parts, Xuancheng, Anhui 242500, China
(e-mail: pbg770302@aliyun.com).
Jun Zhang and Peng Chen are with the Co-Innovation Center for Information
Supply and Assurance Technology, Anhui University, Hefei, Anhui 230032,
China (e-mail: junzhang@ahu.edu.cn; pchen@ahu.edu.cn).
Digital Object Identifier 10.1109/TIM.2021.3127648
and unreliable. Therefore, it is desirable to allow a machine
to automatically detect surface quality.
Automatic defect detection technology based on machine
vision performs well in object classification tasks
[4]–[9]. However, the defect features it uses must be designed
manually, and low-dimensional hand-crafted features generally
generalize poorly to the highly variable defects of steel
surfaces. Moreover, extracting a large number of features
slows down the detection process, which hinders the
application of machine vision-based defect detection methods
in actual industrial production [9].
In recent years, the deep-learning-based methods have made
many breakthroughs in the classification of surface defects
of hot rolled steel strip [10]–[12]. Konovalenko et al. [11]
proposed a number of convolutional neural networks (CNNs)
based on residual network ResNet-50, in which the class
weight, minority class over-sampling, and focal loss strategies
are adopted to solve the data imbalance of training samples.
The experimental results show that this method can effectively
detect defects with a high accuracy of 96.91% [11]. Wang et
al. [13] developed an improved CNN model with fewer
trainable parameters, raising the accuracy and speed
of defect classification to 99.63% and 333 frames per
second (FPS), respectively. Azimi et al. [14] first segmented the
microstructure of low carbon steel with a fully convolutional
neural network and then identified the category of each
component from these structures. This kind of research provides
a robust and objective method for steel quality assessment [14].
However, in the classification task, the model is only
required to give each defect image a defect category. When
an image contains two or more types of defects, as shown in
Fig. 1, the classification model will not classify these defects in
detail. In contrast, object detection technology can enumerate
all types of defects and accurately indicate the location of these
defects. Wang et al. [15] proposed an improved ResNet50 and
an enhanced Faster RCNN for steel surface defect detection, and
achieved a classification accuracy of 98.2% at a speed
of 63 ms, which significantly improves on the performance
of other methods. For the surface defect detection of
hot-rolled strip on the NEU-DET dataset, which is characterized
by large intraclass differences and high interclass similarity,
He et al. [16] proposed an end-to-end defect detection network
that integrates multilevel features into one feature map, and
achieved a mean average precision (mAP) of 82.3%. However, when they
1557-9662 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Fig. 1. Surface defect image of hot rolled strip containing two kinds of
defects.
increased the detection speed to 20 FPS, the mAP of the
detection model drops to 70% [16]. Also on the NEU-DET
dataset, Hatab et al. [17] used the YOLO network for defect
detection, and achieved the mAP of 70.66% and the detection
speed of 85 ms per image. Kou et al. [18] developed an end-to-
end defect detection model based on YOLO-V3, and yielded
72.2% mAP on the NEU-DET dataset. To address the difficulty
that deep learning methods have in recognizing small and
complex targets, Zhao et al. [19] proposed an improved
Faster-RCNN method and achieved an mAP of 75%. To describe
the defect information in more detail, some segmentation
models have also been proposed in recent years [20], [21].
However, it is impractical to segment the defects of a steel plate
in industrial production, and the existing detection methods
do not balance accuracy and speed.
To achieve faster detection speed and improve the general-
ization performance of the model, a real-time defect detection
network (RDN) is proposed in this work, which has three
innovations in the structure. First, compared with the existing
classic object detection network, such as Faster RCNN, which
usually uses a regional proposal network (RPN) to generate
bounding boxes, RDN is an end-to-end detection network.
Furthermore, different from one-stage detection method that
lists as many proposals as possible, such as YOLO, SSD, and
M2Det, RDN uses the center point of the target to generate
a bounding box, which is an anchor free method with faster
detection speed.
Second, the highest-level feature map, which contains rich
semantic information, is usually used to detect defects in
deep-learning-based detection networks [22], [23]. However, the
NEU-DET dataset contains many small defects, and its large
intraclass differences and high interclass similarity make it
difficult for a model to determine defect categories and
locations from a single feature map. It has been
proven that different levels of features in CNN represent
different information about an object [24], i.e., low-level
features contain precise location information, while high-level
features have rich contextual semantics. Therefore, in this
work, the features extracted from the last convolution layer
of each stage of the backbone are integrated to make full use
of the shallow layer information, which is called the skip-
layer connection module (SCM). In addition, to improve the
detection performance of defect images with large intraclass
differences and high interclass similarity, a pyramid feature
fusion module (PFM) that combines multiple high-feature
layers is also proposed.
Third, inspection speed is an important consideration when
applying the model in actual production. To accelerate
detection, a lightweight and modular
ResNet-18 network is used as the feature extraction backbone
in this work. In particular, ResNet-18 is well suited to
small datasets because it has few parameters to
train. The experimental results show that the lightweight
backbone and feature fusion strategy can significantly improve
the accuracy and speed of the detection model.
In summary, the main contributions of this work are as
follows.
1) For steel defect classification and localization, an end-to-
end accuracy-speed balanced defect detection network is
introduced.
2) An SCM between shallow layers is proposed, which can
provide more accurate position information for
low-contrast images.
3) The PFM is proposed to improve the network’s ability to
detect defects with large intraclass differences and high
interclass similarity.
4) Using a simple backbone ResNet-18, our network runs
at 64 FPS with mAP of 80.0%, which fully meets the
speed requirements in actual production.
II. REAL-TIME DEFECT DETECTION NETWORK
The overall architecture of the balanced defect detection
network proposed in this work is shown in Fig. 2. RDN
uses baseline convolution architecture and multilevel feature
fusion to extract features from the input image, and then uses
the keypoint evaluation and regression method to obtain the
center point and size attributes of the object. Considering
the characteristics of the NEU-DET dataset mentioned above,
this work constructed two multilevel feature fusion modules
with few parameters. One is the SCM which combines the
features with the same resolution, the other is the PFM. More
details of the backbone selection and feature fusion module
are described below.
A. Baseline Convolution Architecture
Generally, a network backbone with a small number of
parameters and a model pretrained on ImageNet can avoid
over-fitting and accelerate detection, especially
for small datasets [26]. Compared with some backbones
widely used in object classification and detection, such as
VGG, DLA, and ResNeXt-101, ResNet with 18 convolution
layers has fewer model parameters and a modular convolution
structure, which make it easier to integrate and deploy on
industrial devices [22], [23], [25], [27]. In addition, its superior
performance in the task of surface defect classification of hot
rolled strip is confirmed in this work. Therefore, ResNet-18
is selected as the backbone of the industrial defect detection
model.
TABLE I
DETAILED STRUCTURES OF RESNET18-DCN
Different from the previous object detection network, where
the highest-level feature map with the smallest size of back-
bone is adopted to detect and classify the attributes of targets,
an encoding and decoding network ResNet18-dcn with large
output resolution is proposed in this work as the baseline
convolution architecture since the highest-level feature map
has a large receptive field, which is not conducive to the
detection of small targets [24], [28]. In the decoding network
of ResNet18-dcn, the channel number and resolution of the
feature map are adjusted by deformable convolution and
deconvolution. The detailed structures of ResNet18-dcn are
given in Table I.
B. Skip-Layer Connection Module
In deep learning methods, the category and location infor-
mation of defects are mainly obtained by multiple convolution
of the input image. However, the receptive field of a deep
feature map will expand with the increase of the number
of layers, which weakens the perception of local location
information, and therefore affects the accurate positioning.
Particularly, the loss of location information aggravates the
difficulty of small object detection. The size of the receptive
field is defined as follows:
$$l_k = l_{k-1} + (f_k - 1) \prod_{i=1}^{k-1} s_i \tag{1}$$
where $l_{k-1}$ is the receptive field size of the $(k-1)$th layer,
$f_k$ is the size of the filter or pooling kernel, and $s_i$ is the stride
of layer $i$.
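As a small illustration of Eq. (1), the sketch below accumulates the receptive field layer by layer. The layer stack is a hypothetical example (a 7 × 7 stride-2 convolution, a 3 × 3 stride-2 pooling layer, and two 3 × 3 stride-1 convolutions), not the configuration listed in Table I, and the function name is illustrative only.

```python
def receptive_fields(layers):
    """Return the receptive field size after each layer.

    `layers` is a list of (kernel_size, stride) pairs ordered from input to output.
    """
    sizes = []
    rf = 1      # l_0: a single input pixel
    jump = 1    # product of the strides of all previous layers
    for kernel, stride in layers:
        rf = rf + (kernel - 1) * jump   # Eq. (1)
        sizes.append(rf)
        jump *= stride
    return sizes


# Hypothetical stack used only for illustration.
stem = [(7, 2), (3, 2), (3, 1), (3, 1)]
print(receptive_fields(stem))
```

The receptive field grows faster once the accumulated stride is large, which is why the deepest feature maps respond to large image regions and lose fine location detail.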
The schematic of the receptive field is shown in Fig. 3. It can
be seen that the shallow layers have a small receptive field,
which is beneficial for detecting small-scale defects. Therefore,
this work incorporates the shallow features into the deep feature
maps to make full use of the location information of defects.
In addition, a decoding network is added to alleviate the
loss of location information caused by very small feature maps.
It has been proven that skip layer connection (SLC) can
simplify the network learning process, enhance gradient propa-
gation and make the feature expression of input variables more
comprehensive [23], [27], [29], [30]. Additionally, relevant
research also shows that long SLC with multiple SLCs can per-
form in-depth supervision of the entire network structure [31].
To improve the detection performance of the model for small
objects, an SCM that contains multiple long SLCs is designed
into the network without increasing the calculation of the
model. In RDN, the SCM is introduced between two feature
maps of the same size, as shown by the blue connecting line
in Fig. 2.
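The same-size pairing used by the SCM can be sketched as follows. The stage names and the feature-map sizes are assumptions: they correspond to a standard ResNet-18 encoder (overall stride 32) and three stride-2 up-sampling stages applied to a 384 × 384 input, and the fusion operation itself is not modelled here.

```python
# Assumed spatial sizes for a 384 x 384 input (illustrative, not taken from Table I).
encoder = {"conv2": 96, "conv3": 48, "conv4": 24, "conv5": 12}
decoder = {"up-conv1": 24, "up-conv2": 48, "up-conv3": 96}


def same_level_pairs(encoder, decoder):
    """Pair each decoder stage with the encoder stage of identical resolution."""
    by_size = {size: name for name, size in encoder.items()}
    return [(by_size[size], name)
            for name, size in decoder.items()
            if size in by_size]


print(same_level_pairs(encoder, decoder))
```

Because the paired maps already share a resolution, no resampling is needed before they are combined, which is consistent with the SCM adding little computation.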
C. Pyramid Feature Fusion Module
In detection networks, the feature pyramid is an important
component for detecting objects at different scales [24], [28].
However, it has a slow detection speed and a large number
of parameters. To accurately distinguish defect types with
large intraclass differences and high interclass similarity, a simple
PFM that integrates multiple high-level features into one
feature map is designed so that the highest-level feature
map carries sufficient feature information about the samples.
To fuse features with different sizes in the inverted pyramid,
as shown in Fig. 2, PFM is attached to the decoding structure
of the network, which is connected by yellow lines. In PFM,
four feature maps at different levels in the decoding network
are used as input. The output is the sum of the corresponding
elements of these feature maps after size transformation.
To keep the output resolution of the network, four input
branches in PFM are expanded to the same size by deconvo-
lution operations with different convolution kernel sizes and
strides.
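The size transformation in the PFM can be illustrated with the standard transposed-convolution relation out = (in - 1) × stride - 2 × padding + kernel. The kernel sizes, strides, and paddings below are assumptions chosen so that every branch reaches a 96 × 96 output; the branch names and input sizes are also assumed (conv5 and up-conv1 to up-conv3 for a 384 × 384 input), and the paper's actual settings may differ.

```python
def deconv_out(size, kernel, stride, padding):
    """Output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def branch_params(scale):
    """Assumed (kernel, stride, padding) that enlarges a map by an even integer `scale`."""
    if scale == 1:
        return 1, 1, 0                      # branch already at the target size
    return 2 * scale, scale, scale // 2


def fuse(maps):
    """Element-wise sum of equally sized 2-D maps given as lists of lists."""
    return [[sum(values) for values in zip(*rows)] for rows in zip(*maps)]


target = 96
branches = {"conv5": 12, "up-conv1": 24, "up-conv2": 48, "up-conv3": 96}
resized = {}
for name, size in branches.items():
    kernel, stride, padding = branch_params(target // size)
    resized[name] = deconv_out(size, kernel, stride, padding)
print(resized)
```

Once all branches share the output resolution, `fuse` mirrors the element-wise summation described above on plain nested lists.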
III. TRAINING AND INFERENCE
A. Training
The keypoint-based defect detector models an object as
a single point (the center point of its bounding box), and then
uses this keypoint to get the target category, size (width and
height), and center point offset. In the RDN structure shown
in Fig. 2, the cls layer is used to output the class probability
of the target and a keypoint heatmap. Let $I \in \mathbb{R}^{W \times H \times 3}$ be
an input image of width $W$ and height $H$; the cls layer
produces a keypoint heatmap $\hat{Y} \in [0,1]^{(W/R) \times (H/R) \times C}$, where
$R$ is the output stride of the network and $C$ is the number
of defect categories. In this work, $R$ is 4 and $C$ is 6.
A prediction $\hat{Y}_{xyc} = 1$ corresponds to a detected
object, while $\hat{Y}_{xyc} = 0$ is background. The training objective
is a penalty-reduced pixel-wise logistic regression with focal
loss
$$L = -\frac{1}{N} \sum_{xyc} \begin{cases} \left(1 - \hat{Y}_{xyc}\right)^{\alpha} \log\left(\hat{Y}_{xyc}\right), & \text{if } Y_{xyc} = 1 \\ \left(1 - Y_{xyc}\right)^{\beta} \left(\hat{Y}_{xyc}\right)^{\alpha} \log\left(1 - \hat{Y}_{xyc}\right), & \text{otherwise} \end{cases} \tag{2}$$
Fig. 2. Overall architecture of RDN.
where $\hat{Y}_{xyc}$ represents the probability of category $c$ at the
position $(x, y)$, and $Y_{xyc}$ is the value corresponding to the
ground-truth center point, which is obtained with the Gaussian kernel
$\exp\left(-\frac{(x - \tilde{p}_x)^2 + (y - \tilde{p}_y)^2}{2\sigma_p^2}\right)$, where $\tilde{p} = \lfloor p/R \rfloor$ is
the center point position of ground-truth object $k$ on the
low-resolution output feature map, $p$ is the ground-truth center point
location, and $\sigma_p$ is an object-size-adaptive standard deviation.
$\alpha$ and $\beta$ are used to increase the loss of samples that are
difficult to classify and to reduce the penalty of locations around
the ground truth, respectively [32], [33]. $N$ is the number
of objects in an image. Previous work on focal loss has
shown that, as $\alpha$ increases from 0 to 2,
substantially more weight becomes concentrated on the hard
negative examples, and that the model achieves the best
detection performance when $\alpha = 2$ [32]. On this basis, Law and Deng [33]
proposed a variant of focal loss and set $\beta$ to 4, which allows
the network to mainly optimize the hard samples. Because this work
studies the same kind of detection algorithm, the
values of $\alpha$ and $\beta$ are also set to 2 and 4, respectively.
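A minimal sketch of the heatmap target and the loss of Eq. (2) for one class channel is given below. It uses plain nested lists instead of tensors, follows the penalty-reduced focal loss form of [33] with α = 2 and β = 4, and clips predictions with a small `eps` for numerical safety; the function names and the clipping are assumptions made for illustration.

```python
import math


def gaussian_heatmap(width, height, center, sigma):
    """Ground-truth heatmap Y for one object centered at `center` = (x, y)."""
    cx, cy = center
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def focal_loss(pred, target, alpha=2, beta=4, eps=1e-12):
    """Penalty-reduced pixel-wise focal loss for a single class channel."""
    total, num_objects = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, y in zip(pred_row, target_row):
            p = min(max(p, eps), 1 - eps)          # avoid log(0)
            if y == 1:                             # ground-truth center point
                total += (1 - p) ** alpha * math.log(p)
                num_objects += 1
            else:                                  # penalty reduced near the center
                total += (1 - y) ** beta * p ** alpha * math.log(1 - p)
    return -total / max(num_objects, 1)


target = gaussian_heatmap(5, 5, center=(2, 2), sigma=1.0)
confident = [[0.99 if (x, y) == (2, 2) else 0.01 for x in range(5)] for y in range(5)]
uncertain = [[0.5] * 5 for _ in range(5)]
print(focal_loss(confident, target), focal_loss(uncertain, target))
```

A prediction that is confident at the center and low elsewhere yields a much smaller loss than a uniform 0.5 prediction, which is the behaviour the α and β weights are meant to encourage.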
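Suppose the ground-truth coordinates of target $k$ are $(x_1, y_1, x_2, y_2)$;
then its size $s_k$, i.e., width and height, is $(x_2 - x_1, y_2 - y_1)$.
The loc_wh layer is used to predict the size of $k$, and it can
be trained with a smooth $L_1$ loss function defined as
$$L_{size} = \frac{1}{N} \sum_{k=1}^{N} \mathrm{smooth}_{L_1}\left(\hat{S}_k - s_k\right) \tag{3}$$
where
$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$
and $\hat{S}_k$ is the predicted size.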
To compensate for the pixel position error of the input image
caused by the down-sampling operations in the network, the offset
of the center-point position is predicted in this work, and the
loss function is defined as
$$L_{off} = \frac{1}{N} \sum_{xyc} \mathrm{smooth}_{L_1}\left(\hat{O} - \left(\frac{p}{R} - \tilde{p}\right)\right) \tag{4}$$
where $\hat{O}$ denotes the predicted center offset of the target, $p$ is the
ground-truth center point location, and $\tilde{p}$ is the position of $p$
on the low-resolution output feature map.
According to the above definitions, the overall training
objective of this work is to minimize the multitask loss
function, which is defined as
$$L_{xyc} = L + \lambda_{size} L_{size} + L_{off} \tag{5}$$
where $\lambda_{size} = 0.1$, a value determined by the
experimental results in Section IV-D of this work.
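The regression terms and their combination in Eqs. (3) to (5) can be sketched as below. Summing the smooth L1 penalty over the width and height (or x and y) components, taking the low-resolution position as the floor of p/R, and all function names are assumptions made for this illustration.

```python
import math


def smooth_l1(x):
    """Smooth L1 penalty: quadratic near zero, linear elsewhere."""
    x = abs(x)
    return 0.5 * x * x if x < 1 else x - 0.5


def size_loss(pred_sizes, true_boxes):
    """Eq. (3): compare predicted (w, h) with the size of each ground-truth box."""
    total = 0.0
    for (pw, ph), (x1, y1, x2, y2) in zip(pred_sizes, true_boxes):
        total += smooth_l1(pw - (x2 - x1)) + smooth_l1(ph - (y2 - y1))
    return total / len(true_boxes)


def offset_loss(pred_offsets, centers, stride=4):
    """Eq. (4): compare predicted offsets with the fraction lost by down-sampling."""
    total = 0.0
    for (ox, oy), (px, py) in zip(pred_offsets, centers):
        tx = px / stride - math.floor(px / stride)
        ty = py / stride - math.floor(py / stride)
        total += smooth_l1(ox - tx) + smooth_l1(oy - ty)
    return total / len(centers)


def total_loss(l_cls, l_size, l_off, lambda_size=0.1):
    """Eq. (5): keypoint loss plus weighted size loss plus offset loss."""
    return l_cls + lambda_size * l_size + l_off


l_size = size_loss([(10.0, 20.0)], [(0, 0, 10.5, 22)])
l_off = offset_loss([(0.25, 0.5)], [(101, 50)])
print(l_size, l_off, total_loss(1.0, l_size, l_off))
```

The small weight on the size term keeps the box dimensions, which are measured in pixels and can produce large errors, from dominating the keypoint and offset terms.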
B. Inference
In the inference phase, to suppress overlapping boxes, a
3 × 3 max-pooling operation is first applied to the heatmap of
each category, and then the top 50 peaks of the remaining
response values are kept. Let $\hat{P}_c = \{(\hat{x}_i, \hat{y}_i)\}_{i=1}^{n}$ be the set of
$n$ detected center points of class $c$; the detection coordinates of
the target are
$$\left(\hat{x}_i + \delta\hat{x}_i - \frac{\hat{w}_i}{2},\ \hat{y}_i + \delta\hat{y}_i - \frac{\hat{h}_i}{2},\ \hat{x}_i + \delta\hat{x}_i + \frac{\hat{w}_i}{2},\ \hat{y}_i + \delta\hat{y}_i + \frac{\hat{h}_i}{2}\right) \tag{6}$$
where $(\delta\hat{x}_i, \delta\hat{y}_i)$ is the center point offset and $(\hat{w}_i, \hat{h}_i)$ is the
predicted size. For a specific test image, the detection result
is the set of rectangular boxes produced at multiple scales and
filtered by the nonmaximum suppression method [34].
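The peak extraction and the box decoding of Eq. (6) can be sketched as follows for a single class heatmap stored as nested lists. The function names are illustrative, ties within a 3 × 3 window are all kept, and mapping the box back to the input resolution is left out.

```python
def local_peaks(heatmap, top_k=50):
    """Keep points equal to the maximum of their 3 x 3 neighbourhood, then the top_k scores."""
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            value = heatmap[y][x]
            neighbours = [heatmap[j][i]
                          for j in range(max(0, y - 1), min(height, y + 2))
                          for i in range(max(0, x - 1), min(width, x + 2))]
            if value == max(neighbours):
                peaks.append((value, x, y))
    peaks.sort(reverse=True)
    return peaks[:top_k]


def decode_box(x, y, offset, size):
    """Eq. (6): corner coordinates from a center point, its offset, and the predicted size."""
    dx, dy = offset
    w, h = size
    return (x + dx - w / 2, y + dy - h / 2, x + dx + w / 2, y + dy + h / 2)


heat = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
print(local_peaks(heat, top_k=2))
print(decode_box(1, 1, offset=(0.25, 0.5), size=(2.0, 1.0)))
```

The 0.5 response next to the 0.9 peak is discarded because it is not the maximum of its own neighbourhood, which is how the max-pooling step stands in for box-level suppression of near-duplicate detections.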
IV. EXPERIMENTS
A. Implementation Details
TABLE II
DETECTION RESULTS OF DIFFERENT BASELINE NETWORKS
Fig. 3. Schematic of receptive field size.
For all experiments based on RDN, the input image size
is 384 × 384. The initial learning rate is $1.25 \times 10^{-4}$; it is
decreased to $1.25 \times 10^{-5}$ and $1.25 \times 10^{-6}$ at 60 and
120 epochs, respectively, and training stops at 160 epochs. RDN is
developed with PyTorch v1.3.1. We conduct experiments on the
NVIDIA Titan X GPU hardware platform. The batch size is set to 24, and
Adam is used to optimize the overall objective. To avoid
over-fitting of the model, several data augmentation methods,
such as shift, rotation, and reflection, are applied in the training
phase. It is worth noting that the values of these hyperparameters
were obtained by referring to related object detection
literature and were verified and/or adjusted through a large number
of experiments [11], [24], [38]–[44].
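The step schedule described above can be written as a small function. This is a sketch that assumes zero-based epochs and that each drop takes effect at the start of epochs 60 and 120, a detail the text does not specify; the function name is illustrative.

```python
def learning_rate(epoch, base_lr=1.25e-4, milestones=(60, 120), gamma=0.1):
    """Step schedule: multiply the rate by `gamma` at every milestone already reached."""
    drops = sum(1 for milestone in milestones if epoch >= milestone)
    return base_lr * gamma ** drops


for epoch in (0, 59, 60, 119, 120, 159):
    print(epoch, learning_rate(epoch))
```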
The data used in this work come from a surface defect
dataset named NEU-DET constructed by Northeastern
University, China [14]. The database collects grayscale images of six
typical surface defects of the hot-rolled strip, including
rolled-in scale (Rs), scratches (Sc), pitted surface (Ps), inclusion (In),
patches (Pa), and crazing (Cr), and provides annotation files
for these defects. The dataset contains 300 images for each
type, for a total of 1800 images.
Some of them are shown in
Fig. 4. For the defect detection task in this work, 70% of the
images (210 images of each type) are randomly selected
to train the model, and the remainder are used for model testing.
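The split can be reproduced in outline with a per-class random partition. The image identifiers, the seed, and the function name are placeholders; only the class abbreviations, the 300 images per class, and the 70% training fraction come from the text.

```python
import random


def split_per_class(image_ids_by_class, train_fraction=0.7, seed=0):
    """Randomly split each class separately so every class keeps the same ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for label, ids in image_ids_by_class.items():
        ids = list(ids)
        rng.shuffle(ids)
        cut = round(train_fraction * len(ids))
        train += [(label, i) for i in ids[:cut]]
        test += [(label, i) for i in ids[cut:]]
    return train, test


classes = ["Rs", "Sc", "Ps", "In", "Pa", "Cr"]
dataset = {label: range(300) for label in classes}   # placeholder image ids
train, test = split_per_class(dataset)
print(len(train), len(test))
```

With 300 images per class this gives 210 training and 90 test images for each defect type, that is, 1260 training and 540 test images in total.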
B. Defect Classification on NEU
Defect classification is one of the subtasks of defect
detection, so good defect classification performance
should be positively correlated with strong defect detection
performance. Therefore, we report the classification performance
of the candidate feature extraction networks considered in this
work, namely VGG16, ResNet18, ResNet34, ResNet50, DLA34,
ResNeXt50, and ResNeXt101 [22], [27], [35]. Every network
classifies all kinds of defects correctly, with an accuracy of
100%, except VGG16, which achieves a classification
accuracy of 99.63%. The numbers of trainable parameters
are 250M, 11M, 21M, 23M, 15M, 42M, and 22M, respectively. Therefore,
ResNet18, which attains the best classification accuracy with
the fewest parameters, is chosen as the basic network backbone.
Fig. 4. Six typical defect sample images in NEU-DET database. (a) Ps.
(b) In. (c) Rs. (d) Cr. (e) Sc. (f) Pa.
C. Detection Performance of RDN
1) Detection Performance on NEU-DET: To achieve better
detection speed in detecting specific steel surface defects,
an end-to-end detection network is proposed in this work. The
detection results of defect samples using different improved
network structures are shown in Table II. It can be seen
that three baseline networks, i.e., ResNet18-dcn, ResNet18-ds,
and ResNet18-dsf, can detect the defects of the steel surface.
The mAP of these networks is 71.5%, 77.6%, 80.0%, and
the speed is 70, 70, and 64 FPS, respectively. Under the
baseline ResNet18-dsf, RDN achieves the best performance
with the mAP of 80.0% and the speed of 64 FPS. The average
precision (AP) is 87.0% for Ps, 84.9% for In, 64.4% for Rs,
53.7% for Cr, 95.9% for Sc, and 92.4% for Pa.
Compared with ResNet18-dcn, ResNet18-ds with SLC improves
the mAP by 6.1%. The AP of the individual defects increases
by 2.5%, 5.5%, 2.9%, 21.5%, 4.7%,
and 0.4%, respectively, while the detection speed remains
at 70 FPS, which shows that the SCM integrated with shallow
features can provide more accurate object information without
adding much computation. As shown in
Fig. 5(a), the SCM is effective for small
target detection. When the PFM is introduced into ResNet18-ds,
the AP of each defect is improved to some extent, and the
overall mAP increases to 80.0%. This demonstrates that the
fusion of pyramid features is beneficial to the identification
and localization of steel surface defects with large intraclass
differences and high interclass similarity. Specifically, a large
number of defect images with high similarity between
target and background, such as crazing and pitted surfaces,
are successfully located, as shown in Fig. 5(b). Without bells
and whistles, RDN using ResNet18-dsf as the baseline
architecture achieves the best tradeoff between speed and
precision.
TABLE III
ABLATION STUDY OF RDN
Fig. 5. Influence of different improved modules on test results. (a) Influence
of SCM. (b) Influence of PFM.
2) Real-Time Analysis: In RDN, only 8.71 s were spent
when 540 images were detected, which means the average time
to detect an image is 0.016 s and the detection speed is 64 FPS.
In the actual production line, the shooting field of a single
camera is 50–100 cm and the maximum production speed is
typically 30 m/s, which requires the detection equipment to
have at least 30–60 FPS detection speed [36]. Therefore, the
64 FPS detection speed of the proposed RDN in this work can
meet the real-time requirements in actual production.
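The 30–60 FPS requirement follows from dividing the line speed by the length of strip covered by one frame. The sketch below assumes that consecutive frames tile the strip without overlap or gaps; the function name is illustrative.

```python
def required_fps(line_speed_m_per_s, field_of_view_m):
    """Frames per second needed so that consecutive frames cover the strip without gaps."""
    return line_speed_m_per_s / field_of_view_m


line_speed = 30.0                 # maximum production speed quoted in the text, m/s
fields_of_view = (1.0, 0.5)       # 100 cm and 50 cm shooting fields
for fov in fields_of_view:
    print(fov, required_fps(line_speed, fov))

rdn_fps = 64                      # detection speed reported for RDN
meets_all = all(rdn_fps >= required_fps(line_speed, fov) for fov in fields_of_view)
print(meets_all)
```

A 100 cm field of view needs 30 FPS and a 50 cm field needs 60 FPS, so the reported 64 FPS clears the stricter case by a small margin.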
D. Additional Experiments
Since RDN is composed of multiple subcomponents, it is
necessary to measure their effectiveness to the final perfor-
mance. In addition, to obtain the optimal detection perfor-
mance on the same detection architecture, some experiments
are carried out to select appropriate hyperparameters, such as
regression loss function and loss weight.
1) Ablation Study: In the defect detection algorithm, the
backbone affects not only the model performance but also
the inference speed. In this work, to obtain the
best detection speed, three experiments with ResNet18-dcn,
ResNet34-dcn, and DLA34 as the backbone were performed.
It can be seen from the second, third, and fourth columns of
Table III that as the complexity of the backbone decreases, the
model performance gradually deteriorates, while the detection
speed increases. In particular, when
ResNet18-dcn is used as the backbone, the model detection
speed reaches 70 FPS. To improve the model performance,
SCM is introduced, and the detection accuracy
improves from 71.5% to 77.6% mAP, as illustrated in column 5.
Furthermore, when the PFM is added to our network, the
final detection accuracy improves to 80% mAP, as shown
in the sixth column.
2) Regression Loss: In object detection methods, the
location and size information related to the bounding box of the
object is usually obtained by regression. L1, L2, balanced L1,
and smooth L1 are the main regression loss functions [38]. In this
work, we compare the detection performance of these four
regression losses. Their corresponding mAPs are
78.7%, 78.1%, 70.8%, and 80.0%, respectively, so
smooth L1 is superior to the other loss functions in mAP.
3) Bounding Box Size Weight: To analyze the sensitivity
to the bounding box size weight $\lambda_{size}$, we compare
the mAP obtained when $\lambda_{size}$ is increased in steps of
0.05 over the range (0, 0.5] and in steps of 0.1 over the range
(0.5, 1]. The mAP values over (0, 0.5] are
78.6%, 80.0%, 78.9%, 78.7%, 78.6%, 78.9%, 77.4%, 77.5%,
77.7%, and 76.7%, respectively, so the best
mAP is obtained when $\lambda_{size}$ is set to 0.1. In the
range (0.5, 1], the mAP is
76.8%, 76%, 74.2%, 73.8%, and 74.1%, respectively, which
shows a downward trend as $\lambda_{size}$ increases. Therefore,
$\lambda_{size} = 0.1$ is the best bounding box size weight.
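The sweep can be tabulated by pairing each weight with its reported mAP. The sketch assumes that the mAP values are listed in ascending order of the weight, which matches the statement that 0.1 gives the best result; variable names are illustrative.

```python
# Grid of weights: steps of 0.05 over (0, 0.5], then steps of 0.1 over (0.5, 1].
lambdas = ([round(0.05 * i, 2) for i in range(1, 11)]
           + [round(0.5 + 0.1 * i, 1) for i in range(1, 6)])

# mAP values (%) as reported in the text, assumed to follow the grid order.
maps = [78.6, 80.0, 78.9, 78.7, 78.6, 78.9, 77.4, 77.5, 77.7, 76.7,
        76.8, 76.0, 74.2, 73.8, 74.1]

best_map, best_lambda = max(zip(maps, lambdas))
for weight, score in zip(lambdas, maps):
    print(weight, score)
print("best:", best_lambda, best_map)
```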
TABLE IV
COMPARISON WITH STATE-OF-THE-ART METHODS
Fig. 6. Three kinds of SCM styles. (a) Same level composition, named SCM_S. (b) Higher-level composition, SCM_H. (c) Lower level composition, SCM_L.
dec and pool refer to deconvolution and max pooling.
E. Comparison With State-of-the-Art
To evaluate the effectiveness of the proposed RDN, several
state-of-the-art one-stage and two-stage detection methods are
compared in the predictive mAP and the detection time per
image when the same NEU-DET dataset is used. Two-stage
approaches, i.e., Faster RCNN [24] and Cascade RCNN [38],
which use the region proposal network to generate candi-
date bounding box, are considered for comparison. Another
two-stage steel surface defect detection structure named DDN
presented by Song and Yan [12] has also been compared.
For one-stage methods, seven anchor-based and anchor-free
detectors, i.e., M2Det [39], SSD [40], FCOS [41], ATSS [42],
YOLOv3 [43], improved YOLOv3 [17], and CenterNet [44],
are implemented. For several of them, such as Faster RCNN,
DDN, M2Det, and CenterNet, multiple backbones are
evaluated to obtain more competitive detection precision or
speed. The detection results of different models are shown
in Table IV.
It can be seen from Table IV that two-stage detectors can
achieve better detection accuracy: the mAP of Faster
RCNN, Cascade RCNN, and DDN is 77.9%, 73.3%, and
82.3%, respectively. However, the inference of these models
on each image is very slow; the detection speed
of the DDN method is 11 FPS when ResNet-50 is selected
as the backbone. In contrast, one-stage methods achieve
better detection speed, but their detection precision is
relatively unstable. Therefore, in this work, a portable detector based on
the pretrained model is constructed. Compared with DDN,
which has the highest detection mAP of 82.3% at a speed
of 11 FPS, this work introduces the SCM and PFM to improve
the detection precision and uses ResNet with 18 convolution
layers as the backbone to reduce the number of model
parameters. As a result, our proposed method achieves an mAP of
80.0% and a speed of 64 FPS, which demonstrates that the
RDN method proposed in this work achieves the best detection
speed in detecting steel surface defects.
V. DISCUSSION
A. Comparisons of Different SCM Styles
In Section IV-C, it is briefly demonstrated that SCM
[as shown in Fig. 6(a)] can improve detection performance by
fusing features with the same resolution. However, could
other ways of combining these layers result in
better performance? To answer this question, this work designs
two other composition styles that integrate feature maps of
different sizes, as shown in Fig. 6(b) and (c).
TABLE V
DETECTION PERFORMANCE OF DIFFERENT COMPOSITION STYLES
TABLE VI
DETECTION PERFORMANCE OF COMBINING DIFFERENT PFM LAYERS
It can be seen from Table V that the lower-level composition
style, which combines the large feature maps in the encoding
network with the small feature map in the decoding network,
obtains an mAP of 77.2% and a speed of 61 FPS. The
higher-level composition style, which combines the small feature
maps in the encoding network with the large feature map in
the decoding network, obtains the best mAP of 80.2%.
However, the deconvolution operation added in this style
reduces the detection speed to 59 FPS, which is slower than
the same-level composition style. By comparison, SCM with the
same-level combination, named SCM_S, has an mAP of 80.0% and
a speed of 64 FPS, which can meet the minimum speed
requirements of different industrial production scenarios.
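The trade-off discussed above can be expressed as a simple selection rule: among the composition styles that satisfy a required frame rate, pick the one with the highest mAP. The sketch below uses the mAP and FPS values reported in the text; the speed floor `MIN_FPS`, the `Result` type, and the function name are hypothetical and only serve as an illustration.

```python
from typing import NamedTuple, Optional, Sequence


class Result(NamedTuple):
    style: str
    map_percent: float
    fps: float


# Values as reported in the text for Table V.
TABLE_V = [
    Result("lower-level", 77.2, 61.0),
    Result("higher-level", 80.2, 59.0),
    Result("same-level (SCM_S)", 80.0, 64.0),
]


def best_under_speed_floor(
    results: Sequence[Result], min_fps: float
) -> Optional[Result]:
    """Return the highest-mAP result running at min_fps or faster, else None."""
    feasible = [r for r in results if r.fps >= min_fps]
    if not feasible:
        return None
    return max(feasible, key=lambda r: r.map_percent)


MIN_FPS = 60.0  # hypothetical production-line requirement, not from the paper
choice = best_under_speed_floor(TABLE_V, MIN_FPS)
print(choice.style if choice else "no style meets the speed floor")
```

With a floor of 60 FPS, the higher-level style is excluded despite its slightly higher mAP, and the same-level style is selected; without any speed floor, the higher-level style would win on accuracy alone.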
B. Comparison of Different Combination PFM Layers
To explore which level of feature integration can achieve the
best precision improvement without reducing the resolution
of the output feature map, this work compares the detection
performance obtained by fusing features from different levels,
such as the conv5 module in the encoding network and the
up-conv1, up-conv2, and up-conv3 modules in the decoding
network. As in Section V-A,
we also take ResNet18-dcn as the baseline. The detection
results of different integration manners are shown in Table VI.
It can be found that after introducing SCM, the mAP of
the model is 77.6%. As more feature maps are introduced,
the detection accuracy improves to different degrees, while
the detection speed is not significantly reduced. The mAP
obtained by merging two- or three-layer features is 77.8%,
78.6%, and 78.5%, respectively, and the speeds differ by only
1 FPS. These results show that features at different levels
provide unique defect information and that fusing more of
them is an effective way to improve the detection accuracy
of steel surface defects.
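One generic way to fuse features from several levels while keeping the resolution of the finest map is to upsample the coarser maps and concatenate all maps along the channel axis. The sketch below illustrates this idea in plain Python; it is not the PFM implementation of this work, and it assumes nearest-neighbor upsampling, channel concatenation, and feature maps stored as nested lists indexed as [channel][row][column]. All function names are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Repeat every value `factor` times along both height and width."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [value for value in row for _ in range(factor)]
            rows.extend(list(wide) for _ in range(factor))
        out.append(rows)
    return out


def fuse_pyramid(levels: List[FeatureMap]) -> FeatureMap:
    """Upsample each level to the finest size, then concatenate channels.

    `levels` is ordered from coarse to fine; the last entry sets the output size.
    """
    target_h = len(levels[-1][0])
    target_w = len(levels[-1][0][0])
    fused: FeatureMap = []
    for fmap in levels:
        h, w = len(fmap[0]), len(fmap[0][0])
        if target_h % h or target_w % w or target_h // h != target_w // w:
            raise ValueError("level size must evenly divide the finest size")
        fused.extend(upsample_nearest(fmap, target_h // h))
    return fused


# Toy example: three levels with 1, 1, and 2 channels.
coarse = [[[1.0, 2.0], [3.0, 4.0]]]                       # 1 channel, 2 x 2
middle = [[[0.5] * 4 for _ in range(4)]]                  # 1 channel, 4 x 4
fine = [[[0.0] * 8 for _ in range(8)] for _ in range(2)]  # 2 channels, 8 x 8

fused = fuse_pyramid([coarse, middle, fine])
print(len(fused), len(fused[0]), len(fused[0][0]))  # 4 channels, each 8 x 8
```

The fused map keeps the 8 x 8 resolution of the finest level while carrying channels from every level, which matches the stated goal of adding information without reducing the output resolution.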
VI. CONCLUSION
Because current steel surface defect detectors are accurate
but slow, this work proposes a real-time steel surface defect
detection method, called RDN.
This is an end-to-end detection network that can provide defect
categories and precise locations. RDN uses a portable and
modular encoding and decoding network as the basic network
architecture to improve the detection speed. At the same
time, the SCM and PFM are designed to improve detection
accuracy. As a result, the detection method proposed in this
work achieves a real-time detection speed of 64 FPS and the
detection mAP of 80.0%.
Compared with most state-of-the-art algorithms, our
model is not only faster but also has competitive detection
accuracy, and it can reliably identify defects with a similar
background. In addition, our model can detect more defect
morphologies than segmentation algorithms. However, it cannot
localize defects at the pixel level as segmentation algorithms
can. The performance of our model was developed and measured
on an NVIDIA Titan X GPU hardware platform, and its detection
speed may vary with the available computing power. Therefore,
a mid-range hardware device is a necessary prerequisite for
the application of our model.
In the future, our work can be improved in the following
aspects: 1) using a segmentation algorithm to obtain pixel-level
information on defects; 2) designing a detector that handles
low-quality defect images independently to further improve
the accuracy of the model; and 3) embedding our proposed
algorithm into mobile devices on a real production line.
REFERENCES
[1] H. Di, X. Ke, Z. Peng, and Z. Dongdong, “Surface defect classification
of steels with a new semi-supervised learning method,” Opt. Lasers Eng.,
vol. 117, pp. 40–48, Jun. 2019.
[2] L. Xu, G. Tian, L. Zhang, and X. Zheng, “Research of surface defect
detection method of hot rolled strip steel based on generative adversarial
network,” in Proc. Chin. Autom. Congr. (CAC), Nov. 2019, pp. 401–404.
[3] J. Gao, W. Yu, and C. He, “The research on defect recognition method
for rail magnetic flux leakage detecting,” in Proc. Int. Conf. Meas., Inf.
Control, vol. 2, May 2012, pp. 745–750.
[4] Q. Luo, Y. Sun, P. Li, O. Simpson, L. Tian, and H. Yigang, “Generalized
completed local binary patterns for time-efficient steel surface defect
classification,” IEEE Trans. Instrum. Meas., vol. 68, no. 3, pp. 667–679,
Mar. 2019.
[5] K. Xu, S. Liu, and Y. Ai, “Application of Shearlet transform to
classification of surface defects for metals,” Image Vis. Comput., vol. 35,
pp. 23–30, Mar. 2015.
[6] K. Liu, H. Wang, H. Chen, E. Qu, Y. Tian, and H. Sun, “Steel
surface defect detection using a new Haar–Weibull-variance model in
unsupervised manner,” IEEE Trans. Instrum. Meas., vol. 66, no. 10,
pp. 2585–2596, Oct. 2017.
[7] Z. Xue-Wu, D. Yan-Qiong, L. Yan-Yun, S. Ai-Ye, and L. Rui-Yu,
“A vision inspection system for the surface defects of strongly reflected
metal based on multi-class SVM,” Expert Syst. Appl., vol. 38, no. 5,
pp. 5930–5939, May 2011.
[8] S. Mei, H. Yang, and Z. Yin, “An unsupervised-learning-based approach
for automated defect inspection on textured surfaces,” IEEE Trans.
Instrum. Meas., vol. 67, no. 6, pp. 1266–1277, Jun. 2018.
[9] Q. Luo et al., “Surface defect classification for hot-rolled steel strips
by selectively dominant local binary patterns,” IEEE Access, vol. 7,
pp. 23488–23499, 2019.
[10] L. Yi, G. Li, and M. Jiang, “An end-to-end steel strip surface defects
recognition system based on convolutional neural networks,” Steel Res.
Int., vol. 88, no. 2, Feb. 2017, Art. no. 1600068.
[11] I. Konovalenko, P. Maruschak, J. Brezinová, J. Viňáš, and J. Brezina,
“Steel surface defect classification using deep residual neural network,”
Metals, vol. 10, no. 6, p. 846, Jun. 2020.
[12] K. Song and Y. Yan, “A noise robust method based on completed local
binary patterns for hot-rolled steel strip surface defects,” Appl. Surf. Sci.,
vol. 285, no. 21, pp. 858–864, Nov. 2013.
[13] W. Wang et al., “Surface defects classification of hot rolled strip based
on improved convolutional neural network,” ISIJ Int., vol. 61, no. 5,
pp. 1579–1583, 2021.
[14] S. M. Azimi, D. Britz, M. Engstler, M. Fritz, and F. Mücklich,
“Advanced steel microstructural classification by deep learning meth-
ods,” Sci. Rep., vol. 8, no. 1, Dec. 2018, Art. no. 2128.
[15] S. Wang, X. Xia, L. Ye, and B. Yang, “Automatic detection and
classification of steel surface defect using deep convolutional neural
networks,” Metals, vol. 11, no. 3, p. 388, Feb. 2021.
[16] Y. He, K. Song, Q. Meng, and Y. Yan, “An end-to-end steel surface
defect detection approach via fusing multiple hierarchical features,”
IEEE Trans. Instrum. Meas., vol. 69, no. 4, pp. 1493–1504, Apr. 2020.
[17] M. Hatab, H. Malekmohamadi, and A. Amira, “Surface defect detection
using YOLO network,” in Intelligent Systems and Applications, K. Arai,
S. Kapoor, and R. Bhatia, Eds. Cham, Switzerland: Springer, 2021,
pp. 505–515.
[18] X. Kou, S. Liu, K. Cheng, and Y. Qian, “Development of a YOLO-V3-
based model for detecting defects on steel strip surface,” Measurement,
vol. 182, Sep. 2021, Art. no. 109454.
[19] W. Zhao, F. Chen, H. Huang, D. Li, and W. Cheng, “A new steel defect
detection algorithm based on deep learning,” Comput. Intell. Neurosci.,
vol. 2021, pp. 1–13, Mar. 2021.
[20] H. Dong, K. Song, Y. He, J. Xu, Y. Yan, and Q. Meng, “PGA-Net: Pyra-
mid feature fusion and global context attention network for automated
surface defect detection,” IEEE Trans. Ind. Informat., vol. 16, no. 12,
pp. 7448–7458, Dec. 2020.
[21] H. Wang, J. Zhang, Y. Tian, H. Chen, H. Sun, and K. Liu, “A sim-
ple guidance template-based defect detection method for strip steel
surfaces,” IEEE Trans. Ind. Informat., vol. 15, no. 5, pp. 2798–2809,
May 2019.
[22] K. Simonyan and A. Zisserman, “Very deep convolutional networks
for large-scale image recognition,” CoRR, vol. abs/1409.1556, pp. 1–14,
Apr. 2015.
[23] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, “Aggregated residual
transformations for deep neural networks,” in Proc. IEEE Conf. Comput.
Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 5987–5995.
[24] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards
real-time object detection with region proposal networks,” IEEE Trans.
Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[25] A. G. Howard et al., “MobileNets: Efficient convolutional neural net-
works for mobile vision applications,” 2017, arXiv:1704.04861.
[26] Y. He, K. Song, H. Dong, and Y. Yan, “Semi-supervised defect classifi-
cation of steel surface based on multi-training and generative adversarial
network,” Opt. Lasers Eng., vol. 122, pp. 294–302, Nov. 2019.
[27] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for
image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.
(CVPR), Jun. 2016, pp. 770–778.
[28] J. Dai, Y. Li, K. He, and J. Sun, “R-FCN: Object detection via
region-based fully convolutional networks,” in Proc. Adv. Neural Inf.
Process. Syst. (NIPS). Red Hook, NY, USA: Curran Associates, 2016,
pp. 379–387.
[29] R. K. Srivastava, K. Greff, and J. Schmidhuber, “Training very deep
networks,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), vol. 2.
Cambridge, MA, USA: MIT Press, Dec. 2015, pp. 2377–2385.
[30] A. Veit, M. Wilber, and S. Belongie, “Residual networks are exponential
ensembles of relatively shallow networks,” in Proc. Adv. Neural Inf.
Process. Syst., May 2016, pp. 1–9.
[31] O. Ronneberger, “Invited talk: U-Net convolutional networks for
biomedical image segmentation,” in Bildverarbeitung für die Medi-
zin, K. H. Maier-Hein, K. Fritzsche, T. M. Deserno, T. Lehmann,
H. Handels, and T. Tolxdorff, Eds. Berlin, Germany: Springer, 2017,
p. 3.
[32] T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár, “Focal loss for
dense object detection,” CoRR, vol. abs/1708.02002, pp. 1–9, Aug. 2017.
[33] H. Law and J. Deng, “CornerNet: Detecting objects as paired keypoints,”
Int. J. Comput. Vis., vol. 128, pp. 642–656, Dec. 2020.
[34] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in
deep convolutional networks for visual recognition,” IEEE Trans. Pattern
Anal. Mach. Intell., vol. 37, no. 9, pp. 1904–1916, Sep. 2015.
[35] F. Yu, D. Wang, E. Shelhamer, and T. Darrell, “Deep layer aggregation,”
2017, arXiv:1707.06484.
[36] J. Li, Z. Su, J. Geng, and Y. Yin, “Real-time detection of steel strip
surface defects based on improved YOLO detection network,” IFAC-
PapersOnLine, vol. 51, no. 21, pp. 76–81, 2018.
[37] J. Pang, K. Chen, J. Shi, and H. Feng, “Libra R-CNN: Towards balanced
learning for object detection,” in Proc. IEEE/CVF Conf. Comput. Vis.
Pattern Recognit. (CVPR), Jun. 2019, pp. 821–830.
[38] Z. Cai and N. Vasconcelos, “Cascade R-CNN: High quality object
detection and instance segmentation,” IEEE Trans. Pattern Anal. Mach.
Intell., vol. 43, no. 5, pp. 1483–1498, May 2021.
[39] Q. Zhao, T. Sheng, Y. Wang, Z. Tang, Y. Chen, L. Cai, and H. Ling,
“M2Det: A single-shot object detector based on multi-level feature
pyramid network,” in Proc. AAAI Conf. Artif. Intell., vol. 33, 2019,
pp. 9259–9266.
[40] W. Liu et al., “SSD: Single shot multibox detector,” in Computer
Vision—ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds.
Cham, Switzerland: Springer, 2016, pp. 21–37.
[41] Z. Tian, C. Shen, H. Chen, and T. He, “FCOS: Fully convolutional
one-stage object detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis.
(ICCV), Oct. 2019, pp. 9626–9635.
[42] S. Zhang, C. Chi, Y. Yao, Z. Lei, and S. Z. Li, “Bridging the gap between
anchor-based and anchor-free detection via adaptive training sample
selection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit.
(CVPR), Jun. 2020, pp. 9756–9765.
[43] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,”
2018, arXiv:1804.02767.
[44] X. Zhou, D. Wang, and P. Krähenbühl, “Objects as points,” 2019,
arXiv:1904.07850.
Wenyan Wang received the B.S. degree from the
School of Electrical Engineering, Industrial and
Commercial College, Anhui University of Technol-
ogy, Ma’anshan, China, in 2015, and the M.S. degree
from the School of Electrical Information and Engi-
neering, Anhui University of Technology, in 2018,
where she is currently pursuing the Ph.D. degree
with the School of Metallurgical Engineering.
Her research interests include deep learning, intel-
ligent inspection, data mining, and bioinformatics.
Chunfeng Mi received the B.S. degree from the
School of Electrical Engineering, Industrial and
Commercial College, Anhui University of Tech-
nology, Ma’anshan, China, in 2019, where he is
currently pursuing the M.S. degree with the School
of Electrical Information and Engineering.
His research interests include intelligent inspec-
tion, deep learning, and bioinformatics.
Ziheng Wu received the B.S. degree from
Tongling University, Tongling, China, in 2009, the
M.S. degree from Zhejiang Sci-Tech University,
Hangzhou, China, in 2012, and the Ph.D. degree
from the University of Science and Technology of
China, Hefei, China, in 2018.
He is currently a Lecturer with the Anhui Univer-
sity of Technology, Ma’anshan, China. His research
interests include machine learning, artificial intel-
ligence, gray systems theory, medical informatics,
science engineering, and intelligent control.
Kun Lu received the B.S. degree from the School of
Electrical Information and Engineering, Hohai Uni-
versity Wentian College, Ma’anshan, China, in 2016,
and the M.S. degree from the School of Electrical
Information and Engineering, Anhui University of
Technology, Ma’anshan, in 2019, where he is cur-
rently pursuing the Ph.D. degree with the School of
Management Science and Engineering.
His research interests include machine learning,
data mining, and bioinformatics.
Hongming Long received the Ph.D. degree from
the Department of Iron Metallurgy, Central South
University, Changsha, China, in 2007.
He is the Director of the Key Laboratory of Met-
allurgical Emission Reduction and Resources Recy-
cling, Ministry of Education, Ma’anshan, China.
Since 2007, he has been with the Anhui
University of Technology, Ma’anshan. His research
interests include emission reduction of flue gas
pollutants (NOx, dioxin and PM2.5) in metallur-
gical industry, and comprehensive utilization of
metallurgical solid waste resources.
Mr. Long is a Member of Expert Committee of China Metal Society and a
Secretary-General of the Branch of Metallurgical Solid Waste Resource.
Baigen Pan received the B.S. degree from the
Department of Physics, School of Physics, Anqing
Normal University, Anqing, China, in 2003, and the
M.S. degree from the School of Instrument Science
and Opto-Electronics Engineering, Hefei University
of Technology, Hefei, China, in 2010.
His research interests include deep learning, data
mining, and bioinformatics.
Dan Li was born in 1976. She received the
bachelor’s and master’s degrees from the Anhui
University of Technology, Ma’anshan, China, in
1997 and 2004, respectively, and the Ph.D. degree
from the Nanjing University of Aeronautics and
Astronautics, Nanjing, China, in 2008.
She is mainly engaged in image processing,
machine vision, and autonomous navigation.
Jun Zhang received the bachelor’s degree from
the Hefei University of Technology, Hefei, China,
in 1995, the master’s degree from the Institute of
Intelligent Machine, Chinese Academy of Sciences,
Hefei, in 2004, and the Ph.D. degree from the
University of Science and Technology of China,
Hefei, in 2007.
He served with the University of Louisville,
Louisville, KY, USA, from 2009 to 2011, as a Post-
Doctoral Fellow. He is currently an Associate Pro-
fessor with the School of Electrical Engineering and
Automation, Anhui University, Hefei. He has published more than 40 articles
in international conferences and journals. He focuses on deep learning with
application to bioinformatics, cheminformatics, computer vision, and so on.
Peng Chen received the bachelor’s degree from the
Electronic Engineering Institute, PLA, Hefei, China,
in 1997, the master’s degree from the Kunming
University of Science and Technology, Kunming,
China, in 2003, and the Ph.D. degree from the
University of Science and Technology of China,
Hefei, in 2007.
He served with the City University of Hong Kong,
Hong Kong, in 2006, as a Senior Research Asso-
ciate, Howard University, Washington, DC, USA,
from 2008 to 2009, as a Post-Doctoral Fellow,
Nanyang Technological University, Singapore, from 2009 to 2010, as a
Research Fellow, and the King Abdullah University of Science and Tech-
nology (KAUST), Thuwal, Saudi Arabia, from 2012 to 2014, as a Post-
Doctoral Fellow. He is a Professor with the School of Computer Science and
Technology, Institute of Physical Science and Information Technology, Anhui
University, Hefei. He specializes in machine learning and data mining with
applications to bioinformatics, drug discovery, computer vision, and so on.
Bing Wang (Senior Member, IEEE) received the
B.S. and M.S. degrees from the Hefei University of
Technology, Hefei, China, in 1998 and 2004, respec-
tively, and the Ph.D. degree from the University of
Science and Technology of China, Hefei, in 2006.
He worked as a Senior Research Associate with
the City University of Hong Kong, Hong Kong,
from 2006 to 2007, and a Post-Doctoral Fellow with
the University of Louisville, Louisville, KY, USA,
and Vanderbilt University, Nashville, TN, USA,
from 2008 to 2012. He is currently serving as a Full
Professor with the School of Electrical and Information Engineering, Anhui
University of Technology, Ma’anshan. He has more than 150 publications.
His research interests mainly focus on machine learning, image processing,
computational biology, and chemoinformatics.
Authorized licensed use limited to: Anhui University of Technology. Downloaded on April 09,2022 at 09:04:44 UTC from IEEE Xplore. Restrictions apply.
... Although the accuracy of steel surface defect detection was improved by replacing the backbone and introducing a feature pyramid structure, it also reduced the detection speed. Wang et al. [25] proposed a real-time detection network using the lightweight encoding and decoding network ResNet-dcn as the basic convolutional architecture. However, this method has poor detection accuracy when detecting defects with irregular shapes and highly dense distribution. ...
Article
Full-text available
The detection of steel surface defects is of great significance to steel production. In order to better meet the requirements of accuracy, real-time, and lightweight model, this paper proposes a highly efficient and lightweight steel surface defect detection method based on YOLOv5n. Firstly, ODMobileNetV2 composed of MobileNetV2 and ODConv is used as the backbone to improve the defect feature extraction capability. Secondly, GSConv is utilized in the neck to achieve deep information fusion through channel concatenation and shuffling, enhancing the ability of feature fusion. Finally, this paper proposes a spatial-channel reconstruction block (SCRB) designed to suppress redundant features and improve the representation ability of defect features through feature separation and reconstruction. Experimental results show that this method achieves 84.1% mAP and 109 FPS on the NEU-DET dataset, and 72.9% mAP and 110.1 FPS on the GC10-DET dataset, enabling accurate and efficient detection. Furthermore, the number of parameters is only 5.04M, which has a significant lightweight advantage.
... CABF-FCOS [1] utilized an anchor-free framework, a channel attention mechanism, and a bidirectional feature fusion network for rapid and effective defect detection on strip steel surface. Wang et al. [29] proposed the RDN by using ResNet-dcn and achieved a speed of 64 FPS and 80.0% mAP on the NEU-DET dataset. MSFT-YOLO [30] integrated a TRANS module and multi-scale feature fusion to optimize defect detection. ...
Article
Full-text available
Deep learning algorithms have gained widespread usage in defect detection systems. However, existing methods are not satisfied for large-scale applications on surface defect detection of strip steel. In this paper, we propose a precise and efficient detection model, named CABF-YOLO, based on the YOLOX for strip steel surface defects. Firstly, we introduce the Triplet Convolutional Coordinate Attention (TCCA) module in the backbone of the YOLOX. By factorizing the pooling operation, the TCCA module can accurately capture cross-channel features to identify the location information of defects. Secondly, we design a novel Bidirectional Fusion (BF) strategy in the neck of the YOLOX. The BF strategy enhances the fusion of low-level and high-level semantic information to obtain fine-grained information. Lastly, the original bounding box loss function is replaced by the EIoU loss function. In the EIoU loss function, the penalty term is redefined to consider the overlap area, central point, and side length of the required regressions to accelerate the convergence rate and localization accuracy. On the benchmark NEU-DET dataset and GC10-DET dataset, the experimental results show that the CABF-YOLO achieves superior performance compared with other comparison models and satisfies the real-time detection requirement of industrial production.
... www.nature.com/scientificreports/ in images may be lost as they traverse through the convolutional layers of CNNs, potentially causing reduced detection accuracy and inaccurate target localization 18,19 . Deep learning-based methods usually address this issue by fusing feature maps from multiple different levels 7,[20][21][22][23] . However, this multiscale feature fusion approach also has some problems [24][25][26][27][28] . ...
Article
Full-text available
To improve the precision of defect categorization and localization in images, this paper proposes an approach for detecting surface defects in hot-rolled steel strips. The approach uses an improved YOLOv5 network model to overcome the issues of inadequate feature extraction capacity and suboptimal feature integration when identifying surface defects on steel strips. The proposed method achieves higher detection accuracy and localization precision, making it more competitive and applicable in real production. Firstly, the multi-scale feature fusion (MSF) strategy is utilized to fuse shallow and deep features effectively and enrich detailed information relevant to target defects. Secondly, the CSPLayer Res2Attention block (CRA block) residual module is introduced to reduce the loss of defect information during hierarchical transmission, thereby enhancing the extraction of fine-grained features and improving the perception of details and global features. Finally, the experimental results indicate that the mAP on the NEU-DET and GC10-DET datasets approaches 78.5% and 67.3%, respectively, which is 4.9% and 2.1% higher than that of the baseline. Meanwhile, it has higher precision and more precise localization capabilities than other methods. Furthermore, it also achieves 59.2% mAP on the APDDD dataset, indicating its potential for growth in further domains.
... In industrial production, metal surface defects have a serious impact on the performance of the parts, it is necessary to detect the surface of metal parts to ensure quality. Through the detection of defects on the metal surface, defects can be found early, avoiding the loss of subsequent processing, testing and other links, and reducing production costs [2]. The more automated the detection of surface defects of metal parts, the higher the detection efficiency, which helps to improve production efficiency and productivity. ...
Article
Full-text available
Surface defect inspection of metal components plays a critical role in ensuring product quality, enhancing production efficiency, and reducing costs, with particular emphasis on the detection for small-sized surface defects to ensure the safety and reliability of metal components during their usage. The existing detection methods of small size defects on the surface of metal components have some shortcomings, such as low precision and poor real-time performance. To solve these two problems, this paper proposes a real-time defect detection method based on the improved YOLO. Firstly, LSandGlass module is used to replace the residual module in the backbone network, which reduces information loss, eliminates the low-resolution feature layer, and minimizes the semantic loss. The network then uses lightweight Ghost convolution at the neck to extract network features. In addition, the convolutional block attention mechanism (CBAM) module is added to improve the detection precision of small size defects. Finally, the soft intersection over union (SIoU) is used to further enhance the target detection capability. The experiment was carried out on the self-made hexagonal bolt data set of typical commonly used metal components. The experimental results show that compared to the original YOLOv5, the mAP (0.5) is improved by 5.3% to 95.50%, and the reasoning FPS is improved by 21 fps to 95 fps. These results indicate that the proposed LCG-YOLO improves the real-time detection performance of metal component surface defects.
... Current solutions include transfer (Yao et al., 2020), unsupervised learning (Zhou et al., 2022), meta-learning (Hospedales et al., 2021), etc. FSL method can generally be classified into three categories: enhancing training data as prior knowledge ), improving the model to limit the hypothesis space more effectively , and improving the algorithm to find the optimal hypothesis in the given hypothesis space (Xu et al., 2017). These methods have been widely used in many fields, such as object detection (Wang et al., 2019a(Wang et al., , 2019b, image segmentation (Wang et al., 2019a(Wang et al., , 2019b, and image classification (Yong et al., 2022), and there have been successful examples in defect detection (Sheynin et al., 2021;Wang et al., 2022). ...
Article
Full-text available
Visual-based defect detection is a crucial but challenging task in industrial quality control. Most mainstream methods rely on large amounts of existing or related domain data as auxiliary information. However, in actual industrial production, there are often multi-batch, low-volume manufacturing scenarios with rapidly changing task demands, making it difficult to obtain sufficient and diverse defect data. This paper proposes a parallel solution that uses a human–machine knowledge hybrid augmentation method to help the model extract unknown important features. Specifically, by incorporating experts' knowledge of abnormality to create data with rich features, positions, sizes, and backgrounds, we can quickly accumulate an amount of data from scratch and provide it to the model as prior knowledge for few-data learning. The proposed method was evaluated on the magnetic tile dataset and achieved F1-scores of 60.73%, 70.82%, 77.09%, and 82.81% when using 2, 5, 10, and 15 training images, respectively. Compared to the traditional augmentation method's F1-score of 64.59%, the proposed method achieved an 18.22% increase in the best result, demonstrating its feasibility and effectiveness in few-data industrial defect detection.
... Zhou et al. [27] proposed the DACNet to detect strip steel surface defects, Dong et al. [28] used pyramid feature fusion to enhance defect detection, Wang et al. [28] proposed the DACNet to detect surface defects in strip steel. detection, Wang et al. [29] propose a new pyramid feature fusion module, Yu et al. [30] replaced the feature pyramid network (FPN) in Neck with a bi-directional feature fusion network (BFFN), Zeng et al. [31] made full use of the contextual information for the detection of tiny defects in PCBs.As can be seen, there have been many studies related to the utilization of different semantic features and the use of contextual information as a way to improve the performance of industrial defect detection models, but the above methods integrate features with different shades of semantics and do not consider pre-filtering the conflicting feature information of the different shades of semantics prior to the integration, which affects the further improvement of the model's performance. ...
Article
Full-text available
The detection of tiny defects in industrial products is important for improving the quality of industrial products and maintaining production safety. Currently, image-based defect detection methods are ineffective in detecting tiny and variously shaped defects. Therefore, this paper proposes a tiny defect detection network (TD-Net) for industrial products to improve the effectiveness of tiny defect detection. TD-Net improves the overall defect detection effect, especially the detection effect of tiny defects, by solving the problems of downsampling of tiny defects, pre-filtering of conflicting deep and shallow semantic information, and cascading fusion of multi-scale information. Specifically, this paper proposes the Defect Downsampling (DD) module to realize the defect information supplementation during the backbone downsampling process and improve the problem that the stepwise convolution easily misses the detection of tiny defects. Meanwhile, the Semantic Information Interaction Module (SIIM) is proposed, which fuses deep and shallow semantic features, and is designed to interact the fused features with shallow features to optimize the detection of tiny defects. Finally, the Scale Information Fusion Module (SIFM) is proposed to improve the Path Aggregation Network (PANet) for cascading fusion and information focus on different scale information, which enables further improvement of defect detection performance of TD-Net. Extensive experimental results on the NEU–DET data set (76.8 $$\%$$ % mAP), the Peking University PCB defect data set (96.2 $$\%$$ % mAP) and the GC10-DET data set (71.5 $$\%$$ % mAP) show that the proposed TD-Net achieves competitive results compared with SOTA methods with the equivalent parameter quantity.
Article
Full-text available
Steel surface defect detection is crucial in manufacturing, but achieving high accuracy and real-time performance with limited computing resources is challenging. To address this issue, this paper proposes DFFNet, a lightweight fusion network, for fast and accurate steel surface defect detection. Firstly, a lightweight backbone network called LDD is introduced, utilizing partial convolution to reduce computational complexity and extract spatial features efficiently. Then, PANet is enhanced using the Efficient Feature-Optimized Converged Network and a Feature Enhancement Aggregation Module (FEAM) to improve feature fusion. FEAM combines the Efficient Layer Aggregation Network and reparameterization techniques to extend the receptive field for defect perception, and reduce information loss for small defects. Finally, a WIOU loss function with a dynamic non-monotonic mechanism is designed to improve defect localization in complex scenes. Evaluation results on the NEU-DET dataset demonstrate that the proposed DFFNet achieves competitive accuracy with lower computational complexity, with a detection speed of 101 FPS, meeting real-time performance requirements in industrial settings. Furthermore, experimental results on the PASCAL VOC and MS COCO datasets demonstrate the strong generalization capability of DFFNet for object detection in diverse scenarios.
Article
In strip steel production, detecting surface defects is crucial for ensuring product quality and optimizing production line efficiency. However, detecting defects is complicated by the variations in size, complex structures, and the wide range of defect morphologies present in strip steel. To tackle these challenges, this paper proposes a strip steel surface defect detection network via adaptive focusing features (AFF-Net). Firstly, an adaptive focusing feature block (AFF-Block) is proposed, which applies the “Diffusion-Aggregation” thought. This block repositions and adaptively assigns weights to defect features, guiding the network to focus on defect features and more effectively capture defects’ spatial and morphological changes. Subsequently, a focused feature pyramid network (Foc-FPN) is proposed to enhance the network’s adaptability to complex defects through multi-scale focusing fusion. This innovative structure adaptively balances the semantic gap of defect features at different scales and alleviates the abstraction feature overload. The proposed algorithm achieved a mean Average Precision (mAP@IoU=0.5) of 83.5% on the public NEU-DET dataset for strip steel surface defects, surpassing the baseline network by 8.2%. Compared to existing models, this detection method strengthens the connection between defect characteristics and more effectively detects irregularly distributed defects in complex strip steel surface images.
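The mAP@IoU=0.5 figure quoted above counts a predicted box as a match only when its intersection over union (IoU) with a ground-truth box reaches 0.5. The following is a minimal sketch of that overlap measure, assuming axis-aligned boxes given as `(x1, y1, x2, y2)` corner coordinates; the function names `box_iou` and `is_match` are illustrative and do not come from the AFF-Net paper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 <= x2 and y1 <= y2


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """True when the prediction overlaps the ground truth at or above the threshold."""
    return box_iou(pred, truth) >= threshold
```

For example, two 2×2 boxes offset by one unit in each direction share an area of 1 out of a union of 7, so they would not count as a match at the 0.5 threshold.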
Article
Defect detection is extensively utilized within the metal industry, particularly for identifying surface imperfections on steel strips. However, the current methods still face challenges in detecting small and elongated defects on steel strips. Such defects occupy a relatively small pixel percentage within the entire image. The repeated downsampling in convolutional networks, coupled with the dynamic changes in the receptive field, can result in the potential loss of these minute defects. To mitigate the problem, our paper proposes EC-YOLO, a real-time defect detection network for steel strips of the above peculiar defects. Firstly, the 1D convolution in the efficient channel attention bottleneck (EB) module enhances the feature extraction ability of the backbone for small and elongated defects, while also facilitating the attentional mechanism for modeling channel features. Secondly, Context Transformation Networks integrate cross-stage localized blocks, referred to as CC modules, to enhance the understanding of feature semantic contextual information. Thirdly, a self-constructed dataset containing both small and elongated defects is used for understanding where such defects are more relevant in feature fusion and extraction. On the public datasets GC10-DET and NEU-DET, the improved model achieves mean Average Precision (mAP) scores of 71% and 83%, respectively, surpassing the performance of other mainstream models. The mAP of the enhanced model on the SLD-DET dataset reaches 87.5%, demonstrating its superiority in detecting both small and elongated defects.
Article
Surface defect classification of hot-rolled strip based on machine vision is a challenging task because of the diversity of defect morphology, high inter-class similarity, and the real-time requirements of actual production. In this work, VGG16-ADB, an improved VGG16 convolutional neural network, is proposed to address the problem of defect identification of hot-rolled strip. The improved network takes VGG16 as the benchmark model, reduces the system consumption and memory occupation by reducing the depth and width of the network structure, and adds a batch normalization layer to accelerate the convergence of the model. On the standard NEU dataset, the proposed method achieves a classification accuracy of 99.63% and a recognition speed of 333 FPS, which fully meets the requirements of detection accuracy and speed in the actual production line. The experimental results also show the superiority of VGG16-ADB over existing classification models for surface defect classification of hot-rolled strip.
Article
In recent years, more and more scholars have devoted themselves to research on target detection algorithms, driven by the continuous development of deep learning. The detection and recognition of small and complex targets remains an open problem. The authors of this article have examined the shortcomings of deep learning detection algorithms in detecting small and complex defect targets and present a new improved target detection algorithm for steel surface defect detection. Steel surface defects seriously affect the quality of steel. We find that most current detection algorithms achieve low accuracy on the NEU-DET dataset, so we choose this dataset to verify a machine-vision-based steel surface defect detection algorithm for defect detection in steel production. A series of improvements are made to the traditional Faster R-CNN algorithm, such as reconstructing the network structure of Faster R-CNN. Because the targets are small, we train the network with multiscale fusion. Because the targets have complex features, we replace part of the conventional convolution network with a deformable convolution network. The experimental results show that the deep learning network model trained by the proposed method has good detection performance, and the mean average precision is 0.752, which is 0.128 higher than the original algorithm. The average precision of crazing, inclusion, patches, pitted surface, rolled-in scale, and scratches is 0.501, 0.791, 0.792, 0.874, 0.649, and 0.905, respectively. The detection method is able to identify small target defects on the steel surface effectively, which can provide a reference for the automatic detection of steel defects.
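The reported mean average precision can be checked against the six per-class values, since mAP is the unweighted mean of the per-class average precisions. The short sketch below recomputes it from the numbers quoted in the abstract; the variable names are illustrative, and the baseline value is only what the stated 0.128 improvement implies, not a figure given in the abstract.

```python
# Per-class average precision values as quoted in the abstract.
per_class_ap = {
    "crazing": 0.501,
    "inclusion": 0.791,
    "patches": 0.792,
    "pitted_surface": 0.874,
    "rolled_in_scale": 0.649,
    "scratches": 0.905,
}


def mean_average_precision(ap_by_class):
    """Unweighted mean of the per-class average precisions."""
    return sum(ap_by_class.values()) / len(ap_by_class)


map_value = mean_average_precision(per_class_ap)

# Assumption: "0.128 higher than the original algorithm" is an absolute difference.
implied_baseline = map_value - 0.128

print(f"mAP = {map_value:.3f}")
print(f"implied baseline mAP = {implied_baseline:.3f}")
```

The six values sum to 4.512, which divided by six gives 0.752, in agreement with the reported mAP.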
Article
Automatic detection of steel surface defects is very important for product quality control in the steel industry. However, the traditional method cannot be well applied in the production line because of its low accuracy and slow running speed. The currently popular deep learning algorithms also suffer from low accuracy, and there is still a lot of room for improvement. This paper proposes a method combining an improved ResNet50 and an enhanced faster region convolutional neural network (faster R-CNN) to reduce the average running time and improve the accuracy. Firstly, the image is input into the improved ResNet50 model, which adds a deformable convolution network (DCN) and improved cutout, to classify samples as defective or defect-free. If the probability of having a defect is less than 0.3, the algorithm directly reports the sample as defect-free. Otherwise, the sample is further input into the improved faster R-CNN, which adds spatial pyramid pooling (SPP), enhanced feature pyramid networks (FPN), and matrix NMS. The final output is the location and class of each defect in the sample, or an indication that the sample has no defect. On a data set obtained in a real factory environment, the accuracy of this method reaches 98.2%. At the same time, the average running time is shorter than that of other models.
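The two-stage flow described in this abstract can be sketched as a simple cascade: a cheap classifier screens each image, and the heavier detector runs only when the defect probability reaches the 0.3 threshold. This is a minimal sketch and not the authors' implementation; `inspect`, `classify`, and `detect` are hypothetical names, and the two models are passed in as plain callables so the control flow can be shown without any specific framework.

```python
from typing import Callable, List, Tuple

# (label, score, (x1, y1, x2, y2)); the tuple layout is an assumption for illustration.
Detection = Tuple[str, float, Tuple[int, int, int, int]]

DEFECT_THRESHOLD = 0.3  # screening threshold stated in the abstract


def inspect(
    image: object,
    classify: Callable[[object], float],
    detect: Callable[[object], List[Detection]],
    threshold: float = DEFECT_THRESHOLD,
) -> List[Detection]:
    """Run the screening classifier first; call the detector only if needed.

    Returns an empty list when the image is judged defect-free.
    """
    p_defect = classify(image)
    if p_defect < threshold:
        # Early exit: the expensive detector is skipped for clean samples.
        return []
    return detect(image)


if __name__ == "__main__":
    # Stand-in models (hypothetical) that return fixed outputs.
    clean = inspect("img_a", classify=lambda _: 0.05, detect=lambda _: [])
    flawed = inspect(
        "img_b",
        classify=lambda _: 0.92,
        detect=lambda _: [("scratch", 0.88, (10, 20, 60, 40))],
    )
    print(clean, flawed)
```

The saving in average running time comes from the early exit: when most samples on the line are defect-free, the detector is invoked for only a small fraction of images.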
Article
An automated method for detecting and classifying three classes of surface defects in rolled metal has been developed, which allows for conducting defectoscopy with specified parameters of efficiency and speed. The possibility of using the residual neural networks for classifying defects has been investigated. The classifier based on the ResNet50 neural network is accepted as a basis. The model allows classifying images of flat surfaces with damage of three classes with the general accuracy of 96.91% based on the test data. The use of ResNet50 is shown to provide excellent recognition, high speed, and accuracy, which makes it an effective tool for detecting defects on metal surfaces.
Article
During steel strip production, mechanical forces and environmental factors cause surface defects of the steel strip. Therefore, detection of such defects is key to the production of high quality products. Moreover, surface defects of the steel strip cause great economic losses to the high-tech industry. So far, few studies have explored methods of identifying the defects, and most of the currently available algorithms are not sufficiently effective. Therefore, we developed an end-to-end defect detection model based on YOLO-V3. Briefly, the anchor-free feature selection mechanism was utilized to select an ideal feature scale for model training, replace the anchor-based structure, and shorten the computing time. Next, specially designed dense convolution blocks were introduced into the model to extract rich feature information, which effectively improves feature reuse, feature propagation, and enhances the characterization ability of the network. The experimental results show that, compared with other comparison models, the improved model proposed in this study has higher performance. For instance, the proposed model yielded 71.3% mAP on the GC10-DET dataset, and 72.2% mAP on the NEU-DET dataset.
Chapter
Tracking has traditionally been the art of following interest points through space and time. This changed with the rise of powerful deep networks. Nowadays, tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection. We present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art. Our tracker, CenterTrack, applies a detection model to a pair of images and detections from the prior frame. Given this minimal input, CenterTrack localizes objects and predicts their associations with the previous frame. That’s it. CenterTrack is simple, online (no peeking into the future), and real-time. It achieves 67.8% MOTA on the MOT17 challenge at 22 FPS and 89.4% MOTA on the KITTI tracking benchmark at 15 FPS, setting a new state of the art on both datasets. CenterTrack is easily extended to monocular 3D tracking by regressing additional 3D attributes. Using monocular video input, it achieves 28.3% AMOTA@0.2 on the newly released nuScenes 3D tracking benchmark, substantially outperforming the monocular baseline on this benchmark while running at 28 FPS.
Chapter
Detecting defects on surfaces such as steel can be a challenging task because defects have complex and unique features. These defects occur in many production lines and vary from one production line to another. To detect these defects, the You Only Look Once (YOLO) detector, which uses a Convolutional Neural Network (CNN), is applied with only minor modifications. YOLO is trained and tested on a dataset containing six kinds of defects to achieve accurate detection and classification. The network can also obtain the coordinates of the detected bounding boxes, giving the size and location of the detected defects. Since manual defect detection is expensive, labor-intensive and inefficient, this paper contributes to the sophistication and improvement of manufacturing processes. This system can be installed on chipsets and deployed to a factory line to greatly improve quality control and be part of smart internet of things (IoT) based factories in the future. YOLO achieves a respectable 70.66% mean average precision (mAP) despite the small dataset and minor modifications to the network.