
A Real-Time Steel Surface Defect Detection Approach With High Accuracy

November 2021
IEEE Transactions on Instrumentation and Measurement PP(99):1-1


DOI:10.1109/TIM.2021.3127648

Authors:

Mi Chunfeng

Anhui University of Technology




IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 71, 2022 5005610


Wenyan Wang, Chunfeng Mi, Ziheng Wu, Kun Lu, Hongming Long, Baigen Pan, Dan Li, Jun Zhang, Peng Chen, and Bing Wang, Senior Member, IEEE

Abstract—Surface defect inspection is a key step in ensuring the quality of hot rolled steel surfaces. However, current advanced detection (DET) methods achieve high precision at low detection speed, which hinders their application in actual production. In this work, a real-time detection network (RDN) focusing on both speed and accuracy is proposed for steel surface defect detection. RDN takes ResNet-dcn, a lightweight modular encoding and decoding network whose backbone is pretrained on ImageNet, as the basic convolutional architecture. To improve the detection accuracy, a skip layer connection module (SCM) and a pyramid feature fusion module (PFM) are incorporated into RDN. On the standard dataset NEU-DET, the proposed method achieves a state-of-the-art recognition speed of 64 frames per second (FPS) and a mean average precision of 80.0% on a single GPU, which meets the accuracy and speed requirements of an actual production line.

Index Terms—Detection (DET) speed and accuracy, hot rolled steel surface, pyramid feature fusion module (PFM), skip layer connection module (SCM), surface defect detection.

I. INTRODUCTION

Hot rolled strip is a basic raw material of the iron and steel industry, and its surface quality seriously affects the physical and chemical properties of downstream and final products [1]–[3]. However, traditional detection (DET) methods based on manual observation are time-consuming and unreliable. Therefore, it is desirable to have machines detect surface quality automatically.

Manuscript received July 28, 2021; revised October 9, 2021; accepted October 19, 2021. Date of publication November 15, 2021; date of current version March 3, 2022. This work was supported by the National Natural Science Foundation of China under Grant 62172004, Grant 61672035, and Grant 61872004; and in part by the Educational Commission of Anhui Province under Grant KJ2019ZD05. The Associate Editor coordinating the review process for this article was Dr. Jing Lei. (Corresponding authors: Bing Wang; Peng Chen.)

Wenyan Wang and Hongming Long are with the School of Metallurgical Engineering, Anhui University of Technology, Ma’anshan, Anhui 243002, China, and also with the Key Laboratory of Metallurgical Emission Reduction and Resources Recycling, Ministry of Education, Anhui University of Technology, Ma’anshan 243002, China (e-mail: wenyanwang9203@gmail.com; 13956233905@126.com).

Chunfeng Mi, Ziheng Wu, Kun Lu, Dan Li, and Bing Wang are with the School of Electrical and Information Engineering, Anhui University of Technology, Ma’anshan, Anhui 243032, China (e-mail: michunfeng64@gmail.com; wziheng@ahut.edu.cn; kunlu0819@gmail.com; lanldok@163.com; wangbing@ustc.edu).

Baigen Pan is with the Anhui Provincial Quality Supervision and Inspection Center for Motor Products and Parts, Xuancheng, Anhui 242500, China (e-mail: pbg770302@aliyun.com).

Jun Zhang and Peng Chen are with the Co-Innovation Center for Information Supply and Assurance Technology, Anhui University, Hefei, Anhui 230032, China (e-mail: junzhang@ahu.edu.cn; pchen@ahu.edu.cn).

Digital Object Identifier 10.1109/TIM.2021.3127648

Automatic defect detection technology based on machine vision performs well in object classification tasks [4]–[9]. However, the defect features it uses must be designed manually, and low-dimensional hand-crafted features generally generalize poorly to the changeable defects of steel surfaces, while extracting a large number of features slows down the detection process. These limitations hinder the application of machine-vision-based defect detection methods in actual industrial production [9].

In recent years, deep-learning-based methods have made many breakthroughs in the classification of surface defects of hot rolled steel strip [10]–[12]. Konovalenko et al. [11] proposed a number of convolutional neural networks (CNNs) based on the residual network ResNet-50, in which class weighting, minority class over-sampling, and focal loss strategies are adopted to address the imbalance of the training samples. The experimental results show that this method can effectively detect defects with a high accuracy of 96.91% [11]. Wang et al. [13] developed an improved CNN model with fewer training parameters and raised the accuracy and speed of defect classification to 99.63% and 333 frames per second (FPS), respectively. Azimi et al. [14] first segmented the microstructure of low carbon steel with a fully convolutional neural network and then identified the category of the components from these structures. This kind of research provides a robust and objective method for steel quality assessment [14].

However, in the classification task, the model is only required to assign each defect image a single defect category. When an image contains two or more types of defects, as shown in Fig. 1, a classification model cannot distinguish these defects individually. In contrast, object detection technology can enumerate all types of defects and accurately indicate their locations. Wang et al. [15] proposed an improved ResNet50 and an enhanced Faster RCNN for steel surface defect detection and achieved a classification accuracy of 98.2% and a speed of 63 ms, which significantly improves the performance of the model compared with other methods. For the surface defect detection of hot-rolled strip on the NEU-DET dataset, which is characterized by large intraclass differences and high interclass similarity, He et al. [16] proposed an end-to-end defect detection network that integrates multilevel features into one feature map and achieved a mean average precision (mAP) of 82.3%. However, when they increased the detection speed to 20 FPS, the mAP of the detection model dropped to 70% [16]. Also on the NEU-DET dataset, Hatab et al. [17] used the YOLO network for defect detection and achieved an mAP of 70.66% and a detection speed of 85 ms per image. Kou et al. [18] developed an end-to-end defect detection model based on YOLO-V3 that yielded 72.2% mAP on the NEU-DET dataset. To address the difficulty that deep learning methods have in recognizing small and complex targets, Zhao et al. [19] proposed an improved Faster-RCNN method and achieved an mAP of 75%. To describe the defect information in more detail, some segmentation models have also been proposed in recent years [20], [21]. However, it is impractical to segment the defects of a steel plate in industrial production, and the accuracy and speed of the existing detection methods are out of balance.

Fig. 1. Surface defect image of hot rolled strip containing two kinds of defects.

Authorized licensed use limited to: Anhui University of Technology. Downloaded on April 09,2022 at 09:04:44 UTC from IEEE Xplore. Restrictions apply.

To achieve faster detection speed and improve the generalization performance of the model, a real-time defect detection network (RDN) is proposed in this work, which has three innovations in its structure. First, in contrast to classic object detection networks such as Faster RCNN, which usually use a region proposal network (RPN) to generate bounding boxes, RDN is an end-to-end detection network. Furthermore, unlike one-stage detection methods that enumerate as many proposals as possible, such as YOLO, SSD, and M2Det, RDN uses the center point of the target to generate a bounding box, which makes it an anchor-free method with faster detection speed.

Second, the highest-level feature map, which contains rich semantic information, is usually used to detect defects in deep-learning-based detection networks [22], [23]. However, the NEU-DET dataset contains many small defects that need to be detected, and the high similarity between defect categories together with the large differences within classes makes it difficult for a model to determine defect categories and locations from a single feature map. It has been proven that different levels of features in a CNN represent different information about an object [24], i.e., low-level features contain precise location information, while high-level features have rich contextual semantics. Therefore, in this work, the features extracted from the last convolution layer of each stage of the backbone are integrated to make full use of the shallow layer information, which is called the skip-layer connection module (SCM). In addition, to improve the detection performance on defect images with large intraclass differences and high interclass similarity, a pyramid feature fusion module (PFM) that combines multiple high-level feature layers is also proposed.

Third, inspection speed is an important issue for the application of the model in actual production. To accelerate detection, a lightweight and modular ResNet-18 network is used as the feature extraction backbone in this work. In particular, ResNet-18 is more suitable for small sample datasets because it has few parameters to be trained. The experimental results show that the lightweight backbone and the feature fusion strategy can significantly improve the accuracy and speed of the detection model.

In summary, the main contributions of this work are as follows.

1) For steel defect classification and localization, an end-to-end defect detection network that balances accuracy and speed is introduced.

2) An SCM between shallow layers is proposed, which can provide more accurate position information for low-contrast images.

3) The PFM is proposed to improve the network’s ability to detect defects with large intraclass differences and high interclass similarity.

4) Using a simple ResNet-18 backbone, our network runs at 64 FPS with an mAP of 80.0%, which meets the speed requirements of actual production.

II. REAL-TIME DEFECT DETECTION NETWORK

The overall architecture of the balanced defect detection network proposed in this work is shown in Fig. 2. RDN uses a baseline convolution architecture and multilevel feature fusion to extract features from the input image, and then uses keypoint estimation and regression to obtain the center point and size attributes of the object. Considering the characteristics of the NEU-DET dataset mentioned above, this work constructs two multilevel feature fusion modules with few parameters. One is the SCM, which combines features with the same resolution; the other is the PFM. More details of the backbone selection and the feature fusion modules are described below.

A. Baseline Convolution Architecture

Generally, a network backbone with a small number of parameters and a model pretrained on ImageNet can avoid overfitting and accelerate detection, especially for small datasets [26]. Compared with some backbones widely used in object classification and detection, such as VGG, DLA, and ResNeXt-101, ResNet with 18 convolution layers has fewer model parameters and a modular convolution structure, which makes it easier to integrate and deploy on industrial devices [22], [23], [25], [27]. In addition, its superior performance in the surface defect classification of hot rolled strip is demonstrated in this work. Therefore, ResNet-18 is selected as the backbone of the industrial defect detection model.

TABLE I

DETAILED STRUCTURES OF RESNET18-DCN

In previous object detection networks, the highest-level feature map of the backbone, which has the smallest size, is used to detect and classify targets. Because this feature map has a large receptive field, which is not conducive to the detection of small targets [24], [28], this work instead proposes ResNet18-dcn, an encoding and decoding network with a large output resolution, as the baseline convolution architecture. In the decoding network of ResNet18-dcn, the channel number and resolution of the feature map are adjusted by deformable convolution and deconvolution. The detailed structures of ResNet18-dcn are given in Table I.

B. Skip-Layer Connection Module

In deep learning methods, the category and location information of defects is mainly obtained by applying multiple convolutions to the input image. However, the receptive field of a deep feature map expands as the number of layers increases, which weakens the perception of local location information and therefore affects accurate positioning. In particular, the loss of location information aggravates the difficulty of small object detection. The size of the receptive field is defined as follows:

$$l_k = l_{k-1} + (f_k - 1) \prod_{i=1}^{k-1} s_i \tag{1}$$

where $l_{k-1}$ is the receptive field size of the $(k-1)$th layer, $f_k$ is the size of the filter or pooling kernel, and $s_i$ is the stride of layer $i$.
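As a minimal sketch of how (1) can be evaluated layer by layer, the following Python function reads (1) as $l_k = l_{k-1} + (f_k - 1)\prod_{i=1}^{k-1} s_i$ with $l_0 = 1$ for a single input pixel. The function name and the example layer stack are illustrative assumptions and are not taken from Table I.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> List[int]:
    """Return the receptive field size after each layer.

    `layers` holds (kernel_size, stride) pairs ordered from input to output.
    The recursion starts from l_0 = 1 (one input pixel), which is an assumption.
    """
    sizes = []
    rf = 1    # l_0
    jump = 1  # product of the strides of all earlier layers
    for kernel, stride in layers:
        rf = rf + (kernel - 1) * jump  # l_k = l_{k-1} + (f_k - 1) * prod(s_i)
        sizes.append(rf)
        jump *= stride                 # s_k only affects the layers after k
    return sizes


# Hypothetical stack: 7x7 conv (stride 2), 3x3 pooling (stride 2), two 3x3 convs.
print(receptive_field([(7, 2), (3, 2), (3, 1), (3, 1)]))  # [7, 11, 19, 27]
```

The growing values illustrate why deeper feature maps see larger image regions, whereas shallow layers keep a small receptive field.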

The schematic of the receptive field is shown in Fig. 3. It can be seen that a shallow layer has a small receptive field, which is beneficial for detecting small-scale defects. Therefore, this work merges the shallow features into the deep feature maps to make full use of the location information of defects. In addition, a decoding network is added to alleviate the location loss caused by very small feature maps.

It has been proven that skip layer connection (SLC) can simplify the network learning process, enhance gradient propagation, and make the feature expression of input variables more comprehensive [23], [27], [29], [30]. Additionally, relevant research shows that using multiple long SLCs can provide deep supervision of the entire network structure [31]. To improve the detection performance of the model for small objects, an SCM that contains multiple long SLCs is designed into the network without increasing the computational cost of the model. In RDN, the SCM is introduced between two feature maps of the same size, as shown by the blue connecting lines in Fig. 2.

C. Pyramid Feature Fusion Module

In detection networks, the feature pyramid is an important component for detecting objects at different scales [24], [28]. However, it is slow and has a large number of parameters. To accurately distinguish defect types with large intraclass differences and high interclass similarity, a simple PFM that integrates multiple high-level features into one feature map is designed so that the highest-level feature map carries sufficient sample feature information.

To fuse features of different sizes in the inverted pyramid, as shown in Fig. 2, the PFM is attached to the decoding structure of the network and is connected by yellow lines. In the PFM, four feature maps at different levels of the decoding network are used as input. The output is the element-wise sum of these feature maps after size transformation. To keep the output resolution of the network, the four input branches of the PFM are expanded to the same size by deconvolution operations with different convolution kernel sizes and strides.

III. TRAINING AND INFERENCE

A. Training

Keypoint-based defect detectors model an object as a single point (the center point of its bounding box) and then use this keypoint to obtain the target category, size (width and height), and center point offset. In the RDN structure shown in Fig. 2, the cls layer is used to output the class probability of the target and a keypoint heatmap. Let $I \in \mathbb{R}^{W \times H \times 3}$ be an input image of width $W$ and height $H$. The cls layer produces a keypoint heatmap $\hat{Y} \in [0,1]^{(W/R) \times (H/R) \times C}$, where $R$ is the output stride of the network and $C$ is the number of defect categories. In this work, $R$ is 4 and $C$ is 6. A prediction $\hat{Y}_{xyc} = 1$ corresponds to a detected object, while $\hat{Y}_{xyc} = 0$ is background. The training objective is a penalty-reduced pixel-wise logistic regression with focal loss

$$L = -\frac{1}{N} \sum_{xyc} \begin{cases} (1 - \hat{Y}_{xyc})^{\alpha} \log(\hat{Y}_{xyc}), & \text{if } Y_{xyc} = 1 \\ (1 - Y_{xyc})^{\beta}\, \hat{Y}_{xyc} \log(1 - \hat{Y}_{xyc}), & \text{otherwise} \end{cases} \tag{2}$$


Fig. 2. Overall architecture of RDN.

where $\hat{Y}_{xyc}$ represents the probability of category $c$ at the position $(x, y)$, and $Y_{xyc}$ is the value corresponding to the ground-truth center point, which can be obtained by the Gaussian kernel $\exp\left(-\frac{(x - \tilde{p}_x)^2 + (y - \tilde{p}_y)^2}{2\sigma_p^2}\right)$. Here, $\tilde{p} = \lfloor p/R \rfloor$ is the center point position of ground-truth object $k$ on the low-resolution output feature map, $p$ is the ground-truth center point location, and $\sigma_p$ is an object-adaptive standard deviation. $\alpha$ and $\beta$ are used to increase the loss of samples that are difficult to classify and to reduce the penalty at locations around the ground truth, respectively [32], [33]. $N$ is the number of objects in an image. Previous work on focal loss has shown that, as $\alpha$ increases from 0 to 2, substantially more weight is concentrated on the hard negative examples, and that the model achieves the best detection performance when $\alpha = 2$ [32]. On this basis, Law and Deng [33] proposed a variant of focal loss and set $\beta$ to 4, which allows the network to mainly optimize the hard samples. Because this work studies the same kind of detection algorithm, the values of $\alpha$ and $\beta$ are also set to 2 and 4, respectively.
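The following pure-Python sketch illustrates, for a single class, how a ground-truth heatmap could be built with the Gaussian kernel and how the loss in (2) could then be evaluated. It follows (2) as printed. Several details are assumptions made here for illustration: the fixed value of `sigma` (the paper uses an object-adaptive $\sigma_p$), the use of the element-wise maximum where two objects overlap, the clamping constant `eps`, and all function names.

```python
import math
from typing import List, Tuple

Heatmap = List[List[float]]  # indexed as heatmap[y][x], one class only


def gaussian_heatmap(width: int, height: int,
                     centers: List[Tuple[int, int]], sigma: float) -> Heatmap:
    """Ground-truth heatmap Y on the low-resolution (W/R) x (H/R) grid."""
    heat = [[0.0] * width for _ in range(height)]
    for cx, cy in centers:
        for y in range(height):
            for x in range(width):
                value = math.exp(-((x - cx) ** 2 + (y - cy) ** 2)
                                 / (2.0 * sigma ** 2))
                # Assumption: overlapping objects keep the larger response.
                heat[y][x] = max(heat[y][x], value)
    return heat


def focal_loss(pred: Heatmap, target: Heatmap, num_objects: int,
               alpha: float = 2.0, beta: float = 4.0,
               eps: float = 1e-12) -> float:
    """Penalty-reduced pixel-wise focal loss, following (2) as printed."""
    total = 0.0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # keep both logarithms finite
            if t == 1.0:
                total += (1.0 - p) ** alpha * math.log(p)
            else:
                total += (1.0 - t) ** beta * p * math.log(1.0 - p)
    return -total / max(num_objects, 1)


# A 384 x 384 input with R = 4 gives a 96 x 96 grid; p = (200, 100) maps to (50, 25).
R = 4
target = gaussian_heatmap(384 // R, 384 // R, [(200 // R, 100 // R)], sigma=2.0)
uniform_guess = [[0.1] * (384 // R) for _ in range(384 // R)]
print(target[25][50], focal_loss(uniform_guess, target, num_objects=1))
```

Locations close to a center have $Y_{xyc}$ near 1, so the factor $(1 - Y_{xyc})^{\beta}$ shrinks their contribution to the negative branch.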

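Suppose the ground-truth coordinates of target $k$ are $(x_1, y_1, x_2, y_2)$; then its size $s_k$, i.e., width and height, is $(x_2 - x_1, y_2 - y_1)$. The loc_wh layer is used to predict the size of $k$, and it can be trained with a smooth $L_1$ loss function defined as

$$L_{\text{size}} = \frac{1}{N} \sum_{k=1}^{N} \text{smooth}_{L_1}\left(\hat{S}_k - s_k\right) \tag{3}$$

where

$$\text{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

and $\hat{S}_k$ is the predicted size.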
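A minimal sketch of the smooth $L_1$ function and the size loss in (3) is given below. Summing the width and height terms for each object is an assumption about how the two-component difference is reduced to a scalar, and the function names are illustrative.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def smooth_l1(x: float) -> float:
    """Quadratic near zero, linear for |x| >= 1."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def size_loss(pred_sizes: List[Tuple[float, float]], true_boxes: List[Box]) -> float:
    """Average smooth L1 error between predicted and true (width, height)."""
    total = 0.0
    for (pred_w, pred_h), (x1, y1, x2, y2) in zip(pred_sizes, true_boxes):
        # Assumption: the width and height errors of one object are summed.
        total += smooth_l1(pred_w - (x2 - x1)) + smooth_l1(pred_h - (y2 - y1))
    return total / max(len(true_boxes), 1)


# One object of true size 10 x 20, predicted as 10.5 x 22.
print(size_loss([(10.5, 22.0)], [(0.0, 0.0, 10.0, 20.0)]))  # 0.125 + 1.5 = 1.625
```

The small width error falls in the quadratic region, while the larger height error is penalized linearly, which limits the influence of outliers.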

To compensate for the pixel position error introduced by the downsampling operations in the network, the offset of the center point position is also predicted in this work, and the loss function is defined as

$$L_{\text{off}} = \frac{1}{N} \sum_{xyc} \text{smooth}_{L_1}\left(\hat{O} - \left(\frac{p}{R} - \tilde{p}\right)\right) \tag{4}$$

where $\hat{O}$ denotes the predicted center offset of the target, $p$ is the ground-truth center point location, and $\tilde{p}$ is the position of $p$ on the low-resolution output feature map.
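To make the regression target in (4) concrete, the short sketch below computes the low-resolution position and the fractional offset for one center point. It assumes that the low-resolution position is obtained by rounding $p/R$ down to an integer grid cell; the function name is illustrative.

```python
import math
from typing import Tuple


def offset_target(px: float, py: float,
                  stride: int = 4) -> Tuple[Tuple[int, int], Tuple[float, float]]:
    """Return the grid cell of p and the offset p/R - floor(p/R)."""
    low_x, low_y = px / stride, py / stride
    cell_x, cell_y = math.floor(low_x), math.floor(low_y)  # assumed rounding rule
    return (cell_x, cell_y), (low_x - cell_x, low_y - cell_y)


# A center at pixel (201, 102) with R = 4 lies in cell (50, 25).
print(offset_target(201.0, 102.0))  # ((50, 25), (0.25, 0.5))
```

Each offset component lies in [0, 1), so the offset branch only has to recover the sub-cell position lost by the output stride.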

According to the above definitions, the overall training objective of this work is to minimize the multitask loss function, which is defined as

$$L_{xyc} = L + \lambda_{\text{size}} L_{\text{size}} + L_{\text{off}} \tag{5}$$

where $\lambda_{\text{size}} = 0.1$, a value determined by the experimental results in Section IV-D of this work.

B. Inference

In the inference phase, to suppress overlapping boxes, a $3 \times 3$ max-pooling operation is first applied to the heatmap of each category, and then the top 50 peaks of the remaining response values are kept. Let $\hat{P}_c$ be the set of $n$ detected center points $\hat{P} = \{(\hat{x}_i, \hat{y}_i)\}_{i=1}^{n}$ of class $c$. The detection coordinates of the target are

$$\left(\hat{x}_i + \delta\hat{x}_i - \frac{\hat{w}_i}{2},\; \hat{y}_i + \delta\hat{y}_i - \frac{\hat{h}_i}{2},\; \hat{x}_i + \delta\hat{x}_i + \frac{\hat{w}_i}{2},\; \hat{y}_i + \delta\hat{y}_i + \frac{\hat{h}_i}{2}\right) \tag{6}$$

where $(\delta\hat{x}_i, \delta\hat{y}_i)$ is the center point offset and $(\hat{w}_i, \hat{h}_i)$ is the predicted size. For a given test image, the final detection result consists of the rectangular boxes produced at multiple scales and filtered by the nonmaximum suppression method [34].
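The decoding step described above can be sketched in pure Python as follows. A cell is kept as a peak when it equals the maximum of its 3 × 3 neighbourhood, which has the same effect as comparing the heatmap with its 3 × 3 max-pooled version. The data layout (`heat[y][x]`, per-cell `offsets` and `sizes`), the treatment of borders, and the function names are assumptions made for this illustration; multi-scale testing and the final nonmaximum suppression are omitted.

```python
from typing import List, Tuple

Heatmap = List[List[float]]
Field = List[List[Tuple[float, float]]]  # one (a, b) pair per grid cell


def find_peaks(heat: Heatmap, top_k: int = 50) -> List[Tuple[float, int, int]]:
    """Return up to top_k (score, x, y) local maxima of one class heatmap."""
    h, w = len(heat), len(heat[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            neighbourhood = [heat[j][i]
                             for j in range(max(0, y - 1), min(h, y + 2))
                             for i in range(max(0, x - 1), min(w, x + 2))]
            if heat[y][x] > 0.0 and heat[y][x] == max(neighbourhood):
                peaks.append((heat[y][x], x, y))
    peaks.sort(reverse=True)  # highest score first
    return peaks[:top_k]


def decode_boxes(heat: Heatmap, offsets: Field, sizes: Field,
                 top_k: int = 50) -> List[Tuple[float, float, float, float, float]]:
    """Turn peaks into (x1, y1, x2, y2, score) boxes on the output grid."""
    boxes = []
    for score, x, y in find_peaks(heat, top_k):
        dx, dy = offsets[y][x]
        bw, bh = sizes[y][x]
        cx, cy = x + dx, y + dy
        boxes.append((cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2, score))
    return boxes


# Toy 5 x 5 heatmap: the 0.5 response next to the 0.9 peak is suppressed.
heat = [[0.0] * 5 for _ in range(5)]
heat[2][2], heat[2][3], heat[0][0] = 0.9, 0.5, 0.3
offsets = [[(0.5, 0.25)] * 5 for _ in range(5)]
sizes = [[(2.0, 4.0)] * 5 for _ in range(5)]
print(decode_boxes(heat, offsets, sizes))
```

The boxes are expressed in output-grid units; under the stated output stride they would be multiplied by R to return to input-image pixels.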

IV. EXPERIMENTS

A. Implementation Details

For all experiments based on RDN, the input image size is $384 \times 384$, and the initial learning rate is $1.25 \times 10^{-4}$, which is decreased to $1.25 \times 10^{-5}$ and $1.25 \times 10^{-6}$ at 60 and 120 epochs, respectively; training stops at 160 epochs. RDN is developed with PyTorch v1.3.1, and the experiments are conducted on an NVIDIA Titan X GPU. The batch size is set to 24, and Adam is used to optimize the overall objective. To avoid overfitting, several data augmentation methods, such as shift, rotation, and reflection, are applied in the training phase. It is worth noting that the values of these hyperparameters were obtained by referring to the related object detection literature and were verified and/or adjusted through a large number of experiments [11], [24], [38]–[44].

TABLE II

DETECTION RESULTS OF DIFFERENT BASELINE NETWORKS

Fig. 3. Schematic of receptive field size.
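The step-wise learning rate schedule described above can be written as a small function. This is a sketch under the assumption that epochs are counted from 0 and that each reduction takes effect from epoch 60 and epoch 120 onward; the function name is illustrative.

```python
def learning_rate(epoch: int, base_lr: float = 1.25e-4) -> float:
    """Learning rate for a 160-epoch run with 10x drops at epochs 60 and 120."""
    if epoch >= 120:
        return base_lr * 0.01
    if epoch >= 60:
        return base_lr * 0.1
    return base_lr


for epoch in (0, 59, 60, 119, 120, 159):
    print(epoch, learning_rate(epoch))
```

In PyTorch, a schedule of this shape is commonly expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 120], gamma=0.1)`; whether the original implementation uses that scheduler is not stated in the text.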

The data used in this work come from a surface defect dataset named NEU-DET, constructed by Northeastern University, China [14]. The database collects gray images of six typical surface defects of hot-rolled strip, namely rolled-in scale (Rs), scratches (Sc), pitted surface (Ps), inclusion (In), patches (Pa), and crazing (Cr), and provides annotation files for these defects. The dataset contains 300 images of each type and therefore a total of 1800 images for the six types of surface defects. Some of them are shown in Fig. 4. For the defect detection task in this work, 70% of the images (210 images of each type) are randomly selected to train the model, and the remaining images are used for model testing.
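A per-class random split of this kind can be sketched as follows. The file names, the seed, and the function name are hypothetical and only serve the illustration; the actual split used in the experiments is not published in the text.

```python
import random
from typing import Dict, List, Tuple

CLASSES = ["Rs", "Sc", "Ps", "In", "Pa", "Cr"]


def split_per_class(images: Dict[str, List[str]], train_fraction: float = 0.7,
                    seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split every class separately into training and test files."""
    rng = random.Random(seed)
    train, test = [], []
    for name in sorted(images):
        files = list(images[name])
        rng.shuffle(files)
        cut = round(len(files) * train_fraction)  # 210 of 300 for 70%
        train.extend(files[:cut])
        test.extend(files[cut:])
    return train, test


# Hypothetical file names; the real dataset uses its own naming scheme.
dataset = {c: [f"{c}_{i}.jpg" for i in range(1, 301)] for c in CLASSES}
train_set, test_set = split_per_class(dataset)
print(len(train_set), len(test_set))  # 1260 540
```

With 300 images per class, this yields 1260 training images and 540 test images in total.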

B. Defect Classification on NEU

Defect classification is one of the tasks of defect detection, and therefore good defect classification performance should be positively correlated with stronger defect detection performance. We therefore report the classification performance of the feature extraction networks considered in our work, namely VGG16, ResNet18, ResNet34, ResNet50, DLA34, ResNeXt50, and ResNeXt101 [22], [27], [35]. Every network classifies all kinds of defects correctly, with an accuracy of 100%, except VGG16, which achieves a classification accuracy of 99.63%. Moreover, the numbers of training parameters are 250M, 11M, 21M, 23M, 15M, 42M, and 22M, respectively. Therefore, ResNet18, which has the best classification performance and the fewest parameters, is the best basic network backbone.

Fig. 4. Six typical defect sample images in NEU-DET database. (a) Ps. (b) In. (c) Rs. (d) Cr. (e) Sc. (f) Pa.

C. Detection Performance of RDN

1) Detection Performance on NEU-DET: To achieve better detection speed in detecting specific steel surface defects, an end-to-end detection network is proposed in this work. The detection results of defect samples using different improved network structures are shown in Table II. All three baseline networks, i.e., ResNet18-dcn, ResNet18-ds, and ResNet18-dsf, can detect the defects of the steel surface. Their mAP is 71.5%, 77.6%, and 80.0%, and their speed is 70, 70, and 64 FPS, respectively. With the ResNet18-dsf baseline, RDN achieves the best performance, with an mAP of 80.0% at a speed of 64 FPS. The average precision (AP) is 87.0% for Ps, 84.9% for In, 64.4% for Rs, 53.7% for Cr, 95.9% for Sc, and 92.4% for Pa.

Compared with ResNet18-dcn, ResNet18-ds with SLC improves the mAP by 6.1 percentage points. The AP of the individual defects increases by 2.5%, 5.5%, 2.9%, 21.5%, 4.7%, and 0.4%, respectively, while the detection speed remains at 70 FPS, which shows that the SCM integrated with shallow features can provide more accurate object information without adding much computation. In more detail, as shown in Fig. 5(a), the SCM is effective for small target detection. When the PFM is introduced into ResNet18-ds, the AP of each defect is improved to some extent, and the overall mAP is increased to 80.0%. This demonstrates that the fusion of pyramid features is beneficial to the identification and location of steel surface defects with large intraclass differences and high interclass similarity. Specifically, a large number of defect images with high similarity between target and background, such as crazing and pitted surfaces, are successfully located, as shown in Fig. 5(b). Without bells and whistles, RDN using ResNet18-dsf as the baseline architecture achieves the best tradeoff between speed and precision.

TABLE III

ABLATION STUDY OF RDN

Fig. 5. Influence of different improved modules on test results. (a) Influence of SCM. (b) Influence of PFM.

2) Real-Time Analysis: RDN spent only 8.71 s to detect 540 images, which means that the average time to detect an image is 0.016 s and the detection speed is 64 FPS. In an actual production line, the field of view of a single camera is 50–100 cm and the maximum production speed is typically 30 m/s, which requires the detection equipment to reach a detection speed of at least 30–60 FPS [36]. Therefore, the 64 FPS detection speed of the proposed RDN can meet the real-time requirements of actual production.
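The 30–60 FPS requirement follows from dividing the line speed by the length of strip covered by one frame. The sketch below assumes that consecutive frames must cover the moving strip without gaps and without overlap; the function name is illustrative.

```python
def required_fps(line_speed_m_per_s: float, field_of_view_m: float) -> float:
    """Minimum frame rate so that consecutive frames cover the strip without gaps."""
    return line_speed_m_per_s / field_of_view_m


# Field of view of 100 cm and 50 cm at the maximum line speed of 30 m/s.
for fov in (1.0, 0.5):
    print(f"{fov:.1f} m field of view: {required_fps(30.0, fov):.0f} FPS")
```

A camera that sees 100 cm of strip needs 30 FPS, and one that sees 50 cm needs 60 FPS, which matches the stated range.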

D. Additional Experiments

Since RDN is composed of multiple subcomponents, it is necessary to measure their contribution to the final performance. In addition, to obtain the optimal detection performance with the same detection architecture, some experiments are carried out to select appropriate hyperparameters, such as the regression loss function and the loss weight.

1) Ablation Study: In a defect detection algorithm, the backbone affects not only the model performance but also the inference speed. In this work, to obtain the best detection speed, three experiments with ResNet18-dcn, ResNet34-dcn, and DLA34 as the backbone were performed. It can be seen from the second, third, and fourth columns of Table III that, as the complexity of the backbone decreases, the model performance gradually deteriorates, while the detection speed increases. In particular, when ResNet18-dcn is used as the backbone, the model detection speed reaches 70 FPS. To improve the model performance, the SCM is introduced, and the detection accuracy is improved from 71.5% to 77.6%, as illustrated in the fifth column. Furthermore, when the PFM is added to our network, the final detection accuracy is improved to 80% mAP, as shown in the sixth column.

2) Regression Loss: In object detection methods, the location and size information related to the bounding box of the object is usually obtained by regression. L1, L2, balanced L1, and smooth L1 are the main regression loss functions [38]. In this work, we compare the detection performance of these four regression losses. Their corresponding mAP values are 78.7%, 78.1%, 70.8%, and 80.0%, respectively. It can be seen that smooth L1 is superior to the other loss functions in mAP.

3) Bounding Box Size Weight: To analyze the sensitivity to the bounding box size weight $\lambda_{\text{size}}$, we compare the mAP obtained when $\lambda_{\text{size}}$ is increased in steps of 0.05 over the range (0, 0.5] and in steps of 0.1 over the range (0.5, 1]. Over (0, 0.5], the mAP is 78.6%, 80.0%, 78.9%, 78.7%, 78.6%, 78.9%, 77.4%, 77.5%, 77.7%, and 76.7%, respectively. It can be seen that the mAP is best when $\lambda_{\text{size}}$ is set to 0.1. Over (0.5, 1], the mAP is 76.8%, 76%, 74.2%, 73.8%, and 74.1%, respectively, which shows a downward trend as $\lambda_{\text{size}}$ increases. Therefore, $\lambda_{\text{size}} = 0.1$ is the best bounding box size weight.
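The sweep can be tabulated in a few lines of Python. The mAP values are the ones listed above; pairing them in order with the grid 0.05, 0.10, ..., 0.50 followed by 0.6, ..., 1.0 is an assumption based on the stated step sizes.

```python
# Weight grid: steps of 0.05 over (0, 0.5], then steps of 0.1 over (0.5, 1].
lambdas = [round(0.05 * i, 2) for i in range(1, 11)] + [0.6, 0.7, 0.8, 0.9, 1.0]

# Reported mAP (%) in the order given in the text.
map_values = [78.6, 80.0, 78.9, 78.7, 78.6, 78.9, 77.4, 77.5, 77.7, 76.7,
              76.8, 76.0, 74.2, 73.8, 74.1]

results = dict(zip(lambdas, map_values))
best_lambda = max(results, key=results.get)
print(best_lambda, results[best_lambda])  # 0.1 80.0
```

The maximum sits at the second grid point, and every weight above 0.5 stays below 77%, consistent with the downward trend noted in the text.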


TABLE IV

COMPARISON WITH STATE-OF-THE-ART METHODS

Fig. 6. Three kinds of SCM styles. (a) Same-level composition, named SCM_S. (b) Higher-level composition, SCM_H. (c) Lower-level composition, SCM_L. Here, dec and pool refer to deconvolution and max pooling, respectively.

E. Comparison With State-of-the-Art

To evaluate the effectiveness of the proposed RDN, several state-of-the-art one-stage and two-stage detection methods are compared in terms of mAP and detection time per image on the same NEU-DET dataset. Two-stage approaches, i.e., Faster RCNN [24] and Cascade RCNN [38], which use a region proposal network to generate candidate bounding boxes, are considered for comparison. Another two-stage steel surface defect detection structure, named DDN and presented by Song and Yan [12], is also compared. For one-stage methods, seven anchor-based and anchor-free detectors, i.e., M2Det [39], SSD [40], FCOS [41], ATSS [42], YOLOv3 [43], improved YOLOv3 [17], and CenterNet [44], are implemented. For some of these methods, such as Faster RCNN, DDN, M2Det, and CenterNet, multiple backbones are evaluated to obtain more competitive detection precision or speed. The detection results of the different models are shown in Table IV.

It can be seen from Table IV that two-stage detectors can achieve better detection accuracy: the mAP of Faster RCNN, Cascade RCNN, and DDN is 77.9%, 73.3%, and 82.3%, respectively. However, the inference process of these models for each image is very slow. For example, the detection speed of the DDN method is 11 FPS when ResNet-50 is selected as the backbone. In contrast, one-stage methods achieve better detection speed, but their detection precision is relatively unstable. Therefore, in this work, a portable detector based on the pretrained model is constructed. Compared with DDN, which has the highest detection mAP of 82.3% at a speed of 11 FPS, this work introduces the SCM and PFM to improve the detection precision and uses ResNet with 18 convolution layers as the backbone to reduce the number of model parameters. As a result, the proposed method achieves an mAP of 80.0% and a speed of 64 FPS, which demonstrates that the RDN method proposed in this work achieves the best detection speed in detecting steel surface defects.
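The speed and accuracy trade-off between DDN and RDN can be made concrete by converting the frame rates quoted above into per-image latency. The following is a minimal sketch that uses only the mAP and FPS figures stated in the text; the helper name `latency_ms` and the dictionary layout are illustrative assumptions, not part of the original work.

```python
def latency_ms(fps: float) -> float:
    """Average processing time per image, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


# Figures quoted in the text for the NEU-DET dataset.
ddn = {"map": 82.3, "fps": 11.0}   # DDN with a ResNet-50 backbone
rdn = {"map": 80.0, "fps": 64.0}   # proposed RDN

speedup = rdn["fps"] / ddn["fps"]   # how many times faster RDN runs
map_gap = ddn["map"] - rdn["map"]   # accuracy given up, in mAP points

print(f"DDN: {latency_ms(ddn['fps']):.1f} ms per image")
print(f"RDN: {latency_ms(rdn['fps']):.1f} ms per image")
print(f"Speedup: {speedup:.1f}x for a gap of {map_gap:.1f} mAP points")
```

In other words, RDN needs roughly 16 ms per image where DDN needs roughly 91 ms, a speedup of about 5.8 times, while giving up 2.3 mAP points.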

V. DISCUSSION

A. Comparisons of Different SCM Styles

In Section IV-C, it is briefly demonstrated that SCM [as shown in Fig. 6(a)] can improve detection performance by fusing features with the same resolution. However, it remains unclear whether other ways of combining these layers would result in better performance. To answer this question, this work designs two other composite styles that integrate feature maps of different sizes, as shown in Fig. 6(b) and (c).


TABLE V

DETECTION PERFORMANCE OF DIFFERENT COMPOSITION STYLES

TABLE VI

DETECTION PERFORMANCE OF COMBINING DIFFERENT PFM LAYERS

It can be seen from Table V that the lower-level composition style, which combines the large feature maps in the encoding network with the small feature map in the decoding network, obtains an mAP of 77.2% and a speed of 61 FPS. The higher-level composition style, which combines the small feature maps in the encoding network with the large feature map in the decoding network, obtains the optimal mAP of 80.2%. However, the deconvolution operation added in this style reduces the detection speed of the detector to 59 FPS, which is slower than the same-level composition style. By comparison, SCM with the same-level combination, named SCM_S, has an mAP of 80.0% and a speed of 64 FPS, which can meet the minimum speed requirements of different industrial production scenarios.
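The choice among the three styles can be read as a small two-objective selection problem. The sketch below uses only the mAP and FPS values quoted in the paragraph above; the `Style` tuple, the helper names, and the 0.5-point mAP tolerance are illustrative assumptions rather than criteria stated by the authors.

```python
from typing import NamedTuple


class Style(NamedTuple):
    name: str
    map_percent: float
    fps: float


# Values quoted in the text for Table V.
styles = [
    Style("SCM_L", 77.2, 61.0),
    Style("SCM_H", 80.2, 59.0),
    Style("SCM_S", 80.0, 64.0),
]


def dominates(a: Style, b: Style) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one."""
    no_worse = a.map_percent >= b.map_percent and a.fps >= b.fps
    better = a.map_percent > b.map_percent or a.fps > b.fps
    return no_worse and better


def pareto_front(candidates):
    """Keep only the styles that no other style dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


def fastest_within(candidates, map_tolerance):
    """Fastest style whose mAP lies within map_tolerance points of the best mAP."""
    best_map = max(c.map_percent for c in candidates)
    eligible = [c for c in candidates if best_map - c.map_percent <= map_tolerance]
    return max(eligible, key=lambda c: c.fps)


front = pareto_front(styles)
# The 0.5-point tolerance is an assumed threshold, chosen only for illustration.
choice = fastest_within(styles, map_tolerance=0.5)
print([s.name for s in front], choice.name)
```

Under these assumptions, SCM_L is dominated by SCM_S on both metrics, and SCM_S is the fastest style within half an mAP point of the best accuracy, which is consistent with the preference expressed in the text.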

B. Comparison of Different Combination PFM Layers

To explore which levels of feature integration can achieve the best precision improvement without reducing the resolution of the output feature map, this work compares the detection performance of fusing features from different levels, such as the conv5 module in the encoding network and the up-conv1, up-conv2, and up-conv3 modules in the decoding network. As in Section V-A, ResNet18-dcn is taken as the baseline. The detection results of the different integration manners are shown in Table VI.

It can be found that after introducing SCM, the mAP of the model is 77.6%. As more feature maps are introduced, the detection accuracy improves to different degrees, while the detection speed is not significantly reduced. The mAP of merging two- or three-layer features is 77.8%, 78.6%, and 78.5%, respectively, and their speeds differ by only 1 FPS. These results show that features at different levels provide unique defect information and that fusing more of them is an effective way to improve the detection accuracy of steel surface defects.
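The general idea of fusing features from several levels without shrinking the output map can be sketched as follows: each coarser map is upsampled to the output resolution and the results are stacked as channels. This is only a generic illustration in plain Python lists, kept free of dependencies. The actual PFM layout is defined earlier in the paper and may differ, and the map sizes (1, 2, 4, 8), the nearest-neighbor upsampling, and the function names are all assumptions.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbor upsampling of a 2-D feature map given as a list of rows."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(factor)]  # repeat columns
        out.extend(list(wide) for _ in range(factor))           # repeat rows
    return out


def fuse(levels, out_size):
    """Resize every square map to out_size x out_size and stack them as channels."""
    channels = []
    for fmap in levels:
        size = len(fmap)
        if out_size % size != 0:
            raise ValueError("out_size must be a multiple of each level size")
        channels.append(upsample_nearest(fmap, out_size // size))
    return channels


# Toy single-channel maps standing in for conv5 and up-conv1..3 (coarse to fine).
conv5 = [[1.0]]
up_conv1 = [[1.0, 2.0], [3.0, 4.0]]
up_conv2 = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
up_conv3 = [[0.5] * 8 for _ in range(8)]

fused = fuse([conv5, up_conv1, up_conv2, up_conv3], out_size=8)
print(len(fused), len(fused[0]), len(fused[0][0]))
```

In a real detector the stacking would be done on multi-channel tensors and followed by a learned convolution, but the shape bookkeeping is the same: every fused level ends up at the resolution of the finest map.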

VI. CONCLUSION

To address the problem that current steel surface defect detectors achieve high accuracy but low speed, this work proposes a real-time steel surface defect detection method, called RDN. It is an end-to-end detection network that can provide defect categories and precise locations. RDN uses a portable and modular encoding and decoding network as the basic network architecture to improve the detection speed. At the same time, the SCM and PFM are designed to improve detection accuracy. As a result, the detection method proposed in this work achieves a real-time detection speed of 64 FPS and a detection mAP of 80.0%.

Compared with most state-of-the-art algorithms, our model not only has a faster detection speed but also has more competitive detection accuracy, and it can reliably identify defects with a similar background. In addition, our model can detect more defect morphologies than segmentation algorithms. However, it cannot provide pixel-level defect localization as segmentation algorithms can. The performance of our model was obtained on an NVIDIA Titan X GPU hardware platform, and its detection speed may be affected by the available computing power. Therefore, a mid-range hardware device is a necessary prerequisite for the application of our model.
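Because the reported speed depends on the hardware, it is worth re-measuring throughput on the target device before deployment. The following is a minimal timing sketch, not the authors' benchmarking procedure; `measure_fps`, `dummy_detect`, and the warm-up count are hypothetical names and values used only for illustration.

```python
import time


def measure_fps(detect, images, warmup=5):
    """Return throughput in frames per second of detect() over images."""
    for image in images[:warmup]:
        detect(image)                  # warm-up calls are excluded from timing
    start = time.perf_counter()
    for image in images:
        detect(image)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed


def dummy_detect(image):
    """Placeholder workload standing in for a real detector's forward pass."""
    return sum(image)


images = [list(range(1000)) for _ in range(200)]
fps = measure_fps(dummy_detect, images)
print(f"{fps:.0f} FPS on this machine")
```

When the detector runs on a GPU, kernels are typically launched asynchronously, so the device should be synchronized before each clock reading; otherwise the measured frame rate can be overly optimistic.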

In the future, our work can be improved in the following aspects: 1) using a segmentation algorithm to obtain pixel-level information of defects; 2) designing a detector that handles low-quality defect images independently to further improve the accuracy of the model; and 3) embedding the proposed algorithm into mobile devices on real production lines.

REFERENCES

[1] H. Di, X. Ke, Z. Peng, and Z. Dongdong, “Surface defect classification of steels with a new semi-supervised learning method,” Opt. Lasers Eng., vol. 117, pp. 40–48, Jun. 2019.

[2] L. Xu, G. Tian, L. Zhang, and X. Zheng, “Research of surface defect detection method of hot rolled strip steel based on generative adversarial network,” in Proc. Chin. Autom. Congr. (CAC), Nov. 2019, pp. 401–404.

[3] J. Gao, W. Yu, and C. He, “The research on defect recognition method for rail magnetic flux leakage detecting,” in Proc. Int. Conf. Meas., Inf. Control, vol. 2, May 2012, pp. 745–750.

[4] Q. Luo, Y. Sun, P. Li, O. Simpson, L. Tian, and H. Yigang, “Generalized completed local binary patterns for time-efficient steel surface defect classification,” IEEE Trans. Instrum. Meas., vol. 68, no. 3, pp. 667–679, Mar. 2019.

[5] K. Xu, S. Liu, and Y. Ai, “Application of Shearlet transform to classification of surface defects for metals,” Image Vis. Comput., vol. 35, pp. 23–30, Mar. 2015.

[6] K. Liu, H. Wang, H. Chen, E. Qu, Y. Tian, and H. Sun, “Steel surface defect detection using a new Haar–Weibull-variance model in unsupervised manner,” IEEE Trans. Instrum. Meas., vol. 66, no. 10, pp. 2585–2596, Oct. 2017.

[7] Z. Xue-Wu, D. Yan-Qiong, L. Yan-Yun, S. Ai-Ye, and L. Rui-Yu, “A vision inspection system for the surface defects of strongly reflected metal based on multi-class SVM,” Expert Syst. Appl., vol. 38, no. 5, pp. 5930–5939, May 2011.

[8] S. Mei, H. Yang, and Z. Yin, “An unsupervised-learning-based approach for automated defect inspection on textured surfaces,” IEEE Trans. Instrum. Meas., vol. 67, no. 6, pp. 1266–1277, Jun. 2018.

[9] Q. Luo et al., “Surface defect classification for hot-rolled steel strips by selectively dominant local binary patterns,” IEEE Access, vol. 7, pp. 23488–23499, 2019.

[10] L. Yi, G. Li, and M. Jiang, “An end-to-end steel strip surface defects recognition system based on convolutional neural networks,” Steel Res. Int., vol. 88, no. 2, Feb. 2017, Art. no. 1600068.

[11] I. Konovalenko, P. Maruschak, J. Brezinová, J. Viňáš, and J. Brezina, “Steel surface defect classification using deep residual neural network,” Metals, vol. 10, no. 6, p. 846, Jun. 2020.

[12] K. Song and Y. Yan, “A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects,” Appl. Surf. Sci., vol. 285, no. 21, pp. 858–864, Nov. 2013.


[13] W. Wang et al., “Surface defects classification of hot rolled strip based on improved convolutional neural network,” ISIJ Int., vol. 61, no. 5, pp. 1579–1583, 2021.

[14] S. M. Azimi, D. Britz, M. Engstler, M. Fritz, and F. Mücklich, “Advanced steel microstructural classification by deep learning methods,” Sci. Rep., vol. 8, no. 1, Dec. 2018, Art. no. 2128.

[15] S. Wang, X. Xia, L. Ye, and B. Yang, “Automatic detection and classification of steel surface defect using deep convolutional neural networks,” Metals, vol. 11, no. 3, p. 388, Feb. 2021.

[16] Y. He, K. Song, Q. Meng, and Y. Yan, “An end-to-end steel surface defect detection approach via fusing multiple hierarchical features,” IEEE Trans. Instrum. Meas., vol. 69, no. 4, pp. 1493–1504, Apr. 2020.

[17] M. Hatab, H. Malekmohamadi, and A. Amira, “Surface defect detection using YOLO network,” in Intelligent Systems and Applications, K. Arai, S. Kapoor, and R. Bhatia, Eds. Cham, Switzerland: Springer, 2021, pp. 505–515.

[18] X. Kou, S. Liu, K. Cheng, and Y. Qian, “Development of a YOLO-V3-based model for detecting defects on steel strip surface,” Measurement, vol. 182, Sep. 2021, Art. no. 109454.

[19] W. Zhao, F. Chen, H. Huang, D. Li, and W. Cheng, “A new steel defect detection algorithm based on deep learning,” Comput. Intell. Neurosci., vol. 2021, pp. 1–13, Mar. 2021.

[20] H. Dong, K. Song, Y. He, J. Xu, Y. Yan, and Q. Meng, “PGA-Net: Pyramid feature fusion and global context attention network for automated surface defect detection,” IEEE Trans. Ind. Informat., vol. 16, no. 12, pp. 7448–7458, Dec. 2020.

[21] H. Wang, J. Zhang, Y. Tian, H. Chen, H. Sun, and K. Liu, “A simple guidance template-based defect detection method for strip steel surfaces,” IEEE Trans. Ind. Informat., vol. 15, no. 5, pp. 2798–2809, May 2019.

[22] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, pp. 1–14, Apr. 2015.

[23] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 5987–5995.

[24] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.

[25] A. G. Howard et al., “MobileNets: Efficient convolutional neural networks for mobile vision applications,” 2017, arXiv:1704.04861.

[26] Y. He, K. Song, H. Dong, and Y. Yan, “Semi-supervised defect classification of steel surface based on multi-training and generative adversarial network,” Opt. Lasers Eng., vol. 122, pp. 294–302, Nov. 2019.

[27] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.

[28] J. Dai, Y. Li, K. He, and J. Sun, “R-FCN: Object detection via region-based fully convolutional networks,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS). Red Hook, NY, USA: Curran Associates, 2016, pp. 379–387.

[29] R. K. Srivastava, K. Greff, and J. Schmidhuber, “Training very deep networks,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), vol. 2. Cambridge, MA, USA: MIT Press, Dec. 2015, pp. 2377–2385.

[30] A. Veit, M. Wilber, and S. Belongie, “Residual networks are exponential ensembles of relatively shallow networks,” in Proc. Adv. Neural Inf. Process. Syst., May 2016, pp. 1–9.

[31] O. Ronneberger, “Invited talk: U-Net convolutional networks for biomedical image segmentation,” in Bildverarbeitung für die Medizin, K. H. Maier-Hein, K. Fritzsche, T. M. Deserno, T. Lehmann, H. Handels, and T. Tolxdorff, Eds. Berlin, Germany: Springer, 2017, p. 3.

[32] T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” CoRR, vol. abs/1708.02002, pp. 1–9, Aug. 2017.

[33] H. Law and J. Deng, “CornerNet: Detecting objects as paired keypoints,” Int. J. Comput. Vis., vol. 128, pp. 642–656, Dec. 2020.

[34] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, pp. 1904–1916, Sep. 2015.

[35] F. Yu, D. Wang, E. Shelhamer, and T. Darrell, “Deep layer aggregation,” 2017, arXiv:1707.06484.

[36] J. Li, Z. Su, J. Geng, and Y. Yin, “Real-time detection of steel strip surface defects based on improved YOLO detection network,” IFAC-PapersOnLine, vol. 51, no. 21, pp. 76–81, 2018.

[37] J. Pang, K. Chen, J. Shi, and H. Feng, “Libra R-CNN: Towards balanced learning for object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 821–830.

[38] Z. Cai and N. Vasconcelos, “Cascade R-CNN: High quality object detection and instance segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 5, pp. 1483–1498, May 2021.

[39] Q. Zhao, T. Sheng, Y. Wang, Z. Tang, Y. Chen, L. Cai, and H. Ling, “M2Det: A single-shot object detector based on multi-level feature pyramid network,” in Proc. AAAI Conf. Artif. Intell., vol. 33, 2019, pp. 9259–9266.

[40] W. Liu et al., “SSD: Single shot multibox detector,” in Computer Vision—ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Cham, Switzerland: Springer, 2016, pp. 21–37.

[41] Z. Tian, C. Shen, H. Chen, and T. He, “FCOS: Fully convolutional one-stage object detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 9626–9635.

[42] S. Zhang, C. Chi, Y. Yao, Z. Lei, and S. Z. Li, “Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 9756–9765.

[43] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” 2018, arXiv:1804.02767.

[44] X. Zhou, D. Wang, and P. Krähenbühl, “Objects as points,” 2019, arXiv:1904.07850.

Wenyan Wang received the B.S. degree from the

School of Electrical Engineering, Industrial and

Commercial College, Anhui University of Technol-

ogy, Ma’anshan, China, in 2015, and the M.S. degree

from the School of Electrical Information and Engi-

neering, Anhui University of Technology, in 2018,

where she is currently pursuing the Ph.D. degree

with the School of Metallurgical Engineering.

Her research interests include deep learning, intel-

ligent inspection, data mining, and bioinformatics.

Chunfeng Mi received the B.S. degree from the

School of Electrical Engineering, Industrial and

Commercial College, Anhui University of Tech-

nology, Ma’anshan, China, in 2019, where he is

currently pursuing the M.S. degree with the School

of Electrical Information and Engineering.

His research interests include intelligent inspec-

tion, deep learning, and bioinformatics.

Ziheng Wu received the B.S. degree from

Tongling University, Tongling, China, in 2009, the

M.S. degree from Zhejiang Sci-Tech University,

Hangzhou, China, in 2012, and the Ph.D. degree

from the University of Science and Technology of

China, Hefei, China, in 2018.

He is currently a Lecturer with the Anhui Univer-

sity of Technology, Ma’anshan, China. His research

interests include machine learning, artificial intelligence, gray systems theory, medical informatics,

science engineering, and intelligent control.

Kun Lu received the B.S. degree from the School of

Electrical Information and Engineering, Hohai Uni-

versity Wentian College, Ma’anshan, China, in 2016,

and the M.S. degree from the School of Electrical

Information and Engineering, Anhui University of

Technology, Ma’anshan, in 2019, where he is cur-

rently pursuing the Ph.D. degree with the School of

Management Science and Engineering.

His research interests include machine learning,

data mining, and bioinformatics.


Hongming Long received the Ph.D. degree from

the Department of Iron Metallurgy, Central South

University, Changsha, China, in 2007.

He is the Director of the Key Laboratory of Met-

allurgical Emission Reduction and Resources Recy-

cling, Ministry of Education, Ma’anshan, China.

Since 2007, he has been with the Anhui University of Technology, Ma’anshan. His research interests include the emission reduction of flue gas pollutants (NOx, dioxin, and PM2.5) in the metallurgical industry and the comprehensive utilization of metallurgical solid waste resources.

Mr. Long is a Member of Expert Committee of China Metal Society and a

Secretary-General of the Branch of Metallurgical Solid Waste Resource.

Baigen Pan received the B.S. degree from the

Department of Physics, School of Physics, Anqing

Normal University, Anqing, China, in 2003, and the

M.S. degree from the School of Instrument Science

and Opto-Electronics Engineering, Hefei University

of Technology, Hefei, China, in 2010.

His research interests include deep learning, data

mining, and bioinformatics.

Dan Li was born in 1976. She received the

bachelor’s and master’s degrees from the Anhui

University of Technology, Ma’anshan, China, in

1997 and 2004, respectively, and the Ph.D. degree

from the Nanjing University of Aeronautics and

Astronautics, Nanjing, China, in 2008.

She is mainly engaged in image processing,

machine vision, and autonomous navigation.

Jun Zhang received the bachelor’s degree from

the Hefei University of Technology, Hefei, China,

in 1995, the master’s degree from the Institute of

Intelligent Machine, Chinese Academy of Sciences,

Hefei, in 2004, and the Ph.D. degree from the

University of Science and Technology of China,

Hefei, in 2007.

He served with the University of Louisville,

Louisville, KY, USA, from 2009 to 2011, as a Post-

Doctoral Fellow. He is currently an Associate Pro-

fessor with the School of Electrical Engineering and

Automation, Anhui University, Hefei. He has published more than 40 articles

in international conferences and journals. He focuses on deep learning with

application to bioinformatics, cheminformatics, computer vision, and so on.

Peng Chen received the bachelor’s degree from the

Electronic Engineering Institute, PLA, Hefei, China,

in 1997, the master’s degree from the Kunming

University of Science and Technology, Kunming,

China, in 2003, and the Ph.D. degree from the

University of Science and Technology of China,

Hefei, in 2007.

He served with the City University of Hong Kong,

Hong Kong, in 2006, as a Senior Research Asso-

ciate, Howard University, Washington, DC, USA,

from 2008 to 2009, as a Post-Doctoral Fellow,

Nanyang Technological University, Singapore, from 2009 to 2010, as a

Research Fellow, and the King Abdullah University of Science and Tech-

nology (KAUST), Thuwal, Saudi Arabia, from 2012 to 2014, as a Post-

Doctoral Fellow. He is a Professor with the School of Computer Science and

Technology, Institute of Physical Science and Information Technology, Anhui

University, Hefei. He specializes in machine learning and data mining with

applications to bioinformatics, drug discovery, computer vision, and so on.

Bing Wang (Senior Member, IEEE) received the

B.S. and M.S. degrees from the Hefei University of

Technology, Hefei, China, in 1998 and 2004, respec-

tively, and the Ph.D. degree from the University of

Science and Technology of China, Hefei, in 2006.

He worked as a Senior Research Associate with

the City University of Hong Kong, Hong Kong,

from 2006 to 2007, and a Post-Doctoral Fellow with

the University of Louisville, Louisville, KY, USA,

and Vanderbilt University, Nashville, TN, USA,

from 2008 to 2012. He is currently serving as a Full

Professor with the School of Electrical and Information Engineering, Anhui

University of Technology, Ma’anshan. He has more than 150 publications.

His research interests mainly focus on machine learning, image processing,

computational biology, and chemoinformatics.

Authorized licensed use limited to: Anhui University of Technology. Downloaded on April 09,2022 at 09:04:44 UTC from IEEE Xplore. Restrictions apply.

A Highly Efficient and Lightweight Detection Method for Steel Surface Defect

Article

Full-text available

Jun 2024
J NONDESTRUCT EVAL

The detection of steel surface defects is of great significance to steel production. In order to better meet the requirements of accuracy, real-time, and lightweight model, this paper proposes a highly efficient and lightweight steel surface defect detection method based on YOLOv5n. Firstly, ODMobileNetV2 composed of MobileNetV2 and ODConv is used as the backbone to improve the defect feature extraction capability. Secondly, GSConv is utilized in the neck to achieve deep information fusion through channel concatenation and shuffling, enhancing the ability of feature fusion. Finally, this paper proposes a spatial-channel reconstruction block (SCRB) designed to suppress redundant features and improve the representation ability of defect features through feature separation and reconstruction. Experimental results show that this method achieves 84.1% mAP and 109 FPS on the NEU-DET dataset, and 72.9% mAP and 110.1 FPS on the GC10-DET dataset, enabling accurate and efficient detection. Furthermore, the number of parameters is only 5.04M, which has a significant lightweight advantage.

CABF-YOLO: a precise and efficient deep learning method for defect detection on strip steel surface

Article

Full-text available

Apr 2024
PATTERN ANAL APPL

Deep learning algorithms have gained widespread usage in defect detection systems. However, existing methods are not satisfied for large-scale applications on surface defect detection of strip steel. In this paper, we propose a precise and efficient detection model, named CABF-YOLO, based on the YOLOX for strip steel surface defects. Firstly, we introduce the Triplet Convolutional Coordinate Attention (TCCA) module in the backbone of the YOLOX. By factorizing the pooling operation, the TCCA module can accurately capture cross-channel features to identify the location information of defects. Secondly, we design a novel Bidirectional Fusion (BF) strategy in the neck of the YOLOX. The BF strategy enhances the fusion of low-level and high-level semantic information to obtain fine-grained information. Lastly, the original bounding box loss function is replaced by the EIoU loss function. In the EIoU loss function, the penalty term is redefined to consider the overlap area, central point, and side length of the required regressions to accelerate the convergence rate and localization accuracy. On the benchmark NEU-DET dataset and GC10-DET dataset, the experimental results show that the CABF-YOLO achieves superior performance compared with other comparison models and satisfies the real-time detection requirement of industrial production.

Surface defect detection of hot rolled steel based on multi-scale feature fusion and attention mechanism residual block

Article

Full-text available

Apr 2024

To improve the precision of defect categorization and localization in images, this paper proposes an approach for detecting surface defects in hot-rolled steel strips. The approach uses an improved YOLOv5 network model to overcome the issues of inadequate feature extraction capacity and suboptimal feature integration when identifying surface defects on steel strips. The proposed method achieves higher detection accuracy and localization precision, making it more competitive and applicable in real production. Firstly, the multi-scale feature fusion (MSF) strategy is utilized to fuse shallow and deep features effectively and enrich detailed information relevant to target defects. Secondly, the CSPLayer Res2Attention block (CRA block) residual module is introduced to reduce the loss of defect information during hierarchical transmission, thereby enhancing the extraction of fine-grained features and improving the perception of details and global features. Finally, the experimental results indicate that the mAP on the NEU-DET and GC10-DET datasets approaches 78.5% and 67.3%, respectively, which is 4.9% and 2.1% higher than that of the baseline. Meanwhile, it has higher precision and more precise localization capabilities than other methods. Furthermore, it also achieves 59.2% mAP on the APDDD dataset, indicating its potential for growth in further domains.

LCG-YOLO: A Real-time Surface Defect Detection Method for Metal Components

Article

Full-text available

Jan 2024

Surface defect inspection of metal components plays a critical role in ensuring product quality, enhancing production efficiency, and reducing costs, with particular emphasis on the detection for small-sized surface defects to ensure the safety and reliability of metal components during their usage. The existing detection methods of small size defects on the surface of metal components have some shortcomings, such as low precision and poor real-time performance. To solve these two problems, this paper proposes a real-time defect detection method based on the improved YOLO. Firstly, LSandGlass module is used to replace the residual module in the backbone network, which reduces information loss, eliminates the low-resolution feature layer, and minimizes the semantic loss. The network then uses lightweight Ghost convolution at the neck to extract network features. In addition, the convolutional block attention mechanism (CBAM) module is added to improve the detection precision of small size defects. Finally, the soft intersection over union (SIoU) is used to further enhance the target detection capability. The experiment was carried out on the self-made hexagonal bolt data set of typical commonly used metal components. The experimental results show that compared to the original YOLOv5, the mAP (0.5) is improved by 5.3% to 95.50%, and the reasoning FPS is improved by 21 fps to 95 fps. These results indicate that the proposed LCG-YOLO improves the real-time detection performance of metal component surface defects.

Human–machine knowledge hybrid augmentation method for surface defect detection based few-data learning

Article

Full-text available

Mar 2024
J INTELL MANUF

Visual-based defect detection is a crucial but challenging task in industrial quality control. Most mainstream methods rely on large amounts of existing or related domain data as auxiliary information. However, in actual industrial production, there are often multi-batch, low-volume manufacturing scenarios with rapidly changing task demands, making it difficult to obtain sufficient and diverse defect data. This paper proposes a parallel solution that uses a human–machine knowledge hybrid augmentation method to help the model extract unknown important features. Specifically, by incorporating experts' knowledge of abnormality to create data with rich features, positions, sizes, and backgrounds, we can quickly accumulate an amount of data from scratch and provide it to the model as prior knowledge for few-data learning. The proposed method was evaluated on the magnetic tile dataset and achieved F1-scores of 60.73%, 70.82%, 77.09%, and 82.81% when using 2, 5, 10, and 15 training images, respectively. Compared to the traditional augmentation method's F1-score of 64.59%, the proposed method achieved an 18.22% increase in the best result, demonstrating its feasibility and effectiveness in few-data industrial defect detection.

TD-Net:tiny defect detection network for industrial products

Article

Full-text available

Feb 2024

The detection of tiny defects in industrial products is important for improving the quality of industrial products and maintaining production safety. Currently, image-based defect detection methods are ineffective in detecting tiny and variously shaped defects. Therefore, this paper proposes a tiny defect detection network (TD-Net) for industrial products to improve the effectiveness of tiny defect detection. TD-Net improves the overall defect detection effect, especially the detection effect of tiny defects, by solving the problems of downsampling of tiny defects, pre-filtering of conflicting deep and shallow semantic information, and cascading fusion of multi-scale information. Specifically, this paper proposes the Defect Downsampling (DD) module to realize the defect information supplementation during the backbone downsampling process and improve the problem that the stepwise convolution easily misses the detection of tiny defects. Meanwhile, the Semantic Information Interaction Module (SIIM) is proposed, which fuses deep and shallow semantic features, and is designed to interact the fused features with shallow features to optimize the detection of tiny defects. Finally, the Scale Information Fusion Module (SIFM) is proposed to improve the Path Aggregation Network (PANet) for cascading fusion and information focus on different scale information, which enables further improvement of defect detection performance of TD-Net. Extensive experimental results on the NEU–DET data set (76.8 $$\%$$ % mAP), the Peking University PCB defect data set (96.2 $$\%$$ % mAP) and the GC10-DET data set (71.5 $$\%$$ % mAP) show that the proposed TD-Net achieves competitive results compared with SOTA methods with the equivalent parameter quantity.

DFFNet: a lightweight approach for efficient feature-optimized fusion in steel strip surface defect detection

Article

Full-text available

Jun 2024

Steel surface defect detection is crucial in manufacturing, but achieving high accuracy and real-time performance with limited computing resources is challenging. To address this issue, this paper proposes DFFNet, a lightweight fusion network, for fast and accurate steel surface defect detection. Firstly, a lightweight backbone network called LDD is introduced, utilizing partial convolution to reduce computational complexity and extract spatial features efficiently. Then, PANet is enhanced using the Efficient Feature-Optimized Converged Network and a Feature Enhancement Aggregation Module (FEAM) to improve feature fusion. FEAM combines the Efficient Layer Aggregation Network and reparameterization techniques to extend the receptive field for defect perception, and reduce information loss for small defects. Finally, a WIOU loss function with a dynamic non-monotonic mechanism is designed to improve defect localization in complex scenes. Evaluation results on the NEU-DET dataset demonstrate that the proposed DFFNet achieves competitive accuracy with lower computational complexity, with a detection speed of 101 FPS, meeting real-time performance requirements in industrial settings. Furthermore, experimental results on the PASCAL VOC and MS COCO datasets demonstrate the strong generalization capability of DFFNet for object detection in diverse scenarios.

AFF-Net: A Strip Steel Surface Defect Detection Network via Adaptive Focusing Features

Article

Jan 2024

In strip steel production, detecting surface defects is crucial for ensuring product quality and optimizing production line efficiency. However, detecting defects is complicated by the variations in size, complex structures, and the wide range of defect morphologies present in strip steel. To tackle these challenges, this paper proposes a strip steel surface defect detection network via adaptive focusing features (AFF-Net). Firstly, an adaptive focusing feature block (AFF-Block) is proposed, which applies the “Diffusion-Aggregation” thought. This block repositions and adaptively assigns weights to defect features, guiding the network to focus on defect features and more effectively capture defects’ spatial and morphological changes. Subsequently, a focused feature pyramid network (Foc-FPN) is proposed to enhance the network’s adaptability to complex defects through multi-scale focusing fusion. This innovative structure adaptively balances the semantic gap of defect features at different scales and alleviates the abstraction feature overload. The proposed algorithm achieved a mean Average Precision (mAP@IoU=0.5) of 83.5% on the public NEU-DET dataset for strip steel surface defects, surpassing the baseline network by 8.2%. Compared to existing models, this detection method strengthens the connection between defect characteristics and more effectively detects irregularly distributed defects in complex strip steel surface images.

EC-YOLO: Effectual Detection Model for Steel Strip Surface Defects Based on YOLO-V5

Article

Full-text available

Jan 2024

Defect detection is extensively utilized within the metal industry, particularly for identifying surface imperfections on steel strips. However, the current methods still face challenges in detecting small and elongated defects on steel strips. Such defects occupy a relatively small pixel percentage within the entire image. The repeated downsampling in convolutional networks, coupled with the dynamic changes in the receptive field, can result in the potential loss of these minute defects. To mitigate the problem, our paper proposes EC-YOLO, a real-time defect detection network for steel strips of the above peculiar defects. Firstly, the 1D convolution in the efficient channel attention bottleneck (EB) module enhances the feature extraction ability of the backbone for small and elongated defects, while also facilitating the attentional mechanism for modeling channel features. Secondly, Context Transformation Networks integrate cross-stage localized blocks, referred to as CC modules, to enhance the understanding of feature semantic contextual information. Thirdly, a self-constructed dataset containing both small and elongated defects is used for understanding where such defects are more relevant in feature fusion and extraction. On the public datasets GC10-DET and NEU-DET, the improved model achieves mean Average Precision (mAP) scores of 71% and 83%, respectively, surpassing the performance of other mainstream models. The mAP of the enhanced model on the SLD-DET dataset reaches 87.5%, demonstrating its superiority in detecting both small and elongated defects.

GCE-YOLOv5s Based Surface Defect Detection Algorithm for Strip Steel

Conference Paper

Nov 2023

Surface Defects Classification of Hot Rolled Strip Based on Improved Convolutional Neural Network

Article

May 2021

Surface defect classification of hot-rolled strip based on machine vision is a challenging task because of the diversity of defect morphology, high inter-class similarity, and the real-time requirements in actual production. In this work, VGG16-ADB, an improved VGG16 convolutional neural network, is proposed to address the problem of defect identification of hot-rolled strip. The improved network takes VGG16 as the benchmark model, reduces system consumption and memory occupation by reducing the depth and width of the network structure, and adds a batch normalization layer to accelerate the convergence of the model. On the standard NEU dataset, the proposed method achieves a classification accuracy of 99.63% and a recognition speed of 333 FPS, which fully meets the requirements of detection accuracy and speed in the actual production line. The experimental results also show the superiority of VGG16-ADB over existing classification models for surface defect classification of hot-rolled strip.

A New Steel Defect Detection Algorithm Based on Deep Learning

Article

Mar 2021
Comput Intell Neurosci

In recent years, more and more scholars have devoted themselves to research on target detection algorithms, driven by the continuous development of deep learning. The detection and recognition of small and complex targets remains an open problem. The authors of this article examine the shortcomings of deep learning detection algorithms in detecting small and complex defect targets and present an improved target detection algorithm for steel surface defect detection. Steel surface defects seriously affect the quality of steel. We find that most current detection algorithms achieve low detection accuracy on the NEU-DET dataset, so we choose this dataset to verify a machine-vision-based steel surface defect detection algorithm for defect detection in steel production. A series of improvements are made to the traditional Faster R-CNN algorithm, such as reconstructing the network structure of Faster R-CNN. To handle the small size of the targets, we train the network with multiscale fusion. To handle the complex features of the targets, we replace part of the conventional convolution network with a deformable convolution network. The experimental results show that the deep learning network model trained by the proposed method has good detection performance, and the mean average precision is 0.752, which is 0.128 higher than the original algorithm. The average precision values for crazing, inclusion, patches, pitted surface, rolled in scale, and scratches are 0.501, 0.791, 0.792, 0.874, 0.649, and 0.905, respectively. The detection method is able to identify small target defects on the steel surface effectively, which can provide a reference for the automatic detection of steel defects.
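As a quick consistency check on the figures in this abstract, the short sketch below averages the six reported per-class average precision values and derives the baseline implied by the stated 0.128 improvement. The variable names are illustrative; the only assumption is that the mean average precision is the unweighted mean over the six classes, which the reported numbers appear to support.

```python
# Per-class average precision values as reported in the abstract.
per_class_ap = {
    "crazing": 0.501,
    "inclusion": 0.791,
    "patches": 0.792,
    "pitted surface": 0.874,
    "rolled in scale": 0.649,
    "scratches": 0.905,
}

# Unweighted mean over classes (assumed definition of mAP here).
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)

# Baseline implied by "0.128 higher than the original algorithm".
implied_baseline = 0.752 - 0.128

print(round(mean_ap, 3))           # 0.752
print(round(implied_baseline, 3))  # 0.624
```

The unweighted mean reproduces the reported 0.752, and crazing (0.501) is the class that pulls the average down the most.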

Automatic Detection and Classification of Steel Surface Defect Using Deep Convolutional Neural Networks

Article

Feb 2021

Automatic detection of steel surface defects is very important for product quality control in the steel industry. However, the traditional method cannot be well applied in the production line because of its low accuracy and slow running speed. The currently popular algorithms based on deep learning also suffer from low accuracy, and there is still a lot of room for improvement. This paper proposes a method combining an improved ResNet50 and an enhanced Faster region-based convolutional neural network (Faster R-CNN) to reduce the average running time and improve the accuracy. Firstly, the image is input into the improved ResNet50 model, which adds a deformable convolution network (DCN) and improved cutout, to classify the sample as with or without defects. If the probability of having a defect is less than 0.3, the algorithm directly outputs the sample as defect-free. Otherwise, the sample is further input into the improved Faster R-CNN, which adds spatial pyramid pooling (SPP), enhanced feature pyramid networks (FPN), and matrix NMS. The final output is either the location and classification of the defects in the sample or a report that the sample has no defect. On a data set obtained in a real factory environment, the accuracy of this method reaches 98.2%. At the same time, its average running time is shorter than that of other models.
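The two-stage flow described in this abstract (a cheap classifier gates a more expensive detector at a probability threshold of 0.3) can be sketched as follows. This is only an illustration of the gating logic: `inspect`, `defect_probability`, and `detect` are hypothetical names, and the two models are passed in as plain callables rather than the paper's actual improved ResNet50 and Faster R-CNN.

```python
from typing import Any, Callable, List, Tuple

# (class label, confidence, (x1, y1, x2, y2)); an assumed output format.
Detection = Tuple[str, float, Tuple[float, float, float, float]]


def inspect(
    image: Any,
    defect_probability: Callable[[Any], float],
    detect: Callable[[Any], List[Detection]],
    threshold: float = 0.3,
) -> List[Detection]:
    """Run the detector only when the classifier thinks a defect is likely.

    Returns an empty list when the sample is judged defect-free.
    """
    # Stage 1: cheap classifier. Below the threshold, skip the detector.
    if defect_probability(image) < threshold:
        return []
    # Stage 2: expensive detector localizes and classifies the defects.
    return detect(image)
```

The saving in average running time comes from the early return: samples that the classifier scores below the threshold never reach the detector.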

Steel Surface Defect Classification Using Deep Residual Neural Network

Article

Jun 2020

An automated method for detecting and classifying three classes of surface defects in rolled metal has been developed, which allows defectoscopy to be conducted with specified parameters of efficiency and speed. The possibility of using residual neural networks for classifying defects has been investigated. A classifier based on the ResNet50 neural network is adopted as the basis. The model classifies images of flat surfaces with damage of three classes with an overall accuracy of 96.91% on the test data. The use of ResNet50 is shown to provide excellent recognition, high speed, and accuracy, which makes it an effective tool for detecting defects on metal surfaces.

FCOS: Fully Convolutional One-Stage Object Detection

Conference Paper

Oct 2019

Development of a YOLO-V3-based model for detecting defects on steel strip surface

Article

May 2021
Measurement

During steel strip production, mechanical forces and environmental factors cause surface defects of the steel strip. Therefore, detection of such defects is key to the production of high-quality products. Moreover, surface defects of the steel strip cause great economic losses to the high-tech industry. So far, few studies have explored methods of identifying the defects, and most of the currently available algorithms are not sufficiently effective. Therefore, we developed an end-to-end defect detection model based on YOLO-V3. Briefly, the anchor-free feature selection mechanism was utilized to select an ideal feature scale for model training, replace the anchor-based structure, and shorten the computing time. Next, specially designed dense convolution blocks were introduced into the model to extract rich feature information, which effectively improves feature reuse and feature propagation and enhances the characterization ability of the network. The experimental results show that the improved model proposed in this study outperforms the comparison models. For instance, the proposed model yielded 71.3% mAP on the GC10-DET dataset and 72.2% mAP on the NEU-DET dataset.

Tracking Objects as Points

Chapter

Oct 2020

Tracking has traditionally been the art of following interest points through space and time. This changed with the rise of powerful deep networks. Nowadays, tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection. We present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art. Our tracker, CenterTrack, applies a detection model to a pair of images and detections from the prior frame. Given this minimal input, CenterTrack localizes objects and predicts their associations with the previous frame. That’s it. CenterTrack is simple, online (no peeking into the future), and real-time. It achieves 67.8% MOTA on the MOT17 challenge at 22 FPS and 89.4% MOTA on the KITTI tracking benchmark at 15 FPS, setting a new state of the art on both datasets. CenterTrack is easily extended to monocular 3D tracking by regressing additional 3D attributes. Using monocular video input, it achieves 28.3% AMOTA@0.2 on the newly released nuScenes 3D tracking benchmark, substantially outperforming the monocular baseline on this benchmark while running at 28 FPS.

Surface Defect Detection Using YOLO Network

Chapter

Jan 2021

Detecting defects on surfaces such as steel can be a challenging task because defects have complex and unique features. These defects occur in many production lines and vary from one production line to another. In order to detect these defects, the You Only Look Once (YOLO) detector, which uses a Convolutional Neural Network (CNN), is used with only minor modifications. YOLO is trained and tested on a dataset containing six kinds of defects to achieve accurate detection and classification. The network can also obtain the coordinates of the detected bounding boxes, giving the size and location of the detected defects. Since manual defect detection is expensive, labor-intensive, and inefficient, this paper contributes to the sophistication and improvement of manufacturing processes. This system can be installed on chipsets and deployed to a factory line to greatly improve quality control and be part of smart internet of things (IoT) based factories in the future. YOLO achieves a respectable 70.66% mean average precision (mAP) despite the small dataset and minor modifications to the network.

Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection

Conference Paper