Original Article
Structural Health Monitoring
1–16
© The Author(s) 2019
Article reuse guidelines:
sagepub.com/journals-permissions
DOI: 10.1177/1475921718821719
journals.sagepub.com/home/shm
Deep learning–based autonomous
concrete crack evaluation through
hybrid image scanning
Keunyoung Jang¹, Namgyu Kim² and Yun-Kyu An¹
Abstract
This article proposes a deep learning–based autonomous concrete crack detection technique using hybrid images. The hybrid images, which combine vision and infrared thermography images, are able to improve crack detectability while minimizing false alarms. In particular, large-scale concrete infrastructure such as bridges and dams can be effectively inspected by spatially scanning an unmanned vehicle–mounted hybrid imaging system consisting of a vision camera, an infrared camera, and a continuous-wave line laser. However, the expert-dependent decision-making for crack identification that is widely used in industrial fields is often cumbersome, time-consuming, and unreliable. As a target concrete structure gets larger, automated decision-making becomes more desirable from the practical point of view. The proposed technique achieves automated crack identification and visualization by transfer learning of a well-trained deep convolutional neural network, that is, GoogLeNet, while retaining the advantages of the hybrid images. The proposed technique is experimentally validated using a lab-scale concrete specimen with cracks of various sizes. The test results reveal that macro- and microcracks are automatically visualized while minimizing false alarms.
Keywords
Concrete crack detection, deep convolutional neural network, hybrid image scanning, vision image, infrared thermography, structural health monitoring
Introduction
Cracking is one of the most critical damage types in concrete, a representative construction material. Initial concrete cracks, inevitably produced by shrinkage during the curing process, are not typically considered structural damage. However, severe structural-level cracks generated by external loads may propagate along the surface and through-thickness directions under repeated external loads. The propagated cracks may lead to serious structural problems such as strength reduction, corrosion of reinforcing rebar, and even structural failure. Therefore, concrete cracks need to be detected and managed from their early stage from the safety point of view. During the last few decades, expert-dependent visual inspection has been widely performed to manage concrete cracks. However, visual inspection is often time-consuming, labor-intensive, unreliable, and sometimes not applicable to inaccessible areas of a target structure.1
To tackle these technical issues, a number of non-destructive evaluation (NDE) techniques have been proposed. Fiber optic sensors have been embedded in target structures to detect concrete cracks thanks to the advantages of being thin, lightweight, and immune to electromagnetic interference.2 However, their service life is often shorter than the design life of civil infrastructure, and the embedded sensors are difficult to replace. Moreover, the contact sensing mechanism may suffer from several technical problems such as a limited sensing area and imperfect bonding conditions. Contact-type ultrasonic techniques have also been proposed as alternatives.3–5 Although the ultrasonic techniques have high crack detectability, ultrasonic waves are highly attenuated in concrete materials. Moreover,
¹Department of Architectural Engineering, Sejong University, Seoul, South Korea
²Department of Civil and Environmental Engineering, Sejong University, Seoul, South Korea
Corresponding author: Yun-Kyu An, Department of Architectural Engineering, Sejong University, 209, Neungdong-ro, Gwangjin-gu, Seoul 143-747, South Korea. Email: yunkyuan@sejong.ac.kr
the complex signal interpretation is typically required due to the inhomogeneous characteristics of concrete materials. They also require a number of spatial measurement points to cover a large inspection area and share the same contact-sensing limitations. As another contact-type NDE technique, impact-echo techniques have been proposed.6,7 They are easy to use and suitable for single-side inspection, but unexpected reflections coming from structures' boundaries may complicate the analysis of the measured data. Alternatively, fiber-reinforced concrete techniques have been proposed as a sensor-less approach.8,9 By inserting conductive fibers into concrete materials, the concrete structure itself can be used as a sensor. However, their crack detectability highly depends on the manufacturing process of the fiber-reinforced concrete, and the performance under environmental variations such as temperature and humidity changes has not been fully validated yet.
To overcome the limitations of the contact-type NDE techniques, various non-contact NDE techniques have been proposed. The digital image correlation (DIC) technique compares digital photographs at different deformation stages for crack detection.10,11 However, it is more suitable for well-controlled laboratory environments than field inspection due to its requirements for precise camera alignment and reference points on a target surface. Compared to DIC, vision-based crack detection techniques are more practical and widely accepted thanks to their simplicity, non-contact nature, cost effectiveness, and intuitive data interpretation.12,13 Recently, vision cameras have been combined with robots14 or unmanned aerial vehicles (UAVs)15–17 to detect cracks on inaccessible and extensive areas of a target structure. However, the performance of the vision technique highly depends on the image capturing conditions, such as the capturing angle, illuminance, and undesired contaminants in the air or on the target surface, causing false alarms. Alternatively, one of the promising techniques for crack detection is laser infrared (IR) thermography. Laser IR thermography similarly provides intuitive crack images in a fully non-contact way. Its superiority over the vision-based techniques is that it can detect subsurface as well as surface damage and is robust against sensing environments by employing laser excitation sources, which has already been proven through applications to metallic structures and semiconductor chips.18,19 However, its excessive sensitivity may conversely disturb precise crack evaluation in concrete because of the numerous non-structural-level initial concrete cracks.
As the amount of sensing data collected from a large target structure grows, expert-dependent data interpretation becomes more time-consuming and cumbersome. Thus, there have recently been numerous attempts to automate data interpretation. In particular, deep convolutional neural networks (CNNs) have been applied to classify vision images for pavement crack detection,20 nuclear power plant damage inspection,21 steel box girder crack identification,22 and concrete crack detection.23,24 Although a number of effective CNN architectures have been developed and proven for various data types and applications, the sensing data–driven false alarm issues have not been resolved yet.
In this study, a hybrid image scanning (HIS) system combining the vision and laser IR thermography techniques is newly developed, and a deep CNN–based autonomous concrete crack evaluation algorithm is proposed. The proposed technique has the following superiorities over the existing techniques: (1) fully non-contact, non-destructive, and fast crack evaluation, even in inaccessible areas of a large concrete structure, can be effectively achieved by mounting the HIS system onto UAVs; (2) data-driven false alarms can be remarkably reduced by retaining the advantages of both vision and IR images; (3) the limited field of view (FOV) issues of the vision and IR cameras, which are among the technical hurdles in data analysis, are resolved by developing a time–spatial-integrated (TSI) coordinate transform; and (4) autonomous decision-making for crack detection is accomplished by employing a tailored deep CNN process. The developed system and algorithm are experimentally validated using a lab-scale concrete specimen with cracks of various sizes, as a core technology before being embedded onto UAVs.
This article is organized as follows. Section ‘‘The
HIS system’’ explains the configuration and working
principle of the HIS system. Then, section ‘‘The deep
CNN–based crack evaluation algorithm’’ shows the
overall deep CNN process including signal and image
processing. Subsequently, the HIS system and deep
CNN algorithm are experimentally validated using a
lab-scale concrete specimen with real cracks of various
sizes in section ‘‘Experimental validation.’’ This article
concludes with a brief summary and discussions in sec-
tion ‘‘Conclusion.’’
The HIS system
Figure 1 shows the HIS system composed of excitation,
sensing, and control units. The excitation unit consists
of a continuous-wave (CW) line laser, a line beam gen-
erator, a collimator, and a focusing lens, which gener-
ates thermal waves onto a target concrete structure.
The sensing unit comprising vision and IR cameras
records the surface condition and the corresponding
thermal wave propagation along the concrete structure
while spatially scanning. Then, the control computer in the control unit activates the excitation and sensing units and analyzes the saved data using the control and processing programs coded in LabVIEW® and MATLAB®, respectively. The HIS system will be mounted on UAVs to move along a predetermined scanning route, as shown in Figure 1. Note that the excitation and sensing units are synchronized with, and controlled by, the control computer in the control unit.
The detailed working principle of the HIS system is
as follows. Once the control computer in the control
unit sends out control signals to the excitation and sen-
sing units, the laser driver generates a current signal to
activate the CW laser emitting a point laser beam. The
point laser beam is transformed to a line-shaped laser
beam through the line beam generator in the excitation
unit. Once the line-shaped laser beam is focused onto a
target surface through the collimator and focusing lens,
the thermal waves are generated along the target sur-
face. Simultaneously, the vision and IR cameras in the
sensing unit are operated to acquire the surface condi-
tion and thermal wave responses. Here, the thermal
wave responses are measured by only the IR camera
because the invisible range laser source is used. When
the control signal is transmitted to UAVs from the con-
trol computer, the HIS system automatically scans the
target structure along the predetermined scanning
route. Then, the measured vision and IR images, which vary temporally and spatially within each FOV, are instantaneously transmitted to and saved in the control computer as raw vision (V_R) and IR (I_R) images, respectively. The V_R and I_R images need to be processed for precise crack evaluation because they vary in the time and spatial domains.
The deep CNN–based crack evaluation algorithm
Since the I_R and V_R images obtained over a broad area become massive, expert-dependent decision-making is quite time-consuming and cumbersome. Thus, not only signal or image processing but also a deep learning–based autonomous decision-making process is strongly desirable. The main superiority of the algorithm is that the I_R and V_R images are simultaneously used in the autonomous decision-making process, making it possible to minimize false alarms. This section explains how crack information is automatically extracted and visualized from the I_R and V_R images.
The overall procedure of the proposed deep CNN–based crack evaluation algorithm is shown in Figure 2. Since the I_R and V_R images acquired by spatially scanning the HIS system change continuously in the time and spatial domains, precise crack evaluation is difficult. Thus, the spatially scanned I_R and V_R images, given as a function of time, need to be converted to spatially integrated images. The details of each step are explained in the subsequent subsections.
Image distortion calibration
Since the I_R and V_R images are often distorted due to the wide angles of the camera lenses, distortion calibration is needed for precise crack evaluation. In this
Figure 1. Schematics of the proposed hybrid image scanning (HIS) system.
study, the camera calibration algorithm developed by Zhang25 is used, because the IR camera can also be assumed to follow the pin-hole camera model26

$$ s\tilde{m} = A[R\,|\,t]\tilde{M} \quad \text{with} \quad A = \begin{bmatrix} f_x & \mathrm{skew}_c f_x & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}, \quad [R\,|\,t] = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix} \quad (1) $$
where s is an arbitrary scale factor; m̃ = [x y z 1]^T and M̃ = [X Y Z 1]^T represent the camera and world coordinates, respectively; A and [R|t] are the camera's intrinsic and extrinsic parameters, respectively; in particular, f_x and f_y are the focal lengths, c_x and c_y are the principal points, and skew_c·f_x is the skew coefficient among the camera's intrinsic parameters; and r_ij and t_k are the rotation and translation components, respectively. The pin-hole camera model describes the mathematical relationship between the three-dimensional (3D) real-world coordinates and their projection onto the two-dimensional (2D) image plane. The calibration marker defines the 3D real-world coordinates. Without loss of generality, the calibration marker is assumed to lie on Z = 0. Then, a homography matrix H between the calibration marker and the image is defined as
$$ H = [h_1 \;\; h_2 \;\; h_3] = \lambda A [r_1 \;\; r_2 \;\; t] \quad (2) $$

where λ is an arbitrary scalar and r_1 and r_2 denote the first two columns of R. Given an image of the calibration marker, H can be estimated based on the maximum likelihood criterion.25
Assuming that the image points m_i are corrupted by Gaussian noise with zero mean and covariance matrix Λ_{m_i}, the maximum likelihood estimate of H can be obtained by minimizing the following objective function

$$ J = \sum_i (m_i - \hat{m}_i)^T \Lambda_{m_i}^{-1} (m_i - \hat{m}_i) \quad \text{with} \quad \hat{m}_i = \frac{1}{h_3^T \tilde{M}_i} \begin{bmatrix} h_1^T \tilde{M}_i \\ h_2^T \tilde{M}_i \end{bmatrix} \quad (3) $$
where h_i is the ith row of H. Zhang25 assumes that Λ_{m_i} = σ²I for all i, which is reasonable if points are extracted independently with the same procedure. Equation (3) then becomes a non-linear least-squares problem, and the non-linear minimization is conducted
Figure 2. Overview of the deep CNN–based crack evaluation algorithm. The I_C and V_C images are the distortion-calibrated IR and vision images obtained from the I_R and V_R images, respectively. The I_ROI and V_ROI images denote the time–spatial-integrated IR and vision images obtained through the TSI coordinate transformation, respectively. The I_P image represents the signal-processed IR images, and the V_D image is the resultant image obtained by the deep CNN process of the V_ROI image. The I_M images are the crack region images of the I_P image selected by matching the crack regions of V_D. Next, the crack existence of the I_M images is evaluated by the deep CNN process, and the I_D images include only crack information. Finally, the final image represents only crack features by mapping the I_D images onto the V_D image.
using the Levenberg–Marquardt algorithm (LMA).27 Non-linear optimization such as the LMA requires an initial guess, which can be obtained from the homogeneous equations: letting x = [h_1^T h_2^T h_3^T]^T, equation (3) can be rewritten as

$$ Lx = \begin{bmatrix} \tilde{M}^T & 0^T & -u\tilde{M}^T \\ 0^T & \tilde{M}^T & -v\tilde{M}^T \end{bmatrix} x = 0 \quad (4) $$
When n points are obtained in one image, L becomes a 2n × 9 matrix. As x is defined up to a scale factor, the solution is well known to be the right singular vector of L associated with the smallest singular value. Since L is numerically poorly conditioned, the results can be enhanced by performing a simple data normalization. Once H is estimated, the fact that r_1 and r_2 are orthonormal gives28

$$ h_1^T A^{-T} A^{-1} h_2 = 0 \quad (5) $$

$$ h_1^T A^{-T} A^{-1} h_1 = h_2^T A^{-T} A^{-1} h_2 \quad (6) $$
Each homography provides two basic constraints on the camera intrinsics. Three independent orientations are sufficient to solve for the camera intrinsics linearly. Once A is known from the closed-form solution,29 the extrinsic parameters can be readily obtained as

$$ r_1 = \lambda A^{-1} h_1, \qquad r_2 = \lambda A^{-1} h_2, \qquad t = \lambda A^{-1} h_3 \quad (7) $$
Once the intrinsic and extrinsic parameters of the IR and vision cameras are obtained using the calibration marker, the I_C and V_C images can be respectively obtained from the I_R and V_R images using equation (1), as shown in Figure 3.
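The calibration steps above (DLT estimation of H via equation (4), then the closed-form extrinsics of equation (7)) can be sketched in Python. This is a minimal illustration, not the authors' MATLAB implementation: the function names and interfaces are ours, and Zhang's data normalization and LMA refinement are omitted for brevity.

```python
import numpy as np

def estimate_homography(world_pts, image_pts):
    """DLT: stack the 2n x 9 system of equation (4); the solution is the
    right singular vector of L associated with the smallest singular value."""
    L = []
    for (X, Y), (u, v) in zip(world_pts, image_pts):
        M = [X, Y, 1.0]                          # marker assumed on Z = 0
        L.append(M + [0.0, 0.0, 0.0] + [-u * m for m in M])
        L.append([0.0, 0.0, 0.0] + M + [-v * m for m in M])
    _, _, Vt = np.linalg.svd(np.asarray(L))
    return Vt[-1].reshape(3, 3)                  # x is defined up to scale

def extrinsics_from_homography(H, A):
    """Equation (7): r1 = lam*A^-1 h1, r2 = lam*A^-1 h2, t = lam*A^-1 h3."""
    A_inv = np.linalg.inv(A)
    lam = 1.0 / np.linalg.norm(A_inv @ H[:, 0])  # lam normalizes r1 to unit length
    r1, r2, t = (lam * A_inv @ H[:, i] for i in range(3))
    if t[2] < 0:                                 # resolve the SVD sign ambiguity:
        r1, r2, t = -r1, -r2, -t                 # the marker lies in front of the camera
    return r1, r2, t
```

In practice r_3 = r_1 × r_2 completes the rotation matrix, and all parameters would then be refined by minimizing equation (3).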
TSI coordinate transform
Because the HIS system continuously moves along the predetermined scanning route, the physical inspection areas in the I_C and V_C images also change continuously as a function of time. Thus, it is difficult to analyze thermal wave propagation over the entire region of interest (ROI) using the I_C images. In this step, the I_C and V_C images are respectively transformed into the spatially integrated IR (I_ROI) and vision (V_ROI) images using the TSI coordinate transform. Here, the I_C and V_C images share the same ROI but have different spatial resolutions. Note that the I_C images are more complex to analyze than the V_C images, because they depend on the laser excitation parameters.
First, the analysis area exposed to laser excitation needs to be determined within the FOV because the line laser excitation may not cover the entire FOV. Assuming that the HIS system scans along the horizontal direction (the x-axis in Figure 4), the intensity profile of the line laser beam typically follows a Gaussian distribution19 along the y-axis, as shown in Figure 4. Thus, the analysis area can be determined by tracing the mid-points along the x-axis and their affected boundaries along the y-axis. Here, the mid-points can be selected using the mean m(x) of the Gaussian distribution, and the affected boundary can be obtained by calculating the 95% confidence interval of the Gaussian distribution. Note that the analysis area physically means the region where
Figure 3. Image distortion calibration using a calibration marker.
enough thermal energy is injected by the line laser beam to induce thermal wave propagation within the I_C images.
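The band-selection step can be illustrated with a short sketch. It is a simplified stand-in that uses the sample mean and standard deviation of the intensity profile rather than a formal Gaussian fit, and `analysis_band` is a hypothetical helper name, not from the paper.

```python
import math

def analysis_band(profile):
    """Estimate the laser-heated analysis band from one column of an I_C
    frame: treat the intensity profile along y as (approximately) Gaussian,
    compute its mean m and standard deviation s, and keep the ~95% band
    m - 2s < y < m + 2s."""
    total = sum(profile)
    ys = range(len(profile))
    m = sum(y * p for y, p in zip(ys, profile)) / total
    var = sum(((y - m) ** 2) * p for y, p in zip(ys, profile)) / total
    s = math.sqrt(var)
    return m - 2.0 * s, m + 2.0 * s
```

For a profile centered at pixel row 50 with a spread of 5 pixels, the returned band is roughly rows 40 to 60.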
Next, the determined analysis areas are spatially integrated as a function of time using the following TSI coordinate transform, making it possible to reconstruct the I_ROI and V_ROI images as shown in Figure 5

$$ \begin{bmatrix} x' \\ y' \\ t' \end{bmatrix} = \begin{bmatrix} 0 & 0 & v & 0 \\ 0 & 1 & 0 & -m(x) \\ 1/v & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ t \\ 1 \end{bmatrix} \quad (8) $$

where m(x) − 2σ < y < m(x) + 2σ, v is the scanning speed, and the prime denotes the transformed coordinate. The y'-axis remains essentially the original y-axis because only horizontal scanning is assumed in this study. The TSI coordinate transform is based on the physical phenomenon that a specific spatial point is heated and subsequently cooled by the line laser exposure as time passes. The x-axis data in the I_C images can be regarded as the thermal variation in the time domain at a specific point of the FOV, and the t-axis data in the I_C images can be considered as the thermal change in the spatial domain at a specific time. Thus, each datum can be converted into the new integrated ROI coordinates, that is, the x', y', and t' axes, using equation (8). The I_ROI images eventually appear as if the entire ROI were simultaneously and uniformly heated and subsequently cooled under spatially stationary conditions
Figure 4. Determination of the analysis area on the I_C images.
Figure 5. Overview of the TSI coordinate transform.
$$ \begin{bmatrix} x'' \\ y'' \end{bmatrix} = \begin{bmatrix} 0 & 0 & v \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x' \\ y' \\ t' \end{bmatrix} \quad (9) $$

Similarly, the V_C images can be reconstructed using equation (9). Since there is no laser excitation, the data are simply integrated in the spatial domain.
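Equations (8) and (9) amount to a per-sample coordinate swap, which can be sketched as below. The helper names are ours, and the re-centering of y on the laser mid-line m(x) reflects our reading of the transform's sign convention.

```python
def tsi_transform(x, y, t, v, m_x):
    """TSI coordinate transform of equation (8): map an (x, y, t) sample
    from a moving-camera I_C frame to integrated ROI coordinates
    (x', y', t'), with v the scanning speed and m_x the laser mid-line."""
    x_p = v * t      # elapsed scan time becomes the integrated spatial position
    y_p = y - m_x    # y re-centered on the laser mid-line m(x) (assumed sign)
    t_p = x / v      # image position becomes a local heating-time offset
    return x_p, y_p, t_p

def tsi_spatial(x_p, y_p, t_p, v):
    """Equation (9): collapse the transformed coordinates to the final
    spatially integrated position (x'', y'')."""
    return v * t_p, y_p
```

Note that composing the two maps returns the original image abscissa (x'' = v · (x / v) = x), which is what lets every scanned frame be stitched into one stationary ROI image.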
Phase mapping and spatial derivative
Since the time-varying I_ROI images cannot properly reveal multiple cracks, additional data processing procedures such as phase mapping and spatial differentiation are necessary for precise multiple-crack visualization (Figure 6). In particular, macrocracks typically overwhelm the response while microcracks are hidden, owing to the amplitude differences of the crack-induced features. The phase mapping process enables cracks of various sizes to be effectively visualized by normalizing the crack-induced features over all pixels of interest. First, all pixel values of the I_ROI images are transformed to complex values along the t' axis using the Hilbert transform30

$$ H(x', y', t') = \frac{P}{\pi} \int_{-\infty}^{\infty} \frac{I_{ROI}(x', y', \tau)}{t' - \tau} \, d\tau \quad (10) $$
where P denotes the Cauchy principal value of the integral and τ is the time variable of integration. Then, the instantaneous phase values φ(x', y', t') of each pixel are simply obtained as

$$ \varphi(x', y', t') = \arctan \frac{\mathrm{Im}[H(x', y', t')]}{\mathrm{Re}[H(x', y', t')]} \quad (11) $$
where Re and Im represent the real and imaginary parts, respectively. Equation (11) physically means that the responses are normalized between −π and π, making it possible to effectively visualize even hidden microcracks. However, not only the crack-induced features but also undesired noise components might be augmented by the phase mapping process. Thus, a denoising process is subsequently carried out. First, φ(x', y', t') is accumulated along the t' axis

$$ \xi(x', y') = \sum_{t'} \varphi(x', y', t') \quad (12) $$

where ξ(x', y') is the accumulated data along the t' axis. Then, the spatial derivative is applied to ξ(x', y') along the x' direction, which is the scanning direction assumed in this study

$$ F(x', y') = \frac{\partial \xi(x', y')}{\partial x'} \quad (13) $$

where F(x', y') is the spatial derivative value. The I_P image can be obtained by reassigning F(x', y') to the x'- and y'-coordinates. Finally, the I_P image visualizes multiple cracks without noise components under static conditions covering the entire ROI.
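The processing chain of equations (10) to (13) can be sketched with NumPy. A standard shortcut is used: the analytic signal (the original signal plus j times its Hilbert transform) yields the instantaneous phase of equation (11) directly via the complex argument. The function names are ours, not the paper's.

```python
import numpy as np

def analytic_signal(sig):
    """Analytic signal (sig + j*Hilbert{sig}) along axis 0 via the FFT,
    for a stack of shape (nt, ny, nx)."""
    n = sig.shape[0]
    spec = np.fft.fft(sig, axis=0)
    h = np.zeros(n)                      # one-sided spectral weighting
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spec * h[:, None, None], axis=0)

def phase_derivative_image(i_roi):
    """Equations (10)-(13): instantaneous phase per pixel (in [-pi, pi]),
    accumulation along t', then spatial derivative along the scan axis x'."""
    phase = np.angle(analytic_signal(i_roi))   # eqs. (10)-(11)
    accumulated = phase.sum(axis=0)            # eq. (12)
    return np.gradient(accumulated, axis=1)    # eq. (13) -> the I_P image
```

A spatially uniform input produces a zero derivative image, which is the mechanism by which uniform heating cancels out and only crack-induced phase discontinuities survive.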
The deep CNN process
Once the V_ROI image is reconstructed by the TSI coordinate transform, cracks are automatically extracted through the deep CNN process. In this study, a pre-trained deep CNN model, that is, GoogLeNet,31 is used for transfer learning. GoogLeNet is one of the well-known multi-layered CNN models designed for visual pattern classification. It consists of 22 layers including 9 inception modules as well as general convolutional layers, as shown in Figure 7. Here, each inception module places 1 × 1 convolutional layers before the larger convolutions to reduce the dimensionality of the feature maps. The detailed structure of the inception module is shown in Figure 7. To transplant the GoogLeNet model into concrete crack detection, the last two layers, that is, the softmax and classification layers, are retrained with a training set having two classification outputs, that is, intact and crack.
As for network training and validation, in total 20,000 images including concrete crack and non-crack (intact) images are prepared by augmenting and segmenting 200 raw images. Representative images are shown in Appendix 1. Among them, 9000 crack images and 9000 intact images are used for network training, and the remaining 1000 crack images and 1000 intact images
Figure 6. Phase mapping and spatial derivative.
are selected as the validation set. All the prepared images are then resized to 224 × 224 × 3 pixels, maintaining the aspect ratio in consideration of GoogLeNet's input layer. Here, the training and validation sets are strictly distinct from each other. Stochastic gradient descent with momentum is used as the solver, with 20 training epochs and an initial learning rate of 0.0001.23 Note that a high-performance graphics processing unit with 12 GB of memory and 3840 cores is employed to expedite the network training and classification processes.
Once the tailored deep CNN is trained, the V_ROI image is fed to the network for automated crack detection. To reduce false alarms, the V_ROI image is scanned by 16 different-sized masks without overlapping regions, as shown in Figure 8. The mask sizes range over 122–144 pixels and 163–192 pixels in the horizontal and vertical axes, respectively. The corresponding 16 probability maps are then obtained and averaged to establish a single probability map, as shown in Figure 8. Here, the probability map has the same resolution as the V_ROI image. Each pixel of the probability map has a value ranging from 0 to 1; the closer a pixel value is to 1, the higher the probability of crack existence within the pixel.
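The mask-scanning step can be sketched as below, with `classify` standing in for the retrained GoogLeNet; its interface (a tile in, a crack probability in [0, 1] out) is an assumption for illustration.

```python
import numpy as np

def averaged_probability_map(image, mask_sizes, classify):
    """Tile the V_ROI image with non-overlapping masks of each size, fill
    every covered pixel with the classifier's crack probability for its
    tile, and average the per-size maps into a single probability map."""
    h, w = image.shape[:2]
    maps = []
    for mh, mw in mask_sizes:
        prob = np.zeros((h, w))
        for r in range(0, h - mh + 1, mh):        # non-overlapping tiling
            for c in range(0, w - mw + 1, mw):
                prob[r:r + mh, c:c + mw] = classify(image[r:r + mh, c:c + mw])
        maps.append(prob)
    return np.mean(maps, axis=0)                  # single averaged map
```

Averaging over the 16 tilings smooths out the arbitrariness of any single tile boundary, so a crack cut in half by one mask grid is still scored highly by the others.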
Next, potential crack regions can be defined in the V_ROI image by selecting the probability values exceeding 20%, as shown in Figure 9. However, a lot of noise components are still included in the potential crack regions. Thus, a statistical denoising process is subsequently conducted for precise crack evaluation. A median filter is applied to the potential crack regions, and the probability density function of the corresponding pixel values is estimated by fitting a Weibull distribution, which is one of the extreme value distributions.32
Figure 7. Overview of the deep CNN architecture established using GoogLeNet.
Figure 8. The deep CNN process using the V_ROI image.
Then, the threshold value corresponding to a one-sided 99% confidence interval is established and applied to all the pixel values to construct the V_D image shown in Figure 9.
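The statistical threshold could be sketched as follows, assuming SciPy's `weibull_min` for the fit; the preceding median-filter step is omitted, and `weibull_threshold` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

def weibull_threshold(pixel_values, confidence=0.99):
    """Fit a Weibull distribution to the (median-filtered) pixel values of
    the potential crack regions and return the one-sided 99% confidence
    bound; pixels below it are suppressed when constructing the V_D image."""
    shape, loc, scale = stats.weibull_min.fit(pixel_values, floc=0.0)
    return stats.weibull_min.ppf(confidence, shape, loc=loc, scale=scale)
```

Because the Weibull is an extreme-value distribution, the 99% bound sits well above the bulk of the noise, so only the strongest (crack-like) responses survive the cut.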
Decision-making by image matching
Although the V_D image is able to automatically provide clear crack information, there can be a number of data-driven false alarms due to rough surface conditions, illuminance, contaminants, and so on. On the other hand, the I_P image can minimize the false alarms thanks to its robustness against such arbitrary disturbances.33 To reduce these false alarms, image matching between the V_D and I_P images is performed as shown in Figure 10. First, the pixel resolution of the V_D image is reduced to match that of the I_P image, because the IR camera typically has a much lower pixel resolution than the vision camera. Then, the potential crack regions are selected on the V_D image using rectangular masks. Subsequently, the corresponding crack regions are automatically marked with the same-sized rectangular masks on the I_P image. The marked crack regions of the I_P image are then extracted and resized to a resolution of 224 × 224 × 3 pixels, defined as the I_M images, for the deep CNN process as shown in Figure 10(a). Note that the rectangular mask locations might not be exactly matched between the V_D and I_P images due to their pixel resolution mismatch. Nevertheless, the subsequent CNN results using the I_M images are not significantly affected, because the I_M images are used for double-checking crack existence rather than for crack quantification. Next, the deep CNN process is repeated on the I_M images, except for mask scanning, as shown in Figure 10(b). After the deep CNN process, only crack images, coined the I_D images, are retained and mapped onto the V_D image by resizing them to the original rectangular mask size, as depicted in Figure 10(c) and (d). Finally, the final image shows only crack information by retaining the advantages of the vision and IR images, making it possible to reduce the vision data–driven false alarms and enhance the reliability of crack evaluation.
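The resolution-matched mask transfer amounts to rescaling rectangle coordinates between the two pixel grids; `map_mask_to_ir` is a hypothetical helper sketched under that reading.

```python
def map_mask_to_ir(mask, vision_res, ir_res):
    """Scale a rectangular crack mask (x, y, w, h) selected on the V_D
    image into the I_P image's pixel grid, since the IR camera has a much
    lower resolution than the vision camera."""
    sx = ir_res[0] / vision_res[0]   # horizontal scale factor
    sy = ir_res[1] / vision_res[1]   # vertical scale factor
    x, y, w, h = mask
    return (round(x * sx), round(y * sy), round(w * sx), round(h * sy))
```

With the experimental setup's native resolutions (3264 × 2448 vision, 640 × 512 IR), a vision-image mask shrinks by roughly a factor of five, and the rounding explains the small mask-location mismatch noted above.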
Figure 9. The statistical denoising process.
Figure 10. The decision-making procedure: (a) selection of the potential crack regions, (b) deep CNN process, (c) crack identification, and (d) crack mapping on the V_D image.
Experimental validation
Test setup
The HIS system is experimentally validated using a lab-scale concrete specimen with multiple cracks. Figure 11 shows the lab-scale test setup of the HIS system, with the concrete specimen located 700 mm away from the HIS system. The control computer sends out control signals to the CW line laser to generate the line laser beam with an invisible wavelength of 950 nm. The line laser beam, with a size of 5 × 200 mm², is targeted through a collimator onto the specimen. The peak intensity of the line laser beam is set to 25 mW/mm², which is chosen by considering the thermal conductivity of concrete (0.8 W m⁻¹ K⁻¹) and the scanning speed (23 mm/s). The corresponding thermal waves of the concrete specimen are recorded for 22 s by the IR camera (A65, FLIR) with a frame rate of 30 Hz, a spectral range of 3–5 μm, and a resolution of 640 × 512 pixels. The surface images of the concrete specimen are also recorded for 22 s by the vision camera (Hero 4, GoPro) with a frame rate of 30 Hz and a resolution of 3264 × 2448 pixels. Here, the IR and vision cameras share the same ROI. In this study, a scanning jig is used to simulate the UAV-mounted scanning mechanism as a first stage. Note that the HIS system will be mounted onto UAVs or unmanned robots instead of the scanning jig for practical usage, although its miniaturization and optimization are still underway. In particular, the weight of the CW line laser can be reduced to less than 1.5 kg using a lightweight ceramic cooler and packaging case.
The specially designed concrete specimen has dimensions of 1000 × 500 × 100 mm³ and a compressive strength of 103 MPa. The specimen is prepared by mixing cement, silica sand, fly ash, super-plasticizer, and water. The detailed mixing composition is summarized in Table 1. During the curing process, 150-mm-width acrylic slots are inserted to make artificial cracks. The generated artificial cracks are divided into two types, that is, macrocracks (width ≥ 500 μm) and microcracks (width < 500 μm), as defined in this study for convenience. In addition, a fake crack with 1 mm width is created using a pencil for the false-positive alarm test. The target ROI, with dimensions of 750 × 240 mm², is defined so that macro- and microcracks, the fake crack, and non-cracked areas are all included (Figure 12).
Test results
Image distortion calibration. Once the V_R and I_R images are obtained using the HIS system, the V_C and I_C images are obtained by conducting the calibration process using the calibration marker shown in Figure 13. Figure 13 shows representative V_R and V_C images, revealing that the images captured by the wide-angle vision camera are distorted and successfully calibrated.
TSI coordinate transform. As expected, the V_C and I_C images within the ROI change temporally and spatially, making them difficult to analyze as they are. To reconstruct the V_ROI and I_ROI images, the TSI coordinate transform is performed. First, the m(x) and σ values are computed on the I_C images. The determined ROI has a height of 240 mm, equivalent to 533 pixels in the I_C images. Subsequently, the V_ROI and I_ROI images are constructed using the TSI coordinate transform as defined in equations (8) and (9). Figure 14(a) and (b)
Table 1. Mixing composition of the concrete specimen (%).

Cement (type) | Silica sand | Fly ash | Super-plasticizer | Water
100 (III)     | 100         | 15      | 0.9               | 35
Figure 11. Lab-scale test setup of the HIS system.
shows the V_ROI image and the representative I_ROI image at 1 s after laser excitation at each spatial point, respectively. Even though the line laser is sequentially scanned along the x-axis within the ROI, it looks as if the entire ROI were simultaneously and uniformly heated in the I_ROI images. In reality, the laser heating is not perfectly uniform over the entire ROI, but the thermal wave generation is sufficient to analyze crack existence within the ROI. Although crack existence can be intuitively observed in the I_ROI images, they still change in the time domain and contain a number of unwanted noise components, as displayed in Figure 14(b).
Phase mapping and spatial derivative. To precisely evaluate multiple cracks, the phase mapping and spatial derivative processes are subsequently applied to the I_ROI images. It can be observed from Figure 15 that cracks of various sizes are clearly visualized in the I_P image without undesired noise components. In particular, microcracks are well visualized regardless of the amplitude difference, even when macro- and microcracks coexist in a single image.
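The phase mapping follows the Hilbert-transform route described in ref. 30: each pixel's temporal response is converted into an analytic signal whose angle gives the phase. The numpy sketch below is a minimal illustration of that computation; the (time, height, width) stack shape and the plain FFT-based analytic signal are assumptions for this sketch, not the exact pipeline of the HIS software.

```python
import numpy as np

def analytic_signal(x, axis=0):
    """FFT-based analytic signal (real part is x; imaginary part is its Hilbert transform)."""
    n = x.shape[axis]
    spectrum = np.fft.fft(x, axis=axis)
    # Zero the negative frequencies, double the positive ones
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    shape = [1] * x.ndim
    shape[axis] = n
    return np.fft.ifft(spectrum * h.reshape(shape), axis=axis)

def phase_map(stack):
    """Per-pixel instantaneous phase of a (time, height, width) thermal image stack."""
    return np.angle(analytic_signal(stack, axis=0))
```

For a unit cosine sampled over an integer number of cycles, the analytic signal has unit magnitude and a linearly increasing phase, which is why the phase image suppresses amplitude (heating non-uniformity) effects.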
The deep CNN process. To automatically detect cracks in the V_ROI image, the V_ROI image is fed to the pretrained deep CNN. Mask resolutions ranging from 122 × 163 to 144 × 192 pixels are used in this study.
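The patch-wise scanning and probability-map construction behind this step can be sketched as follows. Here `classify` is a stand-in for the trained GoogLeNet (returning a crack probability per patch), and the patch size and stride are illustrative values rather than the exact mask resolutions quoted above.

```python
import numpy as np

def sliding_patches(image, patch_h, patch_w, stride):
    """Enumerate overlapping patch boxes (y, x, h, w) covering the image."""
    H, W = image.shape[:2]
    boxes = []
    for y in range(0, H - patch_h + 1, stride):
        for x in range(0, W - patch_w + 1, stride):
            boxes.append((y, x, patch_h, patch_w))
    return boxes

def probability_map(image, boxes, classify):
    """Accumulate per-pixel crack probability from patch-level classifier scores."""
    prob = np.zeros(image.shape[:2])
    count = np.zeros(image.shape[:2])
    for (y, x, h, w) in boxes:
        p = classify(image[y:y + h, x:x + w])  # patch-level crack probability
        prob[y:y + h, x:x + w] += p
        count[y:y + h, x:x + w] += 1
    return prob / np.maximum(count, 1)  # average over overlapping patches
```

Averaging over overlapping patches is also how repeated CNN passes can be combined into a single smoothed probability map before thresholding.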
Figure 12. Concrete specimen with various cracks and the target ROI.
Figure 13. Representative V_R and V_C images at 10 s.
Figure 14. TSI coordinate transform results: (a) V_ROI image and (b) representative I_ROI image at 1 s.
The representative outputs of the CNN process are displayed in Appendix 2. The entire deep CNN process is repeated three times, and the resulting probability maps are averaged to reduce errors. Based on the probability map, the potential crack regions in the V_ROI image can be identified. Subsequently, the statistical denoising process is conducted, and the V_D image is obtained by mapping the potential crack locations onto the V_ROI image, as shown in Figure 16(a). Here, the rectangular masks indicate the possible crack regions. The performance of the deep CNN process using the vision image can be evaluated by calculating reliability indices such as precision and recall:

Precision = Tp / (Tp + Fp)    (14)

Recall = Tp / (Tp + Fn)    (15)
where Tp, Fp, and Fn represent the true positive, false
positive, and false negative, respectively. The precision
and recall values are computed as 59.84% and 97.26%,
respectively. The precision value, which reflects false-positive alarms, is relatively low because the fake crack is recognized by the deep CNN process as a real crack. On the other hand, the recall value, which reflects false-negative alarms, is relatively high, meaning that cracks of various sizes are successfully detected by the vision-based deep CNN process, thanks to the well-controlled laboratory conditions.
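Equations (14) and (15) can be computed directly from boolean detection masks. The small numpy sketch below illustrates this; the example mask arrays are illustrative, not the experimental data.

```python
import numpy as np

def precision_recall(pred, truth):
    """Precision and recall from boolean predicted/ground-truth crack masks."""
    tp = np.sum(pred & truth)    # true positives: detected real cracks
    fp = np.sum(pred & ~truth)   # false positives: detections with no crack
    fn = np.sum(~pred & truth)   # false negatives: missed cracks
    return tp / (tp + fp), tp / (tp + fn)
```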
Decision-making by image matching. In order to reduce the vision data–driven false alarms, the image matching process is performed. The potential crack regions on the I_P image are selected by matching the crack locations on the V_D image, as shown in Figure 16. Then, the extracted I_M images are all tested by the deep CNN process again. Only crack images are saved as the I_D images and used for mapping onto the V_D image, resulting in the final image; non-crack images identified by the deep CNN process are discarded, as shown in Figure 10. The image matching results show that the fake crack outlined by the dash-single dotted box (yellow in color) is clearly filtered out on the I_D image, as shown in Figure 16(b). Finally, Figure 17 shows the final image containing only crack information. To compare the crack detectability of the V_D and final images, Table 2 summarizes the precision and recall indices. Both indices are clearly enhanced when the hybrid images, including vision and IR images, are used for the deep CNN process. In particular, the precision index is remarkably increased because the fake crack is filtered out. The recall index is also increased, meaning that the final image provides higher reliability for crack detection than the V_D image.

Figure 15. The I_P image.
Figure 16. The deep CNN process results with crack masks: (a) V_D image and (b) I_D images mapped on the I_P image.

Table 2. Comparison of crack detectability between the vision and final images (%).

            Vision image (V_D image)   Final image (V_D and I_D images)
Precision   59.84                      98.72
Recall      97.26                      99.23
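The matching-and-reclassification step above can be sketched as a simple filter over the candidate boxes. In this sketch, `classify_ir` is a stand-in for the second deep CNN pass on the I_P patches, and the 0.5 decision threshold is an illustrative assumption.

```python
import numpy as np

def filter_boxes(boxes, ir_image, classify_ir, threshold=0.5):
    """Keep only candidate crack boxes that the IR-side classifier confirms."""
    kept = []
    for (y, x, h, w) in boxes:
        patch = ir_image[y:y + h, x:x + w]   # candidate region on the IR phase image
        if classify_ir(patch) >= threshold:  # confirmed as a crack on the second pass
            kept.append((y, x, h, w))
    return kept
```

Boxes that the vision CNN flagged but the IR classifier rejects (e.g. the fake crack) are simply dropped, which is what raises the precision of the final image.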
Integration of the HIS system with UAVs
Since the HIS system used in the preliminary indoor tests is relatively large and heavy, its miniaturization and packaging are now in progress to reduce the physical size and weight for mounting onto UAVs. The modified HIS system will be mounted onto the sticking-type UAV shown in Figure 18. The sticking-type UAV can inspect a target structure by sticking to the target surface, which effectively reduces movement and vibration during data acquisition of the HIS system. In addition, the effective working distance between the HIS system and the target surface can be maintained for precise crack evaluation. Furthermore, the sticking-type UAV is robust against unexpected turbulence around large civil infrastructures. The corresponding outdoor tests will be performed on long-span cable-stayed bridges in South Korea.
Conclusion
This study presented deep learning–based concrete
crack detection using hybrid images. An HIS system
combining vision and IR thermography images was
newly developed for unmanned vehicle or robot-
mounted autonomous crack inspection of large-scale
concrete structures. Then, a deep CNN–based autono-
mous crack detection algorithm using the hybrid
images was proposed. The proposed system and algo-
rithm were experimentally validated using a lab-scale
concrete specimen with cracks of various sizes as the
very first stage of this concept. The test results revealed that macro- and microcracks are automatically and successfully visualized while minimizing false alarms. In particular, false-negative and false-positive alarms were remarkably reduced using the hybrid images compared to using only the vision image, resulting in improved crack detection reliability.
As a follow-up study, the proposed HIS system is being miniaturized and optimized for mounting onto UAVs. Outdoor tests under varying environmental and operational conditions will then be thoroughly carried out, and the system will be applied to real civil infrastructures such as bridges, dams, and buildings. Additional real-world data, including shadows, dust on the surface, and rust, will be used for further training before real applications. The proposed technique can become a promising alternative for crack inspection of large civil infrastructures by minimizing inspection time, false alarms, and unreliable expert intervention.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with
respect to the research, authorship, and/or publication of this
article.
Funding
The author(s) disclosed receipt of the following financial sup-
port for the research, authorship, and/or publication of this
article: The research described in this article was financially
supported by a grant (17SCIP-C116873-02) from the Construction Technology Research Program funded by the Ministry of Land, Infrastructure and Transport of the Korean government and by the Basic Science Research Program of the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (2015R1C1A1A01052625).

Figure 17. The final image.
Figure 18. Sticking-type UAV with the HIS system: (a) schematic design and (b) the prototype model.
References
1. Chang P, Flatau A and Liu S. Review paper: health mon-
itoring of civil infrastructure. Struct Health Monit 2003;
2: 257–267.
2. Maheshwari M, Annamdas V, Pang J, et al. Crack moni-
toring using multiple smart materials; fiber-optic sensors
& piezo sensors. Int J Smart Nano Mater 2017; 8: 41–55.
3. Dumoulin C and Deraemaeker A. Real-time fast ultraso-
nic monitoring of concrete cracking using embedded
piezoelectric transducers. Smart Mater Struct 2017; 26:
104006.
4. Ham S, Song H, Oelze M, et al. A contactless ultrasonic
surface wave approach to characterize distributed crack-
ing damage in concrete. Ultrasonics 2017; 75: 46–57.
5. Menendez E, Victores J, Montero R, et al. Tunnel struc-
tural inspection and assessment using an autonomous
robotic system. Automat Constr 2018; 87: 117–126.
6. Hlava Z. Detection of crack in a concrete element by
impact-echo method. Ultrasound 2009; 64: 12–16.
7. Li B, Ushiroda K, Yang L, et al. Wall-climbing robot for
non-destructive evaluation using impact-echo and metric
learning SVM. Int J Intell Robot Appl 2017; 1: 255–270.
8. Han B, Zhang K, Yu X, et al. Electrical characteristics
and pressure-sensitive response measurements of carboxyl
MWNT/cement composites. Cement Concrete Compos
2012; 34: 794–800.
9. Chen P and Chung D. Carbon fiber reinforced concrete
for smart structures capable of non-destructive flaw
detection. Smart Mater Struct 1993; 2: 22–33.
10. McCormick N and Lord J. Digital image correlation.
Mater Today 2010; 13: 52–54.
11. Helm J. Digital image correlation for specimens with mul-
tiple growing cracks. Exp Mech 2008; 48: 753–762.
12. Jahanshahi M, Masri S, Padgett C, et al. An innovative
methodology for detection and quantification of cracks
through incorporation of depth perception. Mach Vision
Appl 2013; 24: 227–241.
13. Koch C, Paal S, Rashidi A, et al. Achievements and chal-
lenges in machine vision-based inspection of large con-
crete structures. Adv Struct Eng 2014; 17: 303–318.
14. Ho H, Kim K, Park Y, et al. An efficient image-based
damage detection for cable surface in cable-stayed
bridges. NDT&E Int 2013; 58: 18–23.
15. Kim H, Ahn E, Cho S, et al. Comparative analysis of
image binarization methods for crack identification in
concrete structures. Cement Concrete Res 2017; 99: 53–61.
16. Zhong X, Peng X, Yan S, et al. Assessment of the feasi-
bility of detecting concrete cracks in images acquired by
unmanned aerial vehicles. Automat Constr 2018; 89:
49–57.
17. Ellenberg A, Kontsos A, Moon F, et al. Bridge related
damage quantification using unmanned aerial vehicle ima-
gery. Struct Control Health Monit 2016; 23: 1168–1179.
18. An YK, Yang J, Hwang S, et al. Line laser lock-in ther-
mography for instantaneous imaging of cracks in semi-
conductor chips. Opt Laser Eng 2015; 73: 128–136.
19. Yang J, Hwang S, An YK, et al. Multi-spot laser lock-in
thermography for real-time imaging of cracks in semicon-
ductor chips during a manufacturing process. J Mater
Process Tech 2016; 229: 94–101.
20. Zhang A, Wang K, Li B, et al. Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network. Comput-Aided Civ Inf 2017; 32: 12297.
21. Chen FC and Jahanshahi R. NB-CNN: deep learning-based crack detection using convolutional neural network and Naïve Bayes data fusion. IEEE T Ind Electron 2017; 65: 17519431.
22. Xu Y, Bao Y, Chen J, et al. Surface fatigue crack identifi-
cation in steel box girder of bridges by a deep fusion con-
volutional neural network based on consumer-grade
camera images. Struct Health Monit. Epub ahead of print
2 April 2018. DOI: 10.1177/1475921718764873.
23. Cha YJ, Choi W and Büyüköztürk O. Deep learning-based crack damage detection using convolutional neural networks. Comput-Aided Civ Inf 2017; 32: 361–378.
24. Kim H, Ahn E, Shin M, et al. Crack and noncrack classi-
fication from concrete surface images using machine
learning. Struct Health Monit. Epub ahead of print 23
April 2018. DOI: 10.1177/1475921718768747.
25. Zhang Z. A flexible new technique for camera calibration. IEEE T Pattern Anal Mach Intell 2000; 22: 1330–1334.
26. Vidas S, Lakemond R, Denman S, et al. A mask-based
approach for the geometric calibration of thermal-infrared
cameras. IEEE T Instrum Meas 2012; 61: 1625–1635.
27. More J. The Levenberg-Marquardt algorithm: implemen-
tation and theory. In: Watson GA (ed.) Lecture notes in
mathematics, vol. 630. New York: Springer, 1977.
28. Kanatani K, Ohta N and Kanazawa Y. Optimal homo-
graphy computation with a reliability measure. IEICE T
Inform Syst 2000; E83-D: 1369–1374.
29. Brown D. Close-range camera calibration. Photogram
Eng 1971; 37: 855–866.
30. Hahn S. Hilbert transforms in signal processing. Norwood, MA: Artech House, 1996.
31. Szegedy C, Liu W, Jia Y, et al. Going deeper with convo-
lutions. In: Proceedings of the IEEE conference on com-
puter vision and pattern recognition (CVPR), Boston,
MA, 7–12 June 2015. New York: IEEE.
32. An YK, Park B and Sohn H. Complete noncontact laser
ultrasonic imaging for automated crack visualization in a
plate. Smart Mater Struct 2013; 22: 025022.
33. An YK, Kim J and Sohn H. Laser lock-in thermography
for detection of surface-breaking fatigue cracks on
uncoated steel structures. NDT&E Int 2014; 65: 54–63.
Appendix 1. Representative training images.
Appendix 2. Representative outputs of the CNN process.