Automation in Construction, 2019 (99), 114-124
Deep learning-based automatic volumetric damage quantification using a
depth camera
Gustavo H. Beckmana, Dimos Polyzoisb, Young-Jin Chac*
a M.Sc. Student, Department of Civil Engineering, University of Manitoba, Winnipeg, MB, Canada
b Professor, Department of Civil Engineering, University of Manitoba, Winnipeg, MB, Canada
c Assistant Professor, Department of Civil Engineering, University of Manitoba, Winnipeg, Canada,
*Corresponding Author: young.cha@umanitoba.ca
Abstract:
A depth camera or 3-dimensional scanner has traditionally been used as a sensor to quantify identified concrete spalling damage in terms of volume. However, to quantify the concrete spalling damage automatically, the first step is to detect (i.e., identify) the concrete spalling. Multiple spots of spalling may occur within a single structural element or across multiple structural elements. However, there is, as of yet, no method to detect concrete spalling automatically using
deep learning methods. Therefore, in this paper, a faster region-based convolutional neural
network (Faster R-CNN)-based concrete spalling damage detection method is proposed with an
inexpensive depth sensor to separately quantify multiple instances of spalling on the same surface simultaneously and to consider multiple surfaces of structural elements. A database composed of
1,091 images (with 853 × 1,440 pixels) labeled for volumetric damage is developed, and the deep
learning network is then modified, trained, and validated using the proposed database. The damage
quantification is automatically performed by processing the depth data, identifying surfaces, and
isolating the damage after merging the output from the Faster R-CNN with the depth stream of the
sensor. The trained Faster R-CNN presented an average precision (AP) of 90.79%. Volume
quantifications show a mean precision error (MPE) of 9.45% when considering distances from 100
cm to 250 cm between the element and the sensor. Also, an MPE of 3.24% was obtained for
maximum damage depth measurements across the same distance range.
Keywords: Convolutional neural network; deep learning; depth sensor; concrete spalling; volume
quantification.
1. Introduction
With the aging of concrete, structural elements in service conditions deteriorate, leading to the
failure of structures. Like any other material, concrete has a lifespan and is susceptible to damage, which can result from external agents such as overloading, excessive usage, or simply material
degradation. Structural condition assessments constitute a necessary means by which to evaluate
and keep track of structural capabilities.
To overcome the limitations of visual inspections, which depend on the judgement and
ratings of trained inspectors, the usage of sensors and other data-acquisition techniques along with
computational methods can be implemented [1-3]. Strain gauge extensometers [4], accelerometers
[5], and computer simulations [6] are some of the tools that can be used to assess structural integrity
under certain scenarios, identifying or predicting damage. Despite their accuracy, contact sensor-based data acquisition networks are often expensive and difficult to manipulate, especially when
considering long-term monitoring.
Computer-vision sensors present an alternative for data acquisition. By extracting and
processing data from images, systems can be designed to operate from a distance. Two-
dimensional (2D) imaging and high-speed video have been used to determine parameters such as
displacements [7-8] and surface deformation [9], to detect and quantify potential structural hazards
[10-11], and to acquire the dynamic properties of structures in motion [12].
The detection and quantification of potential structural hazards through 2D imaging can
benefit from machine learning implementations due to the distinct visual characteristics
corresponding to each kind of damage. Chen et al. [13] implemented a support vector machine
(SVM)-based method for corrosion detection on steel bridges. Cha et al. [10] proposed a
convolutional neural network (CNN) capable of effectively detecting cracks. Region-based CNNs
have also proven to be a useful tool for fast, accurate object detection and localization [14-15], and
their concept was successfully brought into the structural health monitoring field, enabling the
detection of multiple types of structural damage [16].
With the incorporation of depth into 2D imaging, three-dimensional (3D) analysis has recently been introduced into the scope of studies regarding structural health monitoring. Whereas 2D images can lead to the detection and classification of damage
such as cracks, 3D data open opportunities for other measurements to be performed, allowing the
detection and quantification of volumetric damage, such as delamination, spalling, and other
material losses caused by external agents.
There are not many methodologies involving 3D imaging for concrete health assessment reported in
the literature, and most of the published articles make use of the structure-from-motion (SfM)
approach [17], where 2D RGB images, in conjunction with scaling parameters and other
properties, are merged together to generate a 3D virtual model. Torok et al. [18] developed an
SfM-based methodology that consisted of data collection from a robotic platform in a post-disaster
scenario, focusing on the 3D reconstruction of the elements, damage recognition, and geometrical
characteristics.
Another approach to obtaining 3D data is to use special cameras that contain all red, green,
and blue channels and, additionally, an extra depth channel (RGB-D), containing both color and
depth sensors on the same device. When considering pavement health assessments, several authors
proposed methodologies to detect and quantify potholes through a 3D point cloud extracted from
inexpensive consumer depth cameras. A model for pothole volume quantification utilizing both
RGB and depth data derived from the Microsoft Kinect V1 sensor was proposed [19-22]. However,
these studies implemented damage quantification methodologies using a depth camera over
damage that was already known to the operator. Moreover, the above literature proposed fixed
depth camera setups, where the distance between the sensor and the object is previously measured
and used within the algorithm. This approach restricts their usage in various scenarios and makes
it impractical for them to be used in any automated implementations.
The majority of concrete elements, such as beams and slabs, have flat surfaces. Other construction elements, such as walls, can also be flat; thus, it is necessary to differentiate concrete surfaces from
other types of flat surfaces that may be present in the analyzed frame. Furthermore, the existing
quantification methods cannot identify spalling or any other type of volumetric damage
independently and will only provide the amount of volumetric loss relative to the analyzed surface.
Therefore, a damage detection and localization method becomes necessary to allow for a fully
automated depth camera damage assessment system.
To overcome the limitations mentioned above, we propose a fully automatic damage detection,
localization, and quantification method via the integration of an inexpensive consumer RGB-D
sensor, Microsoft Kinect V2, and Faster R-CNN to detect damage. The proposed method does not
require any input of premeasured distance between the analyzed element and the RGB-D sensor
and is able to operate in a fully mobile setup. The remainder of this paper is composed of section
2: methodology, section 3: experimental procedures, section 4: results and analysis, and section 5:
conclusion.
2. Methodology
The main objective of this study is to automatically detect, localize, and accurately quantify the
amount of concrete spalled from a concrete element using an inexpensive consumer depth camera
through the integration of an advanced region-based deep convolutional neural network (R-CNN)
[23]. Its flowchart is shown in Figure 1. Overall, the algorithm is divided into four steps: 1) RGB-
D sensor-based RGB image and depth data collection; 2) the detection and localization of damage
using a deep, faster R-CNN-based method, providing a bounding box for the detected damage (the
details are explained in section 2.2 and its subsections); 3) the segmentation of the element’s
surface utilizing the bounding box provided from step 2 and point cloud information; and 4) the
segmentation and quantification of the detected damage extracted from the surface provided from
step 3 and point cloud information, as shown in Figure 1.
Figure 1. Algorithm flow chart
2.1 RGB-D sensor
The Microsoft Kinect V2 was utilized as a depth camera. Its technical parameters are listed in
Table 1. It provides RGB image data and point cloud data with resolutions of 1,920 × 1,080 pixels and 512 × 424 pixels, respectively. The utilized RGB-D camera implements the time-of-flight
(ToF) concept, which consists of a device with both an infrared (IR) emitter and an IR sensor. The
IR emitter projects the infrared light onto the subject, and the light bounces off of the surfaces and
back to the device, where the IR sensor captures the reflected light. The distance between the IR
sensor and the IR emitter is known, thus the device is able to determine 3D coordinates for each
of the sensor’s pixels based on how long the IR light took to travel from the emitter to the sensor.
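As a brief numerical illustration of this time-of-flight principle (not taken from the paper), the depth of a surface point follows directly from the round-trip travel time of the IR light:

```python
# Time-of-flight depth from round-trip travel time (illustrative only).
C = 299_792_458.0  # speed of light in m/s

def tof_depth(round_trip_time_s: float) -> float:
    """Depth = (speed of light * round-trip time) / 2."""
    return C * round_trip_time_s / 2.0

# Example: a round trip of about 13.3 ns corresponds to roughly 2 m.
print(tof_depth(13.3e-9))  # ~1.99 m
```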
2.1.1 Point cloud acquisition
MATLAB fully supports the Microsoft Kinect V2 interface and provides accessible functions to
acquire raw depth data. The depth device outputs a 3D point mesh, which is already converted into
local coordinates using the sensor’s center pixel as its origin. The sensor’s internal algorithm
assigns not-a-number (NaN) values to depth pixels not acquired properly by the sensor on all three
arrays (x, y, and z). Through these implementations, it is possible to isolate the coordinates of each
point. The overall depth frame is 512 pixels wide and 424 pixels tall.
Due to lens distortion and vignetting, the depth values at the extremes of the frame are
susceptible to greater error rates. The reviewed literature [22] recommends using the central 300 × 300 pixels of the depth sensor to avoid this issue and keeping working distances within the range of 1.0 m to 2.5 m from the sensor plane to the subject to obtain maximum precision. Based on this, these same settings were used in data acquisition.
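A minimal NumPy sketch of this acquisition step is shown below, assuming the depth frame is already available as a 424 × 512 array of z values in metres; the variable and function names are illustrative and do not correspond to the authors' MATLAB code.

```python
import numpy as np

def crop_center(depth_frame: np.ndarray, size: int = 300) -> np.ndarray:
    """Keep only the central size x size window of the 424 x 512 depth frame."""
    rows, cols = depth_frame.shape
    r0 = (rows - size) // 2
    c0 = (cols - size) // 2
    return depth_frame[r0:r0 + size, c0:c0 + size]

# Example with a synthetic frame; invalid pixels are NaN, as output by the sensor.
depth = np.full((424, 512), np.nan)
depth[100:300, 100:400] = 1.5            # a surface roughly 1.5 m away
central = crop_center(depth)             # central 300 x 300 window used for analysis
valid = central[~np.isnan(central)]      # discard pixels the sensor failed to read
in_range = valid[(valid >= 1.0) & (valid <= 2.5)]  # 1.0-2.5 m working range
```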
2.1.2 Noise filtering
To reduce the amount of noise on the depth image, a median filter was implemented. The median filter covered a 5 × 5 pixel area. The median filter works by replacing each pixel in the depth map with the median value of its neighbors. This assures that the denoised depth map does not contain
any value that did not previously exist within the dataset.
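A possible implementation of this 5 × 5 median filter is sketched below with SciPy; the authors' actual MATLAB routine is not given in the paper.

```python
import numpy as np
from scipy.ndimage import median_filter

# 5 x 5 median filter: each depth value is replaced by the median of its
# neighbourhood, so no value appears that was not already in the data.
noisy_depth = 1.5 + 0.01 * np.random.randn(424, 512)   # synthetic noisy frame (m)
denoised_depth = median_filter(noisy_depth, size=5)
```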
Table 1. Microsoft Kinect V2 characteristics
RGB camera: 1,920 × 1,080 pixels
Depth camera: 512 × 424 pixels
Max depth distance: 4.5 m
Min depth distance: 0.5 m
Vertical field of view: 60°
Horizontal field of view: 70°
2.2. Damage detection using Faster R-CNN
To detect and localize the concrete spalling damage on various concrete beams, the Faster R-CNN-
based damage detection method was adopted [16]. To detect concrete spalling, the
Faster R-CNN-based method only uses the RGB image data provided by the RGB-D sensor. The
Faster R-CNN was originally developed by Ren et al. [15] to overcome the limitations of
traditional R-CNN [14] and Fast R-CNN [14] for object detection and localization. The original
CNN does not provide any bounding box or localization information on the detected object within
an image, but the R-CNN and Fast R-CNN provide bounding boxes within an image to localize
the detected object. However, the two traditional region-based methods have disadvantages in computational efficiency and running time when providing the bounding box.
Even though the Fast R-CNN was improved in terms of computational cost and running
time by adopting a selective search method compared to R-CNN, it is still not able to provide real-
time detection and localization of the object. Ren et al. [15] developed a region proposal network
(RPN) to improve the computational efficiency of localizing the detected object. The developed RPN is integrated into the existing Fast R-CNN by sharing the same CNN architecture, as shown in
Figure 2 to reduce computational cost.
Figure 2. Faster R-CNN-based damage detection
2.2.1. Region proposal network (RPN)
The goal of the RPN is to take an image as input and output region proposals, as shown in Figure
2. Each of the proposals has a score corresponding to the membership of a set of classes against
the background. To reduce computational cost, the Fast R-CNN method and the RPN share a
common set of convolutional layers.
The input image passes through a convolutional network that outputs a set of convolutional
feature maps. Then, a sliding window runs over these generated convolutional feature maps. At
each sliding-window location, multiple region proposals are predicted based on a predefined
number of anchor boxes. An anchor is a reference box with a set of scales and aspect ratios and is
centered at the sliding window in question. The code uses three scales and three aspect ratios by
default, which results in nine anchors for each sliding position. A positive or negative classification
is computed for each anchor considering the intersection-over-union (IoU) between the analyzed
anchor and ground-truth bounding boxes on the image. Anchors with overlaps greater than 70%
are classified as positive (1), whereas overlaps smaller than 30% are classified as negative (0).
Anchors with overlaps between 30% and 70% are not considered for the training objective. The
anchors along with their respective classifications are fed into a sliding convolutional layer,
followed by a rectified linear unit (ReLU) layer and then onto a fully connected layer.
The features output by the fully connected layer are fed into a classification layer (softmax)
and into a regressor layer. The regressor layer determines the bounding boxes for the predictions,
and the classifier outputs a probability score varying from 0 to 1, indicating whether or not the
proposed bounding box contains an object (score of 1) or background (score of 0). The proposed
bounding boxes are referred to as region proposals.
The network is trained end-to-end for both classification and regressor layers by mini-batch
sampling. Each mini-batch comes from a single image with several positive and negative anchors.
The loss function of a mini-batch is calculated for each image by randomly sampling 256 anchors,
where at least 128 anchors are positive in order to reduce negative bias toward the loss function.
Both the IoU percentage and the final classification score for each region proposal are considered
when calculating the loss values. All new layers are randomly initialized from a zero-mean Gaussian distribution with a standard deviation of 0.01. Shared convolutional layers are initialized by pre-
training a model for ImageNet classification.
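The anchor labelling rule described in this subsection can be sketched as follows; this is an illustrative NumPy reconstruction of the IoU computation and the 70%/30% thresholds, not the authors' training code.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_anchor(anchor, ground_truth_boxes):
    """1 = positive, 0 = negative, -1 = ignored during training."""
    best = max(iou(anchor, gt) for gt in ground_truth_boxes)
    if best > 0.7:
        return 1
    if best < 0.3:
        return 0
    return -1

# Example: one ground-truth spalling box and two candidate anchors.
gt = [(100, 100, 200, 200)]
print(label_anchor((105, 105, 205, 205), gt))  # high overlap -> 1
print(label_anchor((300, 300, 400, 400), gt))  # no overlap  -> 0
```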
2.2.2. Fast R-CNN
The Fast R-CNN network takes as input the set of precomputed region proposals output from the
RPN method and an entire image, as shown in Figure 2. These proposals are then combined with
the original feature map output from the initial CNN, resulting in regions of interest (RoI). A fixed-
length feature vector is extracted from the resulting feature map by a RoI pooling layer for each
object proposal. These vectors are used as input into fully connected layers, followed then by
regressor and softmax layers, outputting the location and classification of bounding boxes.
Therefore, through this process, the Fast R-CNN provides more fine-tuned information on
bounding boxes and their classification through fully connected layers of the Fast R-CNN.
2.2.3. ZF-Net for Faster-RCNN
Ren et al. [15] investigated two architecture models when implementing the Faster R-CNN
methodology: the Zeiler and Fergus model, also known as ZF-Net [24] and the Simonyan and
Zisserman model, also known as VGG-16 [25]. Of the two, ZF-Net has the faster training and testing speed. The ZF-Net was first introduced as comprising five convolutional layers
(CONV), two fully connected layers (FC), and a softmax layer for output, winning the Large-scale
Visual Recognition Challenge 2013 (ILSVRC2013). The original ZF-Net was modified by the
authors to implement Faster R-CNN. The new architectures are described in Tables A.1 and A.2
for both RPN and Fast R-CNN stages, as shown in Figures 3 and 4. To share computational power,
the RPN and Fast R-CNN stages share the outputs from layers 1 through 9.
Figure 3. Modified ZF-Net for RPN
Figure 4. Modified ZF-Net for Fast R-CNN
To enable the RPN, the authors replaced the last max-pooling layer and fully connected layer
with a sliding convolutional layer, which is followed by a fully connected layer. The softmax layer
is replaced by a softmax and regressor layer. The convolutional layers are followed by a ReLU activation function in order to provide non-linearity to the model [26], allowing each sliding
window to be mapped to a lower-dimensional feature, and feeding it into the softmax and regressor
layer. For Fast R-CNN, the last max pooling layer of the original network is replaced by an RoI
pooling layer. Dropout layers were also added between the first three fully connected layers to
prevent overfitting of the model. As with the RPN model, the softmax layer was replaced by a
softmax and regressor layer.
2.3. Surface segmentation
To quantify the volume of the spalling, the standard surface of the detected spalling should be
identified. A plane fitting implementation using the random sample consensus (RANSAC)
algorithm is performed [19]. The RANSAC algorithm consists of a random selection of points to
fit in a plane according to predefined parameters, such as a reference vector, a distance threshold,
and the maximum angular distance between each point’s normal and the reference vector. The
algorithm fits a plane to the greatest number of similar data points according to the assigned
threshold values. However, concrete elements, such as columns and beams, can often be found
offset from other elements, such as walls or slabs, and thus can result in an incorrectly fitted surface for
the algorithm since these slab or wall elements usually possess surface areas significantly larger
than common structural elements. To overcome this issue, the plane of the bounding box generated
by the Faster R-CNN is used for the RANSAC algorithm to fit a plane.
In the presented scenarios, a reference vector parallel to the sensor’s normal is adopted
along with a maximum angular distance. This eliminates the requirement for the sensor to be placed perfectly parallel to the surface plane. A distance threshold of 5 mm was also used as a
parameter for the RANSAC function. Every data point that does not satisfy the assigned thresholds
is considered an outlier.
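A minimal RANSAC plane-fitting loop following the parameters described above (5 mm distance threshold, a reference vector along the sensor's viewing axis, and a maximum angular distance) might look as follows; it is an illustrative NumPy sketch rather than the authors' implementation.

```python
import numpy as np

def ransac_plane(points, dist_thresh=0.005, ref_vector=(0, 0, 1),
                 max_angle_deg=30.0, iterations=500, seed=0):
    """Fit a plane a*x + b*y + c*z + d = 0 to an (N, 3) point cloud with RANSAC."""
    rng = np.random.default_rng(seed)
    ref = np.asarray(ref_vector, dtype=float)
    ref /= np.linalg.norm(ref)
    best_inliers, best_model = None, None
    for _ in range(iterations):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:
            continue                                  # degenerate (collinear) sample
        normal /= norm
        # Reject candidate planes whose normal deviates too far from the reference.
        angle = np.degrees(np.arccos(np.clip(abs(normal @ ref), -1.0, 1.0)))
        if angle > max_angle_deg:
            continue
        d = -normal @ sample[0]
        dist = np.abs(points @ normal + d)            # point-to-plane distances
        inliers = dist < dist_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (*normal, d)
    return best_model, best_inliers                   # (a, b, c, d), boolean mask

# Example: a noisy plane at z = 1.5 m plus a region of "damage" points behind it.
rng = np.random.default_rng(1)
xy = rng.uniform(-0.5, 0.5, (2000, 2))
z = 1.5 + rng.normal(0, 0.001, 2000)
z[:100] += 0.03                                       # 3 cm deep spalling region
cloud = np.column_stack([xy, z])
model, inlier_mask = ransac_plane(cloud)
outliers = cloud[~inlier_mask]                        # candidate damage points
```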
The reviewed literature identified the desired damaged surfaces based on the fixed, initial
parameters acquired from external measurements of the implemented setup. Having a fixed-
distance system is not practical for mobile, unmanned applications. Furthermore, the reviewed
methodologies did not consider the possibility of having more than one surface on the frame,
regardless of their integrity state [19-21]. The proposed surface segmentation routine fits a plane
regardless of the distance between the sensor and the damaged surface as long as it is located
within the equipment’s working distance, relying solely on the data output from the sensor, without
the need of any other measurement of any kind.
2.4. Damage segmentation
From the 3D depth cloud, the outliers of the fitted plane are then analyzed to determine whether or not they are located within the limits of the inliers of the same plane. If so,
the outliers are classified as damage. This process simultaneously filters possible NaN values from
the depth data.
To allow the separate quantification of multiple damage within the same surface, a
hierarchical cluster analysis routine is implemented to segment the outliers as individual damage.
The routine first calculates the Euclidean distance (Eq. 1) between each pair of points p and q within the data input, which is, in this case, the outliers, and then proceeds to link the pairs that
are close together into binary clusters. These clusters are linked with others to form larger clusters,
repeatedly, until all the data have been grouped. A maximum number of clusters is predetermined,
and each cluster index is then attributed to its corresponding depth point.
In the presented methodology, a maximum number of 10 clusters was predefined. This
allows the segmentation of up to 10 isolated damage within the same surface. The clustering
routine is key to volume quantification as it segments each damage area based on its depth data
relative to the surface, ensuring that even pixels located outside of the bounding boxes, previously
generated by the Faster R-CNN, are considered for damage quantification.
$d(p, q) = \sqrt{\sum_{i} (q_i - p_i)^2}$,   (1)
where i corresponds to each point’s coordinate reference axis.
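This clustering step can be sketched with SciPy's hierarchical clustering as shown below; single linkage and the variable names are assumptions, since the paper only specifies the Euclidean distance metric and a predefined maximum of 10 clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def segment_damage(outlier_points: np.ndarray, max_clusters: int = 10):
    """Group outlier points into at most max_clusters individual damage regions."""
    # Pairwise Euclidean distances (Eq. 1) are computed internally by linkage().
    tree = linkage(outlier_points, method='single', metric='euclidean')
    labels = fcluster(tree, t=max_clusters, criterion='maxclust')
    return labels                              # cluster index for each outlier point

# Example: outlier points from two separated damage regions.
rng = np.random.default_rng(0)
blob_a = rng.normal([0.0, 0.0, 0.02], 0.005, (50, 3))
blob_b = rng.normal([0.3, 0.1, 0.03], 0.005, (50, 3))
labels = segment_damage(np.vstack([blob_a, blob_b]))
print(len(np.unique(labels)))                  # number of clusters found (at most 10)
```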
2.5. Volume quantification
The fitted plane model returns the plane's normal vector along with the constant coefficients a, b, c, and d, which form part of the plane equation (Eq. 2). These values can be extracted and used to create a $z^{plane}(x, y)$ function (Eq. 3) that gives the depth of the fitted plane at each point, as shown in Figure 5.
$ax + by + cz + d = 0$,   (2)
$z_n^{plane} = -(a x_n + b y_n + d) / c$,   (3)
where $z_n^{plane}$ is the depth coordinate of the nth pixel on the fitted plane.
Figure 5. Volume quantification
The area covered by each pixel varies according to its own depth. At closer distances, pixels can cover an area as small as 1 mm², whereas longer depths can reach values greater than 35 mm². The coverage area of the nth pixel is determined according to Eq. (4).
$A_n = (C_h \cdot z_n)(C_v \cdot z_n)$,   (4)
where
$A_n$ = area of coverage (m²) of the nth pixel
$C_h$ = constant based on the Kinect's horizontal field of view (0.0024066)
$C_v$ = constant based on the Kinect's vertical field of view (0.0024698)
To calculate the volume covered by each pixel, it is necessary to obtain the distance
between the pixel itself and the surface plane of the element. The surface plane is ruled by Eq. (2),
and the depth of each pixel within the plane can be acquired from Eq. (3). Therefore, it is possible
to obtain the value of the depth relative to the element’s surface by subtracting the corresponding
z coordinate of the fitted plane from every pixel’s z coordinate (Eq. 5). Figure 5 facilitates an easy
understanding of this theory.
$D_n = z_n - z_n^{plane}$,   (5)
where
$D_n$ = relative depth of the nth pixel
$z_n$ = depth coordinate of the nth pixel
$z_n^{plane}$ = depth coordinate of the corresponding nth pixel on the fitted plane
Since the proposed routine calculates the volume of each and every pixel within the
damaged area, it allows a proper quantification of any volumetric damage, regardless of its
geometric form.
For each segmented damage, the volume of each corresponding pixel is calculated by
multiplying its area of coverage and its relative depth to the fitted plane, and then these are added
together (Eq. 6). The final volume is calculated by the summation of each individual instance of
damage (Eq. 7).
$V_{d_i} = \sum_{n} A_n D_n$,   (6)
$V_t = \sum_{i} V_{d_i}$,   (7)
where $V_{d_i}$ is the calculated volume for the ith individual damage, and $V_t$ is the total calculated volume.
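Putting Eqs. (2) through (7) together, the per-pixel volume integration can be sketched as follows. The field-of-view constants match those stated above, while the plane coefficients and point coordinates in the example are synthetic values used only for illustration.

```python
import numpy as np

C_H, C_V = 0.0024066, 0.0024698   # Kinect V2 field-of-view constants (Eq. 4)

def damage_volume(x, y, z, plane, labels):
    """Sum per-pixel volumes (Eq. 6) for each damage cluster and in total (Eq. 7).

    x, y, z : 1-D arrays of outlier (damage) point coordinates in metres
    plane   : (a, b, c, d) coefficients of the fitted surface, Eq. (2)
    labels  : cluster index of each point from the damage segmentation step
    """
    a, b, c, d = plane
    z_plane = -(a * x + b * y + d) / c           # Eq. (3): surface depth at each pixel
    depth_rel = z - z_plane                      # Eq. (5): depth relative to surface
    area = (C_H * z) * (C_V * z)                 # Eq. (4): footprint of each pixel (m^2)
    volumes = {i: float(np.sum(area[labels == i] * depth_rel[labels == i]))
               for i in np.unique(labels)}       # Eq. (6): one volume per damage
    return volumes, sum(volumes.values())        # Eq. (7): total volume (m^3)

# Example: a 2 cm deep patch of 1,000 points seen at about 1.5 m.
n = 1000
x = np.random.uniform(-0.05, 0.05, n)
y = np.random.uniform(-0.05, 0.05, n)
z = np.full(n, 1.52)                             # 2 cm behind a plane at z = 1.5 m
per_damage, total = damage_volume(x, y, z, plane=(0.0, 0.0, 1.0, -1.5),
                                  labels=np.ones(n, dtype=int))
print(total * 1e6, "cm^3")                       # rough volume converted to cm^3
```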
3. Experiments
The first step of this method is training, validating, and testing the Faster R-CNN-based concrete spalling detection method. Using the detected concrete spalling information, damage quantification is conducted using the depth cloud information with the method described in section 2.
3.1. Faster R-CNN training, validation, and testing
Faster R-CNN has a significant task in the methodology: damage detection. To allow for spalling
detection, 749 images with a resolution of 2,560 × 1,440 pixels and an aspect ratio of 16:9 were
collected using a smartphone camera. The camera, aspect ratio, and resolution were chosen in such
a way that the output images would be similar to those produced by the Kinect RGB sensor. The
images were captured with distances of 0.5 to 2.5 m between the camera and the subjects under
different lighting conditions. The subjects contained artificial damage produced under lab
conditions. The acquired images were then cropped into 1,091 images, with dimensions of 853 × 1,440 pixels, which comprise the proposed database. The cropped images were annotated
according to the proposed damage type utilizing the free annotation software LabelImg [28]. This
software allows easy image annotation through a user-friendly interface. The program outputs
extensible markup language (XML) files corresponding to the annotated image in the Pascal Visual
Object Classes format. The XML files contain all image information, such as name and size as
well as bounding box coordinates. The labels and bounding boxes for 1,935 objects were produced.
Figure 6 shows examples of annotated images.
Figure 6. Annotated dataset sample
Training, validation, and testing datasets were manually produced to ensure that the same damage would not be present in more than one set simultaneously. A total of 191 images with a resolution of 853 × 1,440 pixels were selected to form the testing set. The remaining 900 images were
randomly selected to compose the training and validation sets, such that the training set contains
600 images and the validation set contains 300 images. Data augmentation was applied: horizontal flipping and exposure adjustments were performed on the training and validation image sets, which were subsequently used for the training and validation of the network. The final number of images in the dataset after augmentation was 3,455, with a total of 5,522 bounding boxes labeled for spalling.
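The two augmentation operations mentioned above can be sketched as follows; this is an illustrative NumPy version, and the exposure factors shown are assumed values rather than those used by the authors.

```python
import numpy as np

def horizontal_flip(image, boxes, width):
    """Mirror an image and its bounding boxes (x_min, y_min, x_max, y_max)."""
    flipped = image[:, ::-1, :]
    flipped_boxes = [(width - x_max, y_min, width - x_min, y_max)
                     for (x_min, y_min, x_max, y_max) in boxes]
    return flipped, flipped_boxes

def adjust_exposure(image, factor):
    """Scale pixel intensities to simulate a brighter or darker exposure."""
    return np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)

# Example on a synthetic 853 x 1,440 RGB image with one labelled spalling box.
img = np.random.randint(0, 256, (853, 1440, 3), dtype=np.uint8)
boxes = [(200, 300, 500, 650)]
flipped_img, flipped_boxes = horizontal_flip(img, boxes, width=1440)
darker = adjust_exposure(img, 0.7)     # assumed exposure factor
brighter = adjust_exposure(img, 1.3)   # assumed exposure factor
```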
The experiments were conducted using the open source, pre-trained, Faster R-CNN [15],
MATLAB 2017a, CUDA 8.0, and CUDNN 5.1 on a desktop computer equipped with an Intel Core
i7-6700k 4.2 GHz CPU, 16 GB of DDR4 RAM, and an EVGA GTX 1070 FTW with 8 GB of video RAM as the graphics processing unit (GPU). Deep-learning models require extensive datasets to be trained accurately, and, furthermore, training is time-consuming. To overcome the
limitations of having a relatively small dataset, the pre-trained model of Faster R-CNN is used in
this process. The RPN and Fast R-CNN networks are trained with the learning rate set to 0.001,
momentum set to 0.9, and weight decay set to 0.0005, performing 80,000 and 40,000 iterations,
respectively, for the final training.
There is no simple approach for determining the best anchor parameters for the Faster R-
CNN; thus, a trial-and-error approach was used. To determine optimal anchor sizes and ratios for
the training set, 12 combinations of anchor aspect ratios and three combinations of anchor sizes
were analyzed. The network was trained a total of 37 times, of which 36 were trials. To reduce
training time, the number of iterations performed on each training trial was 10 times fewer than
the original number of iterations described previously, resulting in 8,000 and 4,000 iterations for
the RPN and Fast R-CNN, respectively. The combination that resulted in the best AP was selected
and trained with the full number of iterations.
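The trial-and-error procedure amounts to a small grid search over anchor settings. The sketch below illustrates the structure of that search; the candidate values shown are placeholders, since the paper lists only the winning combination.

```python
import itertools

# Reduced example grids (placeholders): the authors tried 12 ratio combinations
# and 3 size combinations, for 36 shortened training trials in total.
scale_sets = [(128, 256, 512), (128, 480, 880), (256, 512, 1024)]
ratio_sets = [(0.5, 1.0, 2.0), (0.5, 0.85, 1.85), (0.75, 1.0, 1.5)]

trials = list(itertools.product(scale_sets, ratio_sets))
print(f"{len(trials)} trial runs in this reduced example (the paper used 36)")
# Each combination is trained for 8,000 / 4,000 iterations and scored by its
# validation AP; the best combination is then retrained with 80,000 / 40,000.
```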
To enhance object detection for small damage, the input images for the training and
validation of the Faster R-CNN were scaled from 853 pixels on the shortest side to 1,000 pixels,
conserving the aspect ratio of the original 853 × 1,440 pixel input images. The scaling parameter
resizes the training images based on a predefined dimension. The proposed method enlarges the
input images to allow the detection of smaller damage and enhance the capabilities of the network.
The sliding convolutional layer of the RPN was set as having a stride of 16 on the feature map.
The scaling and stride parameters were defined through trial and error.
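The rescaling rule (shortest side enlarged from 853 to 1,000 pixels while conserving the aspect ratio) reduces to a single scale factor; the sketch below illustrates it with a simple nearest-neighbour index mapping, which is not necessarily the interpolation used by the authors.

```python
import numpy as np

def scale_to_shortest_side(image: np.ndarray, target: int = 1000) -> np.ndarray:
    """Resize so the shortest side equals target, conserving the aspect ratio."""
    h, w = image.shape[:2]
    scale = target / min(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour resize via index mapping (illustration only).
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    return image[rows[:, None], cols]

img = np.zeros((853, 1440, 3), dtype=np.uint8)
print(scale_to_shortest_side(img).shape)   # (1000, 1688, 3): 853 -> 1,000 shortest side
```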
3.2. Damage segmentation and quantification
To test the proposed method, the damage detection and quantification was carried out under two
different scenarios, using different test subjects. Initially, a polystyrene foam test rig with a total of eight circular-shaped simulated damage areas was developed, as shown in Figure 7(c), numbered 1 through 8. The damage dimensions were randomly assigned, and the inner surfaces were heat-treated
to remove roughness. As verified by [27], harsh direct sunlight interferes greatly with the output
point cloud since the sensors acquire depth data through IR imaging. The readings were taken
indoors, under indirect sunlight, to ensure optimum reading conditions.
Figure 7. (a) Sensor setup, (b) damage on concrete, (c) damage on foam
A concrete beam 35 cm in height was utilized for the second stage of the experiment, as shown in Figure 7(a) and (b). Four damage areas of random dimensions simulating material spalling were broken into the member using a jackhammer, as shown in Figure 7(b), numbered 1 through 4. No treatment was performed on the inner surfaces of the damage. The readings were taken during the day, indoors, under indirect sunlight and artificial lighting to ensure the best data accuracy.
Ground truth volume measurements were performed on both elements. The voids were
filled with tap water and set to rest for one hour to reach surface saturation, then refilled to surface
level. The water was extracted using a syringe, and volumes were measured through a graduated
cylinder. The maximum depth for each damage was taken using a caliper. A steel plate of known
thickness was placed on top of the surfaces to allow proper measurement.
Figure 7(a) illustrates the depth sensor's initial setup to acquire data on the concrete element.
The setup procedures were the same as those performed in the first scenario. The sensor was placed
on top of a sturdy tripod to prevent any type of shaking and to allow a proper alignment between
the sensor plane and the element. The sensor was positioned approximately parallel to the elements' surfaces, aiming directly at their center. No measurement tool was utilized to ensure
alignment. The depth frames were taken at distances varying from 1.0 m to 2.5 m, with 0.25 m
increments, totaling seven reading distances.
4. Results and analysis
Using the trained Faster R-CNN-based spalling detection method, the experiments described in
section 3 were analyzed.
4.1. Faster R-CNN
The training of the network was performed using the four-step strategy. The training time on GPU-
mode is approximately 14 hours due to the resolution of the training set. Smaller resolutions lead
to faster training.
Figure 8. Detected volumetric losses on outdoor concrete elements
The network requires 0.08 seconds in GPU mode to evaluate each 853 × 1,440-pixel image and 2.80 seconds in CPU mode. The highest AP amongst all trainings was 90.79%. For such precision, the anchor sizes and anchor aspect ratios were found to be 128, 480, and 880, and 0.5, 0.85, and 1.85, respectively. Testing the final model with new images yielded acceptable results. The output of the network on real-life damage is illustrated in Figure 8(a) through (c). Confidence levels on the presented cases ranged from 60.5% in Figure 8(a) to 85.9% in Figure 8(c).
When using Figure 7(b) as input, the network provided accurate detections, as shown in
Figure 9. Bounding boxes were drawn within an acceptable margin beyond the damage limits, and confidence levels were consistently high, reaching 100% certainty.
Figure 9. Detected volumetric losses on laboratory specimen
4.2. Damage quantification
A total of three readings were taken for each of the seven distances at each scenario, and the
average of all three volume calculations was taken into consideration. The algorithm is able to
identify the proposed surfaces and to segment and quantify the damage, regardless of their size,
depth, and the distance between the sensor and the element. The bounding boxes indicate the
presence of damage within the concrete surface and are used to segment such surface. When in
possession of the surface coordinates, it is possible to extract pixels corresponding to damage.
Damage pixels that are outside of the bounding box but are within the surface limits are considered
for volume quantification, as mentioned in section 2.4.
The volume (V) calculation results for the polystyrene test rig are listed in Table 2. The
relative error for total volume ranged from 1.49% for the closest distance to 13.83% for the farthest
distance, and the mean precision error (MPE) in the volume calculation of all individual damage
considering all distances was 14.9%.
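As a worked example of how these error figures are obtained, the sketch below reproduces the individual relative errors and their mean for the 100 cm readings of Table 2; defining the MPE as the mean of the individual relative errors is an assumption consistent with the reported values.

```python
import numpy as np

ground_truth = np.array([68, 17, 99, 10, 61, 371, 210, 130], dtype=float)     # cm^3
calculated   = np.array([64, 14.8, 104.5, 7.9, 52.27, 351.7, 222.7, 133.7])   # at 100 cm

relative_error = np.abs(calculated - ground_truth) / ground_truth * 100
print(np.round(relative_error, 2))     # 5.88, 12.94, 5.56, 21.0, 14.31, 5.2, 6.05, 2.85
print(round(relative_error.mean(), 2)) # mean precision error at 100 cm: about 9.22 %

total_error = abs(calculated.sum() - ground_truth.sum()) / ground_truth.sum() * 100
print(round(total_error, 2))           # total-volume error at 100 cm: about 1.49 %
```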
Despite the variance, the errors within the range were consistent, varying by small amounts
when increasing the distance between the sensor and the element. Other than the smallest working
distance of 100 cm, the data collected at 200 cm from the test subject resulted in the best accuracy.
One potential cause of such results is the sharp edges of the carvings on the test rig. At shorter
distances, some projected IR light can be blocked by the carving edges as the IR projector of the
Kinect V2 was not aligned with the center of each carving individually but with the center of the
test rig itself.
In addition, the smallest damage (2 and 4 in Figure 7(c)) of the test rig presented the highest
mean precision errors of 30.42% and 25.57%, respectively, reaching up to 49% at the farthest
distance. Removing these damage from the total MPE results in a reduction of 4.37%. Analyzing
the closest working distance of 100 cm, the obtained MPE was 9.22% considering all damage and
7.01% when both damage 2 and 4 were not taken into consideration due to a large amount of error.
These values are greater than the 5.47% MPE obtained from [21], but the tested material needs to
be taken into consideration since the authors used a concrete test rig. The reflectance of the IR light depends on the material off which it is reflected [29]; therefore, the test rigs being made of distinct materials could account for most of the difference in the observed error rates. Other than
this, the MPEs of individual damage, excluding those from damage 2 and 4, were all smaller than
the error observed in [19].
Table 2. Test rig at distances from 100 to 250 cm
Damage | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total
Ground truth V (cm³) | 68 | 17 | 99 | 10 | 61 | 371 | 210 | 130 | 966
100 cm, Calc. V (cm³) | 64 | 14.8 | 104.5 | 7.9 | 52.27 | 351.7 | 222.7 | 133.7 | 951.57
100 cm, Error (%) | 5.88 | 12.94 | 5.56 | 21.00 | 14.31 | 5.20 | 6.05 | 2.85 | 1.49
125 cm, Calc. V (cm³) | 62.2 | 14.2 | 97.9 | 7.4 | 48 | 343.6 | 202.6 | 111.7 | 887.6
125 cm, Error (%) | 8.53 | 16.47 | 1.11 | 26.00 | 21.31 | 7.39 | 3.52 | 14.08 | 8.12
150 cm, Calc. V (cm³) | 68.6 | 7.9 | 99.2 | 7.7 | 47.8 | 336.9 | 199.9 | 103.1 | 871.1
150 cm, Error (%) | 0.88 | 53.53 | 0.20 | 23.00 | 21.64 | 9.19 | 4.81 | 20.69 | 9.82
175 cm, Calc. V (cm³) | 78 | 12.7 | 113.3 | 11.4 | 59.6 | 331.4 | 183.8 | 100.4 | 890.6
175 cm, Error (%) | 14.71 | 25.29 | 14.44 | 14.00 | 2.30 | 10.67 | 12.48 | 22.77 | 7.81
200 cm, Calc. V (cm³) | 75.3 | 19.4 | 104.9 | 8.2 | 57.4 | 334.8 | 185.2 | 114.5 | 899.7
200 cm, Error (%) | 10.74 | 14.12 | 5.96 | 18.00 | 5.90 | 9.76 | 11.81 | 11.92 | 6.86
225 cm, Calc. V (cm³) | 58.3 | 9.7 | 91.9 | 5.2 | 46.2 | 307 | 188.3 | 121.8 | 828.4
225 cm, Error (%) | 14.26 | 42.94 | 7.17 | 48.00 | 24.26 | 17.25 | 10.33 | 6.31 | 14.24
250 cm, Calc. V (cm³) | 63.3 | 7.9 | 85.9 | 3.1 | 36.8 | 319.7 | 177.3 | 102.9 | 796.9
250 cm, Error (%) | 6.91 | 53.53 | 13.23 | 69.00 | 39.67 | 13.83 | 15.57 | 20.85 | 17.51
Figures 10 through 12 were obtained by processing the depth frame taken at a distance of
100 cm with the proposed methodology. Figure 10 illustrates the depth cloud corresponding to the
test rig after noise reduction. The test rig was offset from a wall (identified by the yellow points).
As described in section 2.1.1, only the center 300 × 300 pixels were taken into consideration. The
extracted surface of the test rig is illustrated in Figure 11. Even in the presence of a second plane
surface, the algorithm correctly detected the test rig as the desired surface. The segmented damage
can be viewed in Figure 12 in both 2D(a) and 3D(b).
Figure 10. Foam denoised depth cloud
Figure 11. Foam extracted surface
Figure 12. Foam damage segmentation: (a) front view, (b) 3D view
The volume calculations for the concrete beam are listed in Table 3. Similar to the previous
scenario, extreme error readings over total volume were observed for the extreme working
distances, ranging from 5.82% to 13.78%. The observed MPE in the volume calculation of all
individual damage considering all distances was 9.45%. The surface roughness of the concrete
damage could be responsible for the significant error difference relative to the values from the test rig at short distances. The most accurate volume calculation other than that at the closest distance
was obtained at a distance of 225 cm, similar to the test rig. All damage showed similar individual
MPEs across the range, and size did not exert any influence on the accuracy. The total MPE of
9.45% is 37% smaller than the 15% error rate presented in [20]. In addition, since both the beam
and the test rig utilized by [21] are made of concrete, it is possible to compare both results directly.
The MPE of 5.28% obtained through the experiment at the shortest distance is smaller than the
5.47% error rate acquired by the authors, even with a farther minimum working distance. This
demonstrates the superior capabilities of the proposed methodology over those of previous studies.
Figures 13 through 15 were obtained by processing the depth frame of the concrete beam,
taken at a distance of 100 cm. Figure 13 illustrates the portion of the depth cloud acquired by the
sensor after noise reduction, as described in section 2.1.2. The entire frame covers an area of
approximately 6 m wide and 3 m tall. The beam surface is the dark-blue region at the bottom of
Figure 13, centered at coordinates (0.0, 0.0). As described in section 2.1.1, only the center 300 ×
300 pixels were taken into consideration. The extracted surface of the concrete element is shown
in Figure 14. The segmented damage can be viewed in Figure 15 in both 2D(a) and 3D(b).
Figure 13. Concrete beam denoised depth cloud
Figure 14. Concrete beam extracted surface
Figure 15. Concrete beam damage segmentation: (a) front view, (b) 3D view
Table 3. Concrete beam at distances from 100 to 250 cm
Damage | 1 | 2 | 3 | 4 | Total
Ground truth V (cm³) | 503.33 | 429.33 | 114.67 | 165.67 | 1213
100 cm, Calc. V (cm³) | 473.6 | 383 | 118.9 | 166.9 | 1142.4
100 cm, Error (%) | 5.91 | 10.79 | 3.69 | 0.74 | 5.82
125 cm, Calc. V (cm³) | 495 | 402 | 106.5 | 147.6 | 1151.1
125 cm, Error (%) | 1.65 | 6.37 | 7.12 | 10.91 | 5.10
150 cm, Calc. V (cm³) | 471.7 | 402.9 | 90.8 | 133.4 | 1098.8
150 cm, Error (%) | 6.28 | 6.16 | 20.82 | 19.48 | 9.41
175 cm, Calc. V (cm³) | 440.6 | 357.3 | 92.7 | 131.2 | 1021.8
175 cm, Error (%) | 12.46 | 16.78 | 19.16 | 20.81 | 15.76
200 cm, Calc. V (cm³) | 432.5 | 330.9 | 114.3 | 157.8 | 1035.5
200 cm, Error (%) | 14.07 | 22.93 | 0.32 | 4.75 | 14.63
225 cm, Calc. V (cm³) | 453 | 369.9 | 104.8 | 162.5 | 1090.2
225 cm, Error (%) | 10.00 | 13.84 | 8.61 | 1.91 | 10.12
250 cm, Calc. V (cm³) | 432.6 | 361.3 | 101.7 | 150.2 | 1045.8
250 cm, Error (%) | 14.05 | 15.85 | 11.31 | 9.34 | 13.78
Maximum depth values for each set of damage on each frame were also taken. Table 4 lists
the obtained values along with their respective ground truths for each scenario. The sensor maintained errors of less than 10% for maximum depth measurements in all tests.
Considering the shortest distance of 100 cm, a relative error of 1.15% was obtained, whereas [21] reported an error of 2.58% for maximum depth measurements on a concrete test rig at a
distance of 80 cm. When compared with the MPE of 8% obtained from [22], the maximum depth
measurements of the proposed methodology on the concrete beam show a better result even at the
farthest distance of 250 cm.
Table 4. Maximum depth measurements
Foam
Distance (cm) | Ground truth (cm) | Measured (cm) | Error (%)
100 | 3.79 | 3.64 | 3.96
125 | 3.79 | 3.67 | 3.17
150 | 3.79 | 3.63 | 4.22
175 | 3.79 | 3.58 | 5.54
200 | 3.79 | 3.52 | 7.12
225 | 3.79 | 3.46 | 8.71
250 | 3.79 | 3.42 | 9.76
MPE = 6.07%
Concrete
Distance (cm) | Ground truth (cm) | Measured (cm) | Error (%)
100 | 5.2 | 5.26 | 1.15
125 | 5.2 | 5.19 | 0.19
150 | 5.2 | 5.07 | 2.50
175 | 5.2 | 5.06 | 2.69
200 | 5.2 | 5.03 | 3.27
225 | 5.2 | 4.92 | 5.38
250 | 5.2 | 4.81 | 7.50
MPE = 3.24%
Some observed errors were higher on the test rig than on the concrete beam, possibly due
to the reflectance properties of the material, as mentioned before. Furthermore, the shape and position of the damage could impact the accuracy of the measurements as well. The fact that
the IR projector was not aligned with the horizontal centerline of each individual row of damage
but with the one between both rows, as well as the sharp edge of the damage, could have had a
significant impact on the IR distribution within the inner surface, causing small shadowing below
the top and bottom areas of some of the damage, depending on the working distance. This is not
observed on the concrete beam since the damage are located along the element following a
centerline. Damage 3 and 4 from the concrete beam have, in fact, different horizontal centerlines,
but they are not far from each other, and the inward deflection from the edges is not steep, hence the negligible effect.
5. Conclusions
Concrete spalling is one of the most common types of structural damage and should be quantified
in terms of volume for accurate damage assessment. The traditional method of measuring the
volume of concrete spalling is to use a depth camera, 3D scanner, etc. However, to automatically
quantify concrete spalling in multiple locations separately within the same structural surface or
multiple different surfaces simultaneously, the first critical step is to identify the concrete spalling
and its standard structural surface. In order to realize this, this paper proposed the Faster R-CNN-
based concrete spalling damage detection method integrated with an inexpensive depth camera for
damage quantification.
The Faster R-CNN provides the multiple locations of the concrete spalling in the same
structural surfaces or multiple different structural surfaces. Based on the location and depth
information on the detected concrete spalling, the standard structural surfaces were identified and
segmented using the RANSAC algorithm. The depth differences between the standard surface and the detected concrete spalling yield the calculated volumes of the spalling.
With this approach, the proposed methodology identifies and quantifies volumes of multiple
surface concrete spalling regardless of the distance between the depth sensor and the element,
relying solely on the outputs from the RGB-D sensor.
In this paper, we have tested the capabilities of the Microsoft Kinect V2 RGB-D camera as
a tool for detecting and quantifying volumetric damage along a planar concrete surface in
conjunction with a Faster R-CNN-based damage detection mechanism. The data provided by the
sensor allows the easy extraction of geometric properties with considerable accuracy. When
compared to other volume quantification tools, such as laser scanners, the Kinect V2 is much less
expensive. Stereo photogrammetry systems can be reliable as well, but the RGB-D camera outputs
the depth map immediately, requiring no further scaling method of any sort, allowing
instantaneous data manipulation.
The newly proposed methodology proves useful when ideal setup conditions are not met
and a more versatile approach is needed. By coupling the Faster R-CNN damage detection
mechanism with the plane fitting method, which works under any distance within the sensor range,
the system can be implemented on unmanned vehicles, allowing remote data acquisition in places
of difficult or hazardous access. With an AP value of 90.79% for damage detection by the Faster
R-CNN-based stage and mean precision error values below 10% for individual volume
measurements, the presented methodology proves its capabilities, providing a reliable damage
detection and quantification system for structural health monitoring. Moreover, this proposed method can serve as a prototype for automatic concrete spalling damage detection and quantification in terms of volume with an inexpensive depth sensor. Any more advanced type of depth camera can be used with the proposed method for better quantification accuracy.
References
[1] Sohn, H., Farrar, C.R., Hemez, F.M., Shunk, D.D., Stinemates, D.W., Nadler, B.R. and Czarnecki, J.J., 2003. A
review of structural health monitoring literature: 1996–2001. Los Alamos National Laboratory, USA.
[2] Farrar, C.R. and Worden, K., 2010. An introduction to structural health monitoring. In New Trends in Vibration
Based Structural Health Monitoring (pp. 1-17). Springer, Vienna.
[3] Brownjohn, J.M., 2007. Structural health monitoring of civil infrastructure. Philosophical Transactions of the
Royal Society of London A: Mathematical, Physical and Engineering Sciences, 365(1851), pp.589-622.
[4] Zhang, J., Guo, S.L., Wu, Z.S. and Zhang, Q.Q., 2015. Structural identification and damage detection through
long-gauge strain measurements. Engineering Structures, 99, pp.173-183.
[5] Lynch, J.P. and Loh, K.J., 2006. A summary review of wireless sensors and sensor networks for structural health
monitoring. Shock and Vibration Digest, 38(2), pp.91-130.
[6] Friswell, M.I. and Penny, J.E., 2002. Crack modeling for structural health monitoring. Structural health
monitoring, 1(2), pp.139-148.
[7] Malesa, M., Szczepanek, D., Kujawińska, M., Świercz, A. and Kołakowski, P., 2010. Monitoring of civil
engineering structures using digital image correlation technique. In EPJ Web of Conferences (Vol. 6, p. 31014). EDP
Sciences.
[8] Chen, J.G., Wadhwa, N., Cha, Y.J., Durand, F., Freeman, W.T. and Buyukozturk, O., 2015. Modal identification
of simple structures with high-speed video using motion magnification. Journal of Sound and Vibration, 345, pp.58-
71.
[9] Choi, S. and Shah, S.P., 1997. Measurement of deformations on concrete subjected to compression using image
correlation. Experimental Mechanics, 37(3), pp.307-313.
[10] Cha, Y.J., Choi, W. and Büyüköztürk, O., 2017. Deep learning‐based crack damage detection using convolutional
neural networks. Computer‐Aided Civil and Infrastructure Engineering, 32(5), pp.361-378.
[11] Cha, Y.J., You, K. and Choi, W., 2016. Vision-based detection of loosened bolts using the Hough transform and
support vector machines. Automation in Construction, 71, pp.181-188.
[12] Cha, Y.J., Chen, J.G. and Büyüköztürk, O., 2017. Output-only computer vision based damage detection using
phase-based optical flow and unscented Kalman filters. Engineering Structures, 132, pp.300-313.
[13] Chen, P.H., Shen, H.K., Lei, C.Y. and Chang, L.M., 2012. Support-vector-machine-based method for automated
steel bridge rust assessment. Automation in Construction, 23, pp.9-19.
[14] Girshick, R., 2015. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp.
1440-1448).
[15] Ren, S., He, K., Girshick, R. and Sun, J., 2017. Faster R-CNN: towards real-time object detection with region
proposal networks. IEEE Transactions on Pattern Analysis & Machine Intelligence, (6), pp.1137-1149.
[16] Cha, Y.J., Choi, W., Suh, G., Mahmoudkhani, S. and Büyüköztürk, O., 2018. Autonomous structural visual
inspection using region‐based deep learning for detecting multiple damage types. Computer‐Aided Civil and
Infrastructure Engineering, 33(9), pp.731-747.
[17] Snavely, N., 2011. Scene reconstruction and visualization from internet photo collections: A survey. IPSJ
Transactions on Computer Vision and Applications, 3, pp.44-66.
[18] Torok, M.M., Golparvar-Fard, M. and Kochersberger, K.B., 2013. Image-based automated 3D crack detection
for post-disaster building assessment. Journal of Computing in Civil Engineering, 28(5), p.A4014004.
[19] Jahanshahi, M.R., Jazizadeh, F., Masri, S.F. and Becerik-Gerber, B., 2012. Unsupervised approach for
autonomous pavement-defect detection and quantification using an inexpensive depth sensor. Journal of Computing
in Civil Engineering, 27(6), pp.743-754.
[20] Moazzam, I., Kamal, K., Mathavan, S., Usman, S. and Rahman, M., 2013, October. Metrology and visualization
of potholes using the Microsoft Kinect sensor. In Intelligent Transportation Systems (ITSC), 2013 16th International
IEEE Conference on (pp. 1284-1291). IEEE.
[21] Kamal, K., Mathavan, S., Zafar, T., Moazzam, I., Ali, A., Ahmad, S.U. and Rahman, M., 2018. Performance
assessment of Kinect as a sensor for pothole imaging and metrology. International Journal of Pavement
Engineering, 19(7), pp.565-576.
[22] Yuan, C. and Cai, H., 2014. Automatic detection of pavement surface defects using consumer depth camera.
In Construction Research Congress 2014: Construction in a Global Network (pp. 974-983).
[23] Beckman Gomes, G.H. 2018. Deep learning-based volumetric damage quantification using an inexpensive depth
camera. MSc thesis, University of Manitoba, Winnipeg, Canada.
[24] Zeiler, M.D. and Fergus, R., 2014, September. Visualizing and understanding convolutional networks.
In European Conference on Computer Vision (pp. 818-833). Springer, Cham.
[25] Simonyan, K. and Zisserman, A., 2014. Very deep convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556.
[26] Nair, V. and Hinton, G.E., 2010. Rectified linear units improve restricted Boltzmann machines. In Proceedings
of the 27th International Conference on Machine Learning (ICML-10) (pp. 807-814).
[27] Steward, J., Lichti, D., Chow, J., Ferber, R. and Osis, S., 2015. Performance assessment and calibration of the
Kinect 2.0 time-of-flight range camera for use in motion capture applications. FIG Working week 2015, pp.1-14.
[28] Tzutalin. LabelImg. Git code (2015). Retrieved February 9, 2018, from https://github.com/tzutalin/labelImg
[29] Mangold, K., Shaw, J.A. and Vollmer, M., 2013. The physics of near-infrared photography. European Journal
of Physics, 34(6), p.S51.
Appendix
Table A.1 ZF-Net architecture for RPN
Layer | Type | Filter Size | Stride | Depth
1 | CONV+ReLU | 7 × 7 | 2 | 96
2 | LRN | - | - | -
3 | Max pooling | 3 × 3 | 2 | 96
4 | CONV+ReLU | 5 × 5 | 2 | 256
5 | LRN | - | - | -
6 | Max pooling | 3 × 3 | 2 | 256
7 | CONV+ReLU | 3 × 3 | 1 | 384
8 | CONV+ReLU | 3 × 3 | 1 | 384
9 | CONV+ReLU | 3 × 3 | 1 | 256
10 | Sliding CONV+ReLU | 3 × 3 | 1 | 256
11 | FC | - | - | 256
12 | Softmax & Regressor | - | - | -
Table A.2 ZF-Net architecture for Fast R-CNN
Layer | Type | Filter Size | Stride | Depth
1 | CONV+ReLU | 7 × 7 | 2 | 96
2 | LRN | - | - | -
3 | Max pooling | 3 × 3 | 2 | 96
4 | CONV+ReLU | 5 × 5 | 2 | 256
5 | LRN | - | - | -
6 | Max pooling | 3 × 3 | 2 | 256
7 | CONV+ReLU | 3 × 3 | 1 | 384
8 | CONV+ReLU | 3 × 3 | 1 | 384
9 | CONV+ReLU | 3 × 3 | 1 | 256
10 | RoI pooling | - | - | 256
11 | FC+ReLU | - | - | 4096
12 | Dropout | - | - | -
13 | FC+ReLU | - | - | 4096
14 | Dropout | - | - | -
15 | FC+ReLU | - | - | 6
16 | Softmax & Regressor | - | - | -
... However, the recognition accuracy was not satisfactory in some studies, highlighting the limitations of deep learning structures (18,19). Beckman et al. developed an advanced deep learning technique coupled with a structural surface fitting algorithm for automated volumetric damage quantification using a depth camera (20). The model reported an average accuracy of 90.79% and a mean accuracy error of 9.45%. ...
... The main advantage of deep learning methods lies in their remarkable capability to automatically learn hierarchical representative features that are robust to background noises, translation, and distortion of target objects without human intervention (20)(21)(22)(23)(24). The application of deep learning methods for detecting roadway problems through GPR data holds immense promise. ...
Article
This research introduces an innovative method for detecting subsurface cracks within pavements by leveraging ground penetrating radar (GPR) technology in conjunction with advanced deep learning techniques. Its primary aim is to significantly improve the accuracy and efficiency of pavement assessment, particularly for operational and maintenance purposes. The proposed model, GPR-YOLOR (You Only Learn One Representation), extends the YOLOR framework and incorporates a region of interest within the top pavement layer to detect subsurface cracks. While the model can be trained with annotated data, the main challenge lies in validating results in the field because of the inability to visually inspect subsurface conditions and the high cost associated with direct coring. To overcome this challenge, we propose an alternative approach that utilizes the co-occurrence of surface cracks as pseudo labels, allowing for easy verification. To ensure that surface cracks correspond to subsurface cracks, the focus is exclusively on transverse cracks that develop in a bottom-up manner, such as fatigue and reflective cracks. Through this methodology, our GPR-YOLOR model achieves an F1 score of 0.72, with a precision of 0.76 and a recall of 0.68. The results from field validation underscore the effectiveness of the GPR-YOLOR model in accurately identifying subsurface cracks, highlighting its practical significance in conducting field condition assessments.
... Nevertheless, the FOS-based method is considered not cost-effective due to the necessity of capturing reflected optical waves with extremely short wavelengths using precise and expensive interrogators [18]. Computer vision and image processing are emerging as a promising and innovative solution for monitoring civil structures, demonstrating effectiveness in identifying visible issues such as cracks, spalling, and corrosion on the surfaces of structures [19][20][21]. However, this technology faces limitations in detecting invisible progressive damages such as strand relaxation in prestressed anchorages. ...
Article
Full-text available
Structural damage in the steel bridge anchorage, if not diagnosed early, could pose a severe risk of structural collapse. Previous studies have mainly focused on diagnosing prestress loss as a specific type of damage. This study is among the first for the automated identification of multiple types of anchorage damage, including strand damage and bearing plate damage, using deep learning combined with the EMA (electromechanical admittance) technique. The proposed approach employs the 1D CNN (one-dimensional convolutional neural network) algorithm to autonomously learn optimal features from the raw EMA data without complex transformations. The proposed approach is validated using the raw EMA response of a steel bridge anchorage specimen, which contains substantial nonlinearities in damage characteristics. A K-fold cross-validation approach is used to secure a rigorous performance evaluation and generalization across different scenarios. The method demonstrates superior performance compared to established 1D CNN models in assessing multiple damage types in the anchorage specimen, offering a potential alternative paradigm for data-driven damage identification in steel bridge anchorages.
... Reference | Task | Algorithm
Kadarla et al. [30] | Crack propagation using video footage | CNN
Vundekode et al. [33] | Identifying three varieties of damages | ANN
Jery Hola et al. [35] | Assessment of compressive strength | ANN
Manish et al. [36] | Compressive strength prediction | ANN
Younq et al. [37] | Concrete compressive strength | ANN
Gustavo H. Beckman et al. [38] | Concrete spalling damage detection and quantification | CNN
Keunyoung Jang et al. [39] | Detection of micro and macro concrete cracks | CNN
FuTao Ni et al. [40] | Concrete thin crack identification and width measurement | CNN
AT Huynh et al. [41] | Prediction of compressive strength | ANN, DNN
Narazaki Y et al. [42] | Damage detecting and quantification | CNN
Asteris P et al. [43] | Concrete compressive strength | ANN
... conditions as shown in Table 2. Surface images may be tainted with noise, shadow, dust, or excess brightness, necessitating more robust and sophisticated categorization systems. Several researchers have tackled similar practical issues, depending on the application. ...
Conference Paper
Full-text available
With recent advancements in sensor technology and rapid progress in internet-based cloud computation, data-driven approaches in structural health monitoring (SHM) are gaining prominence. Most of the monitoring effort is spent reviewing and analyzing the data received from the various sensors deployed in structures, and this analysis helps in understanding the structural stability and its current state, albeit with certain limitations. Considering this, the integration of machine learning (ML) in SHM has attracted significant attention among researchers. This paper is principally aimed at understanding and reviewing the vast literature on sensor-based, data-driven approaches using ML. The implementation and methodology of vibration-based and vision-based monitoring, along with some of the ML algorithms used for SHM, are discussed. A perspective on the importance of data-driven SHM in the future is also presented. Conclusions drawn from the review discuss the prospects and potential limitations of ML approaches in data-driven SHM applications.
... These can be regular cameras placed at new angles and positions or depth cameras that generate not only visual information but also distance data. For instance, Beckman et al. [32] proposed a Faster R-CNN-based solution for damage detection using such cameras. ...
Article
Full-text available
Efficient damage detection of trailers is essential for improving processes at inland intermodal terminals. This paper presents an automated damage detection (ADD) algorithm for trailers utilizing ensemble learning based on YOLOv8 and RetinaNet networks. The algorithm achieves 88.33% accuracy and an 81.08% F1-score on the real-life trailer damage dataset by leveraging the strengths of each object detection model. YOLOv8 is trained explicitly for detecting belt damage, while RetinaNet handles detecting other damage types and is used for cropping trailers from images. These one-stage detectors outperformed the two-stage Faster R-CNN in all tested tasks within this research. Furthermore, the algorithm incorporates slice-aided hyper inference, which significantly contributes to the efficient processing of high-resolution trailer images. Integrating the proposed ADD solution into terminal operating systems allows a substantial workload reduction at the ingate of intermodal terminals and supports, therefore, more sustainable transportation solutions.
... Several researchers have developed techniques for damage detection based on deep learning. Beckman et al. [1] detected concrete spalling by using a depth camera and a ZF-Net-based faster region-based convolutional neural network. Hoang et al. [2] proposed a computer vision-based approach that utilizes jellyfish search-optimized support vector classification to classify deep and shallow spalling images. ...
Article
Quality assurance and maintenance play a crucial role in engineering construction, as they have a significant impact on project safety. One common issue in concrete structures is the presence of defects. To enhance the automation level of concrete defect repairs, this study proposes a computer vision-based robotic system, which is based on three-dimensional (3D) printing technology to repair defects. This system integrates multiple sensors such as light detection and ranging (LiDAR) and camera. LiDAR is utilized to model concrete pipelines and obtain geometric parameters regarding their appearance. Additionally, a convolutional neural network (CNN) is employed with a depth camera to locate defects in concrete structures. Furthermore, a method for coordinate transformation is presented to convert the obtained coordinates into executable ones for a robotic arm. Finally, the feasibility of this concrete defect repair method is validated through simulation and experiments.
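The abstract above mentions converting defect coordinates obtained from the camera into coordinates executable by a robotic arm. A minimal sketch of that coordinate-transformation step is given below; it is not the authors' implementation, and the 4x4 extrinsic matrix (normally obtained from hand-eye calibration) is a placeholder.

```python
# Sketch: map a defect located in the depth-camera frame into the robot-arm
# base frame with a homogeneous transform. The extrinsics below are made up.
import numpy as np

# Hypothetical rotation + translation of the camera with respect to the robot base
T_base_camera = np.array([
    [0.0, -1.0, 0.0, 0.30],   # 30 cm offset along the base x-axis (assumed)
    [1.0,  0.0, 0.0, 0.00],
    [0.0,  0.0, 1.0, 0.45],   # camera mounted 45 cm above the base (assumed)
    [0.0,  0.0, 0.0, 1.00],
])

def camera_to_base(p_camera, T=T_base_camera):
    """Convert a 3-D defect coordinate from the camera frame to the robot base frame."""
    p_h = np.append(np.asarray(p_camera, dtype=float), 1.0)  # homogeneous coordinates
    return (T @ p_h)[:3]

# Defect centre measured by the depth camera, in metres
print(camera_to_base([0.12, -0.05, 0.80]))
```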
Article
Weathering effects caused by physical, chemical, or biological processes result in visible damages that alter the appearance of stones’ surfaces. Consequently, weathered stone monuments can offer a distorted perception of the artworks to the point of making their interpretation misleading. Being able to detect and monitor decay is crucial for restorers and curators to perform important tasks such as identifying missing parts, assessing the preservation state, or evaluating curating strategies. Decay mapping, the process of identifying weathered zones of artworks, is essential for preservation and research projects. This is usually carried out by marking the affected parts of the monument on a 2D drawing or picture of it. One of the main problems of this methodology is that it is manual work based only on experts’ observations. This makes the process slow and often results in disparities between the mappings of the same monument made by different experts. In this paper, we focus on the weathering effect known as “scaling”, following the ICOMOS ISCS definition. We present a novel technique for detecting, segmenting, and classifying these effects on stone monuments. Our method is user-friendly, requiring minimal user input. By analyzing 3D reconstructed data considering geometry and appearance, the method identifies scaling features and segments weathered regions, classifying them by scaling subtype. It shows improvements over previous approaches and is well-received by experts, representing a significant step towards objective stone decay mapping.
Article
Purpose Cracks are prevalent signs of pavement distress found on highways globally. The use of artificial intelligence (AI) and deep learning (DL) for crack detection is increasingly considered as an optimal solution. Consequently, this paper introduces a novel, fully connected, optimised convolutional neural network (CNN) model using feature selection algorithms for the purpose of detecting cracks in highway pavements. Design/methodology/approach To enhance the accuracy of the CNN model for crack detection, the authors employed a fully connected deep learning layers CNN model along with several optimisation techniques. Specifically, three optimisation algorithms, namely adaptive moment estimation (ADAM), stochastic gradient descent with momentum (SGDM), and RMSProp, were utilised to fine-tune the CNN model and enhance its overall performance. Subsequently, the authors implemented eight feature selection algorithms to further improve the accuracy of the optimised CNN model. These feature selection techniques were thoughtfully selected and systematically applied to identify the most relevant features contributing to crack detection in the given dataset. Finally, the authors subjected the proposed model to testing against seven pre-trained models. Findings The study's results show that the accuracy of the three optimisers (ADAM, SGDM, and RMSProp) with the five deep learning layers model is 97.4%, 98.2%, and 96.09%, respectively. Following this, eight feature selection algorithms were applied to the five deep learning layers to enhance accuracy, with particle swarm optimisation (PSO) achieving the highest F-score at 98.72. The model was then compared with other pre-trained models and exhibited the highest performance. Practical implications With an achieved precision of 98.19% and F-score of 98.72% using PSO, the developed model is highly accurate and effective in detecting and evaluating the condition of cracks in pavements. As a result, the model has the potential to significantly reduce the effort required for crack detection and evaluation. Originality/value The proposed method for enhancing CNN model accuracy in crack detection stands out for its unique combination of optimisation algorithms (ADAM, SGDM, and RMSProp) with systematic application of multiple feature selection techniques to identify relevant crack detection features and comparing results with existing pre-trained models.
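As a rough illustration of the optimiser comparison described in that abstract, the sketch below trains the same small CNN with ADAM, SGD with momentum, and RMSProp and reports the best validation accuracy for each. It is an assumption-laden stand-in, not the authors' model: the architecture, learning rates, patch size, and the random placeholder data are all invented for illustration.

```python
# Sketch: compare three optimisers on an identical binary crack classifier.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def make_cnn(input_shape=(128, 128, 3)):
    return keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),   # crack / no-crack
    ])

optimisers = {
    "ADAM": keras.optimizers.Adam(learning_rate=1e-3),
    "SGDM": keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9),
    "RMSProp": keras.optimizers.RMSprop(learning_rate=1e-3),
}

# Placeholder pavement patches and labels; in practice these come from the dataset
X = np.random.rand(64, 128, 128, 3).astype("float32")
y = np.random.randint(0, 2, size=64)

for name, opt in optimisers.items():
    model = make_cnn()
    model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])
    history = model.fit(X, y, validation_split=0.2, epochs=5, verbose=0)
    print(name, "best val accuracy:", max(history.history["val_accuracy"]))
```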
Article
Full-text available
Computer vision-based techniques were developed to overcome the limitations of visual inspection by trained human resources and to detect structural damage in images remotely, but most methods detect only specific types of damage, such as concrete or steel cracks. To provide quasi real-time simultaneous detection of multiple types of damages, a Faster Region-based Convolutional Neural Network (Faster R-CNN)-based structural visual inspection method is proposed. To realize this, a database including 2,366 images (with 500 × 375 pixels) labeled for five types of damages (concrete crack, steel corrosion with two levels (medium and high), bolt corrosion, and steel delamination) is developed. Then, the architecture of the Faster R-CNN is modified, trained, validated, and tested using this database. Results show 90.6%, 83.4%, 82.1%, 98.1%, and 84.7% average precision (AP) ratings for the five damage types, respectively, with a mean AP of 87.8%. The robustness of the trained Faster R-CNN is evaluated and demonstrated using 11 new 6,000 × 4,000-pixel images taken of different structures. Its performance is also compared to that of the traditional CNN-based method. Considering that the proposed method provides a remarkably fast test speed (0.03 seconds per image with 500 × 375 resolution), a framework for quasi real-time damage detection on video using the trained networks is developed.
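A quasi real-time video framework of the kind mentioned at the end of that abstract typically amounts to running a trained detector frame by frame and overlaying the predicted boxes. The sketch below shows that loop under stated assumptions: torchvision's off-the-shelf Faster R-CNN stands in for the modified network described above, and the video file name and score threshold are illustrative.

```python
# Sketch: frame-by-frame detection loop with a pretrained Faster R-CNN.
import cv2
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

cap = cv2.VideoCapture("damage.mp4")   # hypothetical inspection video
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        pred = model([tensor])[0]                      # boxes, labels, scores
    for box, score in zip(pred["boxes"], pred["scores"]):
        if score > 0.7:                                # illustrative confidence cut-off
            x1, y1, x2, y2 = box.int().tolist()
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```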
Article
Full-text available
A number of image processing techniques (IPTs) have been implemented for detecting civil infrastructure defects to partially replace human-conducted on-site inspections. These IPTs are primarily used to manipulate images to extract defect features, such as cracks in concrete and steel surfaces. However, the extensively varying real-world situations (e.g., lighting and shadow changes) can lead to challenges to the wide adoption of IPTs. To overcome these challenges, this article proposes a vision-based method using a deep architecture of convolutional neural networks (CNNs) for detecting concrete cracks without calculating the defect features. As CNNs are capable of learning image features automatically, the proposed method works without the conjugation of IPTs for extracting features. The designed CNN is trained on 40K images of 256 × 256 pixel resolutions and, consequently, records about 98% accuracy. The trained CNN is combined with a sliding window technique to scan any image size larger than 256 × 256 pixel resolutions. The robustness and adaptability of the proposed approach are tested on 55 images of 5,888 × 3,584 pixel resolutions taken from a different structure which is not used for training and validation processes under various conditions (e.g., strong light spot, shadows, and very thin cracks). Comparative studies are conducted to examine the performance of the proposed CNN using traditional Canny and Sobel edge detection methods. The results show that the proposed method shows ...
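The sliding-window scanning mentioned in that abstract can be illustrated with a short sketch: a classifier trained on fixed-size patches is slid over an arbitrarily large image and each window is flagged as crack or no-crack. This is a generic outline under stated assumptions, not the authors' code; `patch_classifier` is assumed to be an already-trained model returning a crack probability for a single patch, and the dummy classifier is only there so the example runs stand-alone.

```python
# Sketch: scan a large image with a 256x256 sliding window and a patch classifier.
import numpy as np

def sliding_window_scan(image, patch_classifier, window=256, stride=256, threshold=0.5):
    """Return top-left corners of windows classified as containing a crack."""
    h, w = image.shape[:2]
    hits = []
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            patch = image[y:y + window, x:x + window]
            if patch_classifier(patch) >= threshold:
                hits.append((x, y))
    return hits

# Toy usage with a dummy classifier (dark patches treated as "crack")
dummy = lambda patch: float(patch.mean() < 0.2)
image = np.random.rand(1024, 768)
print(sliding_window_scan(image, dummy))
```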
Article
Many contact-sensor-based methods for structural damage detection have been developed. However, these methods have difficulty compensating for environmental effects, such as variation or changes in temperature and humidity, which may lead to false alarms. In order to partially overcome these disadvantages, vision-based approaches have been developed to detect corrosions, cracks, delamination, and voids. However, there are few such approaches for loosened bolts. Therefore, we propose a novel vision-based detection method. Target images of loosened bolts were taken by a smartphone camera. From the images, simple damage-sensitive features, such as the horizontal and vertical lengths of the bolt head, were calculated automatically using the Hough transform and other image processing techniques. A linear support vector machine was trained with the aforementioned features, thereby building a robust classifier capable of automatically differentiating tight bolts from loose bolts. Leave-one-out cross-validation was adapted to analyze the performance of the proposed algorithm. The results highlight the excellent performance of the proposed approach to detecting loosened bolts, and that it can operate in quasi-real-time.
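The pipeline in that abstract (Hough-transform-based geometric features of the bolt head fed to a linear SVM) can be sketched as below. This is a heavily simplified stand-in, not the original method: the feature extraction keeps only the horizontal and vertical extents of detected Hough lines, and the training data are random placeholders.

```python
# Sketch: Hough-based bolt-head features + linear SVM for tight/loose classification.
import cv2
import numpy as np
from sklearn.svm import LinearSVC

def bolt_features(gray_image):
    edges = cv2.Canny(gray_image, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=40,
                            minLineLength=20, maxLineGap=5)
    if lines is None:
        return np.zeros(2)
    pts = lines.reshape(-1, 4)
    horiz = np.ptp(np.r_[pts[:, 0], pts[:, 2]])   # horizontal extent of the bolt head
    vert = np.ptp(np.r_[pts[:, 1], pts[:, 3]])    # vertical extent of the bolt head
    return np.array([horiz, vert], dtype=float)

# Placeholder feature vectors and labels (0 = tight, 1 = loose)
X = np.random.rand(40, 2) * 100
y = np.random.randint(0, 2, size=40)
clf = LinearSVC().fit(X, y)

# Classify one (random placeholder) bolt image
test_image = np.random.randint(0, 255, (200, 200), dtype=np.uint8)
print(clf.predict(bolt_features(test_image).reshape(1, -1)))
```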
Article
Potholes are one of the key defects that affect the performance of roads and highway networks. Metrological features of a pothole provide useful metrics for road distress measurement and severity analysis. This paper presents a performance analysis of Kinect as a sensor for pothole imaging and metrology. Depth images of paved surfaces are collected from concrete and asphalt roads using this sensor. Three-dimensional (3D) meshes are generated for a variety of pothole configurations in order to visualise and to calculate their different metrological features. The sensor is benchmarked using a test-rig with pothole-like depressions or artificial potholes of known dimensions to evaluate sensor performance under different real-life imaging conditions, such as through the media of clear, muddy and oily water. Error in measurement due to surface roughness is also studied. Another source of error, due to the presence of foreign objects such as stones and pebbles appearing as negative depth, is also discussed and compensated for. Results show a mean percentage error of 2.58% and 5.47% in depth and volumetric calculations, respectively.
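The volumetric calculation implied in that abstract can be illustrated with a simple depth-integration sketch: the pothole volume is approximated by summing per-pixel depth deviations below the road surface and multiplying by the pixel footprint. The road-plane estimate, pixel area, and synthetic depth map below are illustrative assumptions, and negative-depth pixels (e.g., stones or pebbles protruding above the surface) are discarded as in the compensation idea described above.

```python
# Sketch: estimate pothole volume from a depth image of a road patch.
import numpy as np

def pothole_volume(depth_map_mm, pixel_area_mm2):
    """Approximate pothole volume (mm^3) by integrating depth below the road level."""
    road_level = np.median(depth_map_mm)       # crude road-plane estimate
    deviation = depth_map_mm - road_level      # positive where the surface drops away
    deviation[deviation < 0] = 0               # ignore negative depth (stones, pebbles)
    return float(deviation.sum() * pixel_area_mm2)

# Synthetic example: flat patch 800 mm from the sensor with a 40 mm-deep depression
depth = np.full((100, 100), 800.0)
depth[40:60, 40:60] += 40.0                    # pothole pixels are farther from the sensor
print(pothole_volume(depth, pixel_area_mm2=1.5))
```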
Article
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image. The code will be released.
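The RPN-based Faster R-CNN described in that abstract is available off the shelf in torchvision, and adapting it to a custom class set (such as damage categories) mainly involves swapping the box-predictor head. The sketch below shows that setup and one dummy training step; it is not the original Caffe implementation, and the two-class configuration (background plus one damage class) and the made-up box are assumptions for illustration.

```python
# Sketch: fine-tune torchvision's Faster R-CNN (RPN + ROI heads) on a custom class set.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

# One dummy training step: images are lists of tensors, targets hold boxes and labels
model.train()
images = [torch.rand(3, 600, 800)]
targets = [{"boxes": torch.tensor([[100.0, 120.0, 300.0, 260.0]]),
            "labels": torch.tensor([1])}]
losses = model(images, targets)                 # dict of RPN and ROI-head losses
total = sum(losses.values())
total.backward()
print({k: round(v.item(), 3) for k, v in losses.items()})
```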