Automation in Construction, 2019 (99), 114-124
Deep learning-based automatic volumetric damage quantification using a
depth camera
Gustavo H. Beckmana, Dimos Polyzoisb, Young-Jin Chac*
a M.Sc. Student, Department of Civil Engineering, University of Manitoba, Winnipeg, MB, Canada
b Professor, Department of Civil Engineering, University of Manitoba, Winnipeg, MB, Canada
c Assistant Professor, Department of Civil Engineering, University of Manitoba, Winnipeg, Canada,
*Corresponding Author: young.cha@umanitoba.ca
Abstract:
A depth camera or 3-dimensional scanner has traditionally been used as a sensor to quantify identified concrete spalling damage in terms of volume. However, to quantify the concrete spalling damage automatically, the first step is to detect (i.e., identify) the concrete spalling. Multiple spots of spalling may occur within a single structural element or across multiple structural elements. However, there is, as of yet, no method to detect concrete spalling automatically using
deep learning methods. Therefore, in this paper, a faster region-based convolutional neural
network (Faster R-CNN)-based concrete spalling damage detection method is proposed with an
inexpensive depth sensor to separately quantify multiple instances of spalling on the same surface simultaneously and to consider multiple surfaces of structural elements. A database composed of
1,091 images (with 853 × 1,440 pixels) labeled for volumetric damage is developed, and the deep
learning network is then modified, trained, and validated using the proposed database. The damage
quantification is automatically performed by processing the depth data, identifying surfaces, and
isolating the damage after merging the output from the Faster R-CNN with the depth stream of the
sensor. The trained Faster R-CNN presented an average precision (AP) of 90.79%. Volume
quantifications show a mean precision error (MPE) of 9.45% when considering distances from 100
cm to 250 cm between the element and the sensor. Also, an MPE of 3.24% was obtained for
maximum damage depth measurements across the same distance range.
Keywords: Convolutional neural network; deep learning; depth sensor; concrete spalling; volume
quantification.
1. Introduction
With the aging of concrete, structural elements in service conditions deteriorate, leading to the
failure of structures. Like any other material, concrete has a lifespan and is susceptible to damage, which can result from external agents such as overloading, excessive usage, or simply material
degradation. Structural condition assessments constitute a necessary means by which to evaluate
and keep track of structural capabilities.
To overcome the limitations of visual inspections, which depend on the judgement and
ratings of trained inspectors, the usage of sensors and other data-acquisition techniques along with
computational methods can be implemented [1-3]. Strain gauge extensometers [4], accelerometers
[5], and computer simulations [6] are some of the tools that can be used to assess structural integrity
under certain scenarios, identifying or predicting damage. Despite their accuracy, contact sensor-based data acquisition networks are often expensive and difficult to manipulate, especially when
considering long-term monitoring.
Computer-vision sensors present an alternative for data acquisition. By extracting and
processing data from images, systems can be designed to operate from a distance. Two-
dimensional (2D) imaging and high-speed video have been used to determine parameters such as
displacements [7-8] and surface deformation [9], to detect and quantify potential structural hazards
[10-11], and to acquire the dynamic properties of structures in motion [12].
The detection and quantification of potential structural hazards through 2D imaging can
benefit from machine learning implementations due to the distinct visual characteristics
corresponding to each kind of damage. Chen et al. [13] implemented a support vector machine
(SVM)-based method for corrosion detection on steel bridges. Cha et al. [10] proposed a
convolutional neural network (CNN) capable of effectively detecting cracks. Region-based CNNs
have also proven to be a useful tool for fast, accurate object detection and localization [14-15], and
their concept was successfully brought into the structural health monitoring field, enabling the
detection of multiple types of structural damage [16].
With the incorporation of depth into 2D imaging, three-dimensional (3D) analysis has recently been introduced into the scope of studies regarding structural health monitoring. Whereas 2D images can lead to the detection and classification of damage
such as cracks, 3D data open opportunities for other measurements to be performed, allowing the
detection and quantification of volumetric damage, such as delamination, spalling, and other
material losses caused by external agents.
There are not many methodologies involving 3D imaging for concrete health assessment reported in
the literature, and most of the published articles make use of the structure-from-motion (SfM)
approach [17], where 2D RGB images, in conjunction with scaling parameters and other
properties, are merged together to generate a 3D virtual model. Torok et al. [18] developed an
SfM-based methodology that consisted of data collection from a robotic platform in a post-disaster
scenario, focusing on the 3D reconstruction of the elements, damage recognition, and geometrical
characteristics.
Another approach to obtaining 3D data is to use special cameras that contain all red, green,
and blue channels and, additionally, an extra depth channel (RGB-D), containing both color and
depth sensors on the same device. When considering pavement health assessments, several authors
proposed methodologies to detect and quantify potholes through a 3D point cloud extracted from
inexpensive consumer depth cameras. A model for pothole volume quantification utilizing both
RGB and depth data derived from the Microsoft Kinect V1 sensor was proposed [19-22]. However,
these studies implemented damage quantification methodologies using a depth camera over
damage that was already known to the operator. Moreover, the above literature proposed fixed
depth camera setups, where the distance between the sensor and the object is previously measured
and used within the algorithm. This approach restricts their usage in various scenarios and makes
it impractical for them to be used in any automated implementations.
The majority of concrete elements, such as beams and slabs, have flat surfaces. Other construction elements, such as walls, can also be flat; thus, it is necessary to differentiate concrete surfaces from
other types of flat surfaces that may be present in the analyzed frame. Furthermore, the existing
quantification methods cannot identify spalling or any other type of volumetric damage
independently and will only provide the amount of volumetric loss relative to the analyzed surface.
Therefore, a damage detection and localization method becomes necessary to allow for a fully
automated depth camera damage assessment system.
To overcome the limitations mentioned above, we propose a fully automatic damage detection,
localization, and quantification method via the integration of an inexpensive consumer RGB-D
sensor, Microsoft Kinect V2, and Faster R-CNN to detect damage. The proposed method does not
require any input of premeasured distance between the analyzed element and the RGB-D sensor
and is able to operate in a fully mobile setup. The remainder of this paper is composed of section
2: methodology, section 3: experimental procedures, section 4: results and analysis, and section 5:
conclusion.
2. Methodology
The main objective of this study is to automatically detect, localize, and accurately quantify the
amount of concrete spalled from a concrete element using an inexpensive consumer depth camera
through the integration of an advanced region-based deep convolutional neural network (R-CNN)
[23]. Its flowchart is shown in Figure 1. Overall, the algorithm is divided into four steps: 1) RGB-
D sensor-based RGB image and depth data collection; 2) the detection and localization of damage
using a deep, faster R-CNN-based method, providing a bounding box for the detected damage (the
details are explained in section 2.2 and its subsections); 3) the segmentation of the element’s
surface utilizing the bounding box provided from step 2 and point cloud information; and 4) the
segmentation and quantification of the detected damage extracted from the surface provided from
step 3 and point cloud information, as shown in Figure 1.
Figure 1. Algorithm flow chart
2.1 RGB-D sensor
The Microsoft Kinect V2 was utilized as a depth camera. Its technical parameters are listed in
Table 1. It provides RGB image data and point cloud data with resolutions of 1,920 × 1,080 pixels and 512 × 424 pixels, respectively. The utilized RGB-D camera implements the time-of-flight
(ToF) concept, which consists of a device with both an infrared (IR) emitter and an IR sensor. The
IR emitter projects the infrared light onto the subject, and the light bounces off of the surfaces and
back to the device, where the IR sensor captures the reflected light. The distance between the IR
sensor and the IR emitter is known, thus the device is able to determine 3D coordinates for each
of the sensor’s pixels based on how long the IR light took to travel from the emitter to the sensor.
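As a brief numerical illustration of this time-of-flight principle (not taken from the paper), the depth of a surface point follows directly from the round-trip travel time of the IR light:

```python
# Time-of-flight depth from round-trip travel time (illustrative only).
C = 299_792_458.0  # speed of light in m/s

def tof_depth(round_trip_time_s: float) -> float:
    """Depth = (speed of light * round-trip time) / 2."""
    return C * round_trip_time_s / 2.0

# Example: a round trip of about 13.3 ns corresponds to roughly 2 m.
print(tof_depth(13.3e-9))  # ~1.99 m
```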
2.1.1 Point cloud acquisition
MATLAB fully supports the Microsoft Kinect V2 interface and provides accessible functions to
acquire raw depth data. The depth device outputs a 3D point mesh, which is already converted into
local coordinates using the sensor’s center pixel as its origin. The sensor’s internal algorithm
assigns not-a-number (NaN) values to depth pixels not acquired properly by the sensor on all three
arrays (x, y, and z). Through these implementations, it is possible to isolate the coordinates of each
point. The overall depth frame is 512 pixels wide and 424 pixels tall.
Due to lens distortion and vignetting, the depth values at the extremes of the frame are
susceptible to greater error rates. The reviewed literature [22] recommends using the central 300 × 300 pixels of the depth sensor to avoid this issue and keeping working distances within the range of 1.0 m to 2.5 m from the sensor plane to the subject to obtain maximum precision. Based on this, these same settings were used in data acquisition.
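A minimal NumPy sketch of this acquisition step is shown below, assuming the depth frame is already available as a 424 × 512 array of z values in metres; the variable and function names are illustrative and do not correspond to the authors' MATLAB code.

```python
import numpy as np

def crop_center(depth_frame: np.ndarray, size: int = 300) -> np.ndarray:
    """Keep only the central size x size window of the 424 x 512 depth frame."""
    rows, cols = depth_frame.shape
    r0 = (rows - size) // 2
    c0 = (cols - size) // 2
    return depth_frame[r0:r0 + size, c0:c0 + size]

# Example with a synthetic frame; invalid pixels are NaN, as output by the sensor.
depth = np.full((424, 512), np.nan)
depth[100:300, 100:400] = 1.5            # a surface roughly 1.5 m away
central = crop_center(depth)             # central 300 x 300 window used for analysis
valid = central[~np.isnan(central)]      # discard pixels the sensor failed to read
in_range = valid[(valid >= 1.0) & (valid <= 2.5)]  # 1.0-2.5 m working range
```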
2.1.2 Noise filtering
To reduce the amount of noise on the depth image, a median filter was implemented. The median filter covered a 5 × 5 pixel area. The median filter works by replacing each pixel in the depth map with the median value of its neighbors. This assures that the denoised depth map does not contain
any value that did not previously exist within the dataset.
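A possible implementation of this 5 × 5 median filter is sketched below with SciPy; the authors' actual MATLAB routine is not given in the paper.

```python
import numpy as np
from scipy.ndimage import median_filter

# 5 x 5 median filter: each depth value is replaced by the median of its
# neighbourhood, so no value appears that was not already in the data.
noisy_depth = 1.5 + 0.01 * np.random.randn(424, 512)   # synthetic noisy frame (m)
denoised_depth = median_filter(noisy_depth, size=5)
```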
Table 1. Microsoft Kinect V2 characteristics
RGB camera: 1,920 × 1,080 pixels
Depth camera: 512 × 424 pixels
Max depth distance: 4.5 m
Min depth distance: 0.5 m
Vertical field of view: 60°
Horizontal field of view: 70°
2.2. Damage detection using Faster R-CNN
To detect and localize the concrete spalling damage on various concrete beams, the Faster R-CNN-
based damage detection method was adopted [16]. To detect concrete spalling, the
Faster R-CNN-based method only uses the RGB image data provided by the RGB-D sensor. The
Faster R-CNN was originally developed by Ren et al. [15] to overcome the limitations of
traditional R-CNN [14] and Fast R-CNN [14] for object detection and localization. The original
CNN does not provide any bounding box or localization information on the detected object within
an image, but the R-CNN and Fast R-CNN provide bounding boxes within an image to localize
the detected object. However, the two traditional region-based methods have disadvantages in computational efficiency and running time when providing the bounding box.
Even though the Fast R-CNN was improved in terms of computational cost and running
time by adopting a selective search method compared to R-CNN, it is still not able to provide real-
time detection and localization of the object. Ren et al. [15] developed a region proposal network
(RPN) to improve the computational efficiency of localizing the detected object. The developed RPN is integrated into the existing Fast R-CNN by sharing the same CNN architecture, as shown in
Figure 2 to reduce computational cost.
Figure 2. Faster R-CNN-based damage detection
2.2.1. Region proposal network (RPN)
The goal of the RPN is to take an image as input and output region proposals, as shown in Figure
2. Each of the proposals has a score corresponding to the membership of a set of classes against
the background. To reduce computational cost, the Fast R-CNN method and the RPN share a
common set of convolutional layers.
The input image passes through a convolutional network that outputs a set of convolutional
feature maps. Then, a sliding window runs over these generated convolutional feature maps. At
each sliding-window location, multiple region proposals are predicted based on a predefined
number of anchor boxes. An anchor is a reference box with a set of scales and aspect ratios and is
centered at the sliding window in question. The code uses three scales and three aspect ratios by
default, which results in nine anchors for each sliding position. A positive or negative classification
is computed for each anchor considering the intersection-over-union (IoU) between the analyzed
anchor and ground-truth bounding boxes on the image. Anchors with overlaps greater than 70%
are classified as positive (1), whereas overlaps smaller than 30% are classified as negative (0).
Anchors with overlaps between 30% and 70% are not considered for the training objective. The
anchors along with their respective classifications are fed into a sliding convolutional layer,
followed by a rectified linear unit (ReLU) layer and then onto a fully connected layer.
The features output by the fully connected layer are fed into a classification layer (softmax)
and into a regressor layer. The regressor layer determines the bounding boxes for the predictions,
and the classifier outputs a probability score varying from 0 to 1, indicating whether or not the
proposed bounding box contains an object (score of 1) or background (score of 0). The proposed
bounding boxes are referred to as region proposals.
The network is trained end-to-end for both classification and regressor layers by mini-batch
sampling. Each mini-batch comes from a single image with several positive and negative anchors.
The loss function of a mini-batch is calculated for each image by randomly sampling 256 anchors,
where at least 128 anchors are positive in order to reduce negative bias toward the loss function.
Both the IoU percentage and the final classification score for each region proposal are considered
when calculating the loss values. All new layers are randomly initialized from a zero-mean Gaussian distribution with a standard deviation of 0.01. Shared convolutional layers are initialized by pre-
training a model for ImageNet classification.
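The anchor labelling rule described in this subsection can be sketched as follows; this is an illustrative NumPy reconstruction of the IoU computation and the 70%/30% thresholds, not the authors' training code.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_anchor(anchor, ground_truth_boxes):
    """1 = positive, 0 = negative, -1 = ignored during training."""
    best = max(iou(anchor, gt) for gt in ground_truth_boxes)
    if best > 0.7:
        return 1
    if best < 0.3:
        return 0
    return -1

# Example: one ground-truth spalling box and two candidate anchors.
gt = [(100, 100, 200, 200)]
print(label_anchor((105, 105, 205, 205), gt))  # high overlap -> 1
print(label_anchor((300, 300, 400, 400), gt))  # no overlap  -> 0
```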
2.2.2. Fast R-CNN
The Fast R-CNN network takes as input the set of precomputed region proposals output from the
RPN method and an entire image, as shown in Figure 2. These proposals are then combined with
the original feature map output from the initial CNN, resulting in regions of interest (RoI). A fixed-
length feature vector is extracted from the resulting feature map by a RoI pooling layer for each
object proposal. These vectors are used as input into fully connected layers, followed then by
regressor and softmax layers, outputting the location and classification of bounding boxes.
Therefore, through this process, the Fast R-CNN provides more fine-tuned information on
bounding boxes and their classification through fully connected layers of the Fast R-CNN.
2.2.3. ZF-Net for Faster-RCNN
Ren et al. [15] investigated two architecture models when implementing the Faster R-CNN
methodology: the Zeiler and Fergus model, also known as ZF-Net [24] and the Simonyan and
Zisserman model, also known as VGG-16 [25]. Of the two, ZF-Net has the faster training and testing speed. The ZF-Net was first introduced as comprising five convolutional layers
(CONV), two fully connected layers (FC), and a softmax layer for output, winning the Large-scale
Visual Recognition Challenge 2013 (ILSVRC2013). The original ZF-Net was modified by the
authors to implement Faster R-CNN. The new architectures are described in Tables A.1 and A.2
for both RPN and Fast R-CNN stages, as shown in Figures 3 and 4. To share computational power,
the RPN and Fast R-CNN stages share the outputs from layers 1 through 9.
Figure 3. Modified ZF-Net for RPN
Figure 4. Modified ZF-Net for Fast R-CNN
To enable the RPN, the authors replaced the last max-pooling layer and fully connected layer
with a sliding convolutional layer, which is followed by a fully connected layer. The softmax layer
is replaced by a softmax and regressor layer. The convolutional layers are followed by a ReLU activation function in order to provide non-linearity to the model [26], allowing each sliding
window to be mapped to a lower-dimensional feature, and feeding it into the softmax and regressor
layer. For Fast R-CNN, the last max pooling layer of the original network is replaced by an RoI
pooling layer. Dropout layers were also added between the first three fully connected layers to
prevent overfitting of the model. As with the RPN model, the softmax layer was replaced by a
softmax and regressor layer.
2.3. Surface segmentation
To quantify the volume of the spalling, the standard surface of the detected spalling should be
identified. A plane fitting implementation using the random sample consensus (RANSAC)
algorithm is performed [19]. The RANSAC algorithm consists of a random selection of points to
fit in a plane according to predefined parameters, such as a reference vector, a distance threshold,
and the maximum angular distance between each point’s normal and the reference vector. The
algorithm fits a plane to the greatest number of similar data points according to the assigned
threshold values. However, concrete elements, such as columns and beams, can often be found
offset from other elements, such as walls or slabs, and thus can result in an incorrectly fitted surface for
the algorithm since these slab or wall elements usually possess surface areas significantly larger
than common structural elements. To overcome this issue, the plane of the bounding box generated
by the Faster R-CNN is used for the RANSAC algorithm to fit a plane.
In the presented scenarios, a reference vector parallel to the sensor’s normal is adopted
along with a maximum angular distance. This eliminates the requirement for the sensor to be placed perfectly parallel to the surface plane. A distance threshold of 5 mm was also used as a
parameter for the RANSAC function. Every data point that does not satisfy the assigned thresholds
is considered an outlier.
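A minimal RANSAC plane-fitting loop following the parameters described above (5 mm distance threshold, a reference vector along the sensor's viewing axis, and a maximum angular distance) might look as follows; it is an illustrative NumPy sketch rather than the authors' implementation.

```python
import numpy as np

def ransac_plane(points, dist_thresh=0.005, ref_vector=(0, 0, 1),
                 max_angle_deg=30.0, iterations=500, seed=0):
    """Fit a plane a*x + b*y + c*z + d = 0 to an (N, 3) point cloud with RANSAC."""
    rng = np.random.default_rng(seed)
    ref = np.asarray(ref_vector, dtype=float)
    ref /= np.linalg.norm(ref)
    best_inliers, best_model = None, None
    for _ in range(iterations):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:
            continue                                  # degenerate (collinear) sample
        normal /= norm
        # Reject candidate planes whose normal deviates too far from the reference.
        angle = np.degrees(np.arccos(np.clip(abs(normal @ ref), -1.0, 1.0)))
        if angle > max_angle_deg:
            continue
        d = -normal @ sample[0]
        dist = np.abs(points @ normal + d)            # point-to-plane distances
        inliers = dist < dist_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (*normal, d)
    return best_model, best_inliers                   # (a, b, c, d), boolean mask

# Example: a noisy plane at z = 1.5 m plus a region of "damage" points behind it.
rng = np.random.default_rng(1)
xy = rng.uniform(-0.5, 0.5, (2000, 2))
z = 1.5 + rng.normal(0, 0.001, 2000)
z[:100] += 0.03                                       # 3 cm deep spalling region
cloud = np.column_stack([xy, z])
model, inlier_mask = ransac_plane(cloud)
outliers = cloud[~inlier_mask]                        # candidate damage points
```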
The reviewed literature identified the desired damaged surfaces based on the fixed, initial
parameters acquired from external measurements of the implemented setup. Having a fixed-
distance system is not practical for mobile, unmanned applications. Furthermore, the reviewed
methodologies did not consider the possibility of having more than one surface on the frame,
regardless of their integrity state [19-21]. The proposed surface segmentation routine fits a plane
regardless of the distance between the sensor and the damaged surface as long as it is located
within the equipment’s working distance, relying solely on the data output from the sensor, without
the need of any other measurement of any kind.
2.4. Damage segmentation
From the 3D depth cloud, the outliers of the fitted plane are then analyzed to determine whether or not they are located within the limits of the inliers of the same plane. If so,
the outliers are classified as damage. This process simultaneously filters possible NaN values from
the depth data.
To allow the separate quantification of multiple damage within the same surface, a
hierarchical cluster analysis routine is implemented to segment the outliers as individual damage.
The routine first calculates the Euclidean distance (Eq. 1) between each pair of points p and q within the data input, which is, in this case, the outliers, and then proceeds to link the pairs that
are close together into binary clusters. These clusters are linked with others to form larger clusters,
repeatedly, until all the data have been grouped. A maximum number of clusters is predetermined,
and each cluster index is then attributed to its corresponding depth point.
In the presented methodology, a maximum number of 10 clusters was predefined. This
allows the segmentation of up to 10 isolated damage within the same surface. The clustering
routine is key to volume quantification as it segments each damage area based on its depth data
relative to the surface, ensuring that even pixels located outside of the bounding boxes, previously
generated by the Faster R-CNN, are considered for damage quantification.
$d(p, q) = \sqrt{\sum_{i} (q_i - p_i)^2}$,   (1)
where i corresponds to each point’s coordinate reference axis.
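This clustering step can be sketched with SciPy's hierarchical clustering as shown below; single linkage and the variable names are assumptions, since the paper only specifies the Euclidean distance metric and a predefined maximum of 10 clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def segment_damage(outlier_points: np.ndarray, max_clusters: int = 10):
    """Group outlier points into at most max_clusters individual damage regions."""
    # Pairwise Euclidean distances (Eq. 1) are computed internally by linkage().
    tree = linkage(outlier_points, method='single', metric='euclidean')
    labels = fcluster(tree, t=max_clusters, criterion='maxclust')
    return labels                              # cluster index for each outlier point

# Example: outlier points from two separated damage regions.
rng = np.random.default_rng(0)
blob_a = rng.normal([0.0, 0.0, 0.02], 0.005, (50, 3))
blob_b = rng.normal([0.3, 0.1, 0.03], 0.005, (50, 3))
labels = segment_damage(np.vstack([blob_a, blob_b]))
print(len(np.unique(labels)))                  # number of clusters found (at most 10)
```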
2.5. Volume quantification
The fitted plane model returns the plane's normal vector along with the constant coefficients a, b, c, and d, which form part of the plane equation (Eq. 2). These values can be extracted and used to create a $z^{plane}(x, y)$ function (Eq. 3) that gives the depth of the fitted plane at each point, as shown in Figure 5.
$ax + by + cz + d = 0$,   (2)
$z_n^{plane} = -(a x_n + b y_n + d) / c$,   (3)
where $z_n^{plane}$ is the depth coordinate of the nth pixel on the fitted plane.
Figure 5. Volume quantification
The area covered by each pixel varies according to its own depth. At closer distances, pixels can cover an area as small as 1 mm², whereas longer depths can reach values greater than 35 mm². The coverage area of the nth pixel is determined according to Eq. (4).
$A_n = (C_h \cdot z_n)(C_v \cdot z_n)$,   (4)
where
$A_n$ = area of coverage (m²) of the nth pixel
$C_h$ = constant based on the Kinect's horizontal field of view (0.0024066)
$C_v$ = constant based on the Kinect's vertical field of view (0.0024698)
To calculate the volume covered by each pixel, it is necessary to obtain the distance
between the pixel itself and the surface plane of the element. The surface plane is ruled by Eq. (2),
and the depth of each pixel within the plane can be acquired from Eq. (3). Therefore, it is possible
to obtain the value of the depth relative to the element’s surface by subtracting the corresponding
z coordinate of the fitted plane from every pixel’s z coordinate (Eq. 5). Figure 5 facilitates an easy
understanding of this theory.
$D_n = z_n - z_n^{plane}$,   (5)
where
$D_n$ = relative depth of the nth pixel
$z_n$ = depth coordinate of the nth pixel
$z_n^{plane}$ = depth coordinate of the corresponding nth pixel on the fitted plane
Since the proposed routine calculates the volume of each and every pixel within the
damaged area, it allows a proper quantification of any volumetric damage, regardless of its
geometric form.
For each segmented damage, the volume of each corresponding pixel is calculated by
multiplying its area of coverage and its relative depth to the fitted plane, and then these are added
together (Eq. 6). The final volume is calculated by the summation of each individual instance of
damage (Eq. 7).
$V_{d_i} = \sum_{n} A_n D_n$,   (6)
$V_t = \sum_{i} V_{d_i}$,   (7)
where $V_{d_i}$ is the calculated volume for the ith individual damage, and $V_t$ is the total calculated volume.
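Putting Eqs. (2) through (7) together, the per-pixel volume integration can be sketched as follows. The field-of-view constants match those stated above, while the plane coefficients and point coordinates in the example are synthetic values used only for illustration.

```python
import numpy as np

C_H, C_V = 0.0024066, 0.0024698   # Kinect V2 field-of-view constants (Eq. 4)

def damage_volume(x, y, z, plane, labels):
    """Sum per-pixel volumes (Eq. 6) for each damage cluster and in total (Eq. 7).

    x, y, z : 1-D arrays of outlier (damage) point coordinates in metres
    plane   : (a, b, c, d) coefficients of the fitted surface, Eq. (2)
    labels  : cluster index of each point from the damage segmentation step
    """
    a, b, c, d = plane
    z_plane = -(a * x + b * y + d) / c           # Eq. (3): surface depth at each pixel
    depth_rel = z - z_plane                      # Eq. (5): depth relative to surface
    area = (C_H * z) * (C_V * z)                 # Eq. (4): footprint of each pixel (m^2)
    volumes = {i: float(np.sum(area[labels == i] * depth_rel[labels == i]))
               for i in np.unique(labels)}       # Eq. (6): one volume per damage
    return volumes, sum(volumes.values())        # Eq. (7): total volume (m^3)

# Example: a 2 cm deep patch of 1,000 points seen at about 1.5 m.
n = 1000
x = np.random.uniform(-0.05, 0.05, n)
y = np.random.uniform(-0.05, 0.05, n)
z = np.full(n, 1.52)                             # 2 cm behind a plane at z = 1.5 m
per_damage, total = damage_volume(x, y, z, plane=(0.0, 0.0, 1.0, -1.5),
                                  labels=np.ones(n, dtype=int))
print(total * 1e6, "cm^3")                       # rough volume converted to cm^3
```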
3. Experiments
The first step of this method is training, validating, and testing the Faster R-CNN-based concrete spalling detection method. Using the detected concrete spalling information, damage quantification is conducted using the depth cloud information with the method described in section 2.
3.1. Faster R-CNN training, validation, and testing
Faster R-CNN has a significant task in the methodology: damage detection. To allow for spalling
detection, 749 images with a resolution of 2,560 × 1,440 pixels and an aspect ratio of 16:9 were
collected using a smartphone camera. The camera, aspect ratio, and resolution were chosen in such
a way that the output images would be similar to those produced by the Kinect RGB sensor. The
images were captured with distances of 0.5 to 2.5 m between the camera and the subjects under
different lighting conditions. The subjects contained artificial damage produced under lab
conditions. The acquired images were then cropped into 1,091 images, with dimensions of 853 × 1,440 pixels, which comprise the proposed database. The cropped images were annotated
according to the proposed damage type utilizing the free annotation software LabelImg [28]. This
software allows easy image annotation through a user-friendly interface. The program outputs
extensible markup language (XML) files corresponding to the annotated image in the Pascal Visual
Object Classes format. The XML files contain all image information, such as name and size as
well as bounding box coordinates. The labels and bounding boxes for 1,935 objects were produced.
Figure 6 shows examples of annotated images.
Figure 6. Annotated dataset sample
Training, validation, and testing datasets were manually produced to ensure that the same damage would not be present in more than one set simultaneously. A total of 191 images with a resolution of 853 × 1,440 pixels were selected to form the testing set. The remaining 900 images were
randomly selected to compose the training and validation sets, such that the training set contains
600 images and the validation set contains 300 images. Data augmentation was applied: horizontal flipping and exposure adjustments were performed on the training and validation image sets, which were subsequently used for the training and validation of the network. The final number of images in the dataset after augmentation was 3,455, with a total of 5,522 bounding boxes labeled for spalling.
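The two augmentation operations mentioned above can be sketched as follows; this is an illustrative NumPy version, and the exposure factors shown are assumed values rather than those used by the authors.

```python
import numpy as np

def horizontal_flip(image, boxes, width):
    """Mirror an image and its bounding boxes (x_min, y_min, x_max, y_max)."""
    flipped = image[:, ::-1, :]
    flipped_boxes = [(width - x_max, y_min, width - x_min, y_max)
                     for (x_min, y_min, x_max, y_max) in boxes]
    return flipped, flipped_boxes

def adjust_exposure(image, factor):
    """Scale pixel intensities to simulate a brighter or darker exposure."""
    return np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)

# Example on a synthetic 853 x 1,440 RGB image with one labelled spalling box.
img = np.random.randint(0, 256, (853, 1440, 3), dtype=np.uint8)
boxes = [(200, 300, 500, 650)]
flipped_img, flipped_boxes = horizontal_flip(img, boxes, width=1440)
darker = adjust_exposure(img, 0.7)     # assumed exposure factor
brighter = adjust_exposure(img, 1.3)   # assumed exposure factor
```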
The experiments were conducted using the open source, pre-trained, Faster R-CNN [15],
MATLAB 2017a, CUDA 8.0, and CUDNN 5.1 on a desktop computer equipped with an Intel Core
i7-6700k 4.2 GHz CPU, 16 GB of DDR4 RAM, and an EVGA GTX 1070 FTW with 8 GB of video RAM as the graphics processing unit (GPU). Deep-learning models require extensive datasets to be trained accurately, and, furthermore, training is time-consuming. To overcome the
limitations of having a relatively small dataset, the pre-trained model of Faster R-CNN is used in
this process. The RPN and Fast R-CNN networks are trained with the learning rate set to 0.001,
momentum set to 0.9, and weight decay set to 0.0005, performing 80,000 and 40,000 iterations,
respectively, for the final training.
There is no simple approach for determining the best anchor parameters for the Faster R-
CNN; thus, a trial-and-error approach was used. To determine optimal anchor sizes and ratios for
the training set, 12 combinations of anchor aspect ratios and three combinations of anchor sizes
were analyzed. The network was trained a total of 37 times, of which 36 were trials. To reduce
training time, the number of iterations performed on each training trial was 10 times fewer than
the original number of iterations described previously, resulting in 8,000 and 4,000 iterations for
the RPN and Fast R-CNN, respectively. The combination that resulted in the best AP was selected
and trained with the full number of iterations.
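The trial-and-error procedure amounts to a small grid search over anchor settings. The sketch below illustrates the structure of that search; the candidate values shown are placeholders, since the paper lists only the winning combination.

```python
import itertools

# Reduced example grids (placeholders): the authors tried 12 ratio combinations
# and 3 size combinations, for 36 shortened training trials in total.
scale_sets = [(128, 256, 512), (128, 480, 880), (256, 512, 1024)]
ratio_sets = [(0.5, 1.0, 2.0), (0.5, 0.85, 1.85), (0.75, 1.0, 1.5)]

trials = list(itertools.product(scale_sets, ratio_sets))
print(f"{len(trials)} trial runs in this reduced example (the paper used 36)")
# Each combination is trained for 8,000 / 4,000 iterations and scored by its
# validation AP; the best combination is then retrained with 80,000 / 40,000.
```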
To enhance object detection for small damage, the input images for the training and
validation of the Faster R-CNN were scaled from 853 pixels on the shortest side to 1,000 pixels,
conserving the aspect ratio of the original 853 × 1,440 pixel input images. The scaling parameter
resizes the training images based on a predefined dimension. The proposed method enlarges the
input images to allow the detection of smaller damage and enhance the capabilities of the network.
The sliding convolutional layer of the RPN was set as having a stride of 16 on the feature map.
The scaling and stride parameters were defined through trial and error.
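The rescaling rule (shortest side enlarged from 853 to 1,000 pixels while conserving the aspect ratio) reduces to a single scale factor; the sketch below illustrates it with a simple nearest-neighbour index mapping, which is not necessarily the interpolation used by the authors.

```python
import numpy as np

def scale_to_shortest_side(image: np.ndarray, target: int = 1000) -> np.ndarray:
    """Resize so the shortest side equals target, conserving the aspect ratio."""
    h, w = image.shape[:2]
    scale = target / min(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour resize via index mapping (illustration only).
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    return image[rows[:, None], cols]

img = np.zeros((853, 1440, 3), dtype=np.uint8)
print(scale_to_shortest_side(img).shape)   # (1000, 1688, 3): 853 -> 1,000 shortest side
```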
3.2. Damage segmentation and quantification
To test the proposed method, the damage detection and quantification was carried out under two
different scenarios, using different test subjects. Initially, a polystyrene foam test rig with a total of eight circular-shaped simulated damage areas was developed, as shown in Figure 7(c), numbered 1 through 8. The damage dimensions were randomly assigned, and the inner surfaces were heat-treated
to remove roughness. As verified by [27], harsh direct sunlight interferes greatly with the output
point cloud since the sensors acquire depth data through IR imaging. The readings were taken
indoors, under indirect sunlight, to ensure optimum reading conditions.
Figure 7. (a) Sensor setup, (b) damage on concrete, (c) damage on foam
A concrete beam 35 cm in height was utilized for the second stage of the experiment, as shown in Figure 7(a) and (b). Four damage areas of random dimensions simulating material spalling were broken into the member using a jackhammer, as shown in Figure 7(b), numbered 1 through 4. No treatment was performed on the inner surfaces of the damage. The readings were taken during the day, indoors, under indirect sunlight and artificial lighting to ensure the best data accuracy.
Ground truth volume measurements were performed on both elements. The voids were
filled with tap water and set to rest for one hour to reach surface saturation, then refilled to surface
level. The water was extracted using a syringe, and volumes were measured through a graduated
cylinder. The maximum depth for each damage was taken using a caliper. A steel plate of known
thickness was placed on top of the surfaces to allow proper measurement.
Figure 7(a) illustrates the depth sensor's initial setup to acquire data on the concrete element.
The setup procedures were the same as those performed in the first scenario. The sensor was placed
on top of a sturdy tripod to prevent any type of shaking and to allow a proper alignment between
the sensor plane and the element. The sensor was positioned approximately parallel to the elements' surfaces, aiming directly at their center. No measurement tool was utilized to ensure
alignment. The depth frames were taken at distances varying from 1.0 m to 2.5 m, with 0.25 m
increments, totaling seven reading distances.
4. Results and analysis
Using the trained Faster R-CNN-based spalling detection method, the experiments described in
section 3 were analyzed.
4.1. Faster R-CNN
The training of the network was performed using the four-step strategy. The training time on GPU-
mode is approximately 14 hours due to the resolution of the training set. Smaller resolutions lead
to faster training.
Figure 8. Detected volumetric losses on outdoor concrete elements
The network requires 0.08 seconds in GPU mode to evaluate each 853 × 1,440-pixel image and 2.80 seconds in CPU mode. The highest AP amongst all trainings was 90.79%. For such precision, the anchor sizes and anchor aspect ratios were found to be 128, 480, and 880, and 0.5, 0.85, and 1.85, respectively. Testing the final model with new images yielded acceptable results. The output of the network on real-life damage is illustrated in Figure 8(a) through (c). Confidence levels on the presented cases ranged from 60.5% in Figure 8(a) to 85.9% in Figure 8(c).
When using Figure 7(b) as input, the network provided accurate detections, as shown in
Figure 9. Bounding boxes were drawn within an acceptable margin beyond the damage limits, and confidence levels were consistently high, reaching 100% certainty.
Figure 9. Detected volumetric losses on laboratory specimen
4.2. Damage quantification
A total of three readings were taken for each of the seven distances at each scenario, and the
average of all three volume calculations was taken into consideration. The algorithm is able to
identify the proposed surfaces and to segment and quantify the damage, regardless of their size,
depth, and the distance between the sensor and the element. The bounding boxes indicate the
presence of damage within the concrete surface and are used to segment such surface. When in
possession of the surface coordinates, it is possible to extract pixels corresponding to damage.
Damage pixels that are outside of the bounding box but are within the surface limits are considered
for volume quantification, as mentioned in section 2.4.
The volume (V) calculation results for the polystyrene test rig are listed in Table 2. The
relative error for total volume ranged from 1.49% for the closest distance to 13.83% for the farthest
distance, and the mean precision error (MPE) in the volume calculation of all individual damage
considering all distances was 14.9%.
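As a worked example of how these error figures are obtained, the sketch below reproduces the individual relative errors and their mean for the 100 cm readings of Table 2; defining the MPE as the mean of the individual relative errors is an assumption consistent with the reported values.

```python
import numpy as np

ground_truth = np.array([68, 17, 99, 10, 61, 371, 210, 130], dtype=float)     # cm^3
calculated   = np.array([64, 14.8, 104.5, 7.9, 52.27, 351.7, 222.7, 133.7])   # at 100 cm

relative_error = np.abs(calculated - ground_truth) / ground_truth * 100
print(np.round(relative_error, 2))     # 5.88, 12.94, 5.56, 21.0, 14.31, 5.2, 6.05, 2.85
print(round(relative_error.mean(), 2)) # mean precision error at 100 cm: about 9.22 %

total_error = abs(calculated.sum() - ground_truth.sum()) / ground_truth.sum() * 100
print(round(total_error, 2))           # total-volume error at 100 cm: about 1.49 %
```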
Despite the variance, the errors within the range were consistent, varying by small amounts
when increasing the distance between the sensor and the element. Other than the smallest working
distance of 100 cm, the data collected at 200 cm from the test subject resulted in the best accuracy.
One potential cause of such results is the sharp edges of the carvings on the test rig. At shorter
distances, some projected IR light can be blocked by the carving edges as the IR projector of the
Kinect V2 was not aligned with the center of each carving individually but with the center of the
test rig itself.
In addition, the smallest damage (2 and 4 in Figure 7(c)) of the test rig presented the highest
mean precision errors of 30.42% and 25.57%, respectively, reaching up to 49% at the farthest
distance. Removing these damage from the total MPE results in a reduction of 4.37%. Analyzing
the closest working distance of 100 cm, the obtained MPE was 9.22% considering all damage and
7.01% when both damage 2 and 4 were not taken into consideration due to a large amount of error.
These values are greater than the 5.47% MPE obtained from [21], but the tested material needs to
be taken into consideration since the authors used a concrete test rig. The reflectance of the IR light depends on the material off which it is reflected [29]; therefore, the test rigs being made of distinct materials could account for most of the difference in the observed error rates. Other than
this, the MPEs of individual damage, excluding those from damage 2 and 4, were all smaller than
the error observed in [19].
Table 2. Test rig at distances from 100 to 250 cm
Damage | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total
Ground truth V (cm³) | 68 | 17 | 99 | 10 | 61 | 371 | 210 | 130 | 966
100 cm, Calc. V (cm³) | 64 | 14.8 | 104.5 | 7.9 | 52.27 | 351.7 | 222.7 | 133.7 | 951.57
100 cm, Error (%) | 5.88 | 12.94 | 5.56 | 21.00 | 14.31 | 5.20 | 6.05 | 2.85 | 1.49
125 cm, Calc. V (cm³) | 62.2 | 14.2 | 97.9 | 7.4 | 48 | 343.6 | 202.6 | 111.7 | 887.6
125 cm, Error (%) | 8.53 | 16.47 | 1.11 | 26.00 | 21.31 | 7.39 | 3.52 | 14.08 | 8.12
150 cm, Calc. V (cm³) | 68.6 | 7.9 | 99.2 | 7.7 | 47.8 | 336.9 | 199.9 | 103.1 | 871.1
150 cm, Error (%) | 0.88 | 53.53 | 0.20 | 23.00 | 21.64 | 9.19 | 4.81 | 20.69 | 9.82
175 cm, Calc. V (cm³) | 78 | 12.7 | 113.3 | 11.4 | 59.6 | 331.4 | 183.8 | 100.4 | 890.6
175 cm, Error (%) | 14.71 | 25.29 | 14.44 | 14.00 | 2.30 | 10.67 | 12.48 | 22.77 | 7.81
200 cm, Calc. V (cm³) | 75.3 | 19.4 | 104.9 | 8.2 | 57.4 | 334.8 | 185.2 | 114.5 | 899.7
200 cm, Error (%) | 10.74 | 14.12 | 5.96 | 18.00 | 5.90 | 9.76 | 11.81 | 11.92 | 6.86
225 cm, Calc. V (cm³) | 58.3 | 9.7 | 91.9 | 5.2 | 46.2 | 307 | 188.3 | 121.8 | 828.4
225 cm, Error (%) | 14.26 | 42.94 | 7.17 | 48.00 | 24.26 | 17.25 | 10.33 | 6.31 | 14.24
250 cm, Calc. V (cm³) | 63.3 | 7.9 | 85.9 | 3.1 | 36.8 | 319.7 | 177.3 | 102.9 | 796.9
250 cm, Error (%) | 6.91 | 53.53 | 13.23 | 69.00 | 39.67 | 13.83 | 15.57 | 20.85 | 17.51
Figures 10 through 12 were obtained by processing the depth frame taken at a distance of
100 cm with the proposed methodology. Figure 10 illustrates the depth cloud corresponding to the
test rig after noise reduction. The test rig was offset from a wall (identified by the yellow points).
As described in section 2.1.1, only the center 300 × 300 pixels were taken into consideration. The
extracted surface of the test rig is illustrated in Figure 11. Even in the presence of a second plane
surface, the algorithm correctly detected the test rig as the desired surface. The segmented damage
can be viewed in Figure 12 in both 2D(a) and 3D(b).
Figure 10. Foam denoised depth cloud
Figure 11. Foam extracted surface
Figure 12. Foam damage segmentation: (a) front view, (b) 3D view
The volume calculations for the concrete beam are listed in Table 3. Similar to the previous
scenario, extreme error readings over total volume were observed for the extreme working
distances, ranging from 5.82% to 13.78%. The observed MPE in the volume calculation of all
individual damage considering all distances was 9.45%. The surface roughness of the concrete
damage could be responsible for the significant error difference relative to the values from the test rig at short distances. The most accurate volume calculation other than that at the closest distance
was obtained at a distance of 225 cm, similar to the test rig. All damage showed similar individual
MPEs across the range, and size did not exert any influence on the accuracy. The total MPE of
9.45% is 37% smaller than the 15% error rate presented in [20]. In addition, since both the beam
and the test rig utilized by [21] are made of concrete, it is possible to compare both results directly.
The MPE of 5.28% obtained through the experiment at the shortest distance is smaller than the
5.47% error rate acquired by the authors, even with a farther minimum working distance. This
demonstrates the superior capabilities of the proposed methodology over those of previous studies.
Figures 13 through 15 were obtained by processing the depth frame of the concrete beam,
taken at a distance of 100 cm. Figure 13 illustrates the portion of the depth cloud acquired by the
sensor after noise reduction, as described in section 2.1.2. The entire frame covers an area of
approximately 6 m wide and 3 m tall. The beam surface is the dark-blue region at the bottom of
Figure 13, centered at coordinates (0.0, 0.0). As described in section 2.1.1, only the center 300 ×
300 pixels were taken into consideration. The extracted surface of the concrete element is shown
in Figure 14. The segmented damage can be viewed in Figure 15 in both 2D(a) and 3D(b).
Figure 13. Concrete beam denoised depth cloud
Figure 14. Concrete beam extracted surface
Figure 15. Concrete beam damage segmentation: (a) front view, (b) 3D view
Table 3. Concrete beam at distances from 100 to 250 cm
Damage | 1 | 2 | 3 | 4 | Total
Ground truth V (cm³) | 503.33 | 429.33 | 114.67 | 165.67 | 1213
100 cm, Calc. V (cm³) | 473.6 | 383 | 118.9 | 166.9 | 1142.4
100 cm, Error (%) | 5.91 | 10.79 | 3.69 | 0.74 | 5.82
125 cm, Calc. V (cm³) | 495 | 402 | 106.5 | 147.6 | 1151.1
125 cm, Error (%) | 1.65 | 6.37 | 7.12 | 10.91 | 5.10
150 cm, Calc. V (cm³) | 471.7 | 402.9 | 90.8 | 133.4 | 1098.8
150 cm, Error (%) | 6.28 | 6.16 | 20.82 | 19.48 | 9.41
175 cm, Calc. V (cm³) | 440.6 | 357.3 | 92.7 | 131.2 | 1021.8
175 cm, Error (%) | 12.46 | 16.78 | 19.16 | 20.81 | 15.76
200 cm, Calc. V (cm³) | 432.5 | 330.9 | 114.3 | 157.8 | 1035.5
200 cm, Error (%) | 14.07 | 22.93 | 0.32 | 4.75 | 14.63
225 cm, Calc. V (cm³) | 453 | 369.9 | 104.8 | 162.5 | 1090.2
225 cm, Error (%) | 10.00 | 13.84 | 8.61 | 1.91 | 10.12
250 cm, Calc. V (cm³) | 432.6 | 361.3 | 101.7 | 150.2 | 1045.8
250 cm, Error (%) | 14.05 | 15.85 | 11.31 | 9.34 | 13.78
Maximum depth values for each set of damage on each frame were also taken. Table 4 lists
the obtained values along with their respective ground truths for each scenario. The sensor maintained errors of less than 10% for maximum depth measurements in all tests.
Considering the shortest distance of 100 cm, a relative error of 1.15% was obtained, whereas [21] reported an error of 2.58% for maximum depth measurements on a concrete test rig at a
distance of 80 cm. When compared with the MPE of 8% obtained from [22], the maximum depth
measurements of the proposed methodology on the concrete beam show a better result even at the
farthest distance of 250 cm.
Table 4. Maximum depth measurements
Foam
Distance (cm) | Ground truth (cm) | Measured (cm) | Error (%)
100 | 3.79 | 3.64 | 3.96
125 | 3.79 | 3.67 | 3.17
150 | 3.79 | 3.63 | 4.22
175 | 3.79 | 3.58 | 5.54
200 | 3.79 | 3.52 | 7.12
225 | 3.79 | 3.46 | 8.71
250 | 3.79 | 3.42 | 9.76
MPE = 6.07%
Concrete
Distance (cm) | Ground truth (cm) | Measured (cm) | Error (%)
100 | 5.2 | 5.26 | 1.15
125 | 5.2 | 5.19 | 0.19
150 | 5.2 | 5.07 | 2.50
175 | 5.2 | 5.06 | 2.69
200 | 5.2 | 5.03 | 3.27
225 | 5.2 | 4.92 | 5.38
250 | 5.2 | 4.81 | 7.50
MPE = 3.24%
Some observed errors were higher on the test rig than on the concrete beam, possibly due
to the reflectance properties of the material, as mentioned before. Furthermore, the shape and position of the damage could impact the accuracy of the measurements as well. The fact that
the IR projector was not aligned with the horizontal centerline of each individual row of damage
but with the one between both rows, as well as the sharp edge of the damage, could have had a
significant impact on the IR distribution within the inner surface, causing small shadowing below
the top and bottom areas of some of the damage, depending on the working distance. This is not
observed on the concrete beam since the damage are located along the element following a
centerline. Damage 3 and 4 from the concrete beam have, in fact, different horizontal centerlines,
but they are not far from each other, and the inward deflection from the edges is not steep, hence the negligible effect.
5. Conclusions
Concrete spalling is one of the most common types of structural damage and should be quantified
in terms of volume for accurate damage assessment. The traditional method of measuring the
volume of concrete spalling is to use a depth camera, 3D scanner, etc. However, to automatically
quantify concrete spalling in multiple locations separately within the same structural surface or
multiple different surfaces simultaneously, the first critical step is to identify the concrete spalling
and its standard structural surface. In order to realize this, this paper proposed the Faster R-CNN-
based concrete spalling damage detection method integrated with an inexpensive depth camera for
damage quantification.
The Faster R-CNN provides the multiple locations of the concrete spalling in the same
structural surfaces or multiple different structural surfaces. Based on the location and depth
information on the detected concrete spalling, the standard structural surfaces were identified and
segmented using the RANSAC algorithm. The depth differences between the standard surface and the detected concrete spalling yield the calculated volumes of the spalling.
With this approach, the proposed methodology identifies and quantifies volumes of multiple
surface concrete spalling regardless of the distance between the depth sensor and the element,
relying solely on the outputs from the RGB-D sensor.
In this paper, we have tested the capabilities of the Microsoft Kinect V2 RGB-D camera as
a tool for detecting and quantifying volumetric damage along a planar concrete surface in
conjunction with a Faster R-CNN-based damage detection mechanism. The data provided by the
sensor allows the easy extraction of geometric properties with considerable accuracy. When
compared to other volume quantification tools, such as laser scanners, the Kinect V2 is much less
expensive. Stereo photogrammetry systems can be reliable as well, but the RGB-D camera outputs
the depth map immediately, requiring no further scaling method of any sort, allowing
instantaneous data manipulation.
The newly proposed methodology proves useful when ideal setup conditions are not met
and a more versatile approach is needed. By coupling the Faster R-CNN damage detection
mechanism with the plane fitting method, which works under any distance within the sensor range,
the system can be implemented on unmanned vehicles, allowing remote data acquisition in places
of difficult or hazardous access. With an AP value of 90.79% for damage detection by the Faster
R-CNN-based stage and mean precision error values below 10% for individual volume
measurements, the presented methodology proves its capabilities, providing a reliable damage
detection and quantification system for structural health monitoring. Moreover, this proposed method can serve as a prototype for automatic concrete spalling damage detection and quantification in terms of volume with an inexpensive depth sensor. Any more advanced type of depth camera can be used with the proposed method for better quantification accuracy.
References
[1] Sohn, H., Farrar, C.R., Hemez, F.M., Shunk, D.D., Stinemates, D.W., Nadler, B.R. and Czarnecki, J.J., 2003. A
review of structural health monitoring literature: 1996–2001. Los Alamos National Laboratory, USA.
[2] Farrar, C.R. and Worden, K., 2010. An introduction to structural health monitoring. In New Trends in Vibration
Based Structural Health Monitoring (pp. 1-17). Springer, Vienna.
[3] Brownjohn, J.M., 2007. Structural health monitoring of civil infrastructure. Philosophical Transactions of the
Royal Society of London A: Mathematical, Physical and Engineering Sciences, 365(1851), pp.589-622.
[4] Zhang, J., Guo, S.L., Wu, Z.S. and Zhang, Q.Q., 2015. Structural identification and damage detection through
long-gauge strain measurements. Engineering Structures, 99, pp.173-183.
[5] Lynch, J.P. and Loh, K.J., 2006. A summary review of wireless sensors and sensor networks for structural health
monitoring. Shock and Vibration Digest, 38(2), pp.91-130.
[6] Friswell, M.I. and Penny, J.E., 2002. Crack modeling for structural health monitoring. Structural health
monitoring, 1(2), pp.139-148.
[7] Malesa, M., Szczepanek, D., Kujawińska, M., Świercz, A. and Kołakowski, P., 2010. Monitoring of civil
engineering structures using digital image correlation technique. In EPJ Web of Conferences (Vol. 6, p. 31014). EDP
Sciences.
[8] Chen, J.G., Wadhwa, N., Cha, Y.J., Durand, F., Freeman, W.T. and Buyukozturk, O., 2015. Modal identification
of simple structures with high-speed video using motion magnification. Journal of Sound and Vibration, 345, pp.58-
71.
[9] Choi, S. and Shah, S.P., 1997. Measurement of deformations on concrete subjected to compression using image
correlation. Experimental Mechanics, 37(3), pp.307-313.
[10] Cha, Y.J., Choi, W. and Büyüköztürk, O., 2017. Deep learning‐based crack damage detection using convolutional
neural networks. Computer‐Aided Civil and Infrastructure Engineering, 32(5), pp.361-378.
[11] Cha, Y.J., You, K. and Choi, W., 2016. Vision-based detection of loosened bolts using the Hough transform and
support vector machines. Automation in Construction, 71, pp.181-188.
[12] Cha, Y.J., Chen, J.G. and Büyüköztürk, O., 2017. Output-only computer vision based damage detection using
phase-based optical flow and unscented Kalman filters. Engineering Structures, 132, pp.300-313.
[13] Chen, P.H., Shen, H.K., Lei, C.Y. and Chang, L.M., 2012. Support-vector-machine-based method for automated
steel bridge rust assessment. Automation in Construction, 23, pp.9-19.
[14] Girshick, R., 2015. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp.
1440-1448).
[15] Ren, S., He, K., Girshick, R. and Sun, J., 2017. Faster R-CNN: towards real-time object detection with region
proposal networks. IEEE Transactions on Pattern Analysis & Machine Intelligence, (6), pp.1137-1149.
[16] Cha, Y.J., Choi, W., Suh, G., Mahmoudkhani, S. and Büyüköztürk, O., 2018. Autonomous structural visual
inspection using region‐based deep learning for detecting multiple damage types. Computer‐Aided Civil and
Infrastructure Engineering, 33(9), pp.731-747.
[17] Snavely, N., 2011. Scene reconstruction and visualization from internet photo collections: A survey. IPSJ
Transactions on Computer Vision and Applications, 3, pp.44-66.
[18] Torok, M.M., Golparvar-Fard, M. and Kochersberger, K.B., 2013. Image-based automated 3D crack detection
for post-disaster building assessment. Journal of Computing in Civil Engineering, 28(5), p.A4014004.
[19] Jahanshahi, M.R., Jazizadeh, F., Masri, S.F. and Becerik-Gerber, B., 2012. Unsupervised approach for
autonomous pavement-defect detection and quantification using an inexpensive depth sensor. Journal of Computing
in Civil Engineering, 27(6), pp.743-754.
[20] Moazzam, I., Kamal, K., Mathavan, S., Usman, S. and Rahman, M., 2013, October. Metrology and visualization
of potholes using the Microsoft Kinect sensor. In Intelligent Transportation Systems (ITSC), 2013 16th International
IEEE Conference on (pp. 1284-1291). IEEE.
[21] Kamal, K., Mathavan, S., Zafar, T., Moazzam, I., Ali, A., Ahmad, S.U. and Rahman, M., 2018. Performance
assessment of Kinect as a sensor for pothole imaging and metrology. International Journal of Pavement
Engineering, 19(7), pp.565-576.
[22] Yuan, C. and Cai, H., 2014. Automatic detection of pavement surface defects using consumer depth camera.
In Construction Research Congress 2014: Construction in a Global Network (pp. 974-983).
[23] Beckman Gomes, G.H. 2018. Deep learning-based volumetric damage quantification using an inexpensive depth
camera. MSc thesis, University of Manitoba, Winnipeg, Canada.
[24] Zeiler, M.D. and Fergus, R., 2014, September. Visualizing and understanding convolutional networks.
In European Conference on Computer Vision (pp. 818-833). Springer, Cham.
[25] Simonyan, K. and Zisserman, A., 2014. Very deep convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556.
[26] Nair, V. and Hinton, G.E., 2010. Rectified linear units improve restricted Boltzmann machines. In Proceedings
of the 27th International Conference on Machine Learning (ICML-10) (pp. 807-814).
[27] Steward, J., Lichti, D., Chow, J., Ferber, R. and Osis, S., 2015. Performance assessment and calibration of the
Kinect 2.0 time-of-flight range camera for use in motion capture applications. FIG Working week 2015, pp.1-14.
[28] Tzutalin. LabelImg. Git code (2015). Retrieved February 9, 2018, from https://github.com/tzutalin/labelImg
[29] Mangold, K., Shaw, J.A. and Vollmer, M., 2013. The physics of near-infrared photography. European Journal
of Physics, 34(6), p.S51.
Appendix
Table A.1 ZF-Net architecture for RPN
Layer | Type | Filter Size | Stride | Depth
1 | CONV+ReLU | 7 × 7 | 2 | 96
2 | LRN | - | - | -
3 | Max pooling | 3 × 3 | 2 | 96
4 | CONV+ReLU | 5 × 5 | 2 | 256
5 | LRN | - | - | -
6 | Max pooling | 3 × 3 | 2 | 256
7 | CONV+ReLU | 3 × 3 | 1 | 384
8 | CONV+ReLU | 3 × 3 | 1 | 384
9 | CONV+ReLU | 3 × 3 | 1 | 256
10 | Sliding CONV+ReLU | 3 × 3 | 1 | 256
11 | FC | - | - | 256
12 | Softmax & Regressor | - | - | -
Table A.2 ZF-Net architecture for Fast R-CNN
Layer | Type | Filter Size | Stride | Depth
1 | CONV+ReLU | 7 × 7 | 2 | 96
2 | LRN | - | - | -
3 | Max pooling | 3 × 3 | 2 | 96
4 | CONV+ReLU | 5 × 5 | 2 | 256
5 | LRN | - | - | -
6 | Max pooling | 3 × 3 | 2 | 256
7 | CONV+ReLU | 3 × 3 | 1 | 384
8 | CONV+ReLU | 3 × 3 | 1 | 384
9 | CONV+ReLU | 3 × 3 | 1 | 256
10 | RoI pooling | - | - | 256
11 | FC+ReLU | - | - | 4096
12 | Dropout | - | - | -
13 | FC+ReLU | - | - | 4096
14 | Dropout | - | - | -
15 | FC+ReLU | - | - | 6
16 | Softmax & Regressor | - | - | -
... However, the recognition accuracy was not satisfactory in some studies, highlighting the limitations of deep learning structures (18,19). Beckman et al. developed an advanced deep learning technique coupled with a structural surface fitting algorithm for automated volumetric damage quantification using a depth camera (20). The model reported an average accuracy of 90.79% and a mean accuracy error of 9.45%. ...
... The main advantage of deep learning methods lies in their remarkable capability to automatically learn hierarchical representative features that are robust to background noises, translation, and distortion of target objects without human intervention (20)(21)(22)(23)(24). The application of deep learning methods for detecting roadway problems through GPR data holds immense promise. ...
Article
This research introduces an innovative method for detecting subsurface cracks within pavements by leveraging ground penetrating radar (GPR) technology in conjunction with advanced deep learning techniques. Its primary aim is to significantly improve the accuracy and efficiency of pavement assessment, particularly for operational and maintenance purposes. The proposed model, GPR-YOLOR (You Only Learn One Representation), extends the YOLOR framework and incorporates a region of interest within the top pavement layer to detect subsurface cracks. While the model can be trained with annotated data, the main challenge lies in validating results in the field because of the inability to visually inspect subsurface conditions and the high cost associated with direct coring. To overcome this challenge, we propose an alternative approach that utilizes the co-occurrence of surface cracks as pseudo labels, allowing for easy verification. To ensure that surface cracks correspond to subsurface cracks, the focus is exclusively on transverse cracks that develop in a bottom-up manner, such as fatigue and reflective cracks. Through this methodology, our GPR-YOLOR model achieves an F1 score of 0.72, with a precision of 0.76 and a recall of 0.68. The results from field validation underscore the effectiveness of the GPR-YOLOR model in accurately identifying subsurface cracks, highlighting its practical significance in conducting field condition assessments.
... Nevertheless, the FOS-based method is considered not cost-effective due to the necessity of capturing reflected optical waves with extremely short wavelengths using precise and expensive interrogators [18]. Computer vision and image processing are emerging as a promising and innovative solution for monitoring civil structures, demonstrating effectiveness in identifying visible issues such as cracks, spalling, and corrosion on the surfaces of structures [19][20][21]. However, this technology faces limitations in detecting invisible progressive damages such as strand relaxation in prestressed anchorages. ...
Article
Full-text available
Structural damage in the steel bridge anchorage, if not diagnosed early, could pose a severe risk of structural collapse. Previous studies have mainly focused on diagnosing prestress loss as a specific type of damage. This study is among the first for the automated identification of multiple types of anchorage damage, including strand damage and bearing plate damage, using deep learning combined with the EMA (electromechanical admittance) technique. The proposed approach employs the 1D CNN (one-dimensional convolutional neural network) algorithm to autonomously learn optimal features from the raw EMA data without complex transformations. The proposed approach is validated using the raw EMA response of a steel bridge anchorage specimen, which contains substantial nonlinearities in damage characteristics. A K-fold cross-validation approach is used to secure a rigorous performance evaluation and generalization across different scenarios. The method demonstrates superior performance compared to established 1D CNN models in assessing multiple damage types in the anchorage specimen, offering a potential alternative paradigm for data-driven damage identification in steel bridge anchorages.
... Reference | Task | Algorithm
Kadarla et al. [30] | Crack propagation using video footage | CNN
Vundekode et al. [33] | Identifying three varieties of damages | ANN
Jery Hola et al. [35] | Assessment of compressive strength | ANN
Manish et al. [36] | Compressive strength prediction | ANN
Younq et al. [37] | Concrete compressive strength | ANN
Gustavo H. Beckman et al. [38] | Concrete spalling damage detection and quantification | CNN
Keunyoung Jang et al. [39] | Detection of micro and macro concrete cracks | CNN
FuTao Ni et al. [40] | Concrete thin crack identification and width measurement | CNN
AT Huynh et al. [41] | Prediction of compressive strength | ANN, DNN
Narazaki Y et al. [42] | Damage detecting and quantification | CNN
Asteris P et al. [43] | Concrete compressive strength | ANN
... conditions as shown in Table 2. Surface images may be tainted with noise, shadow, dust, or excess brightness, necessitating more robust and sophisticated categorization systems. Several researchers have tackled similar practical issues, depending on the application. ...
Conference Paper
Full-text available
With recent advancements in sensor technology and rapid progress in internet-based cloud computation, data-driven approaches in structural health monitoring (SHM) are gaining prominence. Most of the monitoring effort is spent reviewing and analyzing the data received from the various sensors deployed in structures, and this analysis helps in understanding the structural stability and its current state, albeit with certain limitations. Considering this, the integration of machine learning (ML) in SHM has attracted significant attention among researchers. This paper is principally aimed at understanding and reviewing the vast literature on sensor-based, data-driven approaches using ML. The implementation and methodology of vibration-based and vision-based monitoring, along with some of the ML algorithms used for SHM, are discussed. A perspective on the importance of data-driven SHM in the future is also presented. Conclusions drawn from the review discuss the prospects and potential limitations of ML approaches in data-driven SHM applications.
... These can be regular cameras placed at new angles and positions or depth cameras that generate not only visual information but also distance data. For instance, Beckman et al. [32] proposed a Faster R-CNN-based solution for damage detection using such cameras. ...
Article
Full-text available
Efficient damage detection of trailers is essential for improving processes at inland intermodal terminals. This paper presents an automated damage detection (ADD) algorithm for trailers utilizing ensemble learning based on YOLOv8 and RetinaNet networks. The algorithm achieves 88.33% accuracy and an 81.08% F1-score on the real-life trailer damage dataset by leveraging the strengths of each object detection model. YOLOv8 is trained explicitly for detecting belt damage, while RetinaNet handles detecting other damage types and is used for cropping trailers from images. These one-stage detectors outperformed the two-stage Faster R-CNN in all tested tasks within this research. Furthermore, the algorithm incorporates slice-aided hyper inference, which significantly contributes to the efficient processing of high-resolution trailer images. Integrating the proposed ADD solution into terminal operating systems allows a substantial workload reduction at the ingate of intermodal terminals and supports, therefore, more sustainable transportation solutions.
... Several researchers have developed techniques for damage detection based on deep learning. Beckman et al. [1] detected concrete spalling by using a depth camera and a ZF-Net-based faster region-based convolutional neural network. Hoang et al. [2] proposed a computer vision-based approach that utilizes jellyfish search-optimized support vector classification to classify deep and shallow spalling images. ...
Article
Quality assurance and maintenance play a crucial role in engineering construction, as they have a significant impact on project safety. One common issue in concrete structures is the presence of defects. To enhance the automation level of concrete defect repairs, this study proposes a computer vision-based robotic system, which is based on three-dimensional (3D) printing technology to repair defects. This system integrates multiple sensors such as light detection and ranging (LiDAR) and camera. LiDAR is utilized to model concrete pipelines and obtain geometric parameters regarding their appearance. Additionally, a convolutional neural network (CNN) is employed with a depth camera to locate defects in concrete structures. Furthermore, a method for coordinate transformation is presented to convert the obtained coordinates into executable ones for a robotic arm. Finally, the feasibility of this concrete defect repair method is validated through simulation and experiments.
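The abstract above mentions converting defect coordinates obtained from the camera into coordinates executable by a robotic arm. A minimal sketch of that coordinate-transformation step is given below; it is not the authors' implementation, and the 4x4 extrinsic matrix (normally obtained from hand-eye calibration) is a placeholder.

```python
# Sketch: map a defect located in the depth-camera frame into the robot-arm
# base frame with a homogeneous transform. The extrinsics below are made up.
import numpy as np

# Hypothetical rotation + translation of the camera with respect to the robot base
T_base_camera = np.array([
    [0.0, -1.0, 0.0, 0.30],   # 30 cm offset along the base x-axis (assumed)
    [1.0,  0.0, 0.0, 0.00],
    [0.0,  0.0, 1.0, 0.45],   # camera mounted 45 cm above the base (assumed)
    [0.0,  0.0, 0.0, 1.00],
])

def camera_to_base(p_camera, T=T_base_camera):
    """Convert a 3-D defect coordinate from the camera frame to the robot base frame."""
    p_h = np.append(np.asarray(p_camera, dtype=float), 1.0)  # homogeneous coordinates
    return (T @ p_h)[:3]

# Defect centre measured by the depth camera, in metres
print(camera_to_base([0.12, -0.05, 0.80]))
```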
Article
Weathering effects caused by physical, chemical, or biological processes result in visible damages that alter the appearance of stones’ surfaces. Consequently, weathered stone monuments can offer a distorted perception of the artworks to the point of making their interpretation misleading. Being able to detect and monitor decay is crucial for restorers and curators to perform important tasks such as identifying missing parts, assessing the preservation state, or evaluating curating strategies. Decay mapping, the process of identifying weathered zones of artworks, is essential for preservation and research projects. This is usually carried out by marking the affected parts of the monument on a 2D drawing or picture of it. One of the main problems of this methodology is that it is manual work based only on experts’ observations. This makes the process slow and often results in disparities between the mappings of the same monument made by different experts. In this paper, we focus on the weathering effect known as “scaling”, following the ICOMOS ISCS definition. We present a novel technique for detecting, segmenting, and classifying these effects on stone monuments. Our method is user-friendly, requiring minimal user input. By analyzing 3D reconstructed data considering geometry and appearance, the method identifies scaling features and segments weathered regions, classifying them by scaling subtype. It shows improvements over previous approaches and is well-received by experts, representing a significant step towards objective stone decay mapping.
Article
Purpose Cracks are prevalent signs of pavement distress found on highways globally. The use of artificial intelligence (AI) and deep learning (DL) for crack detection is increasingly considered as an optimal solution. Consequently, this paper introduces a novel, fully connected, optimised convolutional neural network (CNN) model using feature selection algorithms for the purpose of detecting cracks in highway pavements. Design/methodology/approach To enhance the accuracy of the CNN model for crack detection, the authors employed a fully connected deep learning layers CNN model along with several optimisation techniques. Specifically, three optimisation algorithms, namely adaptive moment estimation (ADAM), stochastic gradient descent with momentum (SGDM), and RMSProp, were utilised to fine-tune the CNN model and enhance its overall performance. Subsequently, the authors implemented eight feature selection algorithms to further improve the accuracy of the optimised CNN model. These feature selection techniques were thoughtfully selected and systematically applied to identify the most relevant features contributing to crack detection in the given dataset. Finally, the authors subjected the proposed model to testing against seven pre-trained models. Findings The study's results show that the accuracy of the three optimisers (ADAM, SGDM, and RMSProp) with the five deep learning layers model is 97.4%, 98.2%, and 96.09%, respectively. Following this, eight feature selection algorithms were applied to the five deep learning layers to enhance accuracy, with particle swarm optimisation (PSO) achieving the highest F-score at 98.72. The model was then compared with other pre-trained models and exhibited the highest performance. Practical implications With an achieved precision of 98.19% and F-score of 98.72% using PSO, the developed model is highly accurate and effective in detecting and evaluating the condition of cracks in pavements. As a result, the model has the potential to significantly reduce the effort required for crack detection and evaluation. Originality/value The proposed method for enhancing CNN model accuracy in crack detection stands out for its unique combination of optimisation algorithms (ADAM, SGDM, and RMSProp) with systematic application of multiple feature selection techniques to identify relevant crack detection features and comparing results with existing pre-trained models.
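As a rough illustration of the optimiser comparison described in that abstract, the sketch below trains the same small CNN with ADAM, SGD with momentum, and RMSProp and reports the best validation accuracy for each. It is an assumption-laden stand-in, not the authors' model: the architecture, learning rates, patch size, and the random placeholder data are all invented for illustration.

```python
# Sketch: compare three optimisers on an identical binary crack classifier.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def make_cnn(input_shape=(128, 128, 3)):
    return keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),   # crack / no-crack
    ])

optimisers = {
    "ADAM": keras.optimizers.Adam(learning_rate=1e-3),
    "SGDM": keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9),
    "RMSProp": keras.optimizers.RMSprop(learning_rate=1e-3),
}

# Placeholder pavement patches and labels; in practice these come from the dataset
X = np.random.rand(64, 128, 128, 3).astype("float32")
y = np.random.randint(0, 2, size=64)

for name, opt in optimisers.items():
    model = make_cnn()
    model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])
    history = model.fit(X, y, validation_split=0.2, epochs=5, verbose=0)
    print(name, "best val accuracy:", max(history.history["val_accuracy"]))
```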
Article
Full-text available
Computer vision-based techniques were developed to overcome the limitations of visual inspection by trained human resources and to detect structural damage in images remotely, but most methods detect only specific types of damage, such as concrete or steel cracks. To provide quasi real-time simultaneous detection of multiple types of damages, a Faster Region-based Convolutional Neural Network (Faster R-CNN)-based structural visual inspection method is proposed. To realize this, a database including 2,366 images (with 500 × 375 pixels) labeled for five types of damages (concrete crack, steel corrosion with two levels (medium and high), bolt corrosion, and steel delamination) is developed. Then, the architecture of the Faster R-CNN is modified, trained, validated, and tested using this database. Results show 90.6%, 83.4%, 82.1%, 98.1%, and 84.7% average precision (AP) ratings for the five damage types, respectively, with a mean AP of 87.8%. The robustness of the trained Faster R-CNN is evaluated and demonstrated using 11 new 6,000 × 4,000-pixel images taken of different structures. Its performance is also compared to that of the traditional CNN-based method. Considering that the proposed method provides a remarkably fast test speed (0.03 seconds per image with 500 × 375 resolution), a framework for quasi real-time damage detection on video using the trained networks is developed.
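A quasi real-time video framework of the kind mentioned at the end of that abstract typically amounts to running a trained detector frame by frame and overlaying the predicted boxes. The sketch below shows that loop under stated assumptions: torchvision's off-the-shelf Faster R-CNN stands in for the modified network described above, and the video file name and score threshold are illustrative.

```python
# Sketch: frame-by-frame detection loop with a pretrained Faster R-CNN.
import cv2
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

cap = cv2.VideoCapture("damage.mp4")   # hypothetical inspection video
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        pred = model([tensor])[0]                      # boxes, labels, scores
    for box, score in zip(pred["boxes"], pred["scores"]):
        if score > 0.7:                                # illustrative confidence cut-off
            x1, y1, x2, y2 = box.int().tolist()
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```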
Article
Full-text available
A number of image processing techniques (IPTs) have been implemented for detecting civil infrastructure defects to partially replace human-conducted on-site inspections. These IPTs are primarily used to manipulate images to extract defect features, such as cracks in concrete and steel surfaces. However, the extensively varying real-world situations (e.g., lighting and shadow changes) can lead to challenges to the wide adoption of IPTs. To overcome these challenges, this article proposes a vision-based method using a deep architecture of convolutional neural networks (CNNs) for detecting concrete cracks without calculating the defect features. As CNNs are capable of learning image features automatically, the proposed method works without the conjugation of IPTs for extracting features. The designed CNN is trained on 40K images of 256 × 256 pixel resolutions and, consequently, records about 98% accuracy. The trained CNN is combined with a sliding window technique to scan any image size larger than 256 × 256 pixel resolutions. The robustness and adaptability of the proposed approach are tested on 55 images of 5,888 × 3,584 pixel resolutions taken from a different structure which is not used for training and validation processes under various conditions (e.g., strong light spot, shadows, and very thin cracks). Comparative studies are conducted to examine the performance of the proposed CNN using traditional Canny and Sobel edge detection methods. The results show that the proposed method shows ...
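The sliding-window scanning mentioned in that abstract can be illustrated with a short sketch: a classifier trained on fixed-size patches is slid over an arbitrarily large image and each window is flagged as crack or no-crack. This is a generic outline under stated assumptions, not the authors' code; `patch_classifier` is assumed to be an already-trained model returning a crack probability for a single patch, and the dummy classifier is only there so the example runs stand-alone.

```python
# Sketch: scan a large image with a 256x256 sliding window and a patch classifier.
import numpy as np

def sliding_window_scan(image, patch_classifier, window=256, stride=256, threshold=0.5):
    """Return top-left corners of windows classified as containing a crack."""
    h, w = image.shape[:2]
    hits = []
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            patch = image[y:y + window, x:x + window]
            if patch_classifier(patch) >= threshold:
                hits.append((x, y))
    return hits

# Toy usage with a dummy classifier (dark patches treated as "crack")
dummy = lambda patch: float(patch.mean() < 0.2)
image = np.random.rand(1024, 768)
print(sliding_window_scan(image, dummy))
```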
Article
Many contact-sensor-based methods for structural damage detection have been developed. However, these methods have difficulty compensating for environmental effects, such as variation or changes in temperature and humidity, which may lead to false alarms. In order to partially overcome these disadvantages, vision-based approaches have been developed to detect corrosions, cracks, delamination, and voids. However, there are few such approaches for loosened bolts. Therefore, we propose a novel vision-based detection method. Target images of loosened bolts were taken by a smartphone camera. From the images, simple damage-sensitive features, such as the horizontal and vertical lengths of the bolt head, were calculated automatically using the Hough transform and other image processing techniques. A linear support vector machine was trained with the aforementioned features, thereby building a robust classifier capable of automatically differentiating tight bolts from loose bolts. Leave-one-out cross-validation was adapted to analyze the performance of the proposed algorithm. The results highlight the excellent performance of the proposed approach to detecting loosened bolts, and that it can operate in quasi-real-time.
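The pipeline in that abstract (Hough-transform-based geometric features of the bolt head fed to a linear SVM) can be sketched as below. This is a heavily simplified stand-in, not the original method: the feature extraction keeps only the horizontal and vertical extents of detected Hough lines, and the training data are random placeholders.

```python
# Sketch: Hough-based bolt-head features + linear SVM for tight/loose classification.
import cv2
import numpy as np
from sklearn.svm import LinearSVC

def bolt_features(gray_image):
    edges = cv2.Canny(gray_image, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=40,
                            minLineLength=20, maxLineGap=5)
    if lines is None:
        return np.zeros(2)
    pts = lines.reshape(-1, 4)
    horiz = np.ptp(np.r_[pts[:, 0], pts[:, 2]])   # horizontal extent of the bolt head
    vert = np.ptp(np.r_[pts[:, 1], pts[:, 3]])    # vertical extent of the bolt head
    return np.array([horiz, vert], dtype=float)

# Placeholder feature vectors and labels (0 = tight, 1 = loose)
X = np.random.rand(40, 2) * 100
y = np.random.randint(0, 2, size=40)
clf = LinearSVC().fit(X, y)

# Classify one (random placeholder) bolt image
test_image = np.random.randint(0, 255, (200, 200), dtype=np.uint8)
print(clf.predict(bolt_features(test_image).reshape(1, -1)))
```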
Article
Potholes are one of the key defects that affect the performance of roads and highway networks. Metrological features of a pothole provide useful metrics for road distress measurement and severity analysis. This paper presents a performance analysis of Kinect as a sensor for pothole imaging and metrology. Depth images of paved surfaces are collected from concrete and asphalt roads using this sensor. Three-dimensional (3D) meshes are generated for a variety of pothole configurations in order to visualise and to calculate their different metrological features. The sensor is benchmarked using a test-rig with pothole-like depressions or artificial potholes of known dimensions to evaluate sensor performance under different real-life imaging conditions, such as through the media of clear, muddy and oily water. Error in measurement due to surface roughness is also studied. Another source of error, due to the presence of foreign objects such as stones and pebbles appearing as negative depth, is also discussed and compensated for. Results show a mean percentage error of 2.58% and 5.47% in depth and volumetric calculations, respectively.
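The volumetric calculation implied in that abstract can be illustrated with a simple depth-integration sketch: the pothole volume is approximated by summing per-pixel depth deviations below the road surface and multiplying by the pixel footprint. The road-plane estimate, pixel area, and synthetic depth map below are illustrative assumptions, and negative-depth pixels (e.g., stones or pebbles protruding above the surface) are discarded as in the compensation idea described above.

```python
# Sketch: estimate pothole volume from a depth image of a road patch.
import numpy as np

def pothole_volume(depth_map_mm, pixel_area_mm2):
    """Approximate pothole volume (mm^3) by integrating depth below the road level."""
    road_level = np.median(depth_map_mm)       # crude road-plane estimate
    deviation = depth_map_mm - road_level      # positive where the surface drops away
    deviation[deviation < 0] = 0               # ignore negative depth (stones, pebbles)
    return float(deviation.sum() * pixel_area_mm2)

# Synthetic example: flat patch 800 mm from the sensor with a 40 mm-deep depression
depth = np.full((100, 100), 800.0)
depth[40:60, 40:60] += 40.0                    # pothole pixels are farther from the sensor
print(pothole_volume(depth, pixel_area_mm2=1.5))
```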
Article
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image. The code will be released.
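The RPN-based Faster R-CNN described in that abstract is available off the shelf in torchvision, and adapting it to a custom class set (such as damage categories) mainly involves swapping the box-predictor head. The sketch below shows that setup and one dummy training step; it is not the original Caffe implementation, and the two-class configuration (background plus one damage class) and the made-up box are assumptions for illustration.

```python
# Sketch: fine-tune torchvision's Faster R-CNN (RPN + ROI heads) on a custom class set.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

# One dummy training step: images are lists of tensors, targets hold boxes and labels
model.train()
images = [torch.rand(3, 600, 800)]
targets = [{"boxes": torch.tensor([[100.0, 120.0, 300.0, 260.0]]),
            "labels": torch.tensor([1])}]
losses = model(images, targets)                 # dict of RPN and ROI-head losses
total = sum(losses.values())
total.backward()
print({k: round(v.item(), 3) for k, v in losses.items()})
```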