Three-Dimensional Tracking of Construction Resources
Using an On-Site Camera System
Man-Woo Park, A.M.ASCE1; Christian Koch2; and Ioannis Brilakis, M.ASCE3
Abstract: Vision trackers have been proposed as a promising alternative for tracking at large-scale, congested construction sites. They
provide the location of a large number of entities in a camera view across frames. However, vision trackers provide only two-dimensional
(2D) pixel coordinates, which are not adequate for construction applications. This paper proposes and validates a method that overcomes this
limitation by employing stereo cameras and converting 2D pixel coordinates to three-dimensional (3D) metric coordinates. The proposed
method consists of four steps: camera calibration, camera pose estimation, 2D tracking, and triangulation. Given that the method employs
fixed, calibrated stereo cameras with a long baseline, appropriate algorithms are selected for each step. Once the first two steps reveal camera
system parameters, the third step determines 2D pixel coordinates of entities in subsequent frames. The 2D coordinates are triangulated on
the basis of the camera system parameters to obtain 3D coordinates. The methodology presented in this paper has been implemented
and tested with data collected from a construction site. The results demonstrate the suitability of this method for on-site tracking purposes.
DOI: 10.1061/(ASCE)CP.1943-5487.0000168. © 2012 American Society of Civil Engineers.
CE Database subject headings: Automation; Imaging techniques; Models; Information technology (IT); Cameras; Remote sensing;
Construction management.
Author keywords: Automation; Imaging techniques; Computer-aided vision system; Models; Information technology; Remote sensing.
Introduction
Three-dimensional (3D) object tracking on construction sites has a
wide variety of applications. It allows identification and tracking of
personnel, equipment, and materials to support effective progress
monitoring, activity sequence analysis, productivity measurements,
and asset management and to enhance site safety. In addition,
tracking instantly enables the identification of critical activities
and problems, which allows for on-site project control and
decision-making capabilities. Available tracking solutions are primarily based on radio frequency technologies, including the global positioning system (GPS), radio frequency identification (RFID), and ultra-wideband (UWB) technologies. They all work
under the same principle of having a sensor attached to each entity to be tracked. These technologies have been applied and proven to work excellently for most scenarios in construction management, such as proactive work zone safety and material registration and installation (Teizer et al. 2007b; Ergen et al. 2007; Song et al. 2006). However, when it comes to large-scale and congested construction sites, the installation of the sensor system can be costly and time-consuming because of the large number of items involved. Also, privacy issues can arise from tagging workers. For these specific scenarios, vision-based tracking may have the potential to serve as an efficient alternative.
Vision-based methods have been introduced for tracking entities
on construction sites. Vision-based tracking works by receiving
video streams and estimating an entity’s motion in subsequent
video frames on the basis of the history of their appearance and
location. Its capability of tracking multiple entities without installing sensors on them has great potential in construction applications. It provides two-dimensional (2D) pixel coordinates, x and y, of the entities across time. The 2D results may be useful when predefined measurements of an entity's trajectory are available, as in the research of Gong and Caldas (2010). However, 2D results are generally not enough to extract substantial information for most construction management tasks because it is unknown how far entities are located from the camera. Because of the lack of depth information (z), even approximate distance measurements between two entities, e.g., a worker and mobile equipment, are unreliable, although they are necessary for safety management. Also, any movement along the z axis is not measurable. Brilakis
et al. (2011) proposed a framework for 3D vision-based tracking
that can provide 3D coordinates of entities by deploying stereo
cameras. The framework consists of several processes, including
construction entity detection, 2D tracking, and correlation of 2D
tracking results and calculation of 3D location. Thus far, only the 2D tracking of construction resources from this framework has been validated successfully (Park et al. 2011). Because of the large number of processes involved in the framework, each individual process has not yet been fully detailed and validated.
This paper presents and validates the framework’s method for
correlating 2D tracking results paired with multiple views and cal-
culating 3D location of construction entities. This method employs
stereo vision to provide 3D trajectories of moving entities. In
the current state of research, stereo vision has been applied to
1Ph.D. Candidate, School of Civil and Environmental Engineering,
Georgia Institute of Technology, 130 Hinman Research Building, 723
Cherry St., Atlanta, GA 30332 (corresponding author). E-mail: mw.park@
gatech.edu
2Postdoctoral Associate, Computing in Engineering, Faculty of
Civil and Environmental Engineering, Ruhr-Universität Bochum, Uni-
versitätsstraße 150, 44780 Bochum, Germany. E-mail: koch@inf.bi.rub.de
3Assistant Professor, School of Civil and Environmental Engineering,
Georgia Institute of Technology, 328 Sustainable Education Building, 788
Atlantic Dr. NW, Atlanta, GA 30332. E-mail: brilakis@gatech.edu
Note. This manuscript was submitted on June 10, 2011; approved on
September 16, 2011; published online on September 19, 2011. Discussion
period open until December 1, 2012; separate discussions must be sub-
mitted for individual papers. This paper is part of the Journal of Comput-
ing in Civil Engineering, Vol. 26, No. 4, July 1, 2012. ©ASCE, ISSN
0887-3801/2012/4-541–549/$25.00.
JOURNAL OF COMPUTING IN CIVIL ENGINEERING © ASCE / JULY/AUGUST 2012 / 541
J. Comput. Civ. Eng. 2012.26:541-549.
Downloaded from ascelibrary.org by Christian Koch on 07/10/12. For personal use only.
No other uses without permission. Copyright (c) 2012. American Society of Civil Engineers. All rights reserved.
3D modeling for construction progress monitoring (Chae and Kano
2007;Son and Kim 2010;Golparvar-Fard et al. 2010), focusing on
the retrieval and visualization of static structure components,
whereas the focus of this paper lies in 3D localization of moving
entities and the accuracy of localization measurements. The camera
system consists of two fixed cameras with a baseline of several meters, which is significantly longer than the Bumblebee's 24-cm baseline (Point Grey 2011). The long baseline allows competitive
accuracy in localizing far-located entities. Under the proposed
method, in the first step, cameras are calibrated to find their intrin-
sic parameters, i.e., the focal length, the principal point, radial dis-
tortions, and tangential distortions of a camera. The second step is
to estimate a relative pose (rotation and translation) of the calibrated
cameras, which is called extrinsic parameters. Once the intrinsic
and extrinsic parameters are known, a 2D tracker is applied to every
video frame of each camera in the third step. Using a kernel-based
2D tracking algorithm, the 2D pixel coordinates of an entity’s cent-
roid are determined. To obtain 3D locations, in the fourth step, 2D
tracking results are triangulated on the basis of the intrinsic and
extrinsic parameters. In every frame, a projection of the determined
centroid is obtained for each camera. Finally, an intersection of the
two projections from two cameras determines the 3D location of a
tracked entity.
The proposed method is tested on videos recorded at a construction site. The tests involve three types of entities: a steel plate,
a worker, and a van. Various point matching methods and different
baseline lengths are applied to identify their effects on accuracy.
The results show a maximum error of 0.658 m at a 95% confidence level, which validates the effectiveness, accuracy, and applicability of the proposed vision-based 3D tracking approach.
Background
State of Practice in Tracking Technology
Common tracking methods are either based on radio frequency, which includes several technologies such as GPS, RFID, Bluetooth, wireless fidelity (Wi-Fi), and ultra-wideband, or they make use of optical remote sensors, such as 2D image/video cameras and 3D range cameras, e.g., Flash LADAR.
Global positioning system is an outdoor satellite-based world-
wide navigation system formed by a constellation of satellites and
ground control stations. The 3D position is determined by a GPS
receiver using triangulation on the basis of these satellites. Global
positioning system is an established location technology that offers
a wide range of off-the-shelf solutions in both hardware and soft-
ware. According to Caldas et al. (2004), GPS has been applied to construction practices such as equipment positioning and surveying. However, GPS alone has limited potential in other applications, such as improving the management of materials on construction job sites. Moreover, it can operate only outdoors, and its accuracy is only approximately 10 m.
Radio frequency identification is used for identifying and
tracking various objects (Ergen et al. 2007). Radio frequency iden-
tification systems are primarily composed of a tag and a reader.
Radio frequency identification technology does not require line
of sight and it is also durable in harsh environments and can be
embedded in concrete. Radio frequency identification enables
efficient automatic data collection because readers can be mounted
on any structure in the reading range and each reader can scan
multiple tags at a given time. However, this technology, unless
combined with other tools (Ergen et al. 2007), can only report
the radius inside which the tracked entity exists, and most impor-
tantly, the near-sighted effect prohibits its use in tracking applica-
tions. Combinations of GPS and RFID technologies have recently been explored (Song et al. 2004, 2006). The advantage of this combination is that GPS sensors need only accompany the tag readers, not the materials. Every time a tag is located, the 3D coordinates reported by the GPS can be recorded as the location of that piece of material at that given time.
Another type of radio technology that can be applied to short-
range communications is UWB. Ultra-wideband is able to detect
time of flight of the radio transmissions at various frequencies,
which enables it to perform effectively in providing precision
localization even in the presence of severe multipath effects
(Fontana et al. 2003). Another advantage is the low average power
requirement that results from the low pulse rate (Fontana 2004).
Teizer et al. (2007b) applied the UWB technology to construction.
It was used for a material location tracking system with primary
applications to active work zone safety. Its ability to provide accu-
rate 3D locations in real time is a definite benefit to tracking in
construction sites.
Vision technologies and laser technologies are attracting increasing interest for tracking in large-scale, congested sites because they are free of tags. A 3D range imaging/video camera,
e.g., a Flash LADAR, provides not only the intensity but also the
estimated range of the corresponding image area. When compared
with 3D laser scanners, which have been used in construction, the
device is portable and inexpensive. Testing various kinds of data
filtering, transformation, and clustering algorithms, Gong and
Caldas (2008) used 3D range cameras for spatial modeling. Teizer
et al. (2007a) demonstrated tracking with 3D range cameras and the
potential of its use for site safety enhancement. However, the low resolution and short range make it difficult to apply to large-scale construction sites. Few tests have been executed on outdoor construction sites, where the environments are more cluttered and less controlled. Also, it has been reported that the reflectance of a surface varies extremely even in indoor environments (Gächter et al. 2006). Moreover, when multiple cameras are used, they can interfere with one another (Fuchs 2010).
Traditional 2D vision trackers are based simply on a sequence of images and can be a proper alternative to RFID methods because they remove the need for installing sensors and identity (ID) tags of any kind on the tracked entity. For this reason, this technology is (1) highly applicable in dynamic, busy construction sites in which large numbers of equipment, personnel, and materials are involved; and (2) more desirable for personnel who wish to avoid being "tagged" with sensors. In Gruen's research (1997), it is
highly regarded for its capability to measure a large number of par-
ticles with a high level of accuracy. Yang et al. (2010) proposed a
vision tracker that can track multiple construction workers. Gong
and Caldas (2010) showed the applicability of vision tracking to
automated productivity analysis.
Two-dimensional vision trackers can be categorized into kernel-based, contour-based, and point-based methods, depending on the way of representing objects. In kernel-based methods, an object is
represented by the color or texture in the region of interest, and its
position in the next frame is estimated on the basis of the region’s
color or texture information. In contour-based methods, an object is
represented by silhouettes or contours that determine the boundary
of the object. In point-based methods, an object is represented by a
set of feature points extracted from the region that contains the
object. Out of the three categories, kernel-based methods are the most suitable for construction-related applications with respect to construction site characteristics, such as illumination
condition, occlusion, and object types. Park et al. (2011) reported
that kernel-based methods could effectively track construction en-
tities in various illumination conditions and that they performed
well even on objects occluded by 50% or more. Entities that fail to be tracked because of severe occlusion can still be recovered by reinitialization within an object detection process. The 2D tracker used in this paper is based on the method of Ross et al. (2008). It tracks an object on the basis of a model template composed of eigenimages and represents the tracked object as an affine-transformed rectangle that encloses it. Six affine parameters (the x and y coordinates of the centroid, scale, aspect ratio, rotation, and skew) are estimated through particle filtering.
Stereo View Geometry
Two-dimensional vision-based tracking is not comparable with
other 3D technologies previously described unless it can provide
3D information. To reconstruct the 3D position of an entity, several
steps must be taken to determine the stereo view geometry (Hartley
and Zisserman 2004). Heikkilä and Silvén (1997), Zhang (1999),
and Bouguet (2004) presented and provided standard calibration
tools. The calibration tools reveal intrinsic camera parameters,
including the focal length, the principal point, radial distortions,
and tangential distortions. They use calibration objects that have
specific patterns, such as a checkerboard. In Zhang’s calibration
method, tangential distortion is not modeled. Heikkilä and Silvén’s
toolbox and Bouguet’s toolbox use the same distortion model that
takes into account both radial and tangential distortions. Therefore,
both toolboxes generally result in almost equivalent calibration.
Bouguet provides additional functions, such as error analysis,
which is useful to recalibrate with revised inputs.
After each camera has been calibrated separately, the external geometry of the camera system has to be determined (see Fig. 1). For this purpose,
feature points are identified and matched within the two camera
views. The most well-known and robust algorithms commonly
used for this task are the scale-invariant feature transform (SIFT)
(Lowe 2004) and speeded up robust features (SURF) (Bay et al.
2008). Whereas SIFT uses Laplacian of Gaussian (LOG), differ-
ence of Gaussian (DOG), and histograms of local oriented gra-
dients, SURF relies on a Hessian matrix and the distribution of
Haar-wavelet responses for feature point detection and matching,
respectively. Although SIFT turned out to be slightly better in terms
of accuracy, SURF is computationally much more efficient (Bauer
et al. 2007). The algorithms SIFT and SURF provide point matches,
including extreme outliers (mismatches) that have to be removed.
To achieve that, robust algorithms for managing the outliers were
introduced. Random sample consensus (RANSAC) (Hartley and
Zisserman 2004) and maximum a posteriori sample consensus
(MAPSAC) (Torr 2002) are the representative robust methods.
The RANSAC method minimizes the number of outliers by ran-
domly selecting a small subset of the point matches and repeating
the maximization process for different subsets until it reaches a de-
sired confidence in the exclusion of outliers. One of its problems is
the poor estimates associated with a high threshold (Torr 2002).
Working in a similar way to RANSAC, MAPSAC resolved this
problem by minimizing not only the number of outliers but also
the error associated with the inliers.
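The sample-and-score consensus idea behind these methods can be sketched on a toy problem. The following is a minimal illustration on 2D line fitting with made-up tolerances, not the essential-matrix pipeline itself:

```python
import numpy as np

def ransac_line(points, n_iters=200, inlier_tol=0.05, seed=0):
    """Fit a 2D line to noisy points with RANSAC-style outlier rejection.

    Repeatedly samples a minimal subset (two points), fits a candidate
    line, and keeps the candidate that gathers the largest consensus set.
    """
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_iters):
        p, q = points[rng.choice(len(points), size=2, replace=False)]
        d = q - p
        norm = np.linalg.norm(d)
        if norm < 1e-9:
            continue
        # Perpendicular distance of every point to the candidate line.
        normal = np.array([-d[1], d[0]]) / norm
        dist = np.abs((points - p) @ normal)
        inliers = dist < inlier_tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit by least squares on the consensus set (slope/intercept form).
    x, y = points[best_inliers].T
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept, best_inliers

# Synthetic data: y = 2x + 1 plus a few gross outliers.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
pts = np.column_stack([x, 2 * x + 1 + rng.normal(0, 0.01, 50)])
pts[:5] = rng.uniform(0, 10, (5, 2))          # gross outliers
m, b, inl = ransac_line(pts)
```

MAPSAC follows the same loop but scores candidates by the inlier errors as well as the inlier count, rather than the count alone.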
The next step is the estimation of the essential matrix, E, on the
basis of the identified point matches. In general, the normalized
eight point (Hartley 1997), seven point (Hartley and Zisserman
2004), six point (Pizarro et al. 2003), and five point (Nistér
2004) algorithms are used for this purpose. Eight, seven, six, and five are the minimum numbers of points required to perform the estimation. Rashidi et al. (2011) compared the resulting accu-
racy of these algorithms in practical civil infrastructure environ-
ments, finding the five-point algorithm to be the best. However,
because of its simplicity and reasonable accuracy the normalized
eight-point algorithm is still the most common one and the second
best according to Brückner et al. (2008). On the basis of the essential matrix, E, the relative pose of the two cameras (R and T in Fig. 1) can be derived directly (Hartley and Zisserman 2004).
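This direct derivation can be sketched with the standard SVD-based decomposition. The following is a generic illustration of that well-known result, not the authors' implementation; the physically valid pose among the four candidates would normally be chosen by a points-in-front (cheirality) check:

```python
import numpy as np

def decompose_essential(E):
    """Return the four candidate (R, t) poses encoded in an essential matrix.

    Uses the standard result E = [t]x R with SVD E = U diag(1,1,0) V^T:
    R in {U W V^T, U W^T V^T}, t = +/- third column of U.
    """
    U, _, Vt = np.linalg.svd(E)
    # Enforce proper rotations (determinant +1).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

def skew(t):
    """Cross-product matrix [t]x such that [t]x v = t x v."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Build E from a known pose and decompose it again.
angle = 0.1
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 0.2, 0.1])
t_true /= np.linalg.norm(t_true)      # E fixes t only up to scale
E = skew(t_true) @ R_true
candidates = decompose_essential(E)
```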
In the last step, triangulation is performed. On the basis of two
corresponding pixels in the respective view, two lines of sight have
to be intersected to find the 3D position (Fig. 1). However, because
of image noise and slightly incorrect point correspondences, the
two rays may not intersect in space. To address this problem,
Hartley-Sturm optimal triangulation (Hartley and Sturm 1997)
and optimal correction (Kanatani et al. 2008) algorithms are cur-
rently used as standard methods for finding corrected correspond-
ences. They both try to find the minimum displacement through the
geometric error minimization, correct the pixel coordinates accord-
ingly, and intersect the corrected rays to determine 3D coordinates.
Although the latter is faster, the former's results are more accurate (Fathi and Brilakis 2011).
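The basic intersection step can be sketched with the simple linear (DLT) triangulation, shown below as an illustrative baseline rather than the optimal correction methods discussed above; the camera matrices and point are hypothetical:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linearly triangulate one point from two views (DLT).

    P1, P2 are 3x4 projection matrices; x1, x2 are (u, v) coordinates of
    the same point in each view. Solves the homogeneous system AX = 0
    via SVD, taking the singular vector of the smallest singular value.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]      # dehomogenize

def project(P, X):
    """Project a 3D point with a 3x4 camera matrix."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two hypothetical cameras with a sideways baseline (identity intrinsics).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-2.0], [0.0], [0.0]])])  # 2-m baseline
X_true = np.array([1.0, 0.5, 40.0])           # a far-away point, as on site
X_est = triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true))
```

With noisy, non-intersecting rays, the optimal methods cited above first correct the two pixel coordinates before this intersection.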
Several researchers have introduced and applied stereo vision
technologies to construction. Most applications presented so far
are related to 3D modeling of structures for progress monitoring.
Chae and Kano (2007) estimated spatial data for development of a
project control system from stereo images. In another work, Son
and Kim (2010) used a stereo vision system to acquire 3D data
and to recognize 3D structural components. Golparvar-Fard et al.
(2010) presented a sparse 3D representation of a site scene using
daily progress photographs for use as an as-built model. In contrast to creating 3D geometry models from static feature points, the application of stereo vision in this paper locates moving entities in 3D across time. Furthermore, this paper measures the accuracy of 3D positioning by comparison with total station data.
Problem Statement and Objectives
As described in the previous section, the results of general vision-
based tracking are restricted to 2D. The applications of these
results are limited at large-scale, congested construction sites.
Brilakis et al. (2011) introduced a framework for 3D vision
tracking, which employs multiple fixed cameras to calculate the
3D location of an entity. From this framework, this paper aims
to present and validate the method of combining 2D tracking results
Fig. 1. Epipolar geometry and centroid relocation
JOURNAL OF COMPUTING IN CIVIL ENGINEERING © ASCE / JULY/AUGUST 2012 / 543
J. Comput. Civ. Eng. 2012.26:541-549.
Downloaded from ascelibrary.org by Christian Koch on 07/10/12. For personal use only.
No other uses without permission. Copyright (c) 2012. American Society of Civil Engineers. All rights reserved.
with stereo view geometry to obtain accurate 3D trajectories of distant construction entities. This research aims strictly at accurate localization of construction entities, not at real-time processing. Each step involved in this method should be optimized to the characteristics of the fixed camera system and of construction sites, such as the various types of construction entities, the long baseline, and the long camera-to-entity distances that are inevitable at large-scale construction sites.
Methodology
The proposed method is shown in Fig. 2 and is composed of four steps: camera calibration, camera pose estimation, 2D tracking, and triangulation. To calculate the 3D positions of an object, the registration of the camera system is required. The camera system in this method is composed of two cameras located several meters apart from one another. This system is described by epipolar geometry, as shown in Fig. 1. This geometry involves two types of parameters: intrinsic and extrinsic. Intrinsic parameters determine the linear system of projecting 3D points onto the image plane (P1 and P2 in Fig. 1). Bouguet's calibration toolbox (2004) is used to reveal the intrinsic parameters because of its accuracy, robust convergence, and convenience.
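The role of the intrinsic parameters can be illustrated with the standard pinhole-plus-distortion (Brown) model that such toolboxes estimate. The numbers below are hypothetical HD-camera values, not the calibrated parameters of this study:

```python
import numpy as np

def project_point(X, fx, fy, cx, cy, k1=0.0, k2=0.0, p1=0.0, p2=0.0):
    """Project a 3D camera-frame point to pixels with the Brown model.

    fx, fy: focal lengths; (cx, cy): principal point; k1, k2: radial
    distortion coefficients; p1, p2: tangential distortion coefficients.
    """
    x, y = X[0] / X[2], X[1] / X[2]            # normalized coordinates
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 ** 2
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return np.array([fx * xd + cx, fy * yd + cy])

# A point 40 m away, as on site, seen by a hypothetical 1080p camera.
u, v = project_point(np.array([1.0, 0.5, 40.0]),
                     fx=1500.0, fy=1500.0, cx=960.0, cy=540.0,
                     k1=-0.1, k2=0.01)
```

Calibration solves the inverse problem: given many observed checkerboard corners, it estimates fx, fy, cx, cy, and the distortion coefficients that best explain them.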
The focal point of the left camera becomes the origin of the coordinate system. Extrinsic parameters represent the relative pose of the right camera with respect to the left one (the rotation matrix R and the translation vector T in Fig. 1). The estimation of R and T involves point matching between the two views. Two combinations of algorithms are considered in this paper. One uses SURF (Bay et al. 2008) and RANSAC (Hartley and Zisserman 2004) for the feature descriptor and outlier removal, respectively. This combination proved to be fast and accurate enough for point cloud generation of infrastructure (Fathi and Brilakis 2011). The other uses SIFT (Lowe 2004) and MAPSAC (Torr 2002), which is slower but capable of acquiring more matches than the former combination. Even though SIFT is slower than SURF, this combination is worth considering in the application for the following reasons. First, the cameras are fixed in the application, which requires camera pose estimation only once, at the initial stage of the framework. Therefore, the longer processing time of SIFT can be ignored. Second, as a longer baseline, i.e., distance between the two cameras, is used, fewer point matches are obtained because of the higher disparity between the two camera views. In this case, SIFT and MAPSAC can help feed more inlier matches and fewer outlier matches to the next step.
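Lowe's distance-ratio test, which both descriptor pipelines use to discard ambiguous matches (and which the experiments later revisit with thresholds of 0.6 and 0.8), can be sketched as follows, with random vectors standing in for real SIFT/SURF descriptors:

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.6):
    """Match descriptors from two views using Lowe's distance-ratio test.

    A match is kept only if the nearest neighbor in desc2 is clearly
    closer than the second-nearest one (distance ratio below threshold).
    """
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        nearest = np.argsort(dists)[:2]
        if dists[nearest[0]] < ratio * dists[nearest[1]]:
            matches.append((i, int(nearest[0])))
    return matches

# Synthetic example: view-2 descriptors are noisy copies of view 1's.
rng = np.random.default_rng(0)
desc1 = rng.normal(size=(20, 64))
desc2 = desc1 + rng.normal(scale=0.01, size=desc1.shape)
matches = ratio_test_matches(desc1, desc2, ratio=0.6)
```

A lower ratio keeps fewer but cleaner matches, which is exactly the trade-off studied with DR values of 0.6 and 0.8 in the experiments.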
The normalized eight-point algorithm (Hartley 1997) is selected to estimate the essential matrix on the basis of the intrinsic parameters and point matches. The selected method is the most widely used because of its simple implementation and reasonably accurate results. Although this method is less computationally efficient and more sensitive to degeneracy problems than other methods (Nistér 2004; Li and Hartley 2006), it is still efficient and accurate enough to satisfy the needs arising from the fixed camera positions, the long baseline, and the complexity of construction sites. Finally, the extrinsic parameters, R and T, are recovered directly from the essential matrix (Hartley and Zisserman 2004). These parameters, together with the intrinsic parameters, are used for triangulating the 2D tracking results.
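A compact sketch of eight-point estimation on calibrated (normalized) image coordinates follows. It is illustrative only: the full normalized algorithm additionally rescales the coordinates for numerical conditioning (Hartley 1997), which is omitted here, and the synthetic pose is made up:

```python
import numpy as np

def eight_point_essential(x1, x2):
    """Estimate an essential matrix from >= 8 normalized correspondences.

    x1, x2: (N, 2) arrays of image coordinates already multiplied by the
    inverse intrinsic matrix (hence 'essential', not 'fundamental').
    Solves x2h^T E x1h = 0 in least squares, then enforces the rank-2,
    equal-singular-value constraint of an essential matrix.
    """
    x1h = np.column_stack([x1, np.ones(len(x1))])
    x2h = np.column_stack([x2, np.ones(len(x2))])
    # Each correspondence gives one linear equation in the 9 entries of E.
    A = np.einsum('ni,nj->nij', x2h, x1h).reshape(len(x1), 9)
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)
    # Project onto the space of valid essential matrices.
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt

# Synthetic scene: 20 points seen by two cameras with a known pose.
rng = np.random.default_rng(2)
X = rng.uniform([-5, -5, 5], [5, 5, 15], size=(20, 3))
angle = 0.2
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([1.0, 0.1, 0.0])
X2 = X @ R.T + t                      # points in the second camera frame
x1 = X[:, :2] / X[:, 2:]              # normalized image coordinates
x2 = X2[:, :2] / X2[:, 2:]
E = eight_point_essential(x1, x2)
```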
For each calibrated camera view, an identified construction
entity is tracked across subsequent frames. According to the com-
parative study of Park et al. (2011), a kernel-based 2D tracker,
which is based on the method by Ross et al. (2008), is used. In
this paper, the eigenimage is constructed selectively with grayscale values or saturation values, depending on the tracked entity's color characteristics, to enhance accuracy. Also, in the particle filtering process, the position translation (Δx and Δy between consecutive frames) is estimated instead of the entity location (x and y coordinates). This estimation strategy helps to correctly locate the entity with fewer samples in particle filtering. The centroid coordinates are updated every frame by accumulating the estimated translation vector.
The results obtained in the previous steps, the epipolar geometry and the two centroids, are fed into the triangulation step. Generally, the projections of the two centroid coordinates determined from the two views do not intersect one another because of camera lens distortions and errors caused by 2D tracking. Even if the 2D tracker correctly locates the entity in each frame, the disparity between the two camera views causes a mismatch of the centroids. To enhance the accuracy of the triangulation process, the two centroids have to be relocated so that their projections intersect (see Fig. 1). For this purpose, Hartley
and Sturm’s algorithm (Hartley and Sturm 1997) is selected be-
cause the accuracy is more critical than the processing time in
the application. Intersecting projections of the modified pair of
centroids for each frame leads to the 3D coordinate of the tracked
entity.
Experiments and Results
The data for validation are collected from a construction site at the
Georgia Institute of Technology. This site is the construction of an
indoor football practice facility managed by Barton Malow Com-
pany. The roof and columns of the steel-framed facility were
already completed when the data were collected. The videos were taken with two high-definition (HD) camcorders (Canon VIXIA HF S100, 30 frames per second, 1,920 × 1,080 pixels) located approximately 4.5 m above the ground on one side of the facility structure, where the ground area of the facility could be overlooked. One total station (Sokkia SET 230RK3) was used to acquire the ground truth of the entities' trajectories, which is compared with the obtained results.
Figs. 3 and 4 show the positions of the cameras and the entities' trajectories from a bird's eye view on the basis of the total station coordinate system and the cameras' views. In Figs. 3 and 4, trajectories 1 and 2 are composed of 10 and eight segments of straight lines,
Fig. 2. Methodology overview (for each camera: camera calibration yields intrinsic parameters; camera pose estimation yields the essential matrix; 2D tracking yields centroids of entities; triangulation then produces 3D coordinates)
located approximately 39 and 43 m from the left camera, respectively. Trajectory 3 is one straight line located 36 m from the left camera. The total station data include the end points of all segments, i.e., nine, 11, and two points for trajectories 1, 2, and 3,
respectively. The ground-truth trajectories are made by connecting
those points with straight lines. The proposed methodology is
tested on three types of entities: a worker, a steel plate carried
by a worker, and a van. Trajectories 1 and 2 are those of a worker
and a steel plate, and trajectory 3 is of a van. The accuracy of
tracking is quantified by an absolute error that is defined as the
distance between the tracked point and the ground-truth trajectory.
For each frame j, the distance D_j is calculated by the following equation:

D_j = |(Q_{i+1} - Q_i) × (P_j - Q_i)| / |Q_{i+1} - Q_i|

where Q_i and Q_{i+1} = endpoints of the ith line segment L_i = Q_i + t(Q_{i+1} - Q_i) of the ground-truth trajectories on which the object in frame j lies; and P_j = the jth frame's tracking result, i.e., a 3D point. The main causes of error considered in this paper can be classified into 2D tracker error and camera pose estimation error. Also, the assumption that an entity moves exactly along a straight line is a further, miscellaneous cause of error.
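This error metric can be written directly in code (note that the cross-product form measures the distance to the infinite line through the segment's endpoints):

```python
import numpy as np

def trajectory_error(P, Q_start, Q_end):
    """Distance from a tracked 3D point P to the ground-truth line
    through segment endpoints Q_start and Q_end, per the equation above:
    D = |(Q_end - Q_start) x (P - Q_start)| / |Q_end - Q_start|.
    """
    d = Q_end - Q_start
    return np.linalg.norm(np.cross(d, P - Q_start)) / np.linalg.norm(d)

# A tracked point 0.3 m off a segment running along the x-axis.
D = trajectory_error(np.array([2.0, 0.3, 0.0]),
                     np.array([0.0, 0.0, 0.0]),
                     np.array([5.0, 0.0, 0.0]))   # D = 0.3
```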
Camera Calibration and Camera Pose Estimation
For the purpose of camera calibration, a video of a moving checker-
board (7 by 9 blocks of 65 × 65 mm squares) is recorded by each
camera. A total of 26 frames are selected appropriately to have vari-
ous angles of view and are fed into Bouguet’s calibration toolbox
(Bouguet 2004). Once the checkerboard videos are taken and the
cameras are calibrated, all camera system settings remained the
same through the experiments. All functions that may automati-
cally cause a change in the camera intrinsic parameters, such as
autofocus and automated image stabilization, are disabled. Out
of all the video frames, a pair of corresponding frames of left
and right cameras is used to obtain a large number of point matches.
The point matches and calculated intrinsic parameters are used to
estimate camera poses. Because the positions of the cameras are
fixed in the proposed method, all these procedures are required only
once as a preprocess.
Tracking of Steel Plate
A 0.6-m by 0.3-m steel plate is chosen as the first entity to track.
The plate is carried by a worker walking along trajectories 1 and 2. The video contains 1,430 frames in total, with 790 and 640 frames for trajectories 1 and 2, respectively, meaning the results comprise 1,430 tracked 3D coordinates. In this experiment, right camera 1 (Fig. 3) is set to provide a 3.8-m baseline. The template model for the 2D tracker is composed of gray pixel values. The tracker accurately fits the steel plate with an affine-transformed rectangle in most frames. Therefore, it can be inferred that the errors in this experiment mostly come from triangulation, including camera pose estimation. Fig. 5 shows 3D tracking results obtained with different
Fig. 3. Layout of tests from bird's eye view (X-Z plane positions of the total station, the left camera, right camera 1 at a 3.8-m baseline, right camera 2 at an 8.3-m baseline, and trajectories 1, 2, and 3)
Fig. 4. Entities’trajectories: (a) trajectories 1 and 2 from view of right
camera 1; (b) trajectory 3 from view of right camera 2
Fig. 5. Tracking results of steel plate (3D trajectories versus ground truth for SIFT+MAPSAC (DR = 0.6), SURF+RANSAC (DR = 0.8), and SURF+RANSAC (DR = 0.6); axes X, Y, Z in meters)
camera pose estimation methods, and Table 1 summarizes the error results.
The SURF algorithm is tested with two threshold values of distance ratio (DR): 0.8 and 0.6. The distance ratio is the ratio of the distance to the closest neighbor to the distance to the second-closest neighbor (Lowe 2004). Discarding feature points whose distance ratios are higher than the threshold is an effective way of reducing false-positive matches. In the case of DR = 0.8, more point matches are obtained than with DR = 0.6, but they contain apparent outliers (Fig. 6) that adversely affect essential matrix estimation. The effect of these outliers is reflected in the large tracking error. Even though SURF with a DR of 0.6 generates fewer point matches than the others, it reduces outliers significantly and performs even better than SIFT (DR = 0.6) with MAPSAC, which provides approximately twice as many point matches. Assuming the error follows a normal
Table 1. Errors of Tracking Steel Plate (errors in meters)

Method            DR   Point    Total              Trajectory 1       Trajectory 2
                       matches  Max   Mean  STD    Max   Mean  STD    Max   Mean  STD
SIFT plus MAPSAC  0.6  568      0.836 0.252 0.179  0.836 0.314 0.192  0.569 0.177 0.125
SURF plus RANSAC  0.8  423      3.965 1.220 0.911  3.965 1.537 0.983  2.532 0.828 0.620
SURF plus RANSAC  0.6  271      0.631 0.180 0.127  0.631 0.222 0.136  0.429 0.127 0.091

Note: DR = distance ratio; STD = standard deviation.
Fig. 6. Point matches obtained by SURF plus RANSAC; DR = 0.8
Table 2. Errors of Tracking Van

Method            DR   Point    Error: Trajectory 3 (m)
                       matches  Max    Mean   STD
SIFT plus MAPSAC  0.6  230      0.865  0.278  0.194
SURF plus RANSAC  0.8  235      1.239  0.426  0.327
SURF plus RANSAC  0.6  183      0.931  0.289  0.235

Note: STD = standard deviation.
Fig. 8. 2D tracking results in right camera view
Fig. 7. Tracking results of van (ground truth vs. SIFT+MAPSAC with DR=0.6, SURF+RANSAC with DR=0.8, and SURF+RANSAC with DR=0.6; X, Y, and Z axes in meters)
Fig. 9. Tracking results of worker with short baseline (ground truth vs. SIFT+MAPSAC with DR=0.6 and SURF+RANSAC with DR=0.6; X, Y, and Z axes in meters)
distribution, it is concluded that the tracking error is less than
0.429 m with 95% confidence.
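The distance-ratio filtering described above can be sketched in NumPy as follows; the descriptor arrays here are small synthetic placeholders, not actual SIFT or SURF output:

```python
import numpy as np

def ratio_test(desc1, desc2, dr_threshold=0.6):
    """Keep a match for each row of desc1 only if the distance to its
    nearest neighbor in desc2 is less than dr_threshold times the
    distance to the second-nearest neighbor (Lowe 2004)."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        nearest, second = dists[order[0]], dists[order[1]]
        if nearest < dr_threshold * second:
            matches.append((i, int(order[0])))
    return matches

# Two synthetic 4D descriptors per image: one distinctive, one ambiguous
desc1 = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
desc2 = np.array([[1.0, 0.1, 0.0, 0.0],    # close to desc1[0]
                  [5.0, 5.0, 5.0, 5.0],    # far from everything
                  [0.0, 1.0, 0.1, 0.0],    # close to desc1[1] ...
                  [0.0, 1.0, 0.15, 0.0]])  # ... but so is this one
print(ratio_test(desc1, desc2))  # [(0, 0)]: only the unambiguous match
```

Raising the threshold to 0.8 admits the ambiguous second match as well, mirroring the behavior reported above: more matches, but more potential outliers.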
Tracking of Van
The second experiment deals with tracking a van, 2 m wide, 1.95 m high, and 5.13 m long, moving forward and backward along trajectory 3. The video contains a total of 1,034 frames. A long baseline (8.3 m) is tested in this experiment by placing a camera at “right camera 2” in Fig. 3. Gray pixel values are used for the templates of the 2D tracker. Fig. 7 displays the obtained trajectories together with the ground truth. Similar to the first experiment, it is observed that outliers ultimately result in inaccurate depth estimation (SURF plus RANSAC with DR = 0.8). There is a difference between the results for forward and backward movement even though they follow the same trajectory. This disparity is caused exclusively by the 2D tracking results. Fig. 8 shows the 2D tracking results in the right camera view, in which the slight difference between forward and backward trajectories is observable.
The error results are presented in Table 2. The long baseline yields a smaller number of point matches than the short baseline because of the greater difference between the left and right camera views. The number decreases to less than half of that in the first experiment, the tracking of a steel plate. SIFT plus MAPSAC, which generated 26% more matches than SURF plus RANSAC, performed better in this case. Assuming the error follows a normal distribution, it is concluded that the tracking error is less than 0.658 m with 95% confidence.
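The 95% bounds quoted throughout can be reproduced from the tabulated means and standard deviations under the stated normality assumption, since 95% of a normal distribution lies within 1.96 standard deviations of the mean:

```python
def error_bound_95(mean, std):
    """Upper 95% bound on tracking error, assuming errors ~ N(mean, std^2):
    95% of a normal distribution lies within mean +/- 1.96*std."""
    return mean + 1.96 * std

# Steel plate, SURF plus RANSAC, DR = 0.6 (Table 1): mean 0.180 m, STD 0.127 m
print(round(error_bound_95(0.180, 0.127), 3))  # 0.429 (m)

# Van, SIFT plus MAPSAC (Table 2): mean 0.278 m, STD 0.194 m
print(round(error_bound_95(0.278, 0.194), 3))  # 0.658 (m)
```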
Tracking of Worker
The third experiment is performed on a worker moving along trajectories 1 and 2. Two baseline lengths, 3.8 and 8.3 m, are tested. The videos with the short and long baselines contain 1,435 and 1,368 frames, respectively. The region of the worker’s upper body, which is well characterized by the fluorescent colors of a hard hat and a safety vest, is tracked. Instead of gray pixel values, saturation values are used to compose the template model. Figs. 9 and 10 present the trajectory results, in which it is noticeable that the longer baseline yields more stable and accurate trajectories. The longer baseline forms a larger angle between the two projections, P1 and P2, in Fig. 1, which results in a lower error rate. In Table 3, the errors of the long baseline are approximately half of those of the short
Table 3. Errors of Tracking Worker (errors in meters)

Method            Baseline    Point    Total              Trajectory 1       Trajectory 2
                  length (m)  matches  Max   Mean  STD    Max   Mean  STD    Max   Mean  STD
SIFT plus MAPSAC  3.8         584      1.959 0.523 0.357  1.959 0.605 0.374  1.490 0.426 0.309
SIFT plus MAPSAC  8.3         215      1.053 0.258 0.193  1.053 0.317 0.211  0.555 0.187 0.140
SURF plus RANSAC  3.8         503      2.549 0.714 0.481  2.549 0.841 0.503  1.791 0.562 0.404
SURF plus RANSAC  8.3         166      1.510 0.381 0.321  1.510 0.455 0.374  0.731 0.292 0.212

Note: STD = standard deviation; DR = distance ratio = 0.6.
Fig. 12. 2D tracking results of 693rd frame: (a) left camera; (b) right
camera
Fig. 11. Appearance variations: (a) steel plate; (b) worker
Fig. 10. Tracking results of worker with long baseline (ground truth vs. SIFT+MAPSAC with DR=0.6 and SURF+RANSAC with DR=0.6; X, Y, and Z axes in meters)
baseline, and SIFT plus MAPSAC produces lower errors than SURF plus RANSAC.
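Building the worker’s template from saturation rather than gray values, to exploit the fluorescent hard hat and safety vest, amounts to swapping the channel the template reads. A minimal NumPy sketch of the RGB-to-saturation conversion (the template matching itself is omitted; the pixel values are illustrative):

```python
import numpy as np

def saturation(rgb):
    """HSV-style saturation of an RGB image (floats in [0, 1]):
    S = (max - min) / max per pixel, defined as 0 for black pixels."""
    cmax = rgb.max(axis=-1)
    cmin = rgb.min(axis=-1)
    return np.where(cmax > 0, (cmax - cmin) / np.maximum(cmax, 1e-12), 0.0)

# A fluorescent-orange pixel is far more saturated than gray concrete,
# so the worker's vest stands out in the saturation channel
vest = np.array([[[1.0, 0.4, 0.0]]])      # saturated orange: S = 1.0
concrete = np.array([[[0.6, 0.6, 0.6]]])  # gray: S = 0.0
print(saturation(vest)[0, 0], saturation(concrete)[0, 0])  # 1.0 0.0
```

The saturation channel is largely invariant to the brightness changes that plague gray-value templates outdoors, which is the rationale for this choice.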
Whenever the worker changes direction, the 2D tracker suffers from severe variations in the worker’s appearance. Compared with Fig. 11(a), Fig. 11(b) shows more substantial changes in the distribution of pixel values inside the rectangle. This is why the errors with the short baseline are higher than the errors of tracking the steel plate. The error caused by the 2D tracker can be divided into two cases. The first case is when the determined centroid in each view does not exactly match the real centroid, i.e., the total station target point. The second case is when the two centroids from the left and right cameras do not correspond to one another (Fig. 12). These kinds of errors are partly compensated by the decrease in triangulation error that is achieved by using a long baseline. Assuming the error follows a normal distribution, it is concluded that the tracking error is less than 0.636 m with 95% confidence.
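The triangulation step that turns the pair of 2D centroids into a 3D point can be sketched as a linear (DLT) triangulation in the spirit of Hartley and Sturm (1997); the intrinsics and the 3.8-m stereo geometry below are illustrative, not the calibrated parameters from the experiments:

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation: stack the cross-product constraints
    u*(p3 . X) - (p1 . X) = 0 and v*(p3 . X) - (p2 . X) = 0 from both
    views and take the null vector of the resulting 4x4 system."""
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two illustrative cameras with identical intrinsics and a 3.8-m baseline
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                   # left
P2 = K @ np.hstack([np.eye(3), np.array([[-3.8], [0.0], [0.0]])])   # right

X_true = np.array([2.0, 1.0, 40.0])  # an entity roughly 40 m away
uv1, uv2 = project(P1, X_true), project(P2, X_true)
print(triangulate(P1, P2, uv1, uv2))  # recovers approx. [2. 1. 40.]
```

With noisy centroids the two rays no longer intersect exactly; a longer baseline widens the angle between them, which is why the 8.3-m configuration reduces the depth error.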
Conclusion
In this paper, details of correlating multiple 2D tracking results were presented. Under this method, camera calibration revealed the intrinsic parameters of the cameras by processing video frames of a checkerboard. The extrinsic parameters of the two cameras were estimated using point matches between two corresponding views. A 2D tracker provided the 2D pixel coordinates of an entity’s centroid in each calibrated camera view. The epipolar geometry constructed from the intrinsic and extrinsic parameters was used to triangulate the centroids from multiple views and retrieve 3D location information.
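The extrinsic estimation recapped above rests on recovering the essential matrix from point matches. A minimal, outlier-free sketch of the linear eight-point step (Hartley 1997) on synthetic data follows; the geometry (20 random points, a 0.1-rad rotation, a 3.8-m baseline) is an illustrative assumption, and a RANSAC or MAPSAC wrapper, as used in the experiments, would repeat this estimate over random subsets of matches to reject outliers:

```python
import numpy as np

def essential_eight_point(x1, x2):
    """Linear eight-point estimate of the essential matrix E from
    N >= 8 matches in normalized image coordinates (each N x 2),
    solving x2h^T E x1h = 0 and then enforcing the essential-matrix
    constraint of two equal singular values."""
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1)),
    ])
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)  # null vector of A, up to scale
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt  # project onto essential manifold

# Synthetic stereo-like ground truth: small rotation, sideways translation
rng = np.random.default_rng(0)
X = rng.uniform([-5.0, -2.0, 20.0], [5.0, 2.0, 50.0], (20, 3))
theta = 0.1
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
C = np.array([3.8, 0.0, 0.0])  # right camera center (3.8-m baseline)
Xc2 = (X - C) @ R.T            # points expressed in the right camera frame
x1 = X[:, :2] / X[:, 2:]       # left camera at the origin
x2 = Xc2[:, :2] / Xc2[:, 2:]

E = essential_eight_point(x1, x2)
x1h = np.column_stack([x1, np.ones(len(x1))])
x2h = np.column_stack([x2, np.ones(len(x2))])
residuals = np.abs(np.sum((x2h @ E) * x1h, axis=1))
print(residuals.max())  # near zero: the epipolar constraint holds
```

On real matches the residual is nonzero, and a single gross outlier can dominate this least-squares step, which is why the robust RANSAC and MAPSAC estimators were compared in the experiments.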
The proposed method was tested on videos recorded on a real construction site. The tests involved three types of entities: a steel plate, a worker, and a van. A kernel-based 2D tracker was employed, and different methods of point match extraction were tested to reveal the effect of errors caused by correlating multiple views. SIFT plus MAPSAC provided a larger number of point matches, which generally resulted in a good estimation of the extrinsic parameters, especially for long baselines. For the tracking of the steel plate and the van, the maximum errors determined with 95% confidence were smaller than the entity’s width. The varied appearance of the worker from the front, side, and rear views brought about larger 2D tracking errors than the tracking of the steel plate. However, the result is at most a 0.658-m error with 95% confidence using a long baseline. The results validated that the vision-based 3D tracking approach can effectively provide accurate localization of construction site entities at distances ranging from approximately 40 to 50 m.
The sole objective of this research is to achieve competitive accuracy in 3D positioning; real-time processing is not an immediate target. At the prototype level expected of the current research, working with high-definition video is not a real-time process. However, several types of applications do not require real-time processing and can be postprocessed, e.g., productivity measurement, progress monitoring, and activity sequence analysis. It is also expected that real-time commercial implementation is attainable through code optimization and parallel computing. For example, access to the pixel data of a high-definition image, which takes a significant amount of processing time, can be reduced by discarding static pixel areas. The next step of future work is to investigate how visual pattern recognition methods can be used to automatically recognize and match entities, which would remove the need for manual entity selection and help recover from tracking failures. Furthermore, it is worthwhile to research camera networks composed of multiple stereo camera systems; various viewing angles and the connections among them can reduce failures caused by occlusions.
Acknowledgments
This material is based on work supported by the National Science Foundation under Grants No. 0933931 and 0904109. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. The authors would also like to thank Keitaro Kamiya, Masoud Gheisari, and the Barton Malow Company for their help in collecting data for the experiments.
References
Bauer, J., Sünderhauf, N., and Protzel, P. (2007). “Comparing several implementations of two recently published feature detectors.” Proc., Int. Conf. on Intelligent and Autonomous Systems, Institute of Electrical and Electronics Engineers (IEEE), New York.
Bay, H., Tuytelaars, T., and Van Gool, L. (2008). “SURF: Speeded up robust features.” Comput. Vis. Image Understanding, 110(3), 346–359.
Bouguet, J. Y. (2004). “Camera calibration toolbox for Matlab.” 〈http://www.vision.caltech.edu/bouguetj/calib_doc〉 (Apr. 18, 2011).
Brilakis, I., Park, M.-W., and Jog, G. (2011). “Automated vision tracking of project related entities.” Adv. Eng. Inf., 25(4), 713–724.
Brückner, M., Bajramovic, F., and Denzler, J. (2008). “Experimental evaluation of relative pose estimation algorithms.” Proc., 3rd Int. Conf. on Computer Vision Theory and Applications, Vol. 2, Institute for Systems and Technologies of Information, Control and Communication (INSTICC), Setubal, Portugal, 431–438.
Caldas, C. H., Torrent, D. G., and Haas, C. T. (2004). “Integration of automated data collection technologies for real-time field materials management.” Proc., 21st Int. Symp. on Automation and Robotics in Construction, International Association for Automation and Robotics in Construction.
Chae, S., and Kano, N. (2007). “Application of location information by stereo camera images to project progress monitoring.” Proc., 24th Int. Symp. on Automation and Robotics in Construction, International Association for Automation and Robotics in Construction, Eindhoven, Netherlands, 89–92.
Ergen, E., Akinci, B., and Sacks, R. (2007). “Tracking and locating components in a precast storage yard utilizing radio frequency identification technology and GPS.” Autom. Constr., 16(3), 354–367.
Fathi, H., and Brilakis, I. (2011). “Automated sparse 3D point cloud generation of infrastructure using its distinctive visual features.” Adv. Eng. Inf., 25(4), 760–770.
Fontana, R. J. (2004). “Recent system applications of short-pulse ultra-wideband (UWB) technology.” IEEE Trans. Microwave Theory Tech., 52(9), 2087–2104.
Fontana, R. J., Richley, E., and Barney, J. (2003). “Commercialization of an ultra wideband precision asset location system.” Proc., IEEE Conf. on Ultra Wideband Systems and Technologies, Institute of Electrical and Electronics Engineers (IEEE), New York, 369–373.
Fuchs, S. (2010). “Multipath interference compensation in time-of-flight camera images.” Proc., 20th Int. Conf. on Pattern Recognition, IEEE Computer Society, Washington, DC, 3583–3586.
Gächter, S., Nguyen, V., and Siegwart, R. (2006). “Results on range image segmentation for service robots.” Proc., IEEE Int. Conf. on Computer Vision Systems, Institute of Electrical and Electronics Engineers (IEEE), New York.
Golparvar-Fard, M., Peña-Mora, F., and Savarese, S. (2010). “Application of D4AR—A 4-dimensional augmented reality model for automating construction progress monitoring data collection, processing and communication.” J. Inf. Technol. Constr., 14, 129–153.
Gong, J., and Caldas, C. H. (2008). “Data processing for real-time construction site spatial modeling.” Autom. Constr., 17(5), 526–535.
Gong, J., and Caldas, C. H. (2010). “Computer vision-based video interpretation model for automated productivity analysis of construction operations.” J. Comput. Civ. Eng., 24(3), 252–263.
Gruen, A. (1997). “Fundamentals of videogrammetry—a review.” Hum. Movement Sci., 16(2–3), 155–187.
Hartley, R. (1997). “In defense of the eight-point algorithm.” IEEE Trans. Pattern Anal. Mach. Intell., 19(6), 580–593.
Hartley, R., and Sturm, P. (1997). “Triangulation.” Comput. Vis. Image Understanding, 68(2), 146–157.
Hartley, R., and Zisserman, A. (2004). Multiple view geometry in computer vision, Cambridge University Press, Cambridge, UK.
Heikkilä, J., and Silvén, O. (1997). “A four-step camera calibration procedure with implicit image correction.” Proc., IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, Institute of Electrical and Electronics Engineers (IEEE), New York, 1106–1112.
Kanatani, K., Sugaya, Y., and Niitsuma, H. (2008). “Triangulation from two views revisited: Hartley-Sturm vs. optimal correction.” Proc., 19th British Machine Vision Conf., British Machine Vision Association and Society for Pattern Recognition, Malvern, UK, 173–182.
Li, H., and Hartley, R. (2006). “Five-point motion estimation made easy.” Proc., 18th Int. Conf. on Pattern Recognition (ICPR 2006), Institute of Electrical and Electronics Engineers (IEEE), New York, 630–633.
Lowe, D. G. (2004). “Distinctive image features from scale-invariant keypoints.” Int. J. Comput. Vis., 60(2), 91–110.
Nistér, D. (2004). “An efficient solution to the five-point relative pose problem.” IEEE Trans. Pattern Anal. Mach. Intell., 26(6), 756–770.
Park, M.-W., Makhmalbaf, A., and Brilakis, I. (2011). “Comparative study of vision tracking methods for tracking of construction site resources.” Autom. Constr., 20(7), 905–915.
Pizarro, O., Eustice, R., and Singh, H. (2003). “Relative pose estimation for instrumented, calibrated platforms.” Digital image computing: Techniques and applications, Proc., 7th Biennial Australian Pattern Recognition Society Conf., DICTA 2003, C. Sun, H. Talbot, S. Ourselin, and T. Adriaansen, eds., CSIRO, Collingwood, Australia, 601–612.
Point Grey. (2011). Stereo vision camera catalog, Point Grey Research, Richmond, BC, Canada.
Rashidi, A., Dai, F., Brilakis, I., and Vela, P. (2011). “Comparison of camera motion estimation methods for 3D reconstruction of infrastructure.” Proc., ASCE Int. Workshop on Computing in Civil Engineering, ASCE, Reston, VA.
Ross, D., Lim, J., Lin, R.-S., and Yang, M.-H. (2008). “Incremental learning for robust visual tracking.” Int. J. Comput. Vis., 77(1), 125–141.
Son, H., and Kim, C. (2010). “3D structural component recognition and modeling method using color and 3D data for construction progress monitoring.” Autom. Constr., 19(7), 844–854.
Song, J., Caldas, C. H., Ergen, E., Haas, C., and Akinci, B. (2004). “Field trials of RFID technology for tracking pre-fabricated pipe spools.” Proc., 21st Int. Symp. on Automation and Robotics in Construction, International Association for Automation and Robotics in Construction, Eindhoven, Netherlands.
Song, J., Haas, C., Caldas, C., Ergen, E., and Akinci, B. (2006). “Automating pipe spool tracking in the supply chain.” Autom. Constr., 15(2), 166–177.
Teizer, J., Caldas, C. H., and Haas, C. T. (2007a). “Real-time three-dimensional occupancy grid modeling for the detection and tracking of construction resources.” J. Constr. Eng. Manage., 133(11), 880–888.
Teizer, J., Lao, D., and Sofer, M. (2007b). “Rapid automated monitoring of construction site activities using ultra-wideband.” Proc., 24th Int. Symp. on Automation and Robotics in Construction, International Association for Automation and Robotics in Construction, Eindhoven, Netherlands, 23–28.
Torr, P. H. S. (2002). “Bayesian model estimation and selection for epipolar geometry and generic manifold fitting.” Int. J. Comput. Vis., 50(1), 35–61.
Yang, J., Arif, O., Vela, P. A., Teizer, J., and Shi, Z. (2010). “Tracking multiple workers on construction sites using video cameras.” Adv. Eng. Inf., 24(4), 428–434.
Zhang, Z. (1999). “Flexible camera calibration by viewing a plane from unknown orientations.” Proc., 7th IEEE Int. Conf. on Computer Vision, Vol. 1, Institute of Electrical and Electronics Engineers (IEEE), New York, 666–673.