Combined Rule-Based and Hypothesis-Based Method for
Building Model Reconstruction from Photogrammetric
Point Clouds
Linfu Xie 1,2,3, Han Hu 4, Qing Zhu 4, Xiaoming Li 1, Shengjun Tang 1, You Li 1, Renzhong Guo 1, Yeting Zhang 2
and Weixi Wang 1,*


Academic Editor: Ben Gorte
Received: 19 January 2021
Accepted: 11 March 2021
Published: 14 March 2021
1 Research Institute for Smart Cities, School of Architecture and Urban Planning, Shenzhen University & Key Laboratory of Urban Land Resources Monitoring and Simulation, MNR & Guangdong Key Laboratory of Urban Informatics & Shenzhen Key Laboratory of Spatial Smart Sensing and Services, Shenzhen 518060, China; linfuxie@szu.edu.cn (L.X.); lixming@szu.edu.cn (X.L.); shengjuntang@szu.edu.cn (S.T.); liyou@szu.edu.cn (Y.L.); guorz@szu.edu.cn (R.G.)
2 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China; zhangyeting@263.net
3 Department of Land Surveying & Geo-Informatics, The Hong Kong Polytechnic University, Hung Hom, Kowloon 999077, Hong Kong
4 Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 611756, China; han.hu@swjtu.edu.cn (H.H.); zhuq66@263.net (Q.Z.)
* Correspondence: wangwx@szu.edu.cn; Tel.: +86-13590317811
Abstract:
Three-dimensional (3D) building models play an important role in digital cities and have numerous potential applications in environmental studies. In recent years, photogrammetric point clouds obtained from aerial oblique images have become a major data source for 3D building reconstruction. Aiming at reconstructing 3D building models at Level of Detail (LoD) 2 and even LoD3 with the desired geometric accuracy and affordable computational expense, in this paper we propose a novel method for the efficient reconstruction of building models from photogrammetric point clouds which combines the rule-based and the hypothesis-based approaches in a two-stage topology recovery process. Given the point clouds of a single building, planar primitives and their corresponding boundaries are extracted and regularized to obtain abstracted building contours. In the first stage, we take advantage of the regularity and adjacency of the building contours to recover part of the topological relationships between different primitives. Three constraints, namely the pairwise constraint, the triplet constraint, and the nearby constraint, are utilized to form an initial reconstruction with candidate faces in ambiguous areas. In the second stage, the topologies in ambiguous areas are removed and reconstructed by solving an integer linear optimization problem based on the initial constraints while considering the data fitting degree. Experiments using real datasets reveal that, compared with state-of-the-art methods, the proposed method can efficiently reconstruct 3D building models in seconds with geometric accuracy at the decimeter level.
Keywords: building models; 3D reconstruction; point clouds; photogrammetry
1. Introduction
Three-dimensional (3D) building models play an important role in constructing digital cities and have numerous environmental applications in areas such as urban planning [1], smart cities [2], environmental analysis [3], and other civil engineering applications [4]. In recent years, with the rapid development of aerial vehicles, cameras, and image-processing technologies, aerial oblique images have become a major data source for 3D city modeling [5,6]. Due to the time-consuming and labor-intensive nature of manual modeling processes, researchers in the photogrammetry, computer vision, and graphics communities have developed automatic building model reconstruction methods [7–10]. Due to the presence of occlusions in complex city scenes and noise in forward-intersecting point clouds, some building features (e.g., planes, edges) are degraded or even missing in the collected point clouds, which leads to unreliable recovery of the topological relationships [11].
To address this problem, various data- and model-driven methods (or their combination) have been proposed in recent years, and substantial improvements in quality have been achieved [9,12–14]. However, for complex buildings with imperfect point coverage, it is difficult to automatically and efficiently reconstruct building models at the desired Level of Detail (LoD) [15,16], hindering environmental simulation, spatial analysis, and other model-based applications. To solve this problem, in this paper a novel framework which combines the rule-based and the hypothesis-based approaches is proposed for the efficient reconstruction of high-quality polygonal building models. Starting with the point clouds of a single building, the planar primitives and corresponding boundaries are extracted and regularized to obtain abstracted building contours, followed by a two-stage reconstruction. In the first stage, the regularity and adjacency of the building contours are used to recover the topological relationships between different primitives and produce an initial reconstruction. In the second stage, the topologies of ambiguous areas are removed and reconstructed by solving an integer linear optimization problem based on the initial reconstruction.
The major contributions of this paper are as follows:
(1) A novel framework for 3D building reconstruction which combines the efficiency of traditional rule-based methods and the integrity of recently developed hypothesis-based methods.
(2) A method for robust topology estimation that integrates the regularity and adjacency relationships between building primitives in 3D.
(3) An effective solution that enforces initial reconstruction results and constraints to eliminate topological ambiguities.
The remainder of this paper is organized as follows. Section 2 provides a brief review of existing work on 3D building reconstruction from point clouds. In Section 3, the details and key steps of the proposed approach are presented. The performance of the proposed approach is evaluated in Section 4 using real photogrammetric point clouds from aerial oblique images. A discussion of the method and the experimental results is given in Section 5. Finally, we draw our conclusions in Section 6.
2. Related Works
According to the City Geography Markup Language (CityGML) model format adopted by the Open Geospatial Consortium, building models are divided into five LoDs, from LoD0 to LoD4 [17]. With the rapid developments in point-cloud collection and processing technologies, automatic building reconstruction methods for LoD0 and LoD1 building models are now relatively mature [18–21]. In past decades, researchers in the photogrammetry and computer vision communities have expended great effort on the (semi-)automatic reconstruction of LoD2 and even LoD3 building models [7,22–24].
Point clouds used for city reconstruction are mostly obtained by Light Detection and Ranging (LiDAR) technology [25–28] or photogrammetry [5–7]. LiDAR point clouds usually have more precise coordinates and less noise compared with those generated by image matching, but the density of point clouds obtained through the Structure-from-Motion (SfM) and Multi-View Stereo (MVS) pipeline is higher in areas with sufficient texture. In general, methods for reconstructing 3D building models from point clouds can be categorized as data-driven [2,7,29], model-driven [30,31], or a combination of the two (also called hybrid-driven methods) [12,32,33]. Comparisons among them can be found in References [34,35]. As summarized in several previous works, model-driven methods that adopt top-down strategies require pre-defined hypothetical models, which hinders their application in free-form building reconstruction [35]. In contrast, data-driven methods, which follow bottom-up strategies, have the potential to reconstruct complicated buildings. This kind of reconstruction pipeline normally includes three crucial steps: primitive extraction, regularization, and topology estimation. Given a point set, geometric primitives such as planes [10], cylinders, spheres, and line segments [36] are first extracted using methods based on model fitting (e.g., RANdom Sample Consensus (RANSAC) [37] or the Hough transform [21,38]), region growing [39], or clustering [40]. Then, these primitives or their boundaries are regularized using hard or soft Manhattan constraints [41–43]. Finally, at their core, data-driven methods [44] involve estimating and refining the adjacency relations between different primitives to construct final models without any topological conflicts [10,12]. Hybrid-driven methods adopt the primitive extraction step of data-driven methods first, and then use the extracted primitives to form building models with pre-defined combination solutions.
As CityGML LoD2 building models mainly concern rooftop structures, airborne laser scanning point clouds have become the major data source for automatic reconstruction methods because of their accurate altitude measurements and fewer top-view occlusions [22,45,46]. By projecting 3D rooftops onto a two-dimensional (2D) horizontal plane, the topological relations between rooftop primitives are estimated by detecting ridge edges and jump edges [8,23,33], and these relations are maintained by a binary space partitioning (BSP) tree [23], an adjacency matrix [16,47], or a roof topology graph (RTG) [13,48]. Then, subsets of primitives are used to form building models based on pre-defined rules such as graph edit operations [13] or planar partitioning and merging [23]. These are the so-called rule-based methods, which can be either data-driven or hybrid-driven. In practice, photogrammetric point clouds acquired through SfM and MVS pipelines are inherently noisier than those collected by laser scanning technology [49]. Cameras mounted on unmanned aerial vehicles are inevitably hampered by occluded areas during the image collection process, especially on the lower parts of buildings [50], which leads to unreliable geometric accuracy or missing data in the photogrammetric point clouds. Although some previous works have reported impressive results in the automatic reconstruction of LoD2 building models, they may not be suitable for LoD3 building reconstruction from photogrammetric point clouds due to the inferior data quality and the difficulty of representing more complicated topological relations in real 3D space [13,23,47,51].
Recently, researchers have tried to convert the model reconstruction problem into an optimal subset selection problem with hard constraints [10,14,52]. Given primitives detected from the original point clouds, the object space is first divided into several segments to form a candidate pool. Then, different hypotheses about the building models are quantified by energy functions that measure their data fitness and rule fitness, with additional topological constraints such as watertightness. After that, by searching for the maximal (or minimal) values of the objective functions, segments in the candidate pool are labeled as either selected or not to establish the final building models. These kinds of methods are referred to as hypothesis-based methods in this paper; they can be either data-driven or hybrid-driven. One virtue of these methods is that they are robust when data are partially missing. In addition, with the help of integer linear programming, manifold and watertight hard constraints can be embedded to avoid topological conflicts in the output models [10]. Hence, these methods have the potential to reconstruct LoD3 building models that contain more detailed structures [7].
However, with tilted views, building façades are visible in the photogrammetric point clouds generated from aerial oblique images, which results in more planar primitives. The direct adoption of hypothesis-based reconstruction methods in such scenes may incur unreasonable computation costs when solving the integer linear programming problems [10]. As noted by Wang et al. [35], global regularities between different primitives in buildings may reveal their topological relations. If this information could be properly explored and utilized to estimate building topologies, even partially, the recovered topological relations could stabilize the solution to decrease artifacts and accelerate the problem-solving process by reducing the size of the candidate pool.
In order to utilize the intrinsic structure of buildings in architectural design to produce building models with the preferred geometric accuracy while efficiently eliminating topological conflicts, a two-stage building reconstruction method that combines the traditional rule-based and the recently developed hypothesis-based methods is proposed.
3. Method
3.1. Overview of the Proposed Approach
Figure 1 shows the overall workflow of the proposed method, starting from the photogrammetric point clouds of single buildings derived from aerial oblique images. The photogrammetric point clouds can be produced from images with sufficient overlap by existing SfM and MVS pipelines, and single buildings can be extracted from the point clouds manually or clipped according to existing 2D footprints.
Figure 1. Overall workflow of the proposed method.
In the pre-processing steps, the planar primitives in the point clouds are extracted with simple parallel and orthogonal constraints using existing RANSAC-based methods. Then, the extracted 3D planes which share the same normal orientation are grouped together. For each plane group, the 3D points are projected to 2D space by translating their centroid to the origin and rotating the normal of the plane to the positive direction of the Z-axis, and the consecutive boundary points of each plane are extracted using alpha-shapes. After that, the boundary points of each individual plane are simplified by shifting them along their refined normal vector to resist noise, and then grouped into piecewise smooth segments. Finally, the orientations of the segments in the same plane group are softly regularized to be parallel (or perpendicular) to each other or to the normal orientation of the other group planes. Thus, the initial point clouds are abstracted by a set of planar polygons with mutual regularity. For details of the pre-processing step, please refer to the work by Xie et al. [43].
S_{i=1,2,\dots,n} = \{ P_i, \pi_i, Poly_i \} \quad (1)

In Equation (1), S_i represents the detected planar segments (n is the total number of segments), P_i is the point set belonging to segment S_i, \pi_i is the regularized base plane of S_i, and Poly_i is the regularized boundary polygon of P_i in plane \pi_i.
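For illustration, the per-segment structure of Equation (1) together with a basic RANSAC-style plane fit can be sketched as follows. This is a minimal Python sketch under our own assumptions: the class fields, function name, and thresholds are illustrative stand-ins for the RANSAC-based extraction of [37] and the regularization of [43], not the published implementation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class PlanarSegment:
    """S_i = {P_i, pi_i, Poly_i} from Equation (1); field names are illustrative."""
    points: np.ndarray    # P_i: (k, 3) inlier points of the segment
    plane: np.ndarray     # pi_i: plane parameters (a, b, c, d) with ax + by + cz + d = 0
    boundary: np.ndarray  # Poly_i: (m, 2) regularized boundary polygon in the plane frame

def ransac_plane(points, n_iters=500, dist_tol=0.05, seed=0):
    """Basic RANSAC plane fit (hypothetical stand-in for the RANSAC-based
    extraction cited in the text). Returns (a, b, c, d) and the inlier mask."""
    rng = np.random.default_rng(seed)
    best_plane, best_mask = None, np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal.dot(p0)
        mask = np.abs(points @ normal + d) < dist_tol
        if mask.sum() > best_mask.sum():     # keep the hypothesis with most inliers
            best_plane, best_mask = np.append(normal, d), mask
    return best_plane, best_mask
```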
Then, a two-stage reconstruction method is implemented. In the first stage, the adjacency relations between different polygons are estimated on the basis of spatial consistency and mutual regularity rules, even if only partially, to obtain an initial reconstruction of the 3D polygonal building model. For areas not reconstructed in the first stage, inspired by the work of Nan and Wonka [10], hypotheses are posed regarding the final model based on pairwise intersections within finite distances, followed by the selection of the optimal combination of candidates by solving a binary linear programming problem.
3.2. Adjacency Detection between Multiple Primitives
In this stage, the robust adjacency relations between different polygons are recovered
in areas with sufficient data support. Specifically, as shown in Figure 2, two types of
topological relations are identified: (1) the intersection of two planar primitives, which
indicates an edge in the model, and (2) the intersection of three planar primitives, which
indicates a vertex in the model.
Figure 2. Graphic illustration of the two types of topological relations. In polygons A, B, and C, if edge (V_{a1}, V_{a2}) matches edge (V_{c1}, V_{c2}), then the polygon pair (A, C) is considered pairwise adjacent. If vertices V_{a1}, V_{b1}, and V_{c1} are matched with each other, then the polygon triplet (A, B, C) is considered to intersect at the common point of the three supporting planes.
Although photogrammetric point clouds can be noisy or partly missing, large planar structures that are well sampled are still reliable for fitting planes and recovering boundaries. Similar to the work by Arikan et al. [53], vertex–vertex matches (VVMs) and vertex–edge matches (VEMs) between two polygons are searched to determine their pairwise adjacency. To obtain robust estimations, the maximal search radius, as described by Arikan et al. [53], should be relatively low (in this paper, it is set to twice the average point spacing).
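The VVM/VEM search can be sketched as follows, assuming each polygon is an (n, 3) array of ordered vertices. The helper names and the brute-force distance tests are our own simplifications (a spatial index would be preferable for large inputs), with search_radius set to twice the average point spacing as stated above.

```python
import numpy as np

def vertex_vertex_matches(poly_a, poly_b, search_radius):
    """Return index pairs (i, j) where vertex i of poly_a lies within
    search_radius of vertex j of poly_b."""
    dist = np.linalg.norm(poly_a[:, None, :] - poly_b[None, :, :], axis=2)
    return list(zip(*np.nonzero(dist < search_radius)))

def vertex_edge_matches(poly_a, poly_b, search_radius):
    """Return pairs (i, j) where vertex i of poly_a is within search_radius
    of edge (j, j+1) of poly_b, using point-to-segment distance."""
    matches = []
    m = len(poly_b)
    for j in range(m):
        p, q = poly_b[j], poly_b[(j + 1) % m]   # closed polygon edge
        pq = q - p
        if pq @ pq < 1e-12:                     # skip degenerate edges
            continue
        t = np.clip((poly_a - p) @ pq / (pq @ pq), 0.0, 1.0)
        closest = p + t[:, None] * pq           # closest point on the segment
        d = np.linalg.norm(poly_a - closest, axis=1)
        matches.extend((int(i), j) for i in np.nonzero(d < search_radius)[0])
    return matches
```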
Edge–edge matches are derived from VVMs and/or VEMs. In this work, the adjacency between two non-parallel polygons, Poly_i and Poly_j, is verified by finding at least one pair of edges from the two polygons that satisfies the following criteria:
(1) The two edges are parallel or collinear.
(2) Two VVMs, or one VVM and one VEM, or two VEMs are found for them.
Note that if more than two edges are found to satisfy the first criterion, all possible combination pairs should be tested against the second criterion of edge–edge matching. Then, to detect plane triplets of interest, an undirected graph G(V, E), as shown in Figure 3, is generated by setting each polygon as a vertex (the blue dots), with the edge between two vertices indicating a matching relationship between the two polygons. As the intersection of three non-parallel polygons can be defined without ambiguity, the shortest closed cycles are searched in the graph G(V, E) [51], and those with a walk length of three indicate an underlying intersection of a triplet of polygons if none of them are parallel (the red triangles).
Figure 3. Graphic illustration of polygons and their matching relations. The blue dots represent polygons, while the black lines between two dots indicate that they are matched with each other. The red triangles indicate the intersections of three non-parallel polygons.
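The triplet search on G(V, E) can be sketched as below: the shortest closed cycles of length three are enumerated directly as triangles of the adjacency structure. The enumeration strategy and the is_parallel predicate are illustrative assumptions on our part, not necessarily the exact cycle search of [51].

```python
from itertools import combinations

def find_triplets(num_polygons, matched_pairs, is_parallel):
    """Enumerate triangles in the undirected match graph G(V, E).
    matched_pairs: iterable of (i, j) polygon index pairs (graph edges).
    is_parallel(i, j): predicate that rejects triplets containing parallel planes."""
    adj = {i: set() for i in range(num_polygons)}
    for i, j in matched_pairs:
        adj[i].add(j)
        adj[j].add(i)
    triplets = []
    for a in range(num_polygons):
        # only consider neighbors with larger index so each triangle is reported once
        for b, c in combinations(sorted(n for n in adj[a] if n > a), 2):
            if c in adj[b]:                     # closed cycle a-b-c of length three
                if not (is_parallel(a, b) or is_parallel(b, c) or is_parallel(a, c)):
                    triplets.append((a, b, c))
    return triplets
```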
3.3. Building Model Reconstruction with Initial Topology Constraints
In the previous stage, the adjacency relations between different planar primitives were partially recovered, yielding a set of adjacent planar polygon pairs and a set of intersecting non-parallel polygon triplets. The topological relationships between the planar polygons are thus divided into two parts: the confident part and the ambiguous part.
In this stage, we incorporate these relations to produce candidate faces and constraints for generating the final building model. Unlike previous work, which generates candidate faces by simply intersecting the detected planar segments, we embed the recovered topological relations in this process to (1) generate more purposeful candidate faces and (2) reduce the number of unknown parameters in the energy function. After that, the recovered topology is used to guide the candidate selection process through soft energy terms and hard constraints to obtain better reconstructed models with fewer artifacts in remarkably less running time.
3.3.1. Candidate Deduction with Topological and Spatial Hints
Given a set of boundary polygons with base planes, pairwise intersection is a simple strategy for generating redundant candidate faces. However, the drawback is that the number of candidate faces increases substantially with the initial number of detected polygons. In addition, artifactual faces that are obviously invalid may survive in the reconstructed models. Instead of purely pairwise intersection of the detected planar segments, in this paper we conduct the candidate generation process based on three assumptions:
(1) For adjacent polygon pairs, the candidate faces in each polygon plane might be bounded by their intersecting lines.
(2) For adjacent non-parallel polygon triplets, the candidate faces in each polygon plane might be bounded by the two other intersecting planes.
(3) The potential intersection points of different polygons might not be far away from their boundaries.
The overall workflow of this process is shown in Figure 4. The planar primitives are first pairwise intersected with each other within the scope of the enlarged bounding box to form over-redundant candidate faces. Then, the candidate faces in each plane are processed separately to eliminate some of them according to the topological relations detected in the previous steps and the spatial hints. After that, the union set of invalid candidates is removed. Finally, the remaining candidates in all planes are merged, and those that do not satisfy the two-manifold rules are treated as outliers.
Figure 4. Overall workflow of the proposed candidate face deduction method.
To swiftly eliminate invalid faces in a plane π, we divide the candidate faces F_P in π into three categories, marked by green, orange, and gray solid circles in Figure 5. The first category (F_P^{Cover}) includes faces that share common areas with the detected primitives in the 2D space; these faces are highly confident candidates. The second category (F_P^{Near}) includes faces which are not far from the covering area of the detected primitives and are treated as potential candidates. The remaining faces (F_P^{Invalid}) in this plane are labeled as invalid and should be rejected.

F_P = \{ F_P^{Cover}, F_P^{Near}, F_P^{Invalid} \} \quad (2)
Figure 5. Illustration of the three constraints used to reduce the number of candidate faces. (a): Pairwise Constraint; (b): Triplet Constraint; (c): Nearby Constraint. The red lines depict a simplified boundary polygon A in its supporting plane π_A, and the blue lines which separate the plane into several candidate faces are the intersections of other planes with this plane. The green, orange, and gray solid circles indicate the status of the occupied faces as Cover, Near, and Invalid, respectively.
Based on the above assumptions, three types of constraints are used to classify the candidate faces obtained by brute-force intersection, namely the pairwise constraint (PC), the triplet constraint (TC), and the nearby constraint (NC).
Pairwise Constraint: Every matched polygon pair involving the current polygon goes through this process. As shown in Figure 5a, consider a polygon A with supporting plane π_A. If a polygon pair (A, B) was matched in the previous steps, e_A is the matched edge in polygon A, and the two supporting planes intersect at line l_{AB}. Theoretically, if the two vertices of e_A are both on the convex hull of polygon A, the candidate faces in π_A should be bounded by l_{AB}. Then, the candidate faces in π_A are labeled as covered (green dot) or invalid (gray dot), according to their intersection relationship with polygon A. Note that the same process is applied to polygon B in its supporting plane π_B. If both polygon A and polygon B are bounded by the intersection line l_{AB}, a sharp edge is implicitly reconstructed in the building model.
Triplet Constraint: Every matched polygon triplet involving the current polygon goes through this process. As shown in Figure 5b, consider a polygon A with supporting plane π_A and a detected polygon triplet (A, B, C), whereby the projections of the supporting planes of polygons B and C onto π_A are two lines, l_{AB} and l_{AC}. The intersection of l_{AB} and l_{AC} divides π_A into four parts. Ideally, if polygon A is located in only one of the four parts, we can bound the candidate faces in π_A by the boundaries of the intersecting lines. In practice, due to the presence of noise in the point clouds and the errors accumulated in the processing steps, we consider polygon A to be in only one of the four parts when the area percentage of A in this part is larger than a given threshold (e.g., 95%). Then, the candidate faces in π_A are restricted to this part only. The same verification procedure is also performed for polygons B and C in their respective supporting planes. In the best situation, all three polygons are bounded by the two other planes, and a corner point at which all three planes intersect is implicitly reconstructed to bound the building model.
Nearby Constraint: As shown in Figure 5c, for a given polygon A with supporting plane π_A, a haphazard pairwise intersection may result in large numbers of invalid candidate faces. First, faces that share common areas with polygon A in the 2D space (π_A) are labeled as Covered. Then, the remaining faces that share at least one edge with the candidate faces in the first category, or that have at least one vertex close to (i.e., within a given distance threshold, e.g., 2 m of) the vertices of the polygons in the first category, are labeled as Near. The rest of the faces in this plane are labeled as Invalid.
The union set of invalid candidate faces is removed from the candidate pool, and faces that do not satisfy the two-manifold rules are also labeled as Invalid. In this way, the adjacency information recovered in Section 3.2 is embedded, so that the candidate faces in the confident part are concise and the redundant candidates lie mainly in the ambiguous part.
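A schematic of the per-plane classification into Cover, Near, and Invalid faces is sketched below with shapely, assuming the candidate faces and the detected boundary polygon have already been projected into the 2D frame of the plane. The adjacency test is simplified to a touch-or-distance check, which only approximates the shared-edge and nearby-vertex criteria described above.

```python
from shapely.geometry import Polygon

def classify_faces(candidate_faces, boundary, near_dist=2.0):
    """Label candidate faces in one plane as 'Cover', 'Near', or 'Invalid'.
    candidate_faces: list of shapely Polygons in the plane's 2D frame.
    boundary: shapely Polygon of the detected primitive's boundary."""
    labels = ['Invalid'] * len(candidate_faces)
    # First pass: faces overlapping the detected primitive are confident candidates.
    for k, face in enumerate(candidate_faces):
        if face.intersection(boundary).area > 0.0:
            labels[k] = 'Cover'
    covered = [f for f, lab in zip(candidate_faces, labels) if lab == 'Cover']
    # Second pass: faces adjacent or close to a covered face become 'Near'.
    for k, face in enumerate(candidate_faces):
        if labels[k] == 'Invalid' and any(
                face.touches(c) or face.distance(c) < near_dist for c in covered):
            labels[k] = 'Near'
    return labels
```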
3.3.2. Face Selection with Initial Constraints
In this step, a subset of candidate faces is selected to establish the final building model, favoring certain properties and satisfying certain hard constraints. To find the optimal configuration of candidate faces, similar to the PolyFit framework [10], we use a binary linear programming approach to quantify the favored properties of the reconstructed model while imposing the hard constraints. The optimization model contains the binary variables shown in Equation (3) below, first introduced by Nan and Wonka [10]:
x_f, x_e, x_{es} \in \{0, 1\}, \quad f \in F, \; e \in E, \; es \in E \quad (3)

where x_f indicates whether a face in the face pool F (constructed from all candidate faces) is selected, x_e indicates whether an edge in the edge pool E (which includes all edges associated with the faces in the face pool) is selected, and x_{es} indicates whether the edge is sharp. In this work, four properties are favored:
Property 1: Faces supported by a large number of points are favored.
E_{pts} = 1 - \frac{ \sum_{f \in F} num\left[ \{ p \mid proj_{plane_f}(p) \in f, \; dis(p, plane_f) < \varepsilon \} \right] }{ num[p] } \quad (4)
num[p](4)
In Equation (4), pstands for the points in detected primitives, fdonates a face of the
candidate face pool (F), plane
f
represents the regularized 3D plane that supports face f,
proj
plane-f
(p) is the 2D projection of a 3D point ponto the supporting plane of polygon A(
πA
),
dis(p,plane
f
) is the unsigned perpendicular distance from point p to
πA
,
ε
is a pre-defined
distance tolerance, and num() is the size of the set.
Property 2: Faces with a high proportion of their area covered by supporting points
are favored.
E_{cover} = \sum_{f} \left( 1 - \frac{area(f \cap P)}{area(f)} \right) \quad (5)
This property is quantified by the ratio of the candidate face's area overlapped by the boundary polygon of its supporting points [10]. In Equation (5), P is the 2D polygon formed by the boundary of the supporting points of face f, and area(·) is the area of the closed polygon. Both areas are calculated in 2D space (in the supporting plane of face f). Equations (4) and (5) measure the degree to which the data fit the candidate faces: the lower the values, the better the fit.
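Per candidate face, the ingredients of the two data-fitting terms could be computed roughly as below. The signature is our own, and we assume the points and faces are already expressed in the 2D frame of the supporting plane.

```python
from shapely.geometry import Point, Polygon

def face_data_fit(face_2d, points_2d, dists_to_plane, support_poly_2d, eps=0.05):
    """Per-face ingredients of E_pts (Eq. 4) and E_cover (Eq. 5).
    face_2d: shapely Polygon of the candidate face in the plane's 2D frame.
    points_2d: iterable of (x, y) projections of detected points onto the plane.
    dists_to_plane: matching unsigned point-to-plane distances.
    support_poly_2d: boundary polygon P of the supporting points."""
    support = sum(
        1 for p, d in zip(points_2d, dists_to_plane)
        if d < eps and face_2d.contains(Point(p))
    )
    coverage = face_2d.intersection(support_poly_2d).area / face_2d.area
    return support, 1.0 - coverage   # inlier count and per-face coverage penalty
```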
Property 3: Intersecting faces in a PC or TC are preferably selected.
\Theta(f) = \begin{cases} 1 + \eta, & f \in PC \cup TC \\ 1, & \text{otherwise} \end{cases} \quad (6)
In Equation (6), \Theta(·) is the confidence coefficient of face f, and \eta is a non-negative constant (set to 1 in all experiments in this paper) that increases the confidence coefficient values of faces involved in a PC or TC.
Property 4: Sharp edges associated with two polygons in a PC or TC are favored.
E_{edge} = \sum_{e} x_{es} \cdot \left( 1 - \frac{length(e \cap e_{P1} \cap e_{P2})}{length(e)} \right) \quad (7)
This property is measured by the overlap ratio of the sharp edge with the shared section of the two matched edges. In Equation (7), e stands for an edge which connects two faces, e_{P1} and e_{P2} are the projections of the two matched edges (if they are matched in a PC or TC) onto the intersecting line of their corresponding planes, and length(·) represents the unsigned length of the segments. A larger overlap ratio is preferred. Hence, the following energy function is defined:
E = E_{plane} + \omega_{edge} \cdot E_{edge} \quad (8)

E_{plane} = x_f \cdot (\omega_{pts} \cdot E_{pts} + \omega_{cover} \cdot E_{cover}) / \Theta(f) \quad (9)
The first term in Equation (8) measures the degree to which the favored properties fit the candidate faces, and the second term measures the same for their related edges; \omega stands for the corresponding weight. As given in Equation (9), the data fitting degree of a face is composed of two parts, one for the point number (E_{pts}) and the other for the point coverage (E_{cover}). A larger number of inlier points and a larger covered face area are preferred. The denominator \Theta(f) gives a larger opportunity to faces involved in a PC or TC. In this paper, the weights of the three energy terms, E_{edge}, E_{pts}, and E_{cover}, are set to be identical.
In addition, three constraints are imposed to strengthen the topological and geometric features of the building model:
Constraint 1: An edge must be selected when one of its associated faces is selected.
x_f = 1, \; f \in asso(e_i) \Rightarrow x_{e_i} = 1 \quad (10)
Constraint 2: When an edge is selected, it connects only two candidate faces.
x_{e_i} = 1 \Rightarrow num\left[ \{ x_f = 1 \mid f \in asso(e_i) \} \right] = 2 \quad (11)
Constraint 3: The edges associated with two polygons in a PC or TC should be sharp.
x_{e_i} = 1, \; e_i \in PC \cup TC \Rightarrow x_{es_i} = 1 \quad (12)
Properties 1 and 2 and Constraints 1 and 2 are similar to those reported by Nan and Wonka [10] and ensure a watertight polygonal building model, whereas Properties 3 and 4 and Constraint 3 enforce the topology recovered in Section 3.2 to realize favorable models. The above objectives and constraints are formulated as a binary linear programming problem that can be minimized using existing solvers [54]. Lastly, the selected faces are combined to yield the final building model.
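A minimal sketch of the selection step with SciPy's mixed-integer solver is given below. It keeps only a per-face cost and the watertight rule that the face variables associated with an edge sum to twice the edge variable, which jointly captures Constraints 1 and 2 in the spirit of [10]; the sharp-edge variables, the Property 3/4 terms, and the exact cost assembly from Equations (4)-(9) are omitted, and the encoding details are our own assumptions.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def select_faces(face_costs, edge_faces):
    """Binary-programming sketch of the face selection step.
    face_costs: (nf,) per-face costs (in the full method these combine the
    weighted terms of Eqs. (4)-(9), so well-supported faces get low cost).
    edge_faces: list mapping each candidate edge to its associated face indices."""
    nf, ne = len(face_costs), len(edge_faces)
    n = nf + ne                                # variables: faces first, then edges
    c = np.concatenate([np.asarray(face_costs, dtype=float), np.zeros(ne)])
    rows = np.zeros((ne, n))
    for e, faces in enumerate(edge_faces):
        rows[e, list(faces)] = 1.0             # sum of associated face variables ...
        rows[e, nf + e] = -2.0                 # ... must equal 2 * x_e (watertight)
    cons = LinearConstraint(rows, lb=0.0, ub=0.0)
    res = milp(c, constraints=cons, integrality=np.ones(n), bounds=Bounds(0, 1))
    if res.x is None:                          # infeasible under the given pool
        return None
    return res.x[:nf].round().astype(int)      # indicator of selected faces
```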
4. Experimental Analysis
4.1. Test Data Description and Experimental Settings
To test the performance of the proposed method, we made qualitative and quantitative comparisons of the reconstruction quality and computational cost of our method with those of several state-of-the-art (SOTA) methods. Photogrammetric point clouds derived from aerial oblique images of typical buildings in Shenzhen, China, were used in the experiments. The original images were captured from five directions (one vertical and four tilted views), so the building façades are (partially) visible. The image overlap is approximately 60% to 80%. Image orientation and dense image matching were accomplished by existing solutions in ContextCapture. After that, the point clouds of single buildings were manually extracted by drawing 2D bounding boxes. The major components of the buildings are visible, and the roof structures are well sampled, but there are also some missing and imperfect areas due to occlusions and unfavorable lighting conditions in the lower parts of the buildings. The point clouds are shown in Figure 6, while Table 1 lists basic information related to the test data.
Figure 6. Photogrammetric point clouds used in the experiments. The row numbers correspond to the building IDs. From left to right, the columns show the original point clouds, the segmented planar primitives, and the extracted outer boundaries of each primitive.
Table 1. Basic information related to the test data.

Building ID   Number of Points   Average Spacing (m)   Footprint Area (m²)   Detected Planes
1             44,034             0.21                  234                   18
2             60,675             0.15                  432                   20
3             203,317            0.09                  720                   17
4             523,233            0.11                  2124                  33
5             611,982            0.16                  672                   37
6             548,766            0.20                  5978                  45
Starting with the point clouds of a single building, we used the region-growing method to identify the planar segments. Then, the boundaries of the segments were traced and regularized to concisely abstract the initial building. As plane detection and regularization are beyond the scope of this study, we set the planar segments and their regularized boundaries as inputs in our experiments; these can be generated by an existing method [43]. Figure 6 shows the original point clouds, the detected planes, and the regularized boundaries. As shown in Figure 6, the regularized boundaries of the detected planar primitives represent building outlines well for large, well-sampled primitives. Meanwhile, in areas with insufficient data (e.g., undersampled or noisy areas), the recovered boundaries are rather ambiguous, and some gaps even occur (as indicated by the circles).
Because PolyFit [10] is the method most closely related to ours, we first qualitatively and quantitatively compared our method with PolyFit with respect to geometric quality and computational efficiency. Since ground-truth 3D models are not available for this area, the geometric quality of the reconstructed models was assessed based on a visual comparison, cloud-to-mesh (C2M) distance statistics, and mesh-to-cloud (M2C) distance statistics. C2M distances are calculated by projecting the original points onto their nearest faces in the 3D model, and M2C distances are estimated by first sampling the 3D model and then calculating the cloud-to-cloud distances between the sampled and original point clouds. A lower C2M value means a better data fitting degree, while a lower M2C value means fewer artifacts in the model. In addition, computational efficiency is determined by the recorded running time of each step in generating the 3D building models. Comparisons with two other SOTA building model reconstruction methods are also presented.
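Both metrics can be approximated with nearest-neighbor queries once the reconstructed model has been densely sampled. The sketch below assumes such sampled points are available from any mesh sampler; with dense enough sampling, the nearest-sample distance approximates the true point-to-mesh C2M distance.

```python
import numpy as np
from scipy.spatial import cKDTree

def c2m_m2c(original_pts, model_samples):
    """Approximate mean C2M and M2C distances via nearest neighbors.
    original_pts: (n, 3) original photogrammetric points.
    model_samples: (m, 3) points densely sampled on the reconstructed model."""
    c2m = cKDTree(model_samples).query(original_pts)[0].mean()  # data fitting
    m2c = cKDTree(original_pts).query(model_samples)[0].mean()  # artifact measure
    return c2m, m2c
```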
4.2. Comparison with PolyFit
Figures 7–12 show the intermediate candidates generated by the proposed and PolyFit methods, as well as the final reconstructed models. Table 2 provides basic information about the candidates and final results, along with the computational costs of the reconstruction process. At first glance, we can observe that the candidate faces generated by the proposed method are more concise than those produced by PolyFit. Some sharp edges and even some building corners were already reconstructed by the proposed adjacency-based topology detection method. As shown in Table 2, the number of candidate faces generated by PolyFit ranged from 1163 to 15,885, whereas those generated by the proposed method ranged between 138 and 2964, which again indicates the efficiency of our candidate face generation process. Although the proposed method has to first estimate the adjacency relations before computing the candidate faces, the running time of this step (denoted as ADT in Table 2) is relatively short compared with the computing times of candidate generation (CGT) and model generation (MGT). The total running times (TT) of PolyFit and our proposed method, as shown in Table 2 for all six tested buildings, reveal that our method is substantially faster. For simple buildings with fewer planar faces (e.g., Buildings #1, #2, and #3), although our method used only about 35.6% to 84.2% of the running time of PolyFit, both computing times are on the order of seconds. As the buildings become more complicated, the running time of PolyFit increases dramatically because of the geometric growth in the number of candidate faces. For Building #5, PolyFit used more than 10 min, whereas our method only took 26.428 s, only 4.2% of the total time that PolyFit took. For Building #6, PolyFit took about 37 min, whereas our method accomplished this task in 37.386 s, nearly 60 times faster than PolyFit.
Figure 7. Comparison of candidate faces and resulting models generated by the proposed method and PolyFit [10] for Building #1.
Figure 8. Comparison of candidate faces and resulting models generated by the proposed method and PolyFit [10] for Building #2.
Figure 9. Comparison of candidate faces and resulting models generated by the proposed method and PolyFit [10] for Building #3.
Figure 10. Comparison of candidate faces and resulting models generated by the proposed method and PolyFit [10] for Building #4.
Figure 11. Comparison of candidate faces and resulting models generated by the proposed method and PolyFit [10] for Building #5.
Figure 12. Comparison of candidate faces and resulting models generated by the proposed method and PolyFit [10] for Building #6.
Table 2. Quantitative statistics on data and computation cost. BID: building ID; Can. No.: number of candidate faces; Res. No.: number of resulting faces; ADT: time for adjacency estimation; CGT: time for candidate generation; MGT: time for model generation; TT: total time.

BID   Method    Can. No.   Res. No.   ADT (s)   CGT (s)   MGT (s)    TT (s)
#1    Ours      138        99         0.018     0.577     0.049      0.644
      PolyFit   1190       114        -         0.646     1.164      1.810
#2    Ours      242        163        0.034     0.777     0.040      0.851
      PolyFit   1584       169        -         0.752     0.989      1.741
#3    Ours      196        159        0.019     2.953     0.043      3.015
      PolyFit   1163       159        -         3.106     0.475      3.581
#4    Ours      689        533        0.085     11.668    0.019      11.772
      PolyFit   6809       578        -         12.014    60.983     72.997
#5    Ours      1187       707        0.222     14.266    11.940     26.428
      PolyFit   8117       784        -         13.425    619.913    633.338
#6    Ours      2964       1489       0.660     18.500    18.886     37.386
      PolyFit   15,885     1558       -         17.041    2210.079   2227.120
In addition, for Buildings #2, #4, and #5, there are obvious artifacts in the final building models generated by PolyFit compared with the original point clouds shown in Figure 6. This is a side effect of candidate generation based on the intersections of all potential planar segments. In contrast, by utilizing the constraints derived from the adjacency information, these artifacts are avoided or alleviated in both the candidate generation and candidate selection stages. To quantitatively verify the data fidelity of the reconstructed building models, we calculated the mean values of the C2M and M2C distances, as shown in Figure 13. The mean C2M distances for the 3D models generated by PolyFit and the proposed method are all less than 0.30 m, and those for the same buildings are similar. However, for Buildings #2, #4, and #5, the M2C distances for PolyFit are significantly greater than those for the proposed method. Because greater M2C distances indicate greater data distortion in the final building models, we can infer that the 3D models generated by our method have better geometric accuracy, which also accords with the visual judgments shown in Figures 8, 10 and 11.
Figure 13. Mean C2M and M2C distances for Buildings #1 to #6, computed from the three-dimensional (3D) models generated by PolyFit [10] and the proposed method.
4.3. Comparison with Other SOTA Methods
To further evaluate the proposed method, we also conducted an experiment with two SOTA 3D building reconstruction methods, namely the 2.5D dual contouring method (2.5D DC) [55] and the structuring method [56]. As shown in Figure 14, all three methods were able to reconstruct the overall structure of the buildings from the point clouds. The structuring method also preserved some sharp features, such as the edges and corners of the buildings, although some artifactual meshes appeared because of insufficient sampling and data noise. The 2.5D DC method was able to reconstruct building roofs and preserve planar structures to a certain degree, but it failed to reconstruct Building #6, which has a typical 3D structure. The mean values of the M2C and C2M distances are shown in Figure 15. From Building #1 to Building #4, the C2M distances of all three methods are almost at the same level, ranging from 0.05 to about 0.15 m. Meanwhile, the C2M distance of the proposed method for Building #5 is larger than those of the other two methods; the reason is that some small primitives in the original point clouds were not detected and recovered. For Building #6, the C2M value of 2.5D DC is distinctly large, since the façades of the building are not well reconstructed. The mean M2C values of the three methods are between 0.2 and 0.5 m. Note that, of the three methods, the proposed method is the only one which imposes watertight constraints, so some holes in the point clouds are filled in the output models (e.g., the bottoms of the buildings), which inevitably leads to higher M2C statistics. The results in Figure 15 reveal that the models recovered by the structuring method and the proposed method can represent the true 3D structures of buildings. Compared with those of the 2.5D DC and structuring methods, the 3D models produced by our proposed method are more concise, with the building surfaces briefly represented by several planar faces. As given in Table 3, the models generated by the proposed method are composed of far fewer faces; 90% to over 99% of the faces are reduced compared with those generated by 2.5D DC and structuring, yielding a concise representation of the building structure which would benefit subsequent applications, such as computational simulation, by reducing computational costs. Although some small details were smoothed out, the models generated by our method have better potential for further interactive editing, if needed, since the main structures of the buildings are concisely represented without topological conflicts.
Table 3. Numbers of faces in the models reconstructed by our method, the 2.5D DC method [55], and the structuring method [56].

Building ID   Ours   2.5D DC   Structuring
1             99     1452      10,972
2             163    2913      14,086
3             159    3797      126,674
4             533    18,170    32,077
5             707    30,527    128,315
6             1489   28,075    138,474
Figure 14. Comparison of models generated by our proposed method, the 2.5D DC method [55], and the structuring method [56].
Figure 15. Mean C2M and M2C distances for Buildings #1 to #6, computed from the 3D models generated by the 2.5D DC method [55], the structuring method [56], and the proposed method.
5. Discussion
As shown in Figures 7–12, the candidates generated by PolyFit are highly redundant, which makes it hard to recognize the building features (edges, corners) among the candidate faces; meanwhile, the candidates generated by the proposed method incorporate the recovered topological information, so some of the building features can be visually discovered. Considering the information in Figure 14 and Table 3, one virtue of the proposed method is the concise representation of the original building point clouds with the preferred geometric accuracy. In fact, this is also a virtue of PolyFit. Since our method incorporates the initially recovered topological information, obtained by the rule-based stage, into the candidate generation and selection processes, some unwanted artifacts are alleviated or avoided. The M2C values in Figure 13 also reveal this virtue. Besides, the running time of our method is substantially shorter than that of traditional hypothesis-based methods such as PolyFit, especially when the complexity of the building increases. The main difference lies in the MGT process, where lower ambiguity requires less computation time. For a dataset with lower noise and better coverage, it is more likely that intact building primitives can be extracted and high-quality regularized boundaries obtained. In these cases, the topological relations between primitives are easier to detect, leading to better initial reconstructions and fewer candidates (see the examples in Figures 7–9); thus, the running time of the candidate selection process is shorter. In extreme cases with favorable data coverage and sampling density, if all the topological relations of the building primitives have been recovered, the candidate faces generated accordingly should equal the final output model, because there is only one selection combination that conforms to all the properties in the energy equations and constraints. Conversely, if the building primitives are not well detected because of data quality (or other reasons), the adjacency detection process may only work in a small portion of the areas. Then, the number of candidates generated by our algorithm would increase, leading to longer MGTs and, possibly, some unexpected artifactual structures in the building models. In the most unfortunate cases, where no topology has been recovered in the first stage of our algorithm, the proposed method degenerates to a pure hypothesis-based reconstruction method, similar to PolyFit.
6. Conclusions
In this work, we proposed a novel method for efficiently reconstructing building models from photogrammetric point clouds obtained from aerial oblique images in a two-stage topology recovery process which combines the rule-based and the hypothesis-based methods. Given the point clouds of a single building, planar primitives and their corresponding boundaries are extracted and regularized to obtain abstracted building contours. In the first stage, the adjacency relations between different polygons are estimated based on their spatial consistency and mutual regularity to form an initial reconstruction. In the second stage, the candidate faces of the building model are generated by pairwise planar intersections along with three constraints, namely the pairwise constraint, the triplet constraint, and the nearby constraint, derived from the recovered topology. Lastly, the optimal combination of candidate faces is selected by solving a binary linear programming problem that shapes the favored properties under certain constraints. An experimental comparison of our method with three SOTA methods revealed that the proposed method can efficiently reconstruct 3D building models in several seconds and produce concise models with preferred data fidelity and geometric accuracy at the decimeter level. A detailed comparison with PolyFit indicated the high efficiency of the proposed method in reconstructing complex building models and showed promising prospects for this method in 3D city environmental applications. An advantage of the proposed method is its ability to handle point clouds with various levels of noise in the reconstruction process. However, when the point clouds have large holes or extensively missing areas, the models recovered by our method may degenerate to those constructed by PolyFit, which is a disadvantage. Besides, if the density of the input point clouds is too low, the boundaries extracted from the building primitives may become unstable, as may the initial topology estimation. In future work, methods for incorporating multi-scale primitives and 2D features from oriented images should be developed to handle situations with low point density and to achieve more detailed reconstruction.
Author Contributions: Conceptualization, L.X. and W.W.; methodology, L.X. and Q.Z.; software, L.X. and H.H.; validation, S.T., Y.L. and Y.Z.; formal analysis, Q.Z. and X.L.; supervision, R.G. All authors wrote the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported by the National Key R&D Program of China (2019YFB210310, 2019YFB2103104), the National Natural Science Foundation of China (No. 42001407, No. 41971341, No. 41971354, No. 41801392), the Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, MNR (KF-2019-04-042, KF-2019-04-019), the Guangdong Basic and Applied Basic Research Foundation (2019A1515110729, 2019A1515010748, 2019A1515011872), and the Open Research Fund of the State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University (No. 20E02).
Acknowledgments: We would like to thank Liangliang Nan, Florent Lafarge and Qianyi Zhou for providing the source codes or executable programs for the experimental comparison.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
LoD Level of Detail
CityGML City Geography Markup Language
LiDAR Light Detection And Ranging
SfM Structure-from-Motion
MVS Multi-View Stereo
RANSAC RANdom Sample Consensus
BSP Binary Space Partitioning
RTG Roof Topology Graph
VVM Vertex–Vertex Match
VEM Vertex–Edge Match
PC Pairwise Constraint
TC Triplet Constraint
NC Nearby Constraint
C2M Cloud to Mesh
M2C Mesh to Cloud
References
1. Tan, Z.; Lau, K.K.-L.; Ng, E. Planning strategies for roadside tree planting and outdoor comfort enhancement in subtropical high-density urban areas. Build. Environ. 2017, 120, 93–109. [CrossRef]
2. Wang, S.; Cai, G.; Cheng, M.; Junior, J.M.; Huang, S.; Wang, Z.; Su, S.; Li, J. Robust 3D reconstruction of building surfaces from point clouds based on structural and closed constraints. ISPRS J. Photogramm. Remote Sens. 2020, 170, 29–44. [CrossRef]
3. Badach, J.; Voordeckers, D.; Nyka, L.; Van Acker, M. A framework for air quality management zones—Useful GIS-based tool for urban planning: Case studies in Antwerp and Gdańsk. Build. Environ. 2020, 174, 106743. [CrossRef]
4. Biljecki, F.; Stoter, J.; LeDoux, H.; Zlatanova, S.; Çöltekin, A. Applications of 3D city models: State of the art review. ISPRS Int. J. Geo-Inf. 2015, 4, 2842–2889. [CrossRef]
5. Toschi, I.; Ramos, M.M.; Nocerino, E.; Menna, F.; Remondino, F.; Moe, K.; Poli, D.; Legat, K.; Fassi, F. Oblique photogrammetry supporting 3D urban reconstruction of complex scenarios. ISPRS Arch. 2017, XLII-1/W1, 519–526. [CrossRef]
6. Yu, D.; Ji, S.; Liu, J.; Wei, S. Automatic 3D building reconstruction from multi-view aerial images with deep learning. ISPRS J. Photogramm. Remote Sens. 2021, 171, 155–170. [CrossRef]
7. Liu, X.; Zhang, Y.; Ling, X.; Wan, Y.; Liu, L.; Li, Q. TopoLAP: Topology recovery for building reconstruction by deducing the relationships between linear and planar primitives. Remote Sens. 2019, 11, 1372. [CrossRef]
8. Awrangjeb, M.; Gilani, S.A.N.; Siddiqui, F.U. An effective data-driven method for 3D building roof reconstruction and robust change detection. Remote Sens. 2018, 10, 1512. [CrossRef]
9. Jarząbek-Rychard, M.; Borkowski, A. 3D building reconstruction from ALS data using unambiguous decomposition into elementary structures. ISPRS J. Photogramm. Remote Sens. 2016, 118, 1–12. [CrossRef]
10. Nan, L.; Wonka, P. PolyFit: Polygonal surface reconstruction from point clouds. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2372–2380.
11. Verdie, Y.; Lafarge, F.; Alliez, P. LOD generation for urban scenes. ACM Trans. Graph. 2015, 34, 1–14. [CrossRef]
12. Xiong, B.; Elberink, S.O.; Vosselman, G. A graph edit dictionary for correcting errors in roof topology graphs reconstructed from point clouds. ISPRS J. Photogramm. Remote Sens. 2014, 93, 227–242. [CrossRef]
13. Xu, B.; Jiang, W.; Li, L. HRTT: A hierarchical roof topology structure for robust building roof reconstruction from point clouds. Remote Sens. 2017, 9, 354. [CrossRef]
14. Ochmann, S.; Vock, R.; Klein, R. Automatic reconstruction of fully volumetric 3D building models from oriented point clouds. ISPRS J. Photogramm. Remote Sens. 2019, 151, 251–262. [CrossRef]
15. Gruen, A.; Schubiger, S.; Qin, R.; Schrotter, G.; Xiong, B.; Li, J.; Ling, X.; Xiao, C.; Yao, S.; Nuesch, F. Semantically enriched high resolution LoD3 building model generation. ISPRS Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, XLII-4/W15, 11–18. [CrossRef]
16. Wen, X.; Xie, H.; Liu, H.; Yan, L. Accurate reconstruction of the LoD3 building model by integrating multi-source point clouds and oblique remote sensing imagery. ISPRS Int. J. Geo-Inf. 2019, 8, 135. [CrossRef]
17. Gröger, G.; Plümer, L. CityGML–Interoperable semantic 3D city models. ISPRS J. Photogramm. 2012, 71, 12–33. [CrossRef]
18. Zhang, K.; Yan, J.; Chen, S.-C. Automatic construction of building footprints from airborne LiDAR data. IEEE Trans. Geosci. Remote Sens. 2006, 44, 2523–2533. [CrossRef]
19. Zhao, Z.; Duan, Y.; Zhang, Y.; Cao, R. Extracting buildings from and regularizing boundaries in airborne LiDAR data using connected operators. Int. J. Remote Sens. 2016, 37, 889–912. [CrossRef]
20. Chen, Q.; Wang, L.; Waslander, S.L.; Liu, X. An end-to-end shape modeling framework for vectorized building outline generation from aerial images. ISPRS J. Photogramm. Remote Sens. 2020, 170, 114–126. [CrossRef]
21. Widyaningrum, E.; Gorte, B.; Lindenbergh, R. Automatic building outline extraction from ALS point clouds by ordered points aided Hough transform. Remote Sens. 2019, 11, 1727. [CrossRef]
22. Vosselman, G. Building reconstruction using planar faces in very high density height data. ISPRS Arch. 1999, 32, 87–94.
23. Sohn, G.; Huang, X.; Tao, V. Using a binary space partitioning tree for reconstructing polyhedral building models from airborne LiDAR data. Photogramm. Eng. Remote Sens. 2008, 74, 1425–1438. [CrossRef]
24. Yang, B.; Huang, R.; Li, J.; Tian, M.; Dai, W.; Zhong, R. Automated reconstruction of building LoDs from airborne LiDAR point clouds using an improved morphological scale space. Remote Sens. 2016, 9, 14. [CrossRef]
25. Kurdi, F.T.; Awrangjeb, M.; Munir, N. Automatic filtering and 2D modeling of airborne laser scanning building point cloud. Trans. GIS 2021, 25, 164–188. [CrossRef]
26. Kurdi, F.T.; Awrangjeb, M. Automatic evaluation and improvement of roof segments for modelling missing details using Lidar data. Int. J. Remote Sens. 2020, 41, 4702–4725. [CrossRef]
27. Song, J.; Xia, S.; Wang, J.; Chen, D. Curved buildings reconstruction from airborne LiDAR data by matching and deforming geometric primitives. IEEE Trans. Geosci. Remote Sens. 2021, 59, 1660–1674. [CrossRef]
28. Kulawiak, M.; Lubniewski, Z. Improving the accuracy of automatic reconstruction of 3D complex buildings models from airborne LiDAR point clouds. Remote Sens. 2020, 12, 1643. [CrossRef]
29. Kim, K.; Shan, J. Building roof modeling from airborne laser scanning data based on level set approach. ISPRS J. Photogramm. Remote Sens. 2011, 66, 484–497. [CrossRef]
30. Poullis, C.; You, S. Photorealistic large-scale urban city model reconstruction. IEEE Trans. Vis. Comput. Graph. 2008, 15, 654–669. [CrossRef]
31. Henn, A.; Gröger, G.; Stroh, V.; Plümer, L. Model driven reconstruction of roofs from sparse LiDAR point clouds. ISPRS J. Photogramm. Remote Sens. 2013, 76, 17–29. [CrossRef]
32. Lafarge, F.; Descombes, X.; Zerubia, J.; Pierrot-Deseilligny, M. Automatic building extraction from DEMs using an object approach and application to the 3D city modeling. ISPRS J. Photogramm. Remote Sens. 2008, 63, 365–381. [CrossRef]
33. Cao, R.; Zhang, Y.; Liu, X.; Zhao, Z. 3D building roof reconstruction from airborne LiDAR point clouds: A framework based on a spatial database. Int. J. Geogr. Inf. Sci. 2017, 31, 1359–1380. [CrossRef]
34. Tarsha Kurdi, F.; Landes, T.; Grussenmeyer, P.; Koehl, M. Model-driven and data-driven approaches using Lidar data: Analysis and comparison. ISPRS Arch. 2007, XXXVI W49A, 1682–1687.
35. Wang, R.; Peethambaran, J.; Chen, D. LiDAR point clouds to 3D urban models: A review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 606–627. [CrossRef]
36. Lin, Y.; Wang, C.; Cheng, J.; Chen, B.; Jia, F.; Chen, Z.; Li, J. Line segment extraction for large scale unorganized point clouds. ISPRS J. Photogramm. Remote Sens. 2015, 102, 172–183. [CrossRef]
37. Schnabel, R.; Wahl, R.; Klein, R. Efficient RANSAC for point-cloud shape detection. Comput. Graph. Forum 2007, 26, 214–226. [CrossRef]
38. Tarsha Kurdi, F.; Landes, T.; Grussenmeyer, P. Hough-transform and extended RANSAC algorithms for automatic detection of 3D building roof planes from LiDAR data. In Proceedings of the ISPRS Workshop on Laser Scanning 2007 and SilviLaser 2007, Espoo, Finland, 12–14 September 2007; Volume 36, pp. 407–412.
39. Lari, Z.; Habib, A. An adaptive approach for the segmentation and extraction of planar and linear/cylindrical features from laser scanning data. ISPRS J. Photogramm. Remote Sens. 2014, 93, 192–212. [CrossRef]
40. Lu, X.; Yao, J.; Tu, J.; Li, K.; Li, L.; Liu, Y. Pairwise linkage for point cloud segmentation. ISPRS Ann. 2016, III-3, 201–208.
41. Sampath, A.; Shan, J. Building boundary tracing and regularization from airborne LiDAR point clouds. Photogramm. Eng. Remote Sens. 2007, 73, 805–812. [CrossRef]
42. Li, M.; Nan, L.; Smith, N.; Wonka, P. Reconstructing building mass models from UAV images. Comput. Graph. 2016, 54, 84–93. [CrossRef]
43. Xie, L.; Zhu, Q.; Hu, H.; Wu, B.; Li, Y.; Zhang, Y.; Zhong, R. Hierarchical regularization of building boundaries in noisy aerial laser scanning and photogrammetric point clouds. Remote Sens. 2018, 10, 1996. [CrossRef]
44. Chen, D.; Wang, R.; Peethambaran, J. Topologically aware building rooftop reconstruction from airborne laser scanning point clouds. IEEE Trans. Geosci. Remote Sens. 2017, 55, 7032–7052. [CrossRef]
45. Chen, D.; Zhang, L.; Mathiopoulos, P.T.; Huang, X. A methodology for automated segmentation and reconstruction of urban 3D buildings from ALS point clouds. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 4199–4217. [CrossRef]
46. Benciolini, B.; Ruggiero, V.; Vitti, A.; Zanetti, M. Roof planes detection via a second-order variational model. ISPRS J. Photogramm. Remote Sens. 2018, 138, 101–120. [CrossRef]
47. Sampath, A.; Shan, J. Segmentation and reconstruction of polyhedral building roofs from aerial lidar point clouds. IEEE Trans. Geosci. Remote Sens. 2010, 48, 1554–1567. [CrossRef]
48. Elberink, S.O.; Vosselman, G. Building reconstruction by target based graph matching on incomplete laser data: Analysis and limitations. Sensors 2009, 9, 6101–6118. [CrossRef]
49. Hu, H.; Chen, C.; Wu, B.; Yang, X.; Zhu, Q.; Ding, Y. Texture-aware dense image matching using ternary census transform. ISPRS
Ann. 2016,III-3, 59–66.
50.
Wu, B.; Xie, L.; Hu, H.; Zhu, Q.; Yau, E. Integration of aerial oblique imagery and terrestrial imagery for optimized 3D modeling
in urban areas. ISPRS J. Photogramm. Remote Sens. 2018,139, 119–132. [CrossRef]
51.
Perera, G.S.N.; Maas, H.-G. Cycle graph analysis for 3D roof structure modelling: Concepts and performance. ISPRS J. Photogramm.
Remote Sens. 2014,93, 213–226. [CrossRef]
52. Bauchet, J.-P.; Lafarge, F. Kinetic shape reconstruction. ACM Trans. Graph. 2020,39, 1–14. [CrossRef]
53.
Arikan, M.; Schwärzler, M.; Flöry, S.; Wimmer, M.; Maierhofer, S. O-snap: Optimization-based snapping for modeling architecture.
ACM Trans. Graph. 2013,32, 6. [CrossRef]
54.
Gamrath, G.; Anderson, D.; Bestuzheva, K.; Chen, W.; Eifler, L.; Gasse, M.; Gemander, P.; Gleixner, A.; Gottwald, L.; Halbig, K.;
et al. The SCIP Optimization Suite 7.0; Optimization Online; Zuse Institut: Berlin, Germany, 2020.
Remote Sens. 2021,13, 1107 23 of 23
55.
Zhou, Q.-Y.; Neumann, U. 2.5D Dual Contouring: A robust approach to creating building models from aerial LiDAR point clouds.
In Constructive Side-Channel Analysis and Secure Design, Proceedings of the 3rd International Workshop, COSADE, Darmstadt, Germany,
3–4 May 2012; Springer: Berlin, Germany, 2012.
56. Lafarge, F.; Alliez, P. Surface reconstruction through point set structuring. Comput. Graph. Forum 2013,32, 225–234. [CrossRef]