DSM Generation from High Resolution Multi-View
Stereo Satellite Imagery
K. Gong and D. Fritsch
Institute for Photogrammetry, University of Stuttgart, 70174 Stuttgart, Germany (ke.gong@ifp.uni-stuttgart.de, dieter.fritsch@ifp.uni-stuttgart.de)
Abstract
Along with improvements to spatial resolution, multiple-
view stereo satellite imagery has become a valuable data-
source for digital surface model generation. In 2016, a public
multi-view stereo benchmark of commercial satellite imag-
ery was released by the Johns Hopkins University Applied
Physics Laboratory, USA. Motivated by this well-organized
benchmark, we propose a pipeline to process multi-view
satellite imagery into digital surface models. Input images
are selected based on view angles and capture dates. We
apply the relative bias-compensated model for orienta-
tion, and then generate the epipolar image pairs. The im-
ages are matched by the modified tube-based SemiGlobal
Matching method (tSGM). Within the triangulation step,
very dense point clouds are produced, and are fused by a
median filter to generate the Digital Surface Model (DSM).
A comparison with the reference data shows that the fused
DSM generated by our pipeline is accurate and robust.
Introduction
Background
Over the last decade, a number of High Resolution Satellite
(HRS) sensors have been launched by commercial companies
or space agencies, like Sentinel-2, WorldView-3/4, Pléiades,
and so on. The best Ground Sample Distance (GSD) of HRS
panchromatic imagery has reached the 30 cm level, which
reveals more surface features. The HRS sensors cover most of
the regions of the Earth and collect the surface information
with large range footprints. They have high revisit frequency
over a certain area, which can provide a large number of im-
age collections and make the acquisition of multi-view stereo
(MVS) satellite imagery available. As is well known, the Rational
Polynomial Coefficients (RPCs) are provided by the satel-
lite data vendor, instead of the rigorous push-broom sensor
model. Thus, data consumers can ignore the differences between
satellite sensors and easily process the satellite data by ap-
plying a general pipeline. Because of these benefits, MVS
high resolution satellite images are useful for global three-
dimensional (3D) mapping, environmental monitoring, urban
planning, change detection, and so on.
In 2016, a public MVS benchmark of commercial satellite
imagery was released by the Johns Hopkins University Applied
Physics Laboratory (JHU/APL), USA. The benchmark contains
50 DigitalGlobe WorldView-3 panchromatic and multispectral
images. The imagery covers an area of 100 square kilometers close
to San Fernando, Argentina, with GSD of the nadir images of
about 30 cm. The high resolution image data were captured
from November 2014 to January 2016.
The benchmark also provides a Light Detection and Ranging
(LiDAR) point cloud collected in June 2016 as the ground truth,
with nominal point spacing of 20 cm. Digital surface models
(DSMs) at 30 cm GSD are produced from the LiDAR point cloud,
in order to make equally-spaced comparisons with the results
generated from Worldview-3 panchromatic imagery (Bosch
et al. 2016). This well-organized MVS high resolution satellite
benchmark has motivated us to learn and test new methods of
point cloud and DSM generation from MVS satellite data.
It is well known that MVS imagery 3D reconstruction meth-
ods can be classified into two categories. The first category
solves the multi-view triangulation problem for all images
simultaneously, which is the true multi-view method (Furukawa
and Hernández 2015). The second category only uses the
binocular stereo pairs. It processes the stereo pairs separately
and fuses the output point clouds or DSMs to a final result
(Haala 2013). Comparing the binocular stereo strategy with
the true multi-view method, the latter is more rigorous but
also more complicated. Because of the efficiency and stable
performance of the semiglobal matching (SGM) algorithm
(Hirschmüller 2008), most solutions for the 3D reconstruction
from MVS satellite imagery are implemented using binocu-
lar stereo methods (d’Angelo and Kuschk 2012; Kuschk 2013;
Qin 2017; Facciolo et al. 2017). Some researchers have inves-
tigated and compared both kinds of reconstruction strategies
on MVS satellite images (Ozcanli et al. 2015). In their imple-
mentation, the pair-wise multi-view reconstruction method
demonstrated better results than the true multi-view method.
In this paper, we present a pipeline based on the binocular
stereo method for DSM generation using MVS high resolu-
tion satellite imagery. The point clouds and DSMs, which are
separately generated from different stereo pairs, will be fused
to the final DSM. The fused final DSM is compared to the refer-
ence DSM for further evaluations. We conduct a qualitative
analysis by visual comparison and calculate the complete-
ness, the median error, the root-mean-square error (RMSE) and
the error distribution for the quantitative analysis. We show
that our proposed pipeline can produce accurate and robust
DSMs from MVS satellite imagery.
The contents of this paper are structured as follows: Sec-
tion “Related Work” introduces the related work, whereas the
methodology of the proposed pipeline is presented in the sec-
tion “Methodology”. Section “Experiments” demonstrates the
results generated from the benchmark data and their evalua-
tion, and in the last section we draw some conclusions.
Related Work
High resolution satellite sensors are able to provide plenty
of imagery for a certain area, but the images are usually collected
on different dates. Thus, the collected images may have different
illumination situations, different geometric configurations, and
may contain terrain changes. All of those differences will have
negative influences on the outcome of the DSM generation. A
large number of stereo images also means that image process-
ing is quite time-consuming. Therefore, finding a strategy to
select the most useful image pairs is an essential preprocessing
step for DSM generation from MVS satellite data. d'Angelo
et al. (2014) suggested that the intersection angle of the image
pairs' views is the biggest factor that impacts performance.
They selected the image pairs having intersection angles
between 15 and 25 degrees. After the release of the JHU/APL’s
benchmark, the subsequent Intelligence Advanced Research
Projects Activity multi-view stereo 3D mapping challenge
encouraged more researchers to find suitable image selection
strategies, especially, when they have to face hundreds or
thousands of possible pairs. Facciolo et al. (2017) sorted all
possible image pairs by the completeness of their computed
DSMs, and they built a Pearson’s correlation matrix for different
factors. According to their observations, the temporal prox-
imity, maximum incidence angle, and the intersection angle
between views are the three main factors affecting the final quality of
the DSM. They selected image pairs having intersection angles
between 5 and 45 degrees. All the images have an incidence
angle less than 40 degrees. The image pairs with smaller date
differences are expected to yield higher accuracy. Qin (2017)
also agreed that the intersection angle is critical to the qual-
ity of the generated DSMs. He found that when the intersection
angle of the image pair is smaller than 8 degrees or larger than
40 degrees, the generated DSM performs poorly. He therefore chose
the image pairs with intersection angles from 10 to 30 degrees.
In the standard MVS processing work flow, the Structure-
from-Motion or camera model orientation is the critical first
step. As is well-known, the HRS data vendors prefer to provide
the RPC along with the imagery to the users, instead of the
traditional exterior and interior orientation parameters. The RPCs
have no physical meaning; instead, a ratio of two polynomials
relates the image coordinates to the object coordinates. It has
been verified by many practical experiments
that the RPC model can replace the rigorous sensor model
while maintaining the accuracy (Grodecki and Dial 2001;
Hanley and Fraser 2001; Fraser et al. 2002; Grodecki and Dial
2003). A popular solution for the orientation of the satellite
imagery is the bias-compensated RPCs bundle block adjust-
ment. Grodecki and Dial (2003) have given a detailed descrip-
tion of this method. It minimizes the bias in image space
with some additional compensation models, for instance, the
shift model or the affine model. The bias-compensated RPC’s
bundle block adjustment requires some ground control points
(GCPs). It is widely applied for the absolute orientation of the
satellite stereo images (d'Angelo and Kuschk 2012; Ozcanli
et al. 2015; Gong and Fritsch 2016). For MVS satellite imagery,
the GCPs in a certain region are not always easy to access. In
this situation, the relative orientation of the stereo image pairs
is needed. De Franchis et al. (2014) have pointed out that inaccurate
RPC models cause relative pointing errors. This error
means that the corresponding points are not located on the
related epipolar lines. It can be measured as a simple transla-
tion when the image is small. In their approach, they divided
an image into tiles and calculated the translations between
the corresponding points and the epipolar line separately. The
median of the translations of different tiles is applied to the
whole image to remove the relative pointing error. Qin (2017)
applied pair-wise bias-compensation by using tie points
first. Then he conducted least squares minimization for the
registration of the generated DSM and the reference DSM. The
parameters of the DSM registration are reused to calculate
a translation in the image domain for RPC refinement. In our
previous research (Gong and Fritsch 2017), we proposed the
relative bias-compensated model without GCPs. We extract
some tie points first and calculate the virtual ground control
information with them. The RPCs are refined pairwise by an
additional affine model and by applying the virtual ground
control information. Thus, the relative bias-compensated
model is also a basic strategy applied in this paper.
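To make the ratio-of-polynomials relation concrete, the sketch below evaluates a forward RPC model. It is a minimal illustration only: the 20-term cubic basis follows one common vendor ordering, which is an assumption and varies between providers.

```python
import numpy as np

def rpc_terms(P, L, H):
    """The 20 cubic polynomial terms in the normalized ground
    coordinates (one common vendor ordering; orderings differ
    between providers)."""
    return np.array([
        1.0, L, P, H, L * P, L * H, P * H, L * L, P * P, H * H,
        P * L * H, L ** 3, L * P * P, L * H * H, L * L * P,
        P ** 3, P * H * H, L * L * H, P * P * H, H ** 3])

def rpc_ground_to_image(lat, lon, h, num_r, den_r, num_c, den_c,
                        off, scale):
    """Map object coordinates to image (row, col) with an RPC model.
    num_*/den_* are the four 20-coefficient vectors; off and scale
    are dicts of the normalization offsets and scale factors
    ('LAT', 'LON', 'HEIGHT', 'ROW', 'COL') shipped with the RPCs."""
    P = (lat - off['LAT']) / scale['LAT']       # normalized latitude
    L = (lon - off['LON']) / scale['LON']       # normalized longitude
    H = (h - off['HEIGHT']) / scale['HEIGHT']   # normalized height
    t = rpc_terms(P, L, H)
    row_n = (num_r @ t) / (den_r @ t)           # ratio of two cubics
    col_n = (num_c @ t) / (den_c @ t)
    return (row_n * scale['ROW'] + off['ROW'],
            col_n * scale['COL'] + off['COL'])
```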
To generate the dense point cloud and the DSM, the SGM al-
gorithm is the most popular solution for pixel-wise matching
of the HRS imagery. Many experiments have proved that SGM
can generate dense point clouds with reliable quality from sat-
ellite data (d’Angelo and Reinartz 2011; Wohlfeil et al. 2012;
d'Angelo and Kuschk 2012; Gong and Fritsch 2016). The SGM
algorithm requires approximately epipolar images as input,
so that the correspondence search is reduced to one dimension. Unlike the tra-
ditional frame camera imagery, for the HRS imagery it is hard
to generate the epipolar geometry, because of the changing
perspective center and attitudes. Kim (2000) has explained
in his work that the epipolar lines of the satellite push-broom
sensors are hyperbola-like curves rather than straight lines, and
the epipolar pairs can only exist locally. Based on this conclu-
sion, Wang et al. (2010) proposed the projection-trajectory
epipolarity model. The epipolar pair is generated by project-
ing points from one image to another with the RPCs. To resam-
ple the epipolar image pair, Wang et al. (2011) define a Projection
Reference Plane (PRP) in a local vertical coordinate system.
The stereo images will be projected onto the PRP. An affine
model is applied to transfer the original images to the epipo-
lar images on the PRP. Koh and Yang (2016) also applied the PRP
for epipolar image resampling, but they proposed a piecewise
method. They divided the epipolar line into several curves on
the PRP. Then a fifth order polynomial function is applied to
fit and resample all the epipolar curves. Oh (2011) proposed
an epipolar resampling strategy in the image space instead of
object space. In his work, a line orthogonal to the ground
track is calculated to generate a set of start points. These start
points are spaced at a proper interval (1000 pixels), and segments
are expanded from them to approximate the epipolar line. When
the segment expansion finishes, corresponding epipolar line pairs
are aligned to a constant row to remove the y-parallax.
The generated dense point clouds are placed into a regularly
spaced, discretized grid in the Universal Transverse
Mercator (UTM) coordinate system. In the binocular
stereo reconstruction approach, the dense point clouds or DSMs
need to be fused for the final result. Kuschk (2013) selected
the simple and common median filter to get the final height
of every cell. Qin (2017) proposed an adaptive depth fusion
method, which considers the spatial consistency. He defined a
window centered at a cell, and applied all the cells within the
window as candidates for the height value filtering. Facciolo
et al. (2017) proposed a clustering-based method. The height
of each cell is estimated by k-medians clustering. The
number of clusters is increased (from 1 to 8) until the clusters are
close enough to the predefined precision. The lowest cluster
is kept as the final altitude.
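As an illustration of this clustering-based fusion idea, the following sketch freely reimplements a per-cell height fusion with a simple Lloyd-style 1-D k-medians. It is our own approximation of the published method, not the authors' code; details such as the quantile initialization are assumptions.

```python
import numpy as np

def kmedians_1d(values, k, iters=20):
    """Plain Lloyd-style 1-D k-medians; centers start at quantiles."""
    centers = np.quantile(values, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]),
                           axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = np.median(values[labels == j])
    return centers, labels

def fuse_cell_heights(values, precision=1.0, k_max=8):
    """Grow k until every cluster's spread is within `precision`,
    then return the median height of the lowest non-empty cluster.
    values is a 1-D array of candidate heights for one grid cell."""
    for k in range(1, k_max + 1):
        centers, labels = kmedians_1d(values, k)
        spreads = [np.abs(values[labels == j] - centers[j]).max(initial=0.0)
                   for j in range(k)]
        if max(spreads) <= precision:
            break
    members = [j for j in range(len(centers)) if np.any(labels == j)]
    lowest = min(members, key=lambda j: centers[j])
    return np.median(values[labels == lowest])

# e.g. fuse_cell_heights(np.array([10.1, 10.3, 10.2, 24.9, 25.1])) -> 10.2
```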
Methodology
The proposed pipeline of this paper is based on the binocular
stereo method for DSM generation. It is semiautomatic and
needs some tie points for all the images; all steps are implemented
by our self-programmed C++ modules. This section presents
every step of the pipeline. Generally, it is divided into these
steps: image selection, relative orientation and image rectifi-
cation, dense image matching, triangulation and DSM fusion.
The workflow is presented in Figure 1.
Image Selection
As mentioned in the previous section, the MVS satellite
images are collected on different dates. The differences in
illumination, geometric configuration, and season
can degrade matching performance. Therefore, a suitable im-
age selection procedure is needed to select the most useful
pairs and reduce the comput-
ing time. It is commonly agreed
that the intersection angle of the
stereo images is the biggest factor
that affects the quality of the DSM
generation (d’Angelo et al. 2014;
Facciolo et al. 2017; Qin 2017).
The differences in collection date
and the incidence angle of the view
can also have an influence on the
result. Therefore, we define our im-
age selection strategy according to
these three factors.
First, images with large view
incidence angles are eliminated. Because
the spatial resolution of the satellite
image becomes lower when the
incidence angle is larger, the perfor-
mance of the dense image match-
ing will get worse. In our image
selection strategy, we only use the
satellite images whose incidence
angle is less than 35 degrees.
Next, the intersection angles of the views of every stereo
pair are computed. As suggested by previous studies
(d’Angelo et al. 2014; Facciolo et al. 2017), image pairs are
less useful if their intersection angle is either too large or too
small. We select the image pairs having intersection angles
between 5 degrees and 35 degrees.
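For illustration, the intersection (convergence) angle of a stereo pair can be derived from each image's mean satellite azimuth and elevation angles as reported in the metadata. The sketch below assumes exactly those two values per image; the example numbers are made up.

```python
import numpy as np

def view_unit_vector(azimuth_deg, elevation_deg):
    """Unit vector from the ground point toward the satellite in a
    local east-north-up frame, from the mean satellite azimuth and
    elevation angles found in the image metadata."""
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    return np.array([np.sin(az) * np.cos(el),
                     np.cos(az) * np.cos(el),
                     np.sin(el)])

def intersection_angle(az1, el1, az2, el2):
    """Convergence angle (degrees) between the two viewing rays."""
    v1, v2 = view_unit_vector(az1, el1), view_unit_vector(az2, el2)
    return np.degrees(np.arccos(np.clip(v1 @ v2, -1.0, 1.0)))

# A near-nadir view paired with a more oblique view of the same azimuth:
print(intersection_angle(180.0, 88.0, 180.0, 60.0))  # about 28.0
```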
Lastly, the influence of the collecting dates is taken into con-
sideration. As mentioned above, in most cases, the closer the
image collecting dates are, the better the results we can obtain.
According to our observations, there are two exceptions:
1. The image collecting dates of the stereo images are
relatively close, but the images present different seasons’
features. Figure 2a and 2b display two images collected
on 3 October 2015 and 22 October 2015. The point cloud
generated by dense image matching is shown in Figure 2c.
2. The interval between the captured dates is large, but the
images are collected in the same season. An example is
shown in Figure 3. Figure 3a is a satellite image collected
on 14 November 2014, and Figure 3b was collected on
18 December 2015. The point cloud generated from this
stereo pair is displayed in Figure 3c.
Figure 1. Workflow of the DSM generation using MVS satellite imagery.
Figure 2. Images collected on close dates and the related point cloud.
Figure 3. Images collected in different years and the related point cloud.
According to Figure 2c, we can observe that the area
marked by the ellipse has a lot of matching failures. As
Figure 2a and Figure 2b have shown, the trees in this area
have already grown new leaves on 22 October 2015 but have
no leaves on 3 October 2015. There is an apparent seasonal
change from the first image to the second, although the
collecting dates are close. Compared with the other stereo
pair’s results, the same region in Figure 3c has denser points.
Although the interval of the collecting dates is more than one
year, the terrain features presented in Figure 3a and Figure 3b
are similar. Stereo images collected in different years can
be well matched if there are only slight seasonal changes. There-
fore, not only the collecting dates but also the season shown
in the stereo pair plays a role in the quality of the gener-
ated point cloud and DSM.
Since the seasonal changes are mainly presented by the
vegetation, we have selected a subarea with vegetation, and
generated the point clouds from an image pair collected
in summer and winter. Figure 4 exhibits the point clouds
generated from two seasons’ imagery and the corresponding
area in the reference LiDAR DSM. The vegetation area in the
reference DSM is flourishing and closer to the reconstruction
from summer imagery. We have also learned that the winter
images are noisier and have worse illumination. In our image
selection strategy, we sort the images into two groups: winter
and summer, instead of in chronological order. In each group,
we ignore the year of data collection and order the images by
month. Image pairs in the summer group that have close
collecting months are selected as the inputs to generate the
DSMs in our pipeline.
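The complete selection logic can be summarized in a short sketch. The image records and the maximum month gap of two are illustrative assumptions; the paper only requires "close" collecting months within the summer group.

```python
from itertools import combinations

# Hypothetical image records (id, incidence angle deg, month, season);
# the field values below are purely illustrative.
IMAGES = [
    ("img_a", 12.0, 12, "summer"),   # December: summer in Argentina
    ("img_b", 28.0,  1, "summer"),
    ("img_c", 41.0, 12, "summer"),   # dropped: incidence angle too large
    ("img_d", 18.0,  7, "winter"),   # dropped: winter group
]

def month_distance(m1, m2):
    """Cyclic month distance, ignoring the year (as in the paper)."""
    d = abs(m1 - m2) % 12
    return min(d, 12 - d)

def select_pairs(images, pair_angle, max_incidence=35.0,
                 angle_range=(5.0, 35.0), max_month_gap=2):
    """Apply the three selection criteria of this section.
    pair_angle(id1, id2) must return the intersection angle in
    degrees, e.g. computed from the metadata as in the earlier
    sketch.  max_month_gap is our own assumption."""
    usable = [im for im in images
              if im[1] < max_incidence and im[3] == "summer"]
    pairs = []
    for a, b in combinations(usable, 2):
        angle = pair_angle(a[0], b[0])
        if (angle_range[0] <= angle <= angle_range[1]
                and month_distance(a[2], b[2]) <= max_month_gap):
            pairs.append((a[0], b[0]))
    return pairs

# With a stand-in angle function, only (img_a, img_b) survives.
print(select_pairs(IMAGES, lambda i, j: 20.0))
```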
Relative Orientation and Image Rectication
Because of the lack of GCPs, we cannot apply the bias-com-
pensated RPCs bundle block adjustment to the JHU/APL’s MVS
satellite benchmark data. Instead, a relative bias-compensated
model is used for the orientation of the MVS satellite imagery.
It has been shown that the accuracy of the relative bias-com-
pensated model can reach sub-pixel level (Gong and Fritsch
2017). In the first step, we select some tie points in all of the
input images manually. A subset of the tie points is selected
as virtual GCPs. In the relative bias-compensated model we
apply an additional affine model to compensate for the bias
between different images, so at least four to six virtual GCPs
are needed (Fraser and Hanley 2005). A pair of stereo images
is selected to generate the virtual ground information. This
selected image pair requires a correction of its pointing error.
The pointing error is the distance between the correspond-
ing point and the corresponding epipolar line (Franchis et al.
2014). We compute the pointing errors of these virtual GCPs.
The affine model is applied to estimate the correction of the
pointing error of the selected stereo pair. After the pointing
error correction, the object coordinates of these virtual GCPs
are calculated by the RPCs. The generated virtual GCPs have
a 3D translation relative to the true ground. They are then applied to
perform the bias-compensated bundle block adjustment for
all the input images. The adjustment will remove the relative
but not absolute bias for different images. The point clouds
and the DSMs generated from all stereo images are aligned to
the surface where we have the virtual ground points. Thus, no
further registration is needed for the point clouds and DSMs.
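The affine bias compensation can be illustrated with a simplified per-image least-squares fit. Note that this is only a sketch: the paper estimates the affine parameters for all images jointly in a bias-compensated bundle block adjustment using the virtual GCPs, which the isolated fit below does not reproduce.

```python
import numpy as np

def fit_affine_bias(measured, projected):
    """Least-squares affine bias correction in image space.  measured
    and projected are (n, 2) arrays of (row, col): tie-point
    measurements and their RPC projections; n >= 3 (the paper uses
    at least four to six virtual GCPs)."""
    A = np.hstack([np.ones((len(projected), 1)), projected])  # [1, r, c]
    coef_r, *_ = np.linalg.lstsq(A, measured[:, 0], rcond=None)
    coef_c, *_ = np.linalg.lstsq(A, measured[:, 1], rcond=None)
    return coef_r, coef_c

def apply_affine_bias(coef_r, coef_c, pts):
    """Correct RPC-projected (row, col) points with the fitted model."""
    A = np.hstack([np.ones((len(pts), 1)), pts])
    return np.stack([A @ coef_r, A @ coef_c], axis=1)
```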
The projection-trajectory epipolarity method is used to
find the corresponding epipolar curves. With the help of
the RPCs, a point on the base image can be projected to two
different height levels in object space. The object points are
back-projected to the slave image, yielding two
image points. These image points can be used to approximate
the epipolar curve. Redoing the projections from the slave
image to the base image, we find the corresponding epipolar
curve pair. Having had good experience with the modified
piecewise resampling strategy for approximating the epipolar
curve and resampling the epipolar images (Gong and Fritsch
2017), we apply it here. The epipolar curve generation is
started from the points located on the boundary. Expanding
several epipolar segments with proper length from the start
points, we approximate the epipolar curve. The epipolar
segments are aligned to the same row. Finally, the epipolar
images are resampled along the epipolar segments by bicubic
interpolation.
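A minimal sketch of the projection-trajectory transfer underlying this procedure is given below. The `image_to_ground` and `ground_to_image` methods stand for the RPC model and its iterative inverse; these names are illustrative placeholders, not a real API.

```python
import numpy as np

def epipolar_segment(pt_base, rpc_base, rpc_slave, h_min, h_max):
    """Local epipolar segment in the slave image for one base-image
    point (row, col): project the point to two height levels and
    back-project both ground points.  `image_to_ground(row, col, h)`
    and `ground_to_image(lat, lon, h)` are assumed wrappers around
    the RPC model and its iterative inverse."""
    lat_lo, lon_lo = rpc_base.image_to_ground(*pt_base, h_min)
    lat_hi, lon_hi = rpc_base.image_to_ground(*pt_base, h_max)
    p_lo = rpc_slave.ground_to_image(lat_lo, lon_lo, h_min)
    p_hi = rpc_slave.ground_to_image(lat_hi, lon_hi, h_max)
    return np.array([p_lo, p_hi])   # two (row, col) segment endpoints
```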
Dense Image Matching and Triangulation
The proposed pipeline applies a modified SGM method,
called tube-based SGM (tSGM) (Rothermel et al. 2012),
to generate very dense point clouds. tSGM is implemented
in the C++ library libTsgm, which is the core algorithm of the
software SURE. The usage of the library has been authorized
by nFrames GmbH, Stuttgart. Compared with the original
SGM method, the tSGM algorithm relies on the 9 × 7 Census
cost instead of Mutual Information. Because the Census
cost is insensitive to parametrization and provides robust
results (Zabih and Woodfill 1994), it can also be implemented
in a hierarchical coarse-to-fine method to limit disparity
search ranges. The results of the lower resolution pyramid
are introduced as the priors to determine the disparity search
ranges for the matching of the higher resolution pyramid
(Rothermel et al. 2012). The tSGM algorithm greatly reduces
computing time and optimizes memory efficiency. The dispar-
ity maps generated by the tSGM method are applied to derive
the corresponding pixels of every stereo image. With the
corresponding pixels, the dense point clouds are generated by
forward intersection.
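Since tSGM itself ships in the licensed libTsgm library, the sketch below only illustrates its matching cost: the 9 × 7 Census transform and the per-pixel Hamming distance. The hierarchical disparity-range narrowing that distinguishes tSGM from plain SGM is not reproduced here.

```python
import numpy as np

def census_transform(img, rows=7, cols=9):
    """Census transform with a 9 x 7 window (Zabih and Woodfill 1994):
    each pixel becomes a 62-bit string recording which neighbors are
    darker than the center; border pixels keep a zero descriptor."""
    h, w = img.shape
    rr, cc = rows // 2, cols // 2
    desc = np.zeros((h, w), dtype=np.uint64)
    center = img[rr:h - rr, cc:w - cc]
    for dr in range(-rr, rr + 1):
        for dc in range(-cc, cc + 1):
            if dr == 0 and dc == 0:
                continue
            nb = img[rr + dr:h - rr + dr, cc + dc:w - cc + dc]
            desc[rr:h - rr, cc:w - cc] = (
                (desc[rr:h - rr, cc:w - cc] << 1) | (nb < center))
    return desc

def census_cost(desc_l, desc_r, d):
    """Matching cost at disparity d: per-pixel Hamming distance between
    the left descriptors and the right descriptors shifted by d."""
    x = desc_l[:, d:] ^ desc_r[:, :desc_r.shape[1] - d]
    bits = np.unpackbits(x.view(np.uint8).reshape(x.shape + (8,)), axis=-1)
    return bits.sum(axis=-1)
```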
Figure 4. Area with vegetation in: (a) Point cloud from winter images, (b) Point cloud from summer images, (c) Reference DSM.
DSM Fusion
Following our image selection strategy, the number of the
point clouds could still be quite high. We must choose the
best point clouds for our DSM fusion. All of the input pairs
will first be processed and then the generated point clouds
will be ranked according to their quality. The optimal num-
ber of point clouds that are applied for DSM fusion will be
discussed and presented in the section “Experiments”. As
explained before, the point clouds have already been aligned
on the same virtual ground surface, so no additional registra-
tion is needed. To generate the DSM, the point clouds are pro-
jected into a regularly spaced and discretized grid in the UTM
coordinate system. In our implementation, a simple median
filter is applied for the DSM fusion. The median value of the
height of each grid cell is computed as the final height value
of the fused DSM. The Inverse Distance Weighted interpolation
method is applied if no points are projected onto a cell.
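A minimal sketch of this fusion step follows; the grid geometry conventions and the IDW search radius are our own assumptions, not specified in the paper.

```python
import numpy as np

def fuse_median_dsm(point_clouds, x0, y0, gsd, width, height):
    """Median-filter DSM fusion: rasterize the point clouds (each an
    (n, 3) array of UTM easting, northing, height) onto a regular
    grid and keep the per-cell median height.  (x0, y0) is the
    north-west grid corner and gsd the cell size in meters."""
    cells = [[[] for _ in range(width)] for _ in range(height)]
    for e, n, h in np.vstack(point_clouds):
        col = int((e - x0) / gsd)
        row = int((y0 - n) / gsd)
        if 0 <= row < height and 0 <= col < width:
            cells[row][col].append(h)
    dsm = np.full((height, width), np.nan)
    for r in range(height):
        for c in range(width):
            if cells[r][c]:
                dsm[r, c] = np.median(cells[r][c])
    return idw_fill(dsm)

def idw_fill(dsm, radius=5, power=2.0):
    """Inverse-distance-weighted fill for cells that received no
    points; the search radius here is an assumption."""
    filled = dsm.copy()
    for r, c in zip(*np.nonzero(np.isnan(dsm))):
        r0, r1 = max(r - radius, 0), min(r + radius + 1, dsm.shape[0])
        c0, c1 = max(c - radius, 0), min(c + radius + 1, dsm.shape[1])
        win = dsm[r0:r1, c0:c1]
        rr, cc = np.nonzero(~np.isnan(win))
        if rr.size == 0:
            continue                       # nothing nearby to borrow
        dist = np.hypot(rr + r0 - r, cc + c0 - c)
        w = 1.0 / dist ** power
        filled[r, c] = (w * win[rr, cc]).sum() / w.sum()
    return filled
```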
Experiments
Three different test sites are selected from the MVS satellite
benchmark. The details of the test sites and the reference data
are illustrated in the section “Test Site and Evaluation Meth-
od”. In the section “Results and Analysis”, we present the
results of the proposed pipeline and the evaluation analysis.
Test Site and Evaluation Method
The proposed pipeline processes three different test sites from
the benchmark. All the WorldView-3 panchromatic images
were collected from November 2014 to January 2016, with col-
lection dates covering every month. The GSD of the imagery is
30 cm. Test site 1 is about 3000 × 3000 pixels. The sizes of test
sites 2 and 3 are about 2000 × 2000 pixels. All the test sites are
close to San Fernando, Argentina. They contain a range of differ-
ent terrain types, such as fields, residential areas and vegeta-
tion. Test site 3 contains several high-rise buildings. The refer-
ence LiDAR DSMs of all the test sites are given. The GSD of the
reference data is also 30 cm so that the comparison analysis
becomes easier. The three test sites are shown in Figure 5.
Results and Analysis
Following our image selection strategy, we keep the images
having an incidence angle less than 35 degrees. We sort the
images into summer and winter groups, and only use the im-
ages of the summer group. Image pairs whose intersection
angles are less than 5 degrees or larger than 35 degrees
are eliminated. In the end, 748 stereo pairs are selected as input
data in test site 1. In test sites 2 and 3, 394 and 484 stereo pairs,
respectively, pass the image selection.
For each test site, 25 tie points are manually selected for all
the images. Ten of them are chosen as virtual ground control
points. These virtual GCPs are distributed evenly in the image
scene. We correct the pointing error of one selected stereo im-
age pair’s RPCs. With the corrected RPCs, the object coordinates
of the virtual GCPs are calculated. By applying the relative
bias-compensated model for all the images, we use additional
affine models to compensate for the bias caused by the RPCs. The
epipolar stereo images are generated by our modified piece-
wise epipolar resampling strategy. We apply the tSGM algo-
rithm to match all the stereo pairs and generate the disparity
maps. The point clouds are derived from the disparity maps,
and they are all aligned to the same virtual ground surface.
With hundreds of point clouds, we only select the best ones
for our DSM fusion. In order to investigate the optimal number
of the input point clouds, the point clouds are ranked by the
completeness criterion. It is widely stated that the height mea-
surement is accurate within three times the GSD, which is
about 1 meter in our case. Therefore, we present the percentage
of the points that have height differences to the ground truth
of less than 1 m as the completeness measure. Note that
the point clouds need to be aligned to the reference DSM before
the final quality analysis is carried out. Since there are no GCPs
in our MVS satellite imagery benchmark and we have the RPCs
instead of the exterior parameters, we undertake the registra-
tion via a coarse-to-fine method without any GCPs. In different
spatial resolution levels, the point cloud is moved to the refer-
ence DSM by applying 3D translation shifts. The height difference
of the shifted point cloud and the reference DSM is calculated.
Iteratively, the translational shift is modified until the median
error of the height differences is minimized (Bosch et al. 2016).
In this way, we can minimize the shift between the point cloud
and the ground truth, which is caused by our relative orien-
tation procedure. For testing, we select different numbers of
top-ranked point clouds for the DSM fusion. The point clouds
are converted into a discretized, regularly spaced grid in the
UTM coordinate system. The fusion of the point clouds is per-
formed with a simple median filter. In order to estimate the quality
of the fused DSM, we compute the completeness and the RMSE
of the height differences. Figure 6 demonstrates the complete-
ness and RMSE as a function of the number of the input point
clouds. In Figure 6, the solid blue lines represent the result of
test site 1, the dashed red lines represent the result of test site 2,
and the dotted green lines represent the result of test site 3.
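The translation search can be sketched as follows, in the spirit of the procedure of Bosch et al. (2016); the concrete step schedule and the greedy axis-wise search are assumptions made for illustration.

```python
import numpy as np

def register_to_reference(points, ref_dsm, x0, y0, gsd,
                          steps=(4.0, 1.0, 0.25)):
    """Coarse-to-fine search for the 3D translation minimizing the
    median absolute height difference to the reference DSM.  points
    is an (n, 3) array of easting, northing, height; (x0, y0) is
    the north-west grid corner of the reference DSM."""
    def median_err(shift):
        dx, dy, dz = shift
        col = ((points[:, 0] + dx - x0) / gsd).astype(int)
        row = ((y0 - (points[:, 1] + dy)) / gsd).astype(int)
        ok = ((row >= 0) & (row < ref_dsm.shape[0]) &
              (col >= 0) & (col < ref_dsm.shape[1]))
        dh = points[ok, 2] + dz - ref_dsm[row[ok], col[ok]]
        dh = dh[~np.isnan(dh)]
        return np.median(np.abs(dh)) if dh.size else np.inf

    best = np.zeros(3)
    for step in steps:                     # shrink the search step
        improved = True
        while improved:
            improved = False
            for axis in range(3):
                for sign in (-1.0, 1.0):
                    cand = best.copy()
                    cand[axis] += sign * step
                    if median_err(cand) < median_err(best):
                        best, improved = cand, True
    return best                            # (dx, dy, dz) shift to apply
```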
According to Figure 6a, we find that the completeness
is poor when the number of point clouds used for fusion is
too low. As the number of point clouds
increases, the completeness improves until it reaches a
peak. For test site 1, the completeness attains its highest value
of about 75% when the number of the input point clouds is
about 30. For test site 2, the completeness reaches a peak of
68% when about 25 point clouds are applied. Finally, for test
site 3, the best completeness is 56% with a corresponding
point cloud count of 25. Above the peak, the completeness
Figure 5. WorldView-3 images of: (a) test site 1, (b) test site 2, (c) test site 3.
becomes worse if more point clouds are fed into the fusion.
We also notice that the completeness of test site 1 is the
highest of the three areas, and that test site 3 has the lowest
completeness. Moreover, the completeness decreases more
significantly when too many point clouds are applied in the
DSM fusion of test site 3. The reason is that test site 1 has more
field areas, whereas there are more residential areas in test sites
2 and 3. In particular, there are several high-rise buildings
in test site 3, which will lead to larger shadow areas in the
images. The high-rise buildings reconstructed from different
stereo pairs might have very large height differences in the
boundary areas of the buildings. The dense residential areas
and the high-rise buildings lead to the loss of completeness.
As Figure 6b shows, the RMSE first
decreases while the number of fused point clouds is
small, and then increases as more point clouds
are added. The accuracy is reduced because more errors are
introduced by some lower quality image pairs. Test site 1 has
the best RMSE, then test site 2, and test site 3 has the largest
height differences to the ground truth. So more dense residen-
tial areas and high-rise buildings reduce the accuracy of the
reconstruction. Considering both the completeness and the
RMSE, we should select the number of point clouds which can
provide the best completeness while having a relatively small
RMSE. Therefore, the optimal number of the applied point
clouds for the DSM fusion is 30 for test site 1. The optimal
number for test site 2 is 20 point clouds, and for test site 3 we
also select 20 point clouds to generate the final fused
DSM. By checking the selected point clouds for
the final fusion, we find that most of these point clouds are
generated from stereo images collected on close dates, and the
intersection angles of most stereo pairs are between 10 and 30
degrees. This also proves that the image selection strategy is
effective and that the intersection angles of the stereo pairs can be
limited to 10 to 30 degrees in future experiments.
The fused point clouds are displayed in Figure 7. The
fused DSMs of the three test sites, which are generated from
the optimal number of the point clouds, are displayed in
Figure 8. The reference DSMs of the three test sites are demon-
strated in Figure 9.
In order to evaluate the quality of our fused DSMs quantita-
tively, a comparison is made between the fused DSM and the
reference LiDAR DSM for all three test sites. The median height
difference, the RMSE of the height difference, and the completeness
of the results are computed to check the accuracy. Moreover,
we computed the normalized median absolute deviation (NMAD)
and the 68% and 95% quantiles of the absolute height errors to evalu-
ate the robustness of the fused DSM. The statistical evaluation
results are illustrated in Table 1.
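For reference, the measures reported in Table 1 can be computed from the per-cell height differences as in the sketch below; the NMAD uses the conventional 1.4826 scale factor that makes it comparable to a standard deviation under normally distributed errors.

```python
import numpy as np

def dsm_metrics(dh, tol=1.0):
    """Accuracy and robustness measures from the per-cell height
    differences dh between the fused and the reference DSM."""
    dh = dh[~np.isnan(dh)]
    abs_dh = np.abs(dh)
    return {
        "median_error_m": np.median(dh),
        "rmse_m": np.sqrt(np.mean(dh ** 2)),
        "completeness_pct": 100.0 * np.mean(abs_dh < tol),
        # NMAD: 1.4826 * median absolute deviation from the median
        "nmad_m": 1.4826 * np.median(np.abs(dh - np.median(dh))),
        "aq68_m": np.quantile(abs_dh, 0.68),
        "aq95_m": np.quantile(abs_dh, 0.95),
    }
```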
As Table 1 demonstrates, the RMSE of the DSM of test site 1 is
2.7 m. The fused DSM of test site 2 has an RMSE of 3.81 m, and
for test site 3, the RMSE is 4.08 m. The completeness values of the
three test sites are 75.4%, 68.9%, and 55.9%, respectively. As
we have discussed before, the dense residential areas in test
sites 2 and 3 decrease the accuracy and completeness. The
high-rise buildings in test site 3 reduce the completeness sig-
nificantly. The NMADs are 0.5 m, 0.6 m, and 1.0 m for test sites 1,
2, and 3, respectively. The 68% quantiles of the test sites are 0.66 m,
0.96 m, and 1.45 m. The dense residential areas and high-rise
buildings cause more shadows and have a negative influence
on the robustness. The distributions of the height differences of
the three test sites are depicted as histograms in Fig-
ure 10. According to Figure 10, the distributions of the height
differences in test sites 1 and 2 are more concentrated than in
test site 3, because the high-rise buildings and their large
shadows cause more errors during the DSM generation.
To show the 3D capability of the reconstructed point clouds
and to conduct some qualitative analysis, the fused point
clouds are visualized by the open source software CloudCom-
pare. Several subareas are extracted from the three test sites as
regions of interest (ROIs) to analyze the reconstructed details
of the fused point clouds. The ROIs on the fused point clouds
and the reference DSM are shown in Figure 11.
The left column in Figure 11 shows the ROIs extracted from
the reference LiDAR DSM, and the right column displays the cor-
responding areas of the fused point clouds. In Figure 11a and
11b, we can find an isolated large building in the extracted
area. In Figure 11b, the reconstructed building’s edges are
sharp except for the bottom-left edge. The blur of this edge is
caused by the shadow on that side. Compared to the reference
DSM, the detail features on the roof of the isolated building are
also reconstructed in the fused point cloud. Figure 11c and
11d show an area which has some connected buildings, and
Figure 6. Relation between the number of point clouds and
(a) completeness, (b) RMSE.
Table 1. Evaluation results of the fused DSMs.

Metric              Test site 1   Test site 2   Test site 3
Median error (m)       0.320         0.390         0.728
RMSE (m)               2.702         3.810         4.081
Completeness (%)      75.40         68.86         55.93
NMAD (m)               0.503         0.628         1.023
Aq68 (m)               0.660         0.960         1.455
Aq95 (m)               5.930         6.288         6.507
Figure 7. The fused point cloud of: (a) test site 1, (b) test site 2, (c) test site 3.
Figure 8. The fused DSM of: (a) test site 1, (b) test site 2, (c) test site 3.
Figure 9. The Reference DSM of: (a) test site 1, (b) test site 2, (c) test site 3.
Figure 10. Distribution of height differences: (a) test site 1, (b) test site 2, (c) test site 3.
these buildings are surrounded by trees.
The upper and right boundaries of the
buildings are reconstructed more clearly,
because they are not affected by shadows
at all. We can observe that the buildings
are hard to distinguish by their boundaries
if the trees are too close. The height of the
trees varies from image to image. So the veg-
etation in the fused point cloud does not fully
fit the reference DSM, which means
that the vegetation will introduce errors to
the generated point cloud and the DSM. To
generate higher accuracy DSMs, the vegeta-
tion should be masked in the future. The
next ROI includes a high-rise building. It
is displayed in Figure 11e and 11f. The
top and left side of the building have no
shadows and they have sharper edges than
the other two sides. Compared to the low-
rise isolated building in the first ROI, the
high-rise buildings have a larger range of
shadow areas and therefore have stronger
negative effects on the reconstruction. The
edges on the shadow side are blurred. More-
over, there is a small part missing from the
high-rise building. We select an area that
is full of densely packed low-rise buildings
as our last ROI. This residential area is
exhibited in Figure 11g and 11h. The fused
point cloud exhibits poor performance
because the buildings are too close, and
the shadows of the buildings are often cast
on the nearby buildings. The reconstructed
buildings are connected to each other and
have totally blurred boundaries. It is chal-
lenging to reconstruct the residential area
as separate buildings. Generally, the fused
point clouds can reconstruct the terrain
surface with some detail. The pipeline has
worse performance for high-rise buildings
and dense residential areas, and
vegetation and shadows cause difficulties
during the reconstruction.
Conclusion and Outlook
In this paper, we propose a pipeline for DSM
generation from MVS satellite images. The
methods of the pipeline are implemented
by self-programmed C++ modules. Experi-
ments were carried out on three different
test sites provided by JHU/APL’s MVS satellite
image benchmark. We propose an image se-
lection strategy that considers the incidence
angles of the view, the intersection angles
and the collected dates. Those images
having large incidence angles, too small or
too large intersection angles, or those col-
lected in winter are eliminated. Image pairs
collected on close months in summer are
selected as the inputs. We apply the relative
bias-compensated model for the relative
orientation, which aligns the point clouds
to a virtual ground surface. No further point
cloud registration is needed before the fu-
sion step. Following the pipeline, the point
clouds are generated pairwise. The opti-
mal number of the involved point clouds
for DSM fusion is investigated. We apply
those point clouds which lead to the best
Figure 11. Detail comparison between the reference DSM and the generated 3D model.
completeness and maintain a relatively low RMSE. In our experi-
ment, the results show that the optimal number of the point
clouds is thirty for test site 1. For test sites 2 and 3 we apply the
top-ranked twenty point clouds. Additional point clouds
introduce errors because some of them are of low quality. The
point clouds are converted into grids in the UTM system and are
fused to generate the DSM. The fusion applies the median filter.
The RMSEs of the fused DSM are 2.7 m for test site 1, 3.81 m for
test site 2, and 4.08 m for test site 3. The completeness values of test
sites 1, 2, and 3 are 75.4%, 68.9%, and 55.9%. The NMADs of the
DSMs are all below 1 m and the 68% quantiles of the height dif-
ference distribution are all below 1.5 m. The proposed pipeline
to generate DSMs from MVS satellite imagery is accurate and
robust. Fusing point clouds for the final DSM can reconstruct
the terrain surface with some detail features. High-rise build-
ings and dense residential areas reduce the accuracy and
completeness of the DSM. The pipeline performs poorly
on imagery that includes shadows and areas of vegetation.
There are still some aspects which need to be improved in
our work. First, there are some vegetation areas in the im-
ages under investigation. Because the MVS satellite images are
collected on different dates, the vegetation introduces errors in
the dense image matching step and reduce the accuracy of the
generated DSM. We can classify the MVS imagery and mask out
the vegetation to improve the quality of the results. Second,
the images that are collected in winter are not applied in our
current pipeline. The seasonal changes are mainly presented
as differences in the vegetation. If the vegetation is masked out,
some winter images with good illumination conditions can
also be used in our procedure. Third, we apply the median fil-
ter for the DSM fusion. There might be better solutions to fuse
the DSM than simply taking the median height value of each
cell. Finally, we implement a binocular stereo method in our
pipeline. It would be interesting to see how a true multi-view algo-
rithm performs on MVS high resolution satellite imagery.
Acknowledgements
The authors would like to thank Johns Hopkins University
Applied Physics Lab for providing the well-organized MVS
satellite imagery benchmark. The authors also would like to
acknowledge the advice of Dr. Mathias Rothermel about the
MVS reconstruction. Finally, the grant of the Chinese Scholar-
ship Council (CSC) supporting the research of the first author
is gratefully acknowledged.
References
Bosch, M., Z. Kurtz, S. Hagstrom and M. Brown. 2016. A multiple
view stereo benchmark for satellite imagery. Pages 1–9 in IEEE
Applied Imagery Pattern Recognition Workshop (AIPR).
d’Angelo, P. and G. Kuschk. 2012. Dense multi-view stereo from
satellite imagery. Pages 6944–6947 in IEEE International
Geoscience and Remote Sensing Symposium (IGARSS).
d’Angelo, P. and P. Reinartz. 2011. Semiglobal matching results on
the ISPRS stereo matching benchmark. Pages 79–84 in ISPRS
Hannover Workshop.
d’Angelo, P., C. Rossi, C. Minet, M. Eineder, M. Flory and I. Niemeyer.
2014. High resolution 3D earth observation data analysis for
safeguards activities. In Symposium on International Safeguards:
Linking Strategy, Implementation and People. Nukleare
Entsorgung und Reaktorsicherheit.
De Franchis, C., E. Meinhardt-Llopis, J. Michel, J.-M. Morel and
G. Facciolo. 2014. An automatic and modular stereo pipeline
for pushbroom images. ISPRS Annals of the Photogrammetry,
Remote Sensing and Spatial Information Sciences.
Facciolo, G., C. De Franchis and E. Meinhardt-Llopis. 2017.
Automatic 3D reconstruction from multi-date satellite images.
Pages 57–66 in Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition Workshops.
Fraser, C., H. Hanley and T. Yamakawa. 2002. Three‐dimensional
geopositioning accuracy of Ikonos imagery. The
Photogrammetric Record 17 (99): 465–479.
Fraser, C. and H. Hanley. 2005. Bias-compensated RPCs for sensor
orientation of high-resolution satellite imagery. Photogrammetric
Engineering & Remote Sensing 71 (8): 909–915.
Furukawa, Y. and C. Hernández. 2015. Multi-view stereo: A tutorial.
Foundations and Trends® in Computer Graphics and Vision 9
(1–2): 1–148.
Gong, K. and D. Fritsch. 2016. A detailed study about digital surface
model generation using high resolution satellite stereo imagery.
ISPRS Annals of the Photogrammetry, Remote Sensing and
Spatial Information Sciences, 3 (1).
Gong, K. and D. Fritsch. 2017. Relative orientation and modified
piecewise epipolar resampling for high resolution satellite
images. Page 42 in The International Archives of Photogrammetry,
Remote Sensing and Spatial Information Sciences.
Grodecki, J. and G. Dial. 2001. IKONOS geometric accuracy. Pages
19–21 in Proceedings of Joint Workshop of ISPRS Working
Groups I/2, I/5 and IV/7 on High Resolution Mapping from
Space, vol. 4.
Grodecki, J. and G. Dial. 2003. Block adjustment of high-
resolution satellite images described by rational polynomials.
Photogrammetric Engineering & Remote Sensing 69 (1): 59–68.
Haala, N. 2013. The landscape of dense image matching algorithms.
In Photogrammetric Week’13, edited by D. Fritsch, 271–284.
Wichmann/VDE Verlag Berlin/Offenbach.
Hanley H. and C. Fraser. 2001. Geopositioning accuracy of IKONOS
imagery: Indication from two dimensional transformations.
Photogrammetric Record 17 (98): 317–329.
Hirschmüller, H. 2008. Stereo processing by semiglobal matching and
mutual information. IEEE Transactions on Pattern Analysis and
Machine Intelligence 30 (2): 328–341.
Kim, T. 2000. A study on the epipolarity of linear pushbroom images.
Photogrammetric Engineering & Remote Sensing 66 (8): 961–966.
Koh, J. and H. Yang. 2016. Unified piecewise epipolar resampling
method for pushbroom satellite images. EURASIP Journal on
Image and Video Processing 2016 (1): 11.
Kuschk, G. 2013. Large scale urban reconstruction from
remote sensing imagery. In International Archives of the
Photogrammetry, Remote Sensing and Spatial Information
Sciences, vol. 5/W1.
Oh, J. 2011. Novel Approach to Epipolar Resampling of HRSI and
Satellite Stereo Imagery-Based Georeferencing of Aerial Images,
Ph.D. Dissertation, The Ohio State University.
Ozcanli, O. C., Y. Dong, J. L. Mundy, H. Webb, R. Hammoud and V. Tom.
2015. A comparison of stereo and multiview 3-D reconstruction
using cross-sensor satellite imagery. Pages 17–25 in Proceedings
of the IEEE Conference on Computer Vision and Pattern
Recognition Workshops (CVPRW).
Qin, R. 2017. Automated 3D recovery from very high resolution
multi-view satellite images. In ASPRS 2017 Annual Conference.
Rothermel, M., K. Wenzel, D. Fritsch and N. Haala. 2012. SURE:
Photogrammetric surface reconstruction from imagery. In
Proceedings LC3D Workshop, held in Berlin, Germany, vol. 8.
Wang, M., F. Hu and J. Li. 2010. Epipolar arrangement of
satellite imagery by projection trajectory simplification. The
Photogrammetric Record 25 (132): 422–436.
Wang, M., F. Hu and J. Li. 2011. Epipolar resampling of linear
pushbroom satellite imagery by a new epipolarity model. ISPRS
Journal of Photogrammetry and Remote Sensing, 66 (3): 347–355.
Wang, Y. 1999. Automated triangulation of linear scanner imagery.
Pages 27–30 in Joint Workshop of ISPRS WG I/1, I/3 and IV/4 on
Sensors and Mapping from Space.
Wohlfeil, J., H. Hirschmüller, B. Piltz, A. Börner and M. Suppa. 2012.
Fully automated generation of accurate digital surface models
with sub-meter resolution from satellite imagery. International
Archives of the Photogrammetry, Remote Sensing and Spatial
Information Sciences: 34–B3.
Zabih, R. and J. Woodfill. 1994. Non-parametric local transforms for
computing visual correspondence. Pages 151–158 in Computer
Vision ECCV'94, Lecture Notes in Computer Science, vol. 801,
edited by J.-O. Eklundh. Springer, Berlin/Heidelberg.
... Although active sensors such as LiDAR can directly measure the distance from a satellite to the ground, they require a significant amount of energy compared to passive cameras. Therefore, many pipelines have been developed to accurately estimate depth from disparity using multiple satellite views [7][8][9][10][11][12][13]. The resulting large-scale 3D models are ... Section 3.3 proposes a lightweight network architecture with only 20% of the number of neurons compared to previous works. ...
... In this paradigm, surfaces are represented by a function of the form f(x, y) = h, known as a digital elevation model (DEM), where (x, y) are spatial coordinates on the Earth (e.g., latitude, longitude), and h represents surface elevation. These methods typically use matching strategies derived from semiglobal matching algorithms [7][8][9][11][37] to estimate dense disparity maps, and handcrafted features and cost functions are at the core of these methods. ...
Article
Full-text available
Neural radiance fields (NeRFs), combining machine learning with differentiable rendering, have arisen as one of the most promising approaches for novel view synthesis and depth estimation. However, NeRFs apply only to close-range static imagery, and training a model takes several hours. Satellites, by contrast, are hundreds of kilometers from the Earth, and satellite multi-view images are usually captured over several years of a dynamic, in-the-wild scene. Multi-view satellite photogrammetry is therefore far beyond the capabilities of standard NeRFs. In this paper, we present a new method for multi-view satellite photogrammetry of Earth observation called remote sensing neural radiance fields (RS-NeRFs). It aims to generate novel view images and accurate elevation predictions quickly. For each scene, we train an RS-NeRF using high-resolution optical images without labels or geometric priors and apply image reconstruction losses for self-supervised learning. Multi-date images exhibit significant changes in appearance, mainly due to cars and varying shadows, which poses challenges to satellite photogrammetry. Robustness to these changes is achieved by the input of the solar ray direction and a vehicle removal method. To significantly reduce the training time of RS-NeRFs, we build a tiny network with HashEncoder and adopt a new sampling technique with our custom CUDA kernels. Compared with previous work, our method performs better on novel view synthesis and elevation estimates, and training takes only several minutes.
... DSM derived from satellite imagery is investigated in (Gong and Fritsch, 2019). The achieved accuracy with overlapping WorldView-3 satellite imagery is quoted as NMAD = 0.7 m (2.4 GSD) and Q68 = 1.0 m (3.4 GSD), averaged over three test sites. ...
... These images find extensive applications in fields such as disaster assessment, urban planning, and target detection [2][3]. From a technological standpoint, numerous computer vision studies focus on generating digital surface models (DSMs) or digital elevation models (DEMs) using multi-view satellite photogrammetry [7]. Some of these works perform binocular stereo matching and bundle adjustment on selected pixels and fuse the resulting point clouds or depth maps using camera matrices [4][6][9]. ...
Preprint
Full-text available
We propose a novel generic method to address the challenge of handling unconstrained multi-view optical satellite photogrammetry under time-varying conditions of illumination and reflection. First, we represent the surface radiance and albedo produced by extensive lights with continuous radiance fields based on radiometric principles, and combine the static and transient components for satellite photogrammetry. Second, a novel self-supervised mechanism is introduced to optimize the learning process, leveraging dark-region accentuation, transient and static composition, and occlusion and shadow suppression. We evaluate the proposed framework on real-world multi-date WorldView-3 images and demonstrate that our model consistently outperforms existing state-of-the-art methods.
... The SGM algorithm was first released in 2011 and performed well on ISPRS test data [20]. After that, Gong, K. et al. used the hierarchical method of the SGM to conduct experiments, further expanding the SGM algorithm [21]. Li, Y. S. et al. adopted an efficient hierarchical matching strategy, which significantly reduced the matching cost of the SGM algorithm [22]. ...
Article
Automatic reconstruction of DSMs from satellite images is a hot issue in the field of photogrammetry. Nowadays, most state-of-the-art pipelines produce 2.5D products. To address some shortcomings of traditional algorithms and expand the means of updating digital surface models, a DSM generation method based on variational mesh refinement of satellite stereo image pairs is proposed to recover 3D surfaces from coarse input. Specifically, an initial coarse mesh is constructed first, and the geometric features of the 3D mesh model are then optimized using the information of the original images, while mesh subdivision is constrained by combining the images' texture and projection information, finally achieving subdivision optimization of the mesh model. The results of this method are compared qualitatively and quantitatively with those of the commercial software PCI and the SGM method. The experimental results show that the generated 3D digital surface has clearer edge contours and more refined planar textures, and that the model accuracy is sufficient to match the actual conditions of the ground surface well, proving the effectiveness of the method. The method is advantageous for research on true 3D products in complex urban areas and can generate complete DSM products from rough input meshes, indicating promising development prospects.
Preprint
Full-text available
Epipolar resampling is an essential step for 3D reconstruction and Digital Surface Model (DSM) generation from satellite images. However, as satellite image resolution improves and image sizes grow, the time required for epipolar resampling increases dramatically. Polynomial fitting methods based on the Piecewise Projection Trajectory Method (PPTM) are commonly employed for epipolar resampling, but their effectiveness and efficiency diminish when confronted with large, high-resolution images. To tackle this challenge, we propose a novel parallel block-wise epipolar resampling method designed to expedite the resampling process without compromising accuracy. This method leverages PPTM and a fixed elevation plane to establish the relationship between left and right epipolar points. Local affine transformations and image partitioning replace polynomial transformations applied across the entire image to approximate the correspondence between original and epipolar images. Furthermore, parallel computation is employed to accelerate block-wise pixel resampling. Experimental analysis using IKONOS-2, ZY-3, and GF-7 images confirms the efficacy and accuracy of our method. We achieve sub-pixel y-disparities comparable to polynomial fitting methods, while reducing resampling time by 10 to 20 percent with single-core serial execution. Moreover, the multi-core parallel approach achieves a parallel efficiency exceeding 80%.
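To make the block-wise local affine idea concrete, here is a minimal Python/NumPy sketch that fits one affine transform per block from sampled epipolar correspondences; the function names, the (2, 3) matrix convention, and the plain least-squares fit are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_local_affine(src_pts, dst_pts):
    """Least-squares affine transform mapping src_pts to dst_pts.

    src_pts, dst_pts: (N, 2) corresponding image coordinates sampled
    inside one block (e.g., from the PPTM epipolar relationship).
    Returns a (2, 3) matrix A with dst ~= A @ [x, y, 1].
    """
    n = src_pts.shape[0]
    X = np.hstack([src_pts, np.ones((n, 1))])        # (N, 3) design matrix
    A, *_ = np.linalg.lstsq(X, dst_pts, rcond=None)  # (3, 2) solution
    return A.T                                       # (2, 3)

def apply_affine(A, pts):
    """Apply a (2, 3) affine matrix to (N, 2) points."""
    return pts @ A[:, :2].T + A[:, 2]
```

Each block is then resampled independently with its own affine, which is what makes the per-block parallelization straightforward.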
Article
Multi-view digital surface model (DSM) fusion has emerged as an important technique for three-dimensional (3D) reconstruction from multi-view satellite images. However, existing multi-view DSM fusion approaches are prone to blurred elevation divisions at object edges, salt-and-pepper noise on smooth surfaces, and severe loss of surface detail in weakly textured regions. In this paper, we present a cascade domain clustering (CDC) algorithm for fusing multi-view DSMs, realized by the combination of salient domain clustering and model domain clustering. Initially, salient domain clustering is employed to demarcate prominent objects and identify regional edges using two-dimensional (2D) spectral and 3D elevation information. Subsequently, to further segment intricate surface structures, particularly for objects with low-texture attributes, we implement model domain clustering to iteratively aggregate 3D points and fit geometric models to the aggregated clusters. Finally, the multi-view DSMs are fused iteratively through the weighted least squares (WLS) method, with model clusters serving as the fundamental units, under the constraints of the geometric models. Extensive experiments show that the proposed CDC algorithm surpasses other popular multi-view DSM fusion algorithms in terms of completeness and accuracy, achieving 91.51% completeness and 0.93 m RMSE, representing a 79.89% improvement in completeness and an 86.40% reduction in RMSE compared to a popular stereo 3D reconstruction pipeline.
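As a simplified, per-cell illustration of the WLS fusion step (the paper fuses per model cluster; the NaN-as-nodata convention and the array layout are assumptions of this sketch): when a single constant elevation is estimated per cell, weighted least squares reduces to a weighted mean.

```python
import numpy as np

def wls_fuse(dsms, weights):
    """Weighted least-squares fusion of co-registered DSM grids.

    dsms:    (K, H, W) stack of elevation grids, NaN = no data.
    weights: (K, H, W) per-view confidence weights.

    Estimating one constant elevation per cell by WLS gives the
    weighted mean  h* = sum(w_i * h_i) / sum(w_i).
    """
    valid = ~np.isnan(dsms)
    w = np.where(valid, weights, 0.0)
    h = np.where(valid, dsms, 0.0)
    wsum = w.sum(axis=0)
    return np.divide((w * h).sum(axis=0), wsum,
                     out=np.full(wsum.shape, np.nan), where=wsum > 0)
```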
Article
Existing NeRF models for satellite imagery have limitations in processing large images and require solar input, leading to slow speeds. In response, we introduce SatensoRF, which speeds up the entire process significantly while using fewer parameters for large satellite imagery. We have noticed that the common assumption of Lambertian surfaces in satellite neural radiance fields is not sufficient for vegetative and aquatic elements. In contrast to the traditional hierarchical MLP-based scene representation, we choose a multiscale tensor decomposition approach for color, volume density, and auxiliary variables to model the light field with specular color. Additionally, to rectify inconsistencies in multi-date imagery, we incorporate total variation denoising to restore the density tensor field, thus mitigating the negative impact of transient objects. To validate our approach, we assess SatensoRF on subsets of the SpaceNet multi-view dataset, which includes both multi-date and single-date multi-view RGB images. Our results demonstrate that SatensoRF surpasses the state-of-the-art Sat-NeRF series in novel view synthesis performance. Significantly, SatensoRF requires fewer parameters for training, resulting in faster training and inference speeds and reduced computational demands.
Article
We introduce a novel method tailored for unconstrained multi-view optical satellite photogrammetry in time-varying illumination and reflection conditions. Our approach employs continuous radiance fields to represent surface radiance and albedo based on radiometry principles, integrating both static and transient components for satellite photogrammetry. Additionally, an innovative self-supervised mechanism is introduced to optimize the learning process which leverages dark regions accentuation, transient and static composition, as well as shadow regularization. Evaluations on multi-date WorldView-3 images affirm that our model consistently surpasses the state-of-the-art techniques.
Conference Paper
Full-text available
Both improvements in camera technology and the rise of new matching approaches triggered the development of suitable software tools for image-based 3D reconstruction by research groups and vendors of photogrammetric software. Based on dense pixel-wise matching, the photogrammetric generation of dense 3D point clouds and Digital Surface Models from highly overlapping aerial images has become feasible. In order to evaluate the quality of these matching algorithms in terms of accuracy and reliability, the European Spatial Data Research Organisation (EuroSDR) started a benchmark on image-based DSM generation in February 2013. This test is based on two representative image blocks, which were processed by different groups with different software systems. The results provided by the different groups give a profound insight into the landscape of dense matching algorithms and are used within the paper to evaluate the potential of image-based photogrammetric data collection.
Article
Full-text available
High resolution optical satellite sensors have entered a new era in the last few years, as satellite stereo images at half-meter or even 30 cm resolution have become available. Nowadays, high resolution satellite image data are commonly used for Digital Surface Model (DSM) generation and 3D reconstruction. The Rational Polynomial Coefficients (RPCs) provided by the vendors commonly have only rough precision, and often no ground control information is available to refine them. Therefore, we present two relative orientation methods that use corresponding image points only: the first method uses quasi ground control information, generated from the corresponding points and rough RPCs, for the bias-compensation model; the second method estimates the relative pointing errors on the matching image and removes them with an affine model. Both methods need no ground control information and are applied to the entire image. To obtain very dense point clouds, the Semi-Global Matching (SGM) method is an efficient tool. However, epipolar constraints are required before the matching process can be accomplished. In most cases satellite images have very large dimensions, whereas epipolar geometry generation and image resampling are usually carried out in small tiles. This paper therefore also presents a modified piecewise epipolar resampling method for the entire image without tiling. The quality of the proposed relative orientation and epipolar resampling methods is evaluated, and sub-pixel accuracy has been achieved in our work.
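A minimal sketch of the affine error-removal idea, assuming tie-point correspondences are already measured (function names and array shapes are illustrative, not the paper's code): fit a six-parameter affine model between RPC-projected and measured image coordinates by least squares, then use it to compensate the relative pointing error.

```python
import numpy as np

def fit_bias_affine(proj_xy, obs_xy):
    """Affine bias-compensation model in image space.

    proj_xy: (N, 2) tie-point coordinates predicted by the vendor RPCs.
    obs_xy:  (N, 2) measured coordinates of the same tie points.

    Fits x_obs = a0 + a1*x + a2*y and y_obs = b0 + b1*x + b2*y.
    """
    n = proj_xy.shape[0]
    X = np.hstack([np.ones((n, 1)), proj_xy])            # (N, 3)
    coeffs, *_ = np.linalg.lstsq(X, obs_xy, rcond=None)  # (3, 2)
    return coeffs

def compensate(coeffs, proj_xy):
    """Apply the fitted affine model to RPC projections."""
    n = proj_xy.shape[0]
    return np.hstack([np.ones((n, 1)), proj_xy]) @ coeffs
```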
Conference Paper
Full-text available
This paper presents an automated pipeline for processing multi-view satellite images into 3D digital surface models (DSMs). The proposed pipeline performs automated geo-referencing and generates high-quality, densely matched point clouds. In particular, a novel approach is developed that fuses multiple depth maps derived by stereo matching to generate high-quality 3D maps. By learning critical configurations of stereo pairs from sample LiDAR data, we rank the image pairs based on the proximity of their results to the sample data. Multiple depth maps derived from individual image pairs are fused with an adaptive 3D median filter that considers image spectral similarities. We demonstrate that the proposed adaptive median filter generally delivers better results than a normal median filter, achieving an accuracy improvement of 0.36 meters RMSE in the best case. Results and analysis are introduced in detail.
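For orientation, a plain per-pixel median fusion of co-registered depth maps might look like the sketch below; the adaptive variant described above would additionally weight or discard views by image spectral similarity before taking the median, which is omitted here.

```python
import numpy as np

def median_fuse(depth_maps):
    """Per-pixel median fusion of co-registered depth maps.

    depth_maps: (K, H, W) stack, NaN where a view has no estimate.
    np.nanmedian ignores missing views at each pixel; pixels with no
    valid view at all remain NaN (and trigger a RuntimeWarning).
    """
    return np.nanmedian(depth_maps, axis=0)
```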
Article
Full-text available
Photogrammetry is currently undergoing a renaissance, driven by the development of dense stereo matching algorithms that provide very dense Digital Surface Models (DSMs). Moreover, satellite sensors have improved to provide sub-meter or even better Ground Sampling Distances (GSD) in recent years. Therefore, the generation of DSMs from spaceborne stereo imagery has become an active research area. This paper presents a comprehensive study of DSM generation from high resolution satellite data and proposes several methods to implement the approach. The bias-compensated Rational Polynomial Coefficients (RPCs) Bundle Block Adjustment is applied for image orientation, and the rectification of stereo scenes is realized with the Project-Trajectory-Based Epipolarity (PTE) Model. Very dense DSMs are generated from WorldView-2 satellite stereo imagery using the dense image matching module of the C/C++ library LibTsgm. We carry out various tests to evaluate the quality of the generated DSMs regarding robustness and precision. The results verify that the presented pipeline for DSM generation from high resolution satellite imagery is applicable, reliable and very promising.
Article
Full-text available
Computational stereo lies at the intersection of computer vision and photogrammetry. In the computational stereo and surface reconstruction paradigms, it is very important to achieve appropriate epipolar constraints during the camera-modeling step of stereo image processing. It has been shown that the epipolar geometry of linear pushbroom imagery has a hyperbola-like shape because of the non-coplanarity of the line-of-sight vectors. Several studies have been conducted to generate resampled epipolar image pairs from linear pushbroom satellite images; however, the currently prevailing methods are limited by their pixel scales, skewed axis angles, or disproportionality between x-parallax disparities and height. In this paper, a practical and unified piecewise epipolar resampling method is proposed to generate stereo image pairs with zero y-parallax, a square pixel scale, and proportionality between x-parallax disparity and height. Furthermore, four criteria are suggested for performance evaluation of the prevailing methods, and experimental results of the method are presented based on the suggested criteria. The proposed method is shown to be equal to or an improvement upon the prevailing methods. Keywords: pushbroom high-resolution satellite imagery; piecewise epipolar resampling; stereo image pair
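One of the stated goals, zero y-parallax, can be checked directly on matched points measured in the resampled pair; a small helper like the following (names illustrative) suffices.

```python
import numpy as np

def y_parallax_stats(left_pts, right_pts):
    """Residual y-parallax of matched points in an epipolar pair.

    left_pts, right_pts: (N, 2) matched (x, y) coordinates in the
    resampled images. With a correct resampling, conjugate points lie
    on the same row, so dy should be near zero (sub-pixel).
    """
    dy = left_pts[:, 1] - right_pts[:, 1]
    return {"mean": dy.mean(), "std": dy.std(),
            "max_abs": np.abs(dy).max()}
```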
Article
We propose a new approach to the correspondence problem that makes use of non-parametric local transforms as the basis for correlation. Non-parametric local transforms rely on the relative ordering of local intensity values, and not on the intensity values themselves. Correlation using such transforms can tolerate a significant number of outliers. This can result in improved performance near object boundaries when compared with conventional methods such as normalized correlation. We introduce two non-parametric local transforms: the rank transform, which measures local intensity, and the census transform, which summarizes local image structure. We describe some properties of these transforms, and demonstrate their utility on both synthetic and real data.
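A compact NumPy rendering of the census transform as described: each pixel is encoded by the relative ordering of its neighborhood, and codes are then compared by Hamming distance, which depends only on intensity ordering and therefore tolerates outliers near object boundaries. The wrap-around border handling and the 5x5 default window are simplifications of this sketch.

```python
import numpy as np

def census_transform(img, radius=2):
    """Census transform of a 2-D grayscale array.

    Each pixel gets one bit per neighbor in a (2r+1)x(2r+1) window:
    1 if the neighbor is darker than the center, else 0. For radius=2
    that is 24 bits, which fits in uint64. Borders wrap via np.roll,
    a simplification; production code would pad or crop instead.
    """
    out = np.zeros(img.shape, dtype=np.uint64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            neighbor = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            out = (out << np.uint64(1)) | (neighbor < img).astype(np.uint64)
    return out

def hamming_cost(c1, c2):
    """Matching cost between census-code arrays: differing bit count."""
    x = np.bitwise_xor(c1, c2)
    count = np.zeros(x.shape, dtype=np.uint8)
    for _ in range(64):                      # popcount, bit by bit
        count += (x & np.uint64(1)).astype(np.uint8)
        x = x >> np.uint64(1)
    return count
```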
Article
Although epipolar geometry is a very useful clue in processing stereo images, it has not been thoroughly examined previously for linear pushbroom images. Some have assumed that epipolar geometry would be the same for pushbroom images as for perspective images. Some do not use this geometry at all because it is not fully understood. The purpose of this paper is to provide a theoretical basis for the epipolar geometry of linear pushbroom images and to discuss the practical implications of this geometry in processing such images. We show that epipolarity for linear pushbroom images is different from that for perspective images. We also derive an equation for epipolar curves of linear pushbroom images, which are not lines but hyperbola-like non-linear curves. Through analyses of the properties of these curves, we conclude that these curves can be approximated as piece-wise linear segments and that any closely located points on one epipolar curve are mapped onto a common epipolar curve.
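The piecewise-linear approximation suggests a simple numerical way to trace such a curve: intersect a left-image ray with a series of assumed elevation planes and map each ground point into the right image. The sketch below assumes hypothetical rpc_inverse_left and rpc_project_right helpers standing in for any RPC implementation; they are not from the paper.

```python
import numpy as np

def trace_epipolar_curve(pt_left, rpc_inverse_left, rpc_project_right,
                         h_min, h_max, n=20):
    """Sample the epipolar curve of pt_left in the right image.

    rpc_inverse_left(pt, h) -> (lat, lon): intersect the left-image ray
    with the elevation plane h (hypothetical helper).
    rpc_project_right(lat, lon, h) -> (x, y): project the ground point
    into the right image (hypothetical helper).

    Consecutive samples form the piecewise-linear approximation of the
    hyperbola-like epipolar curve discussed above.
    """
    samples = []
    for h in np.linspace(h_min, h_max, n):
        lat, lon = rpc_inverse_left(pt_left, h)
        samples.append(rpc_project_right(lat, lon, h))
    return np.asarray(samples)  # (n, 2)
```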