DSM Generation from High Resolution Multi-View
Stereo Satellite Imagery
K. Gong and D. Fritsch
Institute for Photogrammetry, University of Stuttgart, 70174 Stuttgart, Germany (ke.gong@ifp.uni-stuttgart.de, dieter.fritsch@ifp.uni-stuttgart.de)
Abstract
Along with improvements to spatial resolution, multiple-
view stereo satellite imagery has become a valuable data-
source for digital surface model generation. In 2016, a public
multi-view stereo benchmark of commercial satellite imag-
ery was released by the Johns Hopkins University Applied
Physics Laboratory, USA. Motivated by this well-organized
benchmark, we propose a pipeline to process multi-view
satellite imagery into digital surface models. Input images
are selected based on view angles and capture dates. We
apply the relative bias-compensated model for orienta-
tion, and then generate the epipolar image pairs. The im-
ages are matched by the modified tube-based SemiGlobal
Matching method (tSGM). Within the triangulation step,
very dense point clouds are produced, and are fused by a
median filter to generate the Digital Surface Model (DSM).
A comparison with the reference data shows that the fused
DSM generated by our pipeline is accurate and robust.
Introduction
Background
Over the last decade, a number of High Resolution Satellite
(HRS) sensors have been launched by commercial companies
or space agencies, like Sentinel-2, WorldView-3/4, Pléiades,
and so on. The best Ground Sample Distance (GSD) of HRS
panchromatic imagery has reached the 30 cm level, which
reveals more surface features. The HRS sensors cover most of
the regions of the Earth and collect the surface information
with large range footprints. They have high revisit frequency
over a certain area, which can provide a large number of im-
age collections and make the acquisition of multi-view stereo
(MVS) satellite imagery available. As is well known, the Rational
Polynomial Coefficients (RPCs) are provided by the satel-
lite data vendor, instead of the rigorous push-broom sensor
model. Thus, data consumers can ignore the differences between
satellite sensors and easily process the satellite data by ap-
plying a general pipeline. Because of these benefits, MVS
high resolution satellite images are useful for global three-
dimensional (3D) mapping, environmental monitoring, urban
planning, change detection, and so on.
In 2016, a public MVS benchmark of commercial satellite
imagery was released by the Johns Hopkins University Applied
Physics Laboratory (JHU/APL), USA. The benchmark contains
50 DigitalGlobe WorldView-3 panchromatic and multispectral
images. The imagery covers an area of 100 square kilometers close
to San Fernando, Argentina, with GSD of the nadir images of
about 30 cm. The high resolution image data were captured
from November 2014 to January 2016.
The benchmark also provides a Light Detection and Ranging
(LiDAR) point cloud collected in June 2016 as the ground truth,
with nominal point spacing of 20 cm. Digital surface models
(DSMs) at 30 cm GSD are produced from the LiDAR point cloud,
in order to make equally-spaced comparisons with the results
generated from Worldview-3 panchromatic imagery (Bosch
et al. 2016). This well-organized MVS high resolution satellite
benchmark has motivated us to learn and test new methods of
point cloud and DSM generation from MVS satellite data.
It is well known that MVS imagery 3D reconstruction meth-
ods can be classified into two categories. The first category
solves the multi-view triangulation problem for all images
simultaneously, which is the true multi-view method (Furukawa
and Hernández 2015). The second category only uses the
binocular stereo pairs. It processes the stereo pairs separately
and fuses the output point clouds or DSMs to a final result
(Haala 2013). Comparing the binocular stereo strategy with
the true multi-view method, the latter is more rigorous but
also more complicated. Because of the efficiency and stable
performance of the semiglobal matching (SGM) algorithm
(Hirschmüller 2008), most solutions for the 3D reconstruction
from MVS satellite imagery are implemented using binocu-
lar stereo methods (d’Angelo and Kuschk 2012; Kuschk 2013;
Qin 2017; Facciolo et al. 2017). Some researchers have inves-
tigated and compared both kinds of reconstruction strategies
on MVS satellite images (Ozcanli et al. 2015). In their imple-
mentation, the pair-wise multi-view reconstruction method
demonstrated better results than the true multi-view method.
In this paper, we present a pipeline based on the binocular
stereo method for DSM generation using MVS high resolu-
tion satellite imagery. The point clouds and DSMs, which are
separately generated from different stereo pairs, will be fused
to the final DSM. The fused final DSM is compared to the refer-
ence DSM for further evaluations. We conduct a qualitative
analysis by visual comparison and calculate the complete-
ness, the median error, the root-mean-square error (RMSE) and
the error distribution for the quantitative analysis. We show
that our proposed pipeline can produce accurate and robust
DSMs from MVS satellite imagery.
The contents of this paper are structured as follows: Sec-
tion “Related Work” introduces the related work, whereas the
methodology of the proposed pipeline is presented in the sec-
tion “Methodology”. Section “Experiments” demonstrates the
results generated from the benchmark data and their evalua-
tion, and in the last section we draw some conclusions.
Related Work
High resolution satellite sensors are able to provide plenty
of imagery for a certain area, but the images are usually collected
on different dates. Thus, the collected images may have different
illumination situations, different geometric configurations, and
may contain terrain changes. All of those differences will have
negative influences on the outcome of the DSM generation. A
large number of stereo images also means that image process-
ing is quite time-consuming. Therefore, finding a strategy to
select the most useful image pairs is an essential preprocessing
step for DSM generation from MVS satellite data. d'Angelo
et al. (2014) suggested that the intersection angle of the image
pairs' views is the biggest factor that impacts performance.
They selected the image pairs having intersection angles
between 15 and 25 degrees. After the release of the JHU/APL’s
benchmark, the subsequent Intelligence Advanced Research
Projects Activity multi-view stereo 3D mapping challenge
encouraged more researchers to find suitable image selection
strategies, especially, when they have to face hundreds or
thousands of possible pairs. Facciolo et al. (2017) sorted all
possible image pairs by the completeness of their computed
DSMs, and they built a Pearson’s correlation matrix for different
factors. According to their observations, the temporal prox-
imity, maximum incidence angle, and the intersection angle
between views are the three main factors affecting the final quality of
the DSM. They selected image pairs having intersection angles
between 5 and 45 degrees. All the images have an incidence
angle less than 40 degrees. The image pairs with smaller date
differences are expected to yield higher accuracy. Qin (2017)
also agreed that the intersection angle is critical to the qual-
ity of the generated DSMs. He found that when the intersection
angle of the image pair is smaller than 8 degrees or larger than
40 degrees, the generated DSM performs poorly. He therefore chose
the image pairs with intersection angles from 10 to 30 degrees.
In the standard MVS processing work flow, the Structure-
from-Motion or camera model orientation is the critical first
step. As is well-known, the HRS data vendors prefer to provide
the RPC along with the imagery to the users, instead of the
traditional exterior and interior orientation parameters. The RPCs
have no physical meaning; instead, a ratio of two polynomials
relates the image coordinates to the object coordinates. It has
been verified by many practical experiments
that the RPC model can replace the rigorous sensor model
while maintaining the accuracy (Grodecki and Dial 2001;
Hanley and Fraser 2001; Fraser et al. 2002; Grodecki and Dial
2003). A popular solution for the orientation of the satellite
imagery is the bias-compensated RPCs bundle block adjust-
ment. Grodecki and Dial (2003) have given a detailed descrip-
tion of this method. It minimizes the bias in image space
with some additional compensation models, for instance, the
shift model or the affine model. The bias-compensated RPC’s
bundle block adjustment requires some ground control points
(GCPs). It is widely applied for the absolute orientation of the
satellite stereo images (d'Angelo and Kuschk 2012; Ozcanli
et al. 2015; Gong and Fritsch 2016). For MVS satellite imagery,
the GCPs in a certain region are not always easy to access. In
this situation, the relative orientation of the stereo image pairs
is needed. De Franchis et al. (2014) have pointed out that inaccurate
RPC models cause relative pointing errors. This error
means that the corresponding points are not located on the
related epipolar lines. It can be measured as a simple transla-
tion when the image is small. In their approach, they divided
an image into tiles and calculated the translations between
the corresponding points and the epipolar line separately. The
median of the translations of different tiles is applied to the
whole image to remove the relative pointing error. Qin (2017)
applied pair-wise bias-compensation by using tie points
first. Then he conducted least squares minimization for the
registration of the generated DSM and the reference DSM. The
parameters of the DSM registration are reused to calculate
a translation in the image domain for RPC refinement. In our
previous research (Gong and Fritsch 2017), we proposed the
relative bias-compensated model without GCPs. We extract
some tie points first and calculate the virtual ground control
information with them. The RPCs are refined pairwise by an
additional affine model and by applying the virtual ground
control information. Thus, the relative bias-compensated
model is also a basic strategy applied in this paper.
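To make the ratio-of-polynomials relation concrete, the sketch below evaluates a forward RPC model. It is a minimal illustration only: the 20-term cubic basis follows one common vendor ordering, which is an assumption and varies between providers.

```python
import numpy as np

def rpc_terms(P, L, H):
    """The 20 cubic polynomial terms in the normalized ground
    coordinates (one common vendor ordering; orderings differ
    between providers)."""
    return np.array([
        1.0, L, P, H, L * P, L * H, P * H, L * L, P * P, H * H,
        P * L * H, L ** 3, L * P * P, L * H * H, L * L * P,
        P ** 3, P * H * H, L * L * H, P * P * H, H ** 3])

def rpc_ground_to_image(lat, lon, h, num_r, den_r, num_c, den_c,
                        off, scale):
    """Map object coordinates to image (row, col) with an RPC model.
    num_*/den_* are the four 20-coefficient vectors; off and scale
    are dicts of the normalization offsets and scale factors
    ('LAT', 'LON', 'HEIGHT', 'ROW', 'COL') shipped with the RPCs."""
    P = (lat - off['LAT']) / scale['LAT']       # normalized latitude
    L = (lon - off['LON']) / scale['LON']       # normalized longitude
    H = (h - off['HEIGHT']) / scale['HEIGHT']   # normalized height
    t = rpc_terms(P, L, H)
    row_n = (num_r @ t) / (den_r @ t)           # ratio of two cubics
    col_n = (num_c @ t) / (den_c @ t)
    return (row_n * scale['ROW'] + off['ROW'],
            col_n * scale['COL'] + off['COL'])
```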
To generate the dense point cloud and the DSM, the SGM al-
gorithm is the most popular solution for pixel-wise matching
of the HRS imagery. Many experiments have proved that SGM
can generate dense point clouds with reliable quality from sat-
ellite data (d’Angelo and Reinartz 2011; Wohlfeil et al. 2012;
d'Angelo and Kuschk 2012; Gong and Fritsch 2016). The SGM
algorithm requires approximately epipolar images as input,
so that the correspondence search is reduced to one dimension. Unlike the tra-
ditional frame camera imagery, for the HRS imagery it is hard
to generate the epipolar geometry, because of the changing
perspective center and attitudes. Kim (2000) has explained
in his work that the epipolar lines of the satellite push-broom
sensors are hyperbola-like curves rather than straight lines, and
the epipolar pairs can only exist locally. Based on this conclu-
sion, Wang et al. (2010) proposed the projection-trajectory
epipolarity model. The epipolar pair is generated by project-
ing points from one image to another with the RPCs. To resam-
ple the epipolar image pair, Wang et al. (2011) define a Projection
Reference Plane (PRP) in a local vertical coordinate system.
The stereo images will be projected onto the PRP. An affine
model is applied to transfer the original images to the epipo-
lar images on the PRP. Koh and Yang (2016) also applied the PRP
for epipolar image resampling, but they proposed a piecewise
method. They divided the epipolar line into several curves on
the PRP. Then a fifth order polynomial function is applied to
fit and resample all the epipolar curves. Oh (2011) proposed
an epipolar resampling strategy in the image space instead of
object space. In his work, a line orthogonal to the ground
track is calculated to generate a set of start points. These start
points are spaced at a proper interval (1000 pixels), and segments
are expanded from them to approximate the epipolar line. When
the segment expansion finishes, corresponding epipolar line pairs
are aligned to a constant row to remove the y-parallax.
The generated dense point clouds are placed into a regularly
spaced, discretized grid in the Universal Transverse
Mercator (UTM) coordinate system. In the binocular
stereo reconstruction approach, the dense point clouds or DSMs
need to be fused for the final result. Kuschk (2013) selected
the simple and common median filter to get the final height
of every cell. Qin (2017) proposed an adaptive depth fusion
method, which considers the spatial consistency. He defined a
window centered at a cell, and applied all the cells within the
window as candidates for the height value filtering. Facciolo
et al. (2017) proposed a clustering-based method. The height
of each cell is estimated by k-medians clustering. The
number of clusters is increased (from 1 to 8) until the clusters are
close enough to the predefined precision. The lowest cluster
is kept as the final altitude.
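As an illustration of this clustering-based fusion idea, the following sketch freely reimplements a per-cell height fusion with a simple Lloyd-style 1-D k-medians. It is our own approximation of the published method, not the authors' code; details such as the quantile initialization are assumptions.

```python
import numpy as np

def kmedians_1d(values, k, iters=20):
    """Plain Lloyd-style 1-D k-medians; centers start at quantiles."""
    centers = np.quantile(values, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]),
                           axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = np.median(values[labels == j])
    return centers, labels

def fuse_cell_heights(values, precision=1.0, k_max=8):
    """Grow k until every cluster's spread is within `precision`,
    then return the median height of the lowest non-empty cluster.
    values is a 1-D array of candidate heights for one grid cell."""
    for k in range(1, k_max + 1):
        centers, labels = kmedians_1d(values, k)
        spreads = [np.abs(values[labels == j] - centers[j]).max(initial=0.0)
                   for j in range(k)]
        if max(spreads) <= precision:
            break
    members = [j for j in range(len(centers)) if np.any(labels == j)]
    lowest = min(members, key=lambda j: centers[j])
    return np.median(values[labels == lowest])

# e.g. fuse_cell_heights(np.array([10.1, 10.3, 10.2, 24.9, 25.1])) -> 10.2
```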
Methodology
The proposed pipeline of this paper is based on the binocular
stereo method for DSM generation. It is semiautomatic and
needs some tie points for all the images; all steps are implemented
by our self-programmed C++ modules. This section presents
every step of the pipeline. Generally, it is divided into these
steps: image selection, relative orientation and image rectifi-
cation, dense image matching, triangulation and DSM fusion.
The workflow is presented in Figure 1.
Image Selection
As mentioned in the previous section, the MVS satellite
images are collected on different dates. The differences in
illumination, geometric configuration, and season
can degrade matching performance. Therefore, a suitable im-
age selection procedure is needed to select the most useful
pairs and reduce the comput-
ing time. It is commonly agreed
that the intersection angle of the
stereo images is the biggest factor
that affects the quality of the DSM
generation (d’Angelo et al. 2014;
Facciolo et al. 2017; Qin 2017).
The differences in collection date
and the incidence angle of the view
can also have an influence on the
result. Therefore, we define our im-
age selection strategy according to
these three factors.
First, images with large view
incidence angles are eliminated. Because
the spatial resolution of the satellite
image becomes lower when the
incidence angle is larger, the perfor-
mance of the dense image match-
ing will get worse. In our image
selection strategy, we only use the
satellite images whose incidence
angle is less than 35 degrees.
Next, the intersection angles of the views of every stereo
pair are computed. As suggested by previous studies
(d’Angelo et al. 2014; Facciolo et al. 2017), image pairs are
less useful if their intersection angle is either too large or too
small. We select the image pairs having intersection angles
between 5 degrees and 35 degrees.
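For illustration, the intersection (convergence) angle of a stereo pair can be derived from each image's mean satellite azimuth and elevation angles as reported in the metadata. The sketch below assumes exactly those two values per image; the example numbers are made up.

```python
import numpy as np

def view_unit_vector(azimuth_deg, elevation_deg):
    """Unit vector from the ground point toward the satellite in a
    local east-north-up frame, from the mean satellite azimuth and
    elevation angles found in the image metadata."""
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    return np.array([np.sin(az) * np.cos(el),
                     np.cos(az) * np.cos(el),
                     np.sin(el)])

def intersection_angle(az1, el1, az2, el2):
    """Convergence angle (degrees) between the two viewing rays."""
    v1, v2 = view_unit_vector(az1, el1), view_unit_vector(az2, el2)
    return np.degrees(np.arccos(np.clip(v1 @ v2, -1.0, 1.0)))

# A near-nadir view paired with a more oblique view of the same azimuth:
print(intersection_angle(180.0, 88.0, 180.0, 60.0))  # about 28.0
```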
Lastly, the influence of the collecting dates is taken into con-
sideration. As mentioned above, in most cases, the closer the
image collecting dates are, the better the results we can obtain.
According to our observations, there are two exceptions:
1. The image collecting dates of the stereo images are
relatively close, but the images present different seasons’
features. Figure 2a and 2b display two images collected
on 3 October 2015 and 22 October 2015. The point cloud
generated by dense image matching is shown in Figure 2c.
2. The interval between the captured dates is large, but the
images are collected in the same season. An example is
shown in Figure 3. Figure 3a is a satellite image collected
on 14 November 2014, and Figure 3b was collected on
18 December 2015. The point cloud generated from this
stereo pair is displayed in Figure 3c.
Figure 1. Workflow of the DSM generation using MVS satellite imagery.
Figure 2. Images collected on close dates and the related point cloud.
Figure 3. Images collected in different years and the related point cloud.
According to Figure 2c, we can observe that the area
marked by the ellipse has a lot of matching failures. As
Figure 2a and Figure 2b have shown, the trees in this area
have already grown new leaves on 22 October 2015 but have
no leaves on 3 October 2015. There is an apparent seasonal
change from the first image to the second, although the
collecting dates are close. Compared with the other stereo
pair’s results, the same region in Figure 3c has denser points.
Although the interval of the collecting dates is more than one
year, the terrain features presented in Figure 3a and Figure 3b
are similar. Stereo images collected in different years can
be well matched if there are only slight seasonal changes. There-
fore, not only the collecting dates but also the season shown
in the stereo pair plays a role in the quality of the gener-
ated point cloud and DSM.
Since the seasonal changes are mainly presented by the
vegetation, we have selected a subarea with vegetation, and
generated the point clouds from an image pair collected
in summer and winter. Figure 4 exhibits the point clouds
generated from two seasons’ imagery and the corresponding
area in the reference LiDAR DSM. The vegetation area in the
reference DSM is flourishing and closer to the reconstruction
from summer imagery. We have also learned that the winter
images are noisier and have worse illumination. In our image
selection strategy, we sort the images into two groups: winter
and summer, instead of in chronological order. In each group,
we ignore the year of data collection and order the images by
month. Image pairs in the summer group that have close
collecting months are selected as the inputs to generate the
DSMs in our pipeline.
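The complete selection logic can be summarized in a short sketch. The image records and the maximum month gap of two are illustrative assumptions; the paper only requires "close" collecting months within the summer group.

```python
from itertools import combinations

# Hypothetical image records (id, incidence angle deg, month, season);
# the field values below are purely illustrative.
IMAGES = [
    ("img_a", 12.0, 12, "summer"),   # December: summer in Argentina
    ("img_b", 28.0,  1, "summer"),
    ("img_c", 41.0, 12, "summer"),   # dropped: incidence angle too large
    ("img_d", 18.0,  7, "winter"),   # dropped: winter group
]

def month_distance(m1, m2):
    """Cyclic month distance, ignoring the year (as in the paper)."""
    d = abs(m1 - m2) % 12
    return min(d, 12 - d)

def select_pairs(images, pair_angle, max_incidence=35.0,
                 angle_range=(5.0, 35.0), max_month_gap=2):
    """Apply the three selection criteria of this section.
    pair_angle(id1, id2) must return the intersection angle in
    degrees, e.g. computed from the metadata as in the earlier
    sketch.  max_month_gap is our own assumption."""
    usable = [im for im in images
              if im[1] < max_incidence and im[3] == "summer"]
    pairs = []
    for a, b in combinations(usable, 2):
        angle = pair_angle(a[0], b[0])
        if (angle_range[0] <= angle <= angle_range[1]
                and month_distance(a[2], b[2]) <= max_month_gap):
            pairs.append((a[0], b[0]))
    return pairs

# With a stand-in angle function, only (img_a, img_b) survives.
print(select_pairs(IMAGES, lambda i, j: 20.0))
```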
Relative Orientation and Image Rectication
Because of the lack of GCPs, we cannot apply the bias-com-
pensated RPCs bundle block adjustment to the JHU/APL’s MVS
satellite benchmark data. Instead, a relative bias-compensated
model is used for the orientation of the MVS satellite imagery.
It has been shown that the accuracy of the relative bias-com-
pensated model can reach sub-pixel level (Gong and Fritsch
2017). In the first step, we select some tie points in all of the
input images manually. A subset of the tie points is selected
as virtual GCPs. In the relative bias-compensated model we
apply an additional affine model to compensate for the bias
between different images, so at least four to six virtual GCPs
are needed (Fraser and Hanley 2005). A pair of stereo images
is selected to generate the virtual ground information. This
selected image pair requires a correction of its pointing error.
The pointing error is the distance between the correspond-
ing point and the corresponding epipolar line (Franchis et al.
2014). We compute the pointing errors of these virtual GCPs.
The affine model is applied to estimate the correction of the
pointing error of the selected stereo pair. After the pointing
error correction, the object coordinates of these virtual GCPs
are calculated by the RPCs. The generated virtual GCPs have
a 3D translation relative to the true ground. They are then applied to
perform the bias-compensated bundle block adjustment for
all the input images. The adjustment will remove the relative
but not absolute bias for different images. The point clouds
and the DSMs generated from all stereo images are aligned to
the surface where we have the virtual ground points. Thus, no
further registration is needed for the point clouds and DSMs.
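The affine bias compensation can be illustrated with a simplified per-image least-squares fit. Note that this is only a sketch: the paper estimates the affine parameters for all images jointly in a bias-compensated bundle block adjustment using the virtual GCPs, which the isolated fit below does not reproduce.

```python
import numpy as np

def fit_affine_bias(measured, projected):
    """Least-squares affine bias correction in image space.  measured
    and projected are (n, 2) arrays of (row, col): tie-point
    measurements and their RPC projections; n >= 3 (the paper uses
    at least four to six virtual GCPs)."""
    A = np.hstack([np.ones((len(projected), 1)), projected])  # [1, r, c]
    coef_r, *_ = np.linalg.lstsq(A, measured[:, 0], rcond=None)
    coef_c, *_ = np.linalg.lstsq(A, measured[:, 1], rcond=None)
    return coef_r, coef_c

def apply_affine_bias(coef_r, coef_c, pts):
    """Correct RPC-projected (row, col) points with the fitted model."""
    A = np.hstack([np.ones((len(pts), 1)), pts])
    return np.stack([A @ coef_r, A @ coef_c], axis=1)
```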
The projection-trajectory epipolarity method is used to
find the corresponding epipolar curves. With the help of
the RPCs, a point on the base image can be projected to two
different height levels in object space. The object points are
back-projected to the slave image, yielding two
image points. These image points can be used to approximate
the epipolar curve. Redoing the projections from the slave
image to the base image, we find the corresponding epipolar
curve pair. Having had good experience with the modified
piecewise resampling strategy for approximating the epipolar
curve and resampling the epipolar images (Gong and Fritsch
2017), we apply it here. The epipolar curve generation is
started from the points located on the boundary. Expanding
several epipolar segments with proper length from the start
points, we approximate the epipolar curve. The epipolar
segments are aligned to the same row. Finally, the epipolar
images are resampled along the epipolar segments by bicubic
interpolation.
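A minimal sketch of the projection-trajectory transfer underlying this procedure is given below. The `image_to_ground` and `ground_to_image` methods stand for the RPC model and its iterative inverse; these names are illustrative placeholders, not a real API.

```python
import numpy as np

def epipolar_segment(pt_base, rpc_base, rpc_slave, h_min, h_max):
    """Local epipolar segment in the slave image for one base-image
    point (row, col): project the point to two height levels and
    back-project both ground points.  `image_to_ground(row, col, h)`
    and `ground_to_image(lat, lon, h)` are assumed wrappers around
    the RPC model and its iterative inverse."""
    lat_lo, lon_lo = rpc_base.image_to_ground(*pt_base, h_min)
    lat_hi, lon_hi = rpc_base.image_to_ground(*pt_base, h_max)
    p_lo = rpc_slave.ground_to_image(lat_lo, lon_lo, h_min)
    p_hi = rpc_slave.ground_to_image(lat_hi, lon_hi, h_max)
    return np.array([p_lo, p_hi])   # two (row, col) segment endpoints
```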
Dense Image Matching and Triangulation
The proposed pipeline applies a modified SGM method,
called tube-based SGM (tSGM) (Rothermel et al. 2012),
to generate very dense point clouds. tSGM is implemented
in the C++ library libTsgm, which is the core algorithm of the
software SURE. The usage of the library has been authorized
by nFrames GmbH, Stuttgart. Compared with the original
SGM method, the tSGM algorithm relies on the 9 × 7 Census
cost instead of Mutual Information. Because the Census
cost is insensitive to parametrization and provides robust
results (Zabih and Woodfill 1994), it can also be implemented
in a hierarchical coarse-to-fine method to limit disparity
search ranges. The results of the lower resolution pyramid
are introduced as the priors to determine the disparity search
ranges for the matching of the higher resolution pyramid
(Rothermel et al. 2012). The tSGM algorithm greatly reduces
computing time and optimizes memory efficiency. The dispar-
ity maps generated by the tSGM method are applied to derive
the corresponding pixels of every stereo image. With the
corresponding pixels, the dense point clouds are generated by
forward intersection.
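Since tSGM itself ships in the licensed libTsgm library, the sketch below only illustrates its matching cost: the 9 × 7 Census transform and the per-pixel Hamming distance. The hierarchical disparity-range narrowing that distinguishes tSGM from plain SGM is not reproduced here.

```python
import numpy as np

def census_transform(img, rows=7, cols=9):
    """Census transform with a 9 x 7 window (Zabih and Woodfill 1994):
    each pixel becomes a 62-bit string recording which neighbors are
    darker than the center; border pixels keep a zero descriptor."""
    h, w = img.shape
    rr, cc = rows // 2, cols // 2
    desc = np.zeros((h, w), dtype=np.uint64)
    center = img[rr:h - rr, cc:w - cc]
    for dr in range(-rr, rr + 1):
        for dc in range(-cc, cc + 1):
            if dr == 0 and dc == 0:
                continue
            nb = img[rr + dr:h - rr + dr, cc + dc:w - cc + dc]
            desc[rr:h - rr, cc:w - cc] = (
                (desc[rr:h - rr, cc:w - cc] << 1) | (nb < center))
    return desc

def census_cost(desc_l, desc_r, d):
    """Matching cost at disparity d: per-pixel Hamming distance between
    the left descriptors and the right descriptors shifted by d."""
    x = desc_l[:, d:] ^ desc_r[:, :desc_r.shape[1] - d]
    bits = np.unpackbits(x.view(np.uint8).reshape(x.shape + (8,)), axis=-1)
    return bits.sum(axis=-1)
```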
Figure 4. Area with vegetation in: (a) Point cloud from winter images, (b) Point cloud from summer images, (c) Reference DSM.
DSM Fusion
Following our image selection strategy, the number of the
point clouds could still be quite high. We must choose the
best point clouds for our DSM fusion. All of the input pairs
will first be processed and then the generated point clouds
will be ranked according to their quality. The optimal num-
ber of point clouds that are applied for DSM fusion will be
discussed and presented in the section “Experiments”. As
explained before, the point clouds have already been aligned
on the same virtual ground surface, so no additional registra-
tion is needed. To generate the DSM, the point clouds are pro-
jected into a regularly spaced and discretized grid in the UTM
coordinate system. In our implementation, a simple median
filter is applied for the DSM fusion. The median value of the
height of each grid cell is computed as the final height value
of the fused DSM. The Inverse Distance Weighted interpolation
method is applied if no points are projected onto a cell.
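A minimal sketch of this fusion step follows; the grid geometry conventions and the IDW search radius are our own assumptions, not specified in the paper.

```python
import numpy as np

def fuse_median_dsm(point_clouds, x0, y0, gsd, width, height):
    """Median-filter DSM fusion: rasterize the point clouds (each an
    (n, 3) array of UTM easting, northing, height) onto a regular
    grid and keep the per-cell median height.  (x0, y0) is the
    north-west grid corner and gsd the cell size in meters."""
    cells = [[[] for _ in range(width)] for _ in range(height)]
    for e, n, h in np.vstack(point_clouds):
        col = int((e - x0) / gsd)
        row = int((y0 - n) / gsd)
        if 0 <= row < height and 0 <= col < width:
            cells[row][col].append(h)
    dsm = np.full((height, width), np.nan)
    for r in range(height):
        for c in range(width):
            if cells[r][c]:
                dsm[r, c] = np.median(cells[r][c])
    return idw_fill(dsm)

def idw_fill(dsm, radius=5, power=2.0):
    """Inverse-distance-weighted fill for cells that received no
    points; the search radius here is an assumption."""
    filled = dsm.copy()
    for r, c in zip(*np.nonzero(np.isnan(dsm))):
        r0, r1 = max(r - radius, 0), min(r + radius + 1, dsm.shape[0])
        c0, c1 = max(c - radius, 0), min(c + radius + 1, dsm.shape[1])
        win = dsm[r0:r1, c0:c1]
        rr, cc = np.nonzero(~np.isnan(win))
        if rr.size == 0:
            continue                       # nothing nearby to borrow
        dist = np.hypot(rr + r0 - r, cc + c0 - c)
        w = 1.0 / dist ** power
        filled[r, c] = (w * win[rr, cc]).sum() / w.sum()
    return filled
```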
Experiments
Three different test sites are selected from the MVS satellite
benchmark. The details of the test sites and the reference data
are illustrated in the section “Test Site and Evaluation Meth-
od”. In the section “Results and Analysis”, we present the
results of the proposed pipeline and the evaluation analysis.
Test Site and Evaluation Method
The proposed pipeline processes three different test sites from
the benchmark. All the WorldView-3 panchromatic images
were collected from November 2014 to January 2016, with col-
lection dates covering every month. The GSD of the imagery is
30 cm. Test site 1 is about 3000 × 3000 pixels. The sizes of test
sites 2 and 3 are about 2000 × 2000 pixels. All the test sites are
close to San Fernando, Argentina. They contain a range of differ-
ent terrain types, such as fields, residential areas and vegeta-
tion. Test site 3 contains several high-rise buildings. The refer-
ence LiDAR DSMs of all the test sites are given. The GSD of the
reference data is also 30 cm so that the comparison analysis
becomes easier. The three test sites are shown in Figure 5.
Results and Analysis
Following our image selection strategy, we keep the images
having an incidence angle less than 35 degrees. We sort the
images into summer and winter groups, and only use the im-
ages of the summer group. Image pairs whose intersection
angles are less than 5 degrees or larger than 35 degrees
are eliminated. In the end, 748 stereo pairs are selected as input
data in test site 1. In test sites 2 and 3, 394 and 484 stereo pairs,
respectively, pass the image selection.
For each test site, 25 tie points are manually selected for all
the images. Ten of them are chosen as virtual ground control
points. These virtual GCPs are distributed evenly in the image
scene. We correct the pointing error of one selected stereo im-
age pair’s RPCs. With the corrected RPCs, the object coordinates
of the virtual GCPs are calculated. By applying the relative
bias-compensated model for all the images, we use additional
affine models to compensate for the bias caused by the RPCs. The
epipolar stereo images are generated by our modified piece-
wise epipolar resampling strategy. We apply the tSGM algo-
rithm to match all the stereo pairs and generate the disparity
maps. The point clouds are derived from the disparity maps,
and they are all aligned to the same virtual ground surface.
With hundreds of point clouds, we only select the best ones
for our DSM fusion. In order to investigate the optimal number
of the input point clouds, the point clouds are ranked by the
completeness criterion. It is widely stated that the height mea-
surement is accurate within three times the GSD, which is
about 1 meter in our case. Therefore, we present the percentage
of the points that have height differences to the ground truth
of less than 1 m as the completeness measure. Note that
the point clouds need to be aligned to the reference DSM before
the final quality analysis is carried out. Since there are no GCPs
in our MVS satellite imagery benchmark and we have the RPCs
instead of the exterior parameters, we undertake the registra-
tion via a coarse-to-fine method without any GCPs. In different
spatial resolution levels, the point cloud is moved to the refer-
ence DSM by applying 3D translation shifts. The height difference
of the shifted point cloud and the reference DSM is calculated.
Iteratively, the translational shift is modified until the median
error of the height differences is minimized (Bosch et al. 2016).
In this way, we can minimize the shift between the point cloud
and the ground truth, which is caused by our relative orien-
tation procedure. For testing, we select different numbers of
top-ranked point clouds for the DSM fusion. The point clouds
are converted into a discretized, regularly spaced grid in the
UTM coordinate system. The fusion of the point clouds is per-
formed with a simple median filter. In order to estimate the quality
of the fused DSM, we compute the completeness and the RMSE
of the height differences. Figure 6 demonstrates the complete-
ness and RMSE as a function of the number of the input point
clouds. In Figure 6, the solid blue lines represent the result of
test site 1, the dashed red lines represent the result of test site 2,
and the dotted green lines represent the result of test site 3.
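The translation search can be sketched as follows, in the spirit of the procedure of Bosch et al. (2016); the concrete step schedule and the greedy axis-wise search are assumptions made for illustration.

```python
import numpy as np

def register_to_reference(points, ref_dsm, x0, y0, gsd,
                          steps=(4.0, 1.0, 0.25)):
    """Coarse-to-fine search for the 3D translation minimizing the
    median absolute height difference to the reference DSM.  points
    is an (n, 3) array of easting, northing, height; (x0, y0) is
    the north-west grid corner of the reference DSM."""
    def median_err(shift):
        dx, dy, dz = shift
        col = ((points[:, 0] + dx - x0) / gsd).astype(int)
        row = ((y0 - (points[:, 1] + dy)) / gsd).astype(int)
        ok = ((row >= 0) & (row < ref_dsm.shape[0]) &
              (col >= 0) & (col < ref_dsm.shape[1]))
        dh = points[ok, 2] + dz - ref_dsm[row[ok], col[ok]]
        dh = dh[~np.isnan(dh)]
        return np.median(np.abs(dh)) if dh.size else np.inf

    best = np.zeros(3)
    for step in steps:                     # shrink the search step
        improved = True
        while improved:
            improved = False
            for axis in range(3):
                for sign in (-1.0, 1.0):
                    cand = best.copy()
                    cand[axis] += sign * step
                    if median_err(cand) < median_err(best):
                        best, improved = cand, True
    return best                            # (dx, dy, dz) shift to apply
```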
According to Figure 6a, we find that the completeness
is poor when the number of point clouds used for fusion is
too low. As the number of point clouds
increases, the completeness improves until it reaches a
peak. For test site 1, the completeness attains its highest value
of about 75% when the number of the input point clouds is
about 30. For test site 2, the completeness reaches a peak of
68% when about 25 point clouds are applied. Finally, for test
site 3, the best completeness is 56% with a corresponding
point cloud count of 25. Above the peak, the completeness
Figure 5. WorldView-3 images of: (a) test site 1, (b) test site 2, (c) test site 3.
becomes worse if more point clouds are fed into the fusion.
We also notice that the completeness of test site 1 is the
highest of the three areas, and that test site 3 has the lowest
completeness. Moreover, the completeness decreases more
significantly when too many point clouds are applied in the
DSM fusion of test site 3. The reason is that test site 1 has more
field areas, whereas there are more residential areas in test sites
2 and 3. In particular, there are several high-rise buildings
in test site 3, which will lead to larger shadow areas in the
images. The high-rise buildings reconstructed from different
stereo pairs might have very large height differences in the
boundary areas of the buildings. The dense residential areas
and the high-rise buildings lead to the loss of completeness.
As Figure 6b shows, the RMSE first
decreases while the number of fused point clouds is
small, and then increases as more point clouds
are added. The accuracy is reduced because more errors are
introduced by some lower quality image pairs. Test site 1 has
the best RMSE, then test site 2, and test site 3 has the largest
height differences to the ground truth. So more dense residen-
tial areas and high-rise buildings reduce the accuracy of the
reconstruction. Considering both the completeness and the
RMSE, we should select the number of point clouds which can
provide the best completeness while having a relatively small
RMSE. Therefore, the optimal number of the applied point
clouds for the DSM fusion is 30 for test site 1. The optimal
number for test site 2 is 20 point clouds, and for test site 3 we
also select 20 point clouds to generate the final fused
DSM. By checking the selected point clouds for
the final fusion, we find that most of these point clouds are
generated from stereo images collected on close dates, and the
intersection angles of most stereo pairs are between 10 and 30
degrees. This also proves that the image selection strategy is
effective and that the intersection angles of the stereo pairs can be
limited to 10 to 30 degrees in future experiments.
The fused point clouds are displayed in Figure 7. The
fused DSMs of the three test sites, which are generated from
the optimal number of the point clouds, are displayed in
Figure 8. The reference DSMs of the three test sites are demon-
strated in Figure 9.
In order to evaluate the quality of our fused DSMs quantita-
tively, a comparison is made between the fused DSM and the
reference LiDAR DSM for all three test sites. The median height
difference, the RMSE of the height difference, and the completeness
of the results are computed to check the accuracy. Moreover,
we computed the normalized median absolute deviation (NMAD)
and the 68% and 95% quantiles of the absolute height errors to evalu-
ate the robustness of the fused DSM. The statistical evaluation
results are illustrated in Table 1.
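For reference, the measures reported in Table 1 can be computed from the per-cell height differences as in the sketch below; the NMAD uses the conventional 1.4826 scale factor that makes it comparable to a standard deviation under normally distributed errors.

```python
import numpy as np

def dsm_metrics(dh, tol=1.0):
    """Accuracy and robustness measures from the per-cell height
    differences dh between the fused and the reference DSM."""
    dh = dh[~np.isnan(dh)]
    abs_dh = np.abs(dh)
    return {
        "median_error_m": np.median(dh),
        "rmse_m": np.sqrt(np.mean(dh ** 2)),
        "completeness_pct": 100.0 * np.mean(abs_dh < tol),
        # NMAD: 1.4826 * median absolute deviation from the median
        "nmad_m": 1.4826 * np.median(np.abs(dh - np.median(dh))),
        "aq68_m": np.quantile(abs_dh, 0.68),
        "aq95_m": np.quantile(abs_dh, 0.95),
    }
```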
As Table 1 demonstrates, the RMSE of the DSM of test site 1 is
2.7 m. The fused DSM of test site 2 has an RMSE of 3.81 m, and
for test site 3, the RMSE is 4.08 m. The completeness values of the
three test sites are 75.4%, 68.9%, and 55.9%, respectively. As
we have discussed before, the dense residential areas in test
sites 2 and 3 decrease the accuracy and completeness. The
high-rise buildings in test site 3 reduce the completeness sig-
nificantly. The NMADs are 0.5 m, 0.6 m, and 1.0 m for test sites 1,
2, and 3, respectively. The 68% quantiles of the test sites are 0.66 m,
0.96 m, and 1.45 m. The dense residential areas and high-rise
buildings cause more shadows and have a negative influence
on the robustness. The distributions of the height differences of
the three test sites are depicted as histograms in Fig-
ure 10. According to Figure 10, the distributions of the height
differences in test sites 1 and 2 are more concentrated than in
test site 3, because the high-rise buildings and their large
shadows cause more errors during the DSM generation.
To show the 3D capability of the reconstructed point clouds
and to conduct some qualitative analysis, the fused point
clouds are visualized by the open source software CloudCom-
pare. Several subareas are extracted from the three test sites as
regions of interest (ROIs) to analyze the reconstructed details
of the fused point clouds. The ROIs on the fused point clouds
and the reference DSM are shown in Figure 11.
The left column in Figure 11 shows the ROIs extracted from
the reference LiDAR DSM, and the right column displays the cor-
responding areas of the fused point clouds. In Figure 11a and
11b, we can find an isolated large building in the extracted
area. In Figure 11b, the reconstructed building’s edges are
sharp except for the bottom-left edge. The blur of this edge is
caused by the shadow on that side. Compared to the reference
DSM, the detail features on the roof of the isolated building are
also reconstructed in the fused point cloud. Figure 11c and
11d show an area which has some connected buildings, and
Figure 6. Relation between the number of point clouds and
(a) completeness, (b) RMSE.
Table 1. Evaluation results of the fused DSMs.

Metric              Test site 1   Test site 2   Test site 3
Median error (m)       0.320         0.390         0.728
RMSE (m)               2.702         3.810         4.081
Completeness (%)      75.40         68.86         55.93
NMAD (m)               0.503         0.628         1.023
Aq68 (m)               0.660         0.960         1.455
Aq95 (m)               5.930         6.288         6.507
Figure 7. The fused point cloud of: (a) test site 1, (b) test site 2, (c) test site 3.
Figure 8. The fused DSM of: (a) test site 1, (b) test site 2, (c) test site 3.
Figure 9. The Reference DSM of: (a) test site 1, (b) test site 2, (c) test site 3.
Figure 10. Distribution of height differences: (a) test site 1, (b) test site 2, (c) test site 3.
these buildings are surrounded by trees.
The upper and right boundaries of the
buildings are reconstructed more clearly,
because they are not affected by shadows
at all. We can observe that the buildings
are hard to distinguish by their boundaries
if the trees are too close. The height of the
trees varies from image to image. So the veg-
etation in the fused point cloud does not fully
fit the reference DSM, which means
that the vegetation will introduce errors to
the generated point cloud and the DSM. To
generate higher accuracy DSMs, the vegeta-
tion should be masked in the future. The
next ROI includes a high-rise building. It
is displayed in Figure 11e and 11f. The
top and left side of the building have no
shadows and they have sharper edges than
the other two sides. Compared to the low-
rise isolated building in the first ROI, the
high-rise buildings have a larger range of
shadow areas and therefore have stronger
negative effects on the reconstruction. The
edges on the shadow side are blurred. More-
over, there is a small part missing from the
high-rise building. We select an area that
is full of densely packed low-rise buildings
as our last ROI. This residential area is
exhibited in Figure 11g and 11h. The fused
point cloud exhibits poor performance
because the buildings are too close, and
the shadows of the buildings are often cast
on the nearby buildings. The reconstructed
buildings are connected to each other and
have totally blurred boundaries. It is chal-
lenging to reconstruct the residential area
as separate buildings. Generally, the fused
point clouds can reconstruct the terrain
surface with some detail. The pipeline has
worse performance for high-rise buildings
and dense residential areas, and
vegetation and shadows cause difficulties
during the reconstruction.
Conclusion and Outlook
In this paper, we propose a pipeline for DSM
generation from MVS satellite images. The
methods of the pipeline are implemented
by self-programmed C++ modules. Experi-
ments were carried out on three different
test sites provided by JHU/APL’s MVS satellite
image benchmark. We propose an image se-
lection strategy that considers the incidence
angles of the view, the intersection angles
and the collected dates. Those images
having large incidence angles, too small or
too large intersection angles, or those col-
lected in winter are eliminated. Image pairs
collected on close months in summer are
selected as the inputs. We apply the relative
bias-compensated model for the relative
orientation, which aligns the point clouds
to a virtual ground surface. No further point
cloud registration is needed before the fu-
sion step. Following the pipeline, the point
clouds are generated pairwise. The opti-
mal number of the involved point clouds
for DSM fusion is investigated. We apply
those point clouds which lead to the best
Figure 11. Detail comparison between the reference DSM and the generated 3D model.
completeness and maintain a relatively low RMSE. In our experi-
ment, the results show that the optimal number of the point
clouds is thirty for test site 1. For test sites 2 and 3 we apply the
top-ranked twenty point clouds. Additional point clouds
introduce errors because some of them are of low quality. The
point clouds are converted into grids in the UTM system and are
fused to generate the DSM. The fusion applies the median filter.
The RMSEs of the fused DSM are 2.7 m for test site 1, 3.81 m for
test site 2, and 4.08 m for test site 3. The completeness values of test
sites 1, 2, and 3 are 75.4%, 68.9%, and 55.9%. The NMADs of the
DSMs are all below 1 m and the 68% quantiles of the height dif-
ference distribution are all below 1.5 m. The proposed pipeline
to generate DSMs from MVS satellite imagery is accurate and
robust. Fusing point clouds for the final DSM can reconstruct
the terrain surface with some detail features. High-rise build-
ings and dense residential areas reduce the accuracy and
completeness of the DSM. The pipeline performs poorly
on imagery that includes shadows and areas of vegetation.
There are still some aspects which need to be improved in
our work. First, there are some vegetation areas in the im-
ages under investigation. Because the MVS satellite images are
collected on different dates, the vegetation introduces errors in
the dense image matching step and reduce the accuracy of the
generated DSM. We can classify the MVS imagery and mask out
the vegetation to improve the quality of the results. Second,
the images that are collected in winter are not applied in our
current pipeline. The seasonal changes are mainly presented
as differences in the vegetation. If the vegetation is masked out,
some winter images with good illumination conditions can
also be used in our procedure. Third, we apply the median fil-
ter for the DSM fusion. There might be better solutions to fuse
the DSM than simply taking the median height value of each
cell. Finally, we implement a binocular stereo method in our
pipeline. It would be interesting to see how a true multi-view algo-
rithm performs on MVS high resolution satellite imagery.
Acknowledgements
The authors would like to thank Johns Hopkins University
Applied Physics Lab for providing the well-organized MVS
satellite imagery benchmark. The authors also would like to
acknowledge the advice of Dr. Mathias Rothermel about the
MVS reconstruction. Finally, the grant of the Chinese Scholar-
ship Council (CSC) supporting the research of the first author
is gratefully acknowledged.
References
Bosch, M., Z. Kurtz, S. Hagstrom and M. Brown. 2016. A multiple
view stereo benchmark for satellite imagery. Pages 1–9 in IEEE
Applied Imagery Pattern Recognition Workshop (AIPR).
d’Angelo, P. and G. Kuschk. 2012. Dense multi-view stereo from
satellite imagery. Pages 6944–6947 in IEEE International
Geoscience and Remote Sensing Symposium (IGARSS).
d’Angelo, P. and P. Reinartz. 2011. Semiglobal matching results on
the ISPRS stereo matching benchmark. Pages 79–84 in ISPRS
Hannover Workshop.
d’Angelo, P., C. Rossi, C. Minet, M. Eineder, M. Flory and I. Niemeyer.
2014. High resolution 3D earth observation data analysis for
safeguards activities. In Symposium on International Safeguards:
Linking Strategy, Implementation and People. Nukleare
Entsorgung und Reaktorsicherheit.
De Franchis, C., E. Meinhardt-Llopis, J. Michel, J.-M. Morel and
G. Facciolo. 2014. An automatic and modular stereo pipeline
for pushbroom images. ISPRS Annals of the Photogrammetry,
Remote Sensing and Spatial Information Sciences.
Facciolo, G., C. De Franchis and E. Meinhardt-Llopis. 2017.
Automatic 3D reconstruction from multi-date satellite images.
Pages 57–66 in Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition Workshops.
Fraser, C., H. Hanley and T. Yamakawa. 2002. Three‐dimensional
geopositioning accuracy of Ikonos imagery. The
Photogrammetric Record 17 (99): 465–479.
Fraser, C. and H. Hanley. 2005. Bias-compensated RPCs for sensor
orientation of high-resolution satellite imagery. Photogrammetric
Engineering & Remote Sensing 71 (8): 909–915.
Furukawa, Y. and C. Hernández. 2015. Multi-view stereo: A tutorial.
Foundations and Trends® in Computer Graphics and Vision 9
(1–2): 1–148.
Gong, K. and D. Fritsch. 2016. A detailed study about digital surface
model generation using high resolution satellite stereo imagery.
ISPRS Annals of the Photogrammetry, Remote Sensing and
Spatial Information Sciences, 3 (1).
Gong, K. and D. Fritsch. 2017. Relative orientation and modified
piecewise epipolar resampling for high resolution satellite
images. Page 42 in The International Archives of Photogrammetry,
Remote Sensing and Spatial Information Sciences.
Grodecki, J. and G. Dial. 2001. IKONOS geometric accuracy. Pages
19–21 in Proceedings of Joint Workshop of ISPRS Working
Groups I/2, I/5 and IV/7 on High Resolution Mapping from
Space, vol. 4.
Grodecki, J. and G. Dial. 2003. Block adjustment of high-
resolution satellite images described by rational polynomials.
Photogrammetric Engineering & Remote Sensing 69 (1): 59–68.
Haala, N. 2013. The landscape of dense image matching algorithms.
In Photogrammetric Week’13, edited by D. Fritsch, 271–284.
Wichmann/VDE Verlag Berlin/Offenbach.
Hanley H. and C. Fraser. 2001. Geopositioning accuracy of IKONOS
imagery: Indication from two dimensional transformations.
Photogrammetric Record 17 (98): 317–329.
Hirschmüller, H. 2008. Stereo processing by semiglobal matching and
mutual information. IEEE Transactions on Pattern Analysis and
Machine Intelligence 30 (2): 328–341.
Kim, T. 2000. A study on the epipolarity of linear pushbroom images.
Photogrammetric Engineering & Remote Sensing 66 (8): 961–966.
Koh, J. and H. Yang. 2016. Unified piecewise epipolar resampling
method for pushbroom satellite images. EURASIP Journal on
Image and Video Processing 2016 (1): 11.
Kuschk, G. 2013. Large scale urban reconstruction from
remote sensing imagery. In International Archives of the
Photogrammetry, Remote Sensing and Spatial Information
Sciences, vol. 5/W1.
Oh, J. 2011. Novel Approach to Epipolar Resampling of HRSI and
Satellite Stereo Imagery-Based Georeferencing of Aerial Images,
Ph.D. Dissertation, The Ohio State University.
Ozcanli, O. C., Y. Dong, J. L. Mundy, H. Webb, R. Hammoud and V. Tom.
2015. A comparison of stereo and multiview 3-D reconstruction
using cross-sensor satellite imagery. Pages 17–25 in Proceedings
of the IEEE Conference on Computer Vision and Pattern
Recognition Workshops (CVPRW).
Qin, R. 2017. Automated 3D recovery from very high resolution
multi-view satellite images. In ASPRS 2017 Annual Conference.
Rothermel, M., K. Wenzel, D. Fritsch and N. Haala. 2012. SURE:
Photogrammetric surface reconstruction from imagery. In
Proceedings LC3D Workshop, held in Berlin, Germany, vol. 8.
Wang, M., F. Hu and J. Li. 2010. Epipolar arrangement of
satellite imagery by projection trajectory simplification. The
Photogrammetric Record 25 (132): 422–436.
Wang, M., F. Hu and J. Li. 2011. Epipolar resampling of linear
pushbroom satellite imagery by a new epipolarity model. ISPRS
Journal of Photogrammetry and Remote Sensing, 66 (3): 347–355.
Wang, Y. 1999. Automated triangulation of linear scanner imagery.
Pages 27–30 in Joint Workshop of ISPRS WG I/1, I/3 and IV/4 on
Sensors and Mapping from Space.
Wohlfeil, J., H. Hirschmüller, B. Piltz, A. Börner and M. Suppa. 2012.
Fully automated generation of accurate digital surface models
with sub-meter resolution from satellite imagery. International
Archives of the Photogrammetry, Remote Sensing and Spatial
Information Sciences: 34–B3.
Zabih, R. and J. Woodfill. 1994. Non-parametric local transforms for
computing visual correspondence. Pages 151–158 in Computer
Vision ECCV'94, Lecture Notes in Computer Science, vol. 801,
edited by J.-O. Eklundh. Springer, Berlin/Heidelberg.
... Although active sensors such as LiDAR can directly measure the distance from a satellite to the ground, they require a significant amount of energy compared to passive cameras. Therefore, many pipelines have been developed to accurately estimate depth from disparity using multiple satellite views [7][8][9][10][11][12][13]. The resulting large-scale 3D models are ... Section 3.3 proposes a lightweight network architecture with only 20% of the number of neurons compared to previous works. ...
... In this paradigm, surfaces are represented by a function of the form f(x, y) = h, known as a digital elevation model (DEM), where (x, y) are spatial coordinates on the Earth (e.g., latitude, longitude), and h represents surface elevation. These methods typically use matching strategies derived from semiglobal matching algorithms [7][8][9][11][37] to estimate dense disparity maps, and handcrafted features and cost functions are at the core of these methods. ...
Article
Full-text available
Neural radiance fields (NeRFs), combining machine learning with differentiable rendering, have arisen as one of the most promising approaches for novel view synthesis and depth estimation. However, NeRFs apply only to close-range static imagery, and training a model takes several hours. Satellites, by contrast, are hundreds of kilometers from the Earth, and satellite multi-view images are usually captured over several years of a dynamic, in-the-wild scene. Multi-view satellite photogrammetry is therefore far beyond the capabilities of standard NeRFs. In this paper, we present a new method for multi-view satellite photogrammetry of Earth observation called remote sensing neural radiance fields (RS-NeRFs). It aims to generate novel view images and accurate elevation predictions quickly. For each scene, we train an RS-NeRF using high-resolution optical images without labels or geometric priors and apply image reconstruction losses for self-supervised learning. Multi-date images exhibit significant changes in appearance, mainly due to cars and varying shadows, which poses challenges to satellite photogrammetry. Robustness to these changes is achieved by the input of the solar ray direction and a vehicle removal method. To significantly reduce the training time of RS-NeRFs, we build a tiny network with HashEncoder and adopt a new sampling technique with our custom CUDA kernels. Compared with previous work, our method performs better on novel view synthesis and elevation estimates, and training takes only several minutes.
... DSM derived from satellite imagery is investigated in (Gong and Fritsch, 2019). The achieved accuracy with overlapping WorldView-3 satellite imagery is quoted as NMAD = 0.7 m (2.4 GSD) and Q68 = 1.0 m (3.4 GSD), averaged over three test sites. ...
... These images find extensive applications in fields such as disaster assessment, urban planning, and target detection [2][3]. From a technological standpoint, numerous computer vision studies focus on generating digital surface models (DSMs) or digital elevation models (DEMs) using multi-view satellite photogrammetry [7]. Some of these works perform binocular stereo matching and bundle adjustment on selected pixels and fuse the resulting point clouds or depth maps using camera matrices [4][6][9]. ...
Preprint
Full-text available
We propose a novel generic method to address the challenge of handling unconstrained multi-view optical satellite photogrammetry under time-varying conditions of illumination and reflection. First, we represent the surface radiance and albedo produced by extensive lights with continuous radiance fields based on radiometric principles, and combine the static and transient components for satellite photogrammetry. Second, a novel self-supervised mechanism is introduced to optimize the learning process, leveraging dark-region accentuation, transient and static composition, and occlusion and shadow suppression. We evaluate the proposed framework on real-world multi-date WorldView-3 images and demonstrate that our model consistently outperforms existing state-of-the-art methods.
... The SGM algorithm was first released in 2011 and performed well on ISPRS test data [20]. After that, Gong, K. et al. used the hierarchical method of the SGM to conduct experiments, further expanding the SGM algorithm [21]. Li, Y. S. et al. adopted an efficient hierarchical matching strategy, which significantly reduced the matching cost of the SGM algorithm [22]. ...
Article
Automatic reconstruction of DSMs from satellite images is a hot issue in the field of photogrammetry. Nowadays, most state-of-the-art pipelines produce 2.5D products. To address some shortcomings of traditional algorithms and expand the means of updating digital surface models, a DSM generation method based on variational mesh refinement of satellite stereo image pairs is proposed to recover 3D surfaces from coarse input. Specifically, an initial coarse mesh is constructed first, and the geometric features of the 3D mesh model are then optimized using the information of the original images, while mesh subdivision is constrained by combining the images' texture and projection information, finally achieving subdivision optimization of the mesh model. The results of this method are compared qualitatively and quantitatively with those of the commercial software PCI and the SGM method. The experimental results show that the generated 3D digital surface has clearer edge contours and more refined planar textures, and that the model accuracy is sufficient to match the actual conditions of the ground surface well, proving the effectiveness of the method. The method is advantageous for research on true 3D products in complex urban areas and can generate complete DSM products from rough input meshes, indicating promising development prospects.
Preprint
Full-text available
Epipolar resampling is an essential step for 3D reconstruction and Digital Surface Model (DSM) generation from satellite images. However, as satellite image resolution improves and image sizes grow, the time required for epipolar resampling increases dramatically. Polynomial fitting methods based on the Piecewise Projection Trajectory Method (PPTM) are commonly employed for epipolar resampling, but their effectiveness and efficiency diminish when confronted with large, high-resolution images. To tackle this challenge, we propose a novel parallel block-wise epipolar resampling method designed to expedite the resampling process without compromising accuracy. This method leverages PPTM and a fixed elevation plane to establish the relationship between left and right epipolar points. Local affine transformations and image partitioning replace polynomial transformations applied across the entire image to approximate the correspondence between original and epipolar images. Furthermore, parallel computation is employed to accelerate block-wise pixel resampling. Experimental analysis using IKONOS-2, ZY-3, and GF-7 images confirms the efficacy and accuracy of our method. We achieve sub-pixel y-disparities comparable to polynomial fitting methods, while reducing resampling time by 10 to 20 percent with single-core serial execution. Moreover, the multi-core parallel approach achieves a parallel efficiency exceeding 80%.
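To make the block-wise local affine idea concrete, here is a minimal Python/NumPy sketch that fits one affine transform per block from sampled epipolar correspondences; the function names, the (2, 3) matrix convention, and the plain least-squares fit are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_local_affine(src_pts, dst_pts):
    """Least-squares affine transform mapping src_pts to dst_pts.

    src_pts, dst_pts: (N, 2) corresponding image coordinates sampled
    inside one block (e.g., from the PPTM epipolar relationship).
    Returns a (2, 3) matrix A with dst ~= A @ [x, y, 1].
    """
    n = src_pts.shape[0]
    X = np.hstack([src_pts, np.ones((n, 1))])        # (N, 3) design matrix
    A, *_ = np.linalg.lstsq(X, dst_pts, rcond=None)  # (3, 2) solution
    return A.T                                       # (2, 3)

def apply_affine(A, pts):
    """Apply a (2, 3) affine matrix to (N, 2) points."""
    return pts @ A[:, :2].T + A[:, 2]
```

Each block is then resampled independently with its own affine, which is what makes the per-block parallelization straightforward.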
Article
Multi-view digital surface model (DSM) fusion has emerged as an important technique for three-dimensional (3D) reconstruction from multi-view satellite images. However, existing multi-view DSM fusion approaches are prone to blurred elevation divisions at object edges, salt-and-pepper noise on smooth surfaces, and severe loss of surface detail in weakly textured regions. In this paper, we present a cascade domain clustering (CDC) algorithm for fusing multi-view DSMs, realized by the combination of salient domain clustering and model domain clustering. Initially, salient domain clustering is employed to demarcate prominent objects and identify regional edges using two-dimensional (2D) spectral and 3D elevation information. Subsequently, to further segment intricate surface structures, particularly for objects with low-texture attributes, we implement model domain clustering to iteratively aggregate 3D points and fit geometric models to the aggregated clusters. Finally, the multi-view DSMs are fused iteratively through the weighted least squares (WLS) method, with model clusters serving as the fundamental units, under the constraints of the geometric models. Extensive experiments show that the proposed CDC algorithm surpasses other popular multi-view DSM fusion algorithms in terms of completeness and accuracy, achieving 91.51% completeness and 0.93 m RMSE, representing a 79.89% improvement in completeness and an 86.40% reduction in RMSE compared to a popular stereo 3D reconstruction pipeline.
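As a simplified, per-cell illustration of the WLS fusion step (the paper fuses per model cluster; the NaN-as-nodata convention and the array layout are assumptions of this sketch): when a single constant elevation is estimated per cell, weighted least squares reduces to a weighted mean.

```python
import numpy as np

def wls_fuse(dsms, weights):
    """Weighted least-squares fusion of co-registered DSM grids.

    dsms:    (K, H, W) stack of elevation grids, NaN = no data.
    weights: (K, H, W) per-view confidence weights.

    Estimating one constant elevation per cell by WLS gives the
    weighted mean  h* = sum(w_i * h_i) / sum(w_i).
    """
    valid = ~np.isnan(dsms)
    w = np.where(valid, weights, 0.0)
    h = np.where(valid, dsms, 0.0)
    wsum = w.sum(axis=0)
    return np.divide((w * h).sum(axis=0), wsum,
                     out=np.full(wsum.shape, np.nan), where=wsum > 0)
```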
Article
Existing NeRF models for satellite imagery have limitations in processing large images and require solar input, leading to slow speeds. In response, we introduce SatensoRF, which speeds up the entire process significantly while using fewer parameters for large satellite imagery. We have noticed that the common assumption of Lambertian surfaces in satellite neural radiance fields is not sufficient for vegetative and aquatic elements. In contrast to the traditional hierarchical MLP-based scene representation, we choose a multiscale tensor decomposition approach for color, volume density, and auxiliary variables to model the light field with specular color. Additionally, to rectify inconsistencies in multi-date imagery, we incorporate total variation denoising to restore the density tensor field, thus mitigating the negative impact of transient objects. To validate our approach, we assess SatensoRF on subsets of the SpaceNet multi-view dataset, which includes both multi-date and single-date multi-view RGB images. Our results demonstrate that SatensoRF surpasses the state-of-the-art Sat-NeRF series in novel view synthesis performance. Significantly, SatensoRF requires fewer parameters for training, resulting in faster training and inference speeds and reduced computational demands.
Article
We introduce a novel method tailored for unconstrained multi-view optical satellite photogrammetry in time-varying illumination and reflection conditions. Our approach employs continuous radiance fields to represent surface radiance and albedo based on radiometry principles, integrating both static and transient components for satellite photogrammetry. Additionally, an innovative self-supervised mechanism is introduced to optimize the learning process which leverages dark regions accentuation, transient and static composition, as well as shadow regularization. Evaluations on multi-date WorldView-3 images affirm that our model consistently surpasses the state-of-the-art techniques.
Conference Paper
Full-text available
Both improvements in camera technology and the rise of new matching approaches triggered the development of suitable software tools for image-based 3D reconstruction by research groups and vendors of photogrammetric software. Based on dense pixel-wise matching, the photogrammetric generation of dense 3D point clouds and Digital Surface Models from highly overlapping aerial images has become feasible. In order to evaluate the quality of these matching algorithms in terms of accuracy and reliability, the European Spatial Data Research Organisation (EuroSDR) started a benchmark on image-based DSM generation in February 2013. This test is based on two representative image blocks, which were processed by different groups with different software systems. The results provided by the different groups give a profound insight into the landscape of dense matching algorithms and are used within the paper to evaluate the potential of image-based photogrammetric data collection.
Article
Full-text available
High resolution optical satellite sensors have entered a new era in the last few years, as satellite stereo images at half-meter or even 30 cm resolution have become available. Nowadays, high resolution satellite image data are commonly used for Digital Surface Model (DSM) generation and 3D reconstruction. The Rational Polynomial Coefficients (RPCs) provided by the vendors commonly have only rough precision, and often no ground control information is available to refine them. Therefore, we present two relative orientation methods that use corresponding image points only: the first method uses quasi ground control information, generated from the corresponding points and rough RPCs, for the bias-compensation model; the second method estimates the relative pointing errors on the matching image and removes them with an affine model. Both methods need no ground control information and are applied to the entire image. To obtain very dense point clouds, the Semi-Global Matching (SGM) method is an efficient tool. However, epipolar constraints are required before the matching process can be accomplished. In most cases satellite images have very large dimensions, whereas epipolar geometry generation and image resampling are usually carried out in small tiles. This paper therefore also presents a modified piecewise epipolar resampling method for the entire image without tiling. The quality of the proposed relative orientation and epipolar resampling methods is evaluated, and sub-pixel accuracy has been achieved in our work.
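A minimal sketch of the affine error-removal idea, assuming tie-point correspondences are already measured (function names and array shapes are illustrative, not the paper's code): fit a six-parameter affine model between RPC-projected and measured image coordinates by least squares, then use it to compensate the relative pointing error.

```python
import numpy as np

def fit_bias_affine(proj_xy, obs_xy):
    """Affine bias-compensation model in image space.

    proj_xy: (N, 2) tie-point coordinates predicted by the vendor RPCs.
    obs_xy:  (N, 2) measured coordinates of the same tie points.

    Fits x_obs = a0 + a1*x + a2*y and y_obs = b0 + b1*x + b2*y.
    """
    n = proj_xy.shape[0]
    X = np.hstack([np.ones((n, 1)), proj_xy])            # (N, 3)
    coeffs, *_ = np.linalg.lstsq(X, obs_xy, rcond=None)  # (3, 2)
    return coeffs

def compensate(coeffs, proj_xy):
    """Apply the fitted affine model to RPC projections."""
    n = proj_xy.shape[0]
    return np.hstack([np.ones((n, 1)), proj_xy]) @ coeffs
```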
Conference Paper
Full-text available
This paper presents an automated pipeline for processing multi-view satellite images into 3D digital surface models (DSMs). The proposed pipeline performs automated geo-referencing and generates high-quality, densely matched point clouds. In particular, a novel approach is developed that fuses multiple depth maps derived by stereo matching to generate high-quality 3D maps. By learning critical configurations of stereo pairs from sample LiDAR data, we rank the image pairs based on the proximity of their results to the sample data. Multiple depth maps derived from individual image pairs are fused with an adaptive 3D median filter that considers image spectral similarities. We demonstrate that the proposed adaptive median filter generally delivers better results than a normal median filter, achieving an accuracy improvement of 0.36 meters RMSE in the best case. Results and analysis are introduced in detail.
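For orientation, a plain per-pixel median fusion of co-registered depth maps might look like the sketch below; the adaptive variant described above would additionally weight or discard views by image spectral similarity before taking the median, which is omitted here.

```python
import numpy as np

def median_fuse(depth_maps):
    """Per-pixel median fusion of co-registered depth maps.

    depth_maps: (K, H, W) stack, NaN where a view has no estimate.
    np.nanmedian ignores missing views at each pixel; pixels with no
    valid view at all remain NaN (and trigger a RuntimeWarning).
    """
    return np.nanmedian(depth_maps, axis=0)
```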
Article
Full-text available
Photogrammetry is currently undergoing a renaissance, driven by the development of dense stereo matching algorithms that provide very dense Digital Surface Models (DSMs). Moreover, satellite sensors have improved to provide sub-meter or even better Ground Sampling Distances (GSD) in recent years. Therefore, the generation of DSMs from spaceborne stereo imagery has become an active research area. This paper presents a comprehensive study of DSM generation from high resolution satellite data and proposes several methods to implement the approach. The bias-compensated Rational Polynomial Coefficients (RPCs) Bundle Block Adjustment is applied for image orientation, and the rectification of stereo scenes is realized with the Project-Trajectory-Based Epipolarity (PTE) Model. Very dense DSMs are generated from WorldView-2 satellite stereo imagery using the dense image matching module of the C/C++ library LibTsgm. We carry out various tests to evaluate the quality of the generated DSMs regarding robustness and precision. The results verify that the presented pipeline for DSM generation from high resolution satellite imagery is applicable, reliable and very promising.
Article
Full-text available
Computational stereo lies at the intersection of computer vision and photogrammetry. In the computational stereo and surface reconstruction paradigms, it is very important to achieve appropriate epipolar constraints during the camera-modeling step of stereo image processing. It has been shown that the epipolar geometry of linear pushbroom imagery has a hyperbola-like shape because of the non-coplanarity of the line-of-sight vectors. Several studies have been conducted to generate resampled epipolar image pairs from linear pushbroom satellite images; however, the currently prevailing methods are limited by their pixel scales, skewed axis angles, or disproportionality between x-parallax disparities and height. In this paper, a practical and unified piecewise epipolar resampling method is proposed to generate stereo image pairs with zero y-parallax, a square pixel scale, and proportionality between x-parallax disparity and height. Furthermore, four criteria are suggested for performance evaluation of the prevailing methods, and experimental results of the method are presented based on the suggested criteria. The proposed method is shown to be equal to or an improvement upon the prevailing methods. Keywords: pushbroom high-resolution satellite imagery; piecewise epipolar resampling; stereo image pair
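One of the stated goals, zero y-parallax, can be checked directly on matched points measured in the resampled pair; a small helper like the following (names illustrative) suffices.

```python
import numpy as np

def y_parallax_stats(left_pts, right_pts):
    """Residual y-parallax of matched points in an epipolar pair.

    left_pts, right_pts: (N, 2) matched (x, y) coordinates in the
    resampled images. With a correct resampling, conjugate points lie
    on the same row, so dy should be near zero (sub-pixel).
    """
    dy = left_pts[:, 1] - right_pts[:, 1]
    return {"mean": dy.mean(), "std": dy.std(),
            "max_abs": np.abs(dy).max()}
```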
Article
We propose a new approach to the correspondence problem that makes use of non-parametric local transforms as the basis for correlation. Non-parametric local transforms rely on the relative ordering of local intensity values, and not on the intensity values themselves. Correlation using such transforms can tolerate a significant number of outliers. This can result in improved performance near object boundaries when compared with conventional methods such as normalized correlation. We introduce two non-parametric local transforms: the rank transform, which measures local intensity, and the census transform, which summarizes local image structure. We describe some properties of these transforms, and demonstrate their utility on both synthetic and real data.
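A compact NumPy rendering of the census transform as described: each pixel is encoded by the relative ordering of its neighborhood, and codes are then compared by Hamming distance, which depends only on intensity ordering and therefore tolerates outliers near object boundaries. The wrap-around border handling and the 5x5 default window are simplifications of this sketch.

```python
import numpy as np

def census_transform(img, radius=2):
    """Census transform of a 2-D grayscale array.

    Each pixel gets one bit per neighbor in a (2r+1)x(2r+1) window:
    1 if the neighbor is darker than the center, else 0. For radius=2
    that is 24 bits, which fits in uint64. Borders wrap via np.roll,
    a simplification; production code would pad or crop instead.
    """
    out = np.zeros(img.shape, dtype=np.uint64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            neighbor = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            out = (out << np.uint64(1)) | (neighbor < img).astype(np.uint64)
    return out

def hamming_cost(c1, c2):
    """Matching cost between census-code arrays: differing bit count."""
    x = np.bitwise_xor(c1, c2)
    count = np.zeros(x.shape, dtype=np.uint8)
    for _ in range(64):                      # popcount, bit by bit
        count += (x & np.uint64(1)).astype(np.uint8)
        x = x >> np.uint64(1)
    return count
```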
Article
Although epipolar geometry is a very useful clue in processing stereo images, it has not been thoroughly examined previously for linear pushbroom images. Some have assumed that epipolar geometry would be the same for pushbroom images as for perspective images. Some do not use this geometry at all because it is not fully understood. The purpose of this paper is to provide a theoretical basis for the epipolar geometry of linear pushbroom images and to discuss the practical implications of this geometry in processing such images. We show that epipolarity for linear pushbroom images is different from that for perspective images. We also derive an equation for epipolar curves of linear pushbroom images, which are not lines but hyperbola-like non-linear curves. Through analyses of the properties of these curves, we conclude that these curves can be approximated as piece-wise linear segments and that any closely located points on one epipolar curve are mapped onto a common epipolar curve.
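The piecewise-linear approximation suggests a simple numerical way to trace such a curve: intersect a left-image ray with a series of assumed elevation planes and map each ground point into the right image. The sketch below assumes hypothetical rpc_inverse_left and rpc_project_right helpers standing in for any RPC implementation; they are not from the paper.

```python
import numpy as np

def trace_epipolar_curve(pt_left, rpc_inverse_left, rpc_project_right,
                         h_min, h_max, n=20):
    """Sample the epipolar curve of pt_left in the right image.

    rpc_inverse_left(pt, h) -> (lat, lon): intersect the left-image ray
    with the elevation plane h (hypothetical helper).
    rpc_project_right(lat, lon, h) -> (x, y): project the ground point
    into the right image (hypothetical helper).

    Consecutive samples form the piecewise-linear approximation of the
    hyperbola-like epipolar curve discussed above.
    """
    samples = []
    for h in np.linspace(h_min, h_max, n):
        lat, lon = rpc_inverse_left(pt_left, h)
        samples.append(rpc_project_right(lat, lon, h))
    return np.asarray(samples)  # (n, 2)
```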