Spatial interpolation using conditional generative adversarial neural networks

Spatial interpolation is a traditional geostatistical operation that aims at predicting the attribute values of unobserved locations given a sample of data defined on point supports. However, the continuity and heterogeneity underlying spatial data are too complex to be approximated by classic statistical models. Deep learning models, especially the idea of conditional generative adversarial networks (CGANs), provide us with a perspective for formalizing spatial interpolation as a conditional generative task. In this article, we design a novel deep learning architecture named conditional encoder-decoder generative adversarial neural networks (CEDGANs) for spatial interpolation, therein combining the encoder-decoder structure with adversarial learning to capture deep representations of sampled spatial data and their interactions with local structural patterns. A case study on elevations in China demonstrates the ability of our model to achieve outstanding interpolation results compared to benchmark methods. Further experiments uncover the learned spatial knowledge in the model’s hidden layers and test the potential to generalize our adversarial interpolation idea across domains. This work is an endeavor to investigate deep spatial knowledge using artificial intelligence. The proposed model can benefit practical scenarios and enlighten future research in various geographical applications related to spatial prediction.
International Journal of Geographical Information
Science
ISSN: 1365-8816 (Print) 1362-3087 (Online) Journal homepage: https://www.tandfonline.com/loi/tgis20
Di Zhu, Ximeng Cheng, Fan Zhang, Xin Yao, Yong Gao & Yu Liu
To cite this article: Di Zhu, Ximeng Cheng, Fan Zhang, Xin Yao, Yong Gao & Yu Liu (2019):
Spatial interpolation using conditional generative adversarial neural networks, International Journal
of Geographical Information Science, DOI: 10.1080/13658816.2019.1599122
To link to this article: https://doi.org/10.1080/13658816.2019.1599122
Published online: 16 Apr 2019.
RESEARCH ARTICLE
Di Zhu (a,b,c), Ximeng Cheng (a), Fan Zhang (a,d), Xin Yao (a), Yong Gao (a) and Yu Liu (a)
(a) Institute of Remote Sensing and Geographical Information Systems, School of Earth and Space Sciences,
Peking University, Beijing, China;
(b) Beijing Key Lab of Spatial Information Integration and Its Applications, Peking University, Beijing, China;
(c) SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, University College London,
London, UK;
(d) Senseable City Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
ARTICLE HISTORY
Received 18 April 2018
Accepted 20 March 2019
KEYWORDS
Spatial interpolation; generative adversarial networks; deep learning; encoder-decoder; spatial prediction
1. Introduction
When attempting to understand a geographical phenomenon, such as the spatial
distribution of precipitation, we are often forced to collect a limited number of samples
instead of acquiring information at every possible location (Cochran 1963, Hedayat and
Sinha 1991, Goodchild et al. 1993, Thompson 1996, Fotheringham and Rogerson 2008).
Spatial interpolation is a traditional geostatistical operation that aims at predicting the
value z(x) at an unobserved location x given some sampled data z(x) at observed
locations x (Atkinson and Lloyd 2009, Lam 2009). Tobler's first law (TFL) of geography
(Tobler 1970) describes the essential nature of the real world from a geographic view.
CONTACT Yu Liu liuyu@urban.pku.edu.cn
© 2019 Informa UK Limited, trading as Taylor & Francis Group
Oliver and Webster (1990) further noted that spatially distributed data behave more like
random variables, where stochastic models are required to characterize the underlying
spatial autocorrelation (Hubert et al. 1981, Azaele et al. 2009, Fischer et al. 2010) and
spatial non-stationarity (Anselin 1995, Marsily et al. 2005, Fotheringham et al. 2017).
The complex features of spatial distribution patterns necessitate the development of
interpolation methods, of which kriging (Matheron 1963, Cressie 1990, Li and Heap
2011) is the most commonly used geostatistical method. Kriging methods can be roughly
divided into two types that conceptually rely on different approaches to modeling the spatial
variability. The first type of methods, such as simple kriging (SK), ordinary kriging (OK)
and cokriging, characterizes the spatial structural features by estimating the semi-variogram
cloud, which is a plot of the semi-variances γ(h) for paired data against the
distances h separating the paired data points, and uses the fitted model to make spatial
estimations (Matheron 1963, Diggle et al. 1998). The other type of methods, such as
regression kriging (RK) and universal kriging (UK) (Appelhans et al. 2015, Li et al. 2015),
makes predictions by combining a regression of the dependent variable on auxiliary
variables with the SK of the regression residuals (Hengl et al. 2007), which further leads
to the training-based multi-point geostatistics (MPS) (Mariethoz and Caers 2014).
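
To make the semi-variogram cloud concrete, the following minimal Python sketch computes, for every pair of sampled points, the separating distance h and the semi-variance of the pair. The semi-variance is taken here as half the squared difference of the two values, which is the usual definition but is an assumption, since the text does not spell it out. The function name and the toy data are hypothetical.

```python
from itertools import combinations
from math import hypot


def semivariogram_cloud(points, values):
    """Return one (h, gamma) tuple per pair of sampled points.

    h is the Euclidean distance separating the pair;
    gamma = 0.5 * (z_i - z_j) ** 2 is the pair's semi-variance (assumed definition).
    """
    cloud = []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        h = hypot(p[0] - q[0], p[1] - q[1])
        gamma = 0.5 * (values[i] - values[j]) ** 2
        cloud.append((h, gamma))
    return cloud


# Hypothetical toy sample: three locations with made-up attribute values.
points = [(0.0, 0.0), (3.0, 4.0), (0.0, 4.0)]
values = [1.0, 3.0, 2.0]
cloud = semivariogram_cloud(points, values)
for h, gamma in cloud:
    print(f"h = {h:.1f}, gamma = {gamma:.2f}")
```

Plotting gamma against h for all pairs gives the cloud; fitting a variogram model to it is the step that kriging then relies on, and it is not shown here.
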
Despite the above-mentioned endeavours in spatial interpolation, we have to admit
that the nature of spatial continuity and heterogeneity in geographical digital representations
(Goodchild 2004, Zhu et al. 2018) is substantially more complex than what classic
statistical models can describe (Shepard 1968, Oliver and Webster 1990). In recent years, deep
learning approaches have been increasingly used to understand spatial processes from
a data-driven perspective, as they can extract underlying patterns well given complex
spatial contexts. Convolutional neural networks (CNNs) have been proven to be extremely
efficient for high-dimensional data representation and function approximation
(LeCun et al. 2015). Through the backpropagation of gradients in the linear transform
layers combined with non-linear activations, these networks learn a way to transform
the input into an ideal output representation by capturing the deep features of generation
as the high-dimensional parameters (Le 2013, Schmidhuber 2014). More importantly,
the characteristics of the CNN architecture (local connectivity and shared
weights) enable the model to focus on features near to each other as well as far
away features, which is consistent with the function approximation objective in many
spatial analysis problems (Fischer 1998).
The workflow of spatial interpolation can be considered as a generative procedure:
only limited data on point supports (the space on which each observation is defined)
(Atkinson and Lloyd 2009) can be acquired, and the objective is to generate an accurate
global mapping of the spatial phenomenon through learning of observed reciprocities
among location attributes. A deep learning framework named generative adversarial
networks (GANs) (Goodfellow et al. 2014) was recently introduced as a powerful architecture
for training generative models, therein sidestepping the difficulty of approximating
many intractable probabilistic computations by adopting an adversarial structure to
train the loss (Radford et al. 2015, Salimans et al. 2016). Based upon the idea of GANs,
conditional generative adversarial networks (CGANs) are an extension of GANs that
enables us to direct the data generation process by conditioning the model on certain
external information (Mirza and Osindero 2014). The CGAN has been widely used in
various data generation applications such as image super-resolution (Chen et al. 2016,
Zhao et al. 2019), image-to-image translation (Isola et al. 2016), face generation (Antipov
et al. 2017) and terrain reconstruction (Guérin et al. 2017).
Previous research on CGANs mainly formalizes the deterministic conditions of the
generation as some loosely coupled auxiliary features with no spatial information, and
their objective is for the generator to create realistic-looking fake images that the
discriminator is unable to identify. For example, Antipov et al. (2017) successfully
simulated the face aging of people by using a random latent vector to represent
a person's identity and a conditional age term to control the generation. The accuracy
of the generated fake images is often beyond the scope of consideration in related
state-of-the-art CGANs (Lu et al. 2017, Laloy et al. 2018).
In contrast, spatial interpolation requires an accurate estimation of the real spatial
pattern instead of simply a realistic-looking reproduction. Therefore, a spatial extension
of state-of-the-art deep learning structures is needed to bridge the gap between CGANs
and the task of spatial interpolation such that an accurate global estimation given
certain spatial sampled data can be achieved.
This article introduces a novel idea of using conditional generative adversarial networks
to capture deep spatial features underlying spatial distribution datasets and to
perform spatial interpolation. To achieve this objective, we designed a deep learning
model named conditional encoder-decoder generative adversarial neural networks
(CEDGANs) with spatial consideration. Incorporating an encoder-decoder structure
with the idea of adversarial learning, the proposed model can learn the deep features
of input sampled spatial data and their complex interactions with local structural
patterns. A case study on the terrains in China demonstrates the ability of our model
to gain outstanding spatial interpolation results compared to benchmark methods.
Further experiments investigate the learned complex spatial knowledge and demonstrate
the potential of generalizing the CEDGAN-based spatial interpolation idea to more
geographical applications.
2. Methodology
Considering the gaps between spatial interpolation and common conditional generation
tasks, we need to explicitly consider both spatial structural patterns and interpolation
accuracies in the generative adversarial model. The proposed model is assumed to take
spatial sampled data as the only deterministic input (with no prior noise) and to
perform accurate generation using the knowledge captured during the adversarial
learning. For clarity, we will first briefly present the concept of GANs and the
state-of-the-art CGANs, and then, we will show how to construct the adversarial spatial
interpolation structure using a restructured CGAN.
2.1. Generative adversarial networks
Basically, the GAN framework introduced by Goodfellow et al. (2014) consists of two
models (G, D): a generator G that attempts to capture the data distribution and
a discriminator D estimating the probability that a sample comes from the real
dataset rather than G. To learn a generator distribution $p_g$ similar to the distribution
$p_{data}(x)$ of a dataset x, G usually maps a noise vector z from the prior distribution
$p_z(z)$ to the data space as G(z). The discriminator D outputs a single scalar representing
the probability that the input data come from the training set rather than the
generated samples of G.
G and D are trained following a two-player minimax game so that the parameters $\theta_g$ of
G are adjusted to maximally confuse the discriminator, i.e. minimizing $\log(1 - D(G(z)))$,
and the parameters $\theta_d$ of D are adjusted to make the best judgement, i.e. maximizing
$\log D(x) + \log(1 - D(G(z)))$. The objective function of the minimax game is
$$\min_{\theta_g} \max_{\theta_d} \left( \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \right). \tag{1}$$
The training of the adversarial network can be conducted through simultaneously
updating $\theta_d$ and $\theta_g$ by descending the stochastic gradient of logistic loss functions, i.e.
$$\nabla_{\theta_d} \frac{1}{2n} \sum_{i=1}^{n} \left[ \log(1 - D(x^{(i)})) + \log D(G(z^{(i)})) \right] \tag{2}$$
and
$$\nabla_{\theta_g} \frac{1}{n} \sum_{i=1}^{n} \left[ \log(1 - D(G(z^{(i)}))) \right], \tag{3}$$
respectively, where n is the number of samples in each data batch during training.
GANs can be extended to a conditional version named CGANs if both G and D are
conditioned on the same auxiliary information y, which can restrict G in its generation
process and D in its discrimination process. In previous works, the prior input noise
vector z and the condition y have been combined jointly as low-dimensional inputs for
G to generate different random fake data under the same condition, while the discriminator
receives x (or G(z, y)) and y as inputs to make a determination based on y without
considering z (Gauthier 2014, Mirza and Osindero 2014, Antipov et al. 2017). The
objective function of a CGAN is formalized as Equation (4):
$$\min_{\theta_g} \max_{\theta_d} \left( \mathbb{E}_{x \sim p_{data}(x)}[\log D(x, y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z, y), y))] \right). \tag{4}$$
2.2. Adversarial spatial interpolation using point supports as conditions
For spatial interpolation scenarios, however, the traditional adversarial strategy needs to
be modified to ensure the stability of conditional generations. Specifically, the random
noise vector z that is commonly used to generate random data samples should be
removed such that the conditional generation could be considered to be determined by
the sampled data as the only constraint.
Let the data space $V \triangleq \mathbb{R}^{C \times W \times H}$, where W and H represent the size of a spatial raster data
(spatial image) and C is the number of data channels. A real spatial image is defined as
$x \in V$. If the point supports (the space on which each observation is defined) of a sampling
configuration f on x with m sampled locations is
$f = [(c_1, r_1), (c_2, r_2), \ldots, (c_m, r_m)] \in \mathbb{R}^{2 \times m}$,
where $(c_k, r_k)$ is the coordinate of the kth observed point, we can formalize the sampled
spatial image $f(x) \in V$ as
$$f(x)(:, i, j) := \begin{cases} x(:, i, j) & \text{if } (i, j) \in f, \\ \text{N/A} & \text{otherwise.} \end{cases} \tag{5}$$
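
Equation (5) can be read as a masking operation. The sketch below is one hedged way to express it in plain Python: values at the sampled coordinates are kept and every other cell becomes N/A. Using `None` for N/A, zero-based (i, j) indices, and the name `sample_image` are assumptions of this illustration, not details given in the text.

```python
def sample_image(x, f):
    """Apply a sampling configuration f to a C x W x H image x (nested lists).

    Cells whose (i, j) index appears in f keep their value in every channel;
    all other cells are set to None, which stands in for N/A here.
    """
    keep = set(f)
    return [
        [
            [x[c][i][j] if (i, j) in keep else None for j in range(len(x[c][i]))]
            for i in range(len(x[c]))
        ]
        for c in range(len(x))
    ]


# Hypothetical single-channel 3 x 3 image and three sampled locations.
x = [[[1.0, 2.0, 3.0],
      [4.0, 5.0, 6.0],
      [7.0, 8.0, 9.0]]]
f = [(0, 0), (1, 2), (2, 1)]
fx = sample_image(x, f)
for row in fx[0]:
    print(row)
```

The output has the same shape as the input, which is what allows f(x) to be treated as an element of the same data space V as x.
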
When training an adversarial spatial interpolation network, we need a generator G that
takes the sampled image f(x) as input and outputs a generated fake image $G(f(x)) \in V$
as close to the real image x as possible. In addition, a discriminator D needs to be
trained to distinguish the fake image G(f(x)) from a real image x based on the sampled
image f(x). The objective function of adversarial spatial interpolation networks can be
defined as
$$\min_{\theta_g} \max_{\theta_d} \left( \mathbb{E}_{x \sim p_{data}(x \mid f(x))}[\log D(x, f(x))] + \mathbb{E}_{x \sim p_{data}(x \mid f(x))}[\log(1 - D(G(f(x)), f(x)))] \right), \tag{6}$$
where G is a differentiable function representing the generator's structure with parameters
$\theta_g$ and D is a differentiable function representing the discriminator's structure
with parameters $\theta_d$. G attempts to approximate a conditional probability distribution
$p_g(G(f(x)) \mid f(x))$ most similar to the conditional probability $p_{data}(x \mid f(x))$ in the real
dataset, therein minimizing the second term of Equation (6). Meanwhile, D judges
whether a spatial image came from $p_g(G(f(x)) \mid f(x))$ or $p_{data}(x \mid f(x))$, maximizing both
terms in Equation (6).
Compared with GANs and CGANs (see Equations (1) and (4)), both terms of Equation
(6) contain spatial conditional data f(x) deduced from the training data instead of
some explicit auxiliary conditional data y. Most importantly, our adversarial
spatial interpolation learning is designed to approximate the conditional generative
probability distribution given spatial sampled images ($p_{data}(x \mid f(x))$) rather than the
probability distribution of data existence ($p_{data}(x)$).
By discarding the prior noise vector z, the adversarial network only takes a pre-defined
sampling configuration function f to direct the generation, with no random
feature affecting the result of the spatial interpolation such that the output can be stable
given a sampled image. The basic requirement for spatial interpolation is that we will
not obtain two different interpolated images given the same sampled image. However,
if the scenario changes to where we allow multiple results given the same sampled
image, Equation (6) is actually not contradictory to Equation (4), as we can add a term z
to allow variations in the output.
We rephrase Equation (6) in the form of a binary cross-entropy (BCE) loss function $J(\theta)$
for clarity. Given a mini-batch $\{x^{(i)}\}_{i=1}^{n}$ of n training real spatial images, the loss function
for D is defined to let D assign a true label to real spatial images $x^{(i)}$ based on point
supports $f(x^{(i)})$ and a false label to generated fake spatial images $G(f(x^{(i)}))$ based on the
same $f(x^{(i)})$:
$$J(\theta_d) = \frac{1}{2n} \left[ \sum_{i=1}^{n} \log(1 - D(x^{(i)}, f(x^{(i)}))) + \sum_{i=1}^{n} \log D(G(f(x^{(i)})), f(x^{(i)})) \right]. \tag{7}$$
The loss function for G is similar but relates only to the second term of Equation (6) and
attempts to trick D:
$$J(\theta_g) = \frac{1}{n} \sum_{i=1}^{n} \log(1 - D(G(f(x^{(i)})), f(x^{(i)}))). \tag{8}$$
Then, we can minimize the loss function of the adversarial interpolation by simultaneously
updating $\theta_d$ and $\theta_g$ using stochastic gradient descent:
$\theta_g := \theta_g - \alpha \nabla J(\theta_g)$ and $\theta_d := \theta_d - \alpha \nabla J(\theta_d)$.
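
As a rough numerical illustration of Equations (7) and (8) and of the update rule, the sketch below evaluates both losses from lists of discriminator outputs and applies one gradient-descent step to a parameter vector. It is a sketch under stated assumptions: the discriminator outputs are supplied directly as numbers strictly between 0 and 1, the gradient is passed in rather than computed by backpropagation, and all function names are hypothetical.

```python
from math import log


def d_loss(d_real, d_fake):
    """Equation (7) as written in the text.

    d_real[i] is D(x_i, f(x_i)); d_fake[i] is D(G(f(x_i)), f(x_i)).
    All values must lie strictly between 0 and 1.
    """
    n = len(d_real)
    real_term = sum(log(1.0 - p) for p in d_real)
    fake_term = sum(log(p) for p in d_fake)
    return (real_term + fake_term) / (2 * n)


def g_loss(d_fake):
    """Equation (8): mean of log(1 - D(G(f(x_i)), f(x_i)))."""
    return sum(log(1.0 - p) for p in d_fake) / len(d_fake)


def sgd_step(theta, grad, alpha):
    """One update theta := theta - alpha * grad, element by element."""
    return [t - alpha * g for t, g in zip(theta, grad)]


# Hypothetical discriminator outputs for a mini-batch of two images.
confused = d_loss([0.5, 0.5], [0.5, 0.5])    # D cannot tell real from fake
confident = d_loss([0.9, 0.9], [0.1, 0.1])   # D separates real from fake
print(confused, confident, g_loss([0.5, 0.5]))
print(sgd_step([1.0, 2.0], [0.5, -1.0], alpha=0.1))
```

With these made-up numbers the discriminator loss is lower when D separates real from fake images than when it outputs 0.5 everywhere, which matches the direction of the minimization described above.
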
2.3. Conditional encoder-decoder generative adversarial networks for spatial
interpolation
In Section 2.2, we have defined the input and the training objective of an adversarial
spatial interpolation model. However, how to capture local spatial structural patterns
and how to make the model trainable have not yet been addressed.
As for the training, Radford et al. (2015) introduced a class of stable architectures for
training GANs named deep convolutional GANs (DCGANs), where they replaced pooling
layers with strided convolutions in the discriminator and fractional-strided convolutions
in the generator to conserve the image continuity information (Kingma and Welling
2014, Mansimov et al. 2015). However, the DCGAN's generator contains only a decoder
structure to generate images from noise, with no attempt to link deep features with
spatial constraints. Meanwhile, some encoder-decoder architectures, such as
SegNet (Badrinarayanan et al. 2017) and U-Net (Ronneberger et al. 2015), which use an
encoder structure to obtain the deep feature maps from inputs and a decoder to
upsample the deep features into full-size image representations (Long et al. 2015,
Isola et al. 2016), can be adopted to design our generative model, which needs to
capture deep spatial representations.
Here, we propose a conditional encoder-decoder generative adversarial network
(CEDGAN) to model adversarial spatial interpolation. The main structure of CEDGAN is
illustrated in Figure 1(a). A CEDGAN consists of a generator G and a discriminator D. G
attempts to learn the relationships between sampled spatial data and corresponding
real spatial data, with the objective of generating fake spatial data that are as accurate
as possible. D attempts to capture the correspondence between spatial data and the
sampled data, with the objective of determining whether the interpolated fake data can
be considered correct based on the limited samples.
In Figure 1(b), we display the details of G and D. The generator G is designed to be
a fully convolutional encoder-decoder structure that contains three two-dimensional
convolution layers as the encoder (convs 1, 2 and 3) and three two-dimensional
transposed convolution layers as the decoder (deconvs 1, 2 and 3). Each encoder layer
performs a zero-padding convolution with the given convolving kernel and stride
length. Each decoder layer implements the upsampling of the feature maps through
a fractionally strided transposed convolution with the same settings as those of the
encoder layers. The discriminator D is a convolutional neural network similar to typical
models of image classification except that we use a concatenation (concat) operation to merge the
sampled data f(x) and the full-size real data x (or fake data G(f(x))) as the input. Each
layer of D performs a zero-padding convolution with the same settings as those of the encoder
layers in G. The output of D is a scalar indicating whether the input full-size image is
a correct interpolation.
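
The symmetry between the encoder and decoder can be checked with the standard output-size formulas for convolutions and transposed convolutions. The sketch below traces a 32-pixel side length through three encoder and three decoder layers. The kernel size of 4, stride of 2 and zero padding of 1 are assumptions borrowed from common DCGAN practice; the actual settings are given in Figure 1(b), which is not reproduced in the text.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride, padding):
    """Output side length of a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


# Assumed layer settings, NOT taken from the paper.
KERNEL, STRIDE, PADDING = 4, 2, 1

sizes = [32]                      # side length of the sampled input image f(x)
for _ in range(3):                # encoder: convs 1, 2 and 3
    sizes.append(conv_out(sizes[-1], KERNEL, STRIDE, PADDING))
for _ in range(3):                # decoder: deconvs 1, 2 and 3
    sizes.append(deconv_out(sizes[-1], KERNEL, STRIDE, PADDING))
print(sizes)

# D receives the sampled image concatenated with the full-size image.
# Stacking along the channel axis is an assumption of this sketch.
d_input_channels = 1 + 1
print(d_input_channels)
```

Under these assumed settings each encoder layer halves the side length and each decoder layer doubles it, so the generated image has the same size as the input.
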
Batch normalization (BN) (Ioffe and Szegedy 2015) is applied to all layers except for
the output layer of G and the input/output layer of D. This can avoid model instability
and help gradients flow in the networks. The LeakyReLU activation (Xu et al. 2015) is
used after convolutions, and the ReLU activation (Nair and Hinton 2010) is used after
transposed convolutions. For the output layers of G and D, we use the Tanh and Sigmoid
activation functions, respectively, according to Radford et al. (2015).
3. Experiments on spatial data: case of the DEM interpolation
3.1. Data descriptions
We use a dataset of digital elevation models (DEMs) in China as an example to test the
feasibility and effectiveness of the proposed CEDGAN model. However, we hope to
address problems not only in DEMs. The method can be applied to a broader range of
spatial data. We select DEMs as our case study simply because the ground-truth terrain
data can help us test the accuracy of interpolations and thus demonstrate the feasibility
of our adversarial model in capturing deep spatial features.
Figure 1. An illustration of how a conditional encoder-decoder generative adversarial network
(CEDGAN) works for spatial interpolation. (a) The main loop of training, where real images and
fake images are discriminated by D conditioned on the same sampled data, and the gradients of D's
output are used to update the models' parameters. (b) For G, sampled images f(x) are encoded into
spatial feature maps, and then, fractionally strided convolutions upsample the deep features into
fake spatial images G(f(x)) of full size. For the discriminator D, the real spatial images x (or fake
images G(f(x))) and the corresponding sampled images are merged as the input. The output of D is
a scalar to determine a correct interpolation.
Four representative subregions in mainland China are selected as the ground truths,
including the Shannan area of the Tibetan Plateau, the Sichuan Basin, the Pearl River Delta,
and the Qinling Mountains. These regions consist of various terrains that have a diverse
range of altitude and hypsography. An overview of the study areas is illustrated in Figure 2. The
GDEM Version 2¹ data for these areas are collected as the raw DEM dataset. After preprocessing,
single-channel DEM tiles (1 × 32 × 32) with no repetition are randomly cropped using
Monte Carlo simulation as the DEM images. To address the concerns of over-fitting and
memorization of training samples, we acquire 60,000 DEM images in total, with 48,000
images composing the training set and 12,000 images composing the validation set. For
each subarea, there are 15,000 ground-truth images, of which 12,000 are for training and
3,000 are for validation. Each DEM image covers a 0.1° × 0.1° geographic tile. The terrain
elevations in the dataset range from 7 m to 6,999 m. We first transform these images
linearly into float tensor images ([0.0, 1.0]); then, we normalize the tensor images with
a mean of 0.5 and a standard deviation of 0.5, which maps them to [−1.0, 1.0], for improved
training efficiency. All elevations are mapped back to their original values in the reported accuracies.
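
The two-step scaling and its inverse can be sketched as follows. This is a minimal illustration that assumes the linear transform is a min-max scaling between a lower bound `lo` and an upper bound `hi`; the text only states that the transform is linear. The function names and the bounds used in the example are hypothetical.

```python
def to_model_range(elevation, lo, hi):
    """Scale an elevation to [0, 1], then normalize with mean 0.5 and std 0.5."""
    unit = (elevation - lo) / (hi - lo)   # linear map to [0.0, 1.0]
    return (unit - 0.5) / 0.5             # shift and scale to [-1.0, 1.0]


def to_elevation(value, lo, hi):
    """Invert to_model_range so that errors can be reported in metres."""
    unit = value * 0.5 + 0.5
    return unit * (hi - lo) + lo


# Hypothetical bounds, chosen only to keep the arithmetic easy to follow.
LO, HI = 0.0, 1000.0
scaled = [to_model_range(e, LO, HI) for e in (0.0, 250.0, 1000.0)]
restored = [to_elevation(v, LO, HI) for v in scaled]
print(scaled)
print(restored)
```

The [−1.0, 1.0] range matches the Tanh activation used for the output layer of G, so generated values can be mapped back to elevations with the inverse function.
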
Noting that there are many indicators that can measure the performance of a spatial
interpolation method (Li and Heap 2011), we simply choose the root mean square error
(RMSE) to calculate the interpolation error ε at the pixel level, as it requires minimal
auxiliary information to utilize:
$$\varepsilon = \sqrt{\frac{\sum_{i=1}^{n} (p_i - o_i)^2}{n}}, \tag{9}$$
where n is the total number of pixels, p is the predicted value, and o is the observed value.
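
Equation (9) translates directly into a few lines of Python. The sketch below works on flat lists of pixel values; the function name and the example values are hypothetical.

```python
from math import sqrt


def rmse(predicted, observed):
    """Root mean square error between predicted and observed pixel values."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    n = len(predicted)
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


# Made-up elevations in metres: every prediction is off by exactly 1 m.
error = rmse([2.0, 4.0, 6.0, 8.0], [1.0, 5.0, 5.0, 9.0])
print(error)
```

Because the errors are squared before averaging, a few badly interpolated pixels raise ε more than many slightly wrong ones.
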
3.2. Adversarial training procedure
In this section, we show the results of spatial interpolation with a 10 × 10 uniform
sampling configuration u100 on 1 × 32 × 32 DEM images as an example. A 10 × 10
uniformly sampled DEM image contains the elevations of 100 locations that are evenly
distributed, and the other locations are null (Figure 3).
Figure 2. Study areas for the spatial interpolation of terrain elevations. Subregion A is the Shannan
area of the Tibetan Plateau. Subregion B is the Sichuan Basin. Subregion C is the Pearl River Delta.
Subregion D is the Qinling Mountains. We omit the north arrow and the map scale for simplicity.
The network is trained using mini-batch stochastic gradient descent (SGD) with
a batch size of 64. The training dataset with 48,000 DEM images is randomly divided
into 750 batches, with each batch containing 64 images (dropping the last batch with
fewer than 64 images). Based on the parameter suggestions of Radford et al. (2015), for
layers with LeakyReLU activation, we set the slope of the leak to be 0.2. In addition, we
use the Adam optimizer, where β1 = 0.5 and β2 = 0.999, and the learning rate α for
backpropagation is set to 0.0002. All gradients are computed using Equations (7) and (8).
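
The batch arithmetic and the hyperparameters quoted above can be collected in a small sketch. The dictionary keys and function names are hypothetical; the values are the ones stated in the text, and the batch totals of 60,000 and 150,000 are the ones the text uses when describing the training curve.

```python
# Training settings as quoted in the text; key names are illustrative only.
CONFIG = {
    "batch_size": 64,
    "leaky_relu_slope": 0.2,
    "adam_beta1": 0.5,
    "adam_beta2": 0.999,
    "learning_rate": 0.0002,
}


def batches_per_epoch(n_images, batch_size):
    """Number of full batches per epoch; an incomplete last batch is dropped."""
    return n_images // batch_size


def epochs_for(n_batches, per_epoch):
    """Convert a total batch count into epochs."""
    return n_batches / per_epoch


per_epoch = batches_per_epoch(48_000, CONFIG["batch_size"])
print(per_epoch)                        # batches in one pass over the training set
print(epochs_for(60_000, per_epoch))    # epochs covered by 60,000 batches
print(epochs_for(150_000, per_epoch))   # epochs covered by 150,000 batches
```

Since 48,000 is an exact multiple of 64, no images are actually dropped in this configuration.
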
The details of the adversarial training are shown in Figure 4. The evolution of our
model can be easily identified in the main plot of Figure 4(a), where the RMSE between
the generated fake data and real data is computed to plot the gray error curve. The
error curve shows that the accuracy of our model increases evidently during the first
60,000 batches (80 epochs) of training; after that, the improvement is not very
significant. We train on 150,000 batches (200 epochs) and find that the average interpolation
error per pixel gradually stabilizes at 2.5 m, which is small given that the
elevations range from 7 m to 6,999 m.
Apart from the relatively stable decreasing trend, we can see some sudden rises in the
gray error curve, which reflect the adversarial nature of our model: when a local
optimum is reached whereby G cannot further deceive D, G will jump out of the local
parameter space and attempt to find a more optimal solution. However, these jump-out
attempts usually return worse results. The sub-plot in the upper right of Figure 4(a)
illustrates the variation in the binary cross entropy loss (BCELoss) for D and G (Equations
(7) and (8)) throughout the training procedure. It can be observed that D (blue curve)
trends toward maximal confusion, with its loss approximating 0.5 and G's loss (orange
curve) continually improving.
Figure 3. Illustration of a 10 × 10 uniform sampling configuration (u100) on a single-channel 32 × 32
DEM image: (a) real image x; (b) sampled image u(x). Elevations are represented by gray-scale colors
so that the whiter a pixel is, the higher its elevation (all DEM images shown in this article share
the same colorbar if not indicated). Pixels in the sampled image with null value are displayed in blue.
A more comprehensible visualization of the combat between G and D is shown in
Figure 4(b), where the result after each trained batch is drawn as a scatter point. The
color of a point represents the number of trained batches, with the x value being the
BCELoss of D and the y value being the BCELoss of G. In this scatter plot, yellow points
roughly cluster in a small area, where BCELoss(D) lies between 0.1 and 0.5 and BCELoss(G)
lies between 3 and 5, indicating that our proposed adversarial model tends to converge to
its game equilibrium during the training.
3.3. Validation of the trained generator
To demonstrate that our generator is not producing high-quality interpolation results by
simply over-fitting or memorizing training samples, we apply the trained G of the 10 × 10
uniform sampling configuration on the validation set with 12,000 DEM images
different from the training data.
By randomly choosing mini-batches of real DEM images x from the validation set, we
invoke the generator model G every 10 epochs during the training process, input the
sampled DEM images u(x) into G and calculate the ε between generated fake DEM
images G(u(x)) and their corresponding x. The decreasing trend of the generator's
average interpolation error (Figure 5(a)) is similar to that of Figure 4(a), with no sign of
over-fitting. A generator trained on 200 epochs can also achieve an interpolation error
ε ≈ 2.5 m on the validation DEM images collected in the same area.
Figure 5(b) displays the evolution of our generator regarding visual fidelity. We list the
generated fake images G(u(x)) by epochs 0, 10, 20, 50, 100, and 200 on the same real
image x as an example. It is interesting to see that the generator G is not aware of any
knowledge at the very beginning, generating a noise image by epoch 0. Then, after 10
epochs of training, G quickly learns some fundamental knowledge of spatial interpolation,
such as the basic mapping between elevations and colors as well as a coarse spatial
continuity, and can produce a blurry fake image based on the given observations. After
that, G gradually achieves more accurate generation by attempting to add more terrain
details that seem to be correct, as we can see more valleys in the displayed images during
the evolution. Finally, by epoch 200, G is capable of producing a high-quality fake image
that is almost visually indistinguishable from the real image. Some fine details are still
missed, however: in the lower-left part of the real image, we can see two near-branches of
the valley, whereas no branching can be identified in the fake image by epoch 200
(Figure 5(b)). The ultimate accuracy may be limited by the spatial resolution of the given
sampling configuration; further discussion can be found in Section 3.4.
Figure 4. Training details of CEDGAN with a 10 × 10 uniform sampling configuration. (a) Variation of
the model accuracy and the BCELoss for G and D during the training procedure; early trainings with
ε > 15 m are not shown in the plot. (b) Illustration of the adversarial game between G and D, where
the model tends to converge to a game equilibrium during the training process.
3.4. Different spatial sampling configurations
The proposed CEDGAN model requires a separate training process for each spatial sampling
configuration. In practice, typical scenarios that require spatial interpolation may have
fewer sampled locations, and the distribution of the sampled locations can be irregular.
To address these concerns, we change the sampling configuration in two respects, the ratio
of sampled locations under uniform sampling and random sampling, to see how different
spatial sampling configurations will affect the performance of the model.
3.4.1. Ratio of sampled locations
We modify the ratio of the sampled locations, i.e. the number of sampled locations m
given a xed spatial image size, under the circumstances of systematic sampling (uni-
form sampling). Formally, ðci;rjÞis the coordinate of an observed value in an image of
size WH, which can be dened as
ci¼c1þði1ÞδW;rj¼r1þðj1ÞδH
"i;j¼1;;ffiffiffiffi
m
p;(10)
Figure 5. Generator's performance on the validation set with a 10 × 10 uniform sampling
configuration (u100). (a) Average interpolation error for different epochs. (b) Visualization of the
generated fake image G(u(x)) for different epochs on the same sampled image u(x).
where the initial sampled point is $(c_1, r_1)$, the interval $\delta_W = (W-1)/(\sqrt{m}-1)$, and
$\delta_H = (H-1)/(\sqrt{m}-1)$. An illustration of the sampled images $f(x)$ using different ratios
of uniform sampling on the 32 × 32 DEM image is shown in Figure 6.
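The uniform sampling grid of Equation (10) can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used in the experiments: the function names are hypothetical, coordinates are 0-based (so the initial point defaults to (0, 0)), and fractional coordinates are snapped to the nearest pixel, which is an assumption.

```python
import math

def uniform_sample_coords(width, height, m, c1=0.0, r1=0.0):
    """Return column and row coordinates of Eq. (10) for m = k * k samples."""
    k = math.isqrt(m)
    if k * k != m or k < 2:
        raise ValueError("m must be a perfect square >= 4")
    delta_w = (width - 1) / (k - 1)   # interval along the width
    delta_h = (height - 1) / (k - 1)  # interval along the height
    cols = [c1 + i * delta_w for i in range(k)]
    rows = [r1 + j * delta_h for j in range(k)]
    return cols, rows, delta_w, delta_h

def sampling_mask(width, height, m):
    """Binary mask (1 = sampled) on a height x width image."""
    cols, rows, _, _ = uniform_sample_coords(width, height, m)
    mask = [[0] * width for _ in range(height)]
    for r in rows:
        for c in cols:
            # Assumption: snap fractional coordinates to the nearest pixel.
            mask[round(r)][round(c)] = 1
    return mask

if __name__ == "__main__":
    for m in (36, 100):
        _, _, dw, _ = uniform_sample_coords(32, 32, m)
        print(m, round(dw, 2))
```

For a 32 × 32 image this gives an interval of 6.2 pixels for m = 36 and about 3.44 pixels for m = 100.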
We set the number of sampled locations m to be 36, 49, 64, 81, 100, 121, 144, 169, and
196, and we train the corresponding CEDGAN-based interpolation model separately. The
training processes with different sampling ratios are illustrated in Figure 7. By the end of
training, each model exhibits a near-convergent status with different final accuracies. As
the ratio of uniformly sampled locations increases, the final accuracy increases as well:
when m = 36, ε ≈ 4.2 m, while when m = 196, ε ≈ 2.0 m. In addition, we can see a more
(a) x (32 × 32) (b) m = 36, δ ≈ 6.20 (c) m = 100, δ ≈ 3.44 (d) m = 196, δ ≈ 2.07
Figure 6. Sampled images u(x) using different ratios of uniform sampling.
Figure 7. Training processes based on nine uniform sampling configurations with different sampled
location ratios.
unstable curve when increasing the sampling ratio, which indicates that the learning
ability of the generator G in our model becomes more dominant compared to that of the
discriminator D when more observed values are given, therein showing more attempts
to jump out of local optima of the parameter space.
Meanwhile, Figure 8 shows the evolution of ε for generators with different m on the
validation set. The decreasing trends of the interpolation error on the validation set are
similar to those of Figure 7 given different uniform sampling configurations, which
further demonstrates the usability of our model in common interpolation tasks. The
multiple generation processes on the validation set also indicate that our model does not
produce high-quality interpolation results by simply over-fitting or memorizing
training samples.
3.4.2. Random sampling
As for the random sampling r, we find that the final accuracy with m = 100 is similar to that
of a uniform sampling with m = 36, as both reach ε ≈ 4.2 m (Figure 9(a)). However, the
produced spatial pattern can be problematic when we randomly choose the sampled
locations for each input image, as shown in Figure 9(b). This is caused mainly by the
variation in inputs during the CEDGAN's training.
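The variation in inputs mentioned above can be illustrated with a small sketch that draws a fresh set of m sampled locations for every image. The function name and the use of a seeded `random.Random` generator are assumptions made for the illustration; the actual sampling code of the experiments is not given in the text.

```python
import random

def random_sampling_mask(width, height, m, rng):
    """Pick m distinct pixel locations uniformly at random (1 = sampled)."""
    chosen = rng.sample(range(width * height), m)
    mask = [[0] * width for _ in range(height)]
    for idx in chosen:
        mask[idx // width][idx % width] = 1
    return mask

if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the sketch repeatable
    # Each input image receives its own, differently placed set of 100 samples.
    masks = [random_sampling_mask(32, 32, 100, rng) for _ in range(3)]
    print([sum(map(sum, mask)) for mask in masks])
```

Because the sampled locations change from image to image, the generator never sees a fixed sampling geometry, unlike in the uniform case.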
If we undersample in some areas, the local spatial variation patterns may not be
captured. Oversampling, on the other hand, may result in redundant data. Figure 9(b)
displays the differences between the interpolated results G(r(x)) and some selected DEM
images x. It is interesting to find that the CEDGAN-based interpolation method can
generate visually appealing fake images regardless of how the sampled locations are
distributed, even if the generated terrains in certain local areas may not be correct.
Actually, all interpolation methods suffer from the influence of improper spatial sampling
Figure 8. Generation process based on nine uniform sampling configurations with different sampled
location ratios.
configuration to some extent. Although our model with random sampling still performs
well in terms of accuracy, the problem of improper sampling cannot be fully addressed
since the interpolated result may not be correct in some local structural patterns.
4. Discussion
4.1. Comparison with benchmark interpolation methods
To show how the CEDGAN-based spatial interpolation method outperforms classic
interpolation methods, we choose the inverse distance weighted (IDW) interpolation
(Shepard 1968) and ordinary kriging (OK) (Matheron 1963, Cressie 1990) as benchmarks
to test our method's performance in terms of accuracy, computing speed, batch processing
and visual fidelity. The CEDGAN-based model is implemented using PyTorch, a deep
learning framework in Python with GPU acceleration. IDW and OK are also implemented
in PyTorch by converting all variables into tensors for GPU acceleration. In this way, all
reported results in this section are computed using the same NVIDIA 1080TI GPU and
can be compared.
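For readers unfamiliar with the IDW benchmark, the following is a minimal sketch of inverse distance weighting in plain Python. It is not the PyTorch implementation described above: the function name and the `(col, row, value)` sample layout are hypothetical, all samples are used for every pixel (no search radius), and the default `power=2.0` simply mirrors the distance decay parameter used in the comparison.

```python
import math

def idw_interpolate(samples, width, height, power=2.0):
    """Inverse distance weighted interpolation on a height x width grid.

    samples: list of (col, row, value) tuples at observed locations.
    """
    grid = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            num = den = 0.0
            exact = None
            for sc, sr, value in samples:
                d = math.hypot(c - sc, r - sr)
                if d == 0.0:
                    exact = value  # pixel coincides with a sample
                    break
                w = 1.0 / d ** power  # weight decays with distance
                num += w * value
                den += w
            grid[r][c] = exact if exact is not None else num / den
    return grid

if __name__ == "__main__":
    transect = idw_interpolate([(0, 0, 10.0), (4, 0, 20.0)], width=5, height=1)
    print(transect[0])
```

Since every estimate is a weighted average of the observed values, IDW cannot create local extremes between samples, which is consistent with the blurry results reported for it in Figure 10.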
The comparisons between CEDGAN and IDW are listed in Table 1, where all CEDGANs
are trained for 200 epochs and the distance decay parameter of IDW is set to 2.0. We
apply these two methods under different uniform sampling ratios, check their average
interpolation errors at the pixel level and record the corresponding computing speed.
The result shows that CEDGAN achieves lower average errors (AE) than IDW.
For CEDGANs, the improvement in accuracy when increasing the sampling ratio is also
more significant than with IDW. In terms of computing speed (AS), CEDGAN (≈ 1.5e−3 s per
mini-batch) is approximately 1000 times faster than IDW (≈ 1.5 s), and with increasing sampling ratio,
CEDGAN does not exhibit an obvious slow-down. Note that we do not take the average
training time (AT) into consideration when comparing the computing speed because, for
a pre-trained CEDGAN model, training is only ever performed once and can be done
beforehand. Given a mini-batch of 64 spatial images (32 × 32), the training time in our
experiment is shown in the second column of Table 1. AT increases as the sampling ratio
increases, mainly because we need time to sample from the real images. For the u100
Figure 9. Experiment based on a random sampling with 100 sampled locations.
sampling configuration, the total training time for 200 epochs is 45,840 seconds
(0.3056 × 150,000), approximately 12.7 hours.
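The figures quoted above can be checked directly from the u100 row of Table 1. In the sketch below, the factor 150,000 is read as the number of mini-batch updates over 200 epochs, which is our interpretation of the product given in the text; the variable names are illustrative.

```python
# Values taken from the u100 row of Table 1.
at_u100 = 0.3056       # average training time per mini-batch (s)
as_cedgan = 0.001448   # average interpolation time per mini-batch, CEDGAN (s)
as_idw = 1.408         # average interpolation time per mini-batch, IDW (s)

# Assumption: 150,000 is the total number of mini-batch updates in 200 epochs.
n_updates = 150_000

total_seconds = at_u100 * n_updates
total_hours = total_seconds / 3600
speedup = as_idw / as_cedgan

print(f"total training time: {total_seconds:.0f} s = {total_hours:.1f} h")
print(f"IDW time / CEDGAN time: {speedup:.0f}")
```

The ratio of the two interpolation times for u100 is a little under 1000, in line with the order-of-magnitude statement in the text.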
The comparison in Table 1 does not consider ordinary kriging because kriging
methods are naturally not suitable for batch processing due to the problem of semivariogram
fitting. Among a batch of spatial images, the shapes of experimental semivariograms
can vary significantly from image to image; thus, a single arbitrary fitting curve
is insufficient to capture the complex spatial structures, and it is difficult to determine
a fitting function a priori. Actually, the computing speed and pixel-level accuracy of OK are
both inferior to those of IDW when applied to batches of spatial images.
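The semivariogram fitting problem can be made concrete with a short sketch. It computes an experimental semivariogram for a single image and evaluates the standard spherical model, the curve type that the OK benchmark in this section is set to. The function names are hypothetical, and restricting the lags to the row direction is a simplifying assumption; a full implementation would pool pixel pairs in all directions.

```python
def experimental_semivariogram(image, max_lag):
    """Row-direction experimental semivariogram gamma(h), h = 1..max_lag."""
    gammas = []
    for h in range(1, max_lag + 1):
        squared_diff, n_pairs = 0.0, 0
        for row in image:
            for c in range(len(row) - h):
                squared_diff += (row[c] - row[c + h]) ** 2
                n_pairs += 1
        gammas.append(squared_diff / (2 * n_pairs))
    return gammas

def spherical_model(h, nugget, sill, range_):
    """Spherical semivariogram model evaluated at lag h."""
    if h <= 0:
        return 0.0
    if h >= range_:
        return sill
    ratio = h / range_
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)

if __name__ == "__main__":
    ramp = [[0.0, 1.0, 2.0, 3.0]]   # steadily rising profile
    flat = [[1.0, 1.0, 1.0, 1.0]]   # no spatial variation at all
    print(experimental_semivariogram(ramp, 3))
    print(experimental_semivariogram(flat, 3))
    print(spherical_model(5.0, nugget=0.0, sill=1.0, range_=10.0))
```

The two toy profiles give experimental curves of very different shape, so one set of nugget, sill and range parameters cannot describe both, which is the batch-processing difficulty noted above.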
Figure 10 visualizes a batch of ground-truth DEM images and fake DEM images
generated by CEDGAN, IDW and OK under a 10 × 10 uniform sampling. Again, the
CEDGAN has been trained for 200 epochs, and the distance decay parameter of IDW is
set to 2.0. OK is implemented based on the PyKrige 1.3.2 package (see Note 2), and we set the
fitting curve to be spherical. The visual comparison between our method and benchmark
methods is highly encouraging: given a relatively low sampling ratio (< 10%), both
Table 1. Comparisons of the CEDGAN-based and inverse distance weighted interpolation.
f       AT (s)    AE (m)             AS (s)
                  CEDGAN   IDW       CEDGAN     IDW
u36     0.1257    4.117    4.432     0.001471   1.397
u49     0.1620    3.433    3.539     0.001511   1.402
u64     0.2035    3.192    3.321     0.001611   1.428
u81     0.2445    2.693    3.160     0.001459   1.447
u100    0.3056    2.587    2.951     0.001448   1.408
u121    0.3629    2.455    2.794     0.001579   1.547
u144    0.4327    2.060    2.758     0.001588   1.582
u169    0.5006    2.156    2.708     0.001702   1.659
u196    0.5981    1.977    2.636     0.001659   1.711
f is the sampling configuration, AT is the average training time for a mini-batch in CEDGAN, AE is the
average interpolation error (ε) at the pixel level, and AS is the average time for interpolating
a mini-batch of spatial images (each mini-batch contains 64 spatial images).
Figure 10. Visual comparison of the interpolation results of CEDGAN, IDW and OK based on a 10 × 10
uniform sampling configuration. No data augmentation is applied to any DEM images to show
the difference in terrains within a mini-batch; thus, the contrast ratio in some images may not be
high enough to be visible.
Note that it is actually unfair to compare different spatial interpolation methods under the same circumstances: Kriging
and IDW are powerful when we do not have training data; training-based methods, such as MPS (Mariethoz and Caers
2014), are powerful when we have already acquired a physical model for the spatial process; and a well-trained
CEDGAN can provide satisfactory results without prior domain knowledge. We treat each spatial interpolation method
given its corresponding advantage; one should choose the most suitable method in practice.
IDW and OK can only produce fake images that are very blurry (Figure 10(c) and 10(d)),
whereas CEDGAN can generate fake images (Figure 10(b)) that are very similar to the
real images (Figure 10(a)).
4.2. Investigation of the learned spatial knowledge
The reason why the pre-trained generator G outperforms the benchmark spatial interpolation
methods in both accuracy and visual fidelity (Figure 10) is that we train the
generator through an encoder-decoder structure that can capture local geographical
structure patterns underlying the spatial distribution dataset after multiple adversarial
learning processes. Basically, the encoder module in G looks for the relationships among
the sampled locations, while the decoder module assembles structural spatial patterns
with the determined sampled locations and outputs the most convincing spatial distribution.
In the case of a DEM, the structure patterns may be valleys and ridges of various
morphometric types (Wang et al. 2010). Thus, the generated fake spatial images can be
very similar to the real images because of these combined local patterns, and the
accuracy is guaranteed by a suspicious discriminator that makes judgements based on
the a priori sampled images.
4.2.1. Visualization of feature maps in a pre-trained generator
To further understand what spatial knowledge the CEDGAN-based spatial interpolation
model has learned, we adopted a pre-trained generator to visualize the feature maps in
the hidden layers during the generation process. Eight typical DEM images with different
terrains are selected from the validation set. After a 10 × 10 uniform sampling, we input
the sampled DEM images into a pre-trained G with u100 and 200 epochs of training.
Here, we only display some representative feature maps captured in the first hidden
layer (layer 1) and the last hidden layer (layer 5) of G (see Figure 1(b)). These two layers
belong to the encoder and decoder module, respectively, making their feature maps
worth investigation. In addition, since layer 1/layer 5 is the closest layer to the input/
output layer, it is easier to interpret its corresponding feature maps (Figure 11).
It can be observed that feature maps in layer 1 (fm(1)) aim at capturing the local
continuities around certain sampled locations as well as the relationships among
sampled locations, showing grid-style patterns with local hotspots and linear connections.
After encoding and decoding, the feature maps in layer 5 (fm(5)) appear to have
captured some structural patterns related to different terrain features such as valleys and
mountains. More importantly, the 1st and 3rd feature maps (from the left) in fm(1) are
very similar; however, their corresponding feature maps in fm(5) are significantly different.
A similar situation occurs for the 5th and 8th images. This phenomenon shows that
the pre-trained generator achieves good interpolation by learning many possible local
terrain patterns and that it somehow manages to merge these local patterns with
deterministically sampled locations.
Therefore, instead of remembering training samples, our CEDGAN-based spatial
interpolation model captures complex spatial features underlying the given spatial
dataset and can perhaps be generalized into other domains with different distribution
patterns but similar deep spatial features (a further discussion about the model's
generalization ability is given in Section 4.3).
4.2.2. Slope analysis
In addition, we investigate the relationships among local slopes and the interpolation
accuracy at the pixel level. The results are illustrated in Figure 12. For each pixel of
a DEM image, we calculate the plane slope of a 3 × 3 neighborhood around it (fewer
neighbors for edge pixels) using the average maximum technique introduced by
Burrough and Mcdonnel (1999). The lower the slope value, the flatter the terrain, and the
Figure 11. Deep feature maps in a pre-trained generator (200 epochs) with a 10 × 10 uniform
sampling. Each image is visualized using an independent color scale.
Figure 12. Correlations between local slopes and the accuracy of CEDGAN-based spatial
interpolation.
elevations are considered more spatially continuous; the higher the slope value, the
steeper the terrain, and the elevations are less spatially continuous.
On the left side of Figure 12, we display the real DEM images (Real), the corresponding
slope images (Slope), the fake DEM images generated by u100 after 200 epochs (Fake), and
the error images (Error) for a mini-batch of the validation set. The variation pattern of the
error images seems to be correlated with that of the slope images. Then, we plot the
relationships between the slopes and errors as a kernel density estimation on the right side
of Figure 12. The slopes and errors are normalized to [0, 1] for the visualization. The Pearson
correlation coefficient ρ = 0.44 and the Spearman correlation coefficient r = 0.52 indicate
a positive correlation between the local slope of terrains and the interpolation error.
Since the CEDGAN-based model is designed to capture spatial dependencies as basic
knowledge through the convolutional layers in G's encoder-decoder structure, it is
naturally more difficult to predict values at locations where the spatial attribute values
vary too quickly. Slope analysis also shows that our model is designed to learn some
typical local spatial patterns of attributes (as shown in Section 4.2.1); thus, an abnormal
pattern, e.g. a very steep slope, is not easy to reproduce.
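The two correlation coefficients reported in this section can be computed from per-pixel slope and error values as sketched below. The helper names and the toy input lists are hypothetical and do not reproduce the experimental data; the Spearman coefficient is obtained as the Pearson coefficient of the ranks, with tied values receiving their average rank.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

if __name__ == "__main__":
    # Hypothetical flattened per-pixel slopes and absolute errors.
    slopes = [0.05, 0.10, 0.20, 0.40, 0.80]
    errors = [0.5, 0.4, 1.1, 1.0, 3.2]
    print(pearson(slopes, errors), spearman(slopes, errors))
```

Reporting both coefficients is useful here because the Spearman coefficient only requires a monotonic relationship between slope and error, whereas the Pearson coefficient measures a linear one.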
4.3. Potentials and limitations
4.3.1. Potential to apply pre-trained models across domains
Assuming the pre-trained CEDGAN model has learned enough DEM deep features because
our training dataset covers various terrains in mainland China, including plateau mountain
areas, basin areas, high altitude plains as well as river deltas (Section 3.1), we hope to answer
questions about how to solve problems in new domains through the transfer of the learned
spatial knowledge. Applying the pre-trained model outside our study area can help test the
model's generalization ability. If the deep spatial features captured before are capable of
describing patterns in a new area, the generator should achieve satisfactory interpolation
results without any parameter fine tuning (Yosinski et al. 2014).
We choose data from Florence, Italy as a case study to conduct the transfer experiment.
Florence is the capital of the Italian region of Tuscany. It lies in a basin surrounded by
hills, with several rivers flowing through it. The elevations in this area
range from 22 m to 1,626 m, which is quite different from that of the selected areas in
China (7 m to 6,999 m). Meanwhile, the terrain of the Florence, Italy area can be
considered as a basin-mountain area with relatively low altitude, which is not explicitly
given during the previous model training for China.
Figure 13 illustrates how we use a pre-trained generator for China with a 6 × 6
uniform sampling configuration (u36) and 200 epochs of training to interpolate the
DEM data of Florence, Italy. We cropped 3,000 real DEM images of size 1 × 32 × 32 using
the same method explained in Section 3.1. The overall interpolation accuracy reaches
approximately 9.1 m per pixel before any model fine tuning is performed. Some fake
DEM images are displayed to help understand the result.
The accuracy for Florence is not as high as the 4.2 m obtained with u36 in Figure 7,
although it is acceptable because the terrains of Florence are indeed very different from
those of the previous training set. The fake images in Figure 13 indicate that when transferring
the pre-trained model to a new domain, the generator can still generate realistic-looking
DEM images with similar local terrain structural patterns compared to the real images.
This experiment demonstrates that deep feature maps captured by our model for China
can be transferred to address new terrains in Florence. The pre-trained model can be applied
across domains if the spatial features in the new domain can be considered roughly similar to
those of the previous domain. However, if the features between two domains are too
different, e.g. transferring from a DEM dataset to a meteorological dataset, the pre-trained
model may need additional training data in the new domain to improve its performance.
Depending on the domains, completely new training might be necessary if data are available.
4.3.2. Limitations and future directions
In this methodology-oriented paper, we use a large DEM dataset that contains various
ground-truth terrains to validate the feasibility and test the stability of our method.
Admittedly, spatial interpolation based on CEDGAN, as proposed in this research, has
some limitations that invite future works to investigate.
This adversarial deep learning framework requires some training data to capture the
complex spatial patterns in certain domains and perform interpolation based on this
learned knowledge. However, in most scenarios of traditional GIS that need spatial
interpolation, we often do not have access to sufficient ground-truth data to train the
CEDGAN model. In Section 4.2, we investigated the learned spatial deep features of our
pre-trained model, and in Section 4.3.1 we tested its ability to be transferred across
domains. In practice, spatial deep features may be highly different when we transfer
a model pre-trained in a domain with sufficient spatial data into another domain with
little ground truth. It is basically impossible to expect the pre-trained model to perform
well in an unknown domain with no fine tuning of the model's parameters.
As good spatial coverage of sampled data is essential to retaining local spatial
variabilities in spatial interpolation, a lower sampling density would cause a worse
interpolation result. This is a truth that cannot be overcome by existing methods. With
Figure 13. Using the pre-trained generator for China to interpolate data on Florence, Italy.
the emergence of geospatial big data, the acquisition of historical spatial distribution
datasets with very-high spatiotemporal resolution has become much easier. It is possible
to use these historical data as training sets and train our CEDGAN model to capture
spatial knowledge about geographical phenomena that are of interest, and thus repro-
duce spatial patterns with more realistic details.
For example, if we have the historical precipitation data of N meteorological observatories,
it is practical to train a CEDGAN with only a small number of observatories as
the sampled locations. In this way, we can reduce the number of active observatories to
achieve cost savings. Moreover, if the captured deep spatial patterns of precipitation are
representative, we can directly transfer the pre-trained generator into a new area with
insufficient meteorological observatories. Similarly, in a smart city, multiple types of
sensors are deployed with high spatial resolution to record the activities of urban
citizens. The CEDGAN-based spatial interpolation idea can help significantly reduce the
number of sensors and contribute to the development of a smart city.
5. Conclusions
Deep learning approaches are increasingly used to understand spatial processes from
a data-driven perspective, as they are powerful in terms of their ability to extract underlying
patterns given complex spatial contexts. The remarkable characteristics of convolutional
neural networks (local connectivity and shared weights) enable the deep learning models
to better focus on both features near each other and far-away features and thus provide
a way to non-linearly approximate the complex functions describing spatial patterns.
Spatial interpolation is a family of geostatistical methods that attempts to capture the
spatial variation patterns underlying the limited observed spatial samples and make
a reasonable estimation of spatial patterns based on both spatial continuity and heterogeneity.
Since the workflow of spatial interpolation can be basically regarded as a generative
procedure, we demonstrate, for the first time, the feasibility of spatial interpolation based on
a modern deep learning framework named conditional generative adversarial neural networks.
We design a conditional encoder-decoder generative adversarial network (CEDGAN)
that can capture the complex properties of input spatial data distributions and perform spatial
interpolation tasks under different circumstances. A CEDGAN consists of a generator G and
a discriminator D. The generator G attempts to learn the relationships among sampled spatial
data and corresponding real spatial data, and it uses the learned spatial knowledge to
generate fake spatial data as accurately as possible. The discriminator D captures the
correspondences among spatial data and their sampled data, with the objective of determining
whether the generated fake data from G can be considered correct.
A case study on terrain interpolation for China showed that the CEDGAN-based
method can achieve an error of approximately 2.5 meters per location even when the
sampling ratio is less than 10%. Different sampling configurations were adopted to test the
stability of our proposed method. The CEDGAN-based spatial interpolation outperforms
benchmark approaches, such as inverse distance weighted (IDW) interpolation and ordinary
kriging (OK), in terms of accuracy, batching capability, computing speed and visual fidelity.
In addition, multiple experiments were conducted to investigate the learned complex
spatial knowledge in pre-trained models, and we discussed the potential of generalizing
the CEDGAN-based spatial interpolation idea to a broader range of GIS applications.
Our work is a positive attempt to incorporate artificial intelligence into discovering
deep spatial features of geographical patterns. We introduce the idea of using conditional
adversarial generation to model the workflow of spatial interpolation, which will
hopefully enlighten future work concerning spatial prediction. With the rapid
development of big geo-data and artificial intelligence, the CEDGAN framework can
potentially be adopted in various geographic applications that are related to spatial
estimation, including both natural phenomena (precipitation, air temperature, air pressure,
etc.) and socio-economic phenomena (population, poverty, traffic, etc.).
Notes
1. METI of Japan and NASA released a second version of the Global Digital Elevation Model
(GDEM) from the Advanced Spaceborne Thermal Emission and Reflection Radiometer
(ASTER) in mid-October, 2011 (https://lpdaac.usgs.gov/). GDEM V2 has an overall accuracy
of approximately 17 m at the 95% confidence level, and we consider these data as the
ground-truth elevations in this work.
2. https://pypi.python.org/pypi/PyKrige.
Acknowledgments
The authors would like to thank Dr. Lei Dong, Dr. Michael Goodchild, Dr. Tao Cheng, Dr. Krzysztof
Janowicz, Dr. May Yuan and the anonymous referees for their insightful comments.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
This research was supported by the National Natural Science Foundation of China [41625003 and
41830645], the National Key Research and Development Program of China [2017YFB0503602]
and the Open Project Fund of the Institute for China Sustainable Urbanization, Tsinghua University
(TUCSU-K-17026-01).
Notes on contributors
Di Zhu received his B.S. in Geographic Information Systems from Peking University and a dual B.S.
in Economics also from Peking University. He is currently a PhD candidate at the Institute of
Remote Sensing and Geographical Information Systems, Peking University. His research interests
include geospatial modelling, social sensing and applied artificial intelligence.
Ximeng Cheng received the B.S. and M.S. degrees from China University of Geosciences (Beijing).
He is currently a PhD candidate in GIScience at the Institute of Remote Sensing and
Geographical Information Systems, Peking University. His research interests include spatiotem-
poral data mining, deep learning and urban analysis etc.
Fan Zhang received his B.S. degree from Beijing Normal University, Zhuhai and M.Sc. and
PhD degrees from Chinese University of Hong Kong. He is currently a postdoctoral fellow at the
Institute of Remote Sensing and Geographical Information Systems, Peking University. His research
interests include spatiotemporal data mining, machine learning and computer vision.
Xin Yao received his B.S. degree from Wuhan University in 2015. He is currently pursuing the PhD
degree in GIScience with the Institute of Remote Sensing and Geographical Information Systems,
Peking University. His primary research interest lies in spatial data mining and geographic
information visualization.
Yong Gao received the B.S. degree from Beijing Normal University in 1997 and the M.S. and PhD
degrees from Peking University in 2000 and 2003, respectively. He is currently an Associate
Professor of GIScience with the Institute of Remote Sensing and Geographical Information
Systems, Peking University. His research interests lie in spatial data mining, geographic informa-
tion retrieval, and high-performance computing with geographical data.
Yu Liu received the B.S., M.S. and PhD degrees from Peking University in 1994, 1997 and 2003. He
is currently a professor at the Institute of Remote Sensing and Geographical Information Systems,
Peking University. His research interest mainly concentrates in humanities and social science based
on big geo-data.
ORCID
Di Zhu http://orcid.org/0000-0002-3237-6032
Ximeng Cheng http://orcid.org/0000-0001-9923-7240
Yu Liu http://orcid.org/0000-0002-0016-2902
References
Anselin, L., 1995. Local indicators of spatial association–LISA. Geographical Analysis, 27 (2), 93–115.
doi:10.1111/j.1538-4632.1995.tb00338.x
Antipov, G., Baccouche, M., and Dugelay, J.L., 2017. Face aging with conditional generative
adversarial networks. In: 2017 IEEE International Conference on Image Processing (ICIP),
2089–2093, Beijing, China.
Appelhans, T., et al., 2015. Evaluating machine learning approaches for the interpolation of
monthly air temperature at Mt. Kilimanjaro, Tanzania. Spatial Statistics, 14, 91–113.
doi:10.1016/j.spasta.2015.05.008
Atkinson, P.M. and Lloyd, C.D., 2009. Geostatistics and spatial interpolation. In: The SAGE handbook
of spatial analysis. 159–181. London, United Kingdom: SAGE Publications.
Azaele, S., et al., 2009. Predicting spatial similarity of freshwater fish biodiversity. Proceedings of the
National Academy of Sciences, 106 (17), 7058–7062. doi:10.1073/pnas.0805845106
Badrinarayanan, V., Kendall, A., and Cipolla, R., 2017. SegNet: a deep convolutional
encoder-decoder architecture for scene segmentation. IEEE Transactions on Pattern Analysis &
Machine Intelligence, 39 (12), 2481–2495.
Burrough, P.A. and Mcdonnel, R.A., 1999. Principles of geographical information systems - spatial
information systems and geostatistics. Landscape & Urban Planning, 15 (3), 357–358.
Chen, Z., et al., 2016. Convolutional neural network based DEM super resolution. ISPRS -
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences,
XLI-B3, 247–250. doi:10.5194/isprsarchives-XLI-B3-247-2016
Cochran, W.G., 1963. Sampling techniques. Hoboken, New Jersey, US: Wiley.
Cressie, N., 1990. The origins of kriging. Mathematical Geology, 22, 239–252. doi:10.1007/BF00889887
Diggle, P.J., Tawn, J.A., and Moyeed, R.A., 1998. Model-based geostatistics. Journal of the Royal
Statistical Society: Series C (Applied Statistics), 47 (3), 299–350. doi:10.1111/1467-9876.00113
Fischer, M.M., 1998. Computational neural networks: a new paradigm for spatial analysis.
Environment and Planning A, 30 (10), 1873–1891. doi:10.1068/a301873
Fischer, M.M., Reismann, M., and Scherngell, T., 2010. Spatial interaction and spatial
autocorrelation. In: L. Anselin and S.J. Rey, eds. Perspectives on spatial data analysis. Berlin,
Heidelberg: Springer Berlin Heidelberg, 61–79. doi:10.1007/978-3-642-01976-0_5
Fotheringham, A.S. and Rogerson, P.A., 2008.The SAGE handbook of spatial analysis. London,
United Kingdom: SAGE Publications.
Fotheringham, A.S., Yang, W., and Kang, W., 2017. Multiscale geographically weighted regression
(MGWR). Annals of the American Association of Geographers, 107 (6), 12471265. doi:10.1080/
24694452.2017.1352480
Gauthier, J., 2014. Conditional generative adversarial nets for convolutional face generation. In:
Class project for Stanford CS231N: convolutional neural networks for visual recognition, Winter
semester. 5. Stanford, CA, US.
Goodchild, M.F., 2004. GIScience, geography, form, and process. Annals of the Association of
American Geographers, 94 (4), 709714.
Goodchild, M.F., Anselin, L., and Deichmann, U., 1993. A framework for the areal interpolation of
socioeconomic data. Environment & Planning A, 25 (3), 383397. doi:10.1068/a250383
Goodfellow, I.J., et al., 2014. Generative adversarial nets. In:Advances in Neural Information
Processing Systems, 26722680. Montréal, Canada.
Guã©Rin, E., et al., 2017. Interactive example-based terrain authoring with conditional generative
adversarial networks. Acm Transactions on Graphics, 36 (6), Article No. 228.
Hedayat, A. and Sinha, B.K., 1991.Design and inference in nite population sampling. Hoboken, New
Jersey, US: Wiley.
Hengl, T., Heuvelink, G.B., and Rossiter, D.G., 2007. About regression-kriging: from equations to
case studies. Computers & Geosciences, 33 (10), 13011315. doi:10.1016/j.cageo.2007.05.001
Hubert, L.J., Golledge, R.G., and Costanzo, C.M., 1981. Generalized procedures for evaluating spatial
autocorrelation. Geographical Analysis, 13 (3), 224233. doi:10.1111/j.1538-4632.1981.tb00731.x
Ioe, S. and Szegedy, C., 2015. Batch normalization: accelerating deep network training by
reducing internal covariate shift. International Conference on Machine Learning, 448456. Lille,
France.
Isola, P., et al., 2016. Image-to-image translation with conditional adversarial networks. arXiv
preprint,p. arXiv:1611.07004.
Kingma, D.P. and Welling, M., 2014. Auto-encoding variational bayes. In:International Conference
on Learning Representations (ICLR) 2014.Ban, Canada.
Laloy, E., et al., 2018. Trainingimage based geostatistical inversion using a spatial generative
adversarial neural network. Water Resources Research, 54, 381406. doi:10.1002/2017WR022148
Lam, N., 2009. Spatial interpolation. International Encyclopedia of Human Geography,10(2),369376.
Le, Q.V., 2013. Building high-level features using large scale unsupervised learning. In:IEEE
International Conference on Acoustics, Speech and Signal Processing, 85958598. Vancouver,
Canada.
LeCun, Y., Bengio, Y., and Hinton, G., 2015. Deep learning. Nature, 521 (7553), 436444.
doi:10.1038/nature14539
Li, J. and Heap, A.D., 2011. A review of comparative studies of spatial interpolation methods in
environmental sciences: performance and impact factors. Ecological Informatics, 6 (3), 228241.
doi:10.1016/j.ecoinf.2010.12.003
Li, L., Romary, T., and Caers, J., 2015. Universal kriging with training images. Spatial Statistics, 14,
240268. doi:10.1016/j.spasta.2015.04.004
Long, J., Shelhamer, E., and Darrell, T., 2015. Fully convolutional networks for semantic segmenta-
tion. In:Proceedings of the IEEE conference on computer vision and pattern recognition.
34313440. Boston, MA, US.
Lu, Y., Tai, Y.W., and Tang, C.K., 2017. Conditional CycleGAN for attribute guided face image
generation. arXiv preprint, p. arXiv:1705.09966.
Mansimov, E., Parisotto, E., Ba, J.L., and Salakhutdinov, R., 2015. Generating images from captions
with attention. arXiv preprint, p. arXiv:1511.02793.
Mariethoz, G. and Caers, J., 2014. Multiple-point geostatistics: stochastic modeling with training
images. Hoboken, New Jersey, US: Wiley.
Marsily, G.D., et al., 2005. Dealing with spatial heterogeneity. Hydrogeology Journal, 13 (1), 161–183.
doi:10.1007/s10040-004-0432-3
Matheron, G., 1963. Principles of geostatistics. Economic Geology, 58 (8), 1246–1266.
doi:10.2113/gsecongeo.58.8.1246
Mirza, M. and Osindero, S., 2014. Conditional generative adversarial nets. arXiv preprint, p.
arXiv:1411.1784.
Nair, V. and Hinton, G.E., 2010. Rectified linear units improve restricted Boltzmann machines.
International Conference on Machine Learning, 807–814. Haifa, Israel.
Oliver, M.A. and Webster, R., 1990. Kriging: a method of interpolation for geographical information
systems. International Journal of Geographical Information Systems, 4 (3), 313–332.
doi:10.1080/02693799008941549
Radford, A., Metz, L., and Chintala, S., 2015. Unsupervised representation learning with deep
convolutional generative adversarial networks. arXiv preprint, p. arXiv:1511.06434.
Ronneberger, O., Fischer, P., and Brox, T., 2015. U-Net: convolutional networks for biomedical
image segmentation. In: International Conference on Medical Image Computing and
Computer-Assisted Intervention, 234–241. Munich, Germany.
Salimans, T., et al., 2016. Improved techniques for training GANs. In: Advances in Neural Information
Processing Systems, 2234–2242. Barcelona, Spain.
Schmidhuber, J., 2014. Deep learning in neural networks: an overview. Neural Networks, 61,
85–117. doi:10.1016/j.neunet.2014.09.003
Shepard, D., 1968. A two-dimensional interpolation function for irregularly-spaced data. In: ACM
National Conference, 517–524. New York, NY, US. doi:10.1055/s-0028-1105114
Thompson, S.K., 1996. Adaptive sampling. Hoboken, New Jersey, US: Wiley.
Tobler, W.R., 1970. A computer movie simulating urban growth in the Detroit region. Economic
Geography, 46, 234–240. doi:10.2307/143141
Wang, D., et al., 2010. Morphometric characterisation of landform from DEMs. International Journal
of Geographical Information Science, 24 (2), 305–326. doi:10.1080/13658810802467969
Xu, B., et al., 2015. Empirical evaluation of rectified activations in convolutional network. arXiv
preprint, p. arXiv:1505.00853.
Yosinski, J., et al., 2014. How transferable are features in deep neural networks? In: Advances in
Neural Information Processing Systems, 3320–3328. Montréal, Canada.
Zhao, L., et al., 2019. Simultaneous color-depth super-resolution with conditional generative
adversarial networks. Pattern Recognition, 88, 356–369. doi:10.1016/j.patcog.2018.11.028
Zhu, D., et al., 2018. Inferring spatial interaction patterns from sequential snapshots of spatial
distributions. International Journal of Geographical Information Science, 32 (4), 783–805.
doi:10.1080/13658816.2017.1413192