International Journal of Geographical Information
Science
ISSN: 1365-8816 (Print) 1362-3087 (Online) Journal homepage: https://www.tandfonline.com/loi/tgis20
Spatial interpolation using conditional generative
adversarial neural networks
Di Zhu, Ximeng Cheng, Fan Zhang, Xin Yao, Yong Gao & Yu Liu
To cite this article: Di Zhu, Ximeng Cheng, Fan Zhang, Xin Yao, Yong Gao & Yu Liu (2019):
Spatial interpolation using conditional generative adversarial neural networks, International Journal
of Geographical Information Science, DOI: 10.1080/13658816.2019.1599122
To link to this article: https://doi.org/10.1080/13658816.2019.1599122
Published online: 16 Apr 2019.
RESEARCH ARTICLE
Spatial interpolation using conditional generative
adversarial neural networks
Di Zhu (a,b,c), Ximeng Cheng (a), Fan Zhang (a,d), Xin Yao (a), Yong Gao (a) and Yu Liu (a)
(a) Institute of Remote Sensing and Geographical Information Systems, School of Earth and Space Sciences, Peking University, Beijing, China;
(b) Beijing Key Lab of Spatial Information Integration and Its Applications, Peking University, Beijing, China;
(c) SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, University College London, London, UK;
(d) Senseable City Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
ABSTRACT
Spatial interpolation is a traditional geostatistical operation that
aims at predicting the attribute values of unobserved locations
given a sample of data defined on point supports. However, the
continuity and heterogeneity underlying spatial data are too complex
to be approximated by classic statistical models. Deep learning
models, especially the idea of conditional generative adversarial
networks (CGANs), provide us with a perspective for formalizing
spatial interpolation as a conditional generative task. In this article,
we design a novel deep learning architecture named conditional
encoder-decoder generative adversarial neural networks (CEDGANs)
for spatial interpolation, therein combining the encoder-decoder
structure with adversarial learning to capture deep representations
of sampled spatial data and their interactions with local structural
patterns. A case study on elevations in China demonstrates the
ability of our model to achieve outstanding interpolation results
compared to benchmark methods. Further experiments uncover the
learned spatial knowledge in the model’s hidden layers and test the
potential to generalize our adversarial interpolation idea across
domains. This work is an endeavor to investigate deep spatial
knowledge using artificial intelligence. The proposed model can
benefit practical scenarios and enlighten future research in various
geographical applications related to spatial prediction.
ARTICLE HISTORY
Received 18 April 2018
Accepted 20 March 2019
KEYWORDS
Spatial interpolation; generative adversarial networks; deep learning; encoder-decoder; spatial prediction
1. Introduction
When attempting to understand a geographical phenomenon, such as the spatial
distribution of precipitation, we are often forced to collect a limited number of samples
instead of acquiring information at every possible location (Cochran 1963, Hedayat and
Sinha 1991, Goodchild et al. 1993, Thompson 1996, Fotheringham and Rogerson 2008).
Spatial interpolation is a traditional geostatistical operation that aims at predicting the
value $z(x)$ at an unobserved location $x$ given some sampled data $z(x)$ at observed
locations $x$ (Atkinson and Lloyd 2009, Lam 2009). Tobler's first law (TFL) of geography
(Tobler 1970) describes the essential nature of the real world from a geographic view.
CONTACT Yu Liu liuyu@urban.pku.edu.cn
INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION SCIENCE
https://doi.org/10.1080/13658816.2019.1599122
© 2019 Informa UK Limited, trading as Taylor & Francis Group
Oliver and Webster (1990) further noted that spatially distributed data behave more like
random variables, where stochastic models are required to characterize the underlying
spatial autocorrelation (Hubert et al. 1981, Azaele et al. 2009, Fischer et al. 2010) and
spatial non-stationarity (Anselin 1995, Marsily et al. 2005, Fotheringham et al. 2017).
The complex features of spatial distribution patterns necessitate the development of
interpolation methods, of which kriging (Matheron 1963, Cressie 1990, Li and Heap
2011) is the most commonly used geostatistical method and can be roughly divided into
two types that conceptually rely on different approaches to modeling the spatial
variability. The first type of methods, such as simple kriging (SK), ordinary kriging (OK)
and cokriging, characterizes the spatial structural features by estimating the
semi-variogram cloud, which is a plot of the semi-variances $\gamma(h)$ for paired data against the
distances $h$ separating the paired data points, and uses the fitted model to make spatial
estimations (Matheron 1963, Diggle et al. 1998). The other type of methods, such as
regression kriging (RK) and universal kriging (UK) (Appelhans et al. 2015, Li et al. 2015),
makes predictions by combining a regression of the dependent variable on auxiliary
variables with the SK of the regression residuals (Hengl et al. 2007), which further leads
to the training-based multi-point geostatistics (MPS) (Mariethoz and Caers 2014).
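As a minimal sketch of the semi-variogram cloud described above, the following NumPy snippet computes the semi-variance of every data pair against its separation distance and then averages the cloud within distance bins. The function names, the toy coordinates and the bin edges are illustrative assumptions and are not taken from any kriging implementation used in this article.

```python
import numpy as np

def variogram_cloud(coords, values):
    """Return distances h and semi-variances 0.5 * (z_i - z_j)^2 for all pairs i < j."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)          # every unordered pair once
    h = np.linalg.norm(coords[i] - coords[j], axis=1)
    gamma = 0.5 * (values[i] - values[j]) ** 2
    return h, gamma

def binned_semivariogram(h, gamma, bin_edges):
    """Average the cloud inside each distance bin (empirical semi-variogram)."""
    bin_edges = np.asarray(bin_edges, dtype=float)
    idx = np.digitize(h, bin_edges) - 1               # bin index of each pair
    means = np.full(len(bin_edges) - 1, np.nan)       # NaN marks empty bins
    for b in range(len(bin_edges) - 1):
        mask = idx == b
        if mask.any():
            means[b] = gamma[mask].mean()
    return means

# Hypothetical toy sample: four observed points with elevations.
coords = [(0, 0), (1, 0), (0, 1), (3, 4)]
values = [10.0, 12.0, 11.0, 20.0]
h, gamma = variogram_cloud(coords, values)
means = binned_semivariogram(h, gamma, bin_edges=[0.0, 2.0, 6.0])
```

A parametric model (for example spherical or exponential) would then be fitted to the binned values before kriging; that fitting step is omitted here.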
Despite the above-mentioned endeavours in spatial interpolation, we have to admit
that the nature of spatial continuity and heterogeneity in geographical digital
representations (Goodchild 2004, Zhu et al. 2018) is substantially more complex than classic
statistical models can capture (Shepard 1968, Oliver and Webster 1990). In recent years, deep
learning approaches have been increasingly used to understand spatial processes from
a data-driven perspective, as they can extract underlying patterns well given complex
spatial contexts. Convolutional neural networks (CNNs) have been proven to be extremely
efficient for high-dimensional data representation and function approximation
(LeCun et al. 2015). Through the backpropagation of gradients in the linear transform
layers combined with non-linear activations, these networks learn a way to transform
the input into an ideal output representation by capturing the deep features of
generation as the high-dimensional parameters (Le 2013, Schmidhuber 2014). More
importantly, the characteristics of the CNN's architecture (local connectivity and shared
weights) enable the model to focus on features near to each other as well as far
away features, which is consistent with the function approximation objective in many
spatial analysis problems (Fischer 1998).
The workflow of spatial interpolation can be considered as a generative procedure:
only limited data on point supports (the space on which each observation is defined)
(Atkinson and Lloyd 2009) can be acquired. The objective is to generate an accurate
global mapping of the spatial phenomenon through learning of observed reciprocities
among location attributes. A deep learning framework named generative adversarial
networks (GANs) (Goodfellow et al. 2014) was recently introduced as a powerful
architecture for training generative models, therein sidestepping the difficulty of
approximating many intractable probabilistic computations by adopting an adversarial structure to
train the loss (Radford et al. 2015, Salimans et al. 2016). Based upon the idea of GANs,
conditional generative adversarial networks (CGANs) are an extension of GANs that
enable us to direct the data generation process by conditioning the model on certain
external information (Mirza and Osindero 2014). The CGAN has been widely used in
various data generation applications such as image super-resolution (Chen et al. 2016,
Zhao et al. 2019), image-to-image translation (Isola et al. 2016), face generation (Antipov
et al. 2017) and terrain reconstruction (Gurin et al. 2017).
Previous research on CGANs mainly formalizes the deterministic conditions of the
generation as some loosely coupled auxiliary features with no spatial information, and
their objective is for the generator to create realistic-looking fake images that the
discriminator is unable to identify. For example, Antipov et al. (2017) successfully
simulated the face aging of people by using a random latent vector to represent
a person's identity and a conditional age term to control the generation. The accuracy
of the generated fake images is often beyond the scope of consideration in related
state-of-the-art CGANs (Lu et al. 2017, Laloy et al. 2018).
In contrast, spatial interpolation requires an accurate estimation of the real spatial
pattern instead of simply a realistic-looking reproduction. Therefore, a spatial extension
of state-of-the-art deep learning structures is needed to bridge the gap between CGANs
and the task of spatial interpolation such that an accurate global estimation given
certain spatial sampled data can be achieved.
This article introduces a novel idea of using conditional generative adversarial
networks to capture deep spatial features underlying spatial distribution datasets and to
perform spatial interpolation. To achieve this objective, we designed a deep learning
model named conditional encoder-decoder generative adversarial neural networks
(CEDGANs) with spatial consideration. Incorporating an encoder-decoder structure
with the idea of adversarial learning, the proposed model can learn the deep features
of input sampled spatial data and their complex interactions with local structural
patterns. A case study on the terrains in China demonstrates the ability of our model
to gain outstanding spatial interpolation results compared to benchmark methods.
Further experiments investigate the learned complex spatial knowledge and
demonstrate the potential of generalizing the CEDGAN-based spatial interpolation idea to more
geographical applications.
2. Methodology
Considering the gaps between spatial interpolation and common conditional generation
tasks, we need to explicitly consider both spatial structural patterns and interpolation
accuracies in the generative adversarial model. The proposed model is assumed to take
spatial sampled data as the only deterministic input (with no prior noise) and to
perform accurate generation using the knowledge captured during the adversarial
learning. For clarity, we will first briefly present the concept of GANs and the
state-of-the-art CGANs, and then, we will show how to construct the adversarial spatial
interpolation structure using a restructured CGAN.
2.1. Generative adversarial networks
Basically, the GAN framework introduced by Goodfellow et al. (2014) consists of two
models $(G, D)$: a generator $G$ that attempts to capture the data distribution and
a discriminator $D$ estimating the probability that a sample comes from the real
dataset rather than $G$. To learn a generator distribution $p_g$ similar to the distribution
$p_{data}(x)$ of a dataset $x$, $G$ usually maps a noise vector $z$ from the prior distribution
$p_z(z)$ to the data space as $G(z)$. The discriminator $D$ outputs a single scalar
representing the probability that the input data come from the training set rather than the
generated samples of $G$.
$G$ and $D$ are trained following a two-player minimax game so that the parameters $\theta_g$ of
$G$ are adjusted to maximally confuse the discriminator, i.e. minimizing $\log(1 - D(G(z)))$,
and the parameters $\theta_d$ of $D$ are adjusted to make the best judgement, i.e. maximizing
$\log D(x) + \log(1 - D(G(z)))$. The objective function of the minimax game is
$$\min_{\theta_g} \max_{\theta_d} \left( \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \right). \tag{1}$$
The training of the adversarial network can be conducted through simultaneously
updating $\theta_d$ and $\theta_g$ by descending the stochastic gradient of logistic loss functions, i.e.
$$\nabla_{\theta_d} \frac{1}{2n} \sum_{i=1}^{n} \left[ \log(1 - D(x^{(i)})) + \log D(G(z^{(i)})) \right] \tag{2}$$
and
$$\nabla_{\theta_g} \frac{1}{n} \sum_{i=1}^{n} \log(1 - D(G(z^{(i)}))), \tag{3}$$
respectively, where $n$ is the number of samples in each data batch during training.
GANs can be extended to a conditional version named CGANs if both $G$ and $D$ are
conditioned on the same auxiliary information $y$, which can restrict $G$ in its generation
process and $D$ in its discrimination process. In previous works, the prior input noise
vector $z$ and the condition $y$ have been combined jointly as low-dimensional inputs for
$G$ to generate different random fake data under the same condition, while the
discriminator receives $x$ (or $G(z, y)$) and $y$ as inputs to make a determination based on $y$ without
considering $z$ (Gauthier 2014, Mirza and Osindero 2014, Antipov et al. 2017). The
objective function of a CGAN is formalized as Equation (4):
$$\min_{\theta_g} \max_{\theta_d} \left( \mathbb{E}_{x \sim p_{data}(x)}[\log D(x, y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z, y), y))] \right). \tag{4}$$
2.2. Adversarial spatial interpolation using point supports as conditions
For spatial interpolation scenarios, however, the traditional adversarial strategy needs to
be modified to ensure the stability of conditional generations. Specifically, the random
noise vector $z$ that is commonly used to generate random data samples should be
removed such that the conditional generation could be considered to be determined by
the sampled data as the only constraint.
Let the data space $V \triangleq \mathbb{R}^{C \times W \times H}$, where $W$ and $H$ represent the size of a spatial raster data
(spatial image) and $C$ is the number of data channels. A real spatial image is defined as
$x \in V$. If the point supports (the space on which each observation is defined) of a sampling
configuration $f$ on $x$ with $m$ sampled locations is $f = [(c_1, r_1), (c_2, r_2), \ldots, (c_m, r_m)] \in \mathbb{R}^{2 \times m}$,
where $(c_k, r_k)$ is the coordinate of the $k$th observed point, we can formalize the sampled
spatial image $f(x) \in V$ as
$$f(x)(:, i, j) := \begin{cases} x(:, i, j) & \text{if } (i, j) \in f, \\ \text{N/A} & \text{otherwise}. \end{cases} \tag{5}$$
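The definition of the sampled image in Equation (5) can be sketched in a few lines of NumPy. This is an illustration only: the function name `sample_image` is hypothetical, indices are taken to be 0-based, and N/A is represented by NaN, which is an assumption because the numeric encoding of null pixels is not specified here.

```python
import numpy as np

def sample_image(x, locations):
    """Build f(x): keep x at the sampled (c, r) locations and mark the rest as N/A.

    x         : array of shape (C, W, H), the real spatial image
    locations : iterable of (c, r) integer index pairs, the sampling configuration f
    """
    x = np.asarray(x, dtype=float)
    locs = np.asarray(list(locations), dtype=int)
    fx = np.full_like(x, np.nan)                  # ASSUMPTION: NaN stands in for N/A
    fx[:, locs[:, 0], locs[:, 1]] = x[:, locs[:, 0], locs[:, 1]]
    return fx

# Hypothetical 1 x 4 x 4 image with three observed locations.
x = np.arange(16.0).reshape(1, 4, 4)
fx = sample_image(x, [(0, 0), (1, 2), (3, 3)])
```

Before such an array is fed to a network, the null pixels would need a numeric fill value or a separate mask channel; which of these is appropriate is a design choice outside this sketch.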
When training an adversarial spatial interpolation network, we need a generator $G$ that
takes the sampled image $f(x)$ as input and outputs a generated fake image $G(f(x)) \in V$
as close to the real image $x$ as possible. In addition, a discriminator $D$ needs to be
trained to distinguish the fake image $G(f(x))$ from a real image $x$ based on the sampled
image $f(x)$. The objective function of adversarial spatial interpolation networks can be
defined as
$$\min_{\theta_g} \max_{\theta_d} \left( \mathbb{E}_{x \sim p_{data}(x|f(x))}[\log D(x, f(x))] + \mathbb{E}_{x \sim p_{data}(x|f(x))}[\log(1 - D(G(f(x)), f(x)))] \right), \tag{6}$$
where $G$ is a differentiable function representing the generator's structure with
parameters $\theta_g$ and $D$ is a differentiable function representing the discriminator's structure
with parameters $\theta_d$. $G$ attempts to approximate a conditional probability distribution
$p_g(G(f(x))|f(x))$ most similar to the conditional probability $p_{data}(x|f(x))$ in the real
dataset, therein minimizing the second term of Equation (6). Meanwhile, $D$ judges
whether a spatial image came from $p_g(G(f(x))|f(x))$ or $p_{data}(x|f(x))$, maximizing both
terms in Equation (6).
Compared with GANs and CGANs (see Equations (1) and (4)), both terms of Equation
(6) contain spatial conditional data $f(x)$ deduced from the training data instead of
some explicit auxiliary conditional data $y$. The most important point is that our
adversarial spatial interpolation learning is designed to approximate the conditional
generative probability distribution given spatial sampled images ($p_{data}(x|f(x))$) rather than the
probability distribution of data existence ($p_{data}(x)$).
By discarding the prior noise vector $z$, the adversarial network only takes a
pre-defined sampling configuration function $f$ to direct the generation, with no random
feature affecting the result of the spatial interpolation such that the output can be stable
given a sampled image. The basic requirement for spatial interpolation is that we will
not obtain two different interpolated images given the same sampled image. However,
if the scenario changes to one where we allow multiple results given the same sampled
image, Equation (6) is actually not contradictory to Equation (4), as we can add a noise term $z$
to allow variations in the output.
We rephrase Equation (6) in the form of a binary cross-entropy (BCE) loss function $J(\theta)$
for clarity. Given a mini-batch $\{x^{(i)}\}_{i=1}^{n}$ of $n$ training real spatial images, the loss function
for $D$ is defined to let $D$ assign a true label to real spatial images $x^{(i)}$ based on point
supports $f(x^{(i)})$ and a false label to generated fake spatial images $G(f(x^{(i)}))$ based on the
same $f(x^{(i)})$:
$$J(\theta_d) = \frac{1}{2n} \left[ \sum_{i=1}^{n} \log(1 - D(x^{(i)}, f(x^{(i)}))) + \sum_{i=1}^{n} \log D(G(f(x^{(i)})), f(x^{(i)})) \right]. \tag{7}$$
The loss function for $G$ is similar but relates only to the second term of Equation (6) and
attempts to trick $D$:
$$J(\theta_g) = \frac{1}{n} \sum_{i=1}^{n} \log(1 - D(G(f(x^{(i)})), f(x^{(i)}))). \tag{8}$$
Then, we can minimize the loss function of the adversarial interpolation by
simultaneously updating $\theta_d$ and $\theta_g$ using stochastic gradient descent $\theta_g := \theta_g - \alpha \nabla J(\theta_g)$
and $\theta_d := \theta_d - \alpha \nabla J(\theta_d)$.
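To make the two loss functions and the update rule concrete, the sketch below evaluates Equations (7) and (8) literally on discriminator outputs and applies one plain gradient-descent step. It is a hedged illustration: the function names are hypothetical, the discriminator outputs are given as plain numbers rather than produced by a network, and the gradient passed to `sgd_step` is a placeholder value.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Equation (7). d_real = D(x, f(x)), d_fake = D(G(f(x)), f(x)), all in (0, 1)."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    n = d_real.size
    return (np.log(1.0 - d_real).sum() + np.log(d_fake).sum()) / (2.0 * n)

def generator_loss(d_fake):
    """Equation (8): batch mean of log(1 - D(G(f(x)), f(x)))."""
    d_fake = np.asarray(d_fake, dtype=float)
    return np.log(1.0 - d_fake).mean()

def sgd_step(theta, grad, alpha):
    """theta := theta - alpha * grad, the update rule stated in the text."""
    return theta - alpha * grad

# A discriminator that separates real from fake has a lower J(theta_d)
# than one that outputs 0.5 everywhere.
confident = discriminator_loss([0.9, 0.9], [0.1, 0.1])
confused = discriminator_loss([0.5, 0.5], [0.5, 0.5])

# A generator whose fakes receive high scores from D has a lower J(theta_g).
fooling = generator_loss([0.9, 0.9])
failing = generator_loss([0.1, 0.1])

# Placeholder parameter vector and gradient, for illustration only.
theta_new = sgd_step(np.array([1.0, -2.0]), np.array([10.0, 10.0]), alpha=0.1)
```

Evaluated literally, these expressions are non-positive, whereas library binary cross-entropy routines return non-negative values, so their magnitudes are not directly comparable with loss values reported by such routines.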
2.3. Conditional encoder-decoder generative adversarial networks for spatial
interpolation
In Section 2.2, we have defined the input and the training objective of an adversarial
spatial interpolation model. However, further considerations of how to capture local
spatial structural patterns and how to make the model trainable are not mentioned.
As for the training, Radford et al. (2015) introduced a class of stable architectures for
training GANs named deep convolutional GANs (DCGANs), where they replaced pooling
layers with strided convolutions in the discriminator and fractional-strided convolutions
in the generator to conserve the image continuity information (Kingma and Welling
2014, Mansimov et al. 2015). However, the DCGANs' generator contains only a decoder
structure to generate images from noise, with no attempt to link deep features with
spatial constraints. Simultaneously, some encoder-decoder architectures, such as the
SegNet (Badrinarayanan et al. 2017) and U-Net (Ronneberger et al. 2015), that use an
encoder structure to obtain the deep feature maps from inputs and a decoder to
upsample the deep features into full-size image representations (Long et al. 2015,
Isola et al. 2016) can be adopted to design our generative model, which needs to
capture deep spatial representations.
Here, we propose a conditional encoder-decoder generative adversarial network
(CEDGAN) to model adversarial spatial interpolation. The main structure of CEDGAN is
illustrated in Figure 1(a). A CEDGAN consists of a generator $G$ and a discriminator $D$. $G$
attempts to learn the relationships between sampled spatial data and corresponding
real spatial data and to achieve the objective of generating as accurate as possible fake
spatial data. $D$ attempts to capture the correspondence between spatial data and the
sampled data, with the objective of determining whether the interpolated fake data can
be considered correct based on the limited samples.
In Figure 1(b), we display the details of $G$ and $D$. The generator $G$ is designed to be
a fully convolutional encoder-decoder structure that contains three two-dimensional
convolution layers as the encoder (convs 1, 2 and 3) and three two-dimensional
transposed convolution layers as the decoder (deconvs 1, 2 and 3). Each encoder layer
performs a zero-padding convolution with the given convolving kernel and stride
length. Each decoder layer implements the upsampling of the feature maps through
a fractionally strided transposed convolution with the same settings as that of the
encoder layers. The discriminator $D$ is a convolutional neural network similar to typical
models of image classification except that we use a concat operation to merge the
sampled data $f(x)$ and the full-size real data $x$ (or fake data $G(f(x))$) as the input. Each
layer of $D$ performs a zero-padding convolution with the same settings as the encoder
layers in $G$. The output of $D$ is a scalar indicating whether the input full-size image is
a correct interpolation.
Batch normalization (BN) (Ioffe and Szegedy 2015) is applied to all layers except for
the output layer of $G$ and the input/output layer of $D$. This can avoid model instability
and help gradients flow in the networks. The LeakyReLU activation (Xu et al. 2015) is
used after convolutions, and the ReLU activation (Nair and Hinton 2010) is used after
transposed convolutions. For the output layers of $G$ and $D$, we use the Tanh and Sigmoid
activation functions, respectively, according to Radford et al. (2015).
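How a three-layer strided encoder and a three-layer transposed-convolution decoder return a feature map to the input size can be checked with the standard output-size formulas. The sketch below assumes DCGAN-style settings (kernel size 4, stride 2, zero-padding 1) purely for illustration; the kernel sizes and strides actually used are those given in Figure 1(b) and are not restated in the text.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a zero-padded convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out(size, kernel, stride, padding, output_padding=0):
    """Output length of a transposed (fractionally strided) convolution along one axis."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# ASSUMPTION: kernel 4, stride 2, padding 1 for every layer (not taken from the article).
K, S, P = 4, 2, 1

sizes = [32]                       # a 32 x 32 input image
for _ in range(3):                 # encoder: convs 1, 2 and 3
    sizes.append(conv_out(sizes[-1], K, S, P))
for _ in range(3):                 # decoder: deconvs 1, 2 and 3
    sizes.append(deconv_out(sizes[-1], K, S, P))

print(sizes)                       # [32, 16, 8, 4, 8, 16, 32]
```

Under these assumed settings each encoder layer halves the spatial size and each decoder layer doubles it, so using the same settings on both sides restores the 32 × 32 extent.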
3. Experiments on spatial data: case of the DEM interpolation
3.1. Data descriptions
We use a dataset of digital elevation models (DEMs) in China as an example to test the
feasibility and effectiveness of the proposed CEDGAN model. However, we hope to
address problems not only in DEMs. The method can be applied to a broader range of
spatial data. We select DEMs as our case study simply because the ground-truth terrain
data can help us test the accuracy of interpolations and thus demonstrate the feasibility
of our adversarial model in capturing deep spatial features.
Figure 1. An illustration of how a conditional encoder-decoder generative adversarial network
(CEDGAN) works for spatial interpolation. (a) The main loop of training, where real images and
fake images are discriminated by $D$ conditioned on the same sampled data, and the gradients of $D$'s
output are used to update the model's parameters. (b) For $G$, sampled images $f(x)$ are encoded into
spatial feature maps, and then, fractionally strided convolutions upsample the deep features into
fake spatial images $G(f(x))$ of full size. For the discriminator $D$, the real spatial images $x$ (or fake
images $G(f(x))$) and the corresponding sampled images are merged as the input. The output of $D$ is
a scalar to determine a correct interpolation.

Four representative subregions in mainland China are selected as the ground truths,
including the Shannan area of the Tibetan Plateau, the Sichuan Basin, the Pearl River Delta,
and the Qinling Mountains. These regions consist of various terrains that have a diverse
range of altitude and hypsography. An overview of the study areas is illustrated in Figure 2. The
GDEM Version 2¹ data for these areas are collected as the raw DEM dataset. After preprocessing,
single-channel DEM tiles (1 × 32 × 32) with no repetition are randomly cropped using
Monte Carlo simulation as the DEM images. To address the concerns of over-fitting and
memorization of training samples, we acquire 60,000 DEM images in total, with 48,000
images composing the training set and 12,000 images composing the validation set. For
each subarea, there are 15,000 ground-truth images, of which 12,000 are for training and
3,000 are for validation. Each DEM image covers a 0.1° × 0.1° geographic tile. The terrain
elevations in the dataset range from −7 m to 6,999 m. We first transform these images
linearly into float tensor images ([0.0, 1.0]); then, we normalize the tensor images using
a mean of 0.5 and a standard deviation of 0.5 (mapping them to [−1.0, 1.0]) for improved
training efficiency. All elevations are mapped back to their original values in the reported accuracies.
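The two-step scaling and its inverse can be sketched as follows. This is an assumption-laden illustration: it takes the linear transform to be a min-max scaling over the reported dataset-wide elevation range, which is one plausible reading and is not confirmed here, and the function names are hypothetical.

```python
import numpy as np

# Elevation range reported for the dataset, in metres.
Z_MIN, Z_MAX = -7.0, 6999.0

def to_model_range(elev):
    """Metres -> [0, 1] (linear) -> [-1, 1] (subtract mean 0.5, divide by std 0.5)."""
    unit = (np.asarray(elev, dtype=float) - Z_MIN) / (Z_MAX - Z_MIN)  # ASSUMPTION: global min-max
    return (unit - 0.5) / 0.5

def to_metres(t):
    """Inverse mapping, used to report errors in the original elevation units."""
    unit = np.asarray(t, dtype=float) * 0.5 + 0.5
    return unit * (Z_MAX - Z_MIN) + Z_MIN

elev = np.array([-7.0, 3496.0, 6999.0])
scaled = to_model_range(elev)      # endpoints map to -1 and 1
restored = to_metres(scaled)       # recovers the original elevations
```

Because the mapping is linear, an error measured in the scaled range converts to metres by multiplying by half of the elevation range.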
Noting that there are many indicators that can measure the performance of a spatial
interpolation method (Li and Heap 2011), we simply choose the root mean square error
(RMSE) to calculate the interpolation error $\varepsilon$ at the pixel level, as it requires minimal
auxiliary information to utilize:
$$\varepsilon = \sqrt{\frac{\sum_{i=1}^{n} (p_i - o_i)^2}{n}}, \tag{9}$$
where $n$ is the total number of pixels, $p$ is the predicted value, and $o$ is the observed value.
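Equation (9) translates directly into NumPy. The function name and the toy arrays below are illustrative only.

```python
import numpy as np

def rmse(pred, obs):
    """Root mean square error over all pixels, as in Equation (9)."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

# Hypothetical 2 x 2 predicted and observed elevation patches, in metres.
predicted = np.array([[100.0, 102.0], [98.0, 101.0]])
observed = np.array([[101.0, 100.0], [98.0, 103.0]])
error = rmse(predicted, observed)   # sqrt((1 + 4 + 0 + 4) / 4) = 1.5
```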
3.2. Adversarial training procedure
In this section, we show the results of spatial interpolation with a 10 × 10 uniform
sampling configuration $u_{100}$ on 1 × 32 × 32 DEM images as an example. A 10 × 10
uniformly sampled DEM image contains the elevations of 100 locations that are evenly
distributed, and the other locations are null (Figure 3).

Figure 2. Study areas for the spatial interpolation of terrain elevations. Subregion A is the Shannan
area of the Tibetan Plateau. Subregion B is the Sichuan Basin. Subregion C is the Pearl River Delta.
Subregion D is the Qinling Mountains. We omit the north arrow and the map scale for simplicity.
The network is trained using mini-batch stochastic gradient descent (SGD) with
a batch size of 64. The training dataset with 48,000 DEM images is randomly divided
into 750 batches, with each batch containing 64 images (dropping the last batch with
fewer than 64 images). Based on the parameter suggestions of Radford et al. (2015), for
layers with LeakyReLU activation, we set the slope of the leak to be 0.2. In addition, we
use the Adam optimizer, where $\beta_1 = 0.5$ and $\beta_2 = 0.999$, and the learning rate $\alpha$ for
backpropagation is set to 0.0002. All gradients are computed using Equations (7) and (8).
The details of the adversarial training are shown in Figure 4. The evolution of our
model can be easily identified in the main plot of Figure 4(a), where the RMSE between
the generated fake data and real data is computed to plot the gray error curve. The
error curve shows that the accuracy of our model increases markedly during the first
60,000 batches (80 epochs) of training; after that, the improvement is not very
significant. We train on 150,000 batches (200 epochs) and find that the average
interpolation error per pixel gradually stabilizes at 2.5 m, which is small relative to the
elevation range of −7 m to 6,999 m.
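The batch and epoch counts quoted above follow from simple arithmetic, which the short check below reproduces. It also expresses the 2.5 m error as a fraction of the elevation range; that ratio is our own illustration of scale and is not a figure reported in the article.

```python
TRAIN_IMAGES = 48_000
BATCH_SIZE = 64

# An incomplete final batch would be dropped; here the division is exact.
batches_per_epoch = TRAIN_IMAGES // BATCH_SIZE

def epochs_to_batches(epochs):
    """Number of trained batches after a given number of epochs."""
    return epochs * batches_per_epoch

print(batches_per_epoch)           # 750
print(epochs_to_batches(80))       # 60000
print(epochs_to_batches(200))      # 150000

# Illustrative ratio: stabilized error relative to the elevation range (-7 m to 6,999 m).
relative_error = 2.5 / (6999.0 - (-7.0))
```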
Apart from the relatively stable decreasing trend, we can see some sudden rises in the
gray error curve, which reflect the adversarial nature of our model: when a local
optimum is reached whereby $G$ cannot further deceive $D$, $G$ will jump out of the local
parameter space and attempt to find a more optimal solution. However, these jump-out
attempts usually return worse results. The sub-plot in the upper right of Figure 4(a)
illustrates the variation in the binary cross entropy loss (BCELoss) for $D$ and $G$ (Equations
(7) and (8)) throughout the training procedure. It can be observed that $D$ (blue curve)
trends toward maximal confusion, with its loss approximating 0.5 and $G$'s loss (orange
curve) continually improving.
Figure 3. Illustration of a 10 × 10 uniform sampling configuration ($u_{100}$) on a single-channel 32 × 32
DEM image: (a) real image $x$; (b) sampled image $u(x)$. Elevations are represented by gray-scale colors
so that the whiter a pixel is, the higher its elevation (all DEM images shown in this article share the
same colorbar if not indicated). Pixels in the sampled image with null value are displayed in blue.
A more comprehensible visualization of the combat between $G$ and $D$ is shown in
Figure 4(b), where the result after each trained batch is drawn as a scatter point. The
color of a point represents the number of trained batches, with the $x$ value being the
BCELoss of $D$ and the $y$ value being the BCELoss of $G$. In this scatter plot, yellow points
roughly cluster in the small area where BCELoss($D$) $\in [0.1, 0.5]$ and BCELoss($G$) $\in [3, 5]$,
indicating that our proposed adversarial model tends to converge to its game
equilibrium during the training.
3.3. Validation of the trained generator
To demonstrate that our generator is not producing high-quality interpolation results by
simply over-fitting or memorizing training samples, we apply the trained $G$ of the 10 × 10
uniform sampling configuration on the validation set with 12,000 DEM images
different from the training data.
By randomly choosing mini-batches of real DEM images $x$ from the validation set, we
invoke the generator model $G$ every 10 epochs during the training process, input the
sampled DEM images $u(x)$ into $G$ and calculate the $\varepsilon$ between generated fake DEM
images $G(u(x))$ and their corresponding $x$. The decreasing trend of the generator's
average interpolation error (Figure 5(a)) is similar to that of Figure 4(a), with no sign of
over-fitting. A generator trained on 200 epochs can also achieve an interpolation error
$\varepsilon \approx 2.5$ m on the validation DEM images collected in the same area.
Figure 5(b) displays the evolution of our generator regarding visual fidelity. We list the
generated fake images $G(u(x))$ at epochs 0, 10, 20, 50, 100, and 200 on the same real
image $x$ as an example. It is interesting to see that the generator $G$ is not aware of any
knowledge at the very beginning, generating a noise image at epoch 0. Then, after 10
epochs of training, $G$ quickly learns some fundamental knowledge of spatial interpolation,
such as the basic mapping between elevations and colors as well as a coarse spatial
continuity, and can produce a blurry fake image based on the given observations. After
that, $G$ gradually achieves more accurate generation by attempting to add more terrain
details that seem to be correct, as we can see more valleys in the displayed images during
the evolution. Finally, by epoch 200, $G$ is capable of producing a high-quality fake image
that is almost visually indistinguishable from the real image. Nevertheless, in the lower-left
part of the real image, we can see two near-branches of the valley; however, no branching
can be identified in the fake image by epoch 200 (Figure 5(b)). The ultimate accuracy may
be limited by the spatial resolution of the given sampling configuration; further discussion
can be found in Section 3.4.

Figure 4. Training details of CEDGAN with a 10 × 10 uniform sampling configuration. (a) Variation of
the model accuracy and the BCELoss for $G$ and $D$ during the training procedure; early trainings with
$\varepsilon > 15$ m are not shown in the plot. (b) Illustration of the adversarial game between $G$ and $D$, where
the model tends to converge to a game equilibrium during the training process.
3.4. Different spatial sampling configurations
The proposed CEDGAN model requires a separate training process for each spatial sampling
configuration. In practice, typical scenarios that require spatial interpolation may have
fewer sampled locations, and the distribution of the sampled locations can be irregular.
To address these concerns, we vary the sampling configuration in two respects, namely the ratio
of sampled locations under uniform sampling and the use of random sampling, to see how different
spatial sampling configurations affect the performance of the model.
3.4.1. Ratio of sampled locations
Figure 5. Generator's performance on the validation set with a 10 × 10 uniform sampling
configuration ($u_{100}$). (a) Average interpolation error for different epochs. (b) Visualization of the
generated fake image $G(u(x))$ for different epochs on the same sampled image $u(x)$.

We modify the ratio of the sampled locations, i.e. the number of sampled locations $m$
given a fixed spatial image size, under the circumstances of systematic sampling
(uniform sampling). Formally, $(c_i, r_j)$ is the coordinate of an observed value in an image of
size $W \times H$, which can be defined as
$$c_i = c_1 + (i - 1)\,\delta_W, \quad r_j = r_1 + (j - 1)\,\delta_H, \quad \forall\, i, j = 1, \ldots, \sqrt{m}, \tag{10}$$
where the initial sampled point is $(c_1, r_1)$, the interval $\delta_W = (W - 1)/(\sqrt{m} - 1)$, and
$\delta_H = (H - 1)/(\sqrt{m} - 1)$. An illustration of the sampled images $f(x)$ using different ratios
of uniform sampling on the 32 × 32 DEM image is shown in Figure 6.
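The uniform sampling grid of Equation (10) can be generated as sketched below. Two points are assumptions rather than details taken from the article: pixel indices are 0-based with the initial point at (0, 0), and fractional coordinates are rounded to the nearest pixel. The function name `uniform_grid` is hypothetical.

```python
import numpy as np

def uniform_grid(width, height, m, c1=0, r1=0):
    """Return the m sampled (c, r) pixel locations of Equation (10) and the two intervals."""
    k = int(round(np.sqrt(m)))
    if k * k != m or k < 2:
        raise ValueError("m must be a perfect square of at least 4")
    delta_w = (width - 1) / (k - 1)
    delta_h = (height - 1) / (k - 1)
    # ASSUMPTION: fractional coordinates are rounded to the nearest pixel index.
    cols = np.rint(c1 + np.arange(k) * delta_w).astype(int)
    rows = np.rint(r1 + np.arange(k) * delta_h).astype(int)
    locations = [(int(c), int(r)) for c in cols for r in rows]
    return locations, delta_w, delta_h

# 10 x 10 uniform sampling on a 32 x 32 image.
locs, dw, dh = uniform_grid(32, 32, m=100)
```

With 0-based indices the grid spans the image from corner (0, 0) to corner (31, 31), and the interval for $m = 100$ is $31/9 \approx 3.44$ pixels.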
We set the number of sampled location mto be 36, 49, 64, 81, 100, 121, 144, 169, and
196, and we train the corresponding CEDGAN-based interpolation model separately. The
training processes with different sampling ratios are illustrated in Figure 7. By the end of
training, each model exhibits a near-convergent status with different final accuracies. As
the ratio of uniform sampled locations increases, the final accuracy increases as well:
when m¼36, ε4:2m, while when m¼196, ε2:0m. In addition, we can see a more
(a) x(32 ×32) (b) m=36, δ≈6.20 (c) m=100, δ≈3.44 (d) m=196, δ≈2.07
Figure 6. Sampled images uðxÞusing different ratios of uniform sampling.
Figure 7. Training processes based on nine uniform sampling configurations with different sampled
location ratios.
unstable curve when increasing the sampling ratio, which indicates that the learning
ability of the generator $G$ in our model becomes more dominant compared to that of the
discriminator $D$ when more observed values are given, therein showing more attempts
to jump out of local optima in the parameter space.
Meanwhile, Figure 8 shows the evolution of $\varepsilon$ for generators with different $m$ on the
validation set. The decreasing trends of the interpolation error on the validation set are
similar to those in Figure 7 for the different uniform sampling configurations, which
further demonstrates the usability of our model in common interpolation tasks. The
generation processes on the validation set also indicate that our model does not
produce high-quality interpolation results by simply over-fitting or memorizing
training samples.
3.4.2. Random sampling
As for the random sampling $r$, we find that the final accuracy with $m = 100$ is similar to that
of uniform sampling with $m = 36$, as in both cases $\varepsilon \approx 4.2$ m (Figure 9(a)). However, the
produced spatial pattern can be problematic when we randomly choose the sampled
locations for each input image, as shown in Figure 9(b). This is caused mainly by the
variation in inputs during the CEDGAN's training.
If we undersample in some areas, the local spatial variation patterns may not be
captured. Oversampling, on the other hand, may result in redundant data. Figure 9(b)
displays the differences between the interpolated results $G(r(x))$ and some selected DEM
images $x$. It is interesting to find that the CEDGAN-based interpolation method can
generate visually appealing fake images regardless of how the sampled locations are
distributed, even if the generated terrains in certain local areas may not be correct.
Actually, all interpolation methods suffer from the influence of improper spatial sampling
Figure 8. Generation process based on nine uniform sampling configurations with different sampled
location ratios.
configuration to some extent. Although our model with random sampling still performs
well in terms of accuracy, the problem of improper sampling cannot be fully addressed
since the interpolated result may not be correct in some local structural patterns.
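As a rough illustration of the random sampling configuration $r$, the sketch below draws $m$ sampled locations for one image and returns them as a binary mask. The function name `random_sample_mask`, the choice to sample without replacement and the optional seed are assumptions for illustration; the paper only states that locations are chosen randomly for each input image.

```python
import random

def random_sample_mask(m, width=32, height=32, seed=None):
    """Return a height x width 0/1 mask with exactly m sampled locations.

    Hypothetical helper: locations are drawn uniformly without replacement,
    which is an assumption rather than the paper's documented procedure.
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(height) for c in range(width)]
    chosen = set(rng.sample(cells, m))
    return [[1 if (r, c) in chosen else 0 for c in range(width)]
            for r in range(height)]

mask = random_sample_mask(100, seed=0)
print(sum(map(sum, mask)))  # 100
```

Because a fresh mask is drawn per image, some neighborhoods end up densely sampled and others empty, which is the undersampling and oversampling issue discussed above.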
4. Discussion
4.1. Comparison with benchmark interpolation methods
To show how the CEDGAN-based spatial interpolation method outperforms classic
interpolation methods, we choose the inverse distance weighted (IDW) interpolation
(Shepard 1968) and ordinary kriging (OK) (Matheron 1963, Cressie 1990) as benchmarks
to test our method's performance in terms of accuracy, computing speed, batch processing
and visual fidelity. The CEDGAN-based model is implemented using PyTorch, a deep
learning framework in Python with GPU acceleration. IDW and OK are also implemented
in PyTorch by converting all variables into tensors for GPU acceleration. In this way, all
reported results in this section are computed on the same NVIDIA 1080 Ti GPU and
are therefore comparable.
The comparisons between CEDGAN and IDW are listed in Table 1, where all CEDGANs
are trained for 200 epochs and the distance decay parameter of IDW is set to 2.0. We
apply these two methods under different uniform sampling ratios, check their average
interpolation errors at the pixel level and record the corresponding computing speed.
The results show that CEDGAN achieves lower average errors (AE) than IDW.
For CEDGANs, the improvement in accuracy when increasing the sampling ratio is also
more pronounced than with IDW. In terms of computing speed (AS), CEDGAN ($\approx 1.5 \times 10^{-3}$ s) is
approximately 1000 times faster than IDW ($\approx 1.5$ s), and with increasing sampling ratio,
CEDGAN does not exhibit an obvious slow-down. Note that we do not take the average
training time (AT) into consideration when comparing the computing speed because, for
a pre-trained CEDGAN model, training is performed only once and can be done
beforehand. Given a mini-batch of 64 spatial images (32 × 32), the training time in our
experiment is shown in the second column of Table 1. AT increases as the sampling ratio
increases, mainly because we need time to sample from the real images. For the u100
Figure 9. Experiment based on a random sampling with 100 sampled locations.
sampling configuration, the total training time for 200 epochs is 45,840 seconds
(0.3056 × 150,000), approximately 12.7 hours.
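For readers unfamiliar with the IDW benchmark, a minimal pure-Python sketch of inverse distance weighting with the distance decay parameter set to 2.0 is given below. It is not the tensor-based PyTorch implementation used for Table 1; the function name `idw_interpolate` and the `(row, col, value)` sample format are assumptions for illustration.

```python
def idw_interpolate(samples, width, height, power=2.0):
    """Interpolate a height x width grid from (row, col, value) samples.

    Illustrative sketch only: each pixel is a weighted mean of all samples,
    with weights 1 / distance**power; sampled pixels keep their own value.
    """
    grid = []
    for r in range(height):
        row = []
        for c in range(width):
            num = den = 0.0
            exact = None
            for sr, sc, value in samples:
                d2 = (r - sr) ** 2 + (c - sc) ** 2   # squared distance
                if d2 == 0:
                    exact = value                     # pixel is a sampled location
                    break
                weight = d2 ** (-power / 2.0)         # equals 1 / distance**power
                num += weight * value
                den += weight
            row.append(exact if exact is not None else num / den)
        grid.append(row)
    return grid

# A pixel halfway between two samples receives their mean.
print(idw_interpolate([(0, 0, 10.0), (0, 2, 20.0)], width=3, height=1))
# [[10.0, 15.0, 20.0]]
```

Every output pixel is a convex combination of the sampled values, which is consistent with the smooth, blurry appearance of IDW results.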
The comparison in Table 1 does not consider ordinary kriging because kriging
methods are naturally not suitable for batch processing due to the problem of
semivariogram fitting. Among a batch of spatial images, the shapes of experimental
semivariograms can vary significantly from image to image; thus, a single arbitrary fitting curve
is insufficient to capture the complex spatial structures, and it is difficult to determine
a fitting function a priori. In fact, the computing speed and pixel-level accuracy of OK are
both inferior to those of IDW when applied to batches of spatial images.
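To make the semivariogram argument concrete, the following sketch computes an experimental semivariogram for a single image using the classical estimator $\gamma(h) = \frac{1}{2N(h)} \sum (z_i - z_j)^2$ over pixel pairs separated by lag $h$. Restricting the pairs to horizontal and vertical neighbors at integer lags, and the function name `experimental_semivariogram`, are simplifying assumptions for illustration.

```python
def experimental_semivariogram(image, max_lag):
    """Return semivariances for lags 1..max_lag of a 2D list of values.

    Simplified sketch: only horizontal and vertical pixel pairs are used, and
    max_lag must be smaller than the shorter image side.
    """
    height, width = len(image), len(image[0])
    gammas = []
    for lag in range(1, max_lag + 1):
        total, count = 0.0, 0
        for r in range(height):
            for c in range(width):
                if c + lag < width:     # horizontal pair
                    total += (image[r][c] - image[r][c + lag]) ** 2
                    count += 1
                if r + lag < height:    # vertical pair
                    total += (image[r][c] - image[r + lag][c]) ** 2
                    count += 1
        gammas.append(total / (2 * count))
    return gammas

ramp = [[float(c) for c in range(4)] for _ in range(4)]  # elevation rises eastward
print(experimental_semivariogram(ramp, 2))  # [0.25, 1.0]
```

Running this on different terrain patches yields curves of very different shapes, which is why one fitted model rarely suits an entire mini-batch.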
Figure 10 visualizes a batch of ground-truth DEM images and fake DEM images
generated by CEDGAN, IDW and OK under a 10 × 10 uniform sampling. Again, the
CEDGAN has been trained for 200 epochs, and the distance decay parameter of IDW is
set to 2.0. OK is implemented based on the PyKrige 1.3.2 package (see Note 2), and we set the
fitting curve to be spherical. The visual comparison between our method and the benchmark
methods is highly encouraging: given a relatively low sampling ratio (<10%), both
Table 1. Comparisons of the CEDGAN-based and inverse distance weighted interpolation.
                  AE (m)             AS (s)
f      AT (s)     CEDGAN   IDW      CEDGAN     IDW
u36    0.1257     4.117    4.432    0.001471   1.397
u49    0.1620     3.433    3.539    0.001511   1.402
u64    0.2035     3.192    3.321    0.001611   1.428
u81    0.2445     2.693    3.160    0.001459   1.447
u100   0.3056     2.587    2.951    0.001448   1.408
u121   0.3629     2.455    2.794    0.001579   1.547
u144   0.4327     2.060    2.758    0.001588   1.582
u169   0.5006     2.156    2.708    0.001702   1.659
u196   0.5981     1.977    2.636    0.001659   1.711
f is the sampling configuration, AT is the average training time for a mini-batch in CEDGAN, AE is the
average interpolation error ($\varepsilon$) at the pixel level, and AS is the average time for interpolating
a mini-batch of spatial images (each mini-batch contains 64 spatial images).
Figure 10. Visual comparison of the interpolation results of CEDGAN, IDW and OK based on a 10 × 10
uniform sampling configuration. No data augmentation is applied to any DEM images to show
the difference in terrains within a mini-batch; thus, the contrast ratio in some images may not be
high enough to be visible.
Note that it is actually unfair to compare different spatial interpolation methods under the same circumstances: Kriging
and IDW are powerful when we do not have training data; training-based methods, such as MPS (Mariethoz and Caers
2014), are powerful when we have already acquired a physical model for the spatial process; and a well-trained
CEDGAN can provide satisfactory results without prior domain knowledge. We treat each spatial interpolation method
given its corresponding advantage; one should choose the most suitable method in practice.
IDW and OK can only produce fake images that are very blurry (Figure 10(c) and 10(d)),
whereas CEDGAN can generate fake images (Figure 10(b)) that are very similar to the
real images (Figure 10(a)).
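For reference, the spherical fitting curve chosen for OK above has the standard textbook form $\gamma(h) = c_0 + c\,(1.5\,h/a - 0.5\,(h/a)^3)$ for $0 < h < a$ and $\gamma(h) = c_0 + c$ for $h \ge a$, where $c_0$ is the nugget, $c$ the partial sill and $a$ the range. The sketch below evaluates this form; the function and parameter names are assumptions for illustration and do not reproduce PyKrige's internal code.

```python
def spherical_semivariogram(h, nugget, partial_sill, value_range):
    """Evaluate the textbook spherical semivariogram model at lag h.

    Illustrative helper with hypothetical parameter names; gamma(0) is 0 by
    definition and the curve is flat at nugget + partial_sill beyond the range.
    """
    if h <= 0:
        return 0.0
    if h >= value_range:
        return nugget + partial_sill
    ratio = h / value_range
    return nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)

# Halfway to the range, the curve has reached 68.75% of the partial sill.
print(spherical_semivariogram(5.0, nugget=0.0, partial_sill=1.0, value_range=10.0))
# 0.6875
```

Because the model levels off beyond the range, distant samples are treated as uncorrelated, which is one reason kriged surfaces tend to look smooth at low sampling ratios.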
4.2. Investigation of the learned spatial knowledge
The pre-trained generator $G$ outperforms the benchmark spatial interpolation
methods in both accuracy and visual fidelity (Figure 10) because we train the
generator through an encoder-decoder structure that can capture local geographical
structure patterns underlying the spatial distribution dataset after multiple adversarial
learning processes. Basically, the encoder module in $G$ looks for the relationships among
the sampled locations, while the decoder module assembles structural spatial patterns
with the determined sampled locations and outputs the most convincing spatial
distribution. In the case of a DEM, the structure patterns may be valleys and ridges of various
morphometric types (Wang et al. 2010). Thus, the generated fake spatial images can be
very similar to the real images because of these combined local patterns, and the
accuracy is guaranteed by a suspicious discriminator that makes judgements based on
the a priori sampled images.
4.2.1. Visualization of feature maps in a pre-trained generator
To further understand what spatial knowledge the CEDGAN-based spatial interpolation
model has learned, we adopt a pre-trained generator to visualize the feature maps in
the hidden layers during the generation process. Eight typical DEM images with different
terrains are selected from the validation set. After a 10 × 10 uniform sampling, we input
the sampled DEM images into a pre-trained $G$ with u100 and 200 epochs of training.
Here, we only display some representative feature maps captured in the first hidden
layer (layer 1) and the last hidden layer (layer 5) of $G$ (see Figure 1(b)). These two layers
belong to the encoder and decoder modules, respectively, making their feature maps
worth investigating. In addition, since layer 1 and layer 5 are the layers closest to the input and
output layers, respectively, their feature maps are easier to interpret (Figure 11).
It can be observed that the feature maps in layer 1 ($fm^{(1)}$) capture the local
continuities around certain sampled locations as well as the relationships among
sampled locations, showing grid-style patterns with local hotspots and linear
connections. After encoding and decoding, the feature maps in layer 5 ($fm^{(5)}$) appear to have
captured some structural patterns related to different terrain features such as valleys and
mountains. More importantly, the 1st and 3rd feature maps (from the left) in $fm^{(1)}$ are
very similar; however, their corresponding feature maps in $fm^{(5)}$ are significantly
different. A similar situation occurs for the 5th and 8th images. This phenomenon suggests that
the pre-trained generator achieves good interpolation by learning many possible local
terrain patterns and that it manages to merge these local patterns with the
deterministically sampled locations.
Therefore, instead of remembering training samples, our CEDGAN-based spatial
interpolation model captures complex spatial features underlying the given spatial
dataset and can perhaps be generalized into other domains with different distribution
patterns but similar deep spatial features (a further discussion about the model’s
generalization ability is given in Section 4.3).
4.2.2. Slope analysis
In addition, we investigate the relationship between local slopes and the interpolation
accuracy at the pixel level. The results are illustrated in Figure 12. For each pixel of
a DEM image, we calculate the plane slope of a 3 × 3 neighborhood around it (fewer
neighbors for edge pixels) using the average maximum technique introduced by
Burrough and Mcdonnel (1999). The lower the slope value, the flatter the terrain, and the
Figure 11. Deep feature maps in a pre-trained generator (200 epochs) with a 10 × 10 uniform
sampling. Each image is visualized using an independent color scale.
Figure 12. Correlations between local slopes and the accuracy of CEDGAN-based spatial
interpolation.
elevations are considered more spatially continuous; the higher the slope value, the
steeper the terrain, and the elevations are less spatially continuous.
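A per-pixel slope over a 3 × 3 neighborhood can be sketched as follows. The sketch assumes the weighted finite-difference form commonly associated with the average maximum technique in GIS software; the paper does not spell out the formula. The function name `slope_degrees`, the `cell_size` parameter and the edge handling (repeating the nearest valid pixel rather than using fewer neighbors) are further assumptions for illustration.

```python
import math

def slope_degrees(dem, cell_size=1.0):
    """Return per-pixel slope in degrees for a 2D list of elevations.

    Assumed formulation: 3 x 3 weighted finite differences in x and y.
    Edge pixels reuse their nearest valid neighbor, which differs from the
    reduced-neighborhood handling mentioned in the text.
    """
    rows, cols = len(dem), len(dem[0])

    def z(r, c):
        # Clamp indices so that out-of-range neighbors repeat the border value.
        return dem[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    out = []
    for r in range(rows):
        line = []
        for c in range(cols):
            east = z(r - 1, c + 1) + 2 * z(r, c + 1) + z(r + 1, c + 1)
            west = z(r - 1, c - 1) + 2 * z(r, c - 1) + z(r + 1, c - 1)
            south = z(r + 1, c - 1) + 2 * z(r + 1, c) + z(r + 1, c + 1)
            north = z(r - 1, c - 1) + 2 * z(r - 1, c) + z(r - 1, c + 1)
            dzdx = (east - west) / (8 * cell_size)
            dzdy = (south - north) / (8 * cell_size)
            line.append(math.degrees(math.atan(math.hypot(dzdx, dzdy))))
        out.append(line)
    return out

plane = [[float(c) for c in range(3)] for _ in range(3)]  # rises 1 unit per pixel
print(round(slope_degrees(plane)[1][1], 1))  # 45.0
```

For real DEM data, `cell_size` must be given in the same unit as the elevations, otherwise the slope angles are not meaningful.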
On the left side of Figure 12, we display the real DEM images (Real), the corresponding
slope images (Slope), the fake DEM images generated with u100 after 200 epochs (Fake), and
the error images (Error) for a mini-batch of the validation set. The variation pattern of the
error images seems to be correlated with that of the slope images. We then plot the
relationship between the slopes and errors using a kernel density estimation on the right side
of Figure 12. The slopes and errors are normalized to $[0, 1]$ for the visualization. The Pearson
correlation coefficient $\rho = 0.44$ and the Spearman correlation coefficient $r = 0.52$ indicate
a positive correlation between the local slope of terrains and the interpolation error.
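The two coefficients reported above measure different things: Pearson's coefficient captures linear association, whereas Spearman's coefficient is Pearson's coefficient computed on ranks and therefore captures any monotonic association. A minimal stdlib sketch is given below; the function names and the toy data are illustrative assumptions, not the paper's code or data.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))

slopes = [1.0, 2.0, 3.0, 4.0, 5.0]     # toy values, not from the paper
errors = [1.0, 4.0, 9.0, 16.0, 25.0]   # monotonic but non-linear in slope
print(round(pearson(slopes, errors), 3), round(spearman(slopes, errors), 3))
# 0.981 1.0
```

A Spearman coefficient larger than the Pearson coefficient, as reported above, is what one would expect when the error grows with slope in a monotonic but not strictly linear way.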
Since the CEDGAN-based model is designed to capture spatial dependencies as basic
knowledge through the convolutional layers in $G$'s encoder-decoder structure, it is
naturally more difficult to predict values at locations where the spatial attribute values
vary too quickly. The slope analysis also suggests that our model learns some
typical local spatial patterns of attributes (as shown in Section 4.2.1); thus, an abnormal
pattern, e.g. a very steep slope, is not easy to reproduce.
4.3. Potentials and limitations
4.3.1. Potential to apply pre-trained models across domains
Assuming the pre-trained CEDGAN model has learned enough DEM deep features because
our training dataset covers various terrains in mainland China, including plateau mountain
areas, basin areas, high-altitude plains as well as river deltas (Section 3.1), we hope to answer
questions about how to solve problems in new domains through the transfer of the learned
spatial knowledge. Applying the pre-trained model outside our study area can help test the
model's generalization ability. If the deep spatial features captured before are capable of
describing patterns in a new area, the generator should achieve satisfactory interpolation
results without any parameter fine-tuning (Yosinski et al. 2014).
We choose data from Florence, Italy as a case study to conduct the transfer
experiment. Florence is the capital of the Italian region of Tuscany. It lies in a basin surrounded by
hills, with several rivers flowing through it. The elevations in this area
range from 22 m to 1,626 m, which is quite different from the range in the selected areas in
China (−7 m to 6,999 m). Meanwhile, the terrain of the Florence area can be
considered a basin-mountain area with relatively low altitude, a terrain type that is not explicitly
represented in the previous model training for China.
Figure 13 illustrates how we use a pre-trained generator for China with a 6 × 6
uniform sampling configuration (u36) and 200 epochs of training to interpolate the
DEM data of Florence, Italy. We cropped 3,000 real DEM images of size 1 × 32 × 32 using
the same method explained in Section 3.1. The overall interpolation accuracy reaches
approximately 9.1 m per pixel before any model fine-tuning is performed. Some fake
DEM images are displayed to help understand the result.
The accuracy for Florence is not as high as the 4.2 m obtained with u36 in Figure 7,
although it is acceptable because the terrains of Florence are indeed very different from
the previous training set. The fake images in Figure 13 indicate that when transferring
the pre-trained model to a new domain, the generator can still generate realistic-looking
DEM images with similar local terrain structural patterns compared to the real images.
This experiment demonstrates that deep feature maps captured by our model for China
can be transferred to address new terrains in Florence. The pre-trained model can be applied
across domains if the spatial features in the new domain can be considered roughly similar to
those of the previous domain. However, if the features between two domains are too
different, e.g. transferring from a DEM dataset to a meteorological dataset, the pre-trained
model may need additional training data in the new domain to improve its performance.
Depending on the domains, complete retraining might be necessary if sufficient data are available.
4.3.2. Limitations and future directions
In this methodology-oriented paper, we use a large DEM dataset that contains various
ground-truth terrains to validate the feasibility and test the stability of our method.
Admittedly, spatial interpolation based on CEDGAN, as proposed in this research, has
some limitations that invite further investigation in future work.
This adversarial deep learning framework requires some training data to capture the
complex spatial patterns in certain domains and perform interpolation based on this
learned knowledge. However, in most scenarios of traditional GIS that need spatial
interpolation, we often do not have access to sufficient ground-truth data to train the
CEDGAN model. In Section 4.2, we investigated the learned spatial deep features of our
pre-trained model, and we tested its generalization ability to be transferred across
domains. In practice, spatial deep features may be highly different when we transfer
a pre-trained model in a domain with sufficient spatial data into another domain with
little ground truth. It is unrealistic to expect the pre-trained model to perform
well in an unknown domain without fine-tuning the model's parameters.
As good spatial coverage of sampled data is essential to retaining local spatial
variabilities in spatial interpolation, a lower sampling density would cause a worse
interpolation result. This is a truth that cannot be overcome by existing methods. With
Figure 13. Using the pre-trained generator for China to interpolate data on Florence, Italy.
the emergence of geospatial big data, the acquisition of historical spatial distribution
datasets with very-high spatiotemporal resolution has become much easier. It is possible
to use these historical data as training sets and train our CEDGAN model to capture
spatial knowledge about geographical phenomena that are of interest, and thus repro-
duce spatial patterns with more realistic details.
For example, if we have the historical precipitation data of $N$ meteorological
observatories, it is practical to train a CEDGAN with only a small number of observatories as
the sampled locations. In this way, we can reduce the number of active observatories to
achieve cost savings. Moreover, if the captured deep spatial patterns of precipitation are
representative, we can directly transfer the pre-trained generator into a new area with
insufficient meteorological observatories. Similarly, in a smart city, multiple types of
sensors are deployed with high spatial resolution to record the activities of urban
citizens. The CEDGAN-based spatial interpolation idea can help significantly reduce the
number of sensors and contribute to the development of a smart city.
5. Conclusions
Deep learning approaches are increasingly used to understand spatial processes from
a data-driven perspective, as they are powerful in terms of their ability to extract underlying
patterns given complex spatial contexts. The remarkable characteristics of convolutional
neural networks (local connectivity and shared weights) enable deep learning models
to better focus on both features near each other and far-away features and thus provide
a way to non-linearly approximate the complex functions describing spatial patterns.
Spatial interpolation is a family of geostatistical methods that attempts to capture the
spatial variation patterns underlying the observed limited spatial samples and make
a reasonable estimation of spatial patterns based on both spatial continuity and
heterogeneity. Since the workflow of spatial interpolation can be basically regarded as a generative
procedure, we demonstrate, for the first time, the feasibility of spatial interpolation based on
a modern deep learning framework named conditional generative adversarial neural
networks. We design a conditional encoder-decoder generative adversarial network (CEDGAN)
that can capture the complex properties of input spatial data distributions and perform spatial
interpolation tasks under different circumstances. A CEDGAN consists of a generator $G$ and
a discriminator $D$. The generator $G$ attempts to learn the relationships among sampled spatial
data and corresponding real spatial data, and it uses the learned spatial knowledge to
generate fake spatial data as accurately as possible. The discriminator $D$ captures the
correspondences among spatial data and their sampled data, with the objective of determining
whether the generated fake data from $G$ can be considered correct.
A case study on terrain interpolation for China showed that the CEDGAN-based
method can achieve an average error of approximately 2.5 meters per location even when the
sampling ratio is less than 10%. Different sampling configurations were adopted to test the
stability of our proposed method. The CEDGAN-based spatial interpolation outperforms
benchmark approaches, such as inverse distance weighted (IDW) interpolation and ordinary
kriging (OK), in terms of accuracy, batching capability, computing speed and visual fidelity.
In addition, multiple experiments were conducted to investigate the learned complex
spatial knowledge in pre-trained models, and we discussed the potential of generalizing
the CEDGAN-based spatial interpolation idea to a broader range of GIS applications.
Our work is a positive attempt to incorporate artificial intelligence into discovering
deep spatial features of geographical patterns. We introduce the idea of using
conditional adversarial generation to model the workflow of spatial interpolation and hope
to inspire future work concerning spatial prediction. With the rapid
development of big geo-data and artificial intelligence, the CEDGAN framework can
potentially be adopted in various geographic applications that are related to spatial
estimation, including both natural phenomena (precipitation, air temperature, air
pressure, etc.) and socio-economic phenomena (population, poverty, traffic, etc.).
Notes
1. METI of Japan and NASA released a second version of the Global Digital Elevation Model
(GDEM) from the Advanced Spaceborne Thermal Emission and Reflection Radiometer
(ASTER) in mid-October, 2011 (https://lpdaac.usgs.gov/). GDEM V2 has an overall accuracy
of approximately 17 m at the 95% confidence level, and we consider these data as the
ground-truth elevations in this work.
2. https://pypi.python.org/pypi/PyKrige.
Acknowledgments
The authors would like to thank Dr. Lei Dong, Dr. Michael Goodchild, Dr. Tao Cheng, Dr. Krzysztof
Janowicz, Dr. May Yuan and the anonymous referees for their insightful comments.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
This research was supported by the National Natural Science Foundation of China [41625003 and
41830645] and the National Key Research and Development Program of China [2017YFB0503602]
and the Open Project Fund of the institute for China Sustainable Urbanization, Tsinghua University
(TUCSU-K-17026-01).
Notes on contributors
Di Zhu received his B.S. in Geographic Information Systems from Peking University and a dual B.S.
in Economics also from Peking University. He is currently a PhD candidate at the Institute of
Remote Sensing and Geographical Information Systems, Peking University. His research interests
include geospatial modelling, social sensing and applied artificial intelligence.
Ximeng Cheng received the B.S. and M.S. degrees from China University of Geosciences (Beijing).
He is currently a PhD candidate in GIScience at the Institute of Remote Sensing and
Geographical Information Systems, Peking University. His research interests include spatiotem-
poral data mining, deep learning and urban analysis etc.
Fan Zhang received his B.S. degree from Beijing Normal University, Zhuhai and M.Sc and
PhD degree from Chinese University of Hong Kong. He is currently a postdoctoral fellow at
the Institute of Remote Sensing and Geographical Information Systems, Peking University. His research
interests include spatiotemporal data mining, machine learning and computer vision.
Xin Yao received his B.S. degree from Wuhan University in 2015. He is currently pursuing the PhD
degree in GIScience with the Institute of Remote Sensing and Geographical Information Systems,
Peking University. His primary research interest lies in spatial data mining and geographic
information visualization.
Yong Gao received the B.S. degree from Beijing Normal University in 1997 and the M.S. and PhD
degrees from Peking University in 2000 and 2003, respectively. He is currently an Associate
Professor of GIScience with the Institute of Remote Sensing and Geographical Information
Systems, Peking University. His research interests lie in spatial data mining, geographic informa-
tion retrieval, and high-performance computing with geographical data.
Yu Liu received the B.S., M.S. and PhD degrees from Peking University in 1994, 1997 and 2003. He
is currently a professor at the Institute of Remote Sensing and Geographical Information Systems,
Peking University. His research interest mainly concentrates in humanities and social science based
on big geo-data.
ORCID
Di Zhu http://orcid.org/0000-0002-3237-6032
Ximeng Cheng http://orcid.org/0000-0001-9923-7240
Yu Liu http://orcid.org/0000-0002-0016-2902
References
Anselin, L., 1995. Local indicators of spatial association–LISA. Geographical Analysis, 27 (2), 93–115.
doi:10.1111/j.1538-4632.1995.tb00338.x
Antipov, G., Baccouche, M., and Dugelay, J.L., 2017. Face aging with conditional generative
adversarial networks. In: 2017 IEEE International Conference on Image Processing (ICIP),
2089–2093, Beijing, China.
Appelhans, T., et al., 2015. Evaluating machine learning approaches for the interpolation of
monthly air temperature at Mt. Kilimanjaro, Tanzania. Spatial Statistics, 14, 91–113.
doi:10.1016/j.spasta.2015.05.008
Atkinson, P.M. and Lloyd, C.D., 2009. Geostatistics and spatial interpolation. In: The SAGE handbook
of spatial analysis. 159–181. London, United Kingdom: SAGE Publications.
Azaele, S., et al., 2009. Predicting spatial similarity of freshwater fish biodiversity. Proceedings of the
National Academy of Sciences, 106 (17), 7058–7062. doi:10.1073/pnas.0805845106
Badrinarayanan, V., Kendall, A., and Cipolla, R., 2017. SegNet: a deep convolutional
encoder-decoder architecture for scene segmentation. IEEE Transactions on Pattern Analysis &
Machine Intelligence, 39 (12), 2481–2495.
Burrough, P.A. and Mcdonnel, R.A., 1999. Principles of geographical information systems - spatial
information systems and geostatistics. Landscape & Urban Planning, 15 (3), 357–358.
Chen, Z., et al., 2016. Convolutional neural network based DEM super resolution. ISPRS -
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences,
XLI-B3, 247–250. doi:10.5194/isprsarchives-XLI-B3-247-2016
Cochran, W.G., 1963. Sampling techniques. Hoboken, New Jersey, US: Wiley.
Cressie, N., 1990. The origins of kriging. Mathematical Geology, 22, 239–252. doi:10.1007/BF00889887
Diggle, P.J., Tawn, J.A., and Moyeed, R.A., 1998. Model-based geostatistics. Journal of the Royal
Statistical Society: Series C (Applied Statistics), 47 (3), 299–350. doi:10.1111/1467-9876.00113
Fischer, M.M., 1998. Computational neural networks: a new paradigm for spatial analysis.
Environment and Planning A, 30 (10), 1873–1891. doi:10.1068/a301873
Fischer, M.M., Reismann, M., and Scherngell, T., 2010. Spatial interaction and spatial
autocorrelation. In: L. Anselin and S.J. Rey, eds. Perspectives on spatial data analysis. Berlin,
Heidelberg: Springer Berlin Heidelberg, 61–79. doi:10.1007/978-3-642-01976-0_5
Fotheringham, A.S. and Rogerson, P.A., 2008. The SAGE handbook of spatial analysis. London,
United Kingdom: SAGE Publications.
Fotheringham, A.S., Yang, W., and Kang, W., 2017. Multiscale geographically weighted regression
(MGWR). Annals of the American Association of Geographers, 107 (6), 1247–1265. doi:10.1080/
24694452.2017.1352480
Gauthier, J., 2014. Conditional generative adversarial nets for convolutional face generation. In:
Class project for Stanford CS231N: convolutional neural networks for visual recognition, Winter
semester. 5. Stanford, CA, US.
Goodchild, M.F., 2004. GIScience, geography, form, and process. Annals of the Association of
American Geographers, 94 (4), 709–714.
Goodchild, M.F., Anselin, L., and Deichmann, U., 1993. A framework for the areal interpolation of
socioeconomic data. Environment & Planning A, 25 (3), 383–397. doi:10.1068/a250383
Goodfellow, I.J., et al., 2014. Generative adversarial nets. In: Advances in Neural Information
Processing Systems, 2672–2680. Montréal, Canada.
Guérin, E., et al., 2017. Interactive example-based terrain authoring with conditional generative
adversarial networks. ACM Transactions on Graphics, 36 (6), Article No. 228.
Hedayat, A. and Sinha, B.K., 1991. Design and inference in finite population sampling. Hoboken, New
Jersey, US: Wiley.
Hengl, T., Heuvelink, G.B., and Rossiter, D.G., 2007. About regression-kriging: from equations to
case studies. Computers & Geosciences, 33 (10), 1301–1315. doi:10.1016/j.cageo.2007.05.001
Hubert, L.J., Golledge, R.G., and Costanzo, C.M., 1981. Generalized procedures for evaluating spatial
autocorrelation. Geographical Analysis, 13 (3), 224–233. doi:10.1111/j.1538-4632.1981.tb00731.x
Ioffe, S. and Szegedy, C., 2015. Batch normalization: accelerating deep network training by
reducing internal covariate shift. International Conference on Machine Learning, 448–456. Lille,
France.
Isola, P., et al., 2016. Image-to-image translation with conditional adversarial networks. arXiv
preprint, p. arXiv:1611.07004.
Kingma, D.P. and Welling, M., 2014. Auto-encoding variational bayes. In: International Conference
on Learning Representations (ICLR) 2014. Banff, Canada.
Laloy, E., et al., 2018. Training-image based geostatistical inversion using a spatial generative
adversarial neural network. Water Resources Research, 54, 381–406. doi:10.1002/2017WR022148
Lam, N., 2009. Spatial interpolation. International Encyclopedia of Human Geography, 10 (2), 369–376.
Le, Q.V., 2013. Building high-level features using large scale unsupervised learning. In: IEEE
International Conference on Acoustics, Speech and Signal Processing, 8595–8598. Vancouver,
Canada.
LeCun, Y., Bengio, Y., and Hinton, G., 2015. Deep learning. Nature, 521 (7553), 436–444.
doi:10.1038/nature14539
Li, J. and Heap, A.D., 2011. A review of comparative studies of spatial interpolation methods in
environmental sciences: performance and impact factors. Ecological Informatics, 6 (3), 228–241.
doi:10.1016/j.ecoinf.2010.12.003
Li, L., Romary, T., and Caers, J., 2015. Universal kriging with training images. Spatial Statistics, 14,
240–268. doi:10.1016/j.spasta.2015.04.004
Long, J., Shelhamer, E., and Darrell, T., 2015. Fully convolutional networks for semantic
segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition.
3431–3440. Boston, MA, US.
Lu, Y., Tai, Y.W., and Tang, C.K., 2017. Conditional CycleGAN for attribute guided face image
generation. arXiv preprint, p. arXiv:1705.09966.
Mansimov, E., et al., 2015. Generating images from captions
with attention. arXiv preprint, p. arXiv:1511.02793.
Mariethoz, G. and Caers, J., 2014. Multiple-point geostatistics: stochastic modeling with training
images. Hoboken, New Jersey, US: Wiley.
Marsily, G.D., et al., 2005. Dealing with spatial heterogeneity. Hydrogeology Journal, 13 (1), 161–183.
doi:10.1007/s10040-004-0432-3
Matheron, G., 1963. Principles of geostatistics. Economic Geology, 58 (8), 1246–1266. doi:10.2113/
gsecongeo.58.8.1246
Mirza, M. and Osindero, S., 2014. Conditional generative adversarial nets. arXiv preprint, p.
arXiv:1411.1784.
Nair, V. and Hinton, G.E., 2010. Rectified linear units improve restricted boltzmann machines.
International Conference on Machine Learning, 807–814. Haifa, Israel.
Oliver, M.A. and Webster, R., 1990. Kriging: a method of interpolation for geographical information
systems. International Journal of Geographical Information Systems, 4 (3), 313–332. doi:10.1080/
02693799008941549
Radford, A., Metz, L., and Chintala, S., 2015. Unsupervised representation learning with deep
convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
Ronneberger, O., Fischer, P., and Brox, T., 2015. U-Net: convolutional networks for biomedical
image segmentation. In: International Conference on Medical Image Computing and
Computer-Assisted Intervention, 234–241. Munich, Germany.
Salimans, T., et al., 2016. Improved techniques for training GANs. In: Advances in Neural Information
Processing Systems. 2234–2242. Barcelona, Spain.
Schmidhuber, J., 2014. Deep learning in neural networks: an overview. Neural Networks, 61,
85–117. doi:10.1016/j.neunet.2014.09.003
Shepard, D., 1968. A two-dimensional interpolation function for irregularly-spaced data. In: ACM
National Conference, 517–524. New York, NY, US. doi:10.1055/s-0028-1105114
Thompson, S.K., 1996. Adaptive sampling. Hoboken, New Jersey, US: Wiley.
Tobler, W.R., 1970. A computer movie simulating urban growth in the Detroit region. Economic
Geography, 46, 234–240. doi:10.2307/143141
Wang, D., et al., 2010. Morphometric characterisation of landform from DEMs. International Journal
of Geographical Information Science, 24 (2), 305–326. doi:10.1080/13658810802467969
Xu, B., et al., 2015. Empirical evaluation of rectified activations in convolutional network. arXiv
preprint, p. arXiv:1505.00853.
Yosinski, J., et al., 2014. How transferable are features in deep neural networks? In: Advances in
Neural Information Processing Systems. 3320–3328. Montréal, Canada.
Zhao, L., et al., 2019. Simultaneous color-depth super-resolution with conditional generative
adversarial networks. Pattern Recognition, 88, 356–369. doi:10.1016/j.patcog.2018.11.028
Zhu, D., et al., 2018. Inferring spatial interaction patterns from sequential snapshots of spatial
distributions. International Journal of Geographical Information Science, 32 (4), 783–805.
doi:10.1080/13658816.2017.1413192