Comparison between convolutional neural networks and random forest for local climate zone classification in mega urban areas using Landsat images

Abstract

The Local Climate Zone (LCZ) scheme is a classification system providing a standardization framework to present the characteristics of urban forms and functions, especially for urban heat island (UHI) research. Landsat-based 100 m resolution LCZ maps have been classified by the World Urban Database and Portal Tool (WUDAPT) method using a random forest (RF) machine learning classifier. Some studies have proposed modified RF and convolutional neural network (CNN) approaches. This study aims to compare CNN with an RF classifier for LCZ mapping in great detail. We designed five schemes (three RF-based schemes (S1-S3) and two CNN-based ones (S4-S5)), which consist of various combinations of input features from bitemporal Landsat 8 data over four global mega cities: Rome, Hong Kong, Madrid, and Chicago. Among the five schemes, the CNN-based one that incorporated larger neighborhood information showed the best classification performance. When compared to the WUDAPT workflow, the overall accuracies for entire land cover classes (OA) and for urban LCZ types (i.e., LCZ1-10; OA_urb) increased by about 6-8% and 10-13%, respectively, for the four cities. The transferability of LCZ models for the four cities was evaluated, showing that CNN consistently resulted in higher accuracy (increased by about 7-18% and 18-29% for OA and OA_urb, respectively) than RF. This study revealed that the CNN classifier performed particularly well for the LCZ classes in which buildings were mixed with trees, or in which buildings or plants were sparsely distributed. The research findings can provide a basis for guidance of future LCZ classification using deep learning.
ISPRS Journal of Photogrammetry and Remote Sensing
journal homepage: www.elsevier.com/locate/isprsjprs
Cheolhee Yoo (a), Daehyeon Han (a), Jungho Im (a, corresponding author), Benjamin Bechtel (b)
(a) School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, South Korea
(b) Department of Geography, Ruhr-University Bochum, Bochum 44801, Germany
Keywords: Local climate zone; Convolutional neural networks; Random forest; Urban climate; Landsat
1. Introduction
Although the ratio of urban areas to global land surface is just 3%,
about 54% of the world's population live in urban centers; by 2050, that
number will increase to nearly 65% (Cohen, 2015). Urbanization results
in the increased absorption of solar radiation due to the expanded
impervious area, the reduced sky view factor due to the greater number
of (high-rise) buildings, and the release of artificial heat in the urban
canyon especially in mega cities (Barnes et al., 2001; Giridharan et al.,
2004; Han-qiu and Ben-qing, 2004; Rizwan et al., 2008). The urban
heat island (UHI) phenomenon, in which urban areas are warmer than their
surroundings, is important because it interacts with other
urban climate problems, such as heat waves and air pollution (Founda
and Santamouris, 2017; Salata et al., 2017; Yadav et al., 2017;
Fallmann et al., 2016). Different types of UHIs need to be differentiated,
most importantly the surface temperature UHI (SUHI) and the air
temperature UHI in the canopy layer, which is from the ground to the
height of buildings.
Traditionally, UHI studies analyze the temperature difference between
urban and rural areas. One possible way to differentiate these areas is to use
satellite-based land cover data with specific class types (i.e., typical land
cover classification). Typical global land
cover data used in existing UHI studies include the 500 m resolution
MODIS land cover product (MCD12Q1) (Friedl et al., 2010), the 300 m
resolution GlobCover 2009 dataset produced by ESA (Bontemps et al.,
2011), and the Global Land Cover product (GLC or GlobalLand30)
produced by Chen et al. (2015) with Landsat data for 30 m resolution
(Mathew et al., 2018; Lauwaet et al., 2015; Liu et al., 2018b). However,
these products have only one urban land cover class: “urban and built-
up class” in MODIS, “artificial surfaces and associated areas” in Glob-
Cover 2009, and “artificial surfaces” in GlobalLand30. Stewart and Oke
(2012), however, explained that the thermal properties of urban areas
vary with the height and density of the buildings in them. Thus, there is
a limit to analyzing the detailed UHI characteristics of a city using
global land cover products that have a single urban class.
https://doi.org/10.1016/j.isprsjprs.2019.09.009
Received 6 February 2019; Received in revised form 11 September 2019; Accepted 12 September 2019
Corresponding author. E-mail address: ersgis@unist.ac.kr (J. Im).
ISPRS Journal of Photogrammetry and Remote Sensing 157 (2019) 155–170
0924-2716/ © 2019 Published by Elsevier B.V. on behalf of International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS).
In fact, many countries have produced their own detailed land cover
data with at least several urban type classes. The national land cover
product of the United States (NLCD2011), for example, has a total of 20
land cover classes and four of them are urban types based on the degree
of development (i.e., high intensity, medium intensity, low intensity,
and open space). The European CORINE (Co-ORdinated INformation on
the Environment) land cover has 11 urban-related classes in its level 3
product. The Urban Atlas product also provides high-resolution land
use maps of urban areas in European countries. Because urban classes
vary by product, the use of the urban classes for studying global heat
phenomena is relatively limited. Since the classification criteria of these
products, such as NLCD2011, CORINE and Urban Atlas, focus only on
the density of the impervious areas with consideration of land use in-
formation, the factors strongly linked to UHI—including the sky view
factor and building height to street width ratio—were barely considered
when the products were generated.
To overcome this issue, researchers in the UHI field have designed
a classification system that fits this purpose well. Local Climate
Zone (LCZ) is a classification system designed by Stewart and Oke
(2012) especially for UHI research. The LCZ consists of 10 urban LCZ
types and 7 natural LCZ types. It has a culturally neutral framework
which is generic and easy to understand for global urban climate stu-
dies (Fig. 1). Bechtel et al. (2015) devised a World Urban Database and
Portal Tool (WUDAPT) method to construct a 100 m resolution pixel-
based LCZ map using Landsat 8 images. Landsat 8 is a polar orbiting
satellite sensor system that can capture global areas with a resolution of
30 m (for visible, NIR, and SWIR bands) to 100 m (for thermal bands)
every 16 days. The WUDAPT method resamples the Landsat image of
each city into 100 m resolution (i.e., using the zonal mean) to get the
spectral information of local-scale urban structures. Local experts with
deep knowledge of individual cities build LCZ reference polygons using
high resolution Google Earth images. These polygons are then con-
verted into 100 m resolution pixels and used for training and testing
LCZ classification models with Landsat images. WUDAPT uses random
forest (RF), a rule-based machine learning approach, for classification.
The LCZ maps of many cities all over the globe (about 90 cities as of
August 2018) have been built in this way and shared through the
WUDAPT portal (http://www.wudapt.org) (Bechtel et al., 2019).
The LCZ maps produced by the WUDAPT method have been used to
find several key parameters that affect UHI (Giridharan and Emmanuel,
2018; Kaloustian and Bechtel, 2016). Land surface temperature and air
temperature have been analyzed for LCZ classes (Beck et al., 2018;
Wang et al., 2018; Cai et al., 2018). Furthermore, the effect of re-
spiratory particulate matter on land surface temperature has been dis-
cussed using various LCZ classes (Ziaul and Pal, 2018). The WUDAPT-
based LCZ maps, however, are still limited in terms of classification
accuracy. The average Overall Accuracy (OA) of the 90 LCZs uploaded
on the WUDAPT portal is 74.5%, leaving much room for improvement.
In particular, the average OA of the urban LCZ types (OA_urb) of the 90
LCZs is just 59.3%, which means that the urban LCZ types are not as
accurate as the other general natural LCZ types such as forest and
water. The low classification accuracy of urban features (i.e., urban LCZ
types) is a major limitation for urban climate-related research.
Therefore, the WUDAPT community has encouraged scientists to
explore various classification approaches to further improve LCZ clas-
sification (Yokoya et al., 2018). For example, Danylo et al. (2016)
added various spectral metrics (i.e., zonal maximum and minimum) to
the input variables of the RF classifier. Their OA improved by 2% when
compared with the traditional WUDAPT method for LCZ classification
in Kiev, Ukraine. Verdonck et al. (2017) extracted the spectral in-
formation (i.e., mean, minimum, maximum, median, and 25th and 75th
quantile values) of neighboring pixels through a moving window ap-
proach. These six new features were used as input variables in the RF
machine learning model. The OA of the LCZ classification of Antwerp,
Brussels, and Ghent in Belgium were improved by 7.9%, 13.0%, and
5.4%, respectively, when compared to the original WUDAPT method.
These studies improved LCZ classification by supplying the RF classifier
with additional input variables that carry more spectral features from a
contextual domain.
In recent years, deep learning models which exploit many layers of
non-linear information have been widely used for image classification,
object segmentation, and text determination (Schmidhuber, 2015;
LeCun et al., 2015; Wang et al., 2012). Among various deep learning
models, Convolutional Neural Networks (CNN) have been shown to exhibit
high performance in image classification tasks (Krizhevsky et al.,
2012; Vedaldi and Lenc, 2015; Kim et al., 2018b). CNN, a feedforward
network with feature learning, extracts inherent spatial features at each
layer. Theoretically, CNN is capable of self-learning and in-depth
learning for feature extraction, weight sharing, and dimension reduction
by combining a backpropagation mechanism and a gradient descent
optimization method. Backpropagation provides backward feedback
to enhance the reliability, and the gradient descent
method is used in the self-training process.
Fig. 1. The local climate zone (LCZ) types identified in urban climate research (from Bechtel et al., 2017 after Stewart and Oke, 2012), © CC-BY 4.0.
Numerous studies have used CNN for land cover classification from
satellite images (Paoletti et al., 2018; Xu et al., 2018; Marcos et al.,
2018), including recent applications for LCZ classification. Sukhanov
et al. (2017) designed a multi-level ensemble model combining RF,
Gradient Boosting Machines, and a simple CNN with small input data
size (i.e., 3 × 3) to create LCZ maps, which was trained for five cities
(i.e., Berlin, Rome, Paris, Sao Paulo and Hong Kong) and then tested
over four different cities (i.e., Amsterdam, Chicago, Madrid and Xi’an).
Qiu et al. (2018) used a residual convolutional neural network (ResNet)
to conduct a systematic analysis of feature importance from multi-
source datasets for LCZ classification across 9 cities located in Europe.
Since RF is the most successfully used LCZ classifier so far, it is im-
portant to know the advantages and disadvantages of using CNN over
RF for LCZ classification. However, few studies have compared the
LCZ classification performance of the CNN and RF classifiers.
This study aims to compare CNN with the RF classifier for LCZ
classification. We designed five schemes, which consist of various
combinations of input data over four global mega cities: Rome, Hong
Kong, Madrid and Chicago. The objectives of this research were to: (1)
examine five schemes in order to identify the effect of CNN when
compared to other methods that employ RF classifiers, which were
proposed in previous studies; (2) investigate a specific set of LCZ classes
that produce high classification accuracies; (3) compare the LCZ map
generated from two different types of classifiers with reference data;
and (4) discuss the research direction of improving local climate zone
classification methods for future use.
2. Study area and data
2.1. Study area
Rome, Hong Kong, Madrid, and Chicago were selected as our study
areas (Fig. 2). These four cities represent various climatic (Table 1) and
geographic characteristics. In addition, their urban structure differs,
which enables us to verify the robustness of the proposed approaches.
Rome, the capital city of Italy, is in the midwestern region of the
Italian peninsula, and the center of the city is about 24 km inland from
the Mediterranean Sea. Rome has about 2.9 million residents living
within an area of 1,285 km², making it Italy’s largest and most populous
city. The city has a monocentric urban structure with increasing den-
sities toward the city center.
Hong Kong is located on the southern coast of China. The city covers
about 1,104 km² of land, with 7.4 million residents. Hong Kong is
known for its unique urban form and high-density land use. Most areas
of the city are hilly, and just under a quarter of the study domain is
habitable (i.e., built-up area).
Madrid is the capital city of Spain, a densely populated metropolis
located in a relatively flat area lying in the center of the southern
Meseta of the Iberian Peninsula. Madrid is the largest city in Spain, with
3.2 million residents living in 604 km². We selected the study region
covering the Madrid metropolitan area, comprising monocentric
Madrid and its surrounding municipalities called autonomous com-
munities.
Chicago is the third largest city in the US, situated beside the huge
Lake Michigan in Illinois. The city of Chicago has about 2.7 million
residents in an area of 606 km². Chicago tends to have a regularly
shaped street pattern and city blocks based on their grid plan (Ellickson,
2012). We selected the study region that includes the Chicago me-
tropolitan area, comprising the city and its suburbs. The high-density
urban center is located in the city of Chicago, while low-density suburban
areas surround it.
2.2. Satellite input data
Two Landsat 8 images of different seasons for each city were
downloaded from the US Geological Survey Earth Explorer site
(https://earthexplorer.usgs.gov). The acquisition dates with clear sky
conditions for the Landsat data are presented in Table 2. We chose two
scenes per city close to summer and winter to consider seasonal effects,
such as the phenology of vegetation, and to increase classification
accuracy, as found by Bechtel et al. (2015). All Landsat images were first
clipped to the extent of each city and then atmospherically corrected to scaled
reflectance data using ENVI Fast Line-of-sight Atmospheric Analysis of
Hypercubes (FLAASH). Nine of the 11 bands (bands 1–7, 10, and 11) in
each Landsat 8 scene were used as input data. Bands 1–7 were the 30 m
resolution Operational Land Imager (OLI) spectral bands, and bands 10
and 11 were 30 m resolution thermal bands interpolated from 100 m
resolution data collected from Thermal Infrared Sensor (TIRS).
2.3. Reference data
LCZ reference data for the four cities are available from the 2017
IEEE GRSS data fusion contest organized by the Image Analysis and
Data Fusion Technical Committee, in collaboration with WUDAPT and
GeoWiki (Fig. 2). These data were extracted from the WUDAPT data-
base and further revised to be as accurate as possible (Tuia et al., 2017;
Yokoya et al., 2018). Due to unique urban structures and compositions,
the number of LCZ classes differs from city to city. Rome has 10 LCZ
classes (6 urban LCZ types and 4 natural LCZ types); Hong Kong has 13
LCZ classes (8 urban LCZ types and 5 natural LCZ types); Madrid has 14
LCZ classes (7 urban LCZ types and 7 natural LCZ types); and Chicago
has 15 LCZ classes (9 urban LCZ types and 6 natural LCZ types). In
addition, the number of polygons digitized for each LCZ class differs
across both classes and cities. The polygons of each LCZ class were
randomly divided into two parts: the first for training the models and
the other for testing them. We tried to equally divide the polygons into
the two sets, considering both the number of polygons and the number
of 100 m resolution LCZ pixels within each polygon. It is well known
that if the training and validation sample pixels share the same poly-
gons, the classification accuracy can be inflated (Zhen et al., 2013).
Some LCZ classes in each city, however, form a small number of
polygons (fewer than 3), because the classes were not widely dis-
tributed within the city. Dividing these small numbers of polygons into
two sets would make the models poorly trained. Therefore, we labeled
these classes “red-star class”. For the red-star classes, two sets were
created by dividing the number of pixels of each polygon into two
groups through a random sampling approach. The number of polygons
and pixels of the two sets for each LCZ class for the four cities are shown
in Table 3.
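The polygon-wise split described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_reference`, the tuple layout of `pixels`, and the greedy balancing by pixel count are assumptions introduced here. Classes with fewer than three polygons (the red-star classes) are split at the pixel level; all other classes keep every polygon entirely on one side.

```python
import random
from collections import defaultdict


def split_reference(pixels, seed=0, red_star_below=3):
    """Split reference pixels into training and test sets.

    pixels: iterable of (pixel_id, polygon_id, lcz_class) tuples (hypothetical layout).
    Returns (train_ids, test_ids).
    """
    rng = random.Random(seed)
    by_class = defaultdict(lambda: defaultdict(list))
    for pixel_id, polygon_id, lcz in pixels:
        by_class[lcz][polygon_id].append(pixel_id)

    train, test = [], []
    for lcz in sorted(by_class):
        polygons = by_class[lcz]
        if len(polygons) < red_star_below:
            # Red-star class: too few polygons, so split the pixels of each polygon at random.
            for polygon_id in sorted(polygons):
                ids = list(polygons[polygon_id])
                rng.shuffle(ids)
                half = (len(ids) + 1) // 2
                train.extend(ids[:half])
                test.extend(ids[half:])
        else:
            # Regular class: whole polygons go to one side only.
            # Largest polygons are placed first on whichever side currently has fewer pixels.
            order = sorted(polygons)
            rng.shuffle(order)  # random tie-breaking among equally sized polygons
            order.sort(key=lambda p: len(polygons[p]), reverse=True)
            n_train = n_test = 0
            for polygon_id in order:
                ids = polygons[polygon_id]
                if n_train <= n_test:
                    train.extend(ids)
                    n_train += len(ids)
                else:
                    test.extend(ids)
                    n_test += len(ids)
    return train, test
```

Keeping whole polygons on one side avoids the accuracy inflation noted above, at the cost of a less exact balance between the two sets.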
The Global Man-made Impervious Surface (GMIS) data were used to
analyze the LCZ maps generated for each city. GMIS provides 30 m
resolution global fractional impervious cover for the year 2010,
which was derived from Landsat data (de Colstoun et al., 2017). To
identify the medium-to-high density developed areas, we extracted the
GMIS pixels which have an impervious fraction over 70% within the
study domain for each city.
3. Methods
3.1. Random forest (RF) classifier
RF has been widely used in the remote sensing field for both clas-
sification (Sim et al., 2018; Park et al., 2018; Li et al., 2013) and re-
gression (Lee et al., 2018; Yoo et al., 2018; Richardson et al., 2017). RF
is an algorithm based on classification and regression trees (CART),
which uses a recursive binary split method to reach final nodes in a tree
structure (Breiman, 2001). RF produces numerous independent trees
with randomly selected subsets through bootstrapping from training
samples and from input variables at every node of a tree. To achieve a
final decision, RF adopts an ensemble approach from numerous trees
through majority voting for classification.
In this study, the RF was implemented using a random forest
package provided in R software (https://www.r-project.org/). All
parameters except for the number of trees were set as the default values
provided by the package (i.e., the number of training samples for each
tree was 66.7% of the entire training samples, the number of randomly
sampled variables as candidates at each split was the square root of the
number of input variables, and the minimum size of the terminal node
was 1). The number of trees (i.e., ntree) was selected during the modeling
process.
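Two of the ingredients described above are easy to make concrete: the number of candidate variables per split (the square root of the number of inputs, rounded down as in the R randomForest default for classification) and the majority vote over trees. The sketch below is illustrative only; the helper names `candidates_per_split` and `majority_vote` are hypothetical and it does not train any trees.

```python
import math
from collections import Counter


def candidates_per_split(n_inputs):
    """Number of randomly sampled candidate variables at each split (floor of sqrt)."""
    return max(1, math.floor(math.sqrt(n_inputs)))


def majority_vote(tree_predictions):
    """Final class label: the label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Input sizes of 18, 54 and 108 variables give 4, 7 and 10 candidates per split.
for n_inputs in (18, 54, 108):
    print(n_inputs, candidates_per_split(n_inputs))

# Five hypothetical trees voting on one pixel.
print(majority_vote(["LCZ 2", "LCZ 5", "LCZ 2", "LCZ 2", "LCZ 6"]))
```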
3.2. Convolutional neural networks (CNN) classifier
CNN is a kind of artificial neural network and basically consists of
convolutional layers, pooling layers, and fully connected layers. When
compared to typical neural networks, the distinguishing feature of CNN
is its use of convolutional layers. With the 3-dimensional input data
(width, height, and channel), the output of a convolutional layer is
transmitted to the next layer keeping the same 3-dimensional shape.
The input and output data of the convolutional layer are called feature
maps. The convolution is performed with several filters (or kernels)
over the input feature maps. Each moving filter sweeps the input fea-
ture maps conducting a dot-product with corresponding elements of the
input feature maps, and then the total sum is obtained. The depth of the
output feature maps is no longer the number of channels but the
number of filters. For example, when 32 filters are used in the first
convolutional layer, the output feature map has a depth of 32 regardless
of the number of channels in the input feature map.
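The relationship between filters and output depth can be checked with a small pure-Python convolution. This is a didactic sketch under stated assumptions (stride 1, no padding, no bias, all-ones input and filters); the function name `conv2d_valid` and the nested-list layout are chosen here for illustration and are not taken from the study.

```python
def conv2d_valid(feature_maps, filters):
    """Convolve [channel][row][col] input with [filter][channel][row][col] filters.

    Stride 1, no padding. Each filter sweeps all input channels, and the
    element-wise products are summed into a single output map per filter.
    """
    n_channels = len(feature_maps)
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    k = len(filters[0][0])
    out_h, out_w = height - k + 1, width - k + 1
    output = []
    for filt in filters:
        fmap = [[0.0] * out_w for _ in range(out_h)]
        for i in range(out_h):
            for j in range(out_w):
                total = 0.0
                for c in range(n_channels):
                    for di in range(k):
                        for dj in range(k):
                            total += feature_maps[c][i + di][j + dj] * filt[c][di][dj]
                fmap[i][j] = total
        output.append(fmap)
    return output


# An 18-channel 10 x 10 input and 32 filters of size 3 x 3 (all values set to 1).
x = [[[1.0] * 10 for _ in range(10)] for _ in range(18)]
w = [[[[1.0] * 3 for _ in range(3)] for _ in range(18)] for _ in range(32)]
y = conv2d_valid(x, w)
print(len(y), len(y[0]), len(y[0][0]))  # output depth, height, width
```

The output depth equals the number of filters (32) rather than the number of input channels (18), and without padding the spatial size shrinks from 10 to 8.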
Fig. 2. Study area and Local Climate Zone (LCZ) reference data with legends.

Table 1
The climatic characteristics of the cities. The classes in parentheses correspond to the Köppen-Geiger climate classification (Peel et al., 2007).
City         Description of climate
Rome         Mediterranean climate with dry summers and cool, humid winters (Csa)
Hong Kong    Humid subtropical climate with a hot and humid summer (Cfa)
Madrid       Inland Mediterranean climate, transitioning to a semi-arid climate in the eastern part of the city (Csa)
Chicago      Hot humid continental climate with distinct seasons such as warm to hot and humid summers and cold, snowy winters (Dfa)

Table 2
Selected winter and summer Landsat 8 scenes for each city.
City         Scene 1              Scene 2
Rome         January 11, 2017     August 23, 2017
Hong Kong    February 12, 2018    October 23, 2017
Madrid       January 12, 2015     August 13, 2017
Chicago      February 03, 2017    September 12, 2016

Convolution reduces the size of the output feature maps. To prevent
this, padding is widely used. Padding refers to filling the input feature
maps with a specific value before doing the convolution. Padding is
mainly used to adjust the spatial size of the output feature maps. The
value to be filled can be determined according to the model, but zero-padding
is widely used in various applications. Once feature maps are
extracted through the convolutional layers, sub-sampling is generally
conducted to reduce the size of the data. This downsampling process is
called pooling. Pooling locally summarizes the output of the previous
layer, providing translation invariance, which focuses on the presence of
a feature rather than its location (Goodfellow et al., 2016). In addition,
pooling helps to avoid the overfitting problem by making the
model simpler. The number of weights to be optimized is significantly
reduced after the pooling layers, creating a lower computational cost. Max
pooling is commonly used based on the concept that the maximum
values of a feature map can represent local features (Zhou and
Chellappa, 1988). Finally, fully connected layers are used as the classifier
on the final output feature maps. By using the features from previous
layers, fully connected layers determine the final class with the
highest probability using a softmax function, a commonly used
classifier in multi-class classification problems in neural networks
(Goodfellow et al., 2016; Yu et al., 2017; Kim et al., 2018a). Fully
connected layers consist of a set of weights to be optimized for a node.
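Zero-padding, max pooling, and the softmax function described above can each be written in a few lines. The sketch below is a generic illustration on a single 2-D feature map; the function names and the small 4 × 4 example are assumptions made here, not part of the study.

```python
import math


def zero_pad(fmap, pad=1):
    """Surround a 2-D feature map with `pad` rows and columns of zeros."""
    width = len(fmap[0]) + 2 * pad
    rows = [[0.0] * pad + list(row) + [0.0] * pad for row in fmap]
    border = [[0.0] * width for _ in range(pad)]
    return [r[:] for r in border] + rows + [r[:] for r in border]


def max_pool(fmap, size=2, stride=2):
    """Keep the maximum of each size x size window, moving by `stride`."""
    height, width = len(fmap), len(fmap[0])
    return [[max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, width - size + 1, stride)]
            for i in range(0, height - size + 1, stride)]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
padded = zero_pad(fmap)    # 6 x 6, so a 3 x 3 convolution would return 4 x 4 again
pooled = max_pool(fmap)    # 2 x 2 summary of the 4 x 4 map
probabilities = softmax([1.0, 2.0, 3.0])
predicted_class = probabilities.index(max(probabilities))
```

Padding by one pixel offsets the shrinkage caused by a 3 × 3 filter, while 2 × 2 pooling with a stride of 2 halves each spatial dimension.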
An activation function converts the sum of input data into an output
result. To get the benefit of multiple layers on a neural network, it is
essential to use a nonlinear activation function. The rectified linear unit
(ReLU) is the most popular activation function in deep learning for its
excellent performance with a relatively simple structure (Glorot et al.,
2011; LeCun et al., 2015).
All of the weights, such as filters in convolutional layers and nodes
in the fully connected layers, are randomly initialized. By reducing the
error between the estimated result and reference data, weights are
gradually optimized. This iterative process is called backpropagation,
which calculates the derivative of the error function to find the
minimum error (Rumelhart, 1986; Goodfellow et al., 2016). All of the
weights are updated by the optimization method using the calculated
gradient.
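As a generic illustration of the update step described above, plain gradient descent changes each weight in the direction opposite to the gradient of the error function. The symbols used here ($w_t$ for a weight at iteration $t$, $\eta$ for the learning rate, $E$ for the error function) are introduced for illustration only, and adaptive optimizers rescale this step rather than applying it as written.

```latex
w_{t+1} = w_t - \eta \, \frac{\partial E}{\partial w_t}
```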
Table 3
Training and test datasets of each LCZ type by city. The values in the training and test columns are the number of polygons. The number of the corresponding 100 m
resolution pixels is shown in parentheses. * is allocated to the red-star classes, which have only a few reference polygons of the LCZ classes. The LCZ figures in the left
column are from Stewart and Oke (2012).
LCZ Rome Hong Kong Madrid Chicago
Training Test Training Test Training Test Training Test
1 13 (318) 13 (313) 2* (228)
2 13 (775) 12 (776) 6 (112) 5 (67) 12 (1567) 5 (5647) 2* (126)
3 2* (104) 7 (195) 7 (131) 1* (92) 3 (128) 3 (123)
4 9 (383) 10 (290) 3* (305) 2* (140)
5 11 (749) 12 (746) 4 (76) 4 (50) 5 (715) 3 (620) 2* (104)
6 3 (239) 4 (241) 7 (64) 6 (56) 6 (932) 6 (894) 10 (2059) 12 (1901)
7 – –
8 7 (235) 4 (194) 4 (86) 5 (51) 10 (1433) 12 (1380) 11 (2231) 10 (2296)
9 1* (82) 4 (422) 3 (429)
10 2* (51) 5 (109) 4 (110) 2 (238) 2 (227)
A 2 (146) 3 (138) 7 (832) 7 (784) 1 (1115) 3 (244) 6 (515) 7 (488)
B 2 (293) 3 (262) 7 (207) 6 (200) 8 (1906) 8 (1888) 5 (188) 5 (153)
C 5 (379) 4 (312) 2 (982) 2 (250)
D 4 (512) 3 (472) 6 (332) 6 (236) 9 (3621) 6 (3517) 4 (1150) 4 (1227)
E 3 (324) 2 (312) 4 (115) 3 (86)
F 1* (304) 3 (31) 2 (28)
G 3* (485) 5 (1282) 4 (1054) 2 (391) 2 (385) 5 (967) 4 (984)

In this study, CNN was implemented using the Keras open-source
library. There are many ways to construct the CNN architecture.
Therefore, it is important to find an optimal model that works well with
the data considering their characteristics. Unfortunately, there is no way to
directly find an optimal model in deep learning. A multitude of tests is
typically conducted to find the optimal CNN parameters considering
performance and efficiency. In this study, 32, 64, 128 and 256 filters at
convolutional layers were tested to determine an optimal structure. We
finally constructed a CNN model, which consisted of four convolutional
layers with 32 filters of size 3 × 3. The ReLU activation function was
adopted at each layer. Max pooling with a 2 × 2 window and a stride of
2 was performed after the second and fourth convolutional layers. A
fully connected layer with 256 nodes was applied after the convolutional
and max-pooling layers. A softmax function was used to classify
the LCZ type. The adaptive moment estimation (ADAM) optimizer, which
is typically used in neural networks, especially for classification tasks,
was used to minimize the error function (Kingma and Ba, 2014). A
graphics processing unit (GPU), an Nvidia GTX 1080Ti with 11 GB of
memory, was used to speed up model training with a batch size of 256 and
1000 epochs. The final CNN structure used in this study is shown in
Fig. 3.
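The layer arithmetic implied by this architecture can be traced with a short calculation. The sketch below rests on several assumptions that the text does not state: zero-padding that preserves the spatial size in every convolution, a stride of 1, pooling that rounds odd sizes down, a bias term in every layer, and 18 input bands (9 bands from each of the two scenes). The function `cnn_summary` is a hypothetical helper, so its totals are not guaranteed to match the parameter counts reported by the authors.

```python
def cnn_summary(n, n_bands=18, n_classes=10, n_filters=32, kernel=3, dense_nodes=256):
    """Trace feature-map sizes and count weights for the four-layer CNN described above.

    n: width and height of the input image in 10 m pixels.
    Assumes size-preserving padding, stride 1, and biases (all assumptions).
    """
    size, channels, params = n, n_bands, 0
    for layer in range(1, 5):
        # Each filter has kernel x kernel x channels weights plus one bias.
        params += kernel * kernel * channels * n_filters + n_filters
        channels = n_filters
        if layer in (2, 4):
            size //= 2  # 2 x 2 max pooling with a stride of 2
    flattened = size * size * channels
    params += flattened * dense_nodes + dense_nodes   # fully connected layer
    params += dense_nodes * n_classes + n_classes     # softmax output layer
    return {"final_map": (size, size, channels),
            "flattened": flattened,
            "parameters": params}


# A 10 x 10 input and a 30 x 30 input, each with 10 output classes.
small = cnn_summary(10)
large = cnn_summary(30)
print(small)
print(large)
```

Under these assumptions most of the weights sit in the fully connected layer, so the larger input increases the parameter count mainly through the flattened feature vector.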
3.3. Classification scheme design
To produce a 100 m resolution pixel–based LCZ map from Landsat
images, this study used two classifiers: RF and CNN. We designed five
classification schemes (three RF-based schemes (S1–S3) and two CNN-based
ones (S4–S5)) with different input features and classifiers (Fig. 4).
3.3.1. Benchmark RF-based schemes (S1–S3)
RF is the classifier adopted by the existing LCZ classification com-
munity, including the WUDAPT method. We designed three schemes
(S1–S3) with RF, based on the benchmark of existing studies. S1 cor-
responds to the existing WUDAPT method. The 30 m resolution Landsat
images were bilinearly resampled to 10 m resolution, then resampled to
100 m resolution by a zonal mean function based on the LCZ grid area.
S2 benchmarked the method proposed by Danylo et al. (2016), which
achieved an increase in the classification accuracy by adding more
spectral information to the WUDAPT model as input variables. The
10 m bilinear resampled Landsat images were resampled to 100 m, not
only by zonal mean but also by maximum and minimum within the LCZ
grid area. The three features were constructed for each Landsat band in
S2. S3 benchmarked the method suggested by Verdonck et al. (2017).
To consider the contextual characteristics of a feature, the mean,
minimum, maximum, median, 25th and 75th quantile values of the
nine pixels in a 3 × 3 window (i.e., one center pixel and its surrounding
eight pixels) were calculated from 100-meter zonal-mean Landsat
images. In each scheme, we used the features constructed from 18
bands (i.e., 9 bands for one scene) of two Landsat images in (or very
close to) the winter and summer seasons (Table 2) as input variables. In
summary, the number of input variables of each scheme was: 18, 54,
and 108 for S1, S2, and S3, respectively (Table 4). We extracted the
pixel values of the input variables at the location corresponding to LCZ
reference pixels in each scheme.
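The feature construction for the three RF-based schemes can be sketched for a single band. This is a simplified illustration: the function names are hypothetical, edge cells of the 3 × 3 window are not handled, and the quantile interpolation (the "inclusive" method of Python's `statistics.quantiles`) is an assumption because the text does not specify one.

```python
import statistics


def zonal_stats(img10, block=10):
    """Aggregate a 10 m grid (list of rows) to 100 m cells.

    Returns three grids: zonal mean (S1), plus zonal minimum and maximum (added in S2).
    """
    rows, cols = len(img10) // block, len(img10[0]) // block
    mean = [[0.0] * cols for _ in range(rows)]
    low = [[0.0] * cols for _ in range(rows)]
    high = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cell = [img10[r * block + i][c * block + j]
                    for i in range(block) for j in range(block)]
            mean[r][c] = sum(cell) / len(cell)
            low[r][c], high[r][c] = min(cell), max(cell)
    return mean, low, high


def window_features(grid100, r, c):
    """Six S3-style statistics of the 3 x 3 window centred on an interior cell (r, c)."""
    values = [grid100[r + i][c + j] for i in (-1, 0, 1) for j in (-1, 0, 1)]
    q25, median, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {"mean": sum(values) / len(values), "min": min(values), "max": max(values),
            "median": median, "q25": q25, "q75": q75}


# Number of input variables per scheme: 18 bands times the features per band.
N_BANDS = 18
feature_counts = {"S1": N_BANDS * 1, "S2": N_BANDS * 3, "S3": N_BANDS * 6}
```

The counts reproduce the 18, 54, and 108 input variables stated above for S1, S2, and S3.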
3.3.2. CNN-based schemes (S4–S5)
We proposed two different schemes based on CNN. The 30 m resolution
Landsat images were bilinearly resampled to 10 m, allowing
100 (10 × 10) pixels to be placed in a single 100 m LCZ grid. Each 10 m
resolution image was normalized using the min–max approach to reduce
training time (Ba et al., 2016). In the case of S4, the 10 × 10 size
features of 10 m resolution Landsat images in each LCZ reference pixel
area were extracted and fed into CNN. The final scheme (S5) takes into
consideration the surrounding area of a focus pixel (i.e., the same area
as the moving window in S3). We extracted the 30 × 30 size features of
10 m resolution Landsat images and fed them into the CNN classifier. In
summary, S4 has a 10 × 10-sized 10 m resolution feature for each
band, while S5 has a 30 × 30-sized 10 m resolution feature for each
band as input variables (Table 4). After the 10 m resolution images
were fed into the CNN model, the fully connected layers made a
final decision of one LCZ class for each image in order to produce a
100 m resolution LCZ map. The numbers of trainable parameters of
S4 and S5 for the four cities are summarized in Table S1.
Fig. 3. The structure of CNN we designed in this study. N indicates the size of input image (i.e., a 10 × 10 size image has N of 10). The k in the last output means the
number of LCZ types to be classified for each city.
Fig. 4. The schematic process flow showing how to prepare the input features for each scheme.
C. Yoo, et al. ISPRS Journal of Photogrammetry and Remote Sensing 157 (2019) 155–170
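The preparation of the S4 and S5 inputs can be illustrated with a short sketch. It assumes the band has already been resampled to 10 m (the bilinear resampling step is not shown), and the helper names `min_max_normalize` and `extract_patch`, as well as the choice to skip cells whose patch would extend beyond the image, are assumptions made for this example rather than details given in the paper.

```python
def min_max_normalize(image):
    """Scale a 2-D list of pixel values to [0, 1] with the image's own min and max."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # guard against a constant image
    return [[(v - lo) / span for v in row] for row in image]

def extract_patch(image10m, cell_row, cell_col, size):
    """Cut a size x size patch of 10 m pixels centred on a 100 m LCZ cell.

    size=10 covers the cell itself (S4); size=30 also covers the eight
    neighbouring cells (S5). Returns None when the patch would leave the
    image (assumption: such cells are simply skipped).
    """
    cell = 10                      # 10 m pixels per 100 m cell side
    pad = (size - cell) // 2       # 0 for S4, 10 for S5
    top = cell_row * cell - pad
    left = cell_col * cell - pad
    if top < 0 or left < 0:
        return None
    if top + size > len(image10m) or left + size > len(image10m[0]):
        return None
    return [row[left:left + size] for row in image10m[top:top + size]]
```

For one band, `extract_patch(band, r, c, 10)` gives the S4 input and `extract_patch(band, r, c, 30)` the S5 input for the same LCZ cell; stacking the 18 bands yields the full input of each scheme.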
3.4. Modelling and accuracy assessment
A randomly selected 90% of the training samples (i.e., training set)
were used to train the models and the remaining 10% were used to
identify the optimum parameter values for the models. Through this
process, we selected the optimal number of RF trees (i.e., ntree) within
100–1000 based on overall accuracy (OA) for each RF-based scheme
(S1–3). In the case of the CNN-based schemes (S4–5), the model re-
sulting in the best accuracy based on the 10% samples in 1000 epochs
was selected. We ran the models ten times for each scheme to examine
the robustness of the methods, and assessed accuracy using the separate
test datasets (Table 3). For an assessment of accuracy, we used not only
OA but also OA_urb, which is the accuracy among the urban LCZ types
(LCZs 1–10), and OA_nat, which is the accuracy among the natural LCZ
types (LCZs A–G). In addition, we obtained the F1-score (Eq. (1)) from
user’s accuracy (UA) and producer’s accuracy (PA) of each LCZ class to
further examine the classification accuracy by class. As the F1-score is
the harmonic mean of UA and PA, the score is not only an indicator of
the classification capability but also able to explain how similar the two
values (i.e., UA and PA) are (Sokolova and Lapalme, 2009).
$$F1 = \frac{2 \times UA \times PA}{UA + PA} \tag{1}$$
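The accuracy measures defined above can be computed from a confusion matrix as in the sketch below. It is illustrative only: the nested-dictionary layout `cm[reference][predicted]` and the function names are hypothetical, and OA_urb and OA_nat are assumed to be the overall accuracy of the confusion matrix restricted to the urban or natural classes, which the text does not spell out.

```python
URBAN = {str(i) for i in range(1, 11)}   # LCZ 1-10
NATURAL = set("ABCDEFG")                 # LCZ A-G

def overall_accuracy(cm, classes=None):
    """OA of cm[reference][predicted]; optionally restricted to a class subset."""
    labels = [k for k in cm if classes is None or k in classes]
    correct = sum(cm[k].get(k, 0) for k in labels)
    total = sum(cm[r].get(p, 0) for r in labels for p in labels)
    return correct / total if total else float("nan")

def f1_per_class(cm):
    """F1 = 2 * UA * PA / (UA + PA) for every reference class (Eq. (1))."""
    scores = {}
    for k in cm:
        tp = cm[k].get(k, 0)
        ref_total = sum(cm[k].values())              # row sum: PA denominator
        pred_total = sum(cm[r].get(k, 0) for r in cm)  # column sum: UA denominator
        pa = tp / ref_total if ref_total else 0.0
        ua = tp / pred_total if pred_total else 0.0
        scores[k] = 2 * ua * pa / (ua + pa) if ua + pa else 0.0
    return scores

# Toy confusion matrix (made-up counts, for illustration only)
cm = {
    "2": {"2": 8, "6": 2, "A": 0},
    "6": {"2": 1, "6": 6, "A": 3},
    "A": {"2": 0, "6": 1, "A": 9},
}
oa = overall_accuracy(cm)
oa_urb = overall_accuracy(cm, URBAN)
oa_nat = overall_accuracy(cm, NATURAL)
f1 = f1_per_class(cm)
```

Because the F1-score is a harmonic mean, it drops sharply when UA and PA diverge, which is why it is used here alongside the overall measures.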
Finally, we selected one model among the 10 simulated models to
map LCZ for each city, based on the highest value of the sum of OA and
OA_urb for each scheme. We also conducted McNemar's test to evaluate
the significance of the differences in the classification results by
scheme.
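McNemar's test compares two classifiers on the same test samples using only the samples on which they disagree. The sketch below shows the chi-squared form of the test in plain Python; the function name `mcnemar` is hypothetical, and the continuity correction is an assumption, since the text does not state whether it was applied.

```python
from math import erfc, sqrt

def mcnemar(reference, pred_a, pred_b, correction=True):
    """McNemar's chi-squared test for two classifiers on the same test set.

    b = samples classifier A gets right and B gets wrong,
    c = samples A gets wrong and B gets right.
    Returns (chi-squared statistic, p-value with 1 degree of freedom).
    """
    triples = list(zip(reference, pred_a, pred_b))
    b = sum(1 for y, a, p in triples if a == y and p != y)
    c = sum(1 for y, a, p in triples if a != y and p == y)
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree in correctness
    diff = abs(b - c) - (1 if correction else 0)
    stat = max(diff, 0) ** 2 / (b + c)
    # Survival function of the chi-squared distribution with 1 d.o.f.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value
```

A p-value below 0.05 or 0.01 corresponds to a significant accuracy difference at the 95% or 99% confidence level, respectively.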
3.5. Transferability experiments
We further compared the transferability between CNN and RF
classifiers by applying the LCZ models developed for three cities to the
remaining city based on the best performing RF and CNN models from
the experiment of individual cities. In other words, reference data of
one city was used to evaluate the transferability of the LCZ model that
was developed using reference data of the other three cities as training
samples shown in Table 3. The procedure for designing the models for
transferability test is the same as that documented in Section 3.4.
Considering the different LCZ types by city, only the LCZ labels be-
longing to the test city were selected when training the LCZ models.
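The leave-one-city-out design described above, including the restriction of training labels to those present in the test city, can be sketched as follows. The data layout (a dictionary of `(features, label)` pairs per city) and the function name are hypothetical.

```python
def leave_one_city_out(samples_by_city):
    """Yield (test_city, train, test) for each city held out in turn.

    samples_by_city maps a city name to a list of (features, label) pairs.
    Training samples come from the other cities and are kept only if their
    LCZ label also occurs in the held-out city.
    """
    for test_city, test in samples_by_city.items():
        allowed = {label for _, label in test}
        train = [
            (x, y)
            for city, samples in samples_by_city.items()
            if city != test_city
            for x, y in samples
            if y in allowed
        ]
        yield test_city, train, test
```

With four cities this produces four experiments, each training on three cities and testing on the remaining one.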
4. Results and discussion
4.1. Overall performance of the schemes
Table 5 shows the accuracy assessment results of the five schemes
for four cities. When compared to S1 (i.e., the WUDAPT method), S2
showed an increase of OA of 2% for Madrid and of 3% for Chicago,
which agrees with the findings from Danylo et al. (2016). Moreover, it
should be noted that the OA_urb of S2 significantly increased when
compared to that of S1 for Hong Kong, Madrid, and Chicago. Interestingly,
the OA_nat did not significantly increase in any city in S2. This
suggests that adding further spectral information (i.e., maximum and
minimum) as input variables might contribute to the increase in the
accuracy for urban LCZ types (i.e., LCZ1–10), which have more
heterogeneous spectral characteristics.
For Hong Kong, while the OA_urb of S2 was higher than that of S1,
the OA_nat of S2 was lower than that of S1. For natural LCZ types (i.e.,
LCZA–G) in Hong Kong, adding manually extracted features as input
data decreased the accuracy instead. Moreover, there was no significant
accuracy difference between S1 and S2 for Rome. This implies that
including more contextual information as input variables in the RF does
not always guarantee improved classification accuracy.
Unlike S1 and S2, where we manually selected input features, the
CNN-based S4 can automatically learn multi-level features from the
original input images. It is not surprising, therefore, that S4 shows
higher OA value than S1 for all four cities. In addition, S4 showed
higher OA than S2, except for Chicago; one possible reason is that the
added contextual information in S2 was meaningful enough to improve
accuracy in Chicago where the city blocks have regular arrangements.
Athiwaratkun and Kang (2015) showed that using well-learned features
as input variables in RF yielded higher accuracy than CNN.
Table 4
Summary of each scheme with input feature types.

| Scheme | Classifier | # of input features | Feature types (spatial resolution) |
|---|---|---|---|
| S1 | RF | 18 | Zonal mean (100 m) |
| S2 | RF | 54 | Zonal mean, maximum and minimum (100 m) |
| S3 | RF | 108 | Mean, minimum, maximum, median, 25th and 75th quantile values of 3 × 3 moving window (100 m) |
| S4 | CNN | 18 | 10 × 10 sized (10 m) |
| S5 | CNN | 18 | 30 × 30 sized (10 m) |

Table 5
Accuracy assessment results for five schemes of four cities with the average
statistic values from 10 runs for each scheme. The numbers in parentheses
are standard deviations of OA over the 10 runs.

| Scheme | Rome OA (σ) % | Rome OA_urb % | Rome OA_nat % | Hong Kong OA (σ) % | Hong Kong OA_urb % | Hong Kong OA_nat % |
|---|---|---|---|---|---|---|
| S1 (RF) | 72.05 (0.13) | 68.17 | 79.13 | 71.58 (0.43) | 52.96 | 79.27 |
| S2 (RF) | 72.45 (0.19) | 68.17 | 80.27 | 71.42 (0.18) | 56.73 | 77.49 |
| S3 (RF) | 75.36 (0.25) | 73.76 | 78.27 | 75.37 (0.06) | 64.34 | 79.93 |
| S4 (CNN) | 73.32 (0.64) | 72.22 | 75.33 | 74.84 (0.52) | 54.62 | 83.20 |
| S5 (CNN) | 80.34 (1.04) | 81.99 | 77.34 | 79.80 (0.63) | 65.15 | 85.85 |

| Scheme | Madrid OA (σ) % | Madrid OA_urb % | Madrid OA_nat % | Chicago OA (σ) % | Chicago OA_urb % | Chicago OA_nat % |
|---|---|---|---|---|---|---|
| S1 (RF) | 82.75 (0.18) | 76.65 | 87.07 | 84.22 (0.06) | 80.96 | 90.02 |
| S2 (RF) | 84.41 (0.07) | 79.76 | 87.71 | 87.28 (0.09) | 85.22 | 90.96 |
| S3 (RF) | 85.78 (0.14) | 81.58 | 88.76 | 89.66 (0.10) | 88.71 | 91.36 |
| S4 (CNN) | 85.33 (0.42) | 80.67 | 88.64 | 86.18 (0.21) | 83.44 | 91.07 |
| S5 (CNN) | 89.72 (0.41) | 88.18 | 90.82 | 90.85 (0.38) | 90.46 | 91.54 |

The influence of considering neighborhood pixels as input features
is seen in both RF- and CNN-based schemes. S3 produced the highest
OA value among the three RF-based schemes (S1–3), which is consistent
with Verdonck et al. (2017). The CNN-based S5, with 30 × 30-sized
input features, showed the highest accuracy among the five
schemes, increasing OA in all cities by about 6–8% when compared to the
WUDAPT method (S1). These findings agree well with Zhang and Tang
(2019), who showed that accuracy improved when the surrounding
areas of the center target pixel were fed into CNN. In particular, when
comparing the accuracy difference between S2 and S3 with that be-
tween S4 and S5, the influence of contextual information appeared
more effective in CNN than in RF. This implies that the input features of
S5 that consider the surrounding areas of the target LCZ pixels (i.e.,
30 × 30-sized images) contributed to learning more meaningful fea-
tures in the convolutional layers of CNN than those of S4 (i.e., 10 × 10-
sized images). Since the wider areas were considered more in S5 than in
S4, the CNN could learn more significant patterns, probably due to the
broader information integrated by combining local patterns, especially
for urban features (Min et al., 2017). Moreover, the OA_urb of S5
increased by about 10–13% compared to that of S1 for four cities. The
increasing rate of OA_urb between S5 and S1 is much higher than that of
OA_nat, implying that CNN-based S5 can be considered as the most
effective LCZ mapping model for the mega urban areas.
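The gains quoted in this paragraph can be re-derived from the averages reported in Table 5. The short script below copies the OA and OA_urb values from that table; the dictionary layout and the helper name `gain` are illustrative choices only.

```python
# (OA, OA_urb) in percent for each scheme, copied from Table 5.
TABLE5 = {
    "Rome":      {"S1": (72.05, 68.17), "S2": (72.45, 68.17), "S3": (75.36, 73.76),
                  "S4": (73.32, 72.22), "S5": (80.34, 81.99)},
    "Hong Kong": {"S1": (71.58, 52.96), "S2": (71.42, 56.73), "S3": (75.37, 64.34),
                  "S4": (74.84, 54.62), "S5": (79.80, 65.15)},
    "Madrid":    {"S1": (82.75, 76.65), "S2": (84.41, 79.76), "S3": (85.78, 81.58),
                  "S4": (85.33, 80.67), "S5": (89.72, 88.18)},
    "Chicago":   {"S1": (84.22, 80.96), "S2": (87.28, 85.22), "S3": (89.66, 88.71),
                  "S4": (86.18, 83.44), "S5": (90.85, 90.46)},
}

def gain(city, scheme, baseline, metric=0):
    """Difference in percentage points; metric 0 = OA, 1 = OA_urb."""
    diff = TABLE5[city][scheme][metric] - TABLE5[city][baseline][metric]
    return round(diff, 2)

for city in TABLE5:
    print(city,
          "| S5-S1 OA:", gain(city, "S5", "S1"),
          "| S5-S1 OA_urb:", gain(city, "S5", "S1", 1),
          "| S3-S2 OA (RF context):", gain(city, "S3", "S2"),
          "| S5-S4 OA (CNN context):", gain(city, "S5", "S4"))
```

With these values, the S5 over S1 gain ranges from 6.63 to 8.29 percentage points in OA and from 9.50 to 13.82 in OA_urb, and the S5 over S4 gain exceeds the S3 over S2 gain in every city.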
The imbalance problem of accuracy by class occurred when the
number of samples differed greatly among classes, resulting in poor
performance over the minority classes (Huang et al., 2016; Jeatrakul
et al., 2010; Zhou and Liu, 2006). In particular, RF is known to be less
sensitive to unbalanced sample size than neural network-based CNN
(Liu et al., 2018a; Liu et al., 2013). In Rome, S4 showed a higher OA_urb
but a lower OA_nat than S2. For Hong Kong, on the other hand, S4
showed the opposite pattern. One reason may be that the ratio of
samples among the LCZ classes varies by city. The sample sizes in
Table 3 show that Rome has fewer samples of natural classes, and more
samples of urban classes than natural classes. In Hong Kong, however,
the number of samples in the urban LCZ types was very small, while the
number of samples in the natural LCZ types was much larger than that
of urban classes. When training CNN, LCZ classes with a relatively large
number of samples could be more correctly classified than weakly re-
presented LCZ classes. Such an imbalance problem of training sample
size by class seemed to be mitigated in S5 when compared to S4.
Consequently, the consideration of neighborhood pixels in CNN led to
the good classification of the LCZ classes even with a small sample size
(i.e., natural classes for Rome, urban classes for Hong Kong).
The standard deviations of the results in Table 5 show that the CNN-
based schemes (S4–S5) yielded a higher variation of accuracy than
those of the RF-based schemes (S1–S3). This implies that RF produces
more consistent results than CNN, because RF is an ensemble-based
model (Lebedev et al., 2014; Kursa, 2014; Khoshgoftaar et al., 2007).
Despite the relatively high standard deviation in the S5 results, most of
the S5 classifications produced higher accuracy than those of the RF-
based schemes.
Using the most accurate model among the 10 simulations for each of
the five schemes, the significance of the accuracy differences between
the classifications was assessed by McNemar’s Chi-squared test (Fig. 5).
In Rome, S1 yielded an outcome comparable to S2, and the performance
of both S2 and S3 was similar to that of S4. In the case of Hong
Kong, S1 and S2 showed similar results, as did S3 and S4. For Madrid
and Chicago, all classifications were statistically different, except for
the S3/S4 pair for Madrid. S5, the CNN scheme, achieved significantly
higher accuracy than the other schemes in all four cities (Table 5). S5 is
considered to have great utility in LCZ classification because it differs
significantly from the other schemes in every city while achieving
the highest classification accuracy (Table 5 and Fig. 5). Interestingly,
the accuracy difference between S4 and S3 was not significant in Rome,
Hong Kong, and Madrid. This implies that the RF model considering the
neighborhood area (300 × 300 m; S3) performed similarly to
the CNN model that does not consider such a large neighborhood of
the LCZ grid (100 × 100 m; S4). In the case of Chicago, however, S3
and S4 resulted in a significant difference, showing higher accuracy of
the RF model for S3 than the CNN for S4 (Table 5). In Chicago where
the city has been developed based on regular grids, increasing features
(i.e., spectral and neighboring information) could bring sufficiently
high accuracy in RF. This is particularly true in light of the accuracy
differences between the pair S2 and S1 and the pair S3 and S1, which
are both the highest among four cities in Table 5.
4.2. Classification accuracy per class
Fig. 6 shows the F1-score for each of the four cities on the average of
10-time runs per scheme. Figs. 7–10 show the confusion matrices of the
most accurate models among the 10-time runs of S3, which is the best
scheme among the RF-based schemes, and S5, which is the best of the
CNN-based schemes, as shown in Table 5.
It should be noted that CNN-based S5 showed the highest F1-score
among the five schemes on LCZ5 and LCZ6 for all cities, excluding the
red-star classes. For LCZ5 and LCZ6, where abundant trees are mixed
with openly arranged low- or mid-rise buildings, the RF-based S3
misclassified them as other classes, such as densely packed buildings (i.e.,
LCZ1–3) or natural LCZ types (Figs. 7–10). On the other hand, the CNN-
based S5 classified the classes more accurately than S3. One possible
reason is that CNN can learn the regions of mixed pixels with buildings
in the images by incorporating the surrounding area information.
Awrangjeb et al. (2012) reported that using the building edge
information improved the detection performance of the trees, which were
misclassified as buildings. For S3 in Rome, LCZ6 (open low-rise) was
confused with various classes, especially LCZB (scattered trees) in Fig. 7a.
However, the accuracy of LCZ6 clearly increased in S5 (Fig. 7b). Among
the urban LCZ types in Hong Kong, LCZ5 (open mid-rise) and LCZ6
(open low-rise) showed higher F1-scores in S5 than those in the other
schemes. In Fig. 8a, LCZ5 was often confused with LCZ4 (open high-
rise) and LCZA (dense trees) in S3, and LCZ6 was confused with LCZD
(low plants) in S3. Fig. 8b shows that such misclassification of LCZ5 and
LCZ6 happened much less in S5 than in S3. In Madrid, the confusion of
LCZ5 with LCZ2 in S3 was significantly reduced in S5, and the confusion
between LCZ6 and LCZ5 in S3 was also clearly reduced in S5 (Fig. 9a–b).
LCZ9 (sparsely built) showed higher F1-scores in the CNN-based schemes
than in the RF-based schemes. Especially in Chicago, LCZ9
tended to be misclassified as LCZD (low plants) in S3, but this error was
reduced in S5 (Fig. 10a–b). In fact, LCZ9 shows a unique
spatial structure where small-sized buildings are sparsely built among
vegetation. Thus, it is not surprising that the CNN classifiers classified
this class well, which is consistent with the results of Fu et al. (2018),
who reported that CNN showed higher classification accuracy for
mixed objects when compared to RF.
Fig. 5. Results of McNemar’s Chi-squared test. The orange squares indicate the significant accuracy difference at the 99% confidence level, while the blue ones at the
95% confidence level. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
One possible reason that the CNN's F1-score was relatively lower
than that of RF in LCZA in Rome and LCZ8 in Hong Kong could be a
data imbalance problem due to the relatively small number of training
samples of the classes (Table 3). In Hong Kong, however, the F1-scores
of CNN for LCZ5 and LCZ6, which have an open arrangement of buildings
mixed with trees, were higher than those of RF even though the number
of samples was small. In this study, the data imbalance
problem might exert a relatively weak influence on accuracy because
LCZ classes with a small number of polygons were classified into the
red-star class in each city.
We calculated the F1-score difference between S3 and S2 (RF
schemes) and S5 and S4 (CNN schemes) to identify the neighboring
effects for all LCZs, except red-star classes. Interestingly, the class
yielding the highest difference for each city is LCZ6 in Rome, LCZ5 in
Hong Kong and Madrid, and LCZ3 in Chicago (Figure S1) for the dif-
ference between S5 and S4 (CNN schemes). This result implies that the
consideration of neighborhoods could be more effective for urban-type
classes, especially when using CNN. Moreover, LCZ5 and LCZ6, which
are open arrangement classes, showed significantly improved accuracy
when using CNN with the incorporation of neighborhood information
for all cities except Chicago, where the accuracy is still good enough,
even before the consideration of neighboring areas.
The results of the red-star class should be carefully interpreted in
Fig. 6. A positive bias may appear because the training and test sets
were randomly stratified samples within one polygon (Verdonck et al.,
2017). In particular, the red-star classes tended to have a higher
F1-score in S3 and S5, which incorporate the neighboring areas into their
classifications, than in the other schemes.
Fig. 6. Comparison of F1-score between the five schemes (S1–5) for the four cities. The F1-scores were averaged from 10 runs for each scheme. * indicates the red-
star classes. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
4.3. Mapping LCZ for four cities
Figs. 11 and 12 show the 30 m resolution GMIS with LCZ references
and the developed LCZ maps for the four cities from S3 and S5, which
have the highest accuracy among the RF- and CNN-based schemes,
respectively. We divided the results of the two classified maps into four
cases and then calculated their ratios, as shown in Table 6.
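The four-case comparison of two classified maps can be sketched as below. The exact case definitions of Table 6 are not reproduced in this section, so the four categories used here (identical class, different classes within urban types, different classes within natural types, and urban versus natural) are an assumption inferred from how the results are discussed; the function name and the flat-list map layout are likewise hypothetical.

```python
URBAN = {str(i) for i in range(1, 11)}   # LCZ 1-10
NATURAL = set("ABCDEFG")                 # LCZ A-G

def compare_maps(map_a, map_b):
    """Percentage of pixels falling in each of four agreement cases.

    map_a and map_b are equally long flat sequences of LCZ labels
    (e.g. the S3 and S5 maps of one city, pixel by pixel).
    """
    counts = {"same": 0, "within_urban": 0,
              "within_natural": 0, "urban_vs_natural": 0}
    for a, b in zip(map_a, map_b):
        if a == b:
            counts["same"] += 1
        elif a in URBAN and b in URBAN:
            counts["within_urban"] += 1
        elif a in NATURAL and b in NATURAL:
            counts["within_natural"] += 1
        else:
            counts["urban_vs_natural"] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("maps are empty")
    return {case: 100.0 * n / total for case, n in counts.items()}
```

The four percentages sum to 100, so a large urban-versus-natural share directly reduces the share of pixels on which the two maps agree or differ only within one group.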
The two maps of Rome have different classification results for the
suburban areas consisting of open arrangements that appear at a distance
from the monocentric city center. In Rome, the part of the study domain
denoted by the blue box (middle bottom) was classified as LCZ5
(open mid-rise) in S3, while it was classified as LCZD (low plants) in S5. In
addition, the bottom-right part of the study domain bounded by a yellow box
shows a number of open low-rise areas in S5, while S3 tended to show
this area as scattered trees. On the other hand, S3 classified the
northeastern part more as open mid-rise classes than in S5, which
classified the area as low plants and scattered trees. When compared to
the impervious cover and Google Earth images (not shown), S5 seemed
to classify the built-type classes better than the S3 did, while S3 tended
to be confused between vegetation and mixed buildings. These results
correspond to the accuracy assessment where the F1-scores of LCZ5 and
LCZ6 showed better performance in CNN than in RF in Rome. It should
be noted that LCZs 5, 6, B and D in Rome appear as the dominant LCZ
classes in the classified maps (Table S2). Considering that LCZs 5 and 6
are open-arrangement urban LCZ types, CNN's ability to classify these
types of LCZs better than RF seems to account for the map disparities
for Rome. Where the two maps of Rome disagreed, the disagreement was more
likely to be between urban and natural LCZ types (21.01%; Table 6).
In Hong Kong, the non-residential (i.e., hilly) and habitable areas
are clearly distinguished from each other, so the difference within each
natural and urban LCZ type is somewhat larger than that between
natural and urban LCZ types in the two maps (Table 6). Especially, the
maps of S3 and S5 in Hong Kong exhibited differences within natural
LCZ types, such as low plants, trees, and bushes, in some areas based on
visual inspection. The classification accuracy of LCZD in Hong Kong
was higher in the CNN-based schemes than the RF-based schemes,
considering both the F1-scores and the confusion matrices. The region
of the study bounded by a black box in S5 was classified more as LCZD
(low plants) than LCZC (bush, scrub), as opposed to its more pre-
dominant classification of LCZC (bush, scrub) in S3. As in the land use
map of Hong Kong (Chan et al., 2016), these areas are dominated by
grassland, which implies that CNN could distinguish between plants
and scrub better than RF. Unlike RF, less confusion occurred between
LCZC and LCZD in CNN, which corresponds to the results of the con-
fusion matrices for not only Hong Kong but also Madrid (Figs. 8 & 9).
Fig. 7. Confusion matrices of the most accurate model among the 10-time runs of S3, the best scheme among the RF-based schemes, and S5, the best of the CNN-
based schemes for Rome. * indicates the red-star classes. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version
of this article.)
In Madrid, various small clustered areas called autonomous
communities surround the city-core. This is a conurbation in which
extended suburbs and villages comprise dense built-type classes, such as
compact mid-rise, as identified using Google Earth images (not shown).
These clusters appear more distinctly in S5 than in S3 (bounded by a
blue box), and were compared to the impervious cover, which clearly
showed that the S5 classified the clusters of these dense buildings re-
latively well. In RF, these compact clusters were misclassified as open
arrangement buildings, which corresponds well to the accuracy as-
sessment result showing that LCZ2 was more often confused among
other urban LCZ types in RF compared to CNN (Fig. 9). It is interesting
that LCZ2 was not confused that often with natural LCZ types in the
confusion matrix of S3, but the generated LCZ map of S3 showed some
misclassification of LCZ2, often confused with LCZB. This could be
because there are few reference samples to test in these cluster areas. In
the case of Madrid, the ratio of natural LCZ types in the study region is
remarkably high, resulting in a classification difference among natural
LCZ types that is the highest, at 14.65% (Table 6). The different clas-
sification of urban and natural LCZ types in the two maps (~5.02%;
Table 6), could originate from the municipalities surrounding Madrid,
which were better classified in CNN than in RF due to the textural
patterns over large areas.
The promising results for LCZ classification by CNN could be useful
data for the various urban climate studies especially for the regions
with abundant LCZ classes mixed with different objects (i.e., buildings
with trees and bare-soil with shrubs). Although to a lesser extent than
Rome, the two maps of Chicago show a difference in the open ar-
rangement of low-density suburban areas surrounding the high-density
urban center, particularly in their different classifications for LCZ types:
Urban and Natural (10.43% in Table 6). However, it should be noted
that CNN could result in low user’s accuracy. In Chicago, LCZ9 (sparsely
built) was distributed widely in the middle top of the study domain
(bounded by a black box) of the maps of S3 and S5. The CNN-based S5
has the advantage of catching the sparse buildings between some of the
trees and plants by object detection, but LCZ9 seems instead to be over-
classified on the map of S5 when compared to S3. This also corresponds
well with the result of the confusion matrices in Chicago (Fig. 10), as S5
showed a higher producer’s accuracy, but a lower user’s accuracy than
S3 for LCZ9. Although this paper used only the Landsat data corre-
sponding to the WUDAPT method, if input variables, such as Sentinel-1
backscattered data, were used additionally to explain the characteristics
of buildings (Koppel et al., 2017; Demuzere et al., 2019), the limitation
of the CNN could be improved.
The CNN-based classification is known to take more time from the
training stage to the mapping stage than the RF. Nonetheless, the CNN-
based S5 had high classification accuracy and was of high value in
classifying specific LCZ types where the objects were mixed, when
compared to those of the RF-based S3.
Fig. 8. Confusion matrices of the most accurate model among the 10-time runs of S3, the best scheme among the RF-based schemes, and S5, the best of the CNN-
based schemes for Hong Kong. * indicates the red-star classes. (For interpretation of the references to colour in this figure legend, the reader is referred to the web
version of this article.)
4.4. Evaluation of model transferability
Table 7 shows the transferability assessment results based on the
two best performing schemes from the experiment of individual cities.
Interestingly, the CNN-based scheme (S5) showed a distinctly higher
performance than the RF-based scheme (S3) for both OA and OA_urb.
In particular, the significant improvement of OA_urb for all four cities
corresponds to the findings in our single-city experiments,
which implies the superiority of the object detection-based characteristics
of CNN classifiers. In recent years, research on a transferability
framework has been attempted, with LCZ reference samples of specific
cities trained and applied to other cities (Demuzere et al., 2019; Qiu
et al., 2019; Yokoya et al., 2018). For example, Demuzere et al. (2019)
examined global transferability of LCZ models using RF classifiers with
the Google Earth Engine. However, they found the transferability of the
LCZ models was still challenging because the accuracies of their models
were generally poor (average OA of the 15 cities close to 50%). The
results of this present study identified the advantages of using CNN
classifiers over RF in the transferability framework of LCZ classification,
especially for urban-type LCZ classification. When compared to the
single-city experiment results in Table 5, the accuracy of the transfer-
ability experiment was a bit lower, varying by city, possibly due to the
limited coverage of reference data for training. It is crucial to construct
thorough and sufficient reference data of LCZ classes for various urban
structural types over the globe to improve the transferability of LCZ
models.
4.5. Novelty, limitations, and future directions
To our knowledge, this is the first study to compare and discuss LCZ
classification results between RF and CNN classifiers, in detail.
Although some previous studies tried to compare the LCZ classification
results among different classifiers including basic machine learning
algorithms (i.e., RF, Support Vector Machine (SVM) and Neural
Networks (NN)), they did not examine deep learning-based classifiers
(Bechtel and Daneke, 2012; Bechtel et al., 2016). More recently, a few
studies on LCZ classification using CNN classifiers have been conducted
(Sukhanov et al., 2017; Qiu et al., 2018). However, they did not fully
compare the classification performance with the existing models using
RF classifiers. Furthermore, this paper compared the results using dif-
ferent sizes of input data (i.e., 10 × 10 and 30 × 30) fed into the CNN
classifiers. The positive effect of an increasing input patch size in CNN
has been proven in different studies (Hamwood et al., 2018), which is
also shown in the LCZ mapping in this study. In particular, the specific
LCZ classes (i.e. LCZ5 and LCZ6) that have a favorable impact using
CNN were identified when an increasing size of input data was applied
when compared to the impact of RF under the same conditions. This
result can provide meaningful guidance for the continued research of
LCZ classification using CNN classifiers. In addition, many LCZ
classification studies have focused on only OA for their accuracy assessment.
In this study, OA_urb, the overall accuracy among the urban LCZ types,
was also carefully examined when comparing the accuracy of the
proposed schemes. The validity of the results of the LCZ classification using
two classifiers was strengthened by applying it to four cities with
different urban structures and geographical characteristics in various
continents such as Europe, Asia, and America.
Fig. 9. Confusion matrices of the most accurate model among the 10-time runs of S3, the best scheme among the RF-based schemes, and S5, the best of the CNN-
based schemes for Madrid. * indicates the red-star classes. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version
of this article.)
The major limitation of this study is the small sample size of the
specific LCZ classes (i.e., red-star classes). In this study, reference LCZ
data were provided by the IEEE data fusion contest to ensure the re-
liability of the data. In order to validate LCZ classification with a
minimum bias, the reference polygons should be divided into training
and test sets. For the red-star classes, however, we divided the datasets
by stratified random sampling among the pixels in a polygon, because
of the limited number of polygons. The red-star classes are likely to
have a positive bias in their classification results, so care is needed in
any interpretation. Further improvement in accuracy for the LCZ classes
with a small number of samples (i.e., LCZE in Chicago) is expected
through the utilization of data augmentation methods discussed by
Yokoya et al. (2018). The CNN-based S5 showed higher accuracy in
four cities when compared to other RF-based schemes, but we could not
pinpoint which objects contributed to the detection of each LCZ class in
CNN. The use of high spatial resolution satellite data (i.e., Sentinels) in
future LCZ classification will improve the object detection ability of
CNN classifiers. In addition, using high-resolution images will enable a
more detailed analysis, especially if heat maps of CNN classifiers are
used. It is also possible to sharpen Landsat images to a higher resolution
by using pan-sharpening techniques (Xing et al., 2018;
Gilbertson et al., 2017; Rahaman et al., 2017). Recently, in the deep
learning field, CNN and other machine learning classifiers are being
combined to construct better models (Zhang et al., 2018; Soltau et al.,
2014). These techniques can be applied to the field of LCZ classification
as well.
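The two splitting strategies discussed for regular and red-star classes can be contrasted in a short sketch. It is illustrative only: the function name, the `min_polygons` threshold, and the `test_fraction` value are hypothetical parameters, since the section does not state the polygon count below which a class was treated as a red-star class or the split ratio used.

```python
import random

def split_class(polygons, test_fraction=0.3, min_polygons=4, seed=0):
    """Split the reference pixels of one LCZ class into training and test sets.

    polygons maps a polygon id to its list of pixels. With enough polygons,
    whole polygons go to either set (less biased). Otherwise (red-star case)
    pixels are sampled at random inside each polygon, which can inflate
    accuracy because neighbouring pixels end up in both sets.
    Returns (train_pixels, test_pixels, is_red_star).
    """
    rng = random.Random(seed)
    ids = sorted(polygons)
    if len(ids) >= min_polygons:
        rng.shuffle(ids)
        n_test = max(1, round(len(ids) * test_fraction))
        test_ids = set(ids[:n_test])
        train = [p for i in ids if i not in test_ids for p in polygons[i]]
        test = [p for i in ids if i in test_ids for p in polygons[i]]
        return train, test, False
    train, test = [], []
    for i in ids:
        pixels = list(polygons[i])
        rng.shuffle(pixels)
        n_test = max(1, round(len(pixels) * test_fraction))
        test.extend(pixels[:n_test])
        train.extend(pixels[n_test:])
    return train, test, True
```

In the polygon-level branch no polygon contributes pixels to both sets, whereas in the red-star branch every polygon does, which is the source of the positive bias noted above.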
When it comes to the CNN model, the fully convolutional network
(FCN) has been adopted in recent land cover classification for
semantic segmentation (Mohammadimanesh et al., 2019; Wurm
et al., 2019; Yue et al., 2019). FCN has the advantage of learning spatial
relationships at different scales (Volpi and Tuia, 2016), which can be
expected to yield improved performance in LCZ classification by taking
into account the various size and shape of each LCZ class in future
work.
5. Conclusion
In this study, we compared the two classifiers, RF and CNN, for LCZ
classification in four mega cities—Rome, Hong Kong, Madrid, and
Chicago—using bitemporal Landsat images. A total of five schemes
Fig. 10. Confusion matrices of the most accurate model among the 10-time runs of S3, the best scheme among the RF-based schemes, and S5, the best of the CNN-
based schemes for Chicago. * indicates the red-star classes. (For interpretation of the references to colour in this figure legend, the reader is referred to the web
version of this article.)
C. Yoo, et al. ISPRS Journal of Photogrammetry and Remote Sensing 157 (2019) 155–170
167
were constructed and compared. Three RF-based schemes (S1–3) were benchmarked based on previous LCZ classification research studies. Two CNN-based schemes (S4–5) were benchmarked using different input feature sizes. Among the five schemes, S5 showed the best classification performance. When compared to the existing WUDAPT workflow (i.e., S1), the OA and OA_urb of S5 increased by about 6–8% and 10–13%, respectively, for the four cities. This study has revealed that the CNN classifiers were particularly good at classifying the specific LCZ classes in which buildings were mixed with trees, or in which buildings and trees were sparsely distributed. We also found that the classification performance of CNN significantly improved when the input features were created with consideration of larger neighborhood areas. The results from the transferability experiment of the LCZ models supported the superiority of the CNN approach over RF in terms of both OA and OA_urb for all four cities. In the future, the CNN-based approach will become more advantageous when incorporating higher-resolution satellite images (i.e., Sentinels) and additional spatio-temporal features.
Fig. 11. LCZ maps of the best classification model among the 10-time runs of S3, the best scheme among the RF-based schemes, and S5, the best of the CNN-based schemes for Rome and Hong Kong. Impervious covers from GMIS and LCZ reference datasets are also presented.
Fig. 12. LCZ maps of the best classification model among the 10-time runs of S3, the best scheme among the RF-based schemes, and S5, the best of the CNN-based schemes for Madrid and Chicago. Impervious covers from GMIS and LCZ reference datasets are also presented.
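The accuracy measures quoted in the conclusions (OA for all classes, OA_urb for the urban LCZ types) can be illustrated with a minimal sketch. It assumes a confusion matrix with reference classes in rows and predicted classes in columns, and it assumes that a subset accuracy such as OA_urb is the share of correctly classified samples among the reference samples of that subset. The function name `overall_accuracy` and the toy 3-class matrix are hypothetical and are not taken from the paper.

```python
import numpy as np


def overall_accuracy(cm, classes=None):
    """Share of correctly classified samples, optionally for a class subset.

    cm      : square confusion matrix (rows = reference, columns = predicted).
    classes : indices of the classes to evaluate (e.g. the urban LCZ types);
              None means all classes, which gives the ordinary OA.
    """
    cm = np.asarray(cm, dtype=float)
    idx = list(range(cm.shape[0])) if classes is None else list(classes)
    total = cm[idx, :].sum()      # all reference samples of the subset
    correct = cm[idx, idx].sum()  # diagonal entries of the subset
    return correct / total if total else float("nan")


# Toy example: classes 0 and 1 stand in for "urban", class 2 for "natural".
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 3, 7]]

oa = overall_accuracy(cm)
oa_urb = overall_accuracy(cm, classes=[0, 1])
oa_nat = overall_accuracy(cm, classes=[2])
print(f"OA={oa:.2f}  OA_urb={oa_urb:.2f}  OA_nat={oa_nat:.2f}")
```

For a full LCZ confusion matrix the same call would take the indices of LCZ 1–10 as the urban subset, provided the subset definition assumed here matches the one used in the study.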
Acknowledgements
This research was supported by the Space Technology Development
Program and the Basic Science Research Program through the National
Foundation of Korea (NRF) funded by the Ministry of Science, ICT, &
Future Planning and the Ministry of Education of Korea, respectively
(Grant: NRF-2017M1A3A3A02015981; NRF-2017R1D1A1B03028129),
and the Korea Meteorological Administration Research and
Development Program under Grant KMIPA 2017-7010. CY was also
supported by the Global PhD Fellowship Program through the National
Research Foundation of Korea (NRF), funded by the Ministry of
Education (NRF-2018H1A2A1062207). We also would like to thank
WUDAPT (the World Urban Database and Access Portal Tools project,
www.wudapt.org), the IEEE GRSS Image Analysis and Data Fusion
Technical Committee, and all the contributors for LCZ ground-truth
samples, in particular Chao Ren, Dragan Milosevic, Guillaume Dumas,
and Maria De Fatima Andrade.
Appendix A. Supplementary material
Supplementary data to this article can be found online at https://doi.org/10.1016/j.isprsjprs.2019.09.009.
References
Athiwaratkun, B., Kang, K., 2015. Feature representation in convolutional neural net-
works. arXiv preprint arXiv:1507.02313.
Awrangjeb, M., Zhang, C., Fraser, C.S., 2012. Building detection in complex scenes
thorough effective separation of buildings from trees. Photogramm. Eng. Remote
Sens. 78, 729–745.
Ba, J.L., Kiros, J.R., Hinton, G.E., 2016. Layer normalization. arXiv preprint arXiv:1607.
06450.
Barnes, K.B., Morgan, J., Roberge, M., 2001. Impervious surfaces and the quality of
natural and built environments. Department of Geography and Environmental
Planning, Towson University, Baltimore.
Bechtel, B., Alexander, P.J., Beck, C., Böhner, J., Brousse, O., Ching, J., Demuzere, M.,
Fonte, C., Gál, T., Hidalgo, J., 2019. Generating WUDAPT Level 0 data–Current status
of production and evaluation. Urban Clim. 27, 24–45.
Bechtel, B., Alexander, P.J., Böhner, J., Ching, J., Conrad, O., Feddema, J., Mills, G., See,
L., Stewart, I., 2015. Mapping local climate zones for a worldwide database of the
form and function of cities. ISPRS Int. J. Geo-Inf. 4, 199–219.
Bechtel, B., Daneke, C., 2012. Classification of local climate zones based on multiple earth
observation data. IEEE J-Stars 5, 1191.
Bechtel, B., Demuzere, M., Sismanidis, P., Fenner, D., Brousse, O., Beck, C., Van Coillie, F.,
Conrad, O., Keramitsoglou, I., Middel, A., 2017. Quality of crowdsourced data on
urban morphology—The human influence experiment (HUMINEX). Urban Sci. 1, 15.
Bechtel, B., See, L., Mills, G., Foley, M., 2016. Classification of local climate zones using
SAR and multispectral data in an arid environment. IEEE J-Stars 9, 3097–3105.
Beck, C., Straub, A., Breitner, S., Cyrys, J., Philipp, A., Rathmann, J., Schneider, A., Wolf,
K., Jacobeit, J., 2018. Air temperature characteristics of local climate zones in the
Augsburg urban area (Bavaria, southern Germany) under varying synoptic condi-
tions. Urban Clim. 25, 152–166.
Bontemps, S., Defourny, P., Bogaert, E.V., Arino, O., Kalogirou, V., Perez, J.R., 2011.
GLOBCOVER 2009-Products description and validation report.
Breiman, L., 2001. Random forests. Machine Learn. 45, 5–32.
Cai, M., Ren, C., Xu, Y., Lau, K.K.-L., Wang, R., 2018. Investigating the relationship be-
tween local climate zone and land surface temperature using an improved WUDAPT
methodology–A case study of Yangtze River Delta, China. Urban Clim. 24, 485–502.
Chan, E.H., Wang, A., Lang, W., 2016. Comprehensive Evaluation Framework for
Sustainable Land Use: Case Study of Hong Kong in 2000–2010. J. Urban Plann. Dev.
142, 05016007.
Chen, J., Chen, J., Liao, A., Cao, X., Chen, L., Chen, X., He, C., Han, G., Peng, S., Lu, M.,
2015. Global land cover mapping at 30 m resolution: A POK-based operational ap-
proach. ISPRS J. Photogramm. 103, 7–27.
Cohen, B., 2015. Urbanization, City growth, and the New United Nations development
agenda. Cornerstone 3, 4–7.
Danylo, O., See, L., Bechtel, B., Schepaschenko, D., Fritz, S., 2016. Contributing to
WUDAPT: a local climate zone classification of two cities in Ukraine. IEEE J-Stars 9,
1841–1853.
de Colstoun, E.C.B., Huang, C., Wang, P., Tilton, J.C., Tan, B., Phillips, J., Niemczura, S.,
Ling, P.-Y., Wolfe, R., 2017. Documentation for the Global Man-made Impervious
Surface (GMIS) Dataset From Landsat.
Demuzere, M., Bechtel, B., Mills, G., 2019. Global transferability of local climate zone
models. Urban Clim. 27, 46–63.
Ellickson, R.C., 2012. The law and economics of street layouts: How a grid pattern
benefits a downtown. Ala. L. Rev. 64, 463.
Fallmann, J., Forkel, R., Emeis, S., 2016. Secondary effects of urban heat island mitigation
measures on air quality. Atmos. Environ. 125, 199–211.
Founda, D., Santamouris, M., 2017. Synergies between urban heat island and heat waves
in Athens (Greece), during an extremely hot summer (2012). Sci. Rep. 7, 10973.
Friedl, M.A., Sulla-Menashe, D., Tan, B., Schneider, A., Ramankutty, N., Sibley, A., Huang,
X., 2010. MODIS Collection 5 global land cover: Algorithm refinements and char-
acterization of new datasets. Remote Sens. Environ. 114, 168–182.
Fu, T., Ma, L., Li, M., Johnson, B.A., 2018. Using convolutional neural network to identify
irregular segmentation objects from very high-resolution remote sensing imagery. J.
Appl. Remote Sens. 12, 025010.
Gilbertson, J.K., Kemp, J., Van Niekerk, A., 2017. Effect of pan-sharpening multi-tem-
poral Landsat 8 imagery for crop type differentiation using different classification
techniques. Comput. Electron. Agric. 134, 151–159.
Giridharan, R., Emmanuel, R., 2018. The impact of urban compactness, comfort strategies
and energy consumption on tropical urban heat island intensity: a review. Sustain.
Cities Soc. 40, 677–687.
Giridharan, R., Ganesan, S., Lau, S., 2004. Daytime urban heat island effect in high-rise
and high-density residential developments in Hong Kong. Energy Build. 36, 525–534.
Glorot, X., Bordes, A., Bengio, Y., 2011. Deep sparse rectifier neural networks. In:
Proceedings of the fourteenth international conference on artificial intelligence and
statistics, pp. 315–323.
Goodfellow, I., Bengio, Y., Courville, A., Bengio, Y., 2016. Deep Learning. MIT Press
Cambridge.
Hamwood, J., Alonso-Caneiro, D., Read, S.A., Vincent, S.J., Collins, M.J., 2018. Effect of
patch size and network architecture on a convolutional neural network approach for
automatic segmentation of OCT retinal layers. Biomed. Opt. Express 9, 3049–3066.
Han-qiu, X., Ben-qing, C., 2004. Remote sensing of the urban heat island and its changes
in Xiamen City of SE China. J. Environ. Sci. 16, 276–281.
Huang, C., Li, Y., Change Loy, C., Tang, X., 2016. Learning deep representation for im-
balanced classification. In: Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, pp. 5375–5384.
Jeatrakul, P., Wong, K.W., Fung, C.C., 2010. Classification of imbalanced data by com-
bining the complementary neural network and SMOTE algorithm. In: International
Conference on Neural Information Processing. Springer, pp. 152–159.
Kaloustian, N., Bechtel, B., 2016. Local climatic zoning and urban heat island in Beirut.
Procedia Eng. 169, 216–223.
Khoshgoftaar, T.M., Golawala, M., Van Hulse, J., 2007. An empirical study of learning
from imbalanced data using random forest, Tools with Artificial Intelligence, 2007.
ICTAI 2007. In: 19th IEEE International Conference on. IEEE, pp. 310–317.
Table 6
The percentages of the LCZ differences between two classified maps (S3 and S5) for four cities shown in Figs. 11 and 12.

| Category | Rome | Hong Kong | Madrid | Chicago |
|---|---|---|---|---|
| Classification within the same LCZ | 60.21% | 73.47% | 77.87% | 80.31% |
| Different classification within urban LCZ types | 11.29% | 8.09% | 2.46% | 5.70% |
| Different classification within natural LCZ types | 7.49% | 11.10% | 14.65% | 3.56% |
| Different classification for LCZ types: urban and natural | 21.01% | 7.33% | 5.02% | 10.43% |
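The four categories of Table 6 partition every pixel pair of the two maps, so the shares for each city sum to 100% up to rounding. A minimal sketch of such a comparison follows. It assumes two co-registered label arrays and a list of urban class codes. The function name `lcz_difference_shares` and the integer coding (1–10 for urban types, larger values for natural types) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def lcz_difference_shares(map_a, map_b, urban_classes):
    """Percentage of pixels in each agreement/disagreement category."""
    a = np.asarray(map_a).ravel()
    b = np.asarray(map_b).ravel()
    a_urb = np.isin(a, urban_classes)  # True where map A assigns an urban type
    b_urb = np.isin(b, urban_classes)  # True where map B assigns an urban type
    differ = a != b

    shares = {
        "same_lcz": ~differ,
        "different_within_urban": differ & a_urb & b_urb,
        "different_within_natural": differ & ~a_urb & ~b_urb,
        "urban_vs_natural": a_urb ^ b_urb,  # one map urban, the other natural
    }
    return {name: 100.0 * float(mask.mean()) for name, mask in shares.items()}


# Toy maps with eight pixels; codes 1-10 are treated as urban (assumption).
urban = list(range(1, 11))
s3_map = [1, 1, 2, 11, 12, 11, 3, 12]
s5_map = [1, 2, 2, 11, 11, 1, 3, 3]

result = lcz_difference_shares(s3_map, s5_map, urban)
for name, pct in result.items():
    print(f"{name}: {pct:.2f}%")
```

Because the categories are mutually exclusive and exhaustive, checking that the shares sum to 100 is a quick sanity test on real maps.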
Table 7
Transferability assessment results by test city based on S3 and S5, the best performing RF and CNN schemes from the single city experiments, respectively. The overall accuracies were extracted from the best model among 10-time runs.

| Scheme | Rome OA (%) | Rome OA_urb (%) | Rome OA_nat (%) | Hong Kong OA (%) | Hong Kong OA_urb (%) | Hong Kong OA_nat (%) |
|---|---|---|---|---|---|---|
| S3 (RF) | 45.20 | 43.33 | 48.61 | 52.03 | 5.34 | 71.31 |
| S5 (CNN) | 62.69 | 67.42 | 54.07 | 58.68 | 28.18 | 71.27 |

| Scheme | Madrid OA (%) | Madrid OA_urb (%) | Madrid OA_nat (%) | Chicago OA (%) | Chicago OA_urb (%) | Chicago OA_nat (%) |
|---|---|---|---|---|---|---|
| S3 (RF) | 60.64 | 47.68 | 69.83 | 27.24 | 6.89 | 63.45 |
| S5 (CNN) | 78.03 | 77.08 | 78.71 | 41.52 | 25.13 | 70.70 |
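The gain of the CNN scheme over the RF scheme in the transferability experiment can be read directly from Table 7 by subtracting the S3 row from the S5 row. The short sketch below does this arithmetic. The numbers are copied from Table 7, while the dictionary layout and variable names are only illustrative.

```python
# Values copied from Table 7 (percent): (OA, OA_urb, OA_nat) per scheme.
table7 = {
    "Rome":      {"S3": (45.20, 43.33, 48.61), "S5": (62.69, 67.42, 54.07)},
    "Hong Kong": {"S3": (52.03, 5.34, 71.31),  "S5": (58.68, 28.18, 71.27)},
    "Madrid":    {"S3": (60.64, 47.68, 69.83), "S5": (78.03, 77.08, 78.71)},
    "Chicago":   {"S3": (27.24, 6.89, 63.45),  "S5": (41.52, 25.13, 70.70)},
}
METRICS = ("OA", "OA_urb", "OA_nat")

# Gain of S5 (CNN) over S3 (RF) in percentage points, per city and metric.
gains = {
    city: {
        metric: round(s5 - s3, 2)
        for metric, s3, s5 in zip(METRICS, row["S3"], row["S5"])
    }
    for city, row in table7.items()
}

for city, gain in gains.items():
    print(city, gain)
```

The OA gains range from 6.65 (Hong Kong) to 17.49 (Rome) percentage points and the OA_urb gains from 18.24 (Chicago) to 29.40 (Madrid), which agrees with the ranges of about 7–18% and 18–29% stated in the abstract. The OA_nat of Hong Kong is essentially unchanged (−0.04).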
C. Yoo, et al. ISPRS Journal of Photogrammetry and Remote Sensing 157 (2019) 155–170
169
Kim, M., Lee, J., Han, D., Shin, M., Im, J., Lee, J., Quackenbush, L.J., Gu, Z., 2018a.
Convolutional neural network-based land cover classification using 2-D spectral re-
flectance curve graphs with multitemporal satellite imagery. IEEE J-Stars 11,
4604–4617.
Kim, M., Lee, J., Im, J., 2018b. Deep learning-based monitoring of overshooting cloud
tops from geostationary satellite data. Gisci. Remote Sens. 55, 763–792.
Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980.
Koppel, K., Zalite, K., Voormansik, K., Jagdhuber, T., 2017. Sensitivity of Sentinel-1
backscatter to characteristics of buildings. Int. J. Remote Sens. 38, 6298–6318.
Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep con-
volutional neural networks. In: Advances in Neural Information Processing Systems,
pp. 1097–1105.
Kursa, M.B., 2014. Robustness of Random Forest-based gene selection methods. BMC
Bioinf. 15, 8.
Lauwaet, D., Hooyberghs, H., Maiheu, B., Lefebvre, W., Driesen, G., Van Looy, S., De
Ridder, K., 2015. Detailed Urban Heat Island projections for cities worldwide: dy-
namical downscaling CMIP5 global climate models. Climate 3, 391–415.
Lebedev, A., Westman, E., Van Westen, G., Kramberger, M., Lundervold, A., Aarsland, D.,
Soininen, H., Kłoszewska, I., Mecocci, P., Tsolaki, M., 2014. Random Forest en-
sembles for detection and prediction of Alzheimer's disease with a good between-
cohort robustness. NeuroImage: Clin. 6, 115–125.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521, 436–444.
Lee, J., Im, J., Kim, K., Quackenbush, L.J., 2018. Machine learning approaches for esti-
mating forest stand height using plot-based observations and airborne LiDAR data.
Forests 9, 268.
Li, M., Im, J., Beier, C., 2013. Machine learning approaches for forest classification and
change analysis using multi-temporal Landsat TM images over Huntington Wildlife
Forest. Gisci. Remote Sens. 50, 361–384.
Liu, M., Wang, M., Wang, J., Li, D., 2013. Comparison of random forest, support vector
machine and back propagation neural network for electronic tongue data classifica-
tion: application to the recognition of orange beverage and Chinese vinegar. Sens.
Actuat. B 177, 970–980.
Liu, T., Abd-Elrahman, A., Morton, J., Wilhelm, V.L., 2018a. Comparing fully convolu-
tional networks, random forest, support vector machine, and patch-based deep con-
volutional neural networks for object-based wetland mapping using images from
small unmanned aircraft system. Gisci. Remote Sens. 55, 243–264.
Liu, Y., Fang, X., Xu, Y., Zhang, S., Luan, Q., 2018b. Assessment of surface urban heat
island across China’s three main urban agglomerations. Theor. Appl. Climatol. 133,
473–488.
Marcos, D., Volpi, M., Kellenberger, B., Tuia, D., 2018. Land cover mapping at very high
resolution with rotation equivariant CNNs: Towards small yet accurate models. ISPRS
J. Photogramm. 145, 96–107.
Mathew, A., Khandelwal, S., Kaul, N., 2018. Investigating spatio-temporal surface urban
heat island growth over Jaipur city using geospatial techniques. Sustain. Cities Soc.
40, 484–500.
Min, S., Lee, B., Yoon, S., 2017. Deep learning in bioinformatics. Briefings Bioinf. 18,
851–869.
Mohammadimanesh, F., Salehi, B., Mahdianpari, M., Gill, E., Molinier, M., 2019. A new
fully convolutional neural network for semantic segmentation of polarimetric SAR
imagery in complex land cover ecosystem. ISPRS J. Photogramm. Remote Sens. 151,
223–236.
Paoletti, M., Haut, J., Plaza, J., Plaza, A., 2018. A new deep convolutional neural network
for fast hyperspectral image classification. ISPRS J. Photogramm. 145, 120–147.
Park, S., Im, J., Park, S., Yoo, C., Han, H., Rhee, J., 2018. Classification and mapping of
paddy rice by combining landsat and SAR time series data. Remote Sens-Basel 10,
447.
Peel, M.C., Finlayson, B.L., McMahon, T.A., 2007. Updated world map of the Koppen-
Geiger climate classification. Hydrol. Earth Syst. Sci. 11, 1633–1644.
Qiu, C., Mou, L., Schmitt, M., Zhu, X.X., 2019. Local climate zone-based urban land cover
classification from multi-seasonal Sentinel-2 images with a recurrent residual net-
work. ISPRS J. Photogramm. Remote Sens. 154, 151–162.
Qiu, C., Schmitt, M., Mou, L., Ghamisi, P., Zhu, X., 2018. Feature importance analysis for
local climate zone classification using a residual convolutional neural network with
multi-source datasets. Remote Sens-Basel 10, 1572.
Rahaman, K.R., Hassan, Q.K., Ahmed, M.R., 2017. Pan-sharpening of Landsat-8 images
and its application in calculating vegetation greenness and canopy water contents.
ISPRS Int. J. Geo-Inf. 6, 168.
Richardson, H.J., Hill, D.J., Denesiuk, D.R., Fraser, L.H., 2017. A comparison of geo-
graphic datasets and field measurements to model soil carbon using random forests
and stepwise regressions (British Columbia, Canada). Gisci. Remote Sens. 54,
573–591.
Rizwan, A.M., Dennis, L.Y., Chunho, L., 2008. A review on the generation, determination
and mitigation of Urban Heat Island. J. Environ. Sci. 20, 120–128.
Rumerlhar, D., 1986. Learning representation by back-propagating errors. Nature 323,
533–536.
Salata, F., Golasi, I., Petitti, D., de Lieto Vollaro, E., Coppi, M., de Lieto Vollaro, A., 2017.
Relating microclimate, human thermal comfort and health during heat waves: an
analysis of heat island mitigation strategies through a case study in an urban outdoor
environment. Sustain. Cities Soc. 30, 79–96.
Schmidhuber, J., 2015. Deep learning in neural networks: an overview. Neural networks
61, 85–117.
Sim, S., Im, J., Park, S., Park, H., Ahn, M.H., Chan, P.W., 2018. Icing detection over East
Asia from geostationary satellite data using machine learning approaches. Remote
Sens-Basel 10, 631.
Sokolova, M., Lapalme, G., 2009. A systematic analysis of performance measures for
classification tasks. Inform. Process. Manage. 45, 427–437.
Soltau, H., Saon, G., Sainath, T.N., 2014. Joint training of convolutional and non-con-
volutional neural networks. ICASSP 5572–5576.
Stewart, I.D., Oke, T.R., 2012. Local climate zones for urban temperature studies. Bull.
Am. Meteorol. Soc. 93, 1879–1900.
Sukhanov, S., Tankoyeu, I., Louradour, J., Heremans, R., Trofimova, D., Debes, C., 2017.
Multilevel ensembling for local climate zones classification. In: 2017 IEEE
International Geoscience and Remote Sensing Symposium (IGARSS). IEEE, pp.
1201–1204.
Tuia, D., Moser, G., Le Saux, B., Bechtel, B., See, L., 2017. 2017 IEEE GRSS data fusion
contest: open data for global multimodal land use classification [Technical
Committees]. IEEE Geosci. Remote Sens. Mag. 5, 70–73.
Vedaldi, A., Lenc, K., 2015. Matconvnet: convolutional neural networks for matlab. In:
Proceedings of the 23rd ACM International Conference on Multimedia. ACM, pp.
689–692.
Verdonck, M.-L., Okujeni, A., van der Linden, S., Demuzere, M., De Wulf, R., Van Coillie,
F., 2017. Influence of neighbourhood information on ‘local climate zone’mapping in
heterogeneous cities. Int. J. Appl. Earth Obs. Geoinf. 62, 102–113.
Volpi, M., Tuia, D., 2016. Dense semantic labeling of subdecimeter resolution images with
convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 55, 881–893.
Wang, C., Middel, A., Myint, S.W., Kaplan, S., Brazel, A.J., Lukasczyk, J., 2018. Assessing
local climate zones in arid cities: the case of Phoenix, Arizona and Las Vegas, Nevada.
ISPRS J. Photogramm. 141, 59–71.
Wang, T., Wu, D.J., Coates, A., Ng, A.Y., 2012. End-to-end text recognition with con-
volutional neural networks. In: Pattern Recognition (ICPR), 2012 21st International
Conference on. IEEE, pp. 3304–3308.
Wurm, M., Stark, T., Zhu, X.X., Weigand, M., Taubenböck, H., 2019. Semantic segmen-
tation of slums in satellite images using transfer learning on fully convolutional
neural networks. ISPRS J. Photogramm. Remote Sens. 150, 59–69.
Xing, Y., Wang, M., Yang, S., Jiao, L., 2018. Pan-sharpening via deep metric learning.
ISPRS J. Photogramm. 145, 165–183.
Xu, Z., Guan, K., Casler, N., Peng, B., Wang, S., 2018. A 3D convolutional neural network
method for land cover classification using LiDAR and multi-temporal Landsat ima-
gery. ISPRS J. Photogramm. 144, 423–434.
Yadav, N., Sharma, C., Peshin, S., Masiwal, R., 2017. Study of intra-city urban heat island
intensity and its influence on atmospheric chemistry and energy consumption in
Delhi. Sustain. Cities Soc. 32, 202–211.
Yokoya, N., Ghamisi, P., Xia, J., Sukhanov, S., Heremans, R., Tankoyeu, I., Bechtel, B., Le
Saux, B., Moser, G., Tuia, D., 2018. Open data for global multimodal land use clas-
sification: outcome of the 2017 IEEE GRSS Data Fusion Contest. IEEE J-Stars 11,
1363–1377.
Yoo, C., Im, J., Park, S., Quackenbush, L.J., 2018. Estimation of daily maximum and
minimum air temperatures in urban landscapes using MODIS time series satellite
data. ISPRS J. Photogramm. 137, 149–162.
Yu, X.R., Wu, X.M., Luo, C.B., Ren, P., 2017. Deep learning in remote sensing scene
classification: a data augmentation enhanced convolutional neural network frame-
work. Gisci. Remote Sens. 54, 741–758.
Yue, K., Yang, L., Li, R., Hu, W., Zhang, F., Li, W., 2019. TreeUNet: Adaptive Tree con-
volutional neural networks for subdecimeter aerial image segmentation. ISPRS J.
Photogramm. Remote Sens. 156, 1–13.
Zhang, C., Pan, X., Li, H., Gardiner, A., Sargent, I., Hare, J., Atkinson, P.M., 2018. A
hybrid MLP-CNN classifier for very fine resolution remotely sensed image classifi-
cation. ISPRS J. Photogramm. 140, 133–144.
Zhang, T., Tang, H., 2019. A Comprehensive Evaluation of Approaches for Built-Up Area
Extraction from Landsat OLI Images Using Massive Samples. Remote Sens-Basel 11, 2.
Zhen, Z., Quackenbush, L.J., Stehman, S.V., Zhang, L., 2013. Impact of training and va-
lidation sample selection on classification accuracy and accuracy assessment when
using reference polygons in object-based classification. Int. J. Remote Sens. 34,
6914–6930.
Zhou, Y.-T., Chellappa, R., 1988. Computation of optical flow using a neural network. In:
IEEE International Conference on Neural Networks, pp. 71–78.
Zhou, Z.H., Liu, X.Y., 2006. Training cost-sensitive neural networks with methods ad-
dressing the class imbalance problem. IEEE Trans. Knowl. Data Eng. 18, 63–77.
Ziaul, S., Pal, S., 2018. Analyzing control of respiratory particulate matter on Land
Surface Temperature in local climatic zones of English Bazar Municipality and
Surroundings. Urban Clim. 24, 34–50.
C. Yoo, et al. ISPRS Journal of Photogrammetry and Remote Sensing 157 (2019) 155–170
170
... Since Stewart and Oke (2012) proposed the concept of the LCZ in 2012, nearly 200 cities around the world have conducted urban thermal environment studies under the LCZ framework and some urban climate research projects based on the LCZ concept, such as the World Urban Database and Access Portal Tools (WUDAPT) (Ching, 2012), MapUCE (Masson et al., 2015), LCZ generator (Demuzere et al., 2021), Google Earth Engine (Gorelick et al., 2017), all of which continue to innovate in urban multiscale data collection and technology process optimization. In addition, with the development of science and technology in recent years, deep learning algorithms (Cheng et al., 2020;Mahdianpari et al., 2018), semantic segmentation (Liu et al., 2019;Wang et al., 2023;Yoo et al., 2019), model simulation (Brousse et al., 2016;Vahmani & Ban-Weiss, 2016), and other cutting-edge techniques have also been applied in LCZ mapping. ...
... Most of the RS-based methods use pixels as mapping units, and most of these studies have adopted the WUDAPT method proposed by Bechtel et al. (2015). However, the size of the LCZ area is not consistent across cities, the pattern of regular graphical units may not be suitable for expressing the classification results of a city, and usually, the method has low resolution and poor visualization (Ma et al., 2021;Yan et al., 2022;Yoo et al., 2019). Bechtel, Alexander, et al. (2019) confirmed that the average overall accuracy of the 90 cities mapped by WUDAPT is only 50-60 %. ...
Article
In recent years, the concept of Local Climate Zone (LCZ) has been widely used in various cross-cutting areas of urban climate planning and has the potential to become a generalized assessment tool. Although the underlying framework for detailed LCZ theory and mapping has been proposed, a generalized methodology and multidisciplinary cross-cutting applicability arguments are still lacking, which is not friendly enough for future LCZ synergistic urban planning and policy output. Therefore, there is an urgent need for a comprehensive survey of empirical studies of LCZ systems in cities to improve the understanding and address the above issues. In this study, bibliometric analysis and meta-analysis were used to provide a systematic review of LCZ empirical studies over the past decade; analyze the number of studies, geographical distribution, keywords and research hotspots; and discuss the following themes: conducting studies on a global scale; establishing a new standardized mapping process; heat island assessment based on global datasets; using LCZ as a tool for assessing the thermal health of cities; improving the compatibility of the LCZ framework with climate models; and urban planning and design applications incorporating nonphysical factors. Scientific and practical communities can quickly clarify the current status and challenges of using LCZ in urban climate planning and provide references for expanding the application of LCZ.
... The bootstrap aggregation process reshuffles the original datasets and creates n number of new sub-samples of the datasets with replacements. Each tree is generated from the sampling datasets in the training process (Yoo et al., 2019). The starting/apex point of an individual tree is considered a node, and the endpoint of the branch is known as a leaf. ...
... Every node is divided with the help of the best possible variable in the datasets. The RF method calculates the data similarity based on its location in the same leaves (Yoo et al., 2019). The similarity between x and y (s(x, y)) is denoted by the number of times the two given data lies in the same leaf. ...
Article
Full-text available
Study region: Eight governorates in upper Egypt namely Aswan, Asyut, Beni-Suef, Fayoum, Luxor, Minya, Qena and Sohag. Study focus: This study aims to develop novel hybrid machine learning (ML) models for forecasting the drought phenomena based on limited inputs for the eight Egyptian govern-orates, and ii) evaluate the performance and accuracy of the developed ML models for predicting Palmer Drought Severity Index (PDSI) to recommend the optimal model based on performance statistical metrics. The hybrid ML models were Convolution Neural Networks (CNN)-Long Short-Term Memory (LSTM), CNN-Random Forest (RF), CNN-Support Vector Machine (SVR), and CNN-Extreme Gradient Boosting (XGB). New hydrological insights for the region: Results showed that CNN-LSTM model outperformed the others followed by CNN-RF. Values of NSE, MAE, MARE, IA, R 2 , and RMSE for CNN-LSTM were 0.885, 0.915, − 2.073, 0.967, 0.885, and 0.573, respectively. For the testing stage CNN-SVR model was found to perform the best; average values of NSE, MAE, MARE, IA, R 2 , and RMSE were 0.828, 0.364, − 2.903, 0.950, 0.828 and 0.688, respectively. This study provided a way forward for convenient estimation of the PDSI Index from the meteorological data in terms of advancing deep learning algorithms. The developed hybrid models, more or less, can satisfactory predict PDSI values. Additionally, the study suggests the CNN-LSTM model as the most suitable model to advance future investigation in the study area.
... The training and test sets were separated from the entire reference data. If samples included in the same OSM polygon are used for training and testing, overestimates of accuracy may occur 50 . To deal with this, we randomly divided the OSM polygons into 8:2 for IND and NIND, respectively. ...
Article
Full-text available
Industrial land drives economic growth but also contributes to global warming through carbon dioxide emissions. Still, the variance in its impact on economies and emissions across countries at different development stages is understudied. Here, we used satellite data and machine learning to map industrial land at 30 m resolution in ten countries with substantial industrial value-added, and analyzed the impact of industrial land expansion on economic growth and emissions in 216 subnational regions from 2000 to 2019. We found that industrial land expansion was the leading factor for economic growth and emissions in developing regions, contributing 31% and 55%, respectively. Conversely, developed regions showed a diminished impact (8% and 3%, respectively), with a shift towards other economic growth drivers like education. Our findings encourage developing regions to consider the adverse effects of climate change during industrial land expansion and that developed regions prioritize human capital investment over further land expansion.
Chapter
Over the past few decades, urbanization has led to significant changes in land use and cover, impacting urban climate and public health, as well as energy consumption. In 2012, the local climate zone (LCZ) classification system was introduced to better represent the complexity of urban morphology. However, mapping LCZ over a long period has been challenging. This chapter presents a machine learning-based framework for mapping annual LCZ time series in three major urban agglomerations in China, providing spatial-temporal consistency in the resulting maps. The chapter also reveals the spatial and temporal patterns of LCZ time series, with the high-rise and open urban LCZ types becoming more prominent in urban morphology over the past two decades. Urban morphology varies considerably in urban expansion and urban renewal areas. From 2000 to 2020, inter-city urban morphology differences narrowed between the three urban agglomerations, but intra-city urban morphology differences widened.
Chapter
GIS-based mapping and Remote sensing-based mapping have been widely employed in LCZ classification. GIS mapping method uses multiple data sources, such as remote sensing imagery, aerial photographs, and existing GIS databases of planning information, which allows detailed descriptions of urban forms and land cover conditions. WUDAPT (World Urban Database and Access Portal Tools) is a global project aimed at creating a worldwide comprehensive database of LCZ based on remote sensing techniques. It provides a fast and low-cost way of classifying land cover conditions based on free-access remote sensing images. The supervised pixel-based classification of WUDAPT shows wide applicability in different regions around the world, which is especially helpful for developing regions to fast establish LCZ classification maps. The combined mapping method of GIS-based and WUDAPT mapping is an emerging trend that aims to leverage the strengths of both GIS and remote sensing techniques to improve the accuracy and efficiency of LCZ mapping. LCZ classification maps and databases are increasingly applied in climatic planning as information support for decision-making.
Chapter
Accurate local climate zone (LCZ) maps are crucial for urban environmental studies. The last chapter introduced the ways to generate LCZ maps, including object-based and pixel-based remote sensing and GIS methods. Among these methodologies, the supervised pixel-based method using open-access remote sensing imagery has gained popularity, providing a fast and cost-efficient way for LCZ classification. Implementing the World Urban Database and Access Portal Tools (WUDAPT) further provides an open platform and global database for consistent supervised pixel-based LCZ information to support different types of applications and research (http://www.wudapt.org/). This chapter outlines three critical components in supervised pixel-based LCZ classification, including (1) geometrical pre-processing and classification platform (Sect. 4.1), (2) remote sensing data (Sect. 4.2); and (3) classification algorithm (Sect. 4.3), and their recent development and improvement in LCZ classification. Three case studies comparing different classification algorithms in Asian cities’ LCZ mapping (Sects. 4.4, 4.5, and 4.6) are also presented in this chapter.
Article
Remote sensing can be used for effectively mapping plant species, thereby aiding in their sustainable management. Pinus roxburghii (PR), also known as Chir Pine, is often found alongside Quercus leucotricophora and Rhododendron arboreum around Dudhatoli range, Uttarakhand. It holds immense ecological and economic importance in the Himalayan region. It is observed that PR exhibits heterogeneity within an image likely due to varying aspect, shadows, and canopy coverage. This study focuses on mapping PR using an innovative individual sample as mean (ISM) approach embedded in the framework of the Possibilistic c-means (PCM) and noise clustering (NC) fuzzy classifier, specifically addressing heterogeneity within the class while comparing it with the conventional mean training parameter approach. The research utilises the Modified Soil Vegetation Index 2 (MSAVI2) from a semi-hypertemporal (SH) dataset consisting of 17 images acquired by the 8-band PlanetScope data. This study also experiments with different numbers of training samples to understand their impact on the output. Results of PCM with an m value of 2.1 and NC with δ value of 50,000 show good classified outputs. It was also found that a training sample size of 11 showed the best result. This study showcases progress in using the SH dataset, ISM-based PCM, and NC models with a limited number of training samples to overcome challenges posed by class heterogeneity.
Article
Local climate zone (LCZ) classification plays a critical role in urban environment research and has attracted extensive attention from many researchers. However, the potential of deep learning-based approaches is not yet fully explored in this field, even though neural networks continue to push the frontier for various applications. In this paper, we propose a novel multimodal multiscale Transformer network for LCZ classification by introducing multiscale patch embedding and multimodal fusion learning in Transformer architecture. The proposed multiscale patch embedding effectively captures hierarchical interrelationships of image contextual neighborhoods, and automatically learns discriminative features. And the proposed multimodal fusion learning enables the network to naturally fuse multispectral and synthetic aperture radar (SAR) data under the guidance of attention mechanism. To further improve classification accuracy, we impose semi-supervised learning to mine unlabeled image data information. Both labeled and pseudo-labeled data jointly drive our network updates. Experiments conducted on the So2Sat LCZ42, CHN15-LCZ and SouthKorea6-LCZ benchmark datasets demonstrate that our proposed approach outperforms other existing methods significantly and achieves state-of-the-art performance. In the generated LCZ maps, urban and natural classes are well distinguished, the urban structure with waters or mountains is well preserved. Finally, we also discuss the impact of the sample receptive field and sample heterogeneity on LCZ classification performance, which provides a new idea for future studies of LCZ classification.
Article
Keywords: Land cover; Local climate zones (LCZs); Sentinel-2; Multi-seasonal; Residual convolutional neural network (ResNet); Long short-term memory (LSTM); Recurrent neural network (RNN)
The local climate zone (LCZ) scheme was originally proposed to provide an interdisciplinary taxonomy for urban heat island (UHI) studies. In recent years, the scheme has also become a starting point for the development of higher-level products, as the LCZ classes can help provide a generalized understanding of urban structures and land uses. LCZ mapping can therefore theoretically aid in fostering a better understanding of spatio-temporal dynamics of cities on a global scale. However, reliable LCZ maps are not yet available globally. As a first step toward automatic LCZ mapping, this work focuses on LCZ-derived land cover classification, using multi-seasonal Sentinel-2 images. We propose a recurrent residual network (Re-ResNet) architecture that is capable of learning a joint spectral-spatial-temporal feature representation within a unitized framework. To this end, a residual convolutional neural network (ResNet) and a recurrent neural network (RNN) are combined into one end-to-end architecture. The ResNet is able to learn rich spectral-spatial feature representations from single-seasonal imagery, while the RNN can effectively analyze temporal dependencies of multi-seasonal imagery. Cross validations were carried out on a diverse dataset covering seven distinct European cities, and a quantitative analysis of the experimental results revealed that the combined use of the multi-temporal information and Re-ResNet results in an improvement of approximately 7 percentage points in overall accuracy. The proposed framework has the potential to produce consistent-quality urban land cover and LCZ maps on a large scale, to support scientific progress in fields such as urban geography and urban climatology.
Article
Despite the application of state-of-the-art fully Convolutional Neural Networks (CNNs) for semantic segmentation of very high-resolution optical imagery, their capacity has not yet been thoroughly examined for the classification of Synthetic Aperture Radar (SAR) images. The presence of speckle noise, the absence of efficient feature expression, and the limited availability of labeled SAR samples have hindered the application of the state-of-the-art CNNs for the classification of SAR imagery. This is of great concern for mapping complex land cover ecosystems, such as wetlands, where backscattering/spectrally similar signatures of land cover units further complicate the matter. Accordingly, we propose a new Fully Convolutional Network (FCN) architecture that can be trained in an end-to-end scheme and is specifically designed for the classification of wetland complexes using polarimetric SAR (PolSAR) imagery. The proposed architecture follows an encoder-decoder paradigm, wherein the input data are fed into a stack of convolutional filters (encoder) to extract high-level abstract features and a stack of transposed convolutional filters (decoder) to gradually up-sample the low resolution output to the spatial resolution of the original input image. The proposed network also benefits from recent advances in CNN designs, namely the addition of inception modules and skip connections with residual units. The former component improves multi-scale inference and enriches contextual information, while the latter contributes to the recovery of more detailed information and simplifies optimization. Moreover, an in-depth investigation of the learned features via opening the black box demonstrates that convolutional filters extract discriminative polarimetric features, thus mitigating the limitation of the feature engineering design in PolSAR image processing. Experimental results from full polarimetric RADARSAT-2 imagery illustrate that the proposed network outperforms the conventional random forest classifier and the state-of-the-art FCNs, such as FCN-32s, FCN-16s, FCN-8s, and SegNet, both visually and numerically for wetland mapping.
Article
Unprecedented urbanization, in particular in countries of the global south, results in informal urban development processes, especially in mega cities. With an estimated 1 billion slum dwellers globally, the United Nations have made the fight against poverty the number one sustainable development goal. To provide better infrastructure and thus a better life to slum dwellers, detailed information on the spatial location and size of slums is of crucial importance. In the past, remote sensing has proven to be an extremely valuable and effective tool for mapping slums. The nature of the machine learning mapping approaches used, however, made it necessary to invest a lot of effort in training the models. Recent advances in deep learning allow for transferring trained fully convolutional networks (FCN) from one data set to another. Thus, in our study we aim at analyzing transfer learning capabilities of FCNs for slum mapping in various satellite images. A model trained on very high resolution optical satellite imagery from QuickBird is transferred to Sentinel-2 and TerraSAR-X data. While free-of-charge Sentinel-2 data is widely available, its comparably lower resolution makes slum mapping a challenging task. TerraSAR-X data, on the other hand, has a higher resolution and is considered a powerful data source for intra-urban structure analysis. Due to the different image characteristics of SAR compared to optical data, however, transferring the model could not improve the performance of semantic segmentation, but we observe very high accuracies for mapped slums in the optical data: the QuickBird image obtains 86-88% (positive prediction value and sensitivity), and a significant increase for Sentinel-2 applying transfer learning can be observed (from 38 to 55% and from 79 to 85% for PPV and sensitivity, respectively). Using transfer learning proves extremely valuable in retrieving information on small-scaled urban structures such as slum patches, even in satellite images of decametric resolution.
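The two accuracy measures quoted in this abstract follow the standard definitions: positive prediction value (PPV) is the share of pixels mapped as slum that are truly slum, and sensitivity is the share of true slum pixels that were mapped. A minimal sketch, with a hypothetical function name and pixel counts chosen purely for illustration:

```python
def ppv_and_sensitivity(tp: int, fp: int, fn: int):
    """Return (PPV, sensitivity) from true-positive, false-positive and
    false-negative pixel counts for the target class."""
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sensitivity


# Illustrative counts only, not values from the study.
ppv, sens = ppv_and_sensitivity(tp=80, fp=20, fn=20)
```

A low PPV with high sensitivity, as reported for Sentinel-2 before transfer learning, corresponds to a map that finds most slum pixels but also flags many non-slum pixels.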
Article
Detailed information about built-up areas is valuable for mapping complex urban environments. Although a large number of classification algorithms for such areas have been developed, they are rarely tested from the perspective of feature engineering and feature learning. Therefore, we launched a unique investigation to provide a full test of the Operational Land Imager (OLI) imagery for 15-m resolution built-up area classification in 2015, in Beijing, China. Training a classifier requires many sample points, and we proposed a method based on the European Space Agency’s (ESA) 38-m global built-up area data of 2014, OpenStreetMap, and MOD13Q1-NDVI to achieve the rapid and automatic generation of a large number of sample points. Our aim was to examine the influence of a single pixel and image patch under traditional feature engineering and modern feature learning strategies. In feature engineering, we consider spectra, shape, and texture as the input features, and support vector machine (SVM), random forest (RF), and AdaBoost as the classification algorithms. In feature learning, the convolutional neural network (CNN) is used as the classification algorithm. In total, 26 built-up land cover maps were produced. The experimental results show the following: (1) The approaches based on feature learning are generally better than those based on feature engineering in terms of classification accuracy, and the performance of ensemble classifiers (e.g., RF) is comparable to that of CNN. Two-dimensional CNN and the 7-neighborhood RF have the highest classification accuracies at nearly 91%; (2) Overall, the classification effect and accuracy based on image patches are better than those based on single pixels. The features that can highlight the information of the target category (e.g., PanTex (texture-derived built-up presence index) and enhanced morphological building index (EMBI)) can help improve classification accuracy. The code and experimental results are available at https://github.com/zhangtao151820/CompareMethod.
Article
Using the cloud-computing resources of Google's Earth Engine (EE) and a range of satellite sensors (input features) this paper for the first time explores the potential of up-scaling the current Local Climate Zone mapping efforts to regional and global scales. Using a transferability framework, we test whether information from one city contains valuable information to categorise a different city, simultaneously exploring the role of the input features and the characteristics of individual cities. It was found that the accuracies of the EE approach are comparable to the standard WUDAPT method, making EE a viable alternative approach. The results from the city-to-city experiments are generally poor when compared to the single city benchmark experiments, indicating that the collection of site-specific training areas remains relevant. However, LCZ mapping accuracies are considerably improved when a) the source of the training data is from a city in the same ecoregion as the city of interest and b) if the training areas from several cities are combined. These results support the claim that the LCZ framework is a universal urban typology and indicate that, provided a continued optimisation of input features and quality of training areas, up-scaling to regional or global levels is feasible.
Article
Global Local Climate Zone (LCZ) maps, indicating urban structures and land use, are crucial for Urban Heat Island (UHI) studies and also as starting points to better understand the spatio-temporal dynamics of cities worldwide. However, reliable LCZ maps are not available on a global scale, hindering scientific progress across a range of disciplines that study the functionality of sustainable cities. As a first step towards large-scale LCZ mapping, this paper tries to provide guidance about data/feature choice. To this end, we evaluate the spectral reflectance and spectral indices of the globally available Sentinel-2 and Landsat-8 imagery, as well as the Global Urban Footprint (GUF) dataset, the OpenStreetMap layers buildings and land use and the Visible Infrared Imager Radiometer Suite (VIIRS)-based Nighttime Light (NTL) data, regarding their relevance for discriminating different Local Climate Zones (LCZs). Using a Residual convolutional neural Network (ResNet), a systematic analysis of feature importance is performed with a manually-labeled dataset containing nine cities located in Europe. Based on the investigation of the data and feature choice, we propose a framework to fully exploit the available datasets. The results show that GUF, OSM and NTL can contribute to the classification accuracy of some LCZs with relatively few samples, and it is suggested that Landsat-8 and Sentinel-2 spectral reflectances should be jointly used, for example in a majority voting manner, as proven by the improvement from the proposed framework, for large-scale LCZ mapping.
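The abstract above suggests combining Landsat-8 and Sentinel-2 predictions "in a majority voting manner" without giving details. One plain reading is a per-pixel vote over the labels produced by several classifiers, sketched below. The function name, the input layout, and the tie-breaking rule (keep the label from the earliest-listed model) are assumptions made for illustration, not the authors' procedure.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse per-pixel labels from several models.

    predictions: list of label sequences, one per model, all the same length.
    Ties are resolved in favour of the earliest-listed model (an assumption).
    """
    fused = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        # First label, in model order, that reaches the top vote count.
        fused.append(next(label for label in labels if counts[label] == top))
    return fused


# Hypothetical LCZ labels for three pixels from three models.
fused = majority_vote([[2, 6, 8], [2, 5, 8], [3, 6, 9]])
```

With only two sources every disagreement is a tie, so in practice the vote is more informative when several models or image dates per sensor contribute.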
Article
Terrestrial landscape has complex three-dimensional (3D) features that are difficult to extract using traditional methods based on 2D representations. These methods often relegate such features to raster or metric-based (two-dimensional) representations based on Digital Surface Models (DSM) or Digital Elevation Models (DEM), and thus are not suitable for resolving morphological and intensity features for fine-scale land cover mapping. Small-footprint LiDAR provides an ideal way for capturing these 3D features. This research develops a novel method of integrating airborne LiDAR derived features and multi-temporal Landsat images to classify land cover types. We tested our approach in Williamson County, Illinois, which has diverse and mixed landscape features. Specifically, our method applied a 3D convolutional neural network (CNN) approach to extract features from LiDAR point clouds by (1) creating an occupancy grid, an intensity grid at 1-meter resolution, and then (2) normalizing and incorporating data into the 3D CNN. The extracted features (e.g., morphological and intensity features) from the 3D CNN were finally combined with multi-temporal spectral data to enhance the performance of land cover classification based on a Support Vector Machine classifier. Visual interpretation from both hyper-resolution photos and point clouds was used for training and preparation of testing data. The classification results show that our method outperforms a traditional method by 2.65% (from 81.52% to 84.17%) when solely using LiDAR and 2.19% (from 90.20% to 92.57%) when combining all available imageries. We demonstrate that our method can effectively extract LiDAR features and improve fine-scale land cover mapping through fusion of complementary types of remote sensing data.
Article
Fine-grained semantic segmentation results are typically difficult to obtain for subdecimeter aerial imagery segmentation as a result of complex remote sensing content and optical conditions. Recently, convolutional neural networks (CNNs) have shown outstanding performance on this task. Although many deep neural network structures and techniques have been applied to improve accuracy, few have attended to improving the differentiation of easily confused classes. In this paper, we propose TreeUNet, a tool that uses an adaptive network to increase the classification rate at the pixel level. Specifically, based on a deep semantic model infrastructure, a Tree-CNN block in which each node represents a ResNeXt unit is constructed adaptively in accordance with the confusion matrix and the proposed TreeCutting algorithm. By transmitting feature maps through concatenating connections, the Tree-CNN block fuses multiscale features and learns best weights for the model. In experiments on the ISPRS two-dimensional Vaihingen and Potsdam semantic labelling datasets, the results obtained by TreeUNet are competitive among published state-of-the-art methods. Detailed comparison and analysis show that the improvement brought by the adaptive Tree-CNN block is significant.
Article
The World Urban Database and Access Portal Tools (WUDAPT) project has grown out of the need for better information on the form and function of cities globally. Cities are described using Local Climate Zones (LCZ), which are associated with a range of key urban climate model parameters and thus can serve as inputs to high resolution urban climate models. We refer to this as level 0 data for each city. The LCZ level 0 product is produced using freely available Landsat imagery, crowdsourced training areas from the community, and the open source SAGA software. This paper outlines the protocol by which LCZ maps generated by different members of the community are produced and evaluated. In particular, the quality assessment comprises cross-validation, review, and cross-comparison with other data sets. To date, the results from the different quality assessments show that the LCZ maps are generally of moderate quality, i.e. 50–60% overall accuracy (OA), but this is much higher when considering all built-up classes together or using weights that take the morphological and climatic similarity of certain classes into account. The training data contributed by researchers from around the world also vary in quality and in the interpretation of the landscape, which affects the final quality of the LCZ maps. The acceptable level of quality needed will depend heavily on the application of the data. However, initial modelling studies that use the level 0 products as inputs showed improved performance in simulating the urban climate when replacing the default surface descriptions with the WUDAPT level 0 data. This is also promising for the application of level 0 data in regional and global climate and weather models and supports the assumption that the current level 0 products are already of sufficient quality for certain applications. Moreover, there are various ongoing developments to improve the methods used to produce LCZ maps and their accuracy.
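To make the accuracy figures in this abstract concrete, the sketch below computes overall accuracy (OA) and a second score in which all built-up classes are treated as one class. It assumes the usual LCZ convention that built types are numbered 1-10 and land-cover types are lettered A-G; the function names and sample labels are hypothetical, and the similarity-weighted accuracy mentioned in the abstract is not reproduced here.

```python
BUILT_UP = set(range(1, 11))  # LCZ 1-10 are the built types (assumed convention)


def overall_accuracy(reference, predicted):
    """Fraction of samples whose predicted label equals the reference label."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(r == p for r, p in zip(reference, predicted))
    return hits / len(reference)


def merge_built_up(labels):
    """Collapse every built type into a single 'built' label."""
    return ["built" if label in BUILT_UP else label for label in labels]


# Hypothetical validation labels: integers for built types, letters for land cover.
reference = [1, 2, 3, "A", "D"]
predicted = [2, 2, 6, "A", "B"]
oa = overall_accuracy(reference, predicted)
oa_built = overall_accuracy(merge_built_up(reference), merge_built_up(predicted))
```

In this toy case confusion among built types lowers OA but not the merged score, which mirrors the abstract's remark that accuracy is much higher when built-up classes are considered together.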