
Comparison between convolutional neural networks and random forest for local climate zone classification in mega urban areas using Landsat images

September 2019
ISPRS Journal of Photogrammetry and Remote Sensing 157:155-170


DOI:10.1016/j.isprsjprs.2019.09.009

Authors:

Cheolhee Yoo

The Hong Kong Polytechnic University

Daehyeon Han

Ulsan National Institute of Science and Technology

Jungho Im

Ulsan National Institute of Science and Technology

Benjamin Bechtel

Ruhr-Universität Bochum

The Local Climate Zone (LCZ) scheme is a classification system that provides a standardized framework for describing the characteristics of urban forms and functions, especially for urban heat island (UHI) research. Landsat-based LCZ maps at 100 m resolution have been classified with the World Urban Database and Access Portal Tools (WUDAPT) method, which uses a random forest (RF) machine learning classifier. Some studies have proposed modified RF and convolutional neural network (CNN) approaches. This study compares CNN with an RF classifier for LCZ mapping in detail. We designed five schemes (three RF-based schemes, S1-S3, and two CNN-based schemes, S4-S5), which use various combinations of input features from bitemporal Landsat 8 data over four global megacities: Rome, Hong Kong, Madrid, and Chicago. Among the five schemes, the CNN-based scheme that incorporated information from a larger neighborhood showed the best classification performance. Compared with the WUDAPT workflow, the overall accuracy for all land cover classes (OA) and for the urban LCZ types (i.e., LCZ1-10; OA_urb) increased by about 6-8% and 10-13%, respectively, for the four cities. The transferability of the LCZ models among the four cities was also evaluated, and CNN consistently yielded higher accuracy than RF (by about 7-18% for OA and 18-29% for OA_urb). This study revealed that the CNN classifier performed particularly well for LCZ classes in which buildings are mixed with trees, or in which buildings or plants are sparsely distributed. These findings can provide guidance for future LCZ classification using deep learning.


Results of McNemar's Chi-squared test. The orange squares indicate significant accuracy differences at the 99% confidence level, and the blue squares at the 95% confidence level. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


Confusion matrices for Rome of the most accurate model among the 10 runs of S3, the best of the RF-based schemes, and of S5, the best of the CNN-based schemes. * indicates the red-star classes. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


Cheolhee Yoo, Daehyeon Han, Jungho Im⁎, Benjamin Bechtel

School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, South Korea

Department of Geography, Ruhr-University Bochum, Bochum 44801, Germany

ARTICLE INFO

Keywords:

Local climate zone

Convolutional neural networks

Random forest

Urban climate

Landsat


1. Introduction

Although urban areas cover just 3% of the global land surface, about 54% of the world's population lives in urban centers, and by 2050 that share is projected to increase to nearly 65% (Cohen, 2015). Urbanization increases the absorption of solar radiation because of expanded impervious areas, reduces the sky view factor because of a greater number of (high-rise) buildings, and releases artificial heat in the urban canyon, especially in megacities (Barnes et al., 2001; Giridharan et al., 2004; Han-qiu and Ben-qing, 2004; Rizwan et al., 2008). The urban heat island (UHI) phenomenon, in which urban areas are warmer than their surroundings, is an important issue today because it interacts with other urban climate problems such as heat waves and air pollution (Founda and Santamouris, 2017; Salata et al., 2017; Yadav et al., 2017; Fallmann et al., 2016). Different types of UHI need to be distinguished, most importantly the surface temperature UHI (SUHI) and the air temperature UHI in the canopy layer, which extends from the ground to the height of the buildings.

Traditionally, UHI studies analyze the temperature difference between urban and rural areas. One possible way to distinguish these areas is to use satellite-based land cover data with specific class types (i.e., a typical land cover classification). Typical global land cover data used in existing UHI studies (Mathew et al., 2018; Lauwaet et al., 2015; Liu et al., 2018b) include the 500 m resolution MODIS land cover product (MCD12Q1) (Friedl et al., 2010), the 300 m resolution GlobCover 2009 dataset produced by ESA (Bontemps et al., 2011), and the 30 m resolution Global Land Cover product (GLC or GlobalLand30) produced from Landsat data by Chen et al. (2015). However, these products have only one urban land cover class: “urban and built-up class” in MODIS, “artificial surfaces and associated areas” in GlobCover 2009, and “artificial surfaces” in GlobalLand30. Stewart and Oke (2012) explained that the thermal properties of urban areas vary with the height and density of their buildings. Global land cover products with a single urban class are therefore of limited use for analyzing the detailed UHI characteristics of a city.

In fact, many countries have produced their own detailed land cover data with several urban classes. The national land cover product of the United States (NLCD2011), for example, has a total of 20 land cover classes, four of which are urban types defined by the degree of development (i.e., high intensity, medium intensity, low intensity, and open space). The European CORINE (Co-ORdinated INformation on the Environment) land cover has 11 urban-related classes in its level 3 product, and the Urban Atlas product provides high-resolution land use maps of urban areas in European countries. Because the urban classes vary by product, their use for studying heat phenomena globally is relatively limited. Moreover, the classification criteria of these products (NLCD2011, CORINE, and Urban Atlas) focus only on the density of impervious areas along with land use information, so factors strongly linked to UHI, such as the sky view factor and the ratio of building height to street width, were barely considered when the products were generated.

Received 6 February 2019; Received in revised form 11 September 2019; Accepted 12 September 2019

⁎ Corresponding author. E-mail address: ersgis@unist.ac.kr (J. Im).

To overcome this issue, researchers in the UHI field have designed a classification system suited to this purpose. The Local Climate Zone (LCZ) scheme was designed by Stewart and Oke (2012) specifically for UHI research. It consists of 10 urban LCZ types and 7 natural LCZ types and provides a culturally neutral framework that is generic and easy to understand for global urban climate studies (Fig. 1). Bechtel et al. (2015) devised the World Urban Database and Access Portal Tools (WUDAPT) method to construct 100 m resolution pixel-based LCZ maps from Landsat 8 images. Landsat 8 is a polar-orbiting satellite system that images the globe every 16 days at a resolution of 30 m (visible, NIR, and SWIR bands) to 100 m (thermal bands). The WUDAPT method resamples the Landsat image of each city to 100 m resolution (i.e., using the zonal mean) to capture the spectral information of local-scale urban structures. Local experts with deep knowledge of individual cities digitize LCZ reference polygons using high-resolution Google Earth images. These polygons are then converted into 100 m resolution pixels and used to train and test LCZ classification models with the Landsat images. WUDAPT uses random forest (RF), a machine learning approach based on an ensemble of decision trees, for classification. LCZ maps of many cities around the globe (about 90 cities as of August 2018) have been built in this way and shared through the WUDAPT portal (http://www.wudapt.org) (Bechtel et al., 2019).

The LCZ maps produced by the WUDAPT method have been used to identify several key parameters that affect UHI (Giridharan and Emmanuel, 2018; Kaloustian and Bechtel, 2016). Land surface temperature and air temperature have been analyzed by LCZ class (Beck et al., 2018; Wang et al., 2018; Cai et al., 2018), and the effect of respiratory particulate matter on land surface temperature has been discussed across LCZ classes (Ziaul and Pal, 2018). However, the classification accuracy of WUDAPT-based LCZ maps remains limited. The average overall accuracy (OA) of the 90 LCZ maps uploaded to the WUDAPT portal is 74.5%, leaving much room for improvement. In particular, the average OA of the urban LCZ types (OA_urb) of these 90 maps is just 59.3%, meaning that the urban LCZ types are classified much less accurately than natural LCZ types such as forest and water. The low classification accuracy of urban features (i.e., urban LCZ types) is a major limitation for urban climate research.
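To make the two accuracy measures concrete, the following minimal Python/NumPy sketch computes OA and an urban-only OA from a confusion matrix. The function names are hypothetical, and the urban-only variant assumes that OA_urb is computed on the sub-matrix of urban classes (LCZ 1-10) only; the exact definition used on the WUDAPT portal may differ.

```python
import numpy as np

def overall_accuracy(cm):
    """OA: correctly classified pixels / all pixels.
    cm[i, j] = number of reference pixels of class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()

def urban_overall_accuracy(cm, urban_idx):
    """OA restricted to urban classes (assumption: the urban-by-urban
    sub-matrix of the confusion matrix is used)."""
    cm = np.asarray(cm, dtype=float)
    sub = cm[np.ix_(urban_idx, urban_idx)]
    return np.trace(sub) / sub.sum()

# Toy example: classes 0 and 1 are urban LCZs, class 2 is a natural LCZ.
cm = [[40, 10, 0],
      [5, 35, 10],
      [0, 0, 100]]
print(overall_accuracy(cm))                          # 0.875
print(round(urban_overall_accuracy(cm, [0, 1]), 3))  # 0.833
```

In the toy matrix the natural class is classified perfectly, so OA exceeds OA_urb, the same pattern as in the WUDAPT statistics above.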

Therefore, the WUDAPT community has encouraged scientists to explore various classification approaches to further improve LCZ classification (Yokoya et al., 2018). For example, Danylo et al. (2016) added further spectral metrics (i.e., the zonal maximum and minimum) to the input variables of the RF classifier, which improved OA by 2% compared with the traditional WUDAPT method for LCZ classification in Kiev, Ukraine. Verdonck et al. (2017) extracted the spectral information of neighboring pixels (i.e., mean, minimum, maximum, median, and 25th and 75th percentile values) through a moving window approach and used these six new features as input variables in the RF model. The OA of the LCZ classification of Antwerp, Brussels, and Ghent in Belgium improved by 7.9%, 13.0%, and 5.4%, respectively, compared with the original WUDAPT method. These studies improved LCZ classification by feeding additional spectral and contextual features into the RF classifier.

In recent years, deep learning models, which exploit many layers of nonlinear information processing, have been widely used for image classification, object segmentation, and text recognition (Schmidhuber, 2015; LeCun et al., 2015; Wang et al., 2012). Among various deep learning models, convolutional neural networks (CNNs) have shown high performance in image classification tasks (Krizhevsky et al., 2012; Vedaldi and Lenc, 2015; Kim et al., 2018b). A CNN is a feedforward network with feature learning that extracts inherent spatial features at each layer. In principle, a CNN learns its feature extraction directly from the data, with weight sharing and dimension reduction, by combining backpropagation with gradient descent optimization: backpropagation passes the error backward through the network to compute how much each weight contributes to it, and gradient descent uses this information to update the weights during training.

Fig. 1. The local climate zone (LCZ) types identified in urban climate research (from Bechtel et al., 2017 after Stewart and Oke, 2012), © CC-BY 4.0.

Numerous studies have used CNNs for land cover classification from satellite images (Paoletti et al., 2018; Xu et al., 2018; Marcos et al., 2018), including recent applications to LCZ classification. Sukhanov et al. (2017) designed a multi-level ensemble model combining RF, Gradient Boosting Machines, and a simple CNN with a small input size (i.e., 3 × 3) to create LCZ maps; the model was trained on five cities (i.e., Berlin, Rome, Paris, Sao Paulo, and Hong Kong) and then tested on four different cities (i.e., Amsterdam, Chicago, Madrid, and Xi’an). Qiu et al. (2018) used a residual convolutional neural network (ResNet) to systematically analyze the importance of features from multi-source datasets for LCZ classification across nine European cities. Since RF is the most successfully used LCZ classifier so far, it is important to know the advantages and disadvantages of using CNN instead of RF for LCZ classification. However, few studies have compared the LCZ classification performance of CNN and RF classifiers.

This study aims to compare CNN with the RF classifier for LCZ classification. We designed five schemes that consist of various combinations of input data over four global megacities: Rome, Hong Kong, Madrid, and Chicago. The objectives of this research were to: (1) examine the five schemes to identify the effect of CNN compared with the RF-based methods proposed in previous studies; (2) identify the specific LCZ classes that are classified with high accuracy; (3) compare the LCZ maps generated by the two types of classifiers with reference data; and (4) discuss research directions for improving LCZ classification methods in the future.

2. Study area and data

2.1. Study area

Rome, Hong Kong, Madrid, and Chicago were selected as our study areas (Fig. 2). These four cities represent various climatic (Table 1) and geographic characteristics. In addition, their urban structures differ, which enables us to verify the robustness of the proposed approaches.

Rome, the capital city of Italy, is in the mid-western region of the Italian peninsula, and the city center is about 24 km inland from the Mediterranean Sea. Rome has about 2.9 million residents living within an area of 1,285 km², making it Italy’s largest and most populous city. The city has a monocentric urban structure, with densities increasing toward the city center.

Hong Kong is located on the southern coast of China. The city covers about 1,104 km² of land and has 7.4 million residents. Hong Kong is known for its unique urban form and high-density land use. Most areas of the city are hilly, and just under a quarter of the study domain is habitable (i.e., built-up area).

Madrid, the capital city of Spain, is a densely populated metropolis located in a relatively flat area in the center of the southern Meseta of the Iberian Peninsula. Madrid is the largest city in Spain, with 3.2 million residents living in 604 km². We selected a study region covering the Madrid metropolitan area, comprising monocentric Madrid and its surrounding municipalities within the Community of Madrid, one of Spain’s autonomous communities.

Chicago, the third largest city in the US, is situated on the shore of Lake Michigan in Illinois. The city of Chicago has about 2.7 million residents in an area of 606 km². Chicago has a regular street pattern and city blocks based on its grid plan (Ellickson, 2012). We selected a study region that includes the Chicago metropolitan area, comprising the city and its suburbs. The high-density urban center is located in the city of Chicago, while low-density suburban areas surround it.

2.2. Satellite input data

Two Landsat 8 images from different seasons were downloaded for each city from the US Geological Survey Earth Explorer site (https://earthexplorer.usgs.gov). The acquisition dates of these clear-sky scenes are presented in Table 2. We chose two scenes per city, close to summer and winter, to account for seasonal effects such as vegetation phenology and to increase classification accuracy, as found by Bechtel et al. (2015). All Landsat images were first clipped to each city and then atmospherically corrected to scaled reflectance using ENVI Fast Line-of-sight Atmospheric Analysis of Hypercubes (FLAASH). Nine of the 11 bands (bands 1–7, 10, and 11) in each Landsat 8 scene were used as input data. Bands 1–7 are the 30 m resolution Operational Land Imager (OLI) spectral bands, and bands 10 and 11 are thermal bands collected at 100 m resolution by the Thermal Infrared Sensor (TIRS) and interpolated to 30 m.
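A minimal sketch of how the input bands could be assembled from the two scenes of one city, assuming each scene has already been clipped, atmospherically corrected, and stored as a NumPy array of shape (11, H, W) holding bands 1-11 in order on a common grid. The array layout and the function names are assumptions for illustration, not part of the original workflow.

```python
import numpy as np

# Landsat 8 bands used as input: OLI bands 1-7 and TIRS bands 10-11.
SELECTED_BANDS = [1, 2, 3, 4, 5, 6, 7, 10, 11]

def select_bands(scene):
    """scene: array (11, H, W) with Landsat 8 bands 1..11 in order (assumed layout).
    Returns the nine selected bands, shape (9, H, W)."""
    idx = [band - 1 for band in SELECTED_BANDS]  # band numbers are 1-based
    return scene[idx]

def stack_scenes(winter, summer):
    """Concatenate the selected bands of both scenes into 18 input channels."""
    return np.concatenate([select_bands(winter), select_bands(summer)], axis=0)
```

The result has 18 channels per pixel (9 bands × 2 scenes), which is the band count every scheme builds its features from.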

2.3. Reference data

LCZ reference data for the four cities are available from the 2017 IEEE GRSS Data Fusion Contest, organized by the Image Analysis and Data Fusion Technical Committee in collaboration with WUDAPT and GeoWiki (Fig. 2). These data were extracted from the WUDAPT database and further revised to be as accurate as possible (Tuia et al., 2017; Yokoya et al., 2018). Because of the cities’ unique urban structures and compositions, the number of LCZ classes differs from city to city. Rome has 10 LCZ classes (6 urban and 4 natural LCZ types); Hong Kong has 13 (8 urban and 5 natural); Madrid has 14 (7 urban and 7 natural); and Chicago has 15 (9 urban and 6 natural). In addition, the number of polygons digitized for each LCZ class differs across classes and cities. The polygons of each LCZ class were randomly divided into two parts: one for training the models and the other for testing them. We tried to divide the polygons equally between the two sets, considering both the number of polygons and the number of 100 m resolution LCZ pixels within each polygon. The split is made by polygon rather than by pixel because, if training and validation pixels come from the same polygons, the classification accuracy can be inflated (Zhen et al., 2013).

Some LCZ classes in each city, however, are represented by only a small number of polygons (fewer than three) because those classes are not widely distributed within the city. Dividing so few polygons into two sets would leave the models poorly trained, so we labeled these classes “red-star classes”. For the red-star classes, the two sets were created by randomly splitting the pixels of each polygon into two groups. The numbers of polygons and pixels in the two sets for each LCZ class in the four cities are shown in Table 3.
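The polygon-level split and the pixel-level fallback for red-star classes can be sketched as follows. This is an illustrative Python version under stated assumptions: reference pixels are given as hypothetical (pixel_id, lcz_class, polygon_id) tuples, and polygons are balanced by count only, whereas the study also balanced the number of pixels.

```python
import random
from collections import defaultdict

def split_reference(pixels, min_polygons=3, seed=0):
    """Split reference pixels into training and test sets.

    pixels: iterable of (pixel_id, lcz_class, polygon_id) tuples (hypothetical format).
    Classes with at least min_polygons polygons are split by polygon, so no
    polygon contributes pixels to both sets. Classes with fewer polygons
    ("red-star" classes) are split by randomly halving each polygon's pixels.
    """
    rng = random.Random(seed)
    by_class = defaultdict(lambda: defaultdict(list))
    for pixel_id, lcz_class, polygon_id in pixels:
        by_class[lcz_class][polygon_id].append(pixel_id)

    train, test = [], []
    for polygons in by_class.values():
        if len(polygons) >= min_polygons:
            polygon_ids = sorted(polygons)
            rng.shuffle(polygon_ids)
            half = len(polygon_ids) // 2
            for i, polygon_id in enumerate(polygon_ids):
                (train if i < half else test).extend(polygons[polygon_id])
        else:
            for ids in polygons.values():
                ids = list(ids)
                rng.shuffle(ids)
                half = len(ids) // 2
                train.extend(ids[:half])
                test.extend(ids[half:])
    return train, test
```

Keeping whole polygons on one side of the split is what prevents spatially adjacent, near-identical pixels from appearing in both the training and the test set.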

The Global Man-made Impervious Surface (GMIS) data were used to analyze the LCZ maps generated for each city. GMIS provides 30 m resolution global fractional impervious cover for the year 2010, derived from Landsat data (de Colstoun et al., 2017). To identify medium-to-high density developed areas, we extracted the GMIS pixels with an impervious fraction of over 70% within the study domain of each city.
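The GMIS thresholding step amounts to a simple mask. The sketch below assumes the GMIS tile has been loaded as impervious percentages (0-100); because the no-data encoding of GMIS is not described here, the nodata code is left as a caller-supplied parameter.

```python
import numpy as np

def dense_urban_mask(impervious_pct, threshold=70.0, nodata=None):
    """True where the impervious fraction (in %) exceeds threshold.
    nodata: optional code for missing pixels (encoding is an assumption)."""
    imp = np.asarray(impervious_pct, dtype=float)
    mask = imp > threshold          # "over 70%" is a strict inequality
    if nodata is not None:
        mask &= imp != nodata       # drop missing pixels that would exceed the threshold
    return mask
```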

3. Methods

3.1. Random forest (RF) classiﬁer

RF has been widely used in remote sensing for both classification (Sim et al., 2018; Park et al., 2018; Li et al., 2013) and regression (Lee et al., 2018; Yoo et al., 2018; Richardson et al., 2017). RF is based on classification and regression trees (CART), which recursively apply binary splits until the terminal nodes of a tree are reached (Breiman, 2001). RF grows numerous independent trees, each using a random subset of the training samples drawn by bootstrapping and a random subset of the input variables at every node. To reach a final decision, RF combines the trees as an ensemble through majority voting for classification.

In this study, RF was implemented using a random forest package in R (https://www.r-project.org/). All parameters except the number of trees were set to the package defaults: each tree was grown on a bootstrap sample containing roughly two-thirds of the distinct training samples, the number of variables randomly sampled as candidates at each split was the square root of the number of input variables, and the minimum size of the terminal nodes was 1. The number of trees (i.e., ntree) was selected during the modeling process.
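The two sources of randomness in RF and its voting rule can be illustrated without a full tree learner. The Python sketch below (all helper names hypothetical) draws a bootstrap sample, applies the square-root rule for candidate variables, and combines per-tree labels by majority vote; it is not the R package’s implementation. Drawing n samples with replacement leaves about 1 − 1/e ≈ 63% of the distinct samples in each tree’s training set, which is why each tree is often described as seeing about two-thirds of the data.

```python
import math
import random
from collections import Counter

def bootstrap_sample(n, rng):
    """Indices of one tree's training sample: n draws with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def mtry_default(n_features):
    """Candidate variables tried at each split: floor(sqrt(p)), at least 1."""
    return max(1, math.isqrt(n_features))

def candidate_variables(n_features, rng):
    """Random subset of variable indices considered at one node."""
    return rng.sample(range(n_features), mtry_default(n_features))

def majority_vote(tree_predictions):
    """Combine the class labels predicted by the individual trees."""
    return Counter(tree_predictions).most_common(1)[0][0]
```

For readers working in Python, scikit-learn’s RandomForestClassifier with bootstrap=True, max_features="sqrt", and min_samples_leaf=1 has comparable settings, although its implementation details differ from the R package.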

3.2. Convolutional neural networks (CNN) classiﬁer

A CNN is a type of artificial neural network that basically consists of convolutional layers, pooling layers, and fully connected layers. Compared with typical neural networks, the distinguishing feature of a CNN is its use of convolutional layers. Given 3-dimensional input data (width, height, and channel), a convolutional layer produces output that keeps the same 3-dimensional form and passes it to the next layer. The input and output data of a convolutional layer are called feature maps. The convolution is performed by sliding several filters (or kernels) over the input feature maps: at each position, a filter takes the dot product with the corresponding elements of the input feature maps, i.e., the sum of their element-wise products. The depth of the output feature maps is therefore the number of filters rather than the number of input channels. For example, when 32 filters are used in the first convolutional layer, the output feature map has a depth of 32 regardless of the number of channels in the input feature map.
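The following NumPy sketch shows the sliding dot product for a single convolutional layer without padding and with a stride of 1. As in most deep learning libraries, the kernel is not flipped (strictly, this is cross-correlation). With 32 filters the output depth is 32 whatever the number of input channels; the 18-channel input in the test data is an assumption matching two 9-band Landsat scenes.

```python
import numpy as np

def conv2d_valid(x, filters):
    """x: (H, W, C) input feature map; filters: (k, k, C, F).
    Returns (H-k+1, W-k+1, F): one output channel per filter."""
    H, W, C = x.shape
    k, _, c_in, F = filters.shape
    assert c_in == C, "filter depth must match the number of input channels"
    out = np.zeros((H - k + 1, W - k + 1, F))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            patch = x[i:i + k, j:j + k, :]  # (k, k, C) window under the filters
            # Dot product of the window with every filter at once -> F values.
            out[i, j, :] = np.tensordot(patch, filters, axes=([0, 1, 2], [0, 1, 2]))
    return out
```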

Without padding, convolution reduces the spatial size of the output feature maps. Padding, which fills the borders of the input feature maps with a specific value before the convolution, is widely used to prevent this and, more generally, to adjust the spatial size of the output feature maps. The fill value can be chosen according to the model, but zero-padding is widely used in various applications. Once feature maps are extracted by the convolutional layers, sub-sampling is generally conducted to reduce the size of the data. This downsampling process is called pooling. Pooling locally summarizes the output of the previous layer, making the representation approximately invariant to small translations: it focuses on the presence of a feature rather than its exact location (Goodfellow et al., 2016). In addition, pooling helps to avoid overfitting by making the model simpler. Because pooling shrinks the feature maps, the number of weights to be optimized in the subsequent layers is significantly reduced, lowering the computational cost. Max pooling is commonly used, based on the concept that the maximum values of a feature map can represent local features (Zhou and Chellappa, 1988). Finally, fully connected layers act as the classifier on the final output feature maps. Each node in a fully connected layer has its own set of weights to be optimized, connecting it to all features from the previous layer. Using these features, the fully connected layers determine the final class as the one with the highest probability from a softmax function, which is commonly used in multi-class classification problems in neural networks (Goodfellow et al., 2016; Yu et al., 2017; Kim et al., 2018a).

Fig. 2. Study area and Local Climate Zone (LCZ) reference data with legends.

Table 1
The climatic characteristics of the cities. The classes in parentheses correspond to the Köppen-Geiger climate classification (Peel et al., 2007).

City | Description of climate
Rome | Mediterranean climate with dry summers and cool, humid winters (Csa)
Hong Kong | Humid subtropical climate with a hot and humid summer (Cfa)
Madrid | Inland Mediterranean climate, transitioning to a semi-arid climate in the eastern part of the city (Csa)
Chicago | Hot humid continental climate with distinct seasons, including warm to hot, humid summers and cold, snowy winters (Dfa)

Table 2
Selected winter and summer Landsat 8 scenes for each city.

City | Scene 1 | Scene 2
Rome | January 11, 2017 | August 23, 2017
Hong Kong | February 12, 2018 | October 23, 2017
Madrid | January 12, 2015 | August 13, 2017
Chicago | February 03, 2017 | September 12, 2016
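The effects of zero-padding and 2 × 2 max pooling on feature-map size can be checked with a few lines of NumPy. The sketch uses the standard output-size formula for a k × k convolution with padding p and stride s, floor((n + 2p − k)/s) + 1; the function names are illustrative.

```python
import numpy as np

def conv_output_size(n, k, padding=0, stride=1):
    """Spatial size after a k x k convolution on an n x n input."""
    return (n + 2 * padding - k) // stride + 1

def zero_pad(x, p):
    """Pad the two spatial axes of an (H, W, C) array with p zeros on each side."""
    return np.pad(x, ((p, p), (p, p), (0, 0)), mode="constant", constant_values=0)

def max_pool_2x2(x):
    """2 x 2 max pooling with stride 2 on (H, W, C); an odd last row/column is dropped."""
    H, W, C = x.shape
    h, w = H // 2, W // 2
    x = x[:2 * h, :2 * w, :]
    return x.reshape(h, 2, w, 2, C).max(axis=(1, 3))
```

A 10 × 10 input convolved with a 3 × 3 filter shrinks to 8 × 8 without padding but stays 10 × 10 with one pixel of zero-padding; each pooling step then halves the width and height.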

An activation function converts the weighted sum of a node’s inputs into its output. To benefit from multiple layers in a neural network, a nonlinear activation function is essential, because a stack of purely linear layers is equivalent to a single linear layer. The rectified linear unit (ReLU) is the most popular activation function in deep learning because of its excellent performance and relatively simple structure (Glorot et al., 2011; LeCun et al., 2015).
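ReLU and the softmax used by the final layer are short enough to write out directly. In this sketch the softmax subtracts the maximum score before exponentiating, a common numerical-stability trick that does not change the resulting probabilities.

```python
import numpy as np

def relu(z):
    """Rectified linear unit, applied element-wise: max(0, z)."""
    return np.maximum(0.0, z)

def softmax(scores):
    """Turn a vector of class scores into probabilities that sum to 1."""
    z = np.asarray(scores, dtype=float)
    e = np.exp(z - z.max())  # shifting by the max avoids overflow
    return e / e.sum()
```

Without a nonlinearity such as ReLU, two stacked linear layers W2(W1x) collapse into the single linear layer (W2W1)x, which is why depth alone adds no expressive power.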

All weights, such as the filters in the convolutional layers and the nodes in the fully connected layers, are randomly initialized and then gradually optimized by reducing the error between the estimated results and the reference data. This iterative process relies on backpropagation, which calculates the derivative of the error function with respect to each weight (Rumelhart, 1986; Goodfellow et al., 2016). All weights are then updated by the optimization method using the calculated gradients, moving them toward lower error.
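Backpropagation and a gradient-based update can be illustrated on the smallest possible case: a one-layer softmax classifier trained on toy data. The gradient of the averaged cross-entropy with respect to the class scores is (p − y)/n, and the update follows the Adam rule of Kingma and Ba (2014), the optimizer used in this study. This is a didactic sketch, not the study’s training code; the learning rate of 0.01 and the toy data are assumptions (Keras’s Adam defaults to 0.001).

```python
import numpy as np

def softmax_rows(z):
    """Row-wise softmax with max-shift for numerical stability."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grads(W, b, X, Y):
    """Forward pass, mean cross-entropy, and backpropagated gradients
    for a one-layer softmax classifier. Y is one-hot, shape (n, k)."""
    n = X.shape[0]
    P = softmax_rows(X @ W + b)                  # forward pass
    loss = -np.sum(Y * np.log(P + 1e-12)) / n
    d_scores = (P - Y) / n                       # dLoss/dscores
    return loss, X.T @ d_scores, d_scores.sum(axis=0)

def adam_update(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step; m and v are running moment estimates, t starts at 1."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def train(X, y, n_classes, steps=200, seed=0):
    """Randomly initialize the weights, then alternate backpropagation and Adam."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(X.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]
    mW, vW = np.zeros_like(W), np.zeros_like(W)
    mb, vb = np.zeros_like(b), np.zeros_like(b)
    history = []
    for t in range(1, steps + 1):
        loss, gW, gb = loss_and_grads(W, b, X, Y)
        history.append(loss)
        W, mW, vW = adam_update(W, gW, mW, vW, t)
        b, mb, vb = adam_update(b, gb, mb, vb, t)
    return W, b, history
```

In a CNN the same chain rule is applied layer by layer through the fully connected, pooling, and convolutional layers; only the bookkeeping grows.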

In this study, the CNN was implemented using the Keras open-source library. There are many ways to construct a CNN architecture, so it is important to find an optimal model that suits the characteristics of the data. Unfortunately, there is no direct way to find an optimal model in deep learning, so a multitude of tests is typically conducted to find the optimal CNN parameters, considering both performance and efficiency. In this study, 32, 64, 128, and 256 filters were tested at the convolutional layers to determine an optimal structure. The final CNN model consisted of four convolutional layers, each with 32 filters of size 3 × 3. The ReLU activation function was adopted at each layer. Max pooling with a 2 × 2 window and a stride of 2 was performed after the second and fourth convolutional layers. A fully connected layer with 256 nodes was applied after the convolutional and max-pooling layers, and a softmax function was used to classify the LCZ type. The adaptive moment estimation (Adam) optimizer, which is typically used in neural networks, especially for classification tasks (Kingma and Ba, 2014), was used to minimize the error function. An Nvidia GTX 1080Ti graphics processing unit (GPU) with 11 GB of memory was used to speed up model training, with a batch size of 256 and 1000 epochs. The final CNN structure used in this study is shown in Fig. 3.

Table 3
Training and test datasets of each LCZ type by city. The values in the training and test columns are the numbers of polygons; the numbers of corresponding 100 m resolution pixels are shown in parentheses. * marks the red-star classes, which have only a few reference polygons; for these classes a single entry covers both sets, and the test column reads "(shared)". – indicates that the class does not occur in that city. The LCZ figures in the left column are from Stewart and Oke (2012).

LCZ | Rome training | Rome test | Hong Kong training | Hong Kong test | Madrid training | Madrid test | Chicago training | Chicago test
1 | – | – | 13 (318) | 13 (313) | – | – | 2* (228) | (shared)
2 | 13 (775) | 12 (776) | 6 (112) | 5 (67) | 12 (1567) | 5 (5647) | 2* (126) | (shared)
3 | 2* (104) | (shared) | 7 (195) | 7 (131) | 1* (92) | (shared) | 3 (128) | 3 (123)
4 | – | – | 9 (383) | 10 (290) | 3* (305) | (shared) | 2* (140) | (shared)
5 | 11 (749) | 12 (746) | 4 (76) | 4 (50) | 5 (715) | 3 (620) | 2* (104) | (shared)
6 | 3 (239) | 4 (241) | 7 (64) | 6 (56) | 6 (932) | 6 (894) | 10 (2059) | 12 (1901)
7 | – | – | – | – | – | – | – | –
8 | 7 (235) | 4 (194) | 4 (86) | 5 (51) | 10 (1433) | 12 (1380) | 11 (2231) | 10 (2296)
9 | – | – | – | – | 1* (82) | (shared) | 4 (422) | 3 (429)
10 | 2* (51) | (shared) | 5 (109) | 4 (110) | – | – | 2 (238) | 2 (227)
A | 2 (146) | 3 (138) | 7 (832) | 7 (784) | 1 (1115) | 3 (244) | 6 (515) | 7 (488)
B | 2 (293) | 3 (262) | 7 (207) | 6 (200) | 8 (1906) | 8 (1888) | 5 (188) | 5 (153)
C | – | – | 5 (379) | 4 (312) | 2 (982) | 2 (250) | – | –
D | 4 (512) | 3 (472) | 6 (332) | 6 (236) | 9 (3621) | 6 (3517) | 4 (1150) | 4 (1227)
E | – | – | – | – | 3 (324) | 2 (312) | 4 (115) | 3 (86)
F | – | – | – | – | 1* (304) | (shared) | 3 (31) | 2 (28)
G | 3* (485) | (shared) | 5 (1282) | 4 (1054) | 2 (391) | 2 (385) | 5 (967) | 4 (984)
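As a consistency check on the architecture described above, the following pure-Python sketch traces the feature-map sizes and counts the trainable parameters. It rests on assumptions not stated in the text: "same" zero-padding in every convolution (without padding, a 10 × 10 input would shrink to 1 × 1 before the fourth 3 × 3 convolution could be applied), 18 input channels (9 bands × 2 scenes), and a bias term in every layer. The counts are therefore illustrative only.

```python
def cnn_shape_and_params(n, channels=18, n_classes=10, filters=32, dense=256):
    """Trace feature-map sizes and count trainable parameters for four 3 x 3
    convolutions (32 filters each, assumed 'same' padding), 2 x 2 / stride-2
    max pooling after conv 2 and conv 4, a 256-node dense layer, and a
    softmax output. Returns (trace, total_parameters)."""
    size, depth, params = n, channels, 0
    trace = []
    for layer in range(1, 5):
        params += 3 * 3 * depth * filters + filters  # kernel weights + biases
        depth = filters                               # 'same' padding keeps size
        trace.append(("conv%d" % layer, size, size, depth))
        if layer in (2, 4):
            size //= 2                                # pooling halves, rounding down
            trace.append(("pool", size, size, depth))
    flat = size * size * depth
    params += flat * dense + dense                    # fully connected layer
    params += dense * n_classes + n_classes           # softmax output layer
    trace += [("dense", dense), ("softmax", n_classes)]
    return trace, params
```

With Rome’s 10 classes, this gives 68,554 parameters for a 10 × 10 input and 437,194 for a 30 × 30 input; nearly all of the difference comes from the fully connected layer, whose input grows from 2 × 2 × 32 to 7 × 7 × 32 features. In Keras, the same stack could be written with Conv2D(32, (3, 3), padding="same", activation="relu"), MaxPooling2D(pool_size=(2, 2), strides=2), Flatten(), Dense(256, activation="relu"), and Dense(k, activation="softmax").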

3.3. Classiﬁcation scheme design

To produce a 100 m resolution pixel-based LCZ map from Landsat images, this study used two classifiers: RF and CNN. We designed five classification schemes with different input features and classifiers: three RF-based schemes (S1–S3) and two CNN-based schemes (S4–S5) (Fig. 4).

3.3.1. Benchmark RF-based schemes (S1–S3)

RF is the classiﬁer adopted by the existing LCZ classiﬁcation com-

munity, including the WUDAPT method. We designed three schemes

(S1–S3) with RF, based on the benchmark of existing studies. S1 cor-

responds to the existing WUDAPT method. The 30 m resolution Landsat

images were bilinearly resampled to 10 m resolution, then resampled to

100 m resolution by a zonal mean function based on the LCZ grid area.

S2 follows the method proposed by Danylo et al. (2016), which improved classification accuracy by adding more spectral information to the WUDAPT model as input variables. The 10 m bilinearly resampled Landsat images were aggregated to 100 m using not only the zonal mean but also the maximum and minimum within each LCZ grid cell, so three features were constructed for each Landsat band in S2. S3 follows the method suggested by Verdonck et al. (2017). To capture the contextual characteristics of a feature, the mean, minimum, maximum, median, and 25th and 75th percentile values of the nine pixels in a 3 × 3 window (i.e., one center pixel and its eight surrounding pixels) were calculated from the 100 m zonal-mean Landsat images. In each scheme, the input variables were constructed from 18 bands (i.e., 9 bands per scene) of two Landsat images acquired in (or very close to) the winter and summer seasons (Table 2). In summary, the number of input variables was 18, 54, and 108 for S1, S2, and S3, respectively (Table 4). We extracted the pixel values of the input variables at the locations corresponding to the LCZ reference pixels in each scheme.
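The S3 neighborhood statistics and the resulting feature counts can be sketched as follows. This is an illustrative sketch, not the code of Verdonck et al. (2017). It assumes a 100 m zonal-mean band stored as a list of rows and computes the six statistics only for interior cells, because edge handling is not described in the text. It also uses Python's `statistics.quantiles` with the "inclusive" method for the 25th and 75th percentiles, which is one of several possible interpolation conventions. The function names are hypothetical.

```python
from statistics import fmean, median, quantiles
from typing import Dict, Sequence


def window_stats(band_100m: Sequence[Sequence[float]], row: int, col: int) -> Dict[str, float]:
    """Six S3-style statistics of the 3 x 3 window centred on cell (row, col)."""
    n_rows, n_cols = len(band_100m), len(band_100m[0])
    if not (1 <= row < n_rows - 1 and 1 <= col < n_cols - 1):
        raise ValueError("edge cells are not handled in this sketch")
    values = [
        band_100m[r][c]
        for r in range(row - 1, row + 2)
        for c in range(col - 1, col + 2)
    ]
    # 'inclusive' interpolation is an assumption; other quantile definitions exist.
    q25, _, q75 = quantiles(values, n=4, method="inclusive")
    return {"mean": fmean(values), "min": min(values), "max": max(values),
            "median": median(values), "q25": q25, "q75": q75}


def n_features(stats_per_band: int, bands_per_scene: int = 9, scenes: int = 2) -> int:
    """Number of RF input variables: statistics per band x bands per scene x scenes."""
    return stats_per_band * bands_per_scene * scenes


# S1: mean only; S2: mean, max, min; S3: six window statistics.
print([n_features(k) for k in (1, 3, 6)])  # [18, 54, 108]
```

The feature-count helper reproduces the 18, 54, and 108 input variables of S1, S2, and S3: each scheme multiplies the 18 bitemporal bands by its number of statistics per band.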

3.3.2. CNN-based schemes (S4–S5)

We proposed two different CNN-based schemes. The 30 m resolution Landsat images were bilinearly resampled to 10 m, so that 100 (10 × 10) pixels fall within a single 100 m LCZ grid cell. Each 10 m resolution image was normalized using the min–max approach to reduce training time (Ba et al., 2016). In S4, the 10 × 10 features of the 10 m resolution Landsat images within each LCZ reference pixel area were extracted and fed into the CNN. The final scheme (S5) takes into account the area surrounding a focus pixel (i.e., the same area as the moving window in S3): we extracted 30 × 30 features of the 10 m resolution Landsat images and fed them into the CNN classifier. In summary, S4 uses a 10 × 10 feature at 10 m resolution for each band, while S5 uses a 30 × 30 feature at 10 m resolution for each band as input variables (Table 4). After the 10 m resolution images were fed into the CNN model, the fully connected layers assigned one LCZ class to each image, producing a 100 m resolution LCZ map. The number of trainable parameters of S4 and S5 for the four cities is summarized in Table S1.

Fig. 3. The structure of the CNN designed in this study. N indicates the size of the input image (i.e., a 10 × 10 image has N = 10). The k in the last output is the number of LCZ types to be classified for each city.

Fig. 4. The schematic process flow showing how to prepare the input features for each scheme.
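The S4/S5 input preparation can be sketched as below. The sketch assumes that a single 10 m band is stored as a list of rows aligned with the LCZ grid, so LCZ cell (i, j) occupies rows 10i to 10i + 9 and columns 10j to 10j + 9. A 30 × 30 patch for S5 then adds one 100 m cell of context on every side, the same 300 × 300 m area covered by the S3 window. Min–max scaling is applied per band over the whole image, which is our reading of "each 10 m resolution image was normalized". The function names are hypothetical, and patches that would extend beyond the image are rejected rather than padded.

```python
from typing import List, Sequence

Grid = List[List[float]]


def min_max_normalize(band: Sequence[Sequence[float]]) -> Grid:
    """Rescale one band to [0, 1] using its global minimum and maximum."""
    lo = min(min(row) for row in band)
    hi = max(max(row) for row in band)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant band
    return [[(v - lo) / span for v in row] for row in band]


def extract_patch(band_10m: Sequence[Sequence[float]], i: int, j: int,
                  context_cells: int = 0) -> Grid:
    """Return the 10 m patch for LCZ cell (i, j) (hypothetical helper).

    context_cells=0 gives the 10 x 10 S4 input; context_cells=1 gives the
    30 x 30 S5 input centred on the same LCZ cell.
    """
    size = 10  # 10 m pixels per 100 m LCZ cell side
    r0 = (i - context_cells) * size
    c0 = (j - context_cells) * size
    width = (2 * context_cells + 1) * size
    if r0 < 0 or c0 < 0 or r0 + width > len(band_10m) or c0 + width > len(band_10m[0]):
        raise ValueError("patch extends beyond the image; padding is not handled here")
    return [list(band_10m[r][c0:c0 + width]) for r in range(r0, r0 + width)]
```

In practice each of the 18 bands would be normalized and patched in the same way and the patches stacked as input channels; the stacking step is omitted here.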

3.4. Modelling and accuracy assessment

A randomly selected 90% of the training samples (i.e., the training set) were used to train the models, and the remaining 10% were used to identify the optimum parameter values. Through this process, we selected the optimal number of RF trees (i.e., ntree) within 100–1000 based on overall accuracy (OA) for each RF-based scheme (S1–S3). For the CNN-based schemes (S4–S5), the model yielding the best accuracy on the 10% validation samples within 1000 epochs was selected. We ran the models ten times for each scheme to examine the robustness of the methods, and assessed accuracy using the separate test datasets (Table 3). For accuracy assessment, we used not only OA but also OA_urb, the accuracy among the urban LCZ types (LCZs 1–10), and OA_nat, the accuracy among the natural LCZ types (LCZs A–G). In addition, we obtained the F1-score (Eq. (1)) from the user's accuracy (UA) and producer's accuracy (PA) of each LCZ class to further examine the classification accuracy by class. Because the F1-score is the harmonic mean of UA and PA, it indicates not only the classification capability but also how similar the two values (i.e., UA and PA) are (Sokolova and Lapalme, 2009).

$$F_1 = \frac{2 \times UA \times PA}{UA + PA} \qquad (1)$$
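The accuracy measures above can be computed from a confusion matrix as in the sketch below. It assumes rows are reference classes and columns are predicted classes. It also computes OA_urb and OA_nat from the sub-matrix restricted to the urban (LCZ1–10) or natural (LCZA–G) classes. The text does not state whether errors into the other group count toward these measures, so that choice is an assumption. The function names are hypothetical.

```python
from typing import List, Sequence


def overall_accuracy(cm: Sequence[Sequence[int]], classes: Sequence[int]) -> float:
    """Accuracy over the sub-matrix restricted to the given class indices.

    Rows are reference classes and columns are predicted classes. Passing
    all indices gives OA; urban or natural indices give OA_urb or OA_nat
    under the sub-matrix assumption.
    """
    correct = sum(cm[k][k] for k in classes)
    total = sum(cm[r][c] for r in classes for c in classes)
    return correct / total if total else 0.0


def f1_scores(cm: Sequence[Sequence[int]]) -> List[float]:
    """Per-class F1 = 2 * UA * PA / (UA + PA), as in Eq. (1)."""
    n = len(cm)
    scores = []
    for k in range(n):
        predicted = sum(cm[r][k] for r in range(n))  # column total: UA denominator
        reference = sum(cm[k])                       # row total: PA denominator
        ua = cm[k][k] / predicted if predicted else 0.0
        pa = cm[k][k] / reference if reference else 0.0
        scores.append(2 * ua * pa / (ua + pa) if ua + pa else 0.0)
    return scores


# Toy matrix: classes 0-1 play the role of urban LCZs, classes 2-3 of natural LCZs.
cm = [[8, 2, 0, 0],
      [1, 7, 2, 0],
      [0, 1, 8, 1],
      [0, 0, 3, 7]]
print(overall_accuracy(cm, range(4)))  # 0.75
```

For class 0 in the toy matrix, UA = 8/9 and PA = 8/10, so F1 = 16/19 ≈ 0.842. This is lower than the arithmetic mean of UA and PA because the harmonic mean penalizes the gap between them.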

Finally, for each scheme we selected one of the 10 simulated models to map LCZs for each city, based on the highest sum of OA and OA_urb. We also conducted McNemar's test to evaluate the significance of the differences in classification results between schemes.
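McNemar's test compares two classifiers on the same test pixels using only the pixels on which their correctness differs. The sketch below uses the standard chi-squared form with one degree of freedom. The text does not state whether a continuity correction was applied, so it is exposed as an option. For one degree of freedom, the chi-squared survival function equals erfc(sqrt(x / 2)), which keeps the sketch dependent only on the standard library. The function name `mcnemar` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def mcnemar(reference: Sequence[int], pred_a: Sequence[int], pred_b: Sequence[int],
            correction: bool = False) -> Tuple[float, float]:
    """Return (chi-squared statistic, p-value) of McNemar's test.

    b counts pixels that scheme A classifies correctly and scheme B does not;
    c counts the reverse case. Pixels both schemes get right (or wrong) are ignored.
    """
    b = c = 0
    for ref, ya, yb in zip(reference, pred_a, pred_b):
        a_ok, b_ok = ya == ref, yb == ref
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1
    if b + c == 0:
        return 0.0, 1.0  # the two schemes never differ in correctness
    diff = abs(b - c) - (1 if correction else 0)
    stat = max(diff, 0) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-squared survival function, df = 1
    return stat, p_value
```

A p-value below 0.01 or 0.05 corresponds to the 99% and 95% confidence levels shown in Fig. 5. Equivalently, the statistic can be compared with the df = 1 critical values of about 6.635 and 3.841.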

3.5. Transferability experiments

We further compared the transferability of the CNN and RF classifiers by applying LCZ models developed for three cities to the remaining city, using the best-performing RF and CNN models from the single-city experiments. In other words, the reference data of one city were used to evaluate the transferability of an LCZ model developed using the reference data of the other three cities as training samples (Table 3). The procedure for designing the models for the transferability test is the same as that described in Section 3.4. Because the LCZ types present differ by city, only the LCZ labels belonging to the test city were selected when training the LCZ models.
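The leave-one-city-out design and the label filtering can be sketched as follows. Samples are represented as hypothetical (city, LCZ label, features) tuples. The sketch only builds the training and test splits; the classifier training itself (RF or CNN, as in Section 3.4) is omitted. The function name `transfer_splits` is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (city, LCZ label, features).
Sample = Tuple[str, str, object]


def transfer_splits(samples: Sequence[Sample]) -> Dict[str, Tuple[List[Sample], List[Sample]]]:
    """For each test city, train on the other cities using only labels present in the test city."""
    cities = sorted({city for city, _, _ in samples})
    splits: Dict[str, Tuple[List[Sample], List[Sample]]] = {}
    for test_city in cities:
        test = [s for s in samples if s[0] == test_city]
        test_labels = {label for _, label, _ in test}
        # Drop training samples whose LCZ label never occurs in the test city.
        train = [s for s in samples if s[0] != test_city and s[1] in test_labels]
        splits[test_city] = (train, test)
    return splits
```

Filtering the training labels keeps the classifier from predicting LCZ types that cannot occur in the test city, which would otherwise count as guaranteed errors.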

4. Results and discussion

4.1. Overall performance of the schemes

Table 5 shows the accuracy assessment results of the five schemes for the four cities. Compared to S1 (i.e., the WUDAPT method), S2 increased OA by 2% for Madrid and by 3% for Chicago, which agrees with the findings of Danylo et al. (2016). Moreover, the OA_urb of S2 increased significantly compared to that of S1 for Hong Kong, Madrid, and Chicago. Interestingly, OA_nat did not increase significantly in S2 for any city. This suggests that adding further spectral information (i.e., maximum and minimum) as input variables might contribute to higher accuracy for the urban LCZ types (i.e., LCZ1–10), which have more heterogeneous spectral characteristics.

For Hong Kong, while the OA_urb of S2 was higher than that of S1, the OA_nat of S2 was lower than that of S1. For the natural LCZ types (i.e., LCZA–G) in Hong Kong, adding manually extracted features as input data actually decreased the accuracy. Moreover, there was no significant accuracy difference between S1 and S2 for Rome. This implies that including more contextual information as input variables in RF does not always improve classification accuracy.

Unlike S1 and S2, for which we manually selected the input features, the CNN-based S4 can automatically learn multi-level features from the original input images. It is therefore not surprising that S4 shows a higher OA than S1 for all four cities. In addition, S4 showed a higher OA than S2 except for Chicago. One possible reason is that the contextual information added in S2 was meaningful enough to improve accuracy in Chicago, where the city blocks have a regular arrangement. Athiwaratkun and Kang (2015) showed that using well-learned features as input variables in RF yielded higher accuracy than CNN.

The influence of considering neighborhood pixels as input features is seen in both the RF- and CNN-based schemes. S3 produced the highest OA among the three RF-based schemes (S1–S3), which is consistent with Verdonck et al. (2017). The CNN-based S5, with 30 × 30 input features, showed the highest accuracy among the five schemes, increasing OA by about 6–8% in all cities when compared to the

Table 4

Summary of each scheme with input feature types.

Scheme Classiﬁer # of input features Feature types (spatial resolution)

S1 RF 18 Zonal mean (100 m)

S2 RF 54 Zonal mean, maximum and minimum (100 m)

S3 RF 108 Mean, minimum, maximum, median, 25th and 75th quantile values of 3×3 moving window (100 m)

S4 CNN 18 10 × 10 sized (10 m)

S5 CNN 18 30 × 30 sized (10 m)

Table 5
Accuracy assessment results for the five schemes in the four cities, averaged over 10 runs of each scheme. The numbers in parentheses are the standard deviations of OA over the 10 runs.

Scheme      Rome                                Hong Kong
            OA (σ) %       OA_urb %  OA_nat %   OA (σ) %       OA_urb %  OA_nat %
S1 (RF)     72.05 (0.13)   68.17     79.13      71.58 (0.43)   52.96     79.27
S2 (RF)     72.45 (0.19)   68.17     80.27      71.42 (0.18)   56.73     77.49
S3 (RF)     75.36 (0.25)   73.76     78.27      75.37 (0.06)   64.34     79.93
S4 (CNN)    73.32 (0.64)   72.22     75.33      74.84 (0.52)   54.62     83.20
S5 (CNN)    80.34 (1.04)   81.99     77.34      79.80 (0.63)   65.15     85.85

Scheme      Madrid                              Chicago
            OA (σ) %       OA_urb %  OA_nat %   OA (σ) %       OA_urb %  OA_nat %
S1 (RF)     82.75 (0.18)   76.65     87.07      84.22 (0.06)   80.96     90.02
S2 (RF)     84.41 (0.07)   79.76     87.71      87.28 (0.09)   85.22     90.96
S3 (RF)     85.78 (0.14)   81.58     88.76      89.66 (0.10)   88.71     91.36
S4 (CNN)    85.33 (0.42)   80.67     88.64      86.18 (0.21)   83.44     91.07
S5 (CNN)    89.72 (0.41)   88.18     90.82      90.85 (0.38)   90.46     91.54


WUDAPT method (S1). These findings agree well with Zhang and Tang (2019), who showed that accuracy improved when the areas surrounding the center target pixel were fed into a CNN. In particular, comparing the accuracy difference between S2 and S3 with that between S4 and S5 shows that contextual information was more effective in CNN than in RF. This implies that the input features of S5, which consider the areas surrounding the target LCZ pixels (i.e., 30 × 30 images), helped the convolutional layers of the CNN learn more meaningful features than those of S4 (i.e., 10 × 10 images). Because S5 covers a wider area than S4, the CNN could learn more significant patterns, probably owing to the broader information integrated by combining local patterns, especially for urban features (Min et al., 2017). Moreover, the OA_urb of S5 increased by about 10–13% compared to that of S1 for the four cities. The increase in OA_urb from S1 to S5 is much larger than that in OA_nat, implying that the CNN-based S5 can be considered the most effective LCZ mapping model for mega urban areas.
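The accuracy gains discussed above can be checked directly against the Table 5 averages. The snippet below transcribes the mean OA, OA_urb, and OA_nat values from Table 5 and computes two quantities: the S5 − S1 gain, and the neighborhood effect for RF (S3 − S2) versus CNN (S5 − S4). It introduces no new data; the helper name `gain` is hypothetical.

```python
# Mean accuracies (%) transcribed from Table 5: (OA, OA_urb, OA_nat) per scheme and city.
TABLE5 = {
    "Rome": {"S1": (72.05, 68.17, 79.13), "S2": (72.45, 68.17, 80.27),
             "S3": (75.36, 73.76, 78.27), "S4": (73.32, 72.22, 75.33),
             "S5": (80.34, 81.99, 77.34)},
    "Hong Kong": {"S1": (71.58, 52.96, 79.27), "S2": (71.42, 56.73, 77.49),
                  "S3": (75.37, 64.34, 79.93), "S4": (74.84, 54.62, 83.20),
                  "S5": (79.80, 65.15, 85.85)},
    "Madrid": {"S1": (82.75, 76.65, 87.07), "S2": (84.41, 79.76, 87.71),
               "S3": (85.78, 81.58, 88.76), "S4": (85.33, 80.67, 88.64),
               "S5": (89.72, 88.18, 90.82)},
    "Chicago": {"S1": (84.22, 80.96, 90.02), "S2": (87.28, 85.22, 90.96),
                "S3": (89.66, 88.71, 91.36), "S4": (86.18, 83.44, 91.07),
                "S5": (90.85, 90.46, 91.54)},
}


def gain(city: str, better: str, baseline: str, metric: int = 0) -> float:
    """Difference in percentage points (0 = OA, 1 = OA_urb, 2 = OA_nat), rounded to 0.01."""
    return round(TABLE5[city][better][metric] - TABLE5[city][baseline][metric], 2)


for city in TABLE5:
    print(city,
          "S5-S1 OA:", gain(city, "S5", "S1"),
          "OA_urb:", gain(city, "S5", "S1", 1),
          "OA_nat:", gain(city, "S5", "S1", 2),
          "| RF S3-S2:", gain(city, "S3", "S2"),
          "CNN S5-S4:", gain(city, "S5", "S4"))
```

With these numbers, the S5 − S1 OA gain ranges from 6.63 to 8.29 percentage points and the OA_urb gain from 9.5 to 13.82. The OA_nat change ranges from −1.79 to 6.58. The CNN neighborhood gain (S5 − S4) exceeds the RF one (S3 − S2) in every city.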

A class-wise accuracy imbalance occurs when the number of samples differs greatly among classes, resulting in poor performance for the minority classes (Huang et al., 2016; Jeatrakul et al., 2010; Zhou and Liu, 2006). In particular, RF is known to be less sensitive to unbalanced sample sizes than the neural network-based CNN (Liu et al., 2018a; Liu et al., 2013). In Rome, S4 showed a higher OA_urb but a lower OA_nat than S2. For Hong Kong, on the other hand, S4 showed the opposite pattern. One reason may be that the ratio of samples among the LCZ classes varies by city. The sample sizes in Table 3 show that Rome has fewer samples of natural classes than of urban classes. In Hong Kong, however, the number of samples in the urban LCZ types was very small, while the number of samples in the natural LCZ types was much larger. When training a CNN, LCZ classes with relatively many samples could be classified more correctly than weakly represented LCZ classes. This imbalance in training sample size by class seemed to be mitigated in S5 compared to S4. Consequently, considering neighborhood pixels in the CNN led to good classification of LCZ classes even with small sample sizes (i.e., natural classes for Rome, urban classes for Hong Kong).

The standard deviations in Table 5 show that the CNN-based schemes (S4–S5) yielded a higher variation in accuracy than the RF-based schemes (S1–S3). This implies that RF produces more consistent results than CNN, because RF is an ensemble-based model (Lebedev et al., 2014; Kursa, 2014; Khoshgoftaar et al., 2007). Despite the relatively high standard deviation of the S5 results, most S5 classifications produced higher accuracy than those of the RF-based schemes.

Using the most accurate model among the 10 runs for each of the five schemes, the significance of the accuracy differences between the classifications was assessed with McNemar's chi-squared test (Fig. 5). In Rome, S1 yielded results comparable to S2, and the performance of both S2 and S3 was similar to that of S4. In Hong Kong, S1 and S2 showed similar results, as did S3 and S4. For Madrid and Chicago, all classifications were statistically different, except for the S3/S4 pair in Madrid. S5, a CNN-based scheme, achieved significantly higher accuracy than the other schemes in all four cities (Table 5). S5 is considered to have great utility in LCZ classification because it consistently differs significantly from the other schemes while achieving the highest classification accuracy (Table 5 and Fig. 5). Interestingly, the accuracy difference between S4 and S3 was not significant in Rome, Hong Kong, and Madrid. This implies that the RF model considering the neighborhood area (300 × 300 m; S3) performed similarly to the CNN model that does not consider such a large neighborhood of the LCZ grid (100 × 100 m; S4). In Chicago, however, S3 and S4 differed significantly, with the RF model (S3) achieving higher accuracy than the CNN model (S4) (Table 5). In Chicago, where the city was developed on a regular grid, adding features (i.e., spectral and neighboring information) could bring sufficiently high accuracy in RF. This is supported by the accuracy differences between S2 and S1 and between S3 and S1, which are both the highest among the four cities in Table 5.

4.2. Classiﬁcation accuracy per class

Fig. 6 shows the F1-scores for each of the four cities, averaged over the 10 runs per scheme. Figs. 7–10 show the confusion matrices of the most accurate models among the 10 runs of S3, the best RF-based scheme, and S5, the best CNN-based scheme, as shown in Table 5.

It should be noted that the CNN-based S5 showed the highest F1-score among the five schemes for LCZ5 and LCZ6 in all cities, except where these were red-star classes. For LCZ5 and LCZ6, where abundant trees are mixed with openly arranged low- or mid-rise buildings, the RF-based S3 misclassified them as other classes, such as densely packed buildings (i.e., LCZ1–3) or natural LCZ types (Figs. 7–10). In contrast, the CNN-based S5 classified these classes more accurately than S3. One possible reason is that the CNN can learn the regions of mixed pixels containing buildings by incorporating information from the surrounding area.

Awrangjeb et al. (2012) reported that using building edge information improved the detection of trees that had been misclassified as buildings. For S3 in Rome, LCZ6 (open low-rise) was confused with various classes, especially LCZB (scattered trees) (Fig. 7a). However, the accuracy of LCZ6 clearly increased in S5 (Fig. 7b). Among the urban LCZ types in Hong Kong, LCZ5 (open mid-rise) and LCZ6 (open low-rise) showed higher F1-scores in S5 than in the other schemes. In Fig. 8a, LCZ5 was often confused with LCZ4 (open high-rise) and LCZA (dense trees) in S3, and LCZ6 was confused with LCZD (low plants) in S3. Fig. 8b shows that such misclassification of LCZ5 and LCZ6 happened much less often in S5 than in S3. In Madrid, the confusion of LCZ5 with LCZ2 in S3 was markedly reduced in S5, and the confusion between LCZ6 and LCZ5 in S3 was also clearly reduced in S5 (Fig. 9a–b). LCZ9 (sparsely built) showed higher F1-scores in the CNN-based schemes than in the RF-based schemes. In Chicago especially, LCZ9 tended to be misclassified as LCZD (low plants) in S3, but this error was reduced in S5 (Fig. 10a–b). LCZ9 has a distinctive spatial structure in which small buildings are sparsely distributed among vegetation. Thus, it is not surprising that the CNN classifiers classified this class well, which is consistent with Fu et al. (2018), who reported that CNN achieved higher classification accuracy than RF for mixed objects.

Fig. 5. Results of McNemar's chi-squared test. The orange squares indicate a significant accuracy difference at the 99% confidence level, and the blue squares at the 95% confidence level. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

One possible reason that the CNN's F1-score was lower than that of RF for LCZA in Rome and LCZ8 in Hong Kong is a data imbalance problem caused by the relatively small number of training samples in these classes (Table 3). In Hong Kong, however, LCZ5 and LCZ6, which have an open arrangement of buildings mixed with trees, had higher F1-scores with CNN than with RF, even though their sample sizes were small. In this study, the data imbalance problem might exert a relatively weak influence on accuracy because LCZ classes with few polygons were designated as red-star classes in each city.

We calculated the F1-score differences between S3 and S2 (RF schemes) and between S5 and S4 (CNN schemes) to identify neighborhood effects for all LCZs except the red-star classes. Interestingly, for the difference between S5 and S4 (CNN schemes), the class with the largest difference in each city is LCZ6 in Rome, LCZ5 in Hong Kong and Madrid, and LCZ3 in Chicago (Figure S1). This result implies that considering neighborhoods could be more effective for urban LCZ types, especially when using CNN. Moreover, LCZ5 and LCZ6, which are open-arrangement classes, showed significantly improved accuracy when using CNN with neighborhood information for all cities except Chicago, where the accuracy was already good even before neighboring areas were considered.

The results for the red-star classes in Fig. 6 should be interpreted carefully. A positive bias may appear because the training and test sets were randomly stratified samples within one polygon (Verdonck et al., 2017). In particular, the red-star classes tended to have higher F1-scores in S3 and S5, which incorporate neighboring areas into their classifications, than in the other schemes.

Fig. 6. Comparison of F1-score between the ﬁve schemes (S1–5) for the four cities. The F1-scores were averaged from 10 runs for each scheme. * indicates the red-

star classes. (For interpretation of the references to colour in this ﬁgure legend, the reader is referred to the web version of this article.)


4.3. Mapping LCZ for four cities

Figs. 11 and 12 show the 30 m resolution GMIS impervious cover with the LCZ references, together with the LCZ maps produced for the four cities by S3 and S5, which have the highest accuracy among the RF- and CNN-based schemes, respectively. We divided the pixel-wise comparison of the two classified maps into four cases and calculated their ratios, as shown in Table 6.
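The four-case comparison of the two maps can be sketched as a pixel-wise tally. The exact case definitions of Table 6 are not spelled out in the text, so the cases below are our reading of the discussion: agreement, disagreement within urban types, disagreement within natural types, and disagreement between urban and natural types. The label encoding (strings "1"–"10" for urban and "A"–"G" for natural LCZs) and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence

URBAN = {str(k) for k in range(1, 11)}  # LCZ1-10; any other label (LCZA-G) counts as natural


def compare_maps(map_a: Sequence[str], map_b: Sequence[str]) -> Dict[str, float]:
    """Percentage of pixels in each agreement/disagreement case between two LCZ maps."""
    if len(map_a) != len(map_b) or not map_a:
        raise ValueError("maps must be non-empty and the same size")
    counts: Counter = Counter()
    for a, b in zip(map_a, map_b):
        if a == b:
            counts["same class"] += 1
        elif a in URBAN and b in URBAN:
            counts["urban vs urban"] += 1
        elif a not in URBAN and b not in URBAN:
            counts["natural vs natural"] += 1
        else:
            counts["urban vs natural"] += 1
    n = len(map_a)
    cases = ("same class", "urban vs urban", "natural vs natural", "urban vs natural")
    return {case: 100.0 * counts[case] / n for case in cases}
```

The maps are treated as flattened pixel sequences of equal length, so both maps must share the same 100 m grid before comparison.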

The two maps of Rome differ in the suburban areas with open arrangements located away from the monocentric city center. In Rome, the area denoted by the blue box (middle bottom) was classified as LCZ5 (open mid-rise) in S3 but as LCZD (low plants) in S5. In addition, the bottom-east part of the study domain bounded by the yellow box shows a substantial amount of open low-rise area in S5, whereas S3 tended to show this area as scattered trees. On the other hand, S3 classified more of the northeastern part as open mid-rise than S5, which classified the area as low plants and scattered trees. Compared with the impervious cover and Google Earth images (not shown), S5 seemed to classify the built-type classes better than S3, while S3 tended to confuse vegetation with mixed buildings. These results correspond to the accuracy assessment, in which the F1-scores of LCZ5 and LCZ6 were higher for CNN than for RF in Rome. It should be noted that LCZs 5, 6, B, and D are the dominant LCZ classes in the classified maps of Rome (Table S2). Considering that LCZs 5 and 6 are open-arrangement urban LCZ types, CNN's ability to classify these types better than RF seems to account for the map disparities in Rome. The two maps of Rome were more likely to differ between urban and natural LCZ types: 21.01% (Table 6).

In Hong Kong, the non-residential (i.e., hilly) and habitable areas are clearly distinguished from each other, so the differences within the natural and within the urban LCZ types are somewhat larger than those between natural and urban LCZ types in the two maps (Table 6). In particular, visual inspection showed that the S3 and S5 maps of Hong Kong differed in some areas within natural LCZ types, such as low plants, trees, and bush. The classification accuracy of LCZD in Hong Kong was higher in the CNN-based schemes than in the RF-based schemes, considering both the F1-scores and the confusion matrices. The region bounded by the black box was classified more as LCZD (low plants) than as LCZC (bush, scrub) in S5, whereas it was predominantly classified as LCZC (bush, scrub) in S3. According to the land use map of Hong Kong (Chan et al., 2016), these areas are dominated by grassland, which implies that CNN could distinguish between low plants and scrub better than RF. Unlike RF, CNN showed less confusion between LCZC and LCZD, which is consistent with the confusion matrices for both Hong Kong and Madrid (Figs. 8 and 9).

In Madrid, various small clustered areas called autonomous communities surround the city core. This is a conurbation in which extended suburbs and villages comprise dense built-type classes, such as compact mid-rise, as identified using Google Earth images (not shown). These clusters (bounded by the blue box) appear more distinctly in S5 than in S3, and comparison with the impervious cover clearly showed that S5 classified these dense building clusters relatively well. In RF, these compact clusters were misclassified as open-arrangement buildings, which corresponds well to the accuracy assessment showing that LCZ2 was more often confused with other urban LCZ types in RF than in CNN (Fig. 9). Interestingly, LCZ2 was not often confused with natural LCZ types in the confusion matrix of S3, yet the LCZ map generated by S3 showed some misclassification of LCZ2, often as LCZB. This could be because there are few reference samples for testing in these cluster areas. In Madrid, the proportion of natural LCZ types in the study region is remarkably high, resulting in the highest classification difference among natural LCZ types, at 14.65% (Table 6). The different classification of urban and natural LCZ types in the two maps (~5.02%; Table 6) could originate from the municipalities surrounding Madrid, which were better classified by CNN than by RF owing to their textural patterns over large areas.

Fig. 7. Confusion matrices of the most accurate model among the 10-time runs of S3, the best scheme among the RF-based schemes, and S5, the best of the CNN-based schemes for Rome. * indicates the red-star classes. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

The promising LCZ classification results from CNN could provide useful data for various urban climate studies, especially for regions with abundant LCZ classes mixed with different objects (i.e., buildings with trees and bare soil with shrubs). Although to a lesser extent than in Rome, the two maps of Chicago differ in the open arrangement of the low-density suburban areas surrounding the high-density urban center, particularly in their classification between urban and natural LCZ types (10.43% in Table 6). However, it should be noted that CNN could result in low user's accuracy. In Chicago, LCZ9 (sparsely built) was widely distributed in the middle top of the study domain (bounded by the black box) in both the S3 and S5 maps. The CNN-based S5 has the advantage of detecting sparse buildings among trees and plants through object detection, but LCZ9 seems to be over-classified in the S5 map compared to S3. This also corresponds well with the confusion matrices for Chicago (Fig. 10), where S5 showed a higher producer's accuracy but a lower user's accuracy than S3 for LCZ9. Although this paper used only Landsat data, consistent with the WUDAPT method, this limitation of CNN could be mitigated if additional input variables describing building characteristics, such as Sentinel-1 backscatter data, were used (Koppel et al., 2017; Demuzere et al., 2019).

CNN-based classification is known to take more time than RF from the training stage to the mapping stage. Nonetheless, compared with the RF-based S3, the CNN-based S5 achieved high classification accuracy and was of high value in classifying specific LCZ types in which objects were mixed.

Fig. 8. Confusion matrices of the most accurate model among the 10-time runs of S3, the best scheme among the RF-based schemes, and S5, the best of the CNN-

based schemes for Hong Kong. * indicates the red-star classes. (For interpretation of the references to colour in this ﬁgure legend, the reader is referred to the web

version of this article.)


4.4. Evaluation of model transferability

Table 7 shows the transferability assessment results based on the two best-performing schemes from the single-city experiments. Interestingly, the CNN-based scheme (S5) performed distinctly better than the RF-based scheme (S3) for both OA and OA_urb. In particular, the significant improvement of OA_urb for all four cities corresponds to the findings of our single-city experiments, which suggests the superiority of the object detection-based characteristics of CNN classifiers. In recent years, transferability frameworks have been explored in which LCZ reference samples of specific cities are used for training and the resulting models are applied to other cities (Demuzere et al., 2019; Qiu et al., 2019; Yokoya et al., 2018). For example, Demuzere et al. (2019) examined the global transferability of LCZ models using RF classifiers with Google Earth Engine. However, they found that transferring LCZ models was still challenging because the accuracies of their models were generally poor (average OA over 15 cities close to 50%). The present study identified the advantages of CNN classifiers over RF in the transferability framework of LCZ classification, especially for urban-type LCZ classification. Compared with the single-city results in Table 5, the accuracy in the transferability experiment was somewhat lower and varied by city, possibly because of the limited coverage of the training reference data. It is crucial to construct thorough and sufficient reference data of LCZ classes for various urban structural types across the globe to improve the transferability of LCZ models.

4.5. Novelty, limitations, and future directions

To our knowledge, this is the first study to compare and discuss in detail LCZ classification results between RF and CNN classifiers. Although some previous studies compared LCZ classification results among different classifiers, including basic machine learning algorithms (i.e., RF, Support Vector Machine (SVM), and Neural Networks (NN)), they did not examine deep learning-based classifiers (Bechtel and Daneke, 2012; Bechtel et al., 2016). More recently, a few studies on LCZ classification using CNN classifiers have been conducted (Sukhanov et al., 2017; Qiu et al., 2018). However, they did not fully compare the classification performance with existing RF-based models. Furthermore, this paper compared the results using different sizes of input data (i.e., 10 × 10 and 30 × 30) fed into the CNN classifiers. The positive effect of increasing the input patch size in CNN has been shown in other studies (Hamwood et al., 2018) and is also shown for LCZ mapping in this study. In particular, we identified the specific LCZ classes (i.e., LCZ5 and LCZ6) that benefit from CNN when a larger input size is applied, compared with the effect on RF under the same conditions. This result can provide meaningful guidance for continued research on LCZ classification using CNN classifiers. In addition, many LCZ classification studies have focused only on OA in their accuracy assessment. In this study, OA_urb, the overall accuracy among the urban LCZ types, was also carefully examined when comparing the accuracy of the proposed schemes. The validity of the LCZ classification results obtained with the two classifiers was strengthened by applying them to four cities with different urban structures and geographical characteristics on different continents (Europe, Asia, and North America).

Fig. 9. Confusion matrices of the most accurate model among the 10-time runs of S3, the best scheme among the RF-based schemes, and S5, the best of the CNN-based schemes for Madrid. * indicates the red-star classes. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

The major limitation of this study is the small sample size of specific LCZ classes (i.e., the red-star classes). In this study, reference LCZ data were provided by the IEEE data fusion contest to ensure data reliability. To validate LCZ classification with minimum bias, the reference polygons should be divided into training and test sets. For the red-star classes, however, we divided the datasets by stratified random sampling among the pixels within a polygon because of the limited number of polygons. The red-star classes are therefore likely to have a positive bias in their classification results, so care is needed in any interpretation. Further improvement in accuracy for LCZ classes with a small number of samples (e.g., LCZE in Chicago) is expected through data augmentation methods such as those discussed by Yokoya et al. (2018). The CNN-based S5 showed higher accuracy in the four cities than the RF-based schemes, but we could not pinpoint which objects contributed to the detection of each LCZ class in the CNN. Using higher spatial resolution satellite data (e.g., Sentinels) in future LCZ classification is expected to improve the object detection ability of CNN classifiers. In addition, high-resolution images would enable a more detailed analysis, especially if heat maps of CNN classifiers are used. Landsat images can also be converted into higher-resolution images using pan-sharpening techniques (Xing et al., 2018; Gilbertson et al., 2017; Rahaman et al., 2017). Recently, in the deep learning field, CNN and other machine learning classifiers have been combined to construct better models (Zhang et al., 2018; Soltau et al., 2014). These techniques can be applied to LCZ classification as well.

Regarding the CNN model, the fully convolutional network (FCN) has been adopted in recent land cover classification studies for semantic segmentation (Mohammadimanesh et al., 2019; Wurm et al., 2019; Yue et al., 2019). FCN has the advantage of learning spatial relationships at different scales (Volpi and Tuia, 2016), which can be expected to improve LCZ classification performance in future work by taking into account the various sizes and shapes of each LCZ class.

5. Conclusion

In this study, we compared two classifiers, RF and CNN, for LCZ classification in four megacities (Rome, Hong Kong, Madrid, and Chicago) using bitemporal Landsat images. A total of five schemes were constructed and compared. Three RF-based schemes (S1–S3) followed previous LCZ classification studies, and two CNN-based schemes (S4–S5) used different input feature sizes. Among the five schemes, S5 showed the best classification performance. Compared with the existing WUDAPT workflow (i.e., S1), S5 increased OA and OA_urb by about 6–8% and 10–13%, respectively, for the four cities. This study revealed that the CNN classifiers were particularly good at classifying the specific LCZ classes in which buildings were mixed with trees or in which buildings and trees were sparsely distributed. We also found that the classification performance of CNN improved significantly when the input features were created considering larger neighborhood areas. The results of the transferability experiment supported the superiority of the CNN approach over RF in terms of both OA and OA_urb for all four cities. In the future, the CNN-based approach is expected to become more advantageous when incorporating higher-resolution satellite images (e.g., Sentinel imagery) and additional spatio-temporal features.

Fig. 10. Confusion matrices of the most accurate model among the 10 runs of S3 (the best RF-based scheme) and S5 (the best CNN-based scheme) for Chicago. * indicates the red-star classes. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 11. LCZ maps of the best classification model among the 10 runs of S3 (the best RF-based scheme) and S5 (the best CNN-based scheme) for Rome and Hong Kong. Impervious cover from GMIS and the LCZ reference datasets are also presented.

Fig. 12. LCZ maps of the best classification model among the 10 runs of S3 (the best RF-based scheme) and S5 (the best CNN-based scheme) for Madrid and Chicago. Impervious cover from GMIS and the LCZ reference datasets are also presented.
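The OA, OA_urb, and OA_nat figures quoted in the conclusion and in Table 7 are summary statistics of confusion matrices such as those in Fig. 10. A minimal sketch of how they can be computed follows. It assumes, without confirmation from this section, that OA_urb and OA_nat are the fractions of correctly classified reference samples among the urban (LCZ 1–10) and natural (LCZ A–G) reference samples, respectively; the row/column convention, the class ordering, and the function name are likewise assumptions.

```python
import numpy as np


def overall_accuracies(cm: np.ndarray, n_urban: int = 10) -> dict:
    """OA, OA_urb and OA_nat from a square confusion matrix (hypothetical helper).

    Assumed conventions (not taken from the paper):
      * cm[i, j] counts reference samples of class i predicted as class j;
      * the first n_urban classes are the urban types (LCZ 1-10) and the
        remaining ones are the natural types (LCZ A-G).
    """
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)        # correctly classified samples per class
    reference = cm.sum(axis=1)   # reference samples per class
    urb, nat = slice(0, n_urban), slice(n_urban, cm.shape[0])
    return {
        "OA": correct.sum() / cm.sum(),
        "OA_urb": correct[urb].sum() / reference[urb].sum(),
        "OA_nat": correct[nat].sum() / reference[nat].sum(),
    }


# Toy 4-class matrix: classes 0-1 stand in for urban LCZs, 2-3 for natural ones.
cm = np.array([
    [8, 1, 1, 0],
    [2, 6, 0, 2],
    [0, 0, 9, 1],
    [1, 0, 0, 9],
])
acc = overall_accuracies(cm, n_urban=2)
print({k: round(float(v), 3) for k, v in acc.items()})
# {'OA': 0.8, 'OA_urb': 0.7, 'OA_nat': 0.9}
```

Under this definition, an urban sample misclassified as a natural class lowers OA_urb; a definition restricted to the urban-by-urban submatrix would not, so the helper should be adapted to whichever definition the methods section specifies.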

Acknowledgements

This research was supported by the Space Technology Development Program and the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT, & Future Planning and the Ministry of Education of Korea, respectively (Grants: NRF-2017M1A3A3A02015981; NRF-2017R1D1A1B03028129), and by the Korea Meteorological Administration Research and Development Program under Grant KMIPA 2017-7010. CY was also supported by the Global PhD Fellowship Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2018H1A2A1062207). We would also like to thank WUDAPT (the World Urban Database and Access Portal Tools project, www.wudapt.org), the IEEE GRSS Image Analysis and Data Fusion Technical Committee, and all the contributors of LCZ ground-truth samples, in particular Chao Ren, Dragan Milosevic, Guillaume Dumas, and Maria De Fatima Andrade.

Appendix A. Supplementary material

Supplementary data to this article can be found online at https://doi.org/10.1016/j.isprsjprs.2019.09.009.

References

Athiwaratkun, B., Kang, K., 2015. Feature representation in convolutional neural networks. arXiv preprint arXiv:1507.02313.

Awrangjeb, M., Zhang, C., Fraser, C.S., 2012. Building detection in complex scenes thorough effective separation of buildings from trees. Photogramm. Eng. Remote Sens. 78, 729–745.

Ba, J.L., Kiros, J.R., Hinton, G.E., 2016. Layer normalization. arXiv preprint arXiv:1607.06450.

Barnes, K.B., Morgan, J., Roberge, M., 2001. Impervious surfaces and the quality of natural and built environments. Department of Geography and Environmental Planning, Towson University, Baltimore.

Bechtel, B., Alexander, P.J., Beck, C., Böhner, J., Brousse, O., Ching, J., Demuzere, M., Fonte, C., Gál, T., Hidalgo, J., 2019. Generating WUDAPT Level 0 data – Current status of production and evaluation. Urban Clim. 27, 24–45.

Bechtel, B., Alexander, P.J., Böhner, J., Ching, J., Conrad, O., Feddema, J., Mills, G., See, L., Stewart, I., 2015. Mapping local climate zones for a worldwide database of the form and function of cities. ISPRS Int. J. Geo-Inf. 4, 199–219.

Bechtel, B., Daneke, C., 2012. Classification of local climate zones based on multiple earth observation data. IEEE J-Stars 5, 1191.

Bechtel, B., Demuzere, M., Sismanidis, P., Fenner, D., Brousse, O., Beck, C., Van Coillie, F., Conrad, O., Keramitsoglou, I., Middel, A., 2017. Quality of crowdsourced data on urban morphology – The human influence experiment (HUMINEX). Urban Sci. 1, 15.

Bechtel, B., See, L., Mills, G., Foley, M., 2016. Classification of local climate zones using SAR and multispectral data in an arid environment. IEEE J-Stars 9, 3097–3105.

Beck, C., Straub, A., Breitner, S., Cyrys, J., Philipp, A., Rathmann, J., Schneider, A., Wolf, K., Jacobeit, J., 2018. Air temperature characteristics of local climate zones in the Augsburg urban area (Bavaria, southern Germany) under varying synoptic conditions. Urban Clim. 25, 152–166.

Bontemps, S., Defourny, P., Bogaert, E.V., Arino, O., Kalogirou, V., Perez, J.R., 2011. GLOBCOVER 2009 – Products description and validation report.

Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.

Cai, M., Ren, C., Xu, Y., Lau, K.K.-L., Wang, R., 2018. Investigating the relationship between local climate zone and land surface temperature using an improved WUDAPT methodology – A case study of Yangtze River Delta, China. Urban Clim. 24, 485–502.

Chan, E.H., Wang, A., Lang, W., 2016. Comprehensive evaluation framework for sustainable land use: Case study of Hong Kong in 2000–2010. J. Urban Plann. Dev. 142, 05016007.

Chen, J., Chen, J., Liao, A., Cao, X., Chen, L., Chen, X., He, C., Han, G., Peng, S., Lu, M., 2015. Global land cover mapping at 30 m resolution: A POK-based operational approach. ISPRS J. Photogramm. Remote Sens. 103, 7–27.

Cohen, B., 2015. Urbanization, city growth, and the new United Nations development agenda. Cornerstone 3, 4–7.

Danylo, O., See, L., Bechtel, B., Schepaschenko, D., Fritz, S., 2016. Contributing to WUDAPT: a local climate zone classification of two cities in Ukraine. IEEE J-Stars 9, 1841–1853.

de Colstoun, E.C.B., Huang, C., Wang, P., Tilton, J.C., Tan, B., Phillips, J., Niemczura, S., Ling, P.-Y., Wolfe, R., 2017. Documentation for the Global Man-made Impervious Surface (GMIS) Dataset From Landsat.

Demuzere, M., Bechtel, B., Mills, G., 2019. Global transferability of local climate zone models. Urban Clim. 27, 46–63.

Ellickson, R.C., 2012. The law and economics of street layouts: How a grid pattern benefits a downtown. Ala. L. Rev. 64, 463.

Fallmann, J., Forkel, R., Emeis, S., 2016. Secondary effects of urban heat island mitigation measures on air quality. Atmos. Environ. 125, 199–211.

Founda, D., Santamouris, M., 2017. Synergies between urban heat island and heat waves in Athens (Greece), during an extremely hot summer (2012). Sci. Rep. 7, 10973.

Friedl, M.A., Sulla-Menashe, D., Tan, B., Schneider, A., Ramankutty, N., Sibley, A., Huang, X., 2010. MODIS Collection 5 global land cover: Algorithm refinements and characterization of new datasets. Remote Sens. Environ. 114, 168–182.

Fu, T., Ma, L., Li, M., Johnson, B.A., 2018. Using convolutional neural network to identify irregular segmentation objects from very high-resolution remote sensing imagery. J. Appl. Remote Sens. 12, 025010.

Gilbertson, J.K., Kemp, J., Van Niekerk, A., 2017. Effect of pan-sharpening multi-temporal Landsat 8 imagery for crop type differentiation using different classification techniques. Comput. Electron. Agric. 134, 151–159.

Giridharan, R., Emmanuel, R., 2018. The impact of urban compactness, comfort strategies and energy consumption on tropical urban heat island intensity: a review. Sustain. Cities Soc. 40, 677–687.

Giridharan, R., Ganesan, S., Lau, S., 2004. Daytime urban heat island effect in high-rise and high-density residential developments in Hong Kong. Energy Build. 36, 525–534.

Glorot, X., Bordes, A., Bengio, Y., 2011. Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323.

Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep Learning. MIT Press, Cambridge.

Hamwood, J., Alonso-Caneiro, D., Read, S.A., Vincent, S.J., Collins, M.J., 2018. Effect of patch size and network architecture on a convolutional neural network approach for automatic segmentation of OCT retinal layers. Biomed. Opt. Express 9, 3049–3066.

Han-qiu, X., Ben-qing, C., 2004. Remote sensing of the urban heat island and its changes in Xiamen City of SE China. J. Environ. Sci. 16, 276–281.

Huang, C., Li, Y., Change Loy, C., Tang, X., 2016. Learning deep representation for imbalanced classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5375–5384.

Jeatrakul, P., Wong, K.W., Fung, C.C., 2010. Classification of imbalanced data by combining the complementary neural network and SMOTE algorithm. In: International Conference on Neural Information Processing. Springer, pp. 152–159.

Kaloustian, N., Bechtel, B., 2016. Local climatic zoning and urban heat island in Beirut. Procedia Eng. 169, 216–223.

Khoshgoftaar, T.M., Golawala, M., Van Hulse, J., 2007. An empirical study of learning from imbalanced data using random forest. In: 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007). IEEE, pp. 310–317.

Table 6

Percentages of LCZ differences between the two classified maps (S3 and S5) for the four cities shown in Figs. 11 and 12.

Category                                                      Rome      Hong Kong  Madrid    Chicago
Classification within the same LCZ                            60.21%    73.47%     77.87%    80.31%
Different classification within urban LCZ types               11.29%    8.09%      2.46%     5.70%
Different classification within natural LCZ types             7.49%     11.10%     14.65%    3.56%
Different classification between urban and natural LCZ types  21.01%    7.33%      5.02%     10.43%

Table 7

Transferability assessment results by test city for S3 and S5, the best-performing RF and CNN schemes from the single-city experiments, respectively. The overall accuracies are taken from the best model among 10 runs.

Test city    Scheme     OA (%)   OA_urb (%)   OA_nat (%)
Rome         S3 (RF)    45.20    43.33        48.61
Rome         S5 (CNN)   62.69    67.42        54.07
Hong Kong    S3 (RF)    52.03    5.34         71.31
Hong Kong    S5 (CNN)   58.68    28.18        71.27
Madrid       S3 (RF)    60.64    47.68        69.83
Madrid       S5 (CNN)   78.03    77.08        78.71
Chicago      S3 (RF)    27.24    6.89         63.45
Chicago      S5 (CNN)   41.52    25.13        70.70


Kim, M., Lee, J., Han, D., Shin, M., Im, J., Lee, J., Quackenbush, L.J., Gu, Z., 2018a. Convolutional neural network-based land cover classification using 2-D spectral reflectance curve graphs with multitemporal satellite imagery. IEEE J-Stars 11, 4604–4617.

Kim, M., Lee, J., Im, J., 2018b. Deep learning-based monitoring of overshooting cloud tops from geostationary satellite data. GISci. Remote Sens. 55, 763–792.

Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Koppel, K., Zalite, K., Voormansik, K., Jagdhuber, T., 2017. Sensitivity of Sentinel-1 backscatter to characteristics of buildings. Int. J. Remote Sens. 38, 6298–6318.

Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105.

Kursa, M.B., 2014. Robustness of Random Forest-based gene selection methods. BMC Bioinf. 15, 8.

Lauwaet, D., Hooyberghs, H., Maiheu, B., Lefebvre, W., Driesen, G., Van Looy, S., De Ridder, K., 2015. Detailed Urban Heat Island projections for cities worldwide: dynamical downscaling CMIP5 global climate models. Climate 3, 391–415.

Lebedev, A., Westman, E., Van Westen, G., Kramberger, M., Lundervold, A., Aarsland, D., Soininen, H., Kłoszewska, I., Mecocci, P., Tsolaki, M., 2014. Random Forest ensembles for detection and prediction of Alzheimer's disease with a good between-cohort robustness. NeuroImage: Clin. 6, 115–125.

LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521, 436–444.

Lee, J., Im, J., Kim, K., Quackenbush, L.J., 2018. Machine learning approaches for estimating forest stand height using plot-based observations and airborne LiDAR data. Forests 9, 268.

Li, M., Im, J., Beier, C., 2013. Machine learning approaches for forest classification and change analysis using multi-temporal Landsat TM images over Huntington Wildlife Forest. GISci. Remote Sens. 50, 361–384.

Liu, M., Wang, M., Wang, J., Li, D., 2013. Comparison of random forest, support vector machine and back propagation neural network for electronic tongue data classification: application to the recognition of orange beverage and Chinese vinegar. Sens. Actuat. B 177, 970–980.

Liu, T., Abd-Elrahman, A., Morton, J., Wilhelm, V.L., 2018a. Comparing fully convolutional networks, random forest, support vector machine, and patch-based deep convolutional neural networks for object-based wetland mapping using images from small unmanned aircraft system. GISci. Remote Sens. 55, 243–264.

Liu, Y., Fang, X., Xu, Y., Zhang, S., Luan, Q., 2018b. Assessment of surface urban heat island across China’s three main urban agglomerations. Theor. Appl. Climatol. 133, 473–488.

Marcos, D., Volpi, M., Kellenberger, B., Tuia, D., 2018. Land cover mapping at very high resolution with rotation equivariant CNNs: Towards small yet accurate models. ISPRS J. Photogramm. Remote Sens. 145, 96–107.

Mathew, A., Khandelwal, S., Kaul, N., 2018. Investigating spatio-temporal surface urban heat island growth over Jaipur city using geospatial techniques. Sustain. Cities Soc. 40, 484–500.

Min, S., Lee, B., Yoon, S., 2017. Deep learning in bioinformatics. Briefings Bioinf. 18, 851–869.

Mohammadimanesh, F., Salehi, B., Mahdianpari, M., Gill, E., Molinier, M., 2019. A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem. ISPRS J. Photogramm. Remote Sens. 151, 223–236.

Paoletti, M., Haut, J., Plaza, J., Plaza, A., 2018. A new deep convolutional neural network for fast hyperspectral image classification. ISPRS J. Photogramm. Remote Sens. 145, 120–147.

Park, S., Im, J., Park, S., Yoo, C., Han, H., Rhee, J., 2018. Classification and mapping of paddy rice by combining Landsat and SAR time series data. Remote Sens. 10, 447.

Peel, M.C., Finlayson, B.L., McMahon, T.A., 2007. Updated world map of the Köppen-Geiger climate classification. Hydrol. Earth Syst. Sci. 11, 1633–1644.

Qiu, C., Mou, L., Schmitt, M., Zhu, X.X., 2019. Local climate zone-based urban land cover classification from multi-seasonal Sentinel-2 images with a recurrent residual network. ISPRS J. Photogramm. Remote Sens. 154, 151–162.

Qiu, C., Schmitt, M., Mou, L., Ghamisi, P., Zhu, X., 2018. Feature importance analysis for local climate zone classification using a residual convolutional neural network with multi-source datasets. Remote Sens. 10, 1572.

Rahaman, K.R., Hassan, Q.K., Ahmed, M.R., 2017. Pan-sharpening of Landsat-8 images and its application in calculating vegetation greenness and canopy water contents. ISPRS Int. J. Geo-Inf. 6, 168.

Richardson, H.J., Hill, D.J., Denesiuk, D.R., Fraser, L.H., 2017. A comparison of geographic datasets and field measurements to model soil carbon using random forests and stepwise regressions (British Columbia, Canada). GISci. Remote Sens. 54, 573–591.

Rizwan, A.M., Dennis, L.Y., Chunho, L., 2008. A review on the generation, determination and mitigation of Urban Heat Island. J. Environ. Sci. 20, 120–128.

Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning representations by back-propagating errors. Nature 323, 533–536.

Salata, F., Golasi, I., Petitti, D., de Lieto Vollaro, E., Coppi, M., de Lieto Vollaro, A., 2017. Relating microclimate, human thermal comfort and health during heat waves: an analysis of heat island mitigation strategies through a case study in an urban outdoor environment. Sustain. Cities Soc. 30, 79–96.

Schmidhuber, J., 2015. Deep learning in neural networks: an overview. Neural Netw. 61, 85–117.

Sim, S., Im, J., Park, S., Park, H., Ahn, M.H., Chan, P.W., 2018. Icing detection over East Asia from geostationary satellite data using machine learning approaches. Remote Sens. 10, 631.

Sokolova, M., Lapalme, G., 2009. A systematic analysis of performance measures for classification tasks. Inform. Process. Manage. 45, 427–437.

Soltau, H., Saon, G., Sainath, T.N., 2014. Joint training of convolutional and non-convolutional neural networks. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 5572–5576.

Stewart, I.D., Oke, T.R., 2012. Local climate zones for urban temperature studies. Bull. Am. Meteorol. Soc. 93, 1879–1900.

Sukhanov, S., Tankoyeu, I., Louradour, J., Heremans, R., Trofimova, D., Debes, C., 2017. Multilevel ensembling for local climate zones classification. In: 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). IEEE, pp. 1201–1204.

Tuia, D., Moser, G., Le Saux, B., Bechtel, B., See, L., 2017. 2017 IEEE GRSS data fusion contest: open data for global multimodal land use classification [Technical Committees]. IEEE Geosci. Remote Sens. Mag. 5, 70–73.

Vedaldi, A., Lenc, K., 2015. MatConvNet: convolutional neural networks for MATLAB. In: Proceedings of the 23rd ACM International Conference on Multimedia. ACM, pp. 689–692.

Verdonck, M.-L., Okujeni, A., van der Linden, S., Demuzere, M., De Wulf, R., Van Coillie, F., 2017. Influence of neighbourhood information on ‘local climate zone’ mapping in heterogeneous cities. Int. J. Appl. Earth Obs. Geoinf. 62, 102–113.

Volpi, M., Tuia, D., 2016. Dense semantic labeling of subdecimeter resolution images with convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 55, 881–893.

Wang, C., Middel, A., Myint, S.W., Kaplan, S., Brazel, A.J., Lukasczyk, J., 2018. Assessing local climate zones in arid cities: the case of Phoenix, Arizona and Las Vegas, Nevada. ISPRS J. Photogramm. Remote Sens. 141, 59–71.

Wang, T., Wu, D.J., Coates, A., Ng, A.Y., 2012. End-to-end text recognition with convolutional neural networks. In: 21st International Conference on Pattern Recognition (ICPR 2012). IEEE, pp. 3304–3308.

Wurm, M., Stark, T., Zhu, X.X., Weigand, M., Taubenböck, H., 2019. Semantic segmentation of slums in satellite images using transfer learning on fully convolutional neural networks. ISPRS J. Photogramm. Remote Sens. 150, 59–69.

Xing, Y., Wang, M., Yang, S., Jiao, L., 2018. Pan-sharpening via deep metric learning. ISPRS J. Photogramm. Remote Sens. 145, 165–183.

Xu, Z., Guan, K., Casler, N., Peng, B., Wang, S., 2018. A 3D convolutional neural network method for land cover classification using LiDAR and multi-temporal Landsat imagery. ISPRS J. Photogramm. Remote Sens. 144, 423–434.

Yadav, N., Sharma, C., Peshin, S., Masiwal, R., 2017. Study of intra-city urban heat island intensity and its influence on atmospheric chemistry and energy consumption in Delhi. Sustain. Cities Soc. 32, 202–211.

Yokoya, N., Ghamisi, P., Xia, J., Sukhanov, S., Heremans, R., Tankoyeu, I., Bechtel, B., Le Saux, B., Moser, G., Tuia, D., 2018. Open data for global multimodal land use classification: outcome of the 2017 IEEE GRSS Data Fusion Contest. IEEE J-Stars 11, 1363–1377.

Yoo, C., Im, J., Park, S., Quackenbush, L.J., 2018. Estimation of daily maximum and minimum air temperatures in urban landscapes using MODIS time series satellite data. ISPRS J. Photogramm. Remote Sens. 137, 149–162.

Yu, X.R., Wu, X.M., Luo, C.B., Ren, P., 2017. Deep learning in remote sensing scene classification: a data augmentation enhanced convolutional neural network framework. GISci. Remote Sens. 54, 741–758.

Yue, K., Yang, L., Li, R., Hu, W., Zhang, F., Li, W., 2019. TreeUNet: Adaptive Tree convolutional neural networks for subdecimeter aerial image segmentation. ISPRS J. Photogramm. Remote Sens. 156, 1–13.

Zhang, C., Pan, X., Li, H., Gardiner, A., Sargent, I., Hare, J., Atkinson, P.M., 2018. A hybrid MLP-CNN classifier for very fine resolution remotely sensed image classification. ISPRS J. Photogramm. Remote Sens. 140, 133–144.

Zhang, T., Tang, H., 2019. A comprehensive evaluation of approaches for built-up area extraction from Landsat OLI images using massive samples. Remote Sens. 11, 2.

Zhen, Z., Quackenbush, L.J., Stehman, S.V., Zhang, L., 2013. Impact of training and validation sample selection on classification accuracy and accuracy assessment when using reference polygons in object-based classification. Int. J. Remote Sens. 34, 6914–6930.

Zhou, Y.-T., Chellappa, R., 1988. Computation of optical flow using a neural network. In: IEEE International Conference on Neural Networks, pp. 71–78.

Zhou, Z.H., Liu, X.Y., 2006. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans. Knowl. Data Eng. 18, 63–77.

Ziaul, S., Pal, S., 2018. Analyzing control of respiratory particulate matter on Land Surface Temperature in local climatic zones of English Bazar Municipality and Surroundings. Urban Clim. 24, 34–50.


Application and future of local climate zone system in urban climate assessment and planning—Bibliometrics and meta-analysis

Article

Jul 2024
CITIES

In recent years, the concept of Local Climate Zone (LCZ) has been widely used in various cross-cutting areas of urban climate planning and has the potential to become a generalized assessment tool. Although the underlying framework for detailed LCZ theory and mapping has been proposed, a generalized methodology and multidisciplinary cross-cutting applicability arguments are still lacking, which is not friendly enough for future LCZ synergistic urban planning and policy output. Therefore, there is an urgent need for a comprehensive survey of empirical studies of LCZ systems in cities to improve the understanding and address the above issues. In this study, bibliometric analysis and meta-analysis were used to provide a systematic review of LCZ empirical studies over the past decade; analyze the number of studies, geographical distribution, keywords and research hotspots; and discuss the following themes: conducting studies on a global scale; establishing a new standardized mapping process; heat island assessment based on global datasets; using LCZ as a tool for assessing the thermal health of cities; improving the compatibility of the LCZ framework with climate models; and urban planning and design applications incorporating nonphysical factors. Scientific and practical communities can quickly clarify the current status and challenges of using LCZ in urban climate planning and provide references for expanding the application of LCZ.

Advanced stacked integration method for forecasting long-term drought severity: CNN with machine learning models

Article

Full-text available

Apr 2024

Study region: Eight governorates in upper Egypt namely Aswan, Asyut, Beni-Suef, Fayoum, Luxor, Minya, Qena and Sohag. Study focus: This study aims to develop novel hybrid machine learning (ML) models for forecasting the drought phenomena based on limited inputs for the eight Egyptian govern-orates, and ii) evaluate the performance and accuracy of the developed ML models for predicting Palmer Drought Severity Index (PDSI) to recommend the optimal model based on performance statistical metrics. The hybrid ML models were Convolution Neural Networks (CNN)-Long Short-Term Memory (LSTM), CNN-Random Forest (RF), CNN-Support Vector Machine (SVR), and CNN-Extreme Gradient Boosting (XGB). New hydrological insights for the region: Results showed that CNN-LSTM model outperformed the others followed by CNN-RF. Values of NSE, MAE, MARE, IA, R 2 , and RMSE for CNN-LSTM were 0.885, 0.915, − 2.073, 0.967, 0.885, and 0.573, respectively. For the testing stage CNN-SVR model was found to perform the best; average values of NSE, MAE, MARE, IA, R 2 , and RMSE were 0.828, 0.364, − 2.903, 0.950, 0.828 and 0.688, respectively. This study provided a way forward for convenient estimation of the PDSI Index from the meteorological data in terms of advancing deep learning algorithms. The developed hybrid models, more or less, can satisfactory predict PDSI values. Additionally, the study suggests the CNN-LSTM model as the most suitable model to advance future investigation in the study area.

Unequal impacts of urban industrial land expansion on economic growth and carbon dioxide emissions

Article

Full-text available

Apr 2024

Industrial land drives economic growth but also contributes to global warming through carbon dioxide emissions. Still, the variance in its impact on economies and emissions across countries at different development stages is understudied. Here, we used satellite data and machine learning to map industrial land at 30 m resolution in ten countries with substantial industrial value-added, and analyzed the impact of industrial land expansion on economic growth and emissions in 216 subnational regions from 2000 to 2019. We found that industrial land expansion was the leading factor for economic growth and emissions in developing regions, contributing 31% and 55%, respectively. Conversely, developed regions showed a diminished impact (8% and 3%, respectively), with a shift towards other economic growth drivers like education. Our findings encourage developing regions to consider the adverse effects of climate change during industrial land expansion and that developed regions prioritize human capital investment over further land expansion.

Application of LCZ to Time-Series Urban Morphology Detection

Chapter

Jun 2024

Over the past few decades, urbanization has led to significant changes in land use and cover, impacting urban climate and public health, as well as energy consumption. In 2012, the local climate zone (LCZ) classification system was introduced to better represent the complexity of urban morphology. However, mapping LCZ over a long period has been challenging. This chapter presents a machine learning-based framework for mapping annual LCZ time series in three major urban agglomerations in China, providing spatial-temporal consistency in the resulting maps. The chapter also reveals the spatial and temporal patterns of LCZ time series, with the high-rise and open urban LCZ types becoming more prominent in urban morphology over the past two decades. Urban morphology varies considerably in urban expansion and urban renewal areas. From 2000 to 2020, inter-city urban morphology differences narrowed between the three urban agglomerations, but intra-city urban morphology differences widened.

Current Popular Methods for LCZ Mapping

Chapter

Jun 2024

GIS-based mapping and Remote sensing-based mapping have been widely employed in LCZ classification. GIS mapping method uses multiple data sources, such as remote sensing imagery, aerial photographs, and existing GIS databases of planning information, which allows detailed descriptions of urban forms and land cover conditions. WUDAPT (World Urban Database and Access Portal Tools) is a global project aimed at creating a worldwide comprehensive database of LCZ based on remote sensing techniques. It provides a fast and low-cost way of classifying land cover conditions based on free-access remote sensing images. The supervised pixel-based classification of WUDAPT shows wide applicability in different regions around the world, which is especially helpful for developing regions to fast establish LCZ classification maps. The combined mapping method of GIS-based and WUDAPT mapping is an emerging trend that aims to leverage the strengths of both GIS and remote sensing techniques to improve the accuracy and efficiency of LCZ mapping. LCZ classification maps and databases are increasingly applied in climatic planning as information support for decision-making.

Recent Improvements in Supervised Pixel-Based LCZ Classification

Chapter

Jun 2024

Accurate local climate zone (LCZ) maps are crucial for urban environmental studies. The last chapter introduced the ways to generate LCZ maps, including object-based and pixel-based remote sensing and GIS methods. Among these methodologies, the supervised pixel-based method using open-access remote sensing imagery has gained popularity, providing a fast and cost-efficient way for LCZ classification. Implementing the World Urban Database and Access Portal Tools (WUDAPT) further provides an open platform and global database for consistent supervised pixel-based LCZ information to support different types of applications and research (http://www.wudapt.org/). This chapter outlines three critical components in supervised pixel-based LCZ classification, including (1) geometrical pre-processing and classification platform (Sect. 4.1), (2) remote sensing data (Sect. 4.2); and (3) classification algorithm (Sect. 4.3), and their recent development and improvement in LCZ classification. Three case studies comparing different classification algorithms in Asian cities’ LCZ mapping (Sects. 4.4, 4.5, and 4.6) are also presented in this chapter.

Classifying Pinus roxburghii Using an Innovative Training Approach of Fuzzy Models While Handling Heterogeneity Within Class in Western Himalayan Forests

Article

May 2024

Remote sensing can be used for effectively mapping plant species, thereby aiding in their sustainable management. Pinus roxburghii (PR), also known as Chir Pine, is often found alongside Quercus leucotricophora and Rhododendron arboreum around Dudhatoli range, Uttarakhand. It holds immense ecological and economic importance in the Himalayan region. It is observed that PR exhibits heterogeneity within an image likely due to varying aspect, shadows, and canopy coverage. This study focuses on mapping PR using an innovative individual sample as mean (ISM) approach embedded in the framework of the Possibilistic c-means (PCM) and noise clustering (NC) fuzzy classifier, specifically addressing heterogeneity within the class while comparing it with the conventional mean training parameter approach. The research utilises the Modified Soil Vegetation Index 2 (MSAVI2) from a semi-hypertemporal (SH) dataset consisting of 17 images acquired by the 8-band PlanetScope data. This study also experiments with different numbers of training samples to understand their impact on the output. Results of PCM with an m value of 2.1 and NC with δ value of 50,000 show good classified outputs. It was also found that a training sample size of 11 showed the best result. This study showcases progress in using the SH dataset, ISM-based PCM, and NC models with a limited number of training samples to overcome challenges posed by class heterogeneity.

Spatial-temporal patterns and influencing factors of the Building Green View Index: A new approach for quantifying 3D urban greenery visibility

Article

May 2024

Local Climate Zone Classification via Semi-Supervised Multimodal Multiscale Transformer

Article

Jan 2024

Local climate zone (LCZ) classification plays a critical role in urban environment research and has attracted extensive attention from many researchers. However, the potential of deep learning-based approaches is not yet fully explored in this field, even though neural networks continue to push the frontier for various applications. In this paper, we propose a novel multimodal multiscale Transformer network for LCZ classification by introducing multiscale patch embedding and multimodal fusion learning in Transformer architecture. The proposed multiscale patch embedding effectively captures hierarchical interrelationships of image contextual neighborhoods, and automatically learns discriminative features. And the proposed multimodal fusion learning enables the network to naturally fuse multispectral and synthetic aperture radar (SAR) data under the guidance of attention mechanism. To further improve classification accuracy, we impose semi-supervised learning to mine unlabeled image data information. Both labeled and pseudo-labeled data jointly drive our network updates. Experiments conducted on the So2Sat LCZ42, CHN15-LCZ and SouthKorea6-LCZ benchmark datasets demonstrate that our proposed approach outperforms other existing methods significantly and achieves state-of-the-art performance. In the generated LCZ maps, urban and natural classes are well distinguished, the urban structure with waters or mountains is well preserved. Finally, we also discuss the impact of the sample receptive field and sample heterogeneity on LCZ classification performance, which provides a new idea for future studies of LCZ classification.

Methodology to classify high voltage transmission poles using CNN approach from satellite images for safety public regulation application: Study case of rural area in Thailand

Article

Mar 2024

Local climate zone-based urban land cover classification from multi-seasonal Sentinel-2 images with a recurrent residual network

Article

Jun 2019
ISPRS J PHOTOGRAMM

Keywords: Land cover; Local climate zones (LCZs); Sentinel-2; Multi-seasonal; Residual convolutional neural network (ResNet); Long short-term memory (LSTM); Recurrent neural network (RNN)

The local climate zone (LCZ) scheme was originally proposed to provide an interdisciplinary taxonomy for urban heat island (UHI) studies. In recent years, the scheme has also become a starting point for the development of higher-level products, as the LCZ classes can help provide a generalized understanding of urban structures and land uses. LCZ mapping can therefore theoretically aid in fostering a better understanding of the spatio-temporal dynamics of cities on a global scale. However, reliable LCZ maps are not yet available globally. As a first step toward automatic LCZ mapping, this work focuses on LCZ-derived land cover classification using multi-seasonal Sentinel-2 images. We propose a recurrent residual network (Re-ResNet) architecture that is capable of learning a joint spectral-spatial-temporal feature representation within a unified framework. To this end, a residual convolutional neural network (ResNet) and a recurrent neural network (RNN) are combined into one end-to-end architecture. The ResNet is able to learn rich spectral-spatial feature representations from single-season imagery, while the RNN can effectively analyze temporal dependencies in multi-seasonal imagery. Cross-validation was carried out on a diverse dataset covering seven distinct European cities, and a quantitative analysis of the experimental results revealed that the combined use of multi-temporal information and Re-ResNet improves overall accuracy by approximately 7 percentage points. The proposed framework has the potential to produce consistent-quality urban land cover and LCZ maps on a large scale, to support scientific progress in fields such as urban geography and urban climatology.
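The ResNet-plus-RNN combination can be sketched in a strongly simplified form: a residual block maps each season's feature vector to y = ReLU(x + F(x)), and a recurrent layer then folds the seasonal sequence into one hidden state used for classification. This is an illustrative assumption, not the Re-ResNet architecture: bias-free dense layers stand in for convolutions over image patches, and a plain tanh (Elman) RNN stands in for the LSTM named in the keywords.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense layer without bias: returns w @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def residual_block(x: Vector, w: Matrix) -> Vector:
    """y = ReLU(x + F(x)) with F(x) = ReLU(w @ x); w must be square."""
    fx = relu(matvec(w, x))
    return relu([a + b for a, b in zip(x, fx)])


def simple_rnn(sequence: List[Vector], w_in: Matrix, w_rec: Matrix) -> Vector:
    """Elman RNN: h_t = tanh(w_in @ x_t + w_rec @ h_{t-1}); returns the last h."""
    h = [0.0] * len(w_rec)
    for x in sequence:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
        h = [math.tanh(v) for v in pre]
    return h


def seasonal_features(
    seasons: List[Vector], w_res: Matrix, w_in: Matrix, w_rec: Matrix
) -> Vector:
    """Apply the residual block to each season, then recur over seasonal order."""
    return simple_rnn([residual_block(x, w_res) for x in seasons], w_in, w_rec)
```

The identity shortcut in `residual_block` lets the block fall back to passing its input through unchanged when `F` contributes nothing, which is the property that makes deep residual stacks easier to optimize.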

A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem

Article

Apr 2019
ISPRS J PHOTOGRAMM

Despite the application of state-of-the-art fully Convolutional Neural Networks (CNNs) for semantic segmentation of very high-resolution optical imagery, their capacity has not yet been thoroughly examined for the classification of Synthetic Aperture Radar (SAR) images. The presence of speckle noise, the absence of efficient feature expression, and the limited availability of labeled SAR samples have hindered the application of state-of-the-art CNNs to the classification of SAR imagery. This is of great concern for mapping complex land cover ecosystems, such as wetlands, where land cover units with similar backscattering or spectral signatures further complicate the matter. Accordingly, we propose a new Fully Convolutional Network (FCN) architecture that can be trained in an end-to-end scheme and is specifically designed for the classification of wetland complexes using polarimetric SAR (PolSAR) imagery. The proposed architecture follows an encoder-decoder paradigm, wherein the input data are fed into a stack of convolutional filters (encoder) to extract high-level abstract features and a stack of transposed convolutional filters (decoder) to gradually up-sample the low-resolution output to the spatial resolution of the original input image. The proposed network also benefits from recent advances in CNN design, namely the addition of inception modules and skip connections with residual units. The former component improves multi-scale inference and enriches contextual information, while the latter contributes to the recovery of more detailed information and simplifies optimization. Moreover, an in-depth investigation of the learned features by opening the black box demonstrates that the convolutional filters extract discriminative polarimetric features, thus mitigating the limitations of feature engineering design in PolSAR image processing.
Experimental results from full polarimetric RADARSAT-2 imagery illustrate that the proposed network outperforms the conventional random forest classifier and the state-of-the-art FCNs, such as FCN-32s, FCN-16s, FCN-8s, and SegNet, both visually and numerically for wetland mapping.
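The decoder's up-sampling step can be illustrated with a stride-2 transposed convolution written out explicitly: every input value scatters a kernel-weighted copy into a larger output grid, so an H x W input with a k x k kernel and stride s produces a ((H - 1) * s + k) x ((W - 1) * s + k) output. The single-channel grids, the kernel values and the additive skip connection below are assumptions for illustration, not the proposed network's actual layers.

```python
from typing import List

Grid = List[List[float]]


def transposed_conv2d(x: Grid, kernel: Grid, stride: int = 2) -> Grid:
    """Single-channel transposed convolution (no padding, no bias)."""
    h, w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h = (h - 1) * stride + kh
    out_w = (w - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(w):
            # Scatter x[i][j] * kernel into the output, offset by the stride.
            for u in range(kh):
                for v in range(kw):
                    out[i * stride + u][j * stride + v] += x[i][j] * kernel[u][v]
    return out


def skip_add(decoded: Grid, encoder_features: Grid) -> Grid:
    """Additive skip connection between same-sized decoder and encoder maps."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(decoded, encoder_features)]
```

With a 2 x 2 all-ones kernel and stride 2, each input value is copied into a 2 x 2 block, so `[[1, 2], [3, 4]]` becomes a 4 x 4 grid; learned kernels replace this fixed copy with trainable up-sampling weights.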

Semantic Segmentation of slums in satellite images using transfer learning on fully convolutional neural networks

Article

Feb 2019
ISPRS J PHOTOGRAMM

Unprecedented urbanization, in particular in countries of the Global South, results in informal urban development processes, especially in megacities. With an estimated 1 billion slum dwellers globally, the United Nations have made the fight against poverty the number one sustainable development goal. To provide better infrastructure and thus a better life to slum dwellers, detailed information on the spatial location and size of slums is of crucial importance. In the past, remote sensing has proven to be an extremely valuable and effective tool for mapping slums. However, the machine learning approaches used for this mapping required considerable effort to train the models. Recent advances in deep learning allow trained fully convolutional networks (FCNs) to be transferred from one data set to another. Thus, in our study we analyze the transfer learning capabilities of FCNs for slum mapping in various satellite images. A model trained on very high resolution optical satellite imagery from QuickBird is transferred to Sentinel-2 and TerraSAR-X data. While free-of-charge Sentinel-2 data is widely available, its comparatively lower resolution makes slum mapping a challenging task. TerraSAR-X data, on the other hand, has a higher resolution and is considered a powerful data source for intra-urban structure analysis. Due to the different image characteristics of SAR compared to optical data, however, transferring the model could not improve the performance of semantic segmentation. We do, however, observe very high accuracies for mapped slums in the optical data: the QuickBird image obtains 86-88% (positive predictive value (PPV) and sensitivity), and a significant increase for Sentinel-2 can be observed when applying transfer learning (from 38 to 55% and from 79 to 85% for PPV and sensitivity, respectively).
Transfer learning proves extremely valuable in retrieving information on small-scale urban structures such as slum patches, even in satellite images of decametric resolution.
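The two reported metrics follow directly from pixel-level counts of true positives (TP), false positives (FP) and false negatives (FN): PPV = TP / (TP + FP) and sensitivity = TP / (TP + FN). The sketch below assumes flattened binary slum masks (1 = slum, 0 = other); the function name is hypothetical.

```python
from typing import Sequence, Tuple


def ppv_and_sensitivity(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """PPV (precision) and sensitivity (recall) for binary masks of equal length."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return ppv, sensitivity
```

PPV penalizes non-slum pixels mapped as slum, while sensitivity penalizes missed slum pixels, so reporting both shows whether an accuracy gain comes from fewer false alarms or fewer omissions.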

A Comprehensive Evaluation of Approaches for Built-Up Area Extraction from Landsat OLI Images Using Massive Samples

Article

Dec 2018

Detailed information about built-up areas is valuable for mapping complex urban environments. Although a large number of classification algorithms for such areas have been developed, they are rarely tested from the perspective of feature engineering and feature learning. Therefore, we launched a unique investigation to provide a full test of Operational Land Imager (OLI) imagery for 15-m resolution built-up area classification in 2015, in Beijing, China. Because training a classifier requires many sample points, we proposed a method based on the European Space Agency’s (ESA) 38-m global built-up area data of 2014, OpenStreetMap, and MOD13Q1-NDVI to achieve the rapid and automatic generation of a large number of sample points. Our aim was to examine the influence of single pixels and image patches under traditional feature engineering and modern feature learning strategies. In feature engineering, we consider spectra, shape, and texture as the input features, and support vector machine (SVM), random forest (RF), and AdaBoost as the classification algorithms. In feature learning, the convolutional neural network (CNN) is used as the classification algorithm. In total, 26 built-up land cover maps were produced. The experimental results show the following: (1) The approaches based on feature learning are generally better than those based on feature engineering in terms of classification accuracy, and the performance of ensemble classifiers (e.g., RF) is comparable to that of CNN. Two-dimensional CNN and the 7-neighborhood RF have the highest classification accuracies, at nearly 91%; (2) Overall, the classification effect and accuracy based on image patches are better than those based on single pixels. Features that highlight the information of the target category (e.g., PanTex (texture-derived built-up presence index) and the enhanced morphological building index (EMBI)) can help improve classification accuracy.
The code and experimental results are available at https://github.com/zhangtao151820/CompareMethod.
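The single-pixel versus image-patch distinction can be sketched as extracting a square window around each sample pixel. Reading "7-neighborhood" as a 7 x 7 window, and clamping coordinates at image edges, are assumptions here, not details taken from the study; a CNN would consume the patch as a 2-D array, whereas an RF would typically receive it flattened into a feature vector.

```python
from typing import List

Image = List[List[float]]


def extract_patch(image: Image, row: int, col: int, size: int = 7) -> Image:
    """Return a size x size window centred on (row, col); edges are clamped."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the window has a centre pixel")
    half = size // 2
    height, width = len(image), len(image[0])
    patch = []
    for r in range(row - half, row + half + 1):
        rr = min(max(r, 0), height - 1)
        patch.append([image[rr][min(max(c, 0), width - 1)]
                      for c in range(col - half, col + half + 1)])
    return patch


def flatten(patch: Image) -> List[float]:
    """Row-major feature vector, e.g. as input to a random forest."""
    return [v for row in patch for v in row]
```

With `size=1` the function degenerates to the single-pixel case, which makes it easy to compare both strategies with the same sampling code.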

Global transferability of local climate zone models

Article

Nov 2018

Using the cloud-computing resources of Google's Earth Engine (EE) and a range of satellite sensors (input features), this paper for the first time explores the potential of up-scaling the current Local Climate Zone mapping efforts to regional and global scales. Using a transferability framework, we test whether information from one city contains valuable information to categorise a different city, simultaneously exploring the role of the input features and the characteristics of individual cities. It was found that the accuracies of the EE approach are comparable to the standard WUDAPT method, making EE a viable alternative approach. The results from the city-to-city experiments are generally poor when compared to the single city benchmark experiments, indicating that the collection of site-specific training areas remains relevant. However, LCZ mapping accuracies are considerably improved when a) the source of the training data is from a city in the same ecoregion as the city of interest and b) the training areas from several cities are combined. These results support the claim that the LCZ framework is a universal urban typology and indicate that, provided a continued optimisation of input features and quality of training areas, up-scaling to regional or global levels is feasible.
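The transferability experiments described above can be organised as lists of (training source, target city) splits. The sketch below enumerates single-city transfers, transfers restricted to a shared ecoregion, and leave-one-city-out pooling; the city names, ecoregion labels and function names are hypothetical, and the actual experimental design of the paper may differ.

```python
from typing import Dict, List, Tuple


def city_to_city_pairs(cities: List[str]) -> List[Tuple[str, str]]:
    """Every (training city, target city) pair with distinct cities."""
    return [(src, dst) for src in cities for dst in cities if src != dst]


def same_ecoregion_pairs(ecoregion: Dict[str, str]) -> List[Tuple[str, str]]:
    """City-to-city pairs restricted to cities sharing an ecoregion."""
    return [(src, dst) for src, dst in city_to_city_pairs(list(ecoregion))
            if ecoregion[src] == ecoregion[dst]]


def pooled_splits(cities: List[str]) -> List[Tuple[Tuple[str, ...], str]]:
    """Leave-one-city-out: pool training areas from all other cities."""
    return [(tuple(c for c in cities if c != target), target) for target in cities]
```

Each split would then train a classifier on the source training areas and score it on the target city's training areas, giving an accuracy matrix that can be compared with the single-city benchmark.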

Feature Importance Analysis for Local Climate Zone Classification Using a Residual Convolutional Neural Network with Multi-Source Datasets

Article

Oct 2018

Global Local Climate Zone (LCZ) maps, indicating urban structures and land use, are crucial for Urban Heat Island (UHI) studies and also as starting points to better understand the spatio-temporal dynamics of cities worldwide. However, reliable LCZ maps are not available on a global scale, hindering scientific progress across a range of disciplines that study the functionality of sustainable cities. As a first step towards large-scale LCZ mapping, this paper aims to provide guidance on data and feature choice. To this end, we evaluate the spectral reflectance and spectral indices of the globally available Sentinel-2 and Landsat-8 imagery, as well as the Global Urban Footprint (GUF) dataset, the OpenStreetMap (OSM) building and land-use layers, and the Visible Infrared Imager Radiometer Suite (VIIRS)-based Nighttime Light (NTL) data, regarding their relevance for discriminating different Local Climate Zones (LCZs). Using a Residual convolutional neural Network (ResNet), a systematic analysis of feature importance is performed with a manually labeled dataset containing nine cities located in Europe. Based on the investigation of the data and feature choice, we propose a framework to fully exploit the available datasets. The results show that GUF, OSM and NTL can contribute to the classification accuracy of some LCZs with relatively few samples. For large-scale LCZ mapping, Landsat-8 and Sentinel-2 spectral reflectances should be used jointly, for example through majority voting, as demonstrated by the improvement achieved with the proposed framework.
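Majority voting, as suggested above for combining classifiers trained on different inputs, can be sketched per sample with `collections.Counter`. How the paper's framework weights or breaks ties between its models is not described here; in this sketch a tie goes to the label of the earliest model in the list, because `Counter.most_common` orders equal counts by first occurrence. The number and order of voting models is an assumption.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """predictions[m][i] is the LCZ label model m assigns to sample i."""
    n_samples = len(predictions[0])
    fused = []
    for i in range(n_samples):
        votes = Counter(model[i] for model in predictions)
        # most_common keeps first-encountered order among equal counts.
        fused.append(votes.most_common(1)[0][0])
    return fused
```

For instance, three hypothetical models (say, Landsat-8, Sentinel-2 and an auxiliary-data model) predicting `[1, 2, 3]`, `[1, 3, 3]` and `[2, 3, 1]` fuse to `[1, 3, 3]`.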

A 3D convolutional neural network method for land cover classification using LiDAR and multi-temporal Landsat imagery

Article

Aug 2018
ISPRS J PHOTOGRAMM

Terrestrial landscapes have complex three-dimensional (3D) features that are difficult to extract using traditional methods based on 2D representations. These methods often relegate such features to raster or metric-based (two-dimensional) representations based on Digital Surface Models (DSM) or Digital Elevation Models (DEM), and thus are not suitable for resolving morphological and intensity features for fine-scale land cover mapping. Small-footprint LiDAR provides an ideal way of capturing these 3D features. This research develops a novel method of integrating airborne LiDAR-derived features and multi-temporal Landsat images to classify land cover types. We tested our approach in Williamson County, Illinois, which has diverse and mixed landscape features. Specifically, our method applied a 3D convolutional neural network (CNN) approach to extract features from LiDAR point clouds by (1) creating an occupancy grid and an intensity grid at 1-meter resolution, and then (2) normalizing and incorporating the data into the 3D CNN. The extracted features (e.g., morphological and intensity features) from the 3D CNN were finally combined with multi-temporal spectral data to enhance the performance of land cover classification based on a Support Vector Machine classifier. Visual interpretation of both hyper-resolution photos and point clouds was used to prepare training and testing data. The classification results show that our method outperforms a traditional method by 2.65% (from 81.52% to 84.17%) when solely using LiDAR and 2.19% (from 90.20% to 92.57%) when combining all available imagery. We demonstrate that our method can effectively extract LiDAR features and improve fine-scale land cover mapping through fusion of complementary types of remote sensing data.
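Step (1) above, building occupancy and intensity grids from a point cloud, can be sketched as binning points into 1 m voxels. The grid origin at (0, 0, 0), the use of the mean return intensity per voxel and the sparse dictionary representation are assumptions for illustration; the study's own normalization and grid extent are not described in this abstract. A dense array for the 3D CNN would be filled from these dictionaries afterwards.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Voxel = Tuple[int, int, int]
Point = Tuple[float, float, float, float]  # x, y, z, intensity


def voxelize(points: Iterable[Point], resolution: float = 1.0):
    """Return (occupancy, mean_intensity) dictionaries keyed by voxel index."""
    counts: Dict[Voxel, int] = defaultdict(int)
    intensity_sum: Dict[Voxel, float] = defaultdict(float)
    for x, y, z, intensity in points:
        key = (math.floor(x / resolution),
               math.floor(y / resolution),
               math.floor(z / resolution))
        counts[key] += 1
        intensity_sum[key] += intensity
    occupancy = {k: 1 for k in counts}  # binary: voxel contains at least one return
    mean_intensity = {k: intensity_sum[k] / counts[k] for k in counts}
    return occupancy, mean_intensity
```

Keeping only occupied voxels avoids storing the mostly empty volume above the canopy and ground until the CNN input is assembled.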

TreeUNet: Adaptive Tree convolutional neural networks for subdecimeter aerial image segmentation

Article

Oct 2019
ISPRS J PHOTOGRAMM

Fine-grained semantic segmentation results are typically difficult to obtain for subdecimeter aerial imagery as a result of complex remote sensing content and optical conditions. Recently, convolutional neural networks (CNNs) have shown outstanding performance on this task. Although many deep neural network structures and techniques have been applied to improve accuracy, few have focused on improving the differentiation of easily confused classes. In this paper, we propose TreeUNet, a tool that uses an adaptive network to increase the classification rate at the pixel level. Specifically, based on a deep semantic model infrastructure, a Tree-CNN block in which each node represents a ResNeXt unit is constructed adaptively in accordance with the confusion matrix and the proposed TreeCutting algorithm. By transmitting feature maps through concatenating connections, the Tree-CNN block fuses multiscale features and learns the best weights for the model. In experiments on the ISPRS two-dimensional Vaihingen and Potsdam semantic labelling datasets, the results obtained by TreeUNet are competitive among published state-of-the-art methods. Detailed comparison and analysis show that the improvement brought by the adaptive Tree-CNN block is significant.
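The idea of identifying easily confused classes from a confusion matrix can be illustrated by ranking class pairs by their symmetric confusion. This sketch is not the TreeCutting algorithm, whose details are not given in the abstract; it only shows the kind of signal such a procedure could start from. Rows are assumed to be reference classes and columns predicted classes.

```python
from typing import List, Tuple


def confused_pairs(cm: List[List[int]], top_k: int = 3) -> List[Tuple[int, int, int]]:
    """Rank unordered class pairs (i, j) by cm[i][j] + cm[j][i], highest first."""
    n = len(cm)
    pairs = [(i, j, cm[i][j] + cm[j][i]) for i in range(n) for j in range(i + 1, n)]
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs[:top_k]
```

In a 3-class matrix such as `[[50, 8, 1], [6, 40, 2], [0, 3, 45]]`, classes 0 and 1 exchange 14 samples and would be the first candidates for a shared branch in a tree-structured network.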

Convolutional Neural Network-Based Land Cover Classification Using 2-D Spectral Reflectance Curve Graphs With Multitemporal Satellite Imagery

Article

Dec 2018

Generating WUDAPT Level 0 data – Current status of production and evaluation

Article

Nov 2018

The World Urban Database and Access Portal Tools (WUDAPT) project has grown out of the need for better information on the form and function of cities globally. Cities are described using Local Climate Zones (LCZ), which are associated with a range of key urban climate model parameters and thus can serve as inputs to high resolution urban climate models. We refer to this as level 0 data for each city. The LCZ level 0 product is produced using freely available Landsat imagery, crowdsourced training areas from the community, and the open source SAGA software. This paper outlines the protocol by which LCZ maps generated by different members of the community are produced and evaluated. In particular, the quality assessment comprises cross-validation, review, and cross-comparison with other data sets. To date, the results from the different quality assessments show that the LCZ maps are generally of moderate quality, i.e. 50–60% overall accuracy (OA), but this is much higher when considering all built-up classes together or using weights that take the morphological and climatic similarity of certain classes into account. The training data contributed by researchers from around the world also vary in quality and in the interpretation of the landscape, which affects the final quality of the LCZ maps. The acceptable level of quality needed will depend heavily on the application of the data. However, initial modelling studies that use the level 0 products as inputs showed improved performance in simulating the urban climate when replacing the default surface descriptions with the WUDAPT level 0 data. This is also promising for the application of level 0 data in regional and global climate and weather models and supports the assumption that the current level 0 products are already of sufficient quality for certain applications. Moreover, there are various ongoing developments to improve the methods used to produce LCZ maps and their accuracy.
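The three quality measures mentioned above (overall accuracy, accuracy with all built-up classes merged, and accuracy with similarity weights) can be computed from a confusion matrix as sketched below. The row/column convention, the class indices and any weight values are assumptions for illustration; the WUDAPT protocol's actual weight matrix is not reproduced here.

```python
from typing import List, Set

Matrix = List[List[float]]


def overall_accuracy(cm: Matrix) -> float:
    """Share of samples on the diagonal (rows = reference, columns = mapped)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def weighted_accuracy(cm: Matrix, weights: Matrix) -> float:
    """Partial credit weights[i][j] (1 on the diagonal) for class i mapped as j."""
    total = sum(sum(row) for row in cm)
    n = len(cm)
    return sum(cm[i][j] * weights[i][j] for i in range(n) for j in range(n)) / total


def built_up_accuracy(cm: Matrix, built_up: Set[int]) -> float:
    """Accuracy after merging all classes into built-up versus natural."""
    total = sum(sum(row) for row in cm)
    n = len(cm)
    agree = sum(cm[i][j] for i in range(n) for j in range(n)
                if (i in built_up) == (j in built_up))
    return agree / total
```

Because confusions among morphologically similar built-up classes are common, the merged and weighted scores are always at least as high as the plain overall accuracy, consistent with the gap described above.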
