Towards A Generalizable Image Representation
For Large-Scale Change Detection: Application
To Generic Damage Analysis
Lionel Gueguen and Raffay Hamid
Abstract
Each year, multiple catastrophic events impact vulnerable populations around the planet. Assessing the damage
caused by these events in a timely and accurate manner is crucial for efficient execution of relief efforts to help the
victims of these calamities. Given the low accessibility of the damaged areas, high resolution optical satellite imagery
has emerged as a valuable source of information to quickly assess the extent of damage by manually analyzing the
pre- and post-event imagery of the region. To make this analysis more efficient, multiple learning techniques using a
variety of image representations have been proposed. However, most of these representations are prone to variabilities
in capture angle, sun location and seasonal variations. To evaluate these representations in the context of damage
detection, we present a benchmark of 86 pre- and post-event image-pairs with respective reference data derived
from UNOSAT assessment maps, spanning a total area of 4,665 km² from 11 different locations around the world.
The technical contribution of our work is a novel image representation based on shape distributions of image-patches
encoded with locality-constrained linear coding. We empirically demonstrate that our proposed representation provides
an improvement of at least 5% in equal-error-rate over alternate approaches. Finally, we present a thorough robustness
analysis of the considered representational schemes with respect to capture-angle variabilities and multiple sensor
combinations.
Index Terms
Change detection, post-damage assessment, shape distribution, locally linear encoding, benchmark.
I. INTRODUCTION
Each year, hundreds of catastrophic events impact vulnerable areas around the world. Assessing the extent of
damage caused by these crises is crucial in the timely allocation of resources to help the affected populations. Since
disaster-locations are usually not readily accessible, the use of either optical or SAR very high resolution satellite
imagery has emerged as a valuable source of information for estimating the impact of catastrophic events. However,
these assessments are currently made by manually analyzing both the pre- and post-event images, or only the post-event images, of distressed areas, which makes the process labor-intensive and expensive. It is therefore important to scale up damage
Lionel Gueguen and Raffay Hamid are with the Image Mining R&D group of DigitalGlobe Inc. based in Westminster, Colorado, USA.
Fig. 1. (Top) The eleven areas of interest (AOIs), shown with red dots; each area represents multiple local regions. We considered 11 local regions, spanning 4,665 km². (Bottom) Before and after imagery for different events, shown in the two rows: (1) typhoon (Philippines, Oct. 2013), (2) armed conflict (Central African Republic, Dec. 2013), (3) earthquake (Pakistan, Sept. 2013), and (4) internally displaced people's shelters (Somalia, May 2013).
detection to larger areas accurately and efficiently. Our work is a step towards solving this problem through change-detection analysis of optical pre- and post-event images.
In the following, we summarize some of the key challenges that need to be addressed in this regard, and how
our work contributes towards them:
1- Comprehensive Data-Set: Thus far, there has been a lack of comprehensive labeled data-sets that could be used
to explore automatic damage detection at scale. To this end, we present a benchmark data-set of 86 pairs of pre-
and post-event VHR optical satellite imagery of distressed areas covering 4,665 km², with the associated reference
dataset of damaged regions acquired by expert interpreters. This data-set was collected by using the satellites of
DigitalGlobe Inc. Our data-set covers 11 different regions from around the world, and spans a wide range of
terrains and climates, with a variety of damage types (see Figure 1). This data-set enables us to rigorously explore
the various facets of the problem at hand.
2- Appropriate Feature Choice: The scale of our problem naturally presents an accuracy-efficiency tradeoff for
the features being considered. To this end, we introduce the use of trees-of-shapes features [1] in a soft-assignment
locality-constrained linear encoding framework [2] that focuses more on the shape characteristics of a scene, as
opposed to its edge attributes (as done by other popular descriptors e.g., SIFT [3]). Our results show that this
difference proves to be quite important for detecting damaged areas accurately. We present a thorough empirical analysis of the effectiveness of our proposed scheme, and compare it to multiple alternatives.
3- Algorithmic Efficiency: Given that the scale of the damaged areas is usually quite large, and the identification
of damaged areas is required as soon as possible, having a framework with high algorithmic efficiency is of primary
importance. To this end, we propose several algorithmic speed-ups to implement our proposed framework.
The rest of the paper is arranged as follows. We begin by going over some of the relevant previous work within
the context of change and damage detection using satellite imagery in Section II. Section III describes in detail our
benchmark dataset used to evaluate change detection methods for post-disaster situation assessment. In Section IV
we describe an enhancement of shape distributions in terms of locality-constrained shape encoding. Results are
described in Section V, showing the improvements achieved by locally linear encoding. Finally, Section VI presents
the main conclusions of our work along with some discussion.
II. PREVIOUS WORK
The problem of change detection, especially within the context of damage detection, has been explored from multiple different perspectives. Various change detection methods [4] have been proposed to perform such analysis automatically in an efficient manner. However, most of these methods focus on only one type of disaster with only a few cases. For example, the work in [5] shows the ability to capture earthquake damages, while [6] and [7] underline the capability of detecting damaged structures due to armed conflict. The work in [8] compares image representations for tornado damages, while [9], [10] explore refugee-camp formation. Although quite useful,
these methods propose ad-hoc image representations that do not converge to a single generalizable characterization
due to the lack of a large benchmark data-set encompassing the many variabilities encountered in damage analysis.
Such variabilities include type of disaster, geographical areas, acquisition angles, and sensor combinations, among
others.
Several unsupervised change-detection methods for optical images with passive sensors have been proposed
over the past decade. Among them, a widely used technique is the change vector analysis (CVA) [11]. The CVA
technique consists of computing differences between two images measuring the same scene at two distinct instants.
As a result, each image pixel is associated with a single difference value or a multidimensional spectral change vector.
The analysis of these spectral change vectors through unsupervised classification produces a change map [12].
While this approach is quite general, it does not allow specifying the changes of interest, and results in detecting
phenological changes as much as the changes of interest. Furthermore, it can suffer from spatial inaccuracies in
image registration.
Improvements to the CVA approach have been proposed for capturing phenological changes by canonical correla-
tion analysis and local mutual information [10], [13]. Other improvements have also been proposed for overcoming
spatial inaccuracies by modeling the spatial context before the change analysis [14], [15]. The combination of
these unsupervised approaches makes it possible to generate unsupervised change heat-maps which are robust to
phenological variations and to spatial inaccuracies by modeling the most probable transformation from the pre-
TABLE I
FOUR OF DIGITALGLOBE'S SIX SATELLITES ARE USED IN OUR 86 IMAGE-PAIRS. THE PER-PIXEL RESOLUTIONS OF QUICKBIRD (QB-2), WORLDVIEW-1 (WV-1), WORLDVIEW-2 (WV-2) AND GEOEYE-1 (GE-1) ARE 0.61 M, 0.5 M, 0.46 M AND 0.41 M, RESPECTIVELY.

Satellites | QB-2 | WV-1 | WV-2 | GE-1
QB-2       |  1   |  -   |  -   |  -
WV-1       |  9   |  14  |  -   |  -
WV-2       |  3   |  23  |  6   |  -
GE-1       |  5   |  11  |  9   |  5
image to the post-image and vice-versa. However, these unsupervised approaches remain limited in their ability to capture the relevant changes as opposed to other real changes which are not of interest for the considered application.
To address this challenge, supervised methods have been proposed to focus the change heat-maps on the changes of interest [16]. Given the high skill-set required from photo-interpreters to assess the damage accurately,
acquiring reliable ground-truth training examples is particularly challenging, and therefore this dependence on
manually labelled training data has to be minimized [17]. Towards this end, semi-supervised approaches have recently been proposed, showing improvements in the collection of labelled samples [18].
III. DATA SET AND BENCHMARK
We compiled a large-scale data set to benchmark change detection methods in the context of damage detection.
To this end, eleven areas of interest (cf. Table III) around the world were selected. This selection was made
on the basis of major catastrophic events analyzed by the United Nations Institute for Training and Research
(UNITAR/UNOSAT) [19]. This international organization is responsible for publishing maps of geo-located points
indicating relevant changes on the ground. We used this information from UNITAR/UNOSAT to build our reference
dataset for the 11 selected AOIs, which is composed of 29,945 points. Examples of these geolocated change points
are illustrated in Figs. 8–11. The layout of these AOIs is shown in Figure 1. Different types of crisis events
were considered in our AOIs including armed conflicts, earthquakes, typhoons, and refugee-camp developments
(see Table III and Figure 1 for example cases).
We collected high resolution panchromatic imagery from DigitalGlobe’s archive such that we can form image-
pairs covering the considered AOIs that were acquired as close as possible to the dates of the catastrophic events.
In the context of damage analysis, the pre- and post-event images are likely to be captured from different sensors,
and with different acquisition conditions. To incorporate these acquisition variabilities, we selected multiple pairs
per AOI, resulting in a set of 86 image-pairs from the DigitalGlobe satellite constellation: QuickBird, WorldView-1, WorldView-2, and GeoEye-1, without any restriction on the acquisition angles. The variability in our sensor combinations is given in Table I.
Another important source of variability in satellite images is the acquisition angle, as it affects the directions and lengths of the shadows cast by physical features on the ground (e.g., human settlements). While maintaining similar acquisition angles helps automate change detection techniques, obtaining pairs of pre- and post-event images under such a tight time constraint cannot be guaranteed. To incorporate this important variability in our data-set, we did not apply any restriction on the angles, resulting in a uniform sampling of the multi-angle space as illustrated in Figure 2.
TABLE II
LIST OF SHAPE DESCRIPTORS USED BY OUR FRAMEWORK.

Area:          $\mu_{0,0}$
Eccentricity:  $\sqrt{\lambda_1 / \lambda_0}$, with $\lambda_i = \frac{\mu_{2,0} + \mu_{0,2}}{2} + (2i - 1)\,\frac{\sqrt{4\mu_{1,1}^2 + (\mu_{2,0} - \mu_{0,2})^2}}{2}$, $i \in \{0, 1\}$
Hu1:           $\eta_{2,0} + \eta_{0,2}$
Hu2:           $(\eta_{2,0} - \eta_{0,2})^2 + 4\eta_{1,1}^2$
TABLE III
THE 11 AREAS OF INTEREST SELECTED FOR BUILDING THE BENCHMARK DATASET FOR CHANGE DETECTION ANALYSIS. THE 11 AOIS HAVE BEEN SELECTED WITH RESPECT TO THE UNITAR/UNOSAT PHOTO-INTERPRETATION DATA PUBLISHED AT HTTP://WWW.UNITAR.ORG/UNOSAT/. VHR PANCHROMATIC IMAGE PAIRS HAVE BEEN SELECTED SO AS TO DEFINE DATE INTERVALS ENCOMPASSING, AND AS CLOSE AS POSSIBLE TO, THE GIVEN PRE- AND POST-DATES.

AOI | Disaster Type | Pre-Date | Post-Date | Number of pairs | UNITAR Ref.
Damage Assessment for Jebri Area, Awaran District, Balochistan Province, Pakistan | earthquake | 27/08/13 | 27/09/13 | 3 | 1831
Damage Assessment for Gajar Area, Awaran District, Balochistan Province, Pakistan | earthquake | 26/08/13 | 26/09/13 | 10 | 1828
Damaged Structures in Bentiqui, Leyte, Philippines | typhoon | 11/09/13 | 11/11/13 | 2 | 1866
Structural Development, Afgooye Corridor, Somalia | refugees | 12/02/13 | 25/05/13 | 10 | 1856
Damage Assessment in the City of Malakal, Upper Nile State, South Sudan | conflict | 06/12/13 | 15/03/14 | 10 | 1961
Damage Assessment in the City of Bor, Jonglei State, South Sudan | conflict | 25/12/13 | 19/01/14 | 10 | 1917
Destruction in Mayom, Unity State, South Sudan | conflict | 29/09/13 | 11/01/14 | 10 | 1922
Damage Assessment in the City of Bentiu, Unity State, South Sudan | conflict | 02/01/14 | 18/01/14 | 10 | 1919
Destruction in Rubkona, Unity State, South Sudan | conflict | 02/01/14 | 13/01/14 | 10 | 1915
Damage Assessment in Paoua, Ouham-Pende, Central African Republic | conflict | 03/11/13 | 18/06/14 | 1 | 2016
Destruction in Bossangoa Area, Ouham, Central African Republic | conflict | 05/12/13 | 22/01/14 | 10 | 1957
Fig. 2. Scatter plot showing the differences of sun and satellite angles for the 86 selected strips. The scatter is provided in angular (polar) space; the panels show the satellite azimuth and elevation differences and the sun azimuth and elevation differences, with the used pair angle differences plotted alongside the optimal ones.
IV. METHODOLOGY
Performing automatic change detection using high resolution imagery raises multiple challenges. In the following
we present our proposed methodology and show how our approach addresses these challenges.
Due to acquisition-angle differences, performing a direct pixel comparison produces many false alarms, and a window-based image description has consistently been found more robust [20], [21]. Furthermore, the changes that are relevant in a given situation vary greatly across geographies, event types, and acquisition-angle combinations. Therefore, it is
generally helpful to indicate the relevant changes with training examples to the detection system. In the following,
we propose a novel patch-based image description exploiting connected component based image analysis along
with an efficient encoding scheme to enhance the accuracy of automatic change analysis as illustrated in Fig. 3.
A. Image Encoding
Max/min-trees and the tree of shapes have shown great potential for analyzing high resolution panchromatic imagery [1], [22]. These structures allow analyzing images efficiently and compactly, without loss of information, by representing the connected components of the lower and upper level sets:

$$\chi_\lambda(u) = \{p \mid u(p) \le \lambda\}, \quad (1)$$

$$\chi^\lambda(u) = \{p \mid u(p) \ge \lambda\}, \quad (2)$$

where, for a gray-scale image $u : \Omega \mapsto \mathbb{N}$, we define $\chi_\lambda(u)$ and $\chi^\lambda(u)$ as the lower and upper level-sets of $u$. Note that the connected components of these (upper or lower) level sets are a lossless representation of $u$ and provide its segment-based representation, which is fundamentally different from edge-based image representations [3].
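To make this definition concrete, here is a minimal sketch (in Python, assuming numpy and scipy are available; the function name is ours, not the authors') that labels the connected components of the lower and upper level sets at a given threshold λ:

```python
import numpy as np
from scipy import ndimage

def level_set_components(u, lam):
    """Label the connected components of the lower and upper level sets of u.

    Lower level set: pixels with u(p) <= lam (Eq. 1); upper: u(p) >= lam (Eq. 2).
    Returns two label images, where 0 marks pixels outside the level set.
    """
    lower_labels, _ = ndimage.label(u <= lam)
    upper_labels, _ = ndimage.label(u >= lam)
    return lower_labels, upper_labels

# Tiny example: a bright 2x2 blob on a dark background.
u = np.zeros((5, 5), dtype=np.uint8)
u[1:3, 1:3] = 200
low, up = level_set_components(u, lam=100)  # `up` isolates the blob
```

A max/min-tree or tree of shapes effectively organizes these components over all thresholds at once, which is what makes the representation lossless.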
These patch-based decompositions allow the analysis of image objects at very different scales, describing tiny image structures as well as larger ones in one combined representation. When tackling damage analysis, it is
important to keep as much fine-resolution information as possible. The lower and upper level sets inherently retain this fine-grained information for further use.
In the context of satellite image analysis, it is imperative to have image descriptors which are invariant to rotation and to limited translations. To achieve these invariances, we use shape descriptors of the connected components which are rotation- and translation-invariant, namely the second- and third-order central moments [23].
Let $\{p_i\}_{i=1}^{n}$ be the $n$ pixels composing a peak component $P_{p_1}^{\lambda}(u)$. Each pixel $p_i$ is a pair of horizontal and vertical coordinates $p_i = (x_i, y_i)$. The average of a spectral band $v$ is given by:

$$A_v(P_{p_1}^{\lambda}(u)) = \frac{1}{n}\sum_{i=1}^{n} v(x_i, y_i). \quad (3)$$

The central and normalized shape moments $\mu_{a,b}$, $\eta_{a,b}$ are simply expressed for a pair of integers $(a, b) \in \mathbb{N}^2$ by:

$$\bar{x}(P_{p_1}^{\lambda}(u)) = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad (4)$$

$$\bar{y}(P_{p_1}^{\lambda}(u)) = \frac{1}{n}\sum_{i=1}^{n} y_i, \quad (5)$$

$$\mu_{a,b}(P_{p_1}^{\lambda}(u)) = \sum_{i=1}^{n} (x_i - \bar{x})^a (y_i - \bar{y})^b, \quad (6)$$

$$\eta_{a,b}(P_{p_1}^{\lambda}(u)) = \frac{\mu_{a,b}}{\mu_{0,0}^{(a+b)/2+1}}. \quad (7)$$
These moments are combined to derive the shape descriptors given in Table II. All descriptors can be computed in linear time by exploiting the nesting property of the max/min tree [24], [25], as well as of the tree of shapes.
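As a direct per-component transcription of Eqs. (4)–(7) and Table II, the following sketch (assuming numpy; a real implementation would instead accumulate these sums incrementally along the tree, which is what yields the linear-time computation mentioned above) computes the four descriptors from a component's pixel coordinates:

```python
import numpy as np

def shape_descriptors(xs, ys):
    """Area, eccentricity, Hu1 and Hu2 of a component given its pixel coords."""
    n = len(xs)
    xbar, ybar = xs.mean(), ys.mean()
    # Central moments mu_{a,b} = sum (x - xbar)^a (y - ybar)^b (Eq. 6).
    def mu(a, b):
        return np.sum((xs - xbar) ** a * (ys - ybar) ** b)
    mu00 = float(n)                                  # area, mu_{0,0}
    mu20, mu02, mu11 = mu(2, 0), mu(0, 2), mu(1, 1)
    # Eigenvalues of the second-moment matrix (Table II).
    disc = np.sqrt(4 * mu11 ** 2 + (mu20 - mu02) ** 2) / 2
    lam0 = (mu20 + mu02) / 2 - disc
    lam1 = (mu20 + mu02) / 2 + disc
    ecc = np.sqrt(lam1 / lam0) if lam0 > 0 else np.inf
    # Normalized moments eta_{a,b} = mu_{a,b} / mu_{0,0}^{(a+b)/2+1} (Eq. 7);
    # for a + b = 2 the exponent is 2.
    eta20, eta02, eta11 = mu20 / mu00 ** 2, mu02 / mu00 ** 2, mu11 / mu00 ** 2
    hu1 = eta20 + eta02
    hu2 = (eta20 - eta02) ** 2 + 4 * eta11 ** 2
    return np.array([mu00, ecc, hu1, hu2])
```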
B. Bag-Of-Visual-Words Encoding
In recent years, the bag-of-visual-words model has been extremely popular in computer vision for categorizing large corpora of images [26], [27]. The model treats an image as a collection of spatially unordered representative visual words. In our context, each constituent overlapping image patch of a large image strip is characterized by the aggregation of its connected components' shape descriptors. Previous studies have shown the efficiency of this scheme for satellite image analysis [22]. In this image representation, each connected component is assigned to a unique dictionary entry, which is then used in the computation of a visual-words histogram.
Within the context of the bag-of-visual-words representation, various alternate encoding schemes have been proposed that allow more powerful representations than the aforementioned simple hard-assignment scheme. Here we particularly focus on two such improved encoding approaches, i.e., soft assignment and locality-constrained (locally) linear encoding [2]. In both cases, a dictionary $D_k$ of visual words is first learned by sampling a representative subset of all connected-component descriptors and clustering them into $k$ clusters, where $k$ is manually predefined. The representative subset is extracted randomly from all the connected components existing in the pre- and post-images. Given the $k$ cluster centers $D_k = \{C_i\}$ computed off-line, any new connected-component descriptor $d$ can then be encoded. We now present the details of the soft-assignment and locally linear encoding schemes.
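The text does not specify the clustering algorithm used to learn $D_k$; the sketch below assumes a standard mini-batch k-means (scikit-learn) over a random subset of component descriptors, with illustrative parameter values:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_dictionary(descriptors, k=128, n_samples=100_000, seed=0):
    """Cluster a random subset of component descriptors into k visual words."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(descriptors),
                     size=min(n_samples, len(descriptors)), replace=False)
    km = MiniBatchKMeans(n_clusters=k, random_state=seed, n_init=3)
    km.fit(descriptors[idx])
    return km.cluster_centers_  # D_k = {C_i}: a k x d dictionary of centroids
```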
Soft Assignment: In this encoding technique, each connected-component descriptor is softly assigned to more than one cluster. The soft assignment attributes a weight to each cluster depending on how far the connected-component descriptor is from it, and can be expressed as:

$$w(d, C_i) = \frac{\exp(-\lambda \|d - C_i\|_2^2)}{\Gamma} \quad (8)$$

where

$$\Gamma = \sum_{i=1}^{k} \exp(-\lambda \|d - C_i\|_2^2). \quad (9)$$

Here $\lambda$ is a scaling parameter determined a priori depending on the dictionary size. With this encoding mechanism, each vector $d$ is encoded by the normalized vector $[w(d, C_1), \cdots, w(d, C_k)]$, with values between 0 and 1. In the case of $\lambda = \infty$, the soft assignment becomes a hard assignment where each descriptor is assigned to a unique dictionary entry. Given a window $W$ covering a set of connected components $\{CC_i\}$ associated with their descriptors $\{d_i\}$, its descriptor is obtained by computing soft histograms:

$$h_W(i) = \sum_{CC_j} w(d_j, C_i), \quad (10)$$

where the soft histogram of a window $W$ is $[h_W(1), \cdots, h_W(k)]$. The histograms are not further normalized, so as to account for the varying number of components in a window, which relates ultimately to different objects. Not normalizing also preserves the contrast information, which becomes very important in built-up areas where cast shadows play a substantial role.
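A minimal sketch of Eqs. (8)–(10), restricted to the $n$ nearest visual words as described at the end of this section (the per-shape pixel-fraction weighting is omitted for brevity, and the value of the scaling parameter `lam` is an assumption):

```python
import numpy as np

def soft_assign(d, centers, lam=1.0, n_neighbors=4):
    """Soft-assignment code of one descriptor d over the dictionary (Eqs. 8-9).

    Only the n nearest words receive a non-zero weight; Gamma is computed
    over those words, per the nearest-neighbor restriction of the text.
    """
    dist2 = np.sum((centers - d) ** 2, axis=1)  # ||d - C_i||_2^2 for all i
    nn = np.argsort(dist2)[:n_neighbors]
    w = np.zeros(len(centers))
    w[nn] = np.exp(-lam * dist2[nn])
    return w / w.sum()

def window_soft_histogram(window_descriptors, centers, lam=1.0, n_neighbors=4):
    """Unnormalized soft histogram h_W of a window's components (Eq. 10)."""
    h = np.zeros(len(centers))
    for d in window_descriptors:
        h += soft_assign(d, centers, lam, n_neighbors)
    return h  # deliberately not normalized, to keep contrast/count information
```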
Locality-Constrained (Locally) Linear Encoding: The second encoding method linearly encodes the descriptor with respect to the provided $k$ centroids [2]. In practice, one solves the following minimization problem:

$$\arg\min_{l} \|d - D_k l\|^2 \quad \text{such that} \quad \|l\|_1 = 1, \quad (11)$$

where $d$ represents a shape descriptor, $D_k$ represents the shape dictionary, and $l$ represents the locally linear code. Then, for any window $W$, its descriptor is obtained by summing the linear contributions of the connected-component descriptors it contains:

$$l_W(i) = \sum_{CC_j} l(d_j, C_i), \quad (12)$$

where $l(d_j, C_i)$ is the linear code of the descriptor $d_j$ at position $i$. Each window is then described by its $k$-length vector $[l_W(1), \cdots, l_W(k)]$. Again, this descriptor is not further normalized, in order to capture the contrast and shape-count information.

For both encoding methods and for each shape, the encoding is done in practice by searching for its $n$ nearest-neighbor entries in the codebook and assuming that the dictionary is composed only of these centroids. This technique produces a sparse representation of each individual shape descriptor, keeping the memory footprint small while preserving good discriminative power. The weight of a shape is determined by the fraction of its pixels covering the considered window. This process is repeated for each window in the pre- and post-event images.
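A minimal sketch of the LLC step of Eq. (11), following the closed-form local solver of Wang et al. [2] on the $n$ nearest dictionary entries (the sum-to-one constraint is enforced by renormalization; the regularization constant is an assumption for numerical stability):

```python
import numpy as np

def llc_code(d, centers, n_neighbors=4, reg=1e-4):
    """Locality-constrained linear code of descriptor d (Eq. 11).

    Closed-form local solver of Wang et al. [2]: solve a least-squares
    problem on the n nearest dictionary entries under a sum-to-one constraint.
    """
    dist2 = np.sum((centers - d) ** 2, axis=1)
    nn = np.argsort(dist2)[:n_neighbors]
    B = centers[nn] - d                            # shift neighbors to origin
    C = B @ B.T                                    # local covariance matrix
    C += reg * np.trace(C) * np.eye(n_neighbors)   # regularize for stability
    w = np.linalg.solve(C, np.ones(n_neighbors))
    w /= w.sum()                                   # enforce sum(l) = 1
    l = np.zeros(len(centers))
    l[nn] = w
    return l
```

The window descriptor of Eq. (12) is then the sum of these sparse codes over the components falling in the window, exactly as for the soft histograms.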
Fig. 3. Overview of our framework. Given a pair of image strips (Image 1, Image 2), we extract features of their overlapping windows (Feature Extraction), perform their locality-constrained encoding (Feature Encoding), and use these codes in a supervised setting (Supervised Learning) to learn a damage detection classifier (Final Detection).
C. Learning Framework
Our learning framework is illustrated in Fig. 3. This work particularly explores the representational aspect of the image patches for the problem of damage detection, and therefore focuses on a supervised learning setting. For details regarding the active-learning aspects of our approach, we refer the reader to [18].
Based on the available reference points for the positive and negative classes, we stack up their corresponding window representation codes and use them to train a linear support vector machine (SVM). Note that we found a linear classifier sufficient for our problem, since the feature space constructed by both soft assignment and locally linear coding already incorporates a significant amount of non-linearity. We therefore do not incorporate any further kernel projection in our framework, which results in faster training and detection times. For this work, we used a linear SVM [28] with L1-regularization and an L2-loss function, where the L1-regularization allows us to embed a feature selection, i.e., a selection of the visual words that are optimal for discriminating between relevant and irrelevant changes. As the shape distributions are stacked, different visual words can be selected for the pre- and post-event images.
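A minimal training sketch, assuming scikit-learn's LinearSVC (a wrapper around LIBLINEAR [28]); `penalty="l1"` with `loss="squared_hinge"` matches the L1-regularization / L2-loss configuration described above, and the data arrays are illustrative placeholders:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(280, 256))   # stacked pre/post window codes (2 x 128)
y_train = rng.integers(0, 2, size=280)  # 1 = relevant change, 0 = no change

# L1-regularized, L2-loss (squared hinge) linear SVM, as in LIBLINEAR.
clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=1.0)
clf.fit(X_train, y_train)

# The L1 penalty zeroes out many coefficients: the surviving entries are the
# visual words selected as discriminative, possibly different for pre and post.
selected_words = np.flatnonzero(clf.coef_.ravel())
```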
D. Algorithmic Speed-Up
The main bottleneck in our computational framework is by far the feature extraction and encoding step. We
achieve algorithmic speed-ups for this step in two important ways.
First, we use a quasi-linear algorithm for computing the tree of shapes [29] to efficiently extract the upper and lower connected components. This representation organizes the nested connected components into a tree, maintaining the relations whereby lower-level connected components are holes of upper-level connected components. Min- and max-trees [25] are efficient algorithms for separately extracting the lower and upper level connected-component sets. These two trees can be fused into a unique tree of shapes by the level-line algorithm [24]. In this paper, we adopt the tree-of-shapes algorithm [29], which has a better worst-case computational complexity than the fusion of max-tree and min-tree. The algorithm provides a tree-of-shapes representation which can be efficiently stored in two arrays of length n, where n is the number of pixels in the image [30]. The representation enables computing the moment-based attributes in linear time by exploiting its nesting properties [24], [31]. This algorithm
Fig. 4. Average of the ROC curves (true positive rate versus false negative rate) obtained by linear supervised classification, for SA and LLC with 1 to 16 nearest neighbors. This graph shows the impact of the coding schemes and coding parameters on the false negative and true positive rates.
Fig. 5. Effects of satellite angle differences (elevation versus azimuth), sun angle differences (elevation versus azimuth), and image misalignment (ground displacement in meters versus EER) on the EER derived from shape-distribution features. In the left plots, the color encodes the EER, going from low values in blue to higher ones in red.
performs much faster than extracting the connected components at all possible thresholds, and requires a memory footprint independent of the number of grey levels considered.
Our second source of speed-up comes in the computation of code histograms. Given a patch, one approach could rely on enumerating the components falling in it and then estimating the histogram of their codes, weighted by the number of pixels falling in the patch. As a connected component may span several patches, it is more efficient to perform a direct filtering of the tree of shapes [25], in which the component contrast is multiplied by its corresponding code weight. Thus, for each visual-word code i, one records the components having a non-null weight, and remaps and reweighs these components into an image R. For a given patch W, one can then gather its
entry $l_W(i)$ by computing the sum of $R$ within it: $l_W(i) = \sum_{x \in W} R(x)$. For overlapping rectangular windows, this process can be achieved efficiently with integral images [32].
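A minimal sketch of this aggregation for one visual word, assuming the remapped weight image R has already been built; the 50-pixel windows on a 10-pixel grid follow the setup of Section V:

```python
import numpy as np

def window_sums(R, win=50, stride=10):
    """Sum of R over every win x win window on a `stride` grid.

    S is the integral image (with a zero first row/column), so each window
    sum costs four lookups [32], independently of the window size.
    """
    S = np.zeros((R.shape[0] + 1, R.shape[1] + 1))
    S[1:, 1:] = R.cumsum(axis=0).cumsum(axis=1)
    ys = np.arange(0, R.shape[0] - win + 1, stride)
    xs = np.arange(0, R.shape[1] - win + 1, stride)
    return (S[np.ix_(ys + win, xs + win)] - S[np.ix_(ys, xs + win)]
            - S[np.ix_(ys + win, xs)] + S[np.ix_(ys, xs)])

# One such call per visual word i yields the entries l_W(i) for all windows W.
```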
V. EXPERIMENTS AND RESULTS
A. Comparison of Encoding Schemes
We compare the two proposed encoding schemes, namely soft assignment (SA) and locally linear coding (LLC), with different numbers of nearest neighbors used during the computation. First, each image is partitioned into overlapping tiles of 50×50 pixels (50×50 m² on the ground), separated by 10 meters, thus producing a change detection at 10-meter resolution, given the 1-meter resolution of the analyzed imagery. Using our dataset of 86 strip pairs covering an area of 4,665 km², the equivalent number of tiles to be considered is around 86 million. A single dictionary is computed per pair of images, with 128 shape words encoding the characteristics of the connected components present in the data set. Thus, each image tile is encoded by a 128-dimensional histogram or linear code. Also, the pre- and post-image tile descriptors are stacked for change detection analysis, leading to a 256-dimensional description vector for each 10-meter pixel of the considered AOIs. To put the amount of data being analyzed in perspective, this representation occupies 335 GB for imagery taking 139 GB of disk space.
Since the reference dataset is given in geolocated format, we first rasterize it onto the underlying 10-meter feature grid, and buffer it with a radius of 25 meters to match our tile size. For each image-pair, we randomly pick 50% of the damage examples as the training set and leave the remaining 50% for evaluating the quality of the detectors. As damage analysis is a highly imbalanced classification problem, we also randomly pick negative examples in a number equal to the number of positive examples in the training set. These training examples are fed to the linear SVM.
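A sketch of this sampling step under the stated protocol (50% of positives for training, an equal number of randomly drawn negatives); `labels` is an illustrative per-tile label array:

```python
import numpy as np

def split_and_balance(labels, train_frac=0.5, seed=0):
    """Pick train_frac of the positive tiles and an equal number of negatives."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    pos_train = rng.choice(pos, size=int(train_frac * len(pos)), replace=False)
    neg_train = rng.choice(neg, size=len(pos_train), replace=False)
    train_idx = np.concatenate([pos_train, neg_train])
    test_pos = np.setdiff1d(pos, pos_train)  # held-out positives for evaluation
    return train_idx, test_pos
```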
Using the reference dataset collected from our AOIs, we use receiver operating characteristic (ROC) curves to evaluate the different representations considered. Note that the true positive rate (TPR) represents the number of relevant damages detected over the total number of changes. Similarly, the false positive rate (FPR) represents the area of false alarms
with respect to the area of no damages. For a given (TPR, FPR) tuple, the associated damage indicator covers an area of $f \cdot \mathrm{TPR} + (1 - f) \cdot \mathrm{FPR}$, where $f$ is the fraction of damaged area. Note that the quantity $f \cdot \mathrm{TPR} + (1 - f) \cdot \mathrm{FPR}$ is also the relative size of the search space that needs to be examined by a human photo-interpreter during curation. We use the equal error rate (EER) as a summary of the performance of an ROC curve, given as the ROC point satisfying $\mathrm{FPR} = 1 - \mathrm{TPR}$.
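For concreteness, a minimal sketch of the EER and search-space computations, assuming scikit-learn's roc_curve; the EER is taken at the ROC point closest to FPR = 1 − TPR:

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(y_true, scores):
    """EER: the ROC operating point where FPR = 1 - TPR (i.e., FPR = FNR)."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    i = np.argmin(np.abs(fpr - (1 - tpr)))
    return (fpr[i] + (1 - tpr[i])) / 2

def search_space_fraction(tpr, fpr, f):
    """Relative area a photo-interpreter must curate: f*TPR + (1 - f)*FPR."""
    return f * tpr + (1 - f) * fpr
```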
The comparison of the two encoding schemes is shown in Fig. 4, in terms of the average of the ROC curves obtained for the 86 pairs. The dependence on the number of neighbors used during encoding is evaluated and shown in the same graph. When the number of neighbors goes to one, both methods behave the same as a hard assignment of the visual words.
From the analysis of the ROC curves, it can be seen that LLC provides the best description for the purpose of our experiments in comparison to SA. At a TPR of 0.80, the LLC-based representation provides an FPR of 0.12, compared to an FPR of 0.18 for the best SA-based representation. As pointed out earlier, the FPR is a direct approximation of the search-space size, given the small coverage of changes. Thus, LLC-based encoding produces a relative reduction of 30% in the search space.
Second, we can observe the impact of the number of neighbors on the accuracy. It can be seen that 4 neighbors provide the best results with LLC, with the accuracy degrading as this number grows. This observation suggests the existence of an optimal number of neighbors (greater than 1, where 1 corresponds to hard assignment) that maximizes the representation's discrimination. The effect of the number of neighbors on SA is quite the opposite: the accuracy increases with it until reaching saturation, and the description does not improve when using more than 4 neighbors. Because the same dictionary is used for LLC and SA, this observation lets us conclude that any point of the shape-feature space is close to an average of 4 clusters, and that the other clusters are too far away to be of much use.
B. Robustness To Variabilities
The average EER using shape-distribution features is plotted in Figure 5 as a function of sun angle differences, satellite angle differences, and image misregistration. It is evident that satellite angle differences do not impact the representation accuracy, while the sun angle differences seem to matter more. This is because sun angles impact shadows, which can significantly alter image appearance. We also computed the average EER for each sensor combination given in Table I. The standard deviation for this is 0.03, indicating that our approach is robust to using pre- and post-event imagery from different sensors. Finally, for each image pair, we computed the average displacement; its norm is plotted against the EER in Figure 5, showing independence between the two axes, i.e., robustness to inaccuracies in spatial registration.
C. Collection of Training Examples
In the context of rapid damage detection, the collection of training examples is a crucial component for efficient
delivery. Also, one wants to collect a minimal amount of training examples while achieving the best accuracies. In the following series of experiments, we consider different percentages of positive training examples and evaluate their impact on the summary equal error rate metric derived from the average ROC curves. The negative examples are picked randomly in the image. This task is not expensive since damage is in most cases a rare class; in other words, it is unlikely that a positive example is picked as a negative example by randomly sampling the AOI.
The impact of the number of training examples on the EER is depicted in Fig. 6. The first observation is that the best LLC-based representation incurs an EER degradation of only 0.05 when using 5% of the positive training samples (on average, 140 points) instead of 50%. This highlights the generalization capability of this representation from few examples. Second, it can be observed that the ranking of the different coding schemes is maintained across the various percentages of training examples, except for the 8-neighbors LLC representation, which degrades much faster than the SA-based representations. This again highlights the importance of the number of neighbors for LLC in maximizing the detection accuracy with a limited number of training examples.
Fig. 6. Impact of the percentage of positive training samples on the equal error rate with linear classification, for SA and LLC with 1 to 16 neighbors.
D. Computational Complexity
We provide indicative running times for our implementation and show the benefits of the proposed algorithms over naive approaches. We perform our computation on one 11-bit image of size 10000×10000 pixels. Given the bit depth, the number of grey levels is 2048, which has a non-negligible impact on the extraction of the connected components. Performance is evaluated on a single Intel 3.3 GHz CPU.
We first assess the benefit of using the tree-of-shapes algorithm over an approach gathering the connected components at all possible thresholds between 0 and 2048. The tree of shapes performs the decomposition of the image and the computation of the moment-based attributes in 372 seconds, while the naive approach requires 2544 seconds. We observe a 6× speed-up by employing efficient algorithms, without even accounting for the naive approach's memory consumption, which is of the order of the number of grey values.

Second, we assess the benefits of computing the distributions with the proposed technique. The proposed approach's complexity depends linearly on the dictionary size, as we loop over the visual words. Given a visual word, reconstituting the weighted connected components takes 1.28 seconds. Then, performing the spatial filtering with a 50-pixel-wide window separated by 10 pixels takes 0.39 s using integral images [32]. As a comparison, performing the spatial filtering and down-sampling with a 16-core multi-processor implementation requires 0.9 s. Here again we observe the benefits of strong algorithms. Consequently, processing one visual word to obtain the corresponding histogram bin for all windows requires 1.67 s and, given a dictionary with 128 entries, the spatial aggregation process takes 213 s. By contrast, if the process were to loop over the windows, the collection of shape distributions would take 5100 s for the same setup, about 24 times more than the proposed implementation.
Fig. 7. Damage Type: Conflict. Area: Malakal, Upper Nile State, South Sudan. Top row: the pre- and post-event images are displayed side by side with the UN change points of interest overlaid in green; the right side shows the soft change detection obtained with SD LLC 4, with 5% of the relevant changes given for training the linear SVM. Bottom row: a crop of the same data is displayed at higher resolution, showing the spatial precision of the change detection.
Fig. 8. Damage Type: Conflict. Area: Bossangoa, Ouham, Central African Republic. Top row: the pre- and post-event images are displayed side by side with the UN change points of interest overlaid in green; the right side shows the soft change detection obtained with SD LLC 4, with 5% of the relevant changes given for training the linear SVM. Bottom row: a crop of the same data is displayed at higher resolution, showing the spatial precision of the change detection.
Indeed, our implementation allows computing the elements that are shared by several windows only once.
No speed-up is proposed for the construction of the dictionary and the LLC, as this step takes 100 seconds for the considered image. Summing the three extraction parts, the proposed representation for a 10000×10000 image can be computed on a single CPU in 685 seconds, as opposed to a straightforward algorithmic implementation requiring 7744 seconds. Overall, we observe a speed-up on the order of 10×, which is critical for rapid damage assessment. Moreover, this single-CPU implementation can be further parallelized by processing image blocks, which can lead to further speed-ups.
E. Illustrations and Discussion
Fig. 9. Windows of 250 meters are randomly picked over the four illustrated AOIs. For each window, the number of reference points is computed and compared to the average of the automatically extracted change index.

Four illustrations of 4-neighbors LLC-based change detection are given in Figs. 7–11. The first two examples are instances of the armed conflicts hitting regions in South Sudan and the Central African Republic. The next two
examples are instances of earthquake- and typhoon-provoked damages, which hit areas of Pakistan and the Philippines. While the settlement patterns look different, they all cover areas with well-separated buildings; our dataset does not cover damages occurring over high-rise or high-density urban areas, which may decrease the accuracy of the approach. Both pre- and post-event panchromatic imagery are displayed for the full collected AOIs, with the UNITAR/UNOSAT geolocated points of interest overlaid in green. This reference-dataset representation allows a visual comparison to the supervised detection, displayed in a colormap ranging from blue to red, where the supervised detections were obtained using 4-neighbors LLC-based features and a linear SVM model trained with 25% of the positive examples.
The four examples highlight the good match, at both global and close scale, between the change detection and the UN geolocated points. While the scenes undergo major changes of a phenological nature (see Fig. 7), thanks to the training examples the method is able to narrow the search to the relevant change areas. Indeed, no major false alarms are observed among the 4 sets. The examples of Figs. 8 and 10 also highlight the spatial inconsistencies between imagery and reference dataset; because of the tiling mechanism, these spatial errors do not have a major impact on the final results, as shown earlier in Fig. 5. While it is clear that detecting the points individually is a harder problem, the results of Fig. 9 show that the representation is generic enough to capture the densities of interest points. Windows of 250 meters are randomly picked in the 4 detailed AOIs, and the number of reference points and the average change index are computed for each window. The normalized correlation between the two is evaluated at 0.95 for the considered datasets. Also, these change density maps can easily be turned into counting metrics by simple regression.
We conduct another set of qualitative experiments by visualizing and analyzing detected instances versus unde-
tected ones in Figure 12. Most of the missed damages were subtle, naturally making them challenging to automat-
ically detect. For cases where the damage occurred over buildings or well-defined structures, our representation is
able to detect them with high precision. Mistakes tend to be made over subtle changes or damages covering only a
small area in an image chip such as an isolated house. Gaining in spatial precision would require multi-resolution
Fig. 10. Damage Type: Earthquake. Area: Gajar Area, Awaran District, Balochistan Province, Pakistan. Top row: the pre- and post-event images are displayed side by side with the UN change points of interest overlaid in green; the right side shows the soft change detection obtained with SD LLC 4, with 5% of the relevant changes given for training the linear SVM. Bottom row: a crop of the same data is displayed at higher resolution, showing the spatial precision of the change detection.
analysis, such as a pyramid-based representation, along with better geo-located reference points.
VI. CONCLUSION
In this work, we proposed shape distributions with a locally linear coding scheme as a generalizable representation of image-patches within the context of performing large-scale damage analysis. We presented a benchmark dataset of 86 high-resolution image-pairs to evaluate the improvement of locally linear coding (LLC) over hard assignment when representing the image components encoded in a tree of shapes. The benchmark dataset was built by combining DigitalGlobe archive panchromatic imagery with damage reference datasets provided by UNITAR/UNOSAT photo-interpreters in real-world scenarios. We created the benchmark dataset to encompass as much of the variability found in the context of rapid post-disaster mapping as possible. In our empirical analysis, the use of LLC-based shape distributions showed an improvement of 5% in equal error rate (EER) in comparison to other hard- and soft-assignment-based representations. This improvement translates into a reduction by 30% of the search space for an equivalent recall of the change instances. Finally, a robustness analysis shows the low correlation
Fig. 11. Damage Type: Typhoon. Area: Bentiqui, Leyte, Philippines. Top row: the pre- and post-event images are displayed side by side with the UN change points of interest overlaid in green; the right side shows the soft change detection obtained with SD LLC 4, with 5% of the relevant changes given for training the linear SVM. Bottom row: a crop of the same data is displayed at higher resolution, showing the spatial precision of the change detection.
between the expected EER and both the acquisition angles and small ground displacements, showing the adequacy of our representation for change analysis in a variety of different contexts.
In our future work, we intend to release the dataset as an open benchmark to allow more researchers to evaluate their change detection techniques. We are also interested in expanding our framework with multi-scale spatial pyramid encoding, to enable the representation of isolated changes which are currently not adequately captured by our single-sized patch-based representation.
Fig. 12. Example damaged areas detected by our framework ("Damages Detected") as well as the ones our framework missed ("Damages Missed"); for each, pre-event and post-event chips are shown.
REFERENCES
[1] G.-S. Xia, J. Delon, and Y. Gousseau, “Shape-based invariant texture indexing,” IJCV, vol. 88, no. 3, pp. 382–403, 2010. [Online].
Available: http://dx.doi.org/10.1007/s11263-009-0312-3
[2] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, “Locality-constrained linear coding for image classification,” in IEEE CVPR,
2010, pp. 3360–3367.
[3] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories,” in
IEEE CVPR, vol. 2, 2006, pp. 2169–2178.
[4] A. Singh, “Digital change detection techniques using remotely-sensed data,” Int. J. Remote Sensing, vol. 10, no. 6, pp. 989–1003, 1989.
[5] P. Gamba, F. Dell'Acqua, and G. Trianni, “Rapid damage detection in the Bam area using multitemporal SAR and exploiting ancillary data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 6, pp. 1582–1589, June 2007.
[6] D. Al-Khudhairy, I. Caravaggi, and S. Giada, “Structural damage assessments from IKONOS data using change detection, object-oriented segmentation, and classification techniques,” Photogrammetric Engineering & Remote Sensing, vol. 13, no. 7, pp. 825–837, 2005.
[7] L. Gueguen, M. Pesaresi, P. Soille, and A. Gerhardinger, “Morphological descriptors and spatial aggregations for characterizing damaged buildings in very high resolution images,” in Proc. of the ESA-EUSC-JRC 2009 Conference on Image Information Mining: Automation of Geospatial Intelligence from Earth Observation, Madrid, Spain, Nov. 2009.
[8] S. W. Myint, M. Yuan, R. S. Cerveny, and C. P. Giri, “Comparison of remote sensing image processing techniques to identify tornado damage areas from Landsat TM data,” Sensors, vol. 8, no. 2, pp. 1128–1156, 2008.
[9] T. Kemper, M. Jenerowicz, L. Gueguen, D. Poli, and P. Soille, “Monitoring changes in the Menik Farm IDP camps in Sri Lanka using multi-temporal very high-resolution satellite data,” International Journal of Digital Earth, vol. 4, no. sup1, pp. 91–106, 2011. [Online]. Available: http://dx.doi.org/10.1080/17538947.2010.512430
[10] L. Gueguen, P. Soille, and M. Pesaresi, “Change detection based on information measure,” IEEE TGRS, vol. 49, no. 11, pp. 4503–4515,
2011.
[11] F. Bovolo and L. Bruzzone, “A theoretical framework for unsupervised change detection based on change vector analysis in the polar
domain,” IEEE Tran. Geoscience and Remote Sensing, vol. 45, no. 1, pp. 218–236, Jan. 2007.
[12] L. Bruzzone and D. Prieto, “Automatic analysis of the difference image for unsupervised change detection,” IEEE TGARS, vol. 38, no. 3, pp. 1171–1182, May 2000.
[13] A. Nielsen, “The regularized iteratively reweighted MAD method for change detection in multi- and hyperspectral data,” IEEE Tran. Image Processing, 2007.
[14] M. Schröder, H. Rehrauer, K. Seidel, and M. Datcu, “Spatial information retrieval from remote-sensing images. II. Gibbs-Markov random fields,” IEEE Transactions on Geoscience and Remote Sensing, vol. 36, no. 5, pp. 1446–1455, Sept. 1998.
[15] J. Inglada and G. Mercier, “A new statistical similarity measure for change detection in multitemporal SAR images and its extension to multiscale change analysis,” IEEE TGARS, vol. 45, no. 5, pp. 1432–1445, May 2007.
[16] L. Bruzzone, D. Prieto, and S. Serpico, “A neural-statistical approach to multitemporal and multisource remote-sensing image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 37, no. 3, pp. 1350–1359, May 1999.
[17] B. Demir, F. Bovolo, and L. Bruzzone, “Classification of time series of multispectral images with limited training data,” IEEE Transactions on Image Processing, vol. 22, no. 8, pp. 3219–3233, Aug. 2013.
[18] L. Gueguen and R. Hamid, “Large-scale damage detection using satellite imagery,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[19] United Nations Institute for Training and Research (UNITAR/UNOSAT), http://www.unitar.org/unosat/maps.
[20] S. Cui and M. Datcu, “Coarse to fine patches-based multitemporal analysis of very high resolution satellite images,” in Analysis of
Multi-temporal Remote Sensing Images (Multi-Temp), 2011 6th International Workshop on the. IEEE, 2011, pp. 85–88.
[21] G. Mercier, G. Moser, and S. Serpico, “Conditional copulas for change detection in heterogeneous remote sensing images,” IEEE TGARS, vol. 46, no. 5, pp. 1428–1441, May 2008.
[22] L. Gueguen, “Classifying compound structures in satellite images: A compressed representation for fast queries,” IEEE TGARS, vol. 53,
no. 4, pp. 1803–1818, April 2015.
[23] ——, “Image patch characterization with shape distributions: Application to WorldView-2 images,” in IEEE IGARSS, Melbourne, Australia, 2013.
[24] P. Monasse and F. Guichard, “Fast computation of a contrast-invariant image representation,” IEEE Transactions on Image Processing, vol. 9, no. 5, pp. 860–872, May 2000.
[25] E. Urbach, J. Roerdink, and M. Wilkinson, “Connected shape-size pattern spectra for rotation and scale-invariant classification of gray-scale
images,” IEEE Tran. PAMI, vol. 29, no. 2, pp. 272–285, 2007.
[26] E. Nowak, F. Jurie, and B. Triggs, “Sampling strategies for bag-of-features image classification,” in Computer Vision – ECCV 2006, ser. Lecture Notes in Computer Science, A. Leonardis, H. Bischof, and A. Pinz, Eds. Springer Berlin Heidelberg, 2006, vol. 3954, pp. 490–503. [Online]. Available: http://dx.doi.org/10.1007/11744085_38
[27] H. Jegou, M. Douze, and C. Schmid, “Improving bag-of-features for large scale image search,” International Journal of Computer Vision, vol. 87, no. 3, pp. 316–336, 2010. [Online]. Available: http://dx.doi.org/10.1007/s11263-009-0285-2
[28] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, “LIBLINEAR: A library for large linear classification,” Journal of
Machine Learning Research, vol. 9, pp. 1871–1874, 2008.
[29] T. Géraud, E. Carlinet, S. Crozet, and L. Najman, “A quasi-linear algorithm to compute the tree of shapes of nD images,” in ISMM, 2013. [Online]. Available: http://hal.archives-ouvertes.fr/docs/00/79/86/20/PDF/geraud.2013.ismm.pdf
[30] W. Hesselink, “Salembier's min-tree algorithm turned into breadth first search,” Information Processing Letters, vol. 88, pp. 225–229, 2003.
[31] L. Gueguen and G. Ouzounis, “Hierarchical data representation structures for interactive image information mining,” International
Journal of Image and Data Fusion, vol. 3, no. 3, pp. 221–241, 2012. [Online]. Available: http://www.tandfonline.com/doi/abs/10.1080/
19479832.2012.697924
[32] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Computer Vision and Pattern Recognition,
2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol. 1, 2001, pp. I–511–I–518 vol.1.
... C HANGE detection (CD) plays a crucial role in remote sensing image (RSI) processing, which is the quantitative analysis of land use changes from RSIs captured at different time periods. Beyond its fundamental role, CD finds application in various real-world scenarios, such as disaster assessment [1]- [3], land cover recognition [4]- [6], and forest monitoring [7]- [9]. ...
Article
Full-text available
Semi-supervised change detection (CD) methods have garnered increasing attention due to their capacity to alleviate the dependency of fully-supervised methods on a large number of pixel-level labels. These methods predominantly leverage Generative Adversarial Network (GAN) architecture and consistency regularization technology. However, they encounter challenges associated with background noise from cross-temporal images. In this paper, we propose a novel multi-level consistency-regularization-based semi-supervised CD approach that incorporates Fourier-based frequency transformation and a reliable pseudo label selection scheme. Specifically, we replace the low-frequency spectrum of one temporal image with a frequency domain transformation derived from the corresponding image in the same bi-temporal remote sensing image (RSI) pair, enhancing the model's capability to discern meaningful changes amidst background noise, thereby contributing to more robust change detection. Furthermore, excessively high pseudo-label thresholds in consistency regularization methods may result in the underutilization of valuable unlabeled data. To address this issue, we design a straightforward sigmoid-like function to dynamically adjust the selection threshold for the reliable pseudo label selection scheme. This strategy takes into consideration the learning status throughout the entire training process, ensuring more effective utilization of unlabeled information. We demonstrate significant performance improvements across three widely-used public datasets, namely LEVIR-CD, WHU-CD, and CDD. Notably, on the three datasets with only 1% labeled data, our method achieved an $IoU^{c}$ of 71.29%, 63.90%, and 51.00%, outperforming existing state-of-the-art methods by 2.84%, 1.21%, and 0.98%, respectively. These results robustly substantiate the effectiveness of our approach, showcasing its potential in scenarios where labeled data is limited.
... C HANGE detection in dual-temporal remote sensing imagery is a crucial component of terrestrial change monitoring and is extensively applied in fields such as urban planning [1], forest cover mapping [2], and disaster damage assessment [3], [4]. Although the definition of change detection varies by application, its core objective is to mark binary changes on the surface from registered images taken at two different times [5]- [8]. ...
Article
Full-text available
In recent years, change detection (CD) methods have faced challenges in being applied to various types of remote sensing datasets and related research fields, particularly in the domain of change detection in remote sensing images. While convolutional neural networks (CNNs) have significantly advanced change detection in remote sensing images, they struggle with modeling long-distance dependencies between image pairs, leading to poor recognition of semantically similar objects with different features. Meanwhile, Transformer technology has gained widespread popularity for global applications, but it lacks in extracting local features effectively. Current approaches typically rely on single or dual-branch network structures for mining change-related features in remote sensing images, yet they still lack in extracting both local and global features comprehensively. To address these issues, this paper proposes a triple-branch network combining Transformer and CNN, comprising CNN, Transformer, and Channel feature-guided branch. These branches extract and fuse three types of change features from both global and local perspectives. Importantly, the Channel feature-guided branch is introduced to capture continuous and detailed change relationship features, thus enhancing the model's change discrimination ability. Experimental results on three datasets (LEVIR-CD, WHU-CD, and GZ-CD) demonstrate the superior performance of the model over state-of-the-art methods.
... Change detection, which is dedicated to monitoring the dynamic change of land surface features, plays an increasing role in remote sensing applications [1], such as urban sprawl monitoring [2], forest cover change surveys [3], disaster damage assessment (e.g., landslides, earthquakes) [4,5], and others [6]. ...
Article
Full-text available
Optical satellite image change detection has attracted extensive research due to its comprehensive application in earth observation. Recently, deep learning (DL)-based methods have become dominant in change detection due to their outstanding performance. Remote sensing (RS) images contain different sizes of ground objects, so the information at different scales is crucial for change detection. However, the existing DL-based methods only employ summation or concatenation to aggregate several layers of features, lacking the semantic association of different layers. On the other hand, the UNet-like backbone is favored by deep learning algorithms, but the gradual downscaling and upscaling operation in the mainstream UNet-like backbone has the problem of misalignment, which further affects the accuracy of change detection. In this paper, we innovatively propose a hierarchical feature association and global correction network (HFA-GCN) for change detection. Specifically, a hierarchical feature association module is meticulously designed to model the correlation relationship among different scale features due to the redundant but complementary information among them. Moreover, a global correction module on Transformer is proposed to alleviate the feature misalignment in the UNet-like backbone, which, through feature reuse, extracts global information to reduce false alarms and missed alarms. Experiments were conducted on several publicly available databases, and the experimental results show the proposed method is superior to the existing state-of-the-art change detection models.
... Besides, compared to natural scene images, the special imaging mechanism of remote sensing enables RS images to contain a more abundant variety of ground objects. CD is therefore widely applied in various earth observation tasks, such as urban expansion [1,2], land utilization [3,4], water cover [5,6], and disaster assessment [7,8]. ...
Preprint
Full-text available
In change detection (CD), reducing the interference of pseudo-changes and accurately recognizing the change of interest (COI) are two important challenges. Recently, considering the powerful long-distance modeling ability of the Transformer, some methods have introduced it into CD and proposed several useful CD strategies. However, the existing strategies either do not operate directly on the COI or struggle to fully exploit the Transformer's advantages. In this paper, we therefore propose a new CD strategy to tackle these challenges. Specifically, we focus on the difference domain and propose a differential-feature triple-refinement strategy to precisely characterize the COI. We first adopt a CNN-based differential feature extraction (DFET) module to extract possible detail differences between bitemporal images. Then, we introduce a Transformer-based differential feature enhancement (DFEH) module to capture and enhance the COI regions from the preliminarily extracted differences. Finally, we utilize a CNN-based differential feature fusion (DFFS) module to integrate fine-grained information into the enhanced COI regions. Based on the proposed strategy, we design a new network named DiFormer. We verify six effective hyperparameter configurations and conduct experiments on four commonly researched CD datasets. Extensive experimental results indicate that our proposed strategy has outstanding generalization ability and achieves a better balance between computational cost and model performance. Notably, even when adopting only Natural Scene Image Pretraining (NSIP), our method still exceeds recently proposed CD methods that focus specifically on improving Remote Sensing Image Pretraining (RSIP).
Article
Remote sensing change detection (RSCD) focuses on identifying regions that have undergone changes between two remote sensing images captured at different times. Recently, convolutional neural networks (CNNs) have shown promising results on the challenging RSCD task. However, these methods do not efficiently fuse bitemporal features or extract information beneficial to subsequent RSCD stages. In addition, they do not consider multilevel feature interactions during feature aggregation and ignore the relationships between difference features and bitemporal features, which affects the RSCD results. To address these problems, a difference-guided multiscale aggregation attention network, DGMA²-Net, is developed. Bitemporal features at different levels are extracted through a Siamese convolutional network, and a multiscale difference fusion module (MDFM) then fuses the bitemporal features and extracts, in a multiscale manner, difference features containing rich contextual information. After the MDFM, two difference aggregation modules (DAMs) aggregate difference features at different levels for multilevel feature interaction. The features from the DAMs are sent to difference-enhanced attention modules (DEAMs) to strengthen the connections between bitemporal and difference features and further refine the change features. Finally, the refined change features are superimposed from deep to shallow and a change map is produced. To validate the effectiveness of DGMA²-Net, a series of experiments is conducted on three public RSCD benchmark datasets (LEVIR-CD, BCDD, and SYSU-CD). The experimental results demonstrate that DGMA²-Net surpasses eight current state-of-the-art RSCD methods. Our code is released at https://github.com/yikuizhai/DGMA2-Net.
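For readers unfamiliar with the Siamese-plus-difference pattern this abstract builds on, the following is a hedged sketch (assuming PyTorch is available); it is not the DGMA²-Net architecture, and the layer sizes, module names, and the absolute-difference fusion are illustrative assumptions.

```python
# Minimal Siamese change-detection skeleton: one shared-weight encoder
# processes both temporal images; their feature difference drives a
# per-pixel change prediction head.
import torch
import torch.nn as nn

class SiameseDifference(nn.Module):
    def __init__(self, in_ch: int = 3, feat_ch: int = 16):
        super().__init__()
        # Shared encoder: the same weights are applied to both images.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.head = nn.Conv2d(feat_ch, 1, kernel_size=1)  # per-pixel change logit

    def forward(self, t1: torch.Tensor, t2: torch.Tensor) -> torch.Tensor:
        f1, f2 = self.encoder(t1), self.encoder(t2)
        return self.head(torch.abs(f1 - f2))  # difference features -> change map

model = SiameseDifference()
t1 = torch.randn(1, 3, 64, 64)  # pre-event image
t2 = torch.randn(1, 3, 64, 64)  # post-event image
print(model(t1, t2).shape)      # torch.Size([1, 1, 64, 64])
```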
Article
Change detection (CD) in high-resolution remote sensing has received much attention due to its wide range of applications. Many methods have been proposed in the literature and have achieved excellent performance. However, they are often fully supervised, requiring abundant pixel-level labeled samples, which are time-consuming and labor-intensive to obtain; compared with common single-temporal interpretation, labeling bitemporal images is especially complicated. Therefore, this study adopts weakly supervised learning (WSL) to reduce label acquisition costs. However, changed regions are small, fragmented, and similar to the background, which widens the gap between weakly supervised and fully supervised tasks. To address these difficulties, we explore self-supervised methods to construct a WSL framework based on image-level labels for general CD, termed WSLCD in this article. First, we design a double-branch Siamese network that takes the original image pair and a spatially transformed image pair as input and derives embeddings and initial class attention maps (CAMs). Second, mutual learning and equivariant regularization (MLER) are enforced on the CAMs from different views, imposing consistency constraints in confusion regions and letting the CAMs learn from each other based on saliency regions. Furthermore, prototype-based contrastive learning (PCL) is designed so that unreliable pixels can learn from prototypes computed from reliable pixels. PCL includes intraview contrast and cross-view contrast, depending on whether the prototypes and class embeddings come from the same view. With the above strategies, we narrow the gap between image-level weakly supervised CD and fully supervised CD. Experiments are conducted on three CD datasets, including CLCD, DSIFN, and GCD. Our method achieves state-of-the-art performance on pseudo-label generation and CD. The code is available at https://github.com/mfzhao1998/WSLCD.
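To illustrate the prototype idea the abstract relies on, here is a hedged numpy sketch: class prototypes are averaged from embeddings of reliable pixels, and unreliable pixels are assigned to the nearest prototype. The toy data and the cosine-similarity assignment rule are assumptions, not the paper's exact contrastive losses.

```python
# Prototype computation from reliable (labeled) pixels and
# nearest-prototype assignment for unreliable pixels.
import numpy as np

def prototypes(embeddings: np.ndarray, labels: np.ndarray, n_classes: int) -> np.ndarray:
    """Mean embedding per class, computed over reliable pixels."""
    return np.stack([embeddings[labels == c].mean(axis=0) for c in range(n_classes)])

def assign_by_prototype(embeddings: np.ndarray, protos: np.ndarray) -> np.ndarray:
    """Label unreliable pixels by cosine similarity to each prototype."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return (e @ p.T).argmax(axis=1)

rng = np.random.default_rng(0)
reliable = rng.normal(size=(100, 8))      # embeddings of reliable pixels
labels = rng.integers(0, 2, size=100)     # 0 = unchanged, 1 = changed
protos = prototypes(reliable, labels, n_classes=2)
unreliable = rng.normal(size=(20, 8))     # embeddings of unreliable pixels
print(assign_by_prototype(unreliable, protos))
```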
Article
Compared with binary change detection (BCD), semantic change detection (SCD) additionally provides the category information of bitemporal changed regions, which is significant for practical Earth observation applications. Although recently proposed triple-branch structures, comprising one BCD branch and two classification branches, can effectively balance the tasks, they still need carefully designed difference-extraction modules and branch interactions to capture bitemporal correlations, which increases the complexity of exploiting the semantic information. In this paper, we propose a new triple-branch network named JFRNet to tackle this challenge. From the perspective of the SCD process, because both the category information and the change information derive from the bitemporal images, we take the joint bitemporal features as the unified input, which helps each branch perceive the bitemporal semantic correlations without any additional interaction operations. From the perspective of the SCD structure, we introduce a convolutional attention fusion module (CAFM) and a convolutional attention refinement module (CARM) to unify the branch structure, which helps our model refine branch-specific semantic information without any specially designed difference-extraction modules. Extensive experimental results on three available datasets indicate that, compared with the baseline methods, our proposed JFRNet successfully simplifies the reasoning process and obtains better SCD performance.
Article
Change detection in multitemporal remote sensing images aims to generate a difference image (DI) and then analyze it to identify unchanged/changed areas. Current change detection techniques typically investigate a single change detection task, taking two images from the image series at a time, and may ignore relevant information shared across different tasks. Furthermore, theoretical results have shown that the distribution of the DI can be described by a Rayleigh-Rice mixture model (RRMM). The parameters of the RRMM are usually estimated by the expectation-maximization (EM) algorithm, which is prone to getting trapped in local minima. To address these issues, a selective-transfer-based evolutionary multitasking change detection method is proposed to handle multiple change detection tasks concurrently. For each change detection task, the log-likelihood function and the centroid distance function are treated as two objectives to be optimized simultaneously. In the proposed method, an RRMM parameter-estimation-driven initialization with random partitioning of the data is designed using maximum likelihood estimates of the parameters. The next population is then generated by intra-task and inter-task genetic transfer operators. A selective-knowledge-transfer-based local search strategy further improves the population by applying the EM algorithm; in this strategy, samples in the unchanged class of multiple tasks are used to estimate the parameters, acquiring knowledge transferred from the other tasks. Experiments on three real remote sensing data sets demonstrate that the method accelerates convergence and achieves superior accuracy.
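To make the EM step concrete, here is an illustrative two-component mixture fit to difference-image values. For brevity this sketch substitutes Gaussian components for the Rayleigh-Rice mixture (RRMM) named in the abstract; the E/M alternation, and the sensitivity to initialization that the paper targets, are the same.

```python
# Illustrative EM for a two-component Gaussian mixture on DI values
# (a stand-in for the Rayleigh-Rice mixture of the cited work).
import numpy as np

def em_two_component(x: np.ndarray, n_iter: int = 50):
    # Crude initialization: split at the median. EM is init-sensitive,
    # which is exactly the weakness the abstract discusses.
    mu = np.array([x[x <= np.median(x)].mean(), x[x > np.median(x)].mean()])
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each sample.
        dens = np.stack([
            pi[k] / (sigma[k] * np.sqrt(2 * np.pi))
            * np.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2)
            for k in range(2)
        ])
        resp = dens / dens.sum(axis=0)
        # M-step: update mixture weights, means, and standard deviations.
        nk = resp.sum(axis=1)
        pi = nk / len(x)
        mu = (resp * x).sum(axis=1) / nk
        sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk)
    return pi, mu, sigma

rng = np.random.default_rng(1)
di = np.concatenate([rng.normal(1, 0.3, 900),   # unchanged pixels
                     rng.normal(4, 0.5, 100)])  # changed pixels
print(em_two_component(di))
```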
Article
The remote-sensing image change detection (CD) task plays an important role in land-use surveys, city construction investigations, and other vital industries. Recently, deep learning has become the mainstream approach for this task due to its satisfactory performance in most cases. However, it often has difficulty with ambiguous regions, where pseudo-changes occur or real changes are corrupted. In this article, we propose an ambiguity-aware network (AANet) to address this issue. Specifically, our network first adopts convolutional layers to learn features from dual-temporal images. An ambiguity refinement module (ARM) is then designed to extract the ambiguous regions, from which difference features are generated. Considering that changed objects vary in scale, a weight rearrangement module (WRM) is proposed to fuse the difference features from different layers. To test the performance of the proposed model, we conduct experiments on three benchmark datasets: SYSU-CD, SVCD, and LEVIR-CD. The experimental results show that our model outperforms several state-of-the-art models on all three datasets, validating its effectiveness. The source code of our proposed model will be released at https://github.com/KevinDaldry/AANet.
Conference Paper
Full-text available
Satellite imagery is a valuable source of information for assessing damage in distressed areas undergoing a calamity, such as an earthquake or an armed conflict. However, the sheer amount of data that must be inspected makes a manual assessment impractical. To address this problem, we present a semi-supervised learning framework for large-scale damage detection in satellite imagery. We present a comparative evaluation of our framework using over 88 million images collected from 4,665 square kilometers at 12 different locations around the world. To enable accurate and efficient damage detection, we introduce a novel use of hierarchical shape features in the bags-of-visual-words setting. We analyze how practical factors such as sun, sensor-resolution, and satellite-angle differences impact the effectiveness of our proposed representation, and compare it to five alternative features in multiple learning settings. Finally, we demonstrate through a user study that our semi-supervised framework yields a ten-fold reduction in human annotation time, at a minimal loss in detection accuracy, compared to an exhaustive manual inspection.
Article
Full-text available
With the increased spatial resolution of current sensor constellations, more details are captured about our changing planet, enabling the recognition of a greater range of land use/land cover classes. While pixel- and object-based classification approaches are widely used for extracting information from imagery, recent studies have shown the importance of spatial context for discriminating more specific and challenging classes. This paper proposes a new compact representation for the fast query/classification of compound structures from very high resolution optical remote sensing imagery. This bag-of-features representation relies on the multiscale segmentation of the input image and the quantization of image structures, pooled into visual-word distributions, for the characterization of compound structures. A compressed form of the visual-word distributions is described, allowing adaptive and fast queries/classification of image patterns. The proposed representation and the query methodology are evaluated on the classification of the UC Merced 21-class data set, on the detection of informal settlements, and on the discrimination of challenging agricultural classes. The results show that the proposed representation competes with state-of-the-art techniques. In addition, the complexity analysis demonstrates that the representation requires about 5% of the image storage space while allowing queries at speeds down to 1 s per 1000 km² per CPU for 2-m multispectral data.
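The quantize-and-pool step at the heart of such bag-of-features representations can be sketched in a few lines. In this hedged example, raw random vectors stand in for the multiscale-segment descriptors of the cited work, and SciPy's generic k-means replaces whatever codebook-learning procedure the paper uses.

```python
# Bag-of-features sketch: quantize local descriptors against a learned
# codebook and pool them into a normalized visual-word histogram.
import numpy as np
from scipy.cluster.vq import kmeans2, vq

rng = np.random.default_rng(2)
descriptors = rng.normal(size=(500, 16))  # local descriptors from many patches
codebook, _ = kmeans2(descriptors, 32, minit="points")  # 32-word codebook

def bof_histogram(patch_descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Quantize descriptors to nearest codewords and pool into a histogram."""
    words, _ = vq(patch_descriptors, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(np.float64)
    return hist / hist.sum()  # normalized visual-word distribution

print(bof_histogram(rng.normal(size=(40, 16)), codebook))
```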
Conference Paper
Full-text available
To compute the morphological self-dual representation of images, namely the tree of shapes, state-of-the-art algorithms do not have satisfactory time complexity. Furthermore, the proposed algorithms are only effective for 2D images and are far from simple to implement. This is a serious limitation, since a self-dual representation of images is a structure that gives rise to many powerful operators and applications, and could be very useful for 3D images. In this paper we propose a simple-to-write algorithm to compute the tree of shapes; it works for nD images and has quasi-linear complexity when the data quantization is low, typically 12 bits or less. To obtain this result, the paper introduces a novel representation of images that has some remarkable continuity properties while remaining discrete.
Conference Paper
Full-text available
This paper describes an image patch characterization for image information mining tasks. An image patch is first decomposed into a multi-scale segmentation thanks to the Max-Tree representation. Then, each segment is described by shift-invariant shape attributes. Finally, the segment attributes are aggregated into a shape distribution which constitutes the patch characterization. Illustrations of this image content description are given for patches of a WorldView-2 multi-spectral scene, and the information relevance is assessed by an automatic classification of the patch characteristics which is compared to land use/land cover annotations.
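The segment-attributes-to-histogram pipeline described here can be sketched as follows. This is a hedged simplification: simple thresholding stands in for the Max-Tree decomposition, and component area is the only shape attribute shown, whereas the cited work uses richer shift-invariant attributes.

```python
# Shape-distribution sketch: segment a patch at several levels, measure
# a shape attribute per connected component, and pool into a histogram.
import numpy as np
from scipy import ndimage

def shape_distribution(image: np.ndarray, n_levels: int = 4, bins: int = 8) -> np.ndarray:
    """Histogram of log-areas of connected components across threshold levels."""
    areas = []
    for t in np.linspace(image.min(), image.max(), n_levels, endpoint=False):
        labels, n = ndimage.label(image > t)           # one segmentation level
        areas.extend(np.bincount(labels.ravel())[1:])  # component sizes (skip background)
    hist, _ = np.histogram(np.log1p(areas), bins=bins, range=(0, 10))
    return hist / max(hist.sum(), 1)                   # normalized shape distribution

rng = np.random.default_rng(3)
patch = ndimage.gaussian_filter(rng.normal(size=(64, 64)), sigma=3)
print(shape_distribution(patch))
```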
Article
Full-text available
Recent improvements in the spatial resolution of commercial satellite imagery make it possible to apply very high-resolution (VHR) satellite data to assessing structural damage in the aftermath of humanitarian crises, such as armed conflicts. Visual interpretation of pre- and post-crisis very high-resolution satellite imagery is the most straightforward method for discriminating structural damage and assessing its extent. However, the feasibility of using visual interpretation alone diminishes for large and dense urban settlements and for spatial resolutions of 2 to 3 m and coarser. Visual interpretation can be further complicated at spatial resolutions greater than 1 m if accompanied by shadow formation and by differences in sensor and solar conditions between the pre- and post-conflict images. In this study, we address these problems by investigating the use of traditional change detection techniques, namely image differencing and principal component analysis, together with an object-oriented image classification software, e-Cognition. Pre-conflict Ikonos (2 m resolution) images of Jenin in the Palestinian territories and of Brest (1 m resolution) in FYROM were classified using the e-Cognition software. Thereafter, the pre-conflict classification was used to guide the e-Cognition classification of the pixel-based change detection results. The second part of the study examines the feasibility of using mathematical morphological operators to automatically identify likely structurally damaged zones in dense urban settings. The overall results are promising and show that object-oriented segmentation and classification systems facilitate the interpretation of change detection results derived from very high-resolution (1 m and 2 m) commercial satellite data. The results also show that object-oriented classification techniques enhance the quantitative analysis of traditional pixel-based change detection applied to very high-resolution satellite data and facilitate the interpretation of changes in urban features. Finally, the results suggest that mathematical morphological methods are a promising avenue for automatically extracting likely damaged zones from very high-resolution satellite imagery in the aftermath of disasters.
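To make the PCA change analysis evaluated above concrete, here is a hedged sketch on co-registered pre/post images: pixel pairs are decorrelated and the second principal component, where coherent no-change structure cancels, is thresholded. The synthetic data and the 2-sigma threshold are illustrative assumptions, not the study's exact procedure.

```python
# PCA-based change sketch: project pre/post pixel pairs onto principal
# axes; the second component measures departure from the no-change line.
import numpy as np

def pca_change(pre: np.ndarray, post: np.ndarray) -> np.ndarray:
    X = np.stack([pre.ravel(), post.ravel()], axis=1).astype(np.float64)
    X -= X.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    pc2 = X @ vt[1]                     # component orthogonal to the no-change axis
    mask = np.abs(pc2) > 2 * pc2.std()  # flag pixels far from the no-change line
    return mask.reshape(pre.shape)

rng = np.random.default_rng(4)
pre = rng.normal(size=(64, 64))
post = pre + rng.normal(scale=0.05, size=pre.shape)
post[10:20, 10:20] += 3.0               # simulated damage region
print(pca_change(pre, post).sum())
```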
Article
Full-text available
A methodology for extracting complex structures from VHR images is presented. Using morphological descriptors to represent simple structures, the method aggregates them in a fuzzy-logic framework. The result is a map of complex-structure membership. The methodology is then adapted to characterizing damaged buildings in a VHR scene. During the characterization process, the image analyst's uncertainty knowledge is incorporated through visual inspection. Damaged-building memberships obtained on a wide Quickbird scene show the effectiveness of the methodology in terms of reliability and user-machine interaction. Such a representation is also shown to be versatile, for instance in the extraction of spatial statistics.
Article
In this article an interactive image information mining protocol is presented, aiming at computationally efficient pattern interpretation. The method operates on very high resolution (VHR) optical remote-sensing imagery and follows a modular approach. Images are projected onto a hierarchical image representation structure, the Max-Tree, which exposes multi-dimensional features of the image components. Positive and negative samples are selected interactively from the image space and are translated into features that best describe the targeted and undesired patterns. Feeding the feature entries into a hierarchical clustering algorithm, the kd-Tree, yields a structured representation that ensures fast classification. A classification is computed directly from the kd-Tree and is applied on the Max-Tree to accept or reject image components. The complete process cycle is demonstrated on gigapixel-sized VHR satellite images and requires 3 min for building the Max-Tree, 30 min for hierarchical clustering, and less than 10 s for each example-based query.
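The fast-query step in this protocol can be illustrated with a generic kd-tree nearest-neighbor lookup; SciPy's cKDTree stands in for the article's kd-Tree, and the feature dimensions, labels, and majority-vote rule are illustrative assumptions.

```python
# kd-tree retrieval sketch: index per-component feature vectors, then
# answer queries by majority vote over the k nearest labeled neighbors.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(5)
features = rng.normal(size=(10_000, 12))  # per-component feature vectors
labels = rng.integers(0, 2, size=10_000)  # interactive positive/negative samples
tree = cKDTree(features)                  # hierarchical index for fast queries

query = rng.normal(size=(1, 12))          # features of a new image component
_, idx = tree.query(query, k=5)           # five nearest labeled components
prediction = np.bincount(labels[idx[0]]).argmax()  # majority-vote accept/reject
print(prediction)
```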
Article
A variety of procedures for change detection based on comparison of multitemporal digital remote sensing data have been developed. An evaluation of results indicates that various procedures of change detection produce different maps of change even in the same environment.
Article
Image classification usually requires reliable reference data, collected for the considered image, to train supervised classifiers. Unfortunately, when time series of images are considered, this is seldom possible because of the costs associated with reference data collection. In most applications it is realistic to have reference data available for one or a few images of a time series acquired over the area of interest. In this paper, we present a novel system for automatically classifying image time series that takes advantage of images with associated reference information (i.e., the source domain) to classify images for which reference information is not available (i.e., the target domain). The proposed system exploits the knowledge already available on the source domain and, when possible, integrates it with a minimal amount of new labeled data for the target domain. In addition, it is able to handle possibly significant differences between the statistical distributions of the source and target domains. Here, the method is presented in the context of classifying remote sensing image time series, where ground reference data collection is a highly critical and demanding task. Experimental results show the effectiveness of the proposed technique. The method can work on multimodal (e.g., multispectral) images.