Towards A Generalizable Image Representation
For Large-Scale Change Detection: Application
To Generic Damage Analysis
Lionel Gueguen and Raffay Hamid
Abstract
Each year, multiple catastrophic events impact vulnerable populations around the planet. Assessing the damage
caused by these events in a timely and accurate manner is crucial for efficient execution of relief efforts to help the
victims of these calamities. Given the low accessibility of the damaged areas, high resolution optical satellite imagery
has emerged as a valuable source of information to quickly assess the extent of damage by manually analyzing the
pre- and post-event imagery of the region. To make this analysis more efficient, multiple learning techniques using a
variety of image representations have been proposed. However, most of these representations are prone to variabilities
in capture angle, sun location and seasonal variations. To evaluate these representations in the context of damage
detection, we present a benchmark of 86 pre- and post-event image-pairs with respective reference data derived
from UNOSAT assessment maps, spanning a total area of 4,665 km² from 11 different locations around the world.
The technical contribution of our work is a novel image representation based on shape distributions of image-patches
encoded with locality-constrained linear coding. We empirically demonstrate that our proposed representation provides
an improvement of at least 5% in equal-error-rate over alternate approaches. Finally, we present a thorough robustness
analysis of the considered representational schemes with respect to capture-angle variabilities and multiple sensor
combinations.
Index Terms
Change detection, post-damage assessment, shape distribution, locally linear encoding, benchmark.
I. INTRODUCTION
Each year, hundreds of catastrophic events impact vulnerable areas around the world. Assessing the extent of
damage caused by these crises is crucial in the timely allocation of resources to help the affected populations. Since
disaster-locations are usually not readily accessible, the use of either optical or SAR very high resolution satellite
imagery has emerged as a valuable source of information for estimating the impact of catastrophic events. However,
these assessments are currently made by manually analyzing both the pre- and post-event images, or only the post-event images, of distressed areas, which makes the process labor-intensive and expensive. It is therefore important to scale up damage
Lionel Gueguen and Raffay Hamid are with the Image Mining R&D group of DigitalGlobe Inc. based in Westminster, Colorado, USA.
Fig. 1. (Top) The eleven areas of interest (AOIs), shown with red dots; each area represents multiple local regions. We considered 11 local regions, spanning 4,665 km². (Bottom) Before and after imagery for different events, shown in the two rows: (1) typhoon (Philippines, Oct. 2013), (2) armed conflict (Central African Republic, Dec. 2013), (3) earthquake (Pakistan, Sept. 2013), and (4) internally displaced people's shelters (Somalia, May 2013).
detection to larger areas accurately and efficiently. Our work is a step towards solving this problem through change-detection analysis of optical pre- and post-event images.
In the following, we summarize some of the key challenges that need to be addressed in this regard, and how
our work contributes towards them:
1- Comprehensive Data-Set: Thus far, there has been a lack of comprehensive labeled data-sets that could be used
to explore automatic damage detection at scale. To this end, we present a benchmark data-set of 86 pairs of pre-
and post-event VHR optical satellite imagery of distressed areas covering 4,665 km², with the associated reference
dataset of damaged regions acquired by expert interpreters. This data-set was collected by using the satellites of
DigitalGlobe Inc. Our data-set covers 11 different regions from around the world, and spans a wide range of
terrains and climates, with a variety of damage types (see Figure 1). This data-set enables us to rigorously explore
the various facets of the problem at hand.
2- Appropriate Feature Choice: The scale of our problem naturally presents an accuracy-efficiency tradeoff for
the features being considered. To this end, we introduce the use of trees-of-shapes features [1] in a soft-assignment
locality-constrained linear encoding framework [2] that focuses more on the shape characteristics of a scene, as
opposed to its edge attributes (as done by other popular descriptors e.g., SIFT [3]). Our results show that this
difference proves to be quite important for detecting damaged areas accurately. We present a thorough empirical analysis of the effectiveness of our proposed scheme, and compare it to multiple alternatives.
3- Algorithmic Efficiency: Given that the scale of the damaged areas is usually quite large, and the identification
of damaged areas is required as soon as possible, having a framework with high algorithmic efficiency is of primary
importance. To this end, we propose several algorithmic speed-ups to implement our proposed framework.
The rest of the paper is arranged as follows. We begin by going over some of the relevant previous work within
the context of change and damage detection using satellite imagery in Section II. Section III describes in detail our
benchmark dataset used to evaluate change detection methods for post-disaster situation assessment. In Section IV
we describe an enhancement of shape distributions in terms of locality-constrained shape encoding. Results are
described in Section V, showing the improvements achieved by locally linear encoding. Finally, Section VI presents
the main conclusions of our work along with some discussion.
II. PREVIOUS WORK
The problem of change detection, especially within the context of damage detection, has been explored from multiple different perspectives. Various change detection methods [4] have been proposed to perform such analysis automatically in an efficient manner. However, most of these methods focus on only one type of disaster with only a few cases. For example, the work in [5] shows the ability to capture earthquake damages, while [6] and [7] underline the capability of detecting damaged structures due to armed conflict. The work in [8] compares image representations for tornado damages, while [9], [10] explore refugee-camp formation. Although quite useful,
these methods propose ad-hoc image representations that do not converge to a single generalizable characterization
due to the lack of a large benchmark data-set encompassing the many variabilities encountered in damage analysis.
Such variabilities include type of disaster, geographical areas, acquisition angles, and sensor combinations, among
others.
Several unsupervised change-detection methods for optical images with passive sensors have been proposed
over the past decade. Among them, a widely used technique is the change vector analysis (CVA) [11]. The CVA
technique consists of computing differences between two images measuring the same scene at two distinct instants.
As a result, each image pixel is associated with a single difference value or a multidimensional spectral change vector.
The analysis of these spectral change vectors through unsupervised classification produces a change map [12].
While this approach is quite general, it does not allow specifying the changes of interest, and results in detecting
phenological changes as much as the changes of interest. Furthermore, it can suffer from spatial inaccuracies in
image registration.
Improvements to the CVA approach have been proposed for capturing phenological changes by canonical correla-
tion analysis and local mutual information [10], [13]. Other improvements have also been proposed for overcoming
spatial inaccuracies by modeling the spatial context before the change analysis [14], [15]. The combination of
these unsupervised approaches makes it possible to generate unsupervised change heat-maps which are robust to
phenological variations and to spatial inaccuracies by modeling the most probable transformation from the pre-
TABLE I
FOUR OF DIGITALGLOBE'S SIX SATELLITES ARE USED IN OUR 86 IMAGE-PAIRS. THE PER-PIXEL RESOLUTIONS OF QUICKBIRD (QB-2), WORLDVIEW-1 (WV-1), WORLDVIEW-2 (WV-2) AND GEOEYE-1 (GE-1) ARE 0.61 M, 0.5 M, 0.46 M AND 0.41 M, RESPECTIVELY.

Satellites | QB-2 | WV-1 | WV-2 | GE-1
QB-2       |  1   |  -   |  -   |  -
WV-1       |  9   |  14  |  -   |  -
WV-2       |  3   |  23  |  6   |  -
GE-1       |  5   |  11  |  9   |  5
image to the post-image and vice-versa. However, these unsupervised approaches remain limited in their ability to capture the relevant changes as opposed to other real changes which are not of interest for the considered application.
To address this challenge, supervised methods have been proposed to focus the change heat-maps on the changes of interest [16]. Given the high skill-set required from photo-interpreters to assess the damage accurately,
acquiring reliable ground-truth training examples is particularly challenging, and therefore this dependence on
manually labelled training data has to be minimized [17]. Towards this end, semi-supervised approaches have recently been proposed, showing improvements in the collection of labelled samples [18].
III. DATA SET AND BENCHMARK
We compiled a large-scale data set to benchmark change detection methods in the context of damage detection.
To this end, eleven areas of interest (cf. Table III) around the world were selected. This selection was made
on the basis of major catastrophic events analyzed by the United Nations Institute for Training and Research
(UNITAR/UNOSAT) [19]. This international organization is responsible for publishing maps of geo-located points
indicating relevant changes on the ground. We used this information from UNITAR/UNOSAT to build our reference
dataset for the 11 selected AOIs, which is composed of 29,945 points. Examples of these geolocated change points
are illustrated in Figs. 8–11. The layout of these AOIs is shown in Figure 1. Different types of crisis events
were considered in our AOIs including armed conflicts, earthquakes, typhoons, and refugee-camp developments
(see Table III and Figure 1 for example cases).
We collected high resolution panchromatic imagery from DigitalGlobe’s archive such that we can form image-
pairs covering the considered AOIs that were acquired as close as possible to the dates of the catastrophic events.
In the context of damage analysis, the pre- and post-event images are likely to be captured from different sensors,
and with different acquisition conditions. To incorporate these acquisition variabilities, we selected multiple pairs
per AOI, resulting in a set of 86 image-pairs from the DigitalGlobe satellite constellation: QuickBird, WorldView-1, WorldView-2, and GeoEye-1, without any restriction on the acquisition angles. The variability in our sensor combinations is given in Table I.
Another important source of variability in satellite images is the acquisition angle, as it affects the directions and lengths of the shadows cast by physical features on the ground (e.g., human settlements). While maintaining similar acquisition angles helps automate change detection techniques, obtaining pairs of pre- and post-event images under such a tight time constraint cannot be guaranteed. To incorporate this important variability in our data-set, we did not apply any restriction on the angles, resulting in a uniform sampling of the multi-angle space as illustrated in Figure 2.
TABLE II
LIST OF SHAPE DESCRIPTORS USED BY OUR FRAMEWORK.

Area:          $\mu_{0,0}$
Eccentricity:  $\sqrt{\lambda_1 / \lambda_0}$, with $\lambda_i = \frac{\mu_{2,0} + \mu_{0,2}}{2} + (2i - 1)\,\frac{\sqrt{4\mu_{1,1}^2 + (\mu_{2,0} - \mu_{0,2})^2}}{2}$, $i \in \{0, 1\}$
Hu1:           $\eta_{2,0} + \eta_{0,2}$
Hu2:           $(\eta_{2,0} - \eta_{0,2})^2 + 4\eta_{1,1}^2$
TABLE III
THE 11 AREAS OF INTEREST SELECTED FOR BUILDING THE BENCHMARK DATASET FOR CHANGE DETECTION ANALYSIS. THE 11 AOIS HAVE BEEN SELECTED WITH RESPECT TO THE UNITAR/UNOSAT PHOTO-INTERPRETATION DATA PUBLISHED AT HTTP://WWW.UNITAR.ORG/UNOSAT/. VHR PANCHROMATIC IMAGE PAIRS HAVE BEEN SELECTED SO AS TO DEFINE DATE INTERVALS ENCOMPASSING, AND AS CLOSE AS POSSIBLE TO, THE GIVEN PRE- AND POST-DATES.

AOI | Disaster Type | Pre-Date | Post-Date | Number of pairs | UNITAR Ref.
Damage Assessment for Jebri Area, Awaran District, Balochistan Province, Pakistan | earthquake | 27/08/13 | 27/09/13 | 3 | 1831
Damage Assessment for Gajar Area, Awaran District, Balochistan Province, Pakistan | earthquake | 26/08/13 | 26/09/13 | 10 | 1828
Damaged Structures in Bentiqui, Leyte, Philippines | typhoon | 11/09/13 | 11/11/13 | 2 | 1866
Structural Development, Afgooye Corridor, Somalia | refugees | 12/02/13 | 25/05/13 | 10 | 1856
Damage Assessment in the City of Malakal, Upper Nile State, South Sudan | conflict | 06/12/13 | 15/03/14 | 10 | 1961
Damage Assessment in the City of Bor, Jonglei State, South Sudan | conflict | 25/12/13 | 19/01/14 | 10 | 1917
Destruction in Mayom, Unity State, South Sudan | conflict | 29/09/13 | 11/01/14 | 10 | 1922
Damage Assessment in the City of Bentiu, Unity State, South Sudan | conflict | 02/01/14 | 18/01/14 | 10 | 1919
Destruction in Rubkona, Unity State, South Sudan | conflict | 02/01/14 | 13/01/14 | 10 | 1915
Damage Assessment in Paoua, Ouham-Pende, Central African Republic | conflict | 03/11/13 | 18/06/14 | 1 | 2016
Destruction in Bossangoa Area, Ouham, Central African Republic | conflict | 05/12/13 | 22/01/14 | 10 | 1957
Fig. 2. Scatter plot showing the differences of sun and satellite angles for the 86 selected strips. The scatter is provided in angular (polar) space; the panels show the satellite azimuth and elevation differences and the sun azimuth and elevation differences, with the used pair angle differences plotted alongside the optimal ones.
IV. METHODOLOGY
Performing automatic change detection using high resolution imagery raises multiple challenges. In the following
we present our proposed methodology and show how our approach addresses these challenges.
Due to acquisition-angle differences, performing a direct pixel comparison produces many false alarms, and a window-based image description has consistently been found more robust [20], [21]. Furthermore, the changes that are relevant in a given situation vary greatly across geographies, event types, and acquisition-angle combinations. Therefore, it is
generally helpful to indicate the relevant changes with training examples to the detection system. In the following,
we propose a novel patch-based image description exploiting connected component based image analysis along
with an efficient encoding scheme to enhance the accuracy of automatic change analysis as illustrated in Fig. 3.
A. Image Encoding
Max/min-trees and the tree of shapes have shown great potential for analyzing high resolution panchromatic imagery [1], [22]. These structures allow analyzing images efficiently and compactly, without loss of information, by representing the connected components of the lower and upper level sets:

$$\chi_\lambda(u) = \{p \mid u(p) \le \lambda\}, \quad (1)$$

$$\chi^\lambda(u) = \{p \mid u(p) \ge \lambda\}, \quad (2)$$

where, for a gray-scale image $u : \Omega \mapsto \mathbb{N}$, we define $\chi_\lambda(u)$ and $\chi^\lambda(u)$ as the lower and upper level-sets of $u$. Note that the connected components of these (upper or lower) level sets are a lossless representation of $u$ and provide its segment-based representation, which is fundamentally different from edge-based image representations [3].
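To make this definition concrete, here is a minimal sketch (in Python, assuming numpy and scipy are available; the function name is ours, not the authors') that labels the connected components of the lower and upper level sets at a given threshold λ:

```python
import numpy as np
from scipy import ndimage

def level_set_components(u, lam):
    """Label the connected components of the lower and upper level sets of u.

    Lower level set: pixels with u(p) <= lam (Eq. 1); upper: u(p) >= lam (Eq. 2).
    Returns two label images, where 0 marks pixels outside the level set.
    """
    lower_labels, _ = ndimage.label(u <= lam)
    upper_labels, _ = ndimage.label(u >= lam)
    return lower_labels, upper_labels

# Tiny example: a bright 2x2 blob on a dark background.
u = np.zeros((5, 5), dtype=np.uint8)
u[1:3, 1:3] = 200
low, up = level_set_components(u, lam=100)  # `up` isolates the blob
```

A max/min-tree or tree of shapes effectively organizes these components over all thresholds at once, which is what makes the representation lossless.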
These patch-based decompositions allow the analysis of image objects at very different scales, describing tiny image structures as well as larger ones in one combined representation. When tackling damage analysis, it is
important to keep as much fine-resolution information as possible. The lower and upper level sets inherently retain this fine-grained information for further use.
In the context of satellite image analysis, it is imperative to have image descriptors which are invariant to rotation and to limited translations. To achieve these invariances, we use shape descriptors of the connected components which are rotation- and translation-invariant, namely the second- and third-order central moments [23].
Let $\{p_i\}_{i=1}^{n}$ be the $n$ pixels composing a peak component $P_{p_1}^{\lambda}(u)$. Each pixel $p_i$ is a pair of horizontal and vertical coordinates $p_i = (x_i, y_i)$. The average of a spectral band $v$ is given by:

$$A_v(P_{p_1}^{\lambda}(u)) = \frac{1}{n}\sum_{i=1}^{n} v(x_i, y_i). \quad (3)$$

The central and normalized shape moments $\mu_{a,b}$, $\eta_{a,b}$ are simply expressed for a pair of integers $(a, b) \in \mathbb{N}^2$ by:

$$\bar{x}(P_{p_1}^{\lambda}(u)) = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad (4)$$

$$\bar{y}(P_{p_1}^{\lambda}(u)) = \frac{1}{n}\sum_{i=1}^{n} y_i, \quad (5)$$

$$\mu_{a,b}(P_{p_1}^{\lambda}(u)) = \sum_{i=1}^{n} (x_i - \bar{x})^a (y_i - \bar{y})^b, \quad (6)$$

$$\eta_{a,b}(P_{p_1}^{\lambda}(u)) = \frac{\mu_{a,b}}{\mu_{0,0}^{(a+b)/2+1}}. \quad (7)$$
These moments are combined to derive the shape descriptors given in Table II. All descriptors can be computed in linear time by exploiting the nesting property of the max/min tree [24], [25], as well as of the tree of shapes.
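As a direct per-component transcription of Eqs. (4)–(7) and Table II, the following sketch (assuming numpy; a real implementation would instead accumulate these sums incrementally along the tree, which is what yields the linear-time computation mentioned above) computes the four descriptors from a component's pixel coordinates:

```python
import numpy as np

def shape_descriptors(xs, ys):
    """Area, eccentricity, Hu1 and Hu2 of a component given its pixel coords."""
    n = len(xs)
    xbar, ybar = xs.mean(), ys.mean()
    # Central moments mu_{a,b} = sum (x - xbar)^a (y - ybar)^b (Eq. 6).
    def mu(a, b):
        return np.sum((xs - xbar) ** a * (ys - ybar) ** b)
    mu00 = float(n)                                  # area, mu_{0,0}
    mu20, mu02, mu11 = mu(2, 0), mu(0, 2), mu(1, 1)
    # Eigenvalues of the second-moment matrix (Table II).
    disc = np.sqrt(4 * mu11 ** 2 + (mu20 - mu02) ** 2) / 2
    lam0 = (mu20 + mu02) / 2 - disc
    lam1 = (mu20 + mu02) / 2 + disc
    ecc = np.sqrt(lam1 / lam0) if lam0 > 0 else np.inf
    # Normalized moments eta_{a,b} = mu_{a,b} / mu_{0,0}^{(a+b)/2+1} (Eq. 7);
    # for a + b = 2 the exponent is 2.
    eta20, eta02, eta11 = mu20 / mu00 ** 2, mu02 / mu00 ** 2, mu11 / mu00 ** 2
    hu1 = eta20 + eta02
    hu2 = (eta20 - eta02) ** 2 + 4 * eta11 ** 2
    return np.array([mu00, ecc, hu1, hu2])
```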
B. Bag-Of-Visual-Words Encoding
In recent years, the bag-of-visual-words model has been extremely popular in computer vision for categorizing large corpora of images [26], [27]. The model treats an image as a collection of spatially unordered representative visual words. In our context, each constituent overlapping image patch of a large image strip is characterized by the aggregation of its connected components' shape descriptors. Previous studies have shown the efficiency of this scheme for satellite image analysis [22]. In this image representation, each connected component is assigned to a unique dictionary entry, which is then used in the computation of a visual-words histogram.
Within the context of the bag-of-visual-words representation, various alternate encoding schemes have been proposed that allow more powerful representations than the aforementioned simple hard-assignment scheme. Here we particularly focus on two such improved encoding approaches, i.e., soft assignment and locality-constrained (locally) linear encoding [2]. In both cases, a dictionary $D_k$ of visual words is first learned by sampling a representative subset of all connected-component descriptors and clustering them into $k$ clusters, where $k$ is manually predefined. The representative subset is extracted randomly from all the connected components existing in the pre- and post-images. Given the $k$ cluster centers $D_k = \{C_i\}$ computed off-line, any new connected-component descriptor $d$ can then be encoded. We now present the details of the soft-assignment and locally linear encoding schemes.
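The text does not specify the clustering algorithm used to learn $D_k$; the sketch below assumes a standard mini-batch k-means (scikit-learn) over a random subset of component descriptors, with illustrative parameter values:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_dictionary(descriptors, k=128, n_samples=100_000, seed=0):
    """Cluster a random subset of component descriptors into k visual words."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(descriptors),
                     size=min(n_samples, len(descriptors)), replace=False)
    km = MiniBatchKMeans(n_clusters=k, random_state=seed, n_init=3)
    km.fit(descriptors[idx])
    return km.cluster_centers_  # D_k = {C_i}: a k x d dictionary of centroids
```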
Soft Assignment: In this encoding technique, each connected-component descriptor is softly assigned to more than one cluster. The soft assignment attributes a weight to each cluster depending on how far the connected-component descriptor is from it, and can be expressed as:

$$w(d, C_i) = \frac{\exp(-\lambda \|d - C_i\|_2^2)}{\Gamma} \quad (8)$$

where

$$\Gamma = \sum_{i=1}^{k} \exp(-\lambda \|d - C_i\|_2^2). \quad (9)$$

Here $\lambda$ is a scaling parameter determined a priori depending on the dictionary size. With this encoding mechanism, each vector $d$ is encoded by the normalized vector $[w(d, C_1), \cdots, w(d, C_k)]$, with values between 0 and 1. In the case of $\lambda = \infty$, the soft assignment becomes a hard assignment where each descriptor is assigned to a unique dictionary entry. Given a window $W$ covering a set of connected components $\{CC_i\}$ associated with their descriptors $\{d_i\}$, its descriptor is obtained by computing soft histograms:

$$h_W(i) = \sum_{CC_j} w(d_j, C_i), \quad (10)$$

where the soft histogram of a window $W$ is $[h_W(1), \cdots, h_W(k)]$. The histograms are not further normalized, so as to account for the varying number of components in a window, which relates ultimately to different objects. Not normalizing also preserves the contrast information, which becomes very important in built-up areas where cast shadows play a substantial role.
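A minimal sketch of Eqs. (8)–(10), restricted to the $n$ nearest visual words as described at the end of this section (the per-shape pixel-fraction weighting is omitted for brevity, and the value of the scaling parameter `lam` is an assumption):

```python
import numpy as np

def soft_assign(d, centers, lam=1.0, n_neighbors=4):
    """Soft-assignment code of one descriptor d over the dictionary (Eqs. 8-9).

    Only the n nearest words receive a non-zero weight; Gamma is computed
    over those words, per the nearest-neighbor restriction of the text.
    """
    dist2 = np.sum((centers - d) ** 2, axis=1)  # ||d - C_i||_2^2 for all i
    nn = np.argsort(dist2)[:n_neighbors]
    w = np.zeros(len(centers))
    w[nn] = np.exp(-lam * dist2[nn])
    return w / w.sum()

def window_soft_histogram(window_descriptors, centers, lam=1.0, n_neighbors=4):
    """Unnormalized soft histogram h_W of a window's components (Eq. 10)."""
    h = np.zeros(len(centers))
    for d in window_descriptors:
        h += soft_assign(d, centers, lam, n_neighbors)
    return h  # deliberately not normalized, to keep contrast/count information
```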
Locality-Constrained (Locally) Linear Encoding: The second encoding method linearly encodes the descriptor with respect to the provided $k$ centroids [2]. In practice, one solves the following minimization problem:

$$\arg\min_{l} \|d - D_k l\|^2 \quad \text{such that} \quad \|l\|_1 = 1, \quad (11)$$

where $d$ represents a shape descriptor, $D_k$ represents the shape dictionary, and $l$ represents the locally linear code. Then, for any window $W$, its descriptor is obtained by summing the linear contributions of the connected-component descriptors it contains:

$$l_W(i) = \sum_{CC_j} l(d_j, C_i), \quad (12)$$

where $l(d_j, C_i)$ is the linear code of the descriptor $d_j$ at position $i$. Each window is then described by its $k$-length vector $[l_W(1), \cdots, l_W(k)]$. Again, this descriptor is not further normalized, in order to capture the contrast and shape-count information.

For both encoding methods and for each shape, the encoding is done in practice by searching for its $n$ nearest-neighbor entries in the codebook and assuming that the dictionary is composed only of these centroids. This technique produces a sparse representation of each individual shape descriptor, keeping the memory footprint small while preserving good discriminative power. The weight of a shape is determined by the fraction of its pixels covering the considered window. This process is repeated for each window in the pre- and post-event images.
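A minimal sketch of the LLC step of Eq. (11), following the closed-form local solver of Wang et al. [2] on the $n$ nearest dictionary entries (the sum-to-one constraint is enforced by renormalization; the regularization constant is an assumption for numerical stability):

```python
import numpy as np

def llc_code(d, centers, n_neighbors=4, reg=1e-4):
    """Locality-constrained linear code of descriptor d (Eq. 11).

    Closed-form local solver of Wang et al. [2]: solve a least-squares
    problem on the n nearest dictionary entries under a sum-to-one constraint.
    """
    dist2 = np.sum((centers - d) ** 2, axis=1)
    nn = np.argsort(dist2)[:n_neighbors]
    B = centers[nn] - d                            # shift neighbors to origin
    C = B @ B.T                                    # local covariance matrix
    C += reg * np.trace(C) * np.eye(n_neighbors)   # regularize for stability
    w = np.linalg.solve(C, np.ones(n_neighbors))
    w /= w.sum()                                   # enforce sum(l) = 1
    l = np.zeros(len(centers))
    l[nn] = w
    return l
```

The window descriptor of Eq. (12) is then the sum of these sparse codes over the components falling in the window, exactly as for the soft histograms.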
Fig. 3. Overview of our framework. Given a pair of image strips (Image 1, Image 2), we extract features of their overlapping windows (Feature Extraction), perform their locality-constrained encoding (Feature Encoding), and use these codes in a supervised setting (Supervised Learning) to learn a damage detection classifier (Final Detection).
C. Learning Framework
Our learning framework is illustrated in Fig. 3. This work particularly explores the representational aspect of the image patches for the problem of damage detection, and therefore focuses on a supervised learning setting. For details regarding the active-learning aspects of our approach, we refer the reader to [18].
Based on the available reference points for the positive and negative classes, we stack up their corresponding window representation codes and use them to train a linear support vector machine (SVM). Note that we found a linear classifier sufficient for our problem, since the feature space constructed by both soft assignment and locally linear coding already incorporates a significant amount of non-linearity. We therefore do not incorporate any further kernel projection in our framework, which results in faster training and detection times. For this work, we used a linear SVM [28] with L1-regularization and an L2-loss function, where the L1-regularization allows us to embed a feature selection, i.e., a selection of the visual words that are optimal for discriminating between relevant and irrelevant changes. As the shape distributions are stacked, different visual words can be selected for the pre- and post-event images.
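A minimal training sketch, assuming scikit-learn's LinearSVC (a wrapper around LIBLINEAR [28]); `penalty="l1"` with `loss="squared_hinge"` matches the L1-regularization / L2-loss configuration described above, and the data arrays are illustrative placeholders:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(280, 256))   # stacked pre/post window codes (2 x 128)
y_train = rng.integers(0, 2, size=280)  # 1 = relevant change, 0 = no change

# L1-regularized, L2-loss (squared hinge) linear SVM, as in LIBLINEAR.
clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=1.0)
clf.fit(X_train, y_train)

# The L1 penalty zeroes out many coefficients: the surviving entries are the
# visual words selected as discriminative, possibly different for pre and post.
selected_words = np.flatnonzero(clf.coef_.ravel())
```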
D. Algorithmic Speed-Up
The main bottleneck in our computational framework is by far the feature extraction and encoding step. We
achieve algorithmic speed-ups for this step in two important ways.
First, we use a quasi-linear algorithm for computing the tree of shapes [29] to efficiently extract the upper and lower connected components. This representation organizes the nested connected components into a tree, maintaining the relations whereby lower-level connected components are holes of upper-level connected components. Min- and max-trees [25] are efficient algorithms for separately extracting the lower and upper level connected-component sets. These two trees can be fused into a unique tree of shapes by the level-line algorithm [24]. In this paper, we adopt the tree-of-shapes algorithm [29], which has a better worst-case computational complexity than the fusion of max-tree and min-tree. The algorithm provides a tree-of-shapes representation which can be efficiently stored in two arrays of length n, where n is the number of pixels in the image [30]. The representation enables computing the moment-based attributes in linear time by exploiting its nesting properties [24], [31]. This algorithm
Fig. 4. Average of the ROC curves (true positive rate versus false negative rate) obtained by linear supervised classification, for SA and LLC with 1 to 16 nearest neighbors. This graph shows the impact of the coding schemes and coding parameters on the false negative and true positive rates.
Fig. 5. Effects of satellite angle differences (elevation versus azimuth), sun angle differences (elevation versus azimuth), and image misalignment (ground displacement in meters versus EER) on the EER derived from shape-distribution features. In the left plots, the color encodes the EER, going from low values in blue to higher ones in red.
performs much faster than extracting the connected components at all possible thresholds, and requires a memory footprint independent of the number of grey levels considered.
Our second source of speed-up comes in the computation of code histograms. Given a patch, one approach could rely on enumerating the components falling in it and then estimating the histogram of their codes, weighted by the number of pixels falling in the patch. As a connected component may span several patches, it is more efficient to perform a direct filtering of the tree of shapes [25], in which the component contrast is multiplied by its corresponding code weight. Thus, for each visual-word code i, one records the components having a non-null weight, and remaps and reweighs these components into an image R. For a given patch W, one can then gather its
entry $l_W(i)$ by computing the sum of $R$ within it: $l_W(i) = \sum_{x \in W} R(x)$. For overlapping rectangular windows, this process can be achieved efficiently with integral images [32].
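A minimal sketch of this aggregation for one visual word, assuming the remapped weight image R has already been built; the 50-pixel windows on a 10-pixel grid follow the setup of Section V:

```python
import numpy as np

def window_sums(R, win=50, stride=10):
    """Sum of R over every win x win window on a `stride` grid.

    S is the integral image (with a zero first row/column), so each window
    sum costs four lookups [32], independently of the window size.
    """
    S = np.zeros((R.shape[0] + 1, R.shape[1] + 1))
    S[1:, 1:] = R.cumsum(axis=0).cumsum(axis=1)
    ys = np.arange(0, R.shape[0] - win + 1, stride)
    xs = np.arange(0, R.shape[1] - win + 1, stride)
    return (S[np.ix_(ys + win, xs + win)] - S[np.ix_(ys, xs + win)]
            - S[np.ix_(ys + win, xs)] + S[np.ix_(ys, xs)])

# One such call per visual word i yields the entries l_W(i) for all windows W.
```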
V. EXPERIMENTS AND RESULTS
A. Comparison of Encoding Schemes
We compare the two proposed encoding schemes, namely soft assignment (SA) and locally linear coding (LLC), with different numbers of nearest neighbors used during the computation. First, each image is partitioned into overlapping tiles of 50×50 pixels (50×50 m² on the ground), separated by 10 meters, thus producing a change detection at 10-meter resolution, given the 1-meter resolution of the analyzed imagery. Using our dataset of 86 strip pairs covering an area of 4,665 km², the equivalent number of tiles to be considered is around 86 million. A single dictionary is computed per pair of images, with 128 shape words encoding the characteristics of the connected components present in the data set. Thus, each image tile is encoded by a 128-dimensional histogram or linear code. Also, the pre- and post-image tile descriptors are stacked for change detection analysis, leading to a 256-dimensional description vector for each 10-meter pixel of the considered AOIs. To put the amount of data being analyzed in perspective, this representation occupies 335 GB for imagery taking 139 GB of disk space.
Since the reference dataset is given in geolocated format, we first rasterize it onto the underlying 10-meter feature grid, and buffer it with a radius of 25 meters to match our tile size. For each image-pair, we randomly pick 50% of the damage examples as the training set and leave the remaining 50% for evaluating the quality of the detectors. As damage analysis is a highly imbalanced classification problem, we also randomly pick negative examples in a number equal to the number of positive examples in the training set. These training examples are fed to the linear SVM.
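A sketch of this sampling step under the stated protocol (50% of positives for training, an equal number of randomly drawn negatives); `labels` is an illustrative per-tile label array:

```python
import numpy as np

def split_and_balance(labels, train_frac=0.5, seed=0):
    """Pick train_frac of the positive tiles and an equal number of negatives."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    pos_train = rng.choice(pos, size=int(train_frac * len(pos)), replace=False)
    neg_train = rng.choice(neg, size=len(pos_train), replace=False)
    train_idx = np.concatenate([pos_train, neg_train])
    test_pos = np.setdiff1d(pos, pos_train)  # held-out positives for evaluation
    return train_idx, test_pos
```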
Using the reference dataset collected from our AOIs, we use receiver operating characteristic (ROC) curves to evaluate the different representations considered. Note that the true positive rate (TPR) represents the number of relevant damages detected over the total number of changes. Similarly, the false positive rate (FPR) represents the area of false alarms
with respect to the area of no damages. For a given (TPR, FPR) tuple, the associated damage indicator covers an area of $f \cdot \mathrm{TPR} + (1 - f) \cdot \mathrm{FPR}$, where $f$ is the fraction of damaged area. Note that the quantity $f \cdot \mathrm{TPR} + (1 - f) \cdot \mathrm{FPR}$ is also the relative size of the search space that needs to be examined by a human photo-interpreter during curation. We use the equal error rate (EER) as a summary of the performance of an ROC curve, given as the ROC point satisfying $\mathrm{FPR} = 1 - \mathrm{TPR}$.
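For concreteness, a minimal sketch of the EER and search-space computations, assuming scikit-learn's roc_curve; the EER is taken at the ROC point closest to FPR = 1 − TPR:

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(y_true, scores):
    """EER: the ROC operating point where FPR = 1 - TPR (i.e., FPR = FNR)."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    i = np.argmin(np.abs(fpr - (1 - tpr)))
    return (fpr[i] + (1 - tpr[i])) / 2

def search_space_fraction(tpr, fpr, f):
    """Relative area a photo-interpreter must curate: f*TPR + (1 - f)*FPR."""
    return f * tpr + (1 - f) * fpr
```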
The comparison of the two encoding schemes is shown in Fig. 4, in terms of the average of the ROC curves obtained for the 86 pairs. The dependence on the number of neighbors used during encoding is evaluated and shown in the same graph. When the number of neighbors goes to one, both methods behave the same as a hard assignment of the visual words.
From the analysis of the ROC curves, it can be seen that LLC provides the best description for the purpose of our experiments in comparison to SA. At a TPR of 0.80, the LLC-based representation provides an FPR of 0.12, compared to an FPR of 0.18 for the best SA-based representation. As pointed out earlier, the FPR is a direct approximation of the search-space size, given the small coverage of changes. Thus, LLC-based encoding produces a relative reduction of 30% in the search space.
Second, we can observe the impact of the number of neighbors on the accuracy. It can be seen that 4 neighbors provide the best results with LLC, with the accuracy degrading as this number grows. This observation suggests the existence of an optimal number of neighbors (greater than 1, where 1 corresponds to hard assignment) that maximizes the representation's discrimination. The effect of the number of neighbors on SA is quite the opposite: the accuracy increases with it until reaching saturation, and the description does not improve when using more than 4 neighbors. Because the same dictionary is used for LLC and SA, this observation lets us conclude that any point of the shape-feature space is close to an average of 4 clusters, and that the other clusters are too far away to be of much use.
B. Robustness To Variabilities
The average EER using shape-distribution features is plotted in Figure 5 as a function of sun angle differences, satellite angle differences, and image misregistration. It is evident that satellite angle differences do not impact the representation accuracy, while the sun angle differences seem to matter more. This is because sun angles impact shadows, which can significantly alter image appearance. We also computed the average EER for each sensor combination given in Table I. The standard deviation for this is 0.03, indicating that our approach is robust to using pre- and post-event imagery from different sensors. Finally, for each image pair, we computed the average displacement; its norm is plotted against the EER in Figure 5, showing independence between the two axes, i.e., robustness to inaccuracies in spatial registration.
C. Collection of Training Examples
In the context of rapid damage detection, the collection of training examples is a crucial component for efficient
delivery. Also, one wants to collect a minimal amount of training examples while achieving the best accuracies. In the following series of experiments, we consider different percentages of positive training examples and evaluate their impact on the summary equal error rate metric derived from the average ROC curves. The negative examples are picked randomly in the image. This task is not expensive since damage is in most cases a rare class; in other words, it is unlikely that a positive example is picked as a negative example by randomly sampling the AOI.
The impact of the number of training examples on the EER is depicted in Fig. 6. The first observation is that the best LLC-based representation incurs an EER degradation of only 0.05 when using 5% of the positive training samples (on average, 140 points) instead of 50%. This highlights the generalization capability of this representation from few examples. Second, it can be observed that the ranking of the different coding schemes is maintained across the various percentages of training examples, except for the 8-neighbors LLC representation, which degrades much faster than the SA-based representations. This again highlights the importance of the number of neighbors for LLC in maximizing the detection accuracy with a limited number of training examples.
Fig. 6. Impact of the percentage of positive training samples on the equal error rate with linear classification, for SA and LLC with 1 to 16 neighbors.
D. Computational Complexity
We provide indicative running times for our implementation and show the benefits of the proposed algorithms over naive approaches. We perform our computation on one 11-bit image of size 10000×10000 pixels. Given the bit depth, the number of grey levels is 2048, which has a non-negligible impact on the extraction of the connected components. Performance is evaluated on a single Intel 3.3 GHz CPU.
We first assess the benefit of using the tree-of-shapes algorithm over an approach gathering the connected components at all possible thresholds between 0 and 2048. The tree of shapes performs the decomposition of the image and the computation of the moment-based attributes in 372 seconds, while the naive approach requires 2544 seconds. We observe a 6× speed-up by employing efficient algorithms, without even accounting for the naive approach's memory consumption, which is of the order of the number of grey values.

Second, we assess the benefits of computing the distributions with the proposed technique. The proposed approach's complexity depends linearly on the dictionary size, as we loop over the visual words. Given a visual word, reconstituting the weighted connected components takes 1.28 seconds. Then, performing the spatial filtering with a 50-pixel-wide window separated by 10 pixels takes 0.39 s using integral images [32]. As a comparison, performing the spatial filtering and down-sampling with a 16-core multi-processor implementation requires 0.9 s. Here again we observe the benefits of strong algorithms. Consequently, processing one visual word to obtain the corresponding histogram bin for all windows requires 1.67 s and, given a dictionary with 128 entries, the spatial aggregation process takes 213 s. By contrast, if the process were to loop over the windows, the collection of shape distributions would take 5100 s for the same setup, about 24 times more than the proposed implementation.
Fig. 7. Damage Type: Conflict. Area: Malakal, Upper Nile State, South Sudan. Top row: the pre- and post-event images are displayed side by side with the UN change points of interest overlaid in green; the right side shows the soft change detection obtained with SD LLC 4, with 5% of the relevant changes given for training the linear SVM. Bottom row: a crop of the same data is displayed at higher resolution, showing the spatial precision of the change detection.
Fig. 8. Damage Type: Conflict. Area: Bossangoa, Ouham, Central African Republic. Top row: the pre- and post-event images are displayed side by side with the UN change points of interest overlaid in green; the right side shows the soft change detection obtained with SD LLC 4, with 5% of the relevant changes given for training the linear SVM. Bottom row: a crop of the same data is displayed at higher resolution, showing the spatial precision of the change detection.
Indeed, our implementation allows computing the elements that are shared by several windows only once.
No speed-up is proposed for the construction of the dictionary and the LLC, as this step takes 100 seconds for the considered image. Summing the three extraction parts, the proposed representation for a 10000×10000 image can be computed on a single CPU in 685 seconds, as opposed to a straightforward algorithmic implementation requiring 7744 seconds. Overall, we observe a speed-up on the order of 10×, which is critical for rapid damage assessment. Moreover, this single-CPU implementation can be further parallelized by processing image blocks, which can lead to further speed-ups.
E. Illustrations and Discussion
Fig. 9. Windows of 250 meters are randomly picked over the four illustrated AOIs. For each window, the number of reference points is computed and compared to the average of the automatically extracted change index.

Four illustrations of 4-neighbors LLC-based change detection are given in Figs. 7–11. The first two examples are instances of the armed conflicts hitting regions in South Sudan and the Central African Republic. The next two
examples are instances of earthquake- and typhoon-provoked damages, which hit areas of Pakistan and the Philippines. While the settlement patterns look different, they all cover areas with well-separated buildings; our dataset does not cover damages occurring over high-rise or high-density urban areas, which may decrease the accuracy of the approach. Both pre- and post-event panchromatic imagery are displayed for the full collected AOIs, with the UNITAR/UNOSAT geolocated points of interest overlaid in green. This reference-dataset representation allows a visual comparison to the supervised detection, displayed in a colormap ranging from blue to red, where the supervised detections were obtained using 4-neighbors LLC-based features and a linear SVM model trained with 25% of the positive examples.
The four examples highlight the good match, at both global and close scale, between the change detection and the UN geolocated points. While the scenes undergo major changes of a phenological nature (see Fig. 7), thanks to the training examples the method is able to narrow the search to the relevant change areas. Indeed, no major false alarms are observed among the 4 sets. The examples of Figs. 8 and 10 also highlight the spatial inconsistencies between imagery and reference dataset; because of the tiling mechanism, these spatial errors do not have a major impact on the final results, as shown earlier in Fig. 5. While it is clear that detecting the points individually is a harder problem, the results of Fig. 9 show that the representation is generic enough to capture the densities of interest points. Windows of 250 meters are randomly picked in the 4 detailed AOIs, and the number of reference points and the average change index are computed for each window. The normalized correlation between the two is evaluated at 0.95 for the considered datasets. Also, these change density maps can easily be turned into counting metrics by simple regression.
We conduct another set of qualitative experiments by visualizing and analyzing detected instances versus unde-
tected ones in Figure 12. Most of the missed damages were subtle, naturally making them challenging to automat-
ically detect. For cases where the damage occurred over buildings or well-defined structures, our representation is
able to detect them with high precision. Mistakes tend to be made over subtle changes or damages covering only a
small area in an image chip such as an isolated house. Gaining in spatial precision would require multi-resolution
Fig. 10. Damage Type: Earthquake. Area: Gajar Area, Awaran District, Balochistan Province, Pakistan. Top row: the pre- and post-event images are displayed side by side with the UN change points of interest overlaid in green; the right side shows the soft change detection obtained with SD LLC 4, with 5% of the relevant changes given for training the linear SVM. Bottom row: a crop of the same data is displayed at higher resolution, showing the spatial precision of the change detection.
analysis, such as a pyramid-based representation, along with better geo-located reference points.
VI. CONCLUSION
In this work, we proposed shape distributions with a locally linear coding scheme as a generalizable representation of image-patches within the context of performing large-scale damage analysis. We presented a benchmark dataset of 86 high-resolution image-pairs to evaluate the improvement of locally linear coding (LLC) over hard assignment when representing the image components encoded in a tree of shapes. The benchmark dataset was built by combining DigitalGlobe archive panchromatic imagery with damage reference datasets provided by UNITAR/UNOSAT photo-interpreters in real-world scenarios. We created the benchmark dataset to encompass as much of the variability found in the context of rapid post-disaster mapping as possible. In our empirical analysis, the use of LLC-based shape distributions showed an improvement of 5% in equal error rate (EER) in comparison to other hard- and soft-assignment-based representations. This improvement translates into a reduction by 30% of the search space for an equivalent recall of the change instances. Finally, a robustness analysis shows the low correlation
Fig. 11. Damage Type: Typhoon. Area: Bentiqui, Leyte, Philippines. Top row: the pre- and post-event images are displayed side by side with the UN change points of interest overlaid in green; the right side shows the soft change detection obtained with SD LLC 4, with 5% of the relevant changes given for training the linear SVM. Bottom row: a crop of the same data is displayed at higher resolution, showing the spatial precision of the change detection.
between the expected EER and both the acquisition angles and small ground displacements, showing the adequacy of our representation for change analysis in a variety of different contexts.
In our future work, we intend to release the dataset as an open benchmark to allow more researchers to evaluate their change detection techniques. We are also interested in expanding our framework with multi-scale spatial pyramid encoding, to enable the representation of isolated changes which are currently not adequately captured by our single-sized patch-based representation.
Fig. 12. Example damaged areas detected by our framework ("Damages Detected") as well as the ones our framework missed ("Damages Missed"); for each, pre-event and post-event chips are shown.
REFERENCES
[1] G.-S. Xia, J. Delon, and Y. Gousseau, “Shape-based invariant texture indexing,” IJCV, vol. 88, no. 3, pp. 382–403, 2010. [Online].
Available: http://dx.doi.org/10.1007/s11263-009-0312-3
[2] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, “Locality-constrained linear coding for image classification,” in IEEE CVPR,
2010, pp. 3360–3367.
[3] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories,” in
IEEE CVPR, vol. 2, 2006, pp. 2169–2178.
[4] A. Singh, “Digital change detection techniques using remotely-sensed data,” Int. J. Remote Sensing, vol. 10, no. 6, pp. 989–1003, 1989.
[5] P. Gamba, F. Dell'Acqua, and G. Trianni, “Rapid damage detection in the Bam area using multitemporal SAR and exploiting ancillary data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 6, pp. 1582–1589, June 2007.
[6] D. Al-Khudhairy, I. Caravaggi, and S. Giada, “Structural damage assessments from IKONOS data using change detection, object-oriented segmentation, and classification techniques,” Photogrammetric Engineering & Remote Sensing, vol. 13, no. 7, pp. 825–837, 2005.
[7] L. Gueguen, M. Pesaresi, P. Soille, and A. Gerhardinger, “Morphological descriptors and spatial aggregations for characterizing damaged buildings in very high resolution images,” in Proc. of the ESA-EUSC-JRC 2009 Conference on Image Information Mining: Automation of Geospatial Intelligence from Earth Observation, Madrid, Spain, Nov. 2009.
[8] S. W. Myint, M. Yuan, R. S. Cerveny, and C. P. Giri, “Comparison of remote sensing image processing techniques to identify tornado damage areas from Landsat TM data,” Sensors, vol. 8, no. 2, pp. 1128–1156, 2008.
[9] T. Kemper, M. Jenerowicz, L. Gueguen, D. Poli, and P. Soille, “Monitoring changes in the Menik Farm IDP camps in Sri Lanka using multi-temporal very high-resolution satellite data,” International Journal of Digital Earth, vol. 4, no. sup1, pp. 91–106, 2011. [Online]. Available: http://dx.doi.org/10.1080/17538947.2010.512430
[10] L. Gueguen, P. Soille, and M. Pesaresi, “Change detection based on information measure,” IEEE TGRS, vol. 49, no. 11, pp. 4503–4515,
2011.
[11] F. Bovolo and L. Bruzzone, “A theoretical framework for unsupervised change detection based on change vector analysis in the polar
domain,” IEEE Tran. Geoscience and Remote Sensing, vol. 45, no. 1, pp. 218–236, Jan. 2007.
[12] L. Bruzzone and D. Prieto, “Automatic analysis of the difference image for unsupervised change detection,” IEEE TGARS, vol. 38, no. 3, pp. 1171–1182, May 2000.
[13] A. Nielsen, “The regularized iteratively reweighted MAD method for change detection in multi- and hyperspectral data,” IEEE Tran. Image Processing, 2007.
[14] M. Schröder, H. Rehrauer, K. Seidel, and M. Datcu, “Spatial information retrieval from remote-sensing images. II. Gibbs-Markov random fields,” IEEE Transactions on Geoscience and Remote Sensing, vol. 36, no. 5, pp. 1446–1455, Sept. 1998.
[15] J. Inglada and G. Mercier, “A new statistical similarity measure for change detection in multitemporal SAR images and its extension to multiscale change analysis,” IEEE TGARS, vol. 45, no. 5, pp. 1432–1445, May 2007.
[16] L. Bruzzone, D. Prieto, and S. Serpico, “A neural-statistical approach to multitemporal and multisource remote-sensing image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 37, no. 3, pp. 1350–1359, May 1999.
[17] B. Demir, F. Bovolo, and L. Bruzzone, “Classification of time series of multispectral images with limited training data,” IEEE Transactions on Image Processing, vol. 22, no. 8, pp. 3219–3233, Aug. 2013.
[18] L. Gueguen and R. Hamid, “Large-scale damage detection using satellite imagery,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[19] United Nations Institute for Training and Research (UNITAR/UNOSAT), http://www.unitar.org/unosat/maps.
[20] S. Cui and M. Datcu, “Coarse to fine patches-based multitemporal analysis of very high resolution satellite images,” in Analysis of
Multi-temporal Remote Sensing Images (Multi-Temp), 2011 6th International Workshop on the. IEEE, 2011, pp. 85–88.
[21] G. Mercier, G. Moser, and S. Serpico, “Conditional copulas for change detection in heterogeneous remote sensing images,” IEEE TGARS, vol. 46, no. 5, pp. 1428–1441, May 2008.
[22] L. Gueguen, “Classifying compound structures in satellite images: A compressed representation for fast queries,” IEEE TGARS, vol. 53,
no. 4, pp. 1803–1818, April 2015.
[23] ——, “Image patch characterization with shape distributions: Application to WorldView-2 images,” in IEEE IGARSS, Melbourne, Australia, 2013.
[24] P. Monasse and F. Guichard, “Fast computation of a contrast-invariant image representation,” IEEE Transactions on Image Processing, vol. 9, no. 5, pp. 860–872, May 2000.
[25] E. Urbach, J. Roerdink, and M. Wilkinson, “Connected shape-size pattern spectra for rotation and scale-invariant classification of gray-scale
images,” IEEE Tran. PAMI, vol. 29, no. 2, pp. 272–285, 2007.
[26] E. Nowak, F. Jurie, and B. Triggs, “Sampling strategies for bag-of-features image classification,” in Computer Vision – ECCV 2006, ser. Lecture Notes in Computer Science, A. Leonardis, H. Bischof, and A. Pinz, Eds. Springer Berlin Heidelberg, 2006, vol. 3954, pp. 490–503. [Online]. Available: http://dx.doi.org/10.1007/11744085_38
[27] H. Jegou, M. Douze, and C. Schmid, “Improving bag-of-features for large scale image search,” International Journal of Computer Vision, vol. 87, no. 3, pp. 316–336, 2010. [Online]. Available: http://dx.doi.org/10.1007/s11263-009-0285-2
[28] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, “LIBLINEAR: A library for large linear classification,” Journal of
Machine Learning Research, vol. 9, pp. 1871–1874, 2008.
[29] T. Géraud, E. Carlinet, S. Crozet, and L. Najman, “A quasi-linear algorithm to compute the tree of shapes of nD images,” in ISMM, 2013. [Online]. Available: http://hal.archives-ouvertes.fr/docs/00/79/86/20/PDF/geraud.2013.ismm.pdf
[30] W. Hesselink, “Salembier's min-tree algorithm turned into breadth first search,” Information Processing Letters, vol. 88, pp. 225–229, 2003.
[31] L. Gueguen and G. Ouzounis, “Hierarchical data representation structures for interactive image information mining,” International
Journal of Image and Data Fusion, vol. 3, no. 3, pp. 221–241, 2012. [Online]. Available: http://www.tandfonline.com/doi/abs/10.1080/
19479832.2012.697924
[32] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Computer Vision and Pattern Recognition,
2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol. 1, 2001, pp. I–511–I–518 vol.1.
... C HANGE detection (CD) plays a crucial role in remote sensing image (RSI) processing, which is the quantitative analysis of land use changes from RSIs captured at different time periods. Beyond its fundamental role, CD finds application in various real-world scenarios, such as disaster assessment [1]- [3], land cover recognition [4]- [6], and forest monitoring [7]- [9]. ...
Article
Full-text available
Semi-supervised change detection (CD) methods have garnered increasing attention due to their capacity to alleviate the dependency of fully-supervised methods on a large number of pixel-level labels. These methods predominantly leverage Generative Adversarial Network (GAN) architecture and consistency regularization technology. However, they encounter challenges associated with background noise from cross-temporal images. In this paper, we propose a novel multi-level consistency-regularization-based semi-supervised CD approach that incorporates Fourier-based frequency transformation and a reliable pseudo label selection scheme. Specifically, we replace the low-frequency spectrum of one temporal image with a frequency domain transformation derived from the corresponding image in the same bi-temporal remote sensing image (RSI) pair, enhancing the model's capability to discern meaningful changes amidst background noise, thereby contributing to more robust change detection. Furthermore, excessively high pseudo-label thresholds in consistency regularization methods may result in the underutilization of valuable unlabeled data. To address this issue, we design a straightforward sigmoid-like function to dynamically adjust the selection threshold for the reliable pseudo label selection scheme. This strategy takes into consideration the learning status throughout the entire training process, ensuring more effective utilization of unlabeled information. We demonstrate significant performance improvements across three widely-used public datasets, namely LEVIR-CD, WHU-CD, and CDD. Notably, on the three datasets with only 1% labeled data, our method achieved an $IoU^{c}$ of 71.29%, 63.90%, and 51.00%, outperforming existing state-of-the-art methods by 2.84%, 1.21%, and 0.98%, respectively. These results robustly substantiate the effectiveness of our approach, showcasing its potential in scenarios where labeled data is limited.
... C HANGE detection in dual-temporal remote sensing imagery is a crucial component of terrestrial change monitoring and is extensively applied in fields such as urban planning [1], forest cover mapping [2], and disaster damage assessment [3], [4]. Although the definition of change detection varies by application, its core objective is to mark binary changes on the surface from registered images taken at two different times [5]- [8]. ...
Article
Full-text available
In recent years, change detection (CD) methods have faced challenges in being applied to various types of remote sensing datasets and related research fields, particularly in the domain of change detection in remote sensing images. While convolutional neural networks (CNNs) have significantly advanced change detection in remote sensing images, they struggle with modeling long-distance dependencies between image pairs, leading to poor recognition of semantically similar objects with different features. Meanwhile, Transformer technology has gained widespread popularity for global applications, but it lacks in extracting local features effectively. Current approaches typically rely on single or dual-branch network structures for mining change-related features in remote sensing images, yet they still lack in extracting both local and global features comprehensively. To address these issues, this paper proposes a triple-branch network combining Transformer and CNN, comprising CNN, Transformer, and Channel feature-guided branch. These branches extract and fuse three types of change features from both global and local perspectives. Importantly, the Channel feature-guided branch is introduced to capture continuous and detailed change relationship features, thus enhancing the model's change discrimination ability. Experimental results on three datasets (LEVIR-CD, WHU-CD, and GZ-CD) demonstrate the superior performance of the model over state-of-the-art methods.
... Change detection, which is dedicated to monitoring the dynamic change of land surface features, plays an increasing role in remote sensing applications [1], such as urban sprawl monitoring [2], forest cover change surveys [3], disaster damage assessment (e.g., landslides, earthquakes) [4,5], and others [6]. ...
Article
Full-text available
Optical satellite image change detection has attracted extensive research due to its comprehensive application in earth observation. Recently, deep learning (DL)-based methods have become dominant in change detection due to their outstanding performance. Remote sensing (RS) images contain different sizes of ground objects, so the information at different scales is crucial for change detection. However, the existing DL-based methods only employ summation or concatenation to aggregate several layers of features, lacking the semantic association of different layers. On the other hand, the UNet-like backbone is favored by deep learning algorithms, but the gradual downscaling and upscaling operation in the mainstream UNet-like backbone has the problem of misalignment, which further affects the accuracy of change detection. In this paper, we innovatively propose a hierarchical feature association and global correction network (HFA-GCN) for change detection. Specifically, a hierarchical feature association module is meticulously designed to model the correlation relationship among different scale features due to the redundant but complementary information among them. Moreover, a global correction module on Transformer is proposed to alleviate the feature misalignment in the UNet-like backbone, which, through feature reuse, extracts global information to reduce false alarms and missed alarms. Experiments were conducted on several publicly available databases, and the experimental results show the proposed method is superior to the existing state-of-the-art change detection models.
... Besides, compared to natural scene images, the special imaging mechanism of remote sensing enables RS images to contain a more abundant variety of ground objects. CD is therefore widely applied in various earth observation tasks, such as urban expansion [1,2], land utilization [3,4], water cover [5,6], and disaster assessment [7,8]. ...
Preprint
Full-text available
In change detection (CD), reducing the interference of pseudo-changes and accurately recognizing the change of interest (COI) are two important challenges. Recently, considering the powerful long-distance modeling ability of the Transformer, some methods have introduced it into CD and proposed several useful CD strategies. However, the existing strategies either do not operate directly on the COI or struggle to fully exploit the Transformer's advantages. In this paper, we therefore propose a new CD strategy to tackle these challenges. Specifically, we focus on the difference domain and propose a differential-feature triple-refinement strategy to precisely characterize the COI. We first adopt a CNN-based differential feature extraction (DFET) module to extract possible detail differences between bitemporal images. Then, we introduce a Transformer-based differential feature enhancement (DFEH) module to capture and enhance the COI regions from the preliminarily extracted differences. Finally, we utilize a CNN-based differential feature fusion (DFFS) module to integrate fine-grained information into the enhanced COI regions. Based on the proposed strategy, we design a new network named DiFormer. We verify six effective hyperparameter configurations and conduct experiments on four commonly researched CD datasets. Extensive experimental results indicate that our proposed strategy has outstanding generalization ability and achieves a better balance between computational cost and model performance. Notably, even when adopting only Natural Scene Image Pretraining (NSIP), our method still exceeds recently proposed CD methods that focus specifically on improving Remote Sensing Image Pretraining (RSIP).
Article
Remote sensing change detection (RSCD) focuses on identifying regions that have undergone changes between two remote sensing images captured at different times. Recently, convolutional neural networks (CNNs) have shown promising results on the challenging RSCD task. However, these methods do not efficiently fuse bitemporal features or extract information beneficial to subsequent RSCD stages. In addition, they do not consider multilevel feature interactions during feature aggregation and ignore the relationships between difference features and bitemporal features, which affects the RSCD results. To address these problems, a difference-guided multiscale aggregation attention network, DGMA²-Net, is developed. Bitemporal features at different levels are extracted through a Siamese convolutional network, and a multiscale difference fusion module (MDFM) then fuses the bitemporal features and extracts, in a multiscale manner, difference features containing rich contextual information. After the MDFM, two difference aggregation modules (DAMs) aggregate difference features at different levels for multilevel feature interaction. The features from the DAMs are sent to difference-enhanced attention modules (DEAMs) to strengthen the connections between bitemporal and difference features and further refine the change features. Finally, the refined change features are superimposed from deep to shallow and a change map is produced. To validate the effectiveness of DGMA²-Net, a series of experiments is conducted on three public RSCD benchmark datasets (LEVIR-CD, BCDD, and SYSU-CD). The experimental results demonstrate that DGMA²-Net surpasses eight current state-of-the-art RSCD methods. Our code is released at https://github.com/yikuizhai/DGMA2-Net.
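For readers unfamiliar with the Siamese-plus-difference pattern this abstract builds on, the following is a hedged sketch (assuming PyTorch is available); it is not the DGMA²-Net architecture, and the layer sizes, module names, and the absolute-difference fusion are illustrative assumptions.

```python
# Minimal Siamese change-detection skeleton: one shared-weight encoder
# processes both temporal images; their feature difference drives a
# per-pixel change prediction head.
import torch
import torch.nn as nn

class SiameseDifference(nn.Module):
    def __init__(self, in_ch: int = 3, feat_ch: int = 16):
        super().__init__()
        # Shared encoder: the same weights are applied to both images.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.head = nn.Conv2d(feat_ch, 1, kernel_size=1)  # per-pixel change logit

    def forward(self, t1: torch.Tensor, t2: torch.Tensor) -> torch.Tensor:
        f1, f2 = self.encoder(t1), self.encoder(t2)
        return self.head(torch.abs(f1 - f2))  # difference features -> change map

model = SiameseDifference()
t1 = torch.randn(1, 3, 64, 64)  # pre-event image
t2 = torch.randn(1, 3, 64, 64)  # post-event image
print(model(t1, t2).shape)      # torch.Size([1, 1, 64, 64])
```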
Article
Change detection (CD) in high-resolution remote sensing has received much attention due to its wide range of applications. Many methods have been proposed in the literature and have achieved excellent performance. However, they are often fully supervised, requiring abundant pixel-level labeled samples, which are time-consuming and labor-intensive to obtain; compared with common single-temporal interpretation, labeling bitemporal images is especially complicated. Therefore, this study adopts weakly supervised learning (WSL) to reduce label acquisition costs. However, changed regions are small, fragmented, and similar to the background, which widens the gap between weakly supervised and fully supervised tasks. To address these difficulties, we explore self-supervised methods to construct a WSL framework based on image-level labels for general CD, termed WSLCD in this article. First, we design a double-branch Siamese network that takes the original image pair and a spatially transformed image pair as input and derives embeddings and initial class attention maps (CAMs). Second, mutual learning and equivariant regularization (MLER) are enforced on the CAMs from different views, imposing consistency constraints in confusion regions and letting the CAMs learn from each other based on saliency regions. Furthermore, prototype-based contrastive learning (PCL) is designed so that unreliable pixels can learn from prototypes computed from reliable pixels. PCL includes intraview contrast and cross-view contrast, depending on whether the prototypes and class embeddings come from the same view. With the above strategies, we narrow the gap between image-level weakly supervised CD and fully supervised CD. Experiments are conducted on three CD datasets, including CLCD, DSIFN, and GCD. Our method achieves state-of-the-art performance on pseudo-label generation and CD. The code is available at https://github.com/mfzhao1998/WSLCD.
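To illustrate the prototype idea the abstract relies on, here is a hedged numpy sketch: class prototypes are averaged from embeddings of reliable pixels, and unreliable pixels are assigned to the nearest prototype. The toy data and the cosine-similarity assignment rule are assumptions, not the paper's exact contrastive losses.

```python
# Prototype computation from reliable (labeled) pixels and
# nearest-prototype assignment for unreliable pixels.
import numpy as np

def prototypes(embeddings: np.ndarray, labels: np.ndarray, n_classes: int) -> np.ndarray:
    """Mean embedding per class, computed over reliable pixels."""
    return np.stack([embeddings[labels == c].mean(axis=0) for c in range(n_classes)])

def assign_by_prototype(embeddings: np.ndarray, protos: np.ndarray) -> np.ndarray:
    """Label unreliable pixels by cosine similarity to each prototype."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return (e @ p.T).argmax(axis=1)

rng = np.random.default_rng(0)
reliable = rng.normal(size=(100, 8))      # embeddings of reliable pixels
labels = rng.integers(0, 2, size=100)     # 0 = unchanged, 1 = changed
protos = prototypes(reliable, labels, n_classes=2)
unreliable = rng.normal(size=(20, 8))     # embeddings of unreliable pixels
print(assign_by_prototype(unreliable, protos))
```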
Article
Compared with binary change detection (BCD), semantic change detection (SCD) additionally provides the category information of bitemporal changed regions, which is significant for practical Earth observation applications. Although recently proposed triple-branch structures, comprising one BCD branch and two classification branches, can effectively balance the tasks, they still need carefully designed difference-extraction modules and branch interactions to capture bitemporal correlations, which increases the complexity of exploiting the semantic information. In this paper, we propose a new triple-branch network named JFRNet to tackle this challenge. From the perspective of the SCD process, because both the category information and the change information derive from the bitemporal images, we take the joint bitemporal features as the unified input, which helps each branch perceive the bitemporal semantic correlations without any additional interaction operations. From the perspective of the SCD structure, we introduce a convolutional attention fusion module (CAFM) and a convolutional attention refinement module (CARM) to unify the branch structure, which helps our model refine branch-specific semantic information without any specially designed difference-extraction modules. Extensive experimental results on three available datasets indicate that, compared with the baseline methods, our proposed JFRNet successfully simplifies the reasoning process and obtains better SCD performance.
Article
Change detection in multitemporal remote sensing images aims to generate a difference image (DI) and then analyze it to identify unchanged/changed areas. Current change detection techniques typically investigate a single change detection task, taking two images from the image series at a time, and may ignore relevant information shared across different tasks. Furthermore, theoretical results have shown that the distribution of the DI can be described by a Rayleigh-Rice mixture model (RRMM). The parameters of the RRMM are usually estimated by the expectation-maximization (EM) algorithm, which is prone to getting trapped in local minima. To address these issues, a selective-transfer-based evolutionary multitasking change detection method is proposed to handle multiple change detection tasks concurrently. For each change detection task, the log-likelihood function and the centroid distance function are treated as two objectives to be optimized simultaneously. In the proposed method, an RRMM parameter-estimation-driven initialization with random partitioning of the data is designed using maximum likelihood estimates of the parameters. The next population is then generated by intra-task and inter-task genetic transfer operators. A selective-knowledge-transfer-based local search strategy further improves the population by applying the EM algorithm; in this strategy, samples in the unchanged class of multiple tasks are used to estimate the parameters, acquiring knowledge transferred from the other tasks. Experiments on three real remote sensing data sets demonstrate that the method accelerates convergence and achieves superior accuracy.
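To make the EM step concrete, here is an illustrative two-component mixture fit to difference-image values. For brevity this sketch substitutes Gaussian components for the Rayleigh-Rice mixture (RRMM) named in the abstract; the E/M alternation, and the sensitivity to initialization that the paper targets, are the same.

```python
# Illustrative EM for a two-component Gaussian mixture on DI values
# (a stand-in for the Rayleigh-Rice mixture of the cited work).
import numpy as np

def em_two_component(x: np.ndarray, n_iter: int = 50):
    # Crude initialization: split at the median. EM is init-sensitive,
    # which is exactly the weakness the abstract discusses.
    mu = np.array([x[x <= np.median(x)].mean(), x[x > np.median(x)].mean()])
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each sample.
        dens = np.stack([
            pi[k] / (sigma[k] * np.sqrt(2 * np.pi))
            * np.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2)
            for k in range(2)
        ])
        resp = dens / dens.sum(axis=0)
        # M-step: update mixture weights, means, and standard deviations.
        nk = resp.sum(axis=1)
        pi = nk / len(x)
        mu = (resp * x).sum(axis=1) / nk
        sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk)
    return pi, mu, sigma

rng = np.random.default_rng(1)
di = np.concatenate([rng.normal(1, 0.3, 900),   # unchanged pixels
                     rng.normal(4, 0.5, 100)])  # changed pixels
print(em_two_component(di))
```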
Article
The remote-sensing image change detection (CD) task plays an important role in land-use surveys, city construction investigations, and other vital industries. Recently, deep learning has become the mainstream approach for this task due to its satisfactory performance in most cases. However, it often has difficulty with ambiguous regions, where pseudo-changes occur or real changes are corrupted. In this article, we propose an ambiguity-aware network (AANet) to address this issue. Specifically, our network first adopts convolutional layers to learn features from dual-temporal images. An ambiguity refinement module (ARM) is then designed to extract the ambiguous regions, from which difference features are generated. Considering that changed objects vary in scale, a weight rearrangement module (WRM) is proposed to fuse the difference features from different layers. To test the performance of the proposed model, we conduct experiments on three benchmark datasets: SYSU-CD, SVCD, and LEVIR-CD. The experimental results show that our model outperforms several state-of-the-art models on all three datasets, validating its effectiveness. The source code of our proposed model will be released at https://github.com/KevinDaldry/AANet.
Conference Paper
Full-text available
Satellite imagery is a valuable source of information for assessing damage in distressed areas undergoing a calamity, such as an earthquake or an armed conflict. However, the sheer amount of data that must be inspected makes a manual assessment impractical. To address this problem, we present a semi-supervised learning framework for large-scale damage detection in satellite imagery. We present a comparative evaluation of our framework using over 88 million images collected from 4,665 square kilometers at 12 different locations around the world. To enable accurate and efficient damage detection, we introduce a novel use of hierarchical shape features in the bags-of-visual-words setting. We analyze how practical factors such as sun, sensor-resolution, and satellite-angle differences impact the effectiveness of our proposed representation, and compare it to five alternative features in multiple learning settings. Finally, we demonstrate through a user study that our semi-supervised framework yields a ten-fold reduction in human annotation time, at a minimal loss in detection accuracy, compared to an exhaustive manual inspection.
Article
Full-text available
With the increased spatial resolution of current sensor constellations, more details are captured about our changing planet, enabling the recognition of a greater range of land use/land cover classes. While pixel- and object-based classification approaches are widely used for extracting information from imagery, recent studies have shown the importance of spatial context for discriminating more specific and challenging classes. This paper proposes a new compact representation for the fast query/classification of compound structures from very high resolution optical remote sensing imagery. This bag-of-features representation relies on the multiscale segmentation of the input image and the quantization of image structures, pooled into visual-word distributions, for the characterization of compound structures. A compressed form of the visual-word distributions is described, allowing adaptive and fast queries/classification of image patterns. The proposed representation and the query methodology are evaluated on the classification of the UC Merced 21-class data set, on the detection of informal settlements, and on the discrimination of challenging agricultural classes. The results show that the proposed representation competes with state-of-the-art techniques. In addition, the complexity analysis demonstrates that the representation requires about 5% of the image storage space while allowing queries at speeds down to 1 s per 1000 km² per CPU for 2-m multispectral data.
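The quantize-and-pool step at the heart of such bag-of-features representations can be sketched in a few lines. In this hedged example, raw random vectors stand in for the multiscale-segment descriptors of the cited work, and SciPy's generic k-means replaces whatever codebook-learning procedure the paper uses.

```python
# Bag-of-features sketch: quantize local descriptors against a learned
# codebook and pool them into a normalized visual-word histogram.
import numpy as np
from scipy.cluster.vq import kmeans2, vq

rng = np.random.default_rng(2)
descriptors = rng.normal(size=(500, 16))  # local descriptors from many patches
codebook, _ = kmeans2(descriptors, 32, minit="points")  # 32-word codebook

def bof_histogram(patch_descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Quantize descriptors to nearest codewords and pool into a histogram."""
    words, _ = vq(patch_descriptors, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(np.float64)
    return hist / hist.sum()  # normalized visual-word distribution

print(bof_histogram(rng.normal(size=(40, 16)), codebook))
```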
Conference Paper
Full-text available
To compute the morphological self-dual representation of images, namely the tree of shapes, state-of-the-art algorithms do not have satisfactory time complexity. Furthermore, the proposed algorithms are only effective for 2D images and are far from simple to implement. This is a serious limitation, since a self-dual representation of images is a structure that gives rise to many powerful operators and applications, and could be very useful for 3D images. In this paper we propose a simple-to-write algorithm to compute the tree of shapes; it works for nD images and has quasi-linear complexity when the data quantization is low, typically 12 bits or less. To obtain this result, the paper introduces a novel representation of images that has some remarkable continuity properties while remaining discrete.
Conference Paper
Full-text available
This paper describes an image patch characterization for image information mining tasks. An image patch is first decomposed into a multi-scale segmentation thanks to the Max-Tree representation. Then, each segment is described by shift-invariant shape attributes. Finally, the segment attributes are aggregated into a shape distribution which constitutes the patch characterization. Illustrations of this image content description are given for patches of a WorldView-2 multi-spectral scene, and the information relevance is assessed by an automatic classification of the patch characteristics which is compared to land use/land cover annotations.
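The segment-attributes-to-histogram pipeline described here can be sketched as follows. This is a hedged simplification: simple thresholding stands in for the Max-Tree decomposition, and component area is the only shape attribute shown, whereas the cited work uses richer shift-invariant attributes.

```python
# Shape-distribution sketch: segment a patch at several levels, measure
# a shape attribute per connected component, and pool into a histogram.
import numpy as np
from scipy import ndimage

def shape_distribution(image: np.ndarray, n_levels: int = 4, bins: int = 8) -> np.ndarray:
    """Histogram of log-areas of connected components across threshold levels."""
    areas = []
    for t in np.linspace(image.min(), image.max(), n_levels, endpoint=False):
        labels, n = ndimage.label(image > t)           # one segmentation level
        areas.extend(np.bincount(labels.ravel())[1:])  # component sizes (skip background)
    hist, _ = np.histogram(np.log1p(areas), bins=bins, range=(0, 10))
    return hist / max(hist.sum(), 1)                   # normalized shape distribution

rng = np.random.default_rng(3)
patch = ndimage.gaussian_filter(rng.normal(size=(64, 64)), sigma=3)
print(shape_distribution(patch))
```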
Article
Full-text available
Recent improvements in the spatial resolution of commercial satellite imagery make it possible to apply very high-resolution (VHR) satellite data to assessing structural damage in the aftermath of humanitarian crises, such as armed conflicts. Visual interpretation of pre- and post-crisis very high-resolution satellite imagery is the most straightforward method for discriminating structural damage and assessing its extent. However, the feasibility of using visual interpretation alone diminishes for large and dense urban settlements and for spatial resolutions of 2 to 3 m and coarser. Visual interpretation can be further complicated at spatial resolutions greater than 1 m if accompanied by shadow formation and by differences in sensor and solar conditions between the pre- and post-conflict images. In this study, we address these problems by investigating the use of traditional change detection techniques, namely image differencing and principal component analysis, together with an object-oriented image classification software, e-Cognition. Pre-conflict Ikonos (2 m resolution) images of Jenin in the Palestinian territories and of Brest (1 m resolution) in FYROM were classified using the e-Cognition software. Thereafter, the pre-conflict classification was used to guide the e-Cognition classification of the pixel-based change detection results. The second part of the study examines the feasibility of using mathematical morphological operators to automatically identify likely structurally damaged zones in dense urban settings. The overall results are promising and show that object-oriented segmentation and classification systems facilitate the interpretation of change detection results derived from very high-resolution (1 m and 2 m) commercial satellite data. The results also show that object-oriented classification techniques enhance the quantitative analysis of traditional pixel-based change detection applied to very high-resolution satellite data and facilitate the interpretation of changes in urban features. Finally, the results suggest that mathematical morphological methods are a promising avenue for automatically extracting likely damaged zones from very high-resolution satellite imagery in the aftermath of disasters.
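To make the PCA change analysis evaluated above concrete, here is a hedged sketch on co-registered pre/post images: pixel pairs are decorrelated and the second principal component, where coherent no-change structure cancels, is thresholded. The synthetic data and the 2-sigma threshold are illustrative assumptions, not the study's exact procedure.

```python
# PCA-based change sketch: project pre/post pixel pairs onto principal
# axes; the second component measures departure from the no-change line.
import numpy as np

def pca_change(pre: np.ndarray, post: np.ndarray) -> np.ndarray:
    X = np.stack([pre.ravel(), post.ravel()], axis=1).astype(np.float64)
    X -= X.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    pc2 = X @ vt[1]                     # component orthogonal to the no-change axis
    mask = np.abs(pc2) > 2 * pc2.std()  # flag pixels far from the no-change line
    return mask.reshape(pre.shape)

rng = np.random.default_rng(4)
pre = rng.normal(size=(64, 64))
post = pre + rng.normal(scale=0.05, size=pre.shape)
post[10:20, 10:20] += 3.0               # simulated damage region
print(pca_change(pre, post).sum())
```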
Article
Full-text available
A methodology for extracting complex structures from VHR images is presented. Using morphological descriptors to represent simple structures, the method aggregates them in a fuzzy-logic framework. The result is a map of complex-structure membership. The methodology is then adapted to characterizing damaged buildings in a VHR scene. During the characterization process, the image analyst's uncertainty knowledge is incorporated through visual inspection. Damaged-building memberships obtained on a wide Quickbird scene show the effectiveness of the methodology in terms of reliability and user-machine interaction. Such a representation is also shown to be versatile, for instance in the extraction of spatial statistics.
Article
In this article an interactive image information mining protocol is presented, aiming at computationally efficient pattern interpretation. The method operates on very high resolution (VHR) optical remote-sensing imagery and follows a modular approach. Images are projected onto a hierarchical image representation structure, the Max-Tree, which exposes multi-dimensional features of the image components. Positive and negative samples are selected interactively from the image space and are translated into features that best describe the targeted and undesired patterns. Feeding the feature entries into a hierarchical clustering algorithm, the kd-Tree, yields a structured representation that ensures fast classification. A classification is computed directly from the kd-Tree and is applied on the Max-Tree to accept or reject image components. The complete process cycle is demonstrated on gigapixel-sized VHR satellite images and requires 3 min for building the Max-Tree, 30 min for hierarchical clustering, and less than 10 s for each example-based query.
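The fast-query step in this protocol can be illustrated with a generic kd-tree nearest-neighbor lookup; SciPy's cKDTree stands in for the article's kd-Tree, and the feature dimensions, labels, and majority-vote rule are illustrative assumptions.

```python
# kd-tree retrieval sketch: index per-component feature vectors, then
# answer queries by majority vote over the k nearest labeled neighbors.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(5)
features = rng.normal(size=(10_000, 12))  # per-component feature vectors
labels = rng.integers(0, 2, size=10_000)  # interactive positive/negative samples
tree = cKDTree(features)                  # hierarchical index for fast queries

query = rng.normal(size=(1, 12))          # features of a new image component
_, idx = tree.query(query, k=5)           # five nearest labeled components
prediction = np.bincount(labels[idx[0]]).argmax()  # majority-vote accept/reject
print(prediction)
```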
Article
A variety of procedures for change detection based on comparison of multitemporal digital remote sensing data have been developed. An evaluation of results indicates that various procedures of change detection produce different maps of change even in the same environment.
Article
Image classification usually requires reliable reference data, collected for the considered image, to train supervised classifiers. Unfortunately, when time series of images are considered, this is seldom possible because of the costs associated with reference data collection. In most applications it is realistic to have reference data available for one or a few images of a time series acquired over the area of interest. In this paper, we present a novel system for automatically classifying image time series that takes advantage of images with associated reference information (i.e., the source domain) to classify images for which reference information is not available (i.e., the target domain). The proposed system exploits the knowledge already available on the source domain and, when possible, integrates it with a minimal amount of new labeled data for the target domain. In addition, it is able to handle possibly significant differences between the statistical distributions of the source and target domains. Here, the method is presented in the context of classifying remote sensing image time series, where ground reference data collection is a highly critical and demanding task. Experimental results show the effectiveness of the proposed technique. The method can work on multimodal (e.g., multispectral) images.