Towards A Generalizable Image Representation
For Large-Scale Change Detection: Application
To Generic Damage Analysis
Lionel Gueguen and Raffay Hamid
Abstract
Each year, multiple catastrophic events impact vulnerable populations around the planet. Assessing the damage
caused by these events in a timely and accurate manner is crucial for efficient execution of relief efforts to help the
victims of these calamities. Given the low accessibility of the damaged areas, high resolution optical satellite imagery
has emerged as a valuable source of information to quickly assess the extent of damage by manually analyzing the
pre- and post-event imagery of the region. To make this analysis more efficient, multiple learning techniques using a
variety of image representations have been proposed. However, most of these representations are sensitive to variations
in capture angle, sun location, and season. To evaluate these representations in the context of damage
detection, we present a benchmark of 86 pre- and post-event image-pairs with respective reference data derived
from UNOSAT assessment maps, spanning a total area of 4,665 km² from 11 different locations around the world.
The technical contribution of our work is a novel image representation based on shape distributions of image-patches
encoded with locality-constrained linear coding. We empirically demonstrate that our proposed representation provides
an improvement of at least 5% in equal-error-rate over alternate approaches. Finally, we present a thorough robustness
analysis of the considered representational schemes with respect to capture-angle variabilities and multiple sensor
combinations.
Index Terms
Change detection, post-damage assessment, shape distribution, locally linear encoding, benchmark.
I. INTRODUCTION
Each year, hundreds of catastrophic events impact vulnerable areas around the world. Assessing the extent of
damage caused by these crises is crucial in the timely allocation of resources to help the affected populations. Since
disaster-locations are usually not readily accessible, the use of either optical or SAR very high resolution satellite
imagery has emerged as a valuable source of information for estimating the impact of catastrophic events. However,
currently these assessments are done by manually analyzing both pre- and post-event images, or only the post-event
images, of distressed areas, making this a labor-intensive and expensive process. It is therefore important to scale up damage
Lionel Gueguen and Raffay Hamid are with the Image Mining R&D group of DigitalGlobe Inc. based in Westminster, Colorado, USA.
December 16, 2015 DRAFT
[Figure 1: (a) Eleven AOIs; (b) Examples.]
Fig. 1. (Top) Areas of interest shown with red dots. Each area represents multiple local regions. We considered 11 local regions,
spanning 4,665 km². (Bottom) Before and after imagery for different events shown in the two rows. These include (1) Typhoon
(Philippines, Oct. 2013), (2) Armed conflict (Central African Republic, Dec. 2013), (3) Earthquake (Pakistan, Sept. 2013), and
(4) Internally displaced people's shelters (Somalia, May 2013).
detection to larger areas accurately and efficiently. Our work is a step towards solving this problem through the change
detection analysis of optical pre- and post-event images.
In the following, we summarize some of the key challenges that need to be addressed in this regard, and how
our work contributes towards them:
1- Comprehensive Data-Set: Thus far, there has been a lack of a comprehensive labeled data-set that could be used
to explore automatic damage detection at scale. To this end, we present a benchmark data-set of 86 pairs of pre-
and post-event VHR optical satellite imagery of distressed areas covering 4,665 km² with the associated reference
dataset of damaged regions acquired by expert interpreters. This data-set was collected by using the satellites of
DigitalGlobe Inc. Our data-set covers 11 different regions from around the world, and spans a wide range of
terrains and climates, with a variety of damage types (see Figure 1). This data-set enables us to rigorously explore
the various facets of the problem at hand.
2- Appropriate Feature Choice: The scale of our problem naturally presents an accuracy-efficiency tradeoff for
the features being considered. To this end, we introduce the use of trees-of-shapes features [1] in a soft-assignment
locality-constrained linear encoding framework [2] that focuses more on the shape characteristics of a scene, as
opposed to its edge attributes (as done by other popular descriptors e.g., SIFT [3]). Our results show that this
difference proves to be quite important to detect damaged areas accurately. We present a thorough empirical analysis
for the effectiveness of our proposed scheme, and compare it to multiple alternatives.
3- Algorithmic Efficiency: Given that the scale of the damaged areas is usually quite large, and the identification
of damaged areas is required as soon as possible, having a framework with high algorithmic efficiency is of primary
importance. To this end, we propose several algorithmic speed-ups to implement our proposed framework.
The rest of the paper is arranged as follows. We begin by going over some of the relevant previous work within
the context of change and damage detection using satellite imagery in Section II. Section III describes in detail our
benchmark dataset used to evaluate change detection methods for post-disaster situation assessment. In Section IV
we describe an enhancement of shape distribution in terms of locality-constrained shape encoding. Results are
described in section V showing the improvements achieved by locally linear encoding. Finally section VI presents
the main conclusions of our work along with some discussion.
II. PREVIOUS WORK
The problem of change detection, especially within the context of damage detection, has been explored from multiple
different perspectives. Various change detection methods [4] have been proposed to perform such analysis
automatically in an efficient manner. However, most of these methods usually focus on only one type of disaster
with only a few cases. For example, the work in [5] shows the ability to capture earthquake damage, while [6]
and [7] underline the capability of detecting damaged structures due to armed conflict. The work in [8] compares
image representations for tornado damage, while [9], [10] explore refugee-camp formation. Although quite useful,
these methods propose ad-hoc image representations that do not converge to a single generalizable characterization
due to the lack of a large benchmark data-set encompassing the many variabilities encountered in damage analysis.
Such variabilities include type of disaster, geographical areas, acquisition angles, and sensor combinations, among
others.
Several unsupervised change-detection methods for optical images with passive sensors have been proposed
over the past decade. Among them, a widely used technique is the change vector analysis (CVA) [11]. The CVA
technique consists of computing differences between two images measuring the same scene at two distinct instants.
As a result, each image pixel is associated with a single difference value or a multidimensional spectral change vector.
The analysis of these spectral change vectors through unsupervised classification produces a change map [12].
While this approach is quite general, it does not allow specifying the change of interest, and results in detecting
phenological changes as much as the changes of interest. Furthermore, it can suffer from spatial inaccuracies in
image registration.
Improvements to the CVA approach have been proposed for capturing phenological changes by canonical correla-
tion analysis and local mutual information [10], [13]. Other improvements have also been proposed for overcoming
spatial inaccuracies by modeling the spatial context before the change analysis [14], [15]. The combination of
these unsupervised approaches makes it possible to generate unsupervised change heat-maps which are robust to
phenological variations and to spatial inaccuracies by modeling the most probable transformation from the pre-
TABLE I
Four of DigitalGlobe's six satellites are used in our 86 image-pairs. Here the per-pixel resolutions of
QuickBird (QB-2), WorldView-1 (WV-1), WorldView-2 (WV-2) and GeoEye-1 (GE-1) are 0.61 m, 0.5 m, 0.46 m and 0.41 m
respectively.

Satellites  QB-2  WV-1  WV-2  GE-1
QB-2        1     -     -     -
WV-1        9     14    -     -
WV-2        3     23    6     -
GE-1        5     11    9     5
image to the post-image and vice-versa. However, these unsupervised approaches remain limited in their ability to
separate the changes of interest from other real changes which are not of interest for the considered application.
To address this challenge, supervised methods have been proposed to focus the change heat-maps towards the
changes of interest [16]. Given the high skill-set required from the photo-interpreters to assess the damage accurately,
acquiring reliable ground-truth training examples is particularly challenging, and therefore this dependence on
manually labelled training data has to be minimized [17]. Towards this end, semi-supervised approaches have been
recently proposed, showing an improvement in collecting labelled samples [18].
III. DATA SET AND BENCHMARK
We compiled a large scale data set to benchmark change detection methods in the context of damage detection.
To this end, eleven areas of interest (cf. Table III) around the world were selected. This selection was made
on the basis of major catastrophic events analyzed by the United Nations Institute for Training and Research
(UNITAR/UNOSAT) [19]. This international organization is responsible for publishing maps of geo-located points
indicating relevant changes on the ground. We used this information from UNITAR/UNOSAT to build our reference
datasets for the 11 selected AOIs, which together comprise 29,945 points. Examples of these geolocated change points
are illustrated in Figs. 8-11. The layout for these AOIs is shown in Figure 1. Different types of crisis events
were considered in our AOIs including armed conflicts, earthquakes, typhoons, and refugee-camp developments
(see Table III and Figure 1 for example cases).
We collected high resolution panchromatic imagery from DigitalGlobe’s archive such that we can form image-
pairs covering the considered AOIs that were acquired as close as possible to the dates of the catastrophic events.
In the context of damage analysis, the pre- and post-event images are likely to be captured from different sensors,
and with different acquisition conditions. To incorporate these acquisition variabilities, we selected multiple pairs
per AOI, resulting in a set of 86 image-pairs from the DigitalGlobe satellite constellation: QuickBird, WorldView-1,
WorldView-2 and GeoEye-1, without any restriction on the angle of acquisition. The variability in our sensor
combinations is given in Table I.
Another important source of variability in satellite images is the acquisition angle, as it affects the directions and
lengths of shadows cast by physical features on the ground (e.g., human settlements). While maintaining similar
acquisition angles helps automate change detection techniques, obtaining pairs of pre- and post-event images in
TABLE II
List of shape descriptors used by our framework.

Area           $\mu_{0,0}$
Eccentricity   $\sqrt{1 - \lambda_1/\lambda_0}$
$\lambda_0, \lambda_1$   $\frac{\mu_{2,0}+\mu_{0,2}}{2} \pm \frac{1}{2}\sqrt{4\mu_{1,1}^2 + (\mu_{2,0}-\mu_{0,2})^2}$, with $\lambda_0 \ge \lambda_1$
Hu1            $\eta_{2,0} + \eta_{0,2}$
Hu2            $(\eta_{2,0} - \eta_{0,2})^2 + 4\eta_{1,1}^2$
such a tight time constraint cannot be guaranteed. To incorporate this important variability in our data-set, we did
not apply any restriction on the angles, resulting in a uniform sampling of the multi-angle space, as illustrated in
Figure 2.
TABLE III
The 11 areas of interest selected for building the benchmark dataset for change detection analysis. The 11 AOIs
have been selected with respect to the UNITAR/UNOSAT photo-interpretation data published at
http://www.unitar.org/unosat/. VHR panchromatic image pairs have been selected so as to define date intervals
encompassing, and as close as possible to, the given pre- and post-dates.

AOI | Disaster Type | Pre-Date | Post-Date | Number of pairs | UNITAR Ref.
Damage Assessment for Jebri area, Awaran District, Balochistan Province, Pakistan | earthquake | 27/08/13 | 27/09/13 | 3 | 1831
Damage Assessment for Gajar Area, Awaran District, Balochistan Province, Pakistan | earthquake | 26/08/13 | 26/09/13 | 10 | 1828
Damaged Structures in Bentiqui, Leyte, Philippines | typhoon | 11/09/13 | 11/11/13 | 2 | 1866
Structural Development, Afgooye Corridor, Somalia | refugees | 12/02/13 | 25/05/13 | 10 | 1856
Damage Assessment in the City of Malakal, Upper Nile State, South Sudan | conflict | 06/12/13 | 15/03/14 | 10 | 1961
Damage Assessment in the city of Bor, Jonglei State, South Sudan | conflict | 25/12/13 | 19/01/14 | 10 | 1917
Destruction in Mayom, Unity State, South Sudan | conflict | 29/09/13 | 11/01/14 | 10 | 1922
Damage Assessment in the City of Bentiu, Unity State, South Sudan | conflict | 02/01/14 | 18/01/14 | 10 | 1919
Destruction in Rubkona, Unity State, South Sudan | conflict | 02/01/14 | 13/01/14 | 10 | 1915
Damage Assessment in Paoua, Ouham-Pende, Central African Republic | conflict | 03/11/13 | 18/06/14 | 1 | 2016
Destruction in Bossangoa Area, Ouham, Central African Republic | conflict | 05/12/13 | 22/01/14 | 10 | 1957
[Figure 2: polar scatter plots of Satellite Azimuth/Elevation Difference and Sun Azimuth/Elevation Difference,
showing the Used Pair Angle Difference and the Optimal Pair Angle Difference.]
Fig. 2. Scatter plot showing the differences of sun and satellite angles for the 86 selected strips. The scatter is provided in angular space, where
the x-axis represents the radius and y-axis the actual angle.
IV. METHODOLOGY
Performing automatic change detection using high resolution imagery raises multiple challenges. In the following
we present our proposed methodology and show how our approach addresses these challenges.
Due to acquisition-angle differences, a direct pixel-wise comparison produces many false alarms, and a window-based
image description is generally found to be more robust [20], [21]. Furthermore, the changes that are relevant
in a given situation vary greatly across geographies, event types, and acquisition-angle combinations. Therefore, it is
generally helpful to indicate the relevant changes to the detection system via training examples. In the following,
we propose a novel patch-based image description exploiting connected-component based image analysis along
with an efficient encoding scheme to enhance the accuracy of automatic change analysis, as illustrated in Fig. 3.
A. Image Encoding
Max/min-trees and the tree of shapes have shown great potential for analyzing high resolution panchromatic imagery [1],
[22]. These structures allow images to be analyzed efficiently and compactly, without loss of information, by representing
the connected components of the lower and upper level sets, i.e.:

$\chi_\lambda(u) = \{p \in \Omega \mid u(p) \le \lambda\},$ (1)
$\chi^\lambda(u) = \{p \in \Omega \mid u(p) \ge \lambda\},$ (2)

where, for a gray-scale image $u : \Omega \mapsto \mathbb{N}$, we define $\chi_\lambda(u)$ and $\chi^\lambda(u)$ as the lower and upper level sets of $u$. Note
that the connected components of these (upper or lower) level sets are a lossless representation of $u$ and provide
its segment-based representation, which is fundamentally different from edge-based image representations [3].
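As a concrete illustration of Eqs. (1)-(2), the following sketch (not the paper's implementation) labels the connected components of the lower and upper level sets of a toy gray-scale image using `scipy.ndimage`; the image values and the threshold $\lambda$ are made up for demonstration.

```python
# Illustrative sketch: connected components of the lower and upper level
# sets of a small toy gray-scale image (values and threshold are arbitrary).
import numpy as np
from scipy import ndimage

u = np.array([[0, 0, 3, 3],
              [0, 5, 5, 3],
              [1, 5, 5, 0],
              [1, 1, 0, 4]])

lam = 3
lower = u <= lam          # lower level set, Eq. (1)
upper = u >= lam          # upper level set, Eq. (2)

# Label the 4-connected components of each level set.
lower_cc, n_lower = ndimage.label(lower)
upper_cc, n_upper = ndimage.label(upper)
print(n_lower, n_upper)   # number of components in each level set
```

Each labeled component is one of the "shapes" whose descriptors are computed in the remainder of this section.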
These patch-based decompositions allow the analysis of image objects at very different scales, enabling to describe
tiny image structures as well as larger ones in one combined representation. When tackling damage analysis, it is
December 16, 2015 DRAFT
7
important to keep as much fine-resolution information as possible. The lower and upper level sets inherently retain
this fine-grained information for further use.
In the context of satellite image analysis, it is imperative to have image descriptors which are invariant to rotation
and limited translations. To achieve these invariances, we use shape descriptors of the components which are
rotation and translation invariant; specifically, we use the second and third order central moments [23] as
the shape descriptors.
Let $\{p_i\}_{i=1}^{n}$ be the $n$ pixels composing a peak component $P^\lambda_{p_1}(u)$. Each pixel $p_i$ is a pair of horizontal and
vertical coordinates $p_i = (x_i, y_i)$. The average of a spectral band $v$ is given by:

$A_v(P^\lambda_{p_1}(u)) = \frac{1}{n} \sum_{i=1}^{n} v(x_i, y_i).$ (3)

The central and normalized shape moments $\mu_{a,b}$, $\eta_{a,b}$ are simply expressed for a pair of integers $(a, b) \in \mathbb{N}^2$ by:

$\bar{x}(P^\lambda_{p_1}(u)) = \frac{1}{n} \sum_{i=1}^{n} x_i,$ (4)
$\bar{y}(P^\lambda_{p_1}(u)) = \frac{1}{n} \sum_{i=1}^{n} y_i,$ (5)
$\mu_{a,b}(P^\lambda_{p_1}(u)) = \sum_{i=1}^{n} (x_i - \bar{x})^a (y_i - \bar{y})^b,$ (6)
$\eta_{a,b}(P^\lambda_{p_1}(u)) = \frac{\mu_{a,b}}{\mu_{0,0}^{(a+b)/2+1}}.$ (7)

These moments are combined to derive the shape descriptors given in Table II. All descriptors can be computed in
linear time by exploiting the nesting property of the max/min-tree [24], [25], as well as of the tree of shapes.
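The moments of Eqs. (4)-(7) and the Table II descriptors can be sketched for a single component as follows; this is a hedged numpy illustration on made-up pixel coordinates, not the tree-based linear-time implementation, and it takes $\lambda_0$ as the larger covariance eigenvalue so the eccentricity is well defined.

```python
# Illustrative computation of central/normalized moments and the Table II
# descriptors for one connected component (toy pixel coordinates).
import numpy as np

xs = np.array([0.0, 1.0, 1.0, 2.0])   # component pixel x-coordinates (toy)
ys = np.array([0.0, 0.0, 1.0, 1.0])   # component pixel y-coordinates (toy)

def mu(a, b):
    """Central moment mu_{a,b}, Eq. (6)."""
    return np.sum((xs - xs.mean()) ** a * (ys - ys.mean()) ** b)

def eta(a, b):
    """Normalized moment eta_{a,b}, Eq. (7)."""
    return mu(a, b) / mu(0, 0) ** ((a + b) / 2 + 1)

area = mu(0, 0)                                        # number of pixels n
disc = np.sqrt(4 * mu(1, 1) ** 2 + (mu(2, 0) - mu(0, 2)) ** 2)
lam0 = (mu(2, 0) + mu(0, 2)) / 2 + disc / 2            # larger eigenvalue
lam1 = (mu(2, 0) + mu(0, 2)) / 2 - disc / 2            # smaller eigenvalue
eccentricity = np.sqrt(1 - lam1 / lam0)
hu1 = eta(2, 0) + eta(0, 2)
hu2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
```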
B. Bag-Of-Visual-Words Encoding
In recent years, the bag-of-visual-words model has become extremely popular in computer vision for categorizing
large corpora of images [26], [27]. The model treats an image as a collection of spatially unordered representative
visual words. In our context, each constituent overlapping image patch of a large image strip is characterized by
the aggregation of its connected components' shape descriptors. Previous studies have shown the efficiency of this
scheme for satellite image analysis [22]. In this image representation, each connected component is assigned to a
unique dictionary entry, which is then used in the computation of the visual-words histogram.
Within the context of the bag-of-visual-words representation, various alternate encoding schemes have been proposed
that allow more powerful representations than the aforementioned simple hard-assignment scheme. Here we particularly
focus on two such improved encoding approaches, i.e., soft assignment and locality-constrained (locally) linear
encoding [2]. In both cases, a dictionary $D_k$ of visual words is first learned by sampling from a representative
subset of all connected-component descriptors and clustering them into $k$ clusters, where $k$ is manually predefined.
The representative subset is extracted randomly from all the connected components existing in the pre- and post-images.
Given the $k$ cluster centers $D_k = \{C_i\}$ computed off-line, any new connected-component descriptor $d$
can then be encoded. We now present the details of the soft-assignment and locally linear encoding schemes.
December 16, 2015 DRAFT
8
Soft Assignment: In this encoding technique, each connected-component descriptor is softly assigned to more than
one cluster. The soft assignment attributes a weight to each cluster depending on how far the connected-component
descriptor is from it, and can be expressed as:

$w(d, C_i) = \frac{\exp(-\|d - C_i\|_2^2 / \lambda)}{\Gamma}$ (8)

where

$\Gamma = \sum_{i=1}^{k} \exp(-\|d - C_i\|_2^2 / \lambda).$ (9)

Here $\lambda$ is a scaling parameter determined a priori depending on the dictionary size. With this encoding mechanism,
each vector $d$ is encoded by the normalized vector $[w(d, C_1), \cdots, w(d, C_k)]$, with values between 0 and 1. In the
limit $\lambda \to 0$, the soft assignment becomes a hard assignment where each descriptor is assigned to a unique
dictionary entry. Given a window $W$ covering a set of connected components $\{CC_i\}$ associated with their descriptors
$\{d_i\}$, its descriptor is obtained by computing soft histograms, i.e.:

$h_W(i) = \sum_{CC_j} w(d_j, C_i),$ (10)

where the soft histogram of a window $W$ is $[h_W(1), \cdots, h_W(k)]$. The histograms are not further normalized,
so as to account for the varying number of components in a window, relating ultimately to different objects. Also, not
normalizing allows the representation to encompass contrast information, which becomes very important in built-up areas where
the cast shadows play a substantial role.
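A minimal sketch of the soft-assignment histogram of Eqs. (8)-(10), with toy cluster centers and window descriptors standing in for a learned dictionary and real shape descriptors:

```python
# Illustrative soft assignment (Eqs. 8-10): each descriptor in a window is
# softly assigned to k cluster centers, and the per-descriptor weights are
# summed into an unnormalized soft histogram. All values are toy examples.
import numpy as np

def soft_assign(d, centers, lam):
    """Weights w(d, C_i) of Eq. (8), normalized by Gamma of Eq. (9)."""
    sq = np.sum((centers - d) ** 2, axis=1)   # ||d - C_i||_2^2
    w = np.exp(-sq / lam)
    return w / w.sum()

centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # toy dictionary
window_descriptors = np.array([[0.1, 0.0], [0.9, 0.1]])    # toy components

# Eq. (10): the window histogram is the sum of per-descriptor weight vectors.
h = np.sum([soft_assign(d, centers, lam=0.5) for d in window_descriptors],
           axis=0)
```

Since each weight vector sums to one, the (unnormalized) histogram sums to the number of components in the window, which is exactly the contrast/count information the text argues should be preserved.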
Locality-Constrained (Locally) Linear Encoding: The second encoding method tries to linearly encode the
descriptor with respect to the provided $k$ centroids [2]. In practice, one solves the following minimization
problem:

$\arg\min_{l} \|d - D_k l\|^2 \quad \text{such that} \quad \|l\|_1 = 1$ (11)

Here $d$ represents a shape descriptor, $D_k$ represents the shape dictionary, and $l$ represents the local linear code.
Then, for any window $W$, its descriptor is obtained by summing the linear contributions of the connected-component
descriptors it contains:

$l_W(i) = \sum_{CC_j} l(d_j, C_i),$ (12)

where $l(d_j, i)$ is the linear code of the descriptor $d_j$ at position $i$. Each window is then described by its $k$-length
vector $[l_W(1), \cdots, l_W(k)]$. Again, this descriptor is not further normalized, so as to capture contrast and
shape-count information.
For both encoding methods and for each shape, the encoding is done in practice by searching for its $n$ nearest
neighbor entries in the codebook, and assuming that the dictionary is composed only of these centroids. This technique
produces a sparse representation of each individual shape descriptor, keeping the memory footprint
small while maintaining good discriminative power. The weight of a shape is determined by the fraction of its pixels covering
the considered window. This process is repeated for each window in the pre- and post-event images.
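The nearest-neighbor-restricted LLC step can be sketched as follows. This is a hedged illustration of the analytical solution of Eq. (11) in the style of Wang et al. [2] (a small regularizer is added for numerical stability); the codebook, descriptor, and parameter values are toy choices, not the paper's actual dictionary.

```python
# Illustrative LLC encoding restricted to the n nearest codebook entries:
# solve the local least-squares system and renormalize so the code sums
# to one (the constraint of Eq. 11). All values are toy examples.
import numpy as np

def llc_code(d, codebook, n_neighbors=2, reg=1e-6):
    # Keep only the n nearest centroids; all other code entries stay zero.
    dist = np.sum((codebook - d) ** 2, axis=1)
    nn = np.argsort(dist)[:n_neighbors]
    B = codebook[nn] - d                       # neighbors shifted to d
    C = B @ B.T + reg * np.trace(B @ B.T) * np.eye(n_neighbors)
    w = np.linalg.solve(C, np.ones(n_neighbors))
    w /= w.sum()                               # enforce sum(l) = 1
    code = np.zeros(len(codebook))
    code[nn] = w
    return code

codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
code = llc_code(np.array([0.2, 0.1]), codebook)
```

The resulting code has at most `n_neighbors` non-zero entries, which is the sparsity exploited for the small memory footprint mentioned above.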
[Figure 3: pipeline diagram — Image Pair (Image 1, Image 2) → Feature Extraction → Feature Encoding → Supervised Learning → Final Detection.]
Fig. 3. Given a pair of image-strips, we extract features of their overlapping windows and perform their locality-constrained encoding. We use
these codes in a supervised setting to learn a damage detection classifier.
C. Learning Framework
Our learning framework is illustrated in Fig. 3. This work particularly explores the representational aspect of the
image patches for the problem of damage detection, and therefore focuses more on a supervised learning setting.
For details regarding the active learning aspects of our approach, we refer the reader to [18].
Based on the available reference points for the positive and negative classes, we stack up their corresponding
window representation codes and use them to train a linear support vector machine (SVM). Note that we found a
linear classifier sufficient for our problem, since the feature space constructed by both soft assignment
and locally linear coding already incorporates a significant amount of non-linearity. Therefore, we do not incorporate
any further kernel projection in our framework, resulting in faster training and detection times. For this work, we
used a linear SVM [28] with L1-regularization and an L2-loss function, where the L1-regularization allows us to embed
feature selection, i.e., a selection of the visual words that are optimal for discriminating between relevant
and irrelevant changes. As the shape distributions are stacked, different visual words can be selected for the pre-
and post-event images.
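The classifier stage can be sketched as below. This is a hedged, numpy-only illustration (the paper uses the solver of [28]): an L1-regularized, squared-hinge linear SVM trained by proximal gradient descent, where the soft-thresholding step zeroes out weights and thus performs the visual-word selection described above. The toy features, learning rate, and regularization strength are arbitrary stand-ins for the real 256-dimensional stacked codes.

```python
# Illustrative L1-regularized linear SVM with squared-hinge (L2) loss,
# trained by proximal gradient descent (ISTA). Toy synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))                  # toy stand-in for stacked codes
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)  # synthetic labels in {-1, +1}

n, dim = X.shape
w = np.zeros(dim)
lr, l1 = 0.01, 0.1
for _ in range(1000):
    margin = np.maximum(0.0, 1.0 - y * (X @ w))
    grad = -(2.0 / n) * (X.T @ (y * margin))    # gradient of squared hinge
    w -= lr * grad
    # Proximal step for the L1 penalty: soft thresholding zeroes weights,
    # i.e. it selects a subset of the visual words.
    w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)

accuracy = np.mean(np.sign(X @ w) == y)
n_selected = np.count_nonzero(w)                # visual words kept by L1
```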
D. Algorithmic speed-up
The main bottleneck in our computational framework is by far the feature extraction and encoding step. We
achieve algorithmic speed-ups for this step in two important ways.
First, we use a quasi-linear algorithm for computing the tree of shapes [29] to efficiently extract the upper and
lower connected components. This representation organizes the nested connected components into a tree, by
maintaining the relations whereby lower-level connected components are holes of the upper-level connected components.
The min- and max-tree [25] are efficient algorithms for separately extracting the lower and upper level connected-component
sets. These two trees can be fused into a unique tree of shapes by the level-line algorithm [24]. In
this paper, we adopt the tree of shapes algorithm of [29], which has a better worst-case computational complexity than the
fusion of max-tree and min-tree. The algorithm provides us a tree-of-shapes representation which can be efficiently
stored in two arrays of length $n$, where $n$ is the number of pixels in the image [30]. The representation enables
computing the moment-based attributes in linear time by exploiting its nesting properties [24], [31]. This algorithm
[Figure 4: average ROC curves (True Positive Rate vs. False Negative Rate) for SA 1, SA 4, SA 8, SA 16, LLC 4, LLC 8, LLC 16.]
Fig. 4. Average of the ROC curves obtained by linear supervised classification. This graph shows the impact of the coding schemes and coding
parameters on the false negative and true positive rates.
[Figure 5: left, EER as a function of the satellite and sun angle differences (azimuth vs. elevation); right, Equal Error Rate vs. Ground Displacement (meters).]
Fig. 5. Effects of satellite, sun-angle differences and image misalignment on EER derived from shape distribution features. On the left plots,
the color encodes the EER going from low values in blue to higher ones in red.
performs much faster than extracting the connected components at all possible thresholds, and requires a memory
footprint independent of the number of grey levels considered.
Our second source of speed-up comes during the computation of code histograms. Given a patch, one approach
could rely on looking at the components falling in it and then estimating the histogram of their codes, weighted
by the number of pixels falling in the patch. As a connected component may span several patches, it becomes
more efficient to perform a direct filtering of the tree of shapes [25], where the component contrast is multiplied
by its corresponding code weight. Thus, for each visual-word code $i$, one records the components having a non-null
weight, and remaps and reweights these components into an image $R$. For a given patch $W$, one can thus gather its
entry $l_W(i)$ by computing the sum of $R$ within it: $l_W(i) = \sum_{x \in W} R(x)$. For overlapping rectangular windows,
this process can be achieved efficiently with integral images [32].
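The integral-image pooling [32] can be sketched as follows; the remapped code image `R` here is a toy array, but the constant-time window-sum recurrence is the standard one.

```python
# Illustrative integral-image pooling of a remapped code image R over
# rectangular windows in O(1) per window. R is a toy example.
import numpy as np

R = np.arange(16, dtype=float).reshape(4, 4)   # toy remapped code image

# Integral image with a zero-padded first row and column.
I = np.zeros((5, 5))
I[1:, 1:] = R.cumsum(axis=0).cumsum(axis=1)

def window_sum(r0, c0, r1, c1):
    """Sum of R over rows r0..r1-1 and columns c0..c1-1."""
    return I[r1, c1] - I[r0, c1] - I[r1, c0] + I[r0, c0]

s = window_sum(1, 1, 3, 3)   # pools R[1:3, 1:3]
```

Once `I` is built in one pass, every overlapping window entry $l_W(i)$ costs four lookups, regardless of window size.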
V. EXPERIMENTS AND RESULTS
A. Comparison of Encoding Schemes
We compare the two proposed encoding schemes, namely Soft Assignment (SA) and Locally Linear Coding (LLC),
with different numbers of nearest neighbors used during the computation. First of all, each image is partitioned into
overlapping tiles of 50×50 pixels (50×50 m² on the ground), separated by 10 meters, thus producing a change
detection map of 10-meter resolution, given the 1-meter resolution of the analyzed imagery. Using our dataset of 86
strip pairs covering an area of 4,665 km², the equivalent number of tiles to be considered is around 86 million.
A single dictionary is computed per pair of images, with 128 shape words encoding the characteristics of the
connected components present in the data set. Thus, each image tile is encoded by a histogram or a linear code
of 128 dimensions. Also, the pre- and post-image tile descriptors are stacked for change detection analysis
purposes, leading to a 256-dimensional description vector for each 10-meter pixel of the considered AOIs. To put
the amount of data being analyzed in perspective, this representation occupies 335 GB for imagery taking
139 GB of disk space.
Since the reference dataset is given in geolocated format, we first rasterize it onto the underlying 10-meter feature
grid, and buffer it with a radius of 25 meters to match our tile size. For each image-pair, we randomly pick 50%
of the damage examples as the training set and leave the remaining 50% for evaluating the quality of
the detectors. As damage analysis is a highly imbalanced classification problem, we also randomly pick negative
examples in a number equal to the number of positive examples in the training set. These training examples are
fed to the linear SVM.
Using the reference dataset collected from our AOIs, we use receiver operating characteristic (ROC) curves to evaluate the
different representations considered. Note that the true positive rate (TPR) represents the number of relevant damages
detected over the total number of changes. Similarly, the false positive rate (FPR) represents the area of false alarms
with respect to the area with no damage. For a given (TPR, FPR) tuple, the associated damage indicator covers an
area of $f \cdot TPR + (1-f) \cdot FPR$, where $f$ is the fraction of damaged area. Note that the quantity $f \cdot TPR + (1-f) \cdot FPR$
is also the relative search-space size that needs to be examined by a human photo-interpreter during curation. We
use the Equal Error Rate (EER) as a summary of the performance of an ROC curve, which is given as the ROC
point satisfying $FPR = (1 - TPR)$.
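Reading the EER off an ROC curve can be sketched as below; the scores and labels are toy values, and the nearest sampled operating point is taken as the crossing of FPR and 1 − TPR.

```python
# Illustrative EER computation: build the ROC operating points from toy
# detection scores and find where FPR = 1 - TPR.
import numpy as np

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1])
labels = np.array([1,   1,   0,   1,   0,   1,   0,   0])

thresholds = np.sort(np.unique(scores))[::-1]
tpr = np.array([(labels[scores >= t] == 1).sum() / (labels == 1).sum()
                for t in thresholds])
fpr = np.array([(labels[scores >= t] == 0).sum() / (labels == 0).sum()
                for t in thresholds])

# EER: operating point where fpr crosses 1 - tpr.
idx = np.argmin(np.abs(fpr - (1 - tpr)))
eer = (fpr[idx] + (1 - tpr[idx])) / 2
```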
The comparison of the two encoding schemes is shown in Fig. 4 in terms of the average of the ROC curves obtained
for the 86 pairs. The dependence on the number of neighbors used during encoding is also evaluated and shown in
the same graph. When the number of neighbors goes to one, both methods behave the same as a hard assignment
of the visual words.
From the analysis of the ROC curves, it can be seen that LLC provides the best description for the purpose of
our experiments in comparison to SA. At a TPR = 0.80, the LLC-based representation provides an FPR = 0.12 in
comparison to an FPR = 0.18 provided by the best SA-based representation. As pointed out earlier, the FPR is a
direct approximation of the search space, given the small coverage of changes. Thus the LLC-based encoding produces
a relative reduction of 30% in the search space.
Secondly, we can observe the impact of the number of neighbors on the accuracy. It can be seen that 4 neighbors
provide the best results with LLC, with a degradation of the accuracies as this number grows. This observation
suggests the existence of an optimal number of neighbors (greater than 1, i.e., hard assignment) maximizing the
representation's discrimination. The effect of the number of neighbors on SA is quite the opposite, as the accuracy
increases with it until reaching saturation. It can be observed that the description does not improve by using more
than 4 neighbors. Because the same dictionary is used for LLC and SA, this observation lets us draw the conclusion
that any point of the shape feature space is close to an average of 4 clusters, and that the other clusters are too
far away to be of much use.
B. Robustness To Variabilities
The average EER using shape distribution features as a function of sun and satellite angle differences and image
misregistration is plotted in Figure 5. It is evident that satellite angle differences do not impact the representation
accuracy, while the sun angle differences seem to matter more. This is because sun angles impact shadows, which
can significantly vary image appearance. We also computed the average EER for each sensor combination given in
Table I. The standard deviation for this is 0.03, indicating that our approach is robust to using pre- and post-event
imagery from different sensors. Finally, for each image pair, we computed the average displacement; its norm
is depicted against the EER in Figure 5, showing an independence between both axes, meaning a robustness to
inaccuracies in spatial registration.
C. Collection of Training Examples
In the context of rapid damage detection, the collection of training examples is a crucial component for timely delivery, and one wants to collect as few training examples as possible while achieving the best accuracy. In the following series of experiments we consider different percentages of positive training examples and evaluate their impact on the summary Equal Error Rate (EER) metric derived from the average ROC curves. The negative examples are picked randomly in the image. This task is inexpensive since damage is in most cases a rare class; in other words, it is unlikely that a positive example is picked as a negative one when randomly sampling the AOI.
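The random collection of negatives described above can be sketched as follows. This is an illustrative NumPy sketch rather than the paper's exact procedure; the rejection margin `min_dist` and the interface are our assumptions.

```python
import numpy as np

def sample_negatives(n, aoi_shape, positives, min_dist=5, seed=0):
    """Draw n negative patch centers uniformly over the AOI, rejecting any
    draw that falls within min_dist pixels of a known positive example.
    Because damage is a rare class, rejections are infrequent."""
    rng = np.random.default_rng(seed)
    pos = np.asarray(positives, dtype=float)
    out = []
    while len(out) < n:
        cand = rng.uniform([0, 0], aoi_shape, size=2)
        if pos.size == 0 or np.min(np.linalg.norm(pos - cand, axis=1)) >= min_dist:
            out.append(cand)
    return np.array(out)
```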
The impact of the number of training examples on the EER is depicted in Fig. 6. A first observation is that the best LLC-based representation degrades the EER by only 0.05 when taking 5% of the positive training samples (on average 140 points) instead of 50%. This highlights the ability of this representation to generalize from few examples. Secondly, the ranking of the different coding schemes is maintained across the various percentages of training examples, except for the 8-neighbors LLC representation, which degrades much faster than the SA-based representation. This again highlights the importance of the number of neighbors in LLC for maximizing the detection accuracy with a limited number of training examples.
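For clarity, the EER summary used in Fig. 6 can be computed from detection scores as follows. This is a minimal sketch based on a discrete threshold sweep over sorted scores, not the exact averaging over ROC curves performed in the paper.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """EER: the ROC operating point where the false positive rate
    equals the false negative rate (1 - recall)."""
    order = np.argsort(scores)[::-1]          # sweep threshold from high to low
    y = np.asarray(labels, dtype=float)[order]
    tp = np.cumsum(y)                         # true positives at each threshold
    fp = np.cumsum(1.0 - y)                   # false positives at each threshold
    fnr = 1.0 - tp / max(tp[-1], 1.0)         # miss rate
    fpr = fp / max(fp[-1], 1.0)               # false alarm rate
    i = int(np.argmin(np.abs(fpr - fnr)))     # closest crossing point
    return float((fpr[i] + fnr[i]) / 2.0)
```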
December 16, 2015 DRAFT
Fig. 6. Impact of the percentage of positive training samples on the Equal Error Rate with linear classification. (The plot legend lists SA 1, SA 4, SA 8, SA 16, LLC 4, LLC 8 and LLC 16; x-axis: % of samples; y-axis: Equal Error Rate.)
D. Computational Complexity
We provide indicative running times of our implementation and show the benefits of the proposed algorithms over naive approaches. We perform our computations on one 11-bit image of size 10000×10000 pixels. Given the bit depth, the number of grey levels is 2048, which has a non-negligible impact on the extraction of the connected components (CCs). Performance is evaluated on a single Intel CPU at 3.3 GHz.
We first assess the benefit of the tree-of-shapes algorithm over an approach gathering the CCs at all possible thresholds between 0 and 2048. The tree of shapes performs the decomposition of the image and the computation of the moment-based attributes in 372 seconds, while the naive approach requires 2544 seconds. We thus observe a speed-up of about 6.8× from employing efficient algorithms, without even considering memory consumption, which for the naive approach grows with the number of grey values.
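The cost of the naive baseline can be illustrated on a 1-D toy signal. This is an illustrative sketch of per-threshold connected component extraction (runs in 1-D), not the tree-of-shapes algorithm; with 2048 grey levels the thresholding loop dominates the cost.

```python
import numpy as np

def threshold_components_1d(signal, n_levels):
    """Naive baseline: for every grey level, threshold the signal and count
    runs (1-D connected components) of the upper level set. The cost grows
    linearly with the number of grey levels (2048 for 11-bit data), which
    is exactly what a tree-based decomposition avoids."""
    counts = []
    for t in range(n_levels):
        mask = np.asarray(signal) >= t
        # a run starts wherever the mask turns on
        starts = np.logical_and(mask, np.logical_not(np.roll(mask, 1)))
        starts[0] = mask[0]  # undo the wrap-around of np.roll
        counts.append(int(starts.sum()))
    return counts
```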
Secondly, we assess the benefits of computing the distributions with the proposed technique. The complexity of the proposed approach depends linearly on the dictionary size, as we loop over the visual words. Given a visual word, reconstituting the weighted CCs takes 1.28 seconds. Performing the spatial filtering with a 50-pixel-wide window at a 10-pixel stride then takes 0.39 s using integral images [32]. As a comparison, performing the spatial filtering and down-sampling with a 16-core multi-processor implementation requires 0.9 s; here again we observe the benefits of efficient algorithms. Consequently, processing one visual word to obtain the corresponding histogram bin for all windows requires 1.67 s and, given a dictionary with 128 entries, the spatial aggregation process takes 213 s. By contrast, if the process looped over the windows instead, collecting the shape distributions would take 5100 s for the same setup, about 24 times more than the proposed implementation.
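The integral-image aggregation can be sketched as follows. This is a minimal NumPy sketch of the summed-area-table trick of Viola and Jones [32]; the window size and stride mirror the 50-pixel / 10-pixel setup above, but the function name and interface are our own.

```python
import numpy as np

def windowed_sums(img, win, stride):
    """Sum of img over every win x win window placed on a stride grid,
    computed in O(1) per window from an integral image."""
    # integral image padded with a zero first row/column for clean differencing
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    ys = np.arange(0, img.shape[0] - win + 1, stride)
    xs = np.arange(0, img.shape[1] - win + 1, stride)
    out = np.empty((len(ys), len(xs)))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            # four-corner difference on the summed-area table
            out[i, j] = (ii[y + win, x + win] - ii[y, x + win]
                         - ii[y + win, x] + ii[y, x])
    return out
```

The integral image is built once per visual word, so the per-window cost is constant regardless of the window size, which is why the 50-pixel windows above cost only 0.39 s in total.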
Fig. 7. Damage Type: Conflict. Area: Malakal, Upper Nile State, South Sudan. Top row. The pre- and post-event images are displayed side by side with the UN change points of interest overlaid in green. The right side shows the soft change detection obtained with SD LLC 4, provided that 5% of the relevant changes were given for training the linear SVM. Bottom row. A crop of the same data is displayed at higher resolution, showing the spatial precision of the change detection.
Fig. 8. Damage Type: Conflict. Area: Bossangoa, Ouham, Central African Republic. Top row. The pre- and post-event images are displayed side by side with the UN change points of interest overlaid in green. The right side shows the soft change detection obtained with SD LLC 4, provided that 5% of the relevant changes were given for training the linear SVM. Bottom row. A crop of the same data is displayed at higher resolution, showing the spatial precision of the change detection.
Indeed, our implementation computes only once the elements that are shared by several windows.
No speed-up is proposed for the construction of the dictionary and the LLC codes, as this step takes only 100 seconds for the considered image. Summing the three extraction parts, the proposed representation for a 10000×10000 image can be computed on a single CPU in 685 seconds, as opposed to the 7744 seconds required by a straightforward algorithmic implementation. Overall, we observe a speed-up of about 10×, which is critical for rapid damage assessment. Moreover, this single-CPU implementation can be further parallelized by processing image blocks, which can lead to additional speed-ups.
E. Illustrations and Discussion
Four illustrations of 4-neighbors LLC-based change detection are given in Figs. 7-11. The first two examples are instances of the armed conflicts affecting regions of South Sudan and the Central African Republic. The next two
Fig. 9. Windows of 250 meters are randomly picked over the four illustrated AOIs. For each window, the number of reference points is computed and compared to the average of the automatically extracted change index.
examples are instances of earthquake- and typhoon-induced damage in areas of Pakistan and the Philippines. While the settlement patterns look different, they all cover areas with well-separated buildings; our dataset does not cover damage occurring over high-rise or high-density urban areas, which may decrease the accuracy of the approach. Both pre- and post-event panchromatic imagery are displayed for the full collected AOIs, with the UNITAR/UNOSAT geolocated points of interest overlaid in green. This representation of the reference dataset allows a visual comparison to the supervised detection, displayed in a colormap ranging from blue to red, where the supervised detection was obtained with 4-neighbors LLC-based features and a linear SVM model trained with 25% of the positive examples.
The four examples highlight the good match, at both global and close scale, between the change detection and the UN geolocated points. While the scenes undergo major changes of a phenological nature (see Fig. 7), the training examples allow the method to narrow the search to relevant change areas; indeed, no major false alarms are observed among the 4 sets. The examples of Figs. 8 and 10 also highlight the spatial inconsistencies between imagery and reference dataset; because of the tiling mechanism, these spatial errors do not have a major impact on the final results, as previously shown in Fig. 5. While detecting the points individually is clearly a harder problem, the results of Fig. 9 show that the representation is generic enough to capture the densities of points of interest. Windows of 250 meters are randomly picked in the 4 detailed AOIs, and for each the number of reference points and the average change index are computed. The normalized correlation is evaluated at 0.95 for the considered datasets. These change density maps can also easily be turned into counting metrics by simple regression.
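The window-level agreement reported above can be computed as follows. This is a minimal sketch; the window sampling and change-index extraction are assumed to have been done upstream, and the function name is ours.

```python
import numpy as np

def normalized_correlation(ref_counts, mean_change_index):
    """Pearson correlation between the number of reference points per
    window and the window's average change index."""
    a = np.asarray(ref_counts, dtype=float)
    b = np.asarray(mean_change_index, dtype=float)
    a = a - a.mean()  # center both series
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A simple linear regression of reference counts on the change index then suffices to turn the density maps into counting metrics.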
We conduct another set of qualitative experiments by visualizing and analyzing detected instances versus undetected ones in Figure 12. Most of the missed damages were subtle, naturally making them challenging to detect automatically. For cases where the damage occurred over buildings or well-defined structures, our representation detects them with high precision. Mistakes tend to be made over subtle changes, or over damages covering only a small area of an image chip, such as an isolated house. Gaining in spatial precision would require multi-resolution
Fig. 10. Damage Type: Earthquake. Area: Gajar Area, Awaran District, Balochistan Province, Pakistan. Top row. The pre- and post-event images are displayed side by side with the UN change points of interest overlaid in green. The right side shows the soft change detection obtained with SD LLC 4, provided that 5% of the relevant changes were given for training the linear SVM. Bottom row. A crop of the same data is displayed at higher resolution, showing the spatial precision of the change detection.
analysis, such as a pyramid-based representation, together with better geo-located reference points.
VI. CONCLUSION
In this work, we proposed shape distributions with a locality-constrained linear coding scheme as a generalizable representation of image-patches in the context of large-scale damage analysis. We presented a benchmark dataset of 86 high-resolution image-pairs to evaluate the improvement of locality-constrained linear coding (LLC) over hard assignment when representing the image components encoded in a tree of shapes. The benchmark dataset was built by combining DigitalGlobe archive panchromatic imagery with damage reference datasets produced by UNITAR/UNOSAT photo-interpreters in real-world scenarios, and was designed to encompass as much of the variability found in rapid post-disaster mapping as possible. In our empirical analysis, the use of LLC-based shape distributions showed an improvement of 5% in equal error rate (EER) in comparison to other hard- and soft-assignment-based representations. This improvement translates into a 30% reduction of the search space for an equivalent recall of the change instances. Finally, a robustness analysis shows the low correlation
Fig. 11. Damage Type: Typhoon. Area: Bentiqui, Leyte, Philippines. Top row. The pre- and post-event images are displayed side by side with the UN change points of interest overlaid in green. The right side shows the soft change detection obtained with SD LLC 4, provided that 5% of the relevant changes were given for training the linear SVM. Bottom row. A crop of the same data is displayed at higher resolution, showing the spatial precision of the change detection.
between the expected EER and the acquisition angles and small ground displacements, showing the adequacy of our representation for change analysis in a variety of contexts.
In our future work, we intend to release the dataset as an open benchmark to allow more researchers to evaluate their change detection techniques. We are also interested in expanding our framework with multi-scale spatial pyramid encoding to enable the representation of isolated changes, which are currently not adequately captured by our single-sized patch-based representation.
Fig. 12. Example damaged areas detected by our framework (top rows, pre- and post-event) as well as the ones our framework missed (bottom rows, pre- and post-event).
REFERENCES
[1] G.-S. Xia, J. Delon, and Y. Gousseau, “Shape-based invariant texture indexing,” IJCV, vol. 88, no. 3, pp. 382–403, 2010. [Online].
Available: http://dx.doi.org/10.1007/s11263-009-0312-3
[2] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, “Locality-constrained linear coding for image classification,” in IEEE CVPR,
2010, pp. 3360–3367.
[3] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories,” in
IEEE CVPR, vol. 2, 2006, pp. 2169–2178.
[4] A. Singh, “Digital change detection techniques using remotely-sensed data,” Int. J. Remote Sensing, vol. 10, no. 6, pp. 989–1003, 1989.
[5] P. Gamba, F. Dell’Acqua, and G. Trianni, “Rapid damage detection in the Bam area using multitemporal SAR and exploiting ancillary data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 6, pp. 1582–1589, June 2007.
[6] D. Al-Khudhairy, I. Caravaggi, and S. Giada, “Structural damage assessments from ikonos data using change detection, object-oriented
segmentation, and classification techniques,” Photogrammetric Engineering & Remote Sensing, vol. 13, no. 7, pp. 825–837, 2005.
[7] L. Gueguen, M. Pesaresi, P. Soille, and A. Gerhardinger, “Morphological descriptors and spatial aggregations for characterizing damaged
buildings in very high resolution images,” in Proc.of the ESA-EUSC-JRC 2009 conference. Image Information Mining: automation of
geospatial intelligence from Earth Observation, Madrid, Spain, Nov. 2009.
[8] S. W. Myint, M. Yuan, R. S. Cerveny, and C. P. Giri, “Comparison of remote sensing image processing techniques to identify tornado
damage areas from landsat tm data,” Sensors, vol. 8, no. 2, pp. 1128–1156, 2008.
[9] T. Kemper, M. Jenerowicz, L. Gueguen, D. Poli, and P. Soille, “Monitoring changes in the menik farm idp camps in sri lanka using
multi-temporal very high-resolution satellite data,” International Journal of Digital Earth, vol. 4, no. sup1, pp. 91–106, 2011. [Online].
Available: http://dx.doi.org/10.1080/17538947.2010.512430
[10] L. Gueguen, P. Soille, and M. Pesaresi, “Change detection based on information measure,” IEEE TGRS, vol. 49, no. 11, pp. 4503–4515,
2011.
[11] F. Bovolo and L. Bruzzone, “A theoretical framework for unsupervised change detection based on change vector analysis in the polar
domain,” IEEE Tran. Geoscience and Remote Sensing, vol. 45, no. 1, pp. 218–236, Jan. 2007.
December 16, 2015 DRAFT
20
[12] L. Bruzzone and D. Prieto, “Automatic analysis of the difference image for unsupervised change detection,” IEEE TGRS, vol. 38, no. 3,
pp. 1171–1182, May 2000.
[13] A. Nielsen, “The regularized iteratively reweighted mad method for change detection in multi- and hyperspectral data,” IEEE Tran. Image
Processing, 2007.
[14] M. Schröder, H. Rehrauer, K. Seidel, and M. Datcu, “Spatial information retrieval from remote-sensing images. II. Gibbs-Markov random fields,” IEEE Transactions on Geoscience and Remote Sensing, vol. 36, no. 5, pp. 1446–1455, Sept. 1998.
[15] J. Inglada and G. Mercier, “A new statistical similarity measure for change detection in multitemporal SAR images and its extension to
multiscale change analysis,” IEEE TGRS, vol. 45, no. 5, pp. 1432–1445, May 2007.
[16] L. Bruzzone, D. Prieto, and S. Serpico, “A neural-statistical approach to multitemporal and multisource remote-sensing image classification,”
IEEE Transactions on Geoscience and Remote Sensing, vol. 37, no. 3, pp. 1350–1359, May 1999.
[17] B. Demir, F. Bovolo, and L. Bruzzone, “Classification of time series of multispectral images with limited training data,” IEEE Transactions on Image Processing, vol. 22, no. 8, pp. 3219–3233, Aug. 2013.
[18] L. Gueguen and R. Hamid, “Large-scale damage detection using satellite imagery,” IEEE Conference on Computer Vision and Pattern
Recognition, 2015.
[19] “United Nations Institute for Training and Research,” http://www.unitar.org/unosat/maps.
[20] S. Cui and M. Datcu, “Coarse to fine patches-based multitemporal analysis of very high resolution satellite images,” in Analysis of
Multi-temporal Remote Sensing Images (Multi-Temp), 2011 6th International Workshop on the. IEEE, 2011, pp. 85–88.
[21] G. Mercier, G. Moser, and S. Serpico, “Conditional copulas for change detection in heterogeneous remote sensing images,” IEEE TGRS,
vol. 46, no. 5, pp. 1428–1441, May 2008.
[22] L. Gueguen, “Classifying compound structures in satellite images: A compressed representation for fast queries,” IEEE TGRS, vol. 53,
no. 4, pp. 1803–1818, April 2015.
[23] ——, “Image patch characterization with shape distributions: Application to worldview-2 images,” in IEEE IGARSS, Melbourne, Australia,
2013.
[24] P. Monasse and F. Guichard, “Fast computation of a contrast-invariant image representation,” IEEE Transactions on Image Processing,
vol. 9, no. 5, pp. 860–872, May 2000.
[25] E. Urbach, J. Roerdink, and M. Wilkinson, “Connected shape-size pattern spectra for rotation and scale-invariant classification of gray-scale
images,” IEEE Tran. PAMI, vol. 29, no. 2, pp. 272–285, 2007.
[26] E. Nowak, F. Jurie, and B. Triggs, “Sampling strategies for bag-of-features image classification,” in Computer Vision - ECCV 2006, ser.
Lecture Notes in Computer Science, A. Leonardis, H. Bischof, and A. Pinz, Eds. Springer Berlin Heidelberg, 2006, vol. 3954, pp.
490–503. [Online]. Available: http://dx.doi.org/10.1007/11744085_38
[27] H. Jegou, M. Douze, and C. Schmid, “Improving bag-of-features for large scale image search,” International Journal of Computer Vision,
vol. 87, no. 3, pp. 316–336, 2010. [Online]. Available: http://dx.doi.org/10.1007/s11263-009-0285-2
[28] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, “LIBLINEAR: A library for large linear classification,” Journal of
Machine Learning Research, vol. 9, pp. 1871–1874, 2008.
[29] T. Géraud, E. Carlinet, S. Crozet, and L. Najman, “A quasi-linear algorithm to compute the tree of shapes of nD images,” in
ISMM, 2013. [Online]. Available: http://hal.archives-ouvertes.fr/docs/00/79/86/20/PDF/geraud.2013.ismm.pdf
[30] W. Hesselink, “Salembier’s min-tree algorithm turned into breadth first search,” Information Processing Letters, vol. 88, pp. 225–229, 2003.
[31] L. Gueguen and G. Ouzounis, “Hierarchical data representation structures for interactive image information mining,” International
Journal of Image and Data Fusion, vol. 3, no. 3, pp. 221–241, 2012. [Online]. Available: http://www.tandfonline.com/doi/abs/10.1080/
19479832.2012.697924
[32] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Computer Vision and Pattern Recognition,
2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol. 1, 2001, pp. I–511–I–518 vol.1.