GeoSensor: Semantifying
Change and Event Detection over Big Data
Nikiforos Pittaras1,2, George Papadakis2, George Stamoulis2, Giorgos Argyriou2, E Karra Taniskidou2, Emmanouil Thanos2, George Giannakopoulos1, Leonidas Tsekouras1 and Manolis Koubarakis2
1NCSR Demokritos, Greece {pittarasnikif, ggianna, ltsekouras}@iit.demokritos.gr,
2Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Greece
{npittaras, gpapadis, gstam, gioargyr, ekarra, ethanos, koubarak}@di.uoa.gr
ABSTRACT
GeoSensor is a novel, open-source system that enriches change detection over satellite images with event detection over news items and social media content. GeoSensor combines these two orthogonal operations through state-of-the-art Semantic Web technologies. At its core lies the open-source, semantics-enabled Big Data infrastructure developed by the EU H2020 BigDataEurope project. This allows GeoSensor to offer an on-line functionality, despite facing three major challenges of Big Data: Volume (a single satellite image typically occupies a few GBs), Variety (its data sources include two different types of satellite images and various types of user-generated content) and Veracity, as the accuracy of the end result is crucial for the usefulness of our system. We present GeoSensor’s architecture in detail, highlighting the advantages of using semantics for making the most of the knowledge extracted from news items and Earth Observation products. We also verify GeoSensor’s efficiency through a preliminary experimental study.
KEYWORDS
big data, satellite data, linked data, change detection, event detection
ACM Reference Format:
Nikiforos Pittaras1,2, George Papadakis2, George Stamoulis2, Giorgos Argyriou2, E Karra Taniskidou2, Emmanouil Thanos2, George Giannakopoulos1, Leonidas Tsekouras1 and Manolis Koubarakis2. 2019. GeoSensor: Semantifying Change and Event Detection over Big Data. In Proceedings of ACM SAC Conference (SAC’19). ACM, New York, NY, USA, Article 4, 8 pages. https://doi.org/10.1145/3297280.3297504
1 INTRODUCTION
In remote sensing, change detection is the process of comparing two or more satellite images that depict the same area on the Earth’s surface, but are taken at different points in time [25, 33]. Its goal is to identify differences between the images in the form of areas with changes in land cover or land use (e.g., an area that was an olive grove in the past is now occupied by buildings). This is a crucial task, as it provides useful information for many applications, e.g., studying land cover evolution, monitoring natural disasters or supporting crisis management. As an example, consider Figures 1(a) and (b), which depict snapshots of Ukhiya, Chittagong, Bangladesh before and after the settlement of Rohingya refugees in October 2017. In situations like this, change detection allows for fast and accurate estimation of natural or man-made changes on the Earth’s surface, providing valuable support to decision-makers. In our example, the outcomes of change detection appear in Figure 1(c). Modern satellite technology makes this possible even for remote areas with humanitarian or security issues that are difficult to reach.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SAC’19, April 8-12, 2019, Limassol, Cyprus
©2019 Association for Computing Machinery.
ACM ISBN 978-1-4503-5933-7/19/04. . . $15.00
https://doi.org/10.1145/3297280.3297504
Interest in change detection using satellite images has grown recently, due to the availability of long time series of images by flagship Earth observation programmes, such as the US Landsat program¹ and the EU Copernicus Programme². The latter is currently the world’s largest Earth observation programme with almost 20 satellites, called Sentinels, expected to be in orbit by 2030. It consists of a set of complex systems that collect data from satellites as well as in-situ sensors, providing reliable and up-to-date information on a range of environmental and security issues under a free, full and open data policy. Information extracted from this data is also made freely available to users through the Copernicus services³, which address six thematic areas: land, marine, atmosphere, climate, emergency and security. Techniques for change detection using time series of satellite images are important in all of these areas [7].
To the best of our knowledge, though, there is no open-source system that addresses the following three Vs of Big Satellite Data:
Volume stems from the combined effect of the inherently quadratic time complexity of change detection and the large size of satellite images. In the worst case, all pixels of one image have to be compared with all pixels of the other image, yielding a rather time-consuming procedure for a common pair of images: each image typically occupies a few GBs, containing millions of low-resolution pixels (i.e., each pixel corresponds to tens of square meters on the Earth’s surface). Change detection thus poses a challenging computational task for commodity hardware.
Veracity requires that decision makers are able to assess the quality and correctness of the intelligence extracted from satellite images, based on relevant news content. In practice, this means that collateral information from news should provide reliable insights into the detected changes, ideally in real time.
¹ https://landsat.usgs.gov
² http://www.copernicus.eu
³ http://www.copernicus.eu/main/services
Figure 1: Satellite images showing Ukhiya, Chittagong, Bangladesh (a) before, and (b) after the Rohingya refugee crisis in October 2017. (c) shows the main areas with changes in land cover or land use as identified by GeoSensor.
Variety emanates from the diverse types of images that are
produced by each satellite constellation. The two polar-orbiting
satellites of the Sentinel-1 constellation are equipped with C-band
Synthetic Aperture Radar (SAR) imaging systems, which enable
image acquisitions regardless of weather and light conditions (i.e.,
the sensor is able to acquire images in the presence of clouds and
during night time). In contrast, the two polar-orbiting satellites
of the Sentinel-2 mission provide High-Resolution Optical data,
acquired by a wide swath high-resolution multispectral sensor.
Their images have 12 spectral bands, covering the spectrum from
the visible domain to the short wavelength infrared domain. Since the
sensor is a passive optical system, its imaging is sensitive to weather
conditions and depends on external illumination. Variety further increases
due to the textual data that are necessary for addressing Veracity.
In this work, we present GeoSensor, a geospatial system that
applies change detection to Copernicus data in a way that addresses
these three Vs of Big Satellite Data. In essence, GeoSensor integrates
a remote sensing component with a social sensing one into a highly
scalable processing chain. Remote sensing applies change detection
techniques to SAR images from Sentinel-1, while using optical
Sentinel-2 images for the validation of the end result. Social sensing
applies event detection techniques to cluster together news items
and social media posts that pertain to the same real-world event
and are located in the area where change detection took place. For
example, Figure 2(a) depicts a cluster of news items that elucidates
the changes appearing in Figure 1(c). The integration of these two
orthogonal components relies on Semantic Web technologies.
The rest of the paper is structured as follows: Section 2 briefly discusses related work, while Section 3 delves into GeoSensor’s architecture, highlighting the three workflows that lie at its core. In Section 4, we present preliminary experiments over real-world data that demonstrate the scalability of our system and, in Section 5, we conclude the paper along with directions for future work.
2 RELATED WORK
Change Detection. Earth observation is the use of remote sensing technologies to monitor land, sea and atmosphere. Satellite-based Earth observation relies on the use of satellite-mounted payloads to gather imaging data about Earth characteristics. We can distinguish two kinds of remote sensing. (i) In passive remote sensing, the satellite instruments monitor the energy received from the Earth, due to the reflection and re-emission of the Sun’s energy by the Earth’s surface or atmosphere. Optical and thermal sensors are commonly used passive sensors (e.g., the optical sensor behind Sentinel-2 images). (ii) In active remote sensing, the satellite sends energy to Earth and monitors the energy received back from the Earth’s surface or atmosphere, enabling day and night monitoring during all weather conditions. Commonly used active sensors are lasers and radars, like the SAR systems that produce the Sentinel-1 images.
Recent works on change detection use Deep Neural Networks [19, 20] in a data-driven fashion, performing classification to detect changes in pixels or areas in the images. Other works use hierarchical object-based classification methods [10]. Such supervised algorithms, though, lie outside our scope, due to the lack of publicly available labeled datasets. Developing such datasets from scratch is a rigorous process that requires heavy human involvement, even in-situ inspection of identified changes.
Instead, GeoSensor considers unsupervised algorithms for change detection. At the moment, it is equipped with the established approach implemented in ESA’s SNAP Toolbox⁴. Yet, its modular architecture allows for seamlessly extending it with additional state-of-the-art approaches, like the clustering technique in [12].
Event Detection. A review of text event detection is presented in [37], with more recent surveys covering a large variety of detection methods that are crafted for social media [4, 31]. In [8], the authors utilize a semantically-enabled convolutional neural network (CNN) to categorize social media posts, reporting that their model outperforms TF-IDF and Word2Vec pre-trained embeddings.
⁴ http://step.esa.int/main/toolboxes/snap
Figure 2: (a) A set of news items referring to the Rohingya refugee crisis, (b) the corresponding event created by GeoSensor, and (c) the menu providing access to the individual news items of the event.
Other works incorporate CNNs for the joint detection of events and topics [8, 11, 28]. Yet, these methods rely on supervised learning, requiring a labeled dataset, unlike our unsupervised approach.
In another line of research, several works use unsupervised, semantically-aware clustering for event detection. For example, a semantically rich multiple-vector representation is used in [26, 27], while [30] uses a co-occurrence-based semantic expansion of words to produce event groups. These works report superior performance over non-semantic baselines. In [34], the authors employ a classification-based cleaning phase that is followed by content- and temporal-based clustering. The approach in [1] clusters tweets on keyword-based features, while the structure of the underlying social network lies at the core of the approaches presented in [2, 21]. However, all these works mainly rely on vector space features that capture frequency-related statistics (i.e., bags of n-grams), ignoring the positional information of tokens in the source text. In contrast, our approach relies on graphs of n-grams, which effectively capture token context both in long, curated documents like news articles and in short, noisy texts like tweets [3, 18, 32].
3 APPROACH
We now present GeoSensor, explaining how it addresses the above
three Vs of Big Satellite Data.
To tackle Variety, GeoSensor relies heavily on state-of-the-art Semantic Web technologies, which provide time-efficient, unified access to the outcomes of the remote and the social sensing components. In this way, it is capable of seamlessly processing a rich diversity of data sources, which range from the graphic information in SAR and optical satellite images to the textual information of news articles and social media posts.
To address Volume, GeoSensor exploits the distributed processing of a cluster based on the BDI platform [5], the open-source, semantics-enabled Big Data infrastructure that was developed in the context of the EU BigDataEurope⁵ project. The BDI platform combines the massive parallelization capabilities offered by Apache Spark⁶ with an inherent support for Semantic Web technologies.
Figure 3: The system architecture of GeoSensor. [Diagram components: Image Aggregator, Change Detector, GeoTriples, News Crawler, storage, User Interface, Event Detector, Lookup Service, Keyword Search, Authorization/Authentication, Entity Extractor.]
To tackle Veracity, GeoSensor uses Sentinel-2 images in combination with the knowledge extracted from the social sensing component for the verification of changes detected from Sentinel-1 images. This is enhanced by the ability to fetch the latest social media data through the live Twitter keyword search offered by its GUI.
Figure 3 depicts GeoSensor’s architecture. It consists of 11 components that are organized into 3 workflows, one for each horizontal layer: the change detection layer is formed by the components at the bottom (i.e., Image Aggregator, HDFS and Change Detector), while the event detection layer is implemented by the components at the top (i.e., News Crawler, Apache Cassandra, Event Detector, Lookup Service and Entity Extractor). The rest of the components comprise the semantic layer, which acts as GeoSensor’s backbone. Next, we describe the functionality of each layer in detail.
3.1 Change Detection Layer
This layer implements the gist of GeoSensor, retrieving and comparing pairs of satellite images in order to detect changes in land cover or land use. It consists of three components.
The rst one is the
Image Aggregator
, a RESTful web service
that downloads from ESA’s Copernicus Open Access Hub
7
the pairs
of Sentinel-1 and Sentinel-2 images with the largest overlap with the
user-dened area of interest. In our example, the Image Aggregator
is responsible for downloading the images in Figure 1(a) and (b),
after the user species Ukhiya, Chittagong, Bangladesh as the area
of interest. This process requires also the user to dene temporal
acquisition criteria, in the form of the images’ sensing dates, i.e., the
time of interest together with a reference date in the past, before the
change took place. In our example, the time of interest - for Figure
1(b) - is October 26, 2017, while the reference date - for Figure 1(a)
5https://www.big-data-europe.eu
6https://spark.apache.org
7https://scihub.copernicus.eu/
SAC’19, April 8-12, 2019, Limassol, Cyprus N. Piaras et al.
Master
Image
Subset
Operator
Calibrate
Operator
Collocate
Operator
GCP
Selection
Operator
Warp
Operator
Slave
Image
Change
Detection
Master
Image
Co-
registered
Image
DBScan
pre-processing
(co-registration)
main
processing
post-
processing
Figure 4: The workow implemented by Change Detector.
- is anything before June, 2017. The downloaded Sentinel images
are then stored to the Hadoop Distributed File System (
HDFS
),
distributing parts of the images to all cluster nodes, facilitating
scalable and fault-tolerant parallel image processing.
Finally, the Change Detector applies the workflow depicted in Figure 4, which implements in parallel the state-of-the-art unsupervised approach offered by ESA’s SNAP Toolbox. Its goal is to compare the downloaded images in order to identify the changes in land cover or land use. This workflow consists of three stages: (i) Pre-processing uses co-registration [36] to ensure that the selected images have identical dimensions and correspond to the same geolocation. (ii) Main processing compares the individual pixels in the images to assess their difference. (iii) Post-processing clusters together the pixels with high likelihood of changes, forming broader areas with changes in a way that reduces false alarms, i.e., it excludes outliers caused by noise, which is either inherent in the satellite images or introduced by inaccuracies of previous steps.
In more detail, we call master image the one corresponding to the earliest date (Figure 1(a) in our example) and slave image the one corresponding to the latest date (Figure 1(b)). Typically, their dimensions and characteristics are quite different, because they were taken under different settings, such as the angle of the satellite. Therefore, pre-processing (co-registration) is indispensable for aligning the two images in such a way that each pair of corresponding pixels represents the same point on the Earth’s surface.
Given that individual satellite images typically cover a very large area on Earth, the subset operator crops the original satellite images to the borders of the user-defined area of interest. This operation curtails the running time to a significant extent, restricting the computational cost to the absolutely essential parts of satellite images. Its complexity is very low, requiring no parallelization.
The cropped images are given as input to the collocate operator, which resamples the pixels of the slave image into the geographical raster of the master. This operator requires accurate geopositioning information for both images in the form of ground control points (GCPs), i.e., markers for certain geographical positions within a geo-referenced image that are described by their geo-coordinates and by textual descriptions in the image meta-data.
Next, the GCP selection operator generates a set of uniformly spaced GCPs in the master image and computes their corresponding GCPs in the slave image. This is done through an iterative process: for each master GCP, the corresponding slave GCP is approximated based on their geo-coordinates. Using a predetermined window size, the areas surrounding each GCP are cross-correlated in order to adjust the slave GCP to a more accurate position. This procedure is repeated until the new slave GCP is located within acceptable limits, or a maximum number of iterations is carried out.
Based on the selected GCPs, the warp operator computes the warp function, which will be used for mapping the pixels of the slave image into the co-registered image. This is a linear function that is estimated by repeating the following process until convergence: a warp function is initially computed using the available master-slave GCP pairs. The resulting function is used to map the master GCPs to the slave image. Then, the residuals between the mapped master and the corresponding slave GCPs are computed along with the root mean square (RMS) and the standard deviation of all residuals. Next, the master-slave GCP pairs are filtered to eliminate those exceeding the mean RMS. Upon completion of this process, the remaining master-slave GCP pairs are filtered with a predetermined RMS threshold and the warp function is derived from the retained pairs.
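The iterative estimation described above can be illustrated with a minimal, single-machine sketch. It assumes an affine warp fitted by ordinary least squares and drops every GCP pair whose residual exceeds the current RMS; the function names (`solve3`, `fit_affine`, `apply_affine`, `estimate_warp`), the default threshold and the toy GCP pairs are all hypothetical and do not reproduce the SNAP implementation.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system with Gauss-Jordan elimination and partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_affine(pairs):
    """Least-squares affine map from master (x, y) to slave (u, v) coordinates."""
    rows = [(1.0, x, y) for (x, y), _ in pairs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for k in range(2):  # k = 0 fits u, k = 1 fits v
        atb = [sum(r[i] * slave[k] for r, (_, slave) in zip(rows, pairs))
               for i in range(3)]
        coeffs.append(solve3(ata, atb))
    return coeffs

def apply_affine(coeffs, x, y):
    (a0, a1, a2), (b0, b1, b2) = coeffs
    return a0 + a1 * x + a2 * y, b0 + b1 * x + b2 * y

def estimate_warp(pairs, rms_threshold=0.5, max_rounds=10):
    """Refit the warp while discarding GCP pairs with above-RMS residuals."""
    for _ in range(max_rounds):
        coeffs = fit_affine(pairs)
        residuals = [math.dist(apply_affine(coeffs, *m), s) for m, s in pairs]
        rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))
        if rms <= rms_threshold:
            break
        kept = [p for p, r in zip(pairs, residuals) if r <= rms]
        if len(kept) < 3 or len(kept) == len(pairs):
            break  # too few pairs left, or nothing more to discard
        pairs = kept
    return fit_affine(pairs), pairs

# Toy GCP pairs: the slave is the master shifted by (+2, -1), plus one bad match.
gcp_pairs = [((0, 0), (2, -1)), ((10, 0), (12, -1)), ((0, 10), (2, 9)),
             ((10, 10), (12, 9)), ((5, 5), (7, 4)), ((5, 0), (30, 30))]
warp, retained = estimate_warp(gcp_pairs)
print(len(retained))  # 5: the mismatched pair is discarded
```

In this toy run the mismatched pair is removed in the first round and the refitted warp recovers the pure shift.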
Finally, the co-registered image is generated using the resulting warp function in combination with bilinear interpolation. This means that every point of the original slave image is projected to a point in the master image as the weighted sum of the warp projection of its four surrounding pixels.
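As a rough illustration of bilinear resampling, the sketch below fills each pixel of the master grid by applying a warp to obtain a fractional position in the slave image and blending the four surrounding slave pixels. The helper names (`bilinear_sample`, `coregister`) and the toy warp are assumptions made for this example; positions are assumed to fall inside the slave image.

```python
import math

def bilinear_sample(image, x, y):
    """Interpolate image (a list of rows) at fractional column x and row y."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1 = min(x0 + 1, len(image[0]) - 1)   # clamp at the right border
    y1 = min(y0 + 1, len(image) - 1)      # clamp at the bottom border
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom

def coregister(slave, warp, width, height):
    """Build a width x height image by sampling the slave at warp(col, row)."""
    return [[bilinear_sample(slave, *warp(col, row)) for col in range(width)]
            for row in range(height)]

slave = [[0.0, 10.0],
         [20.0, 30.0]]
print(bilinear_sample(slave, 0.5, 0.5))                          # 15.0
print(coregister(slave, lambda c, r: (c * 0.5, r * 0.5), 2, 2))  # [[0.0, 5.0], [10.0, 15.0]]
```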
Using the master and the co-registered image as input, the change detection algorithm computes the ratio of the corresponding pixels in the two images. The pixels exhibiting very high or very low ratios indicate candidate areas with changes.
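A minimal sketch of this ratio test is given below. Taking the logarithm of the ratio, so that very high and very low ratios are treated symmetrically, and the specific `threshold` and `eps` values are assumptions of this example rather than details stated in the paper.

```python
import math

def log_ratio_candidates(master, coregistered, threshold=1.0, eps=1e-6):
    """Return (row, col) positions whose absolute log-ratio exceeds threshold."""
    candidates = []
    for r, (row_m, row_s) in enumerate(zip(master, coregistered)):
        for c, (m, s) in enumerate(zip(row_m, row_s)):
            # eps guards against division by zero and log(0) on dark pixels.
            log_ratio = math.log((s + eps) / (m + eps))
            if abs(log_ratio) > threshold:
                candidates.append((r, c))
    return candidates

master = [[10.0, 10.0, 10.0],
          [10.0, 10.0, 10.0]]
coregistered = [[10.0, 40.0, 10.0],
                [2.0, 11.0, 10.0]]
print(log_ratio_candidates(master, coregistered))  # [(0, 1), (1, 0)]
```

The pixel that brightened fourfold and the pixel that dimmed fivefold are both flagged, while the 10% fluctuation is ignored.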
Lastly, DBScan [13] is applied for post-processing the set of candidate areas with changes. DBScan groups together pixels closely packed together (i.e., with many nearby change indicators), while treating as outliers pixels that lie alone in low-density regions, with their nearest neighbors located far away. The end result is a set of areas with changes in land cover or land use. In our example, DBScan produces the image in Figure 1(c), yielding the 7 yellow clusters that correspond to such areas.
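To make the post-processing step concrete, here is a compact, non-parallel sketch of the textbook DBSCAN procedure applied to candidate pixel coordinates. The parameter values (`eps`, `min_pts`) and the toy pixel list are illustrative assumptions; the neighbourhood search is quadratic and is not meant to reflect the Spark implementation.

```python
def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    labels = [None] * len(points)    # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1           # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts: # j is a core point, so keep expanding
                queue.extend(nbrs)
    return labels

# Candidate change pixels: two dense groups and one isolated false alarm.
pixels = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (5, 5), (5, 6), (6, 5)]
print(dbscan(pixels, eps=1.5, min_pts=3))  # [0, 0, 0, 0, -1, 1, 1, 1]
```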
Due to the high time complexity of all processes (except the
Subset operator), they are massively parallelized in Apache Spark.
Due to space limitations, we omit the parallelization details.
3.2 Event Detection Layer
To address Veracity, this layer attaches a set of recent events to every area with identified changes in land cover or land use, providing users with a possible explanation and verification of the detected changes. This functionality is offered by the five components at the top layer of GeoSensor’s architecture in Figure 3.
The rst component is the
News Crawler
, which scans at half-
hour intervals specic social media sources and news agencies
for the latest news items (posts and news articles, respectively).
For the time being, these sources include most of the RSS feeds
that are freely provided by Reuters in English
8
as well as several
selected public accounts in Twitter
9
, also in English. The crawler
structure, though, is extensible, facilitating the integration of more
information sources, or even the extension with other operation
modes. For example, it has been used as a basic data collection
infrastructure in a summarization application [
16
] and in the EU
project “NOMAD”
10
. In our running example, the News Crawler is
responsible for gathering the news items in Figure 2(a).
⁸ https://www.reuters.com/tools/rss
⁹ https://twitter.com
¹⁰ http://www.nomad-project.eu
All data gathered by the News Crawler are stored in the second component, namely Apache Cassandra¹¹. We opted for this particular data management system due to its capacity to store a large volume of information, while offering linear scalability and fault-tolerance (i.e., it provides high availability with no single point of failure). In fact, Cassandra is crafted for large-scale infrastructures like the BDI platform, offering robust support for clusters with multiple commodity servers. Besides, it is an open-source NoSQL database that is compatible with the SemaGrow component, which is used by the semantic layer for federated access to the details of individual news items or entire events (cf. Section 3.3).
The news items stored in Cassandra are processed by the Event Detector module at half-hour intervals. They are grouped into real-world events by a modified version of NewSum¹² [16], a summarization algorithm providing commercial-grade performance. NewSum uses n-gram graphs [15] to model its textual input, a representation that has been shown to be effective in noisy settings in multiple genres (i.e., blogs, articles, microblogging and social media) [18, 32]. In addition, NewSum is robust to multilingual data, ranking among the top performers in multilingual, multi-document summarization tasks [17].
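To give a feel for the n-gram graph representation, the following simplified sketch builds a character n-gram graph whose edges count how often two n-grams occur within a small window of each other, and compares two graphs by the weight ratios of their shared edges. The parameter values, the undirected edges and the function names (`ngram_graph`, `value_similarity`) are assumptions of this example and are not taken from the NewSum code base.

```python
from collections import Counter

def ngram_graph(text, n=3, window=3):
    """Map each pair of neighbouring character n-grams to its co-occurrence count."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, gram in enumerate(grams):
        for other in grams[i + 1:i + 1 + window]:
            edges[tuple(sorted((gram, other)))] += 1
    return edges

def value_similarity(g1, g2):
    """Sum min/max weight ratios over shared edges, normalised by the larger graph."""
    if not g1 or not g2:
        return 0.0
    shared = g1.keys() & g2.keys()
    score = sum(min(g1[e], g2[e]) / max(g1[e], g2[e]) for e in shared)
    return score / max(len(g1), len(g2))

camp = ngram_graph("refugee camp")
print(value_similarity(camp, camp))                            # 1.0
print(value_similarity(camp, ngram_graph("satellite image")))  # 0.0
```

Because edges encode which n-grams appear near each other, two texts with the same n-grams in a different order yield different graphs, which is the positional information that a bag of n-grams discards.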
In more detail, Event Detector rst builds a coarse-grained set of
events. Pairs of news articles are compared with each other using
their n-gram graphs representation and the corresponding graph-
based textual similarity measures [
15
]. Appropriate thresholding
is then applied to retain only the pairs with high similarity. Those
pairs are then grouped into larger sets (pools) of news articles
based on a transitivity analysis that forms clusters from connected
components in the similarity graph. The pools of news articles with
a very low support are discarded, whereas the remaining pools are
considered as “real-world events”. Due to its high time complexity,
this process is parallelized in Apache Spark, as shown in Figure
5. The same procedure is applied independently to Twitter data,
yielding a set of tweet pools. Each tweet pool is then compared with
every pool of news articles. If their similarity exceeds a predened
threshold, the tweet pool is added to the pool of news articles. Then,
every pool of news articles goes through a summarization process
that builds its event description (e.g., title selection) and enriches
it with relevant metadata, i.e., spatiotemporal information, named
entities as well as image elements from its member documents.
These metadata are extracted from its content directly, or with the
help of RESTful-based tools and services, internal (Lookup Service
and Entity Extractor) and external ones (PoolParty
13
and Flickr
14
).
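The pooling step (thresholding pairwise similarities, taking connected components and discarding low-support pools) can be sketched on a single machine as follows. The word-level Jaccard similarity stands in for the graph-based measures of [15], and the threshold, the minimum support and the toy headlines are made up for illustration.

```python
def detect_events(items, similarity, threshold=0.5, min_support=2):
    """Group items into pools: connected components of the thresholded similarity graph."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if similarity(items[i], items[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two pools

    pools = {}
    for i, item in enumerate(items):
        pools.setdefault(find(i), []).append(item)
    # Pools with very low support are not considered real-world events.
    return [pool for pool in pools.values() if len(pool) >= min_support]

def jaccard(a, b):
    """Stand-in similarity: overlap of the word sets of two texts."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

news = ["rohingya refugees arrive in bangladesh",
        "bangladesh camps host rohingya refugees",
        "aid reaches bangladesh camps",
        "stock markets rally"]
events = detect_events(news, jaccard, threshold=0.25)
print(events)  # one pool holding the first three headlines
```

The first and third headlines are not similar enough to be linked directly, yet they end up in the same pool through the second one, which is the effect of the transitivity analysis.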
¹¹ http://cassandra.apache.org
¹² https://github.com/scify
¹³ https://www.poolparty.biz
¹⁴ https://www.flickr.com
¹⁵ https://opennlp.apache.org/
Figure 5: The Spark-based implementation of Event Detector.
In more detail, the Lookup Service associates the location names from news items with their actual geo-coordinates so that they can be joined with areas with detected changes in land cover or land use. The location names are identified and extracted from the text data in each news item using Apache OpenNLP¹⁵. In the example of Figure 2(a), the location of Kutupalong refugee camp (Ukhiya, Chittagong, Bangladesh) will be converted into the following geo-coordinates: POLYGON ((92.0455551147462 21.3476104736329, 92.2031173706055 21.3476104736329, 92.2031173706055 21.1280899047852, 92.0455551147462 21.1280899047852, 92.0455551147462 21.3476104736329)). Note that the output is in the form of the OGC¹⁶ standard Well Known Text (WKT).
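The polygon in the Kutupalong example is an axis-aligned rectangle, so its WKT string can be reproduced from the four bounding-box values. The helper below is a hypothetical illustration of the WKT serialization only (longitude first, ring closed by repeating the first vertex); it is not part of the Lookup Service.

```python
def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Serialize a bounding box as a closed WKT polygon ring (lon lat order)."""
    ring = [(min_lon, max_lat), (max_lon, max_lat),
            (max_lon, min_lat), (min_lon, min_lat),
            (min_lon, max_lat)]  # first vertex repeated to close the ring
    return "POLYGON ((" + ", ".join(f"{lon} {lat}" for lon, lat in ring) + "))"

kutupalong = bbox_to_wkt(92.0455551147462, 21.1280899047852,
                         92.2031173706055, 21.3476104736329)
print(kutupalong)
```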
This conversion may seem a trivial task, given that there is little ambiguity in our example. In reality, though, location names typically suffer from high levels of noise. There are homonymous locations (e.g., London, UK and London, Ontario, Canada) as well as spelling mistakes (e.g., Landon), due to errors in the extraction process. To address both challenges, the Lookup Service poses every place name as a keyword query to an Apache Lucene¹⁷ index that contains about 180,000 location names of administrative areas worldwide (GADM dataset¹⁸). Lucene’s fuzzy query functionality deals with spelling mistakes, while homonymy is addressed by ranking the candidates in decreasing order of the ratio "string similarity/area". The WKT polygon coordinates corresponding to the top ranked location are finally returned as output.
Valuable metadata are also provided by the Entity Extractor, which enriches the event description with named entities that are extracted from its textual content, thus empowering a Semantic Web view of the produced information. This view allows for improved indexing and disambiguation of the main players in an event, based on the URIs mapped to each extracted entity. At the core of this functionality lies the PoolParty Semantic Suite, which constitutes a state-of-the-art thesaurus management tool that is based on Linked Data [35]. Specifically, a “Famous People” thesaurus was constructed, containing almost half a million entities of well-known actual and fictitious personalities, each grounded to a URI. Two RESTful APIs were implemented and hosted by PoolParty. Given an input text or a news item URL, the first endpoint, called Extractor API, provides a list with entities deemed relevant to the supplied content. The entity URIs are stored in Cassandra, along with their corresponding thesaurus id. The second endpoint, called Metadata API, retrieves descriptive metadata related to the entity whose URI is given as input. These procedures are illustrated in Figure 6.
The Entity Extractor also associates every detected event with publicly available images from Flickr. Using the Flickr search API, it retrieves photographs that are geo-tagged within the geolocation(s) of each event and were uploaded close to the date of the event.
¹⁶ The Open Geospatial Consortium - http://www.opengeospatial.org
¹⁷ https://lucene.apache.org
¹⁸ http://www.gadm.org
Figure 6: Entity extraction example, illustrating the Extractor and Metadata API.
Finally, all event descriptions, including their metadata, are stored into Cassandra in the appropriate tables that distinguish them from individual news items. Duplicate events are discarded and Strabon is notified of the new entries (see below for details).
3.3 Semantic Layer
This layer constitutes GeoSensor’s backbone, bridging the gap between the two orthogonal operations of change and event detection. This is achieved by the four components in the middle of Figure 3, which encapsulate state-of-the-art Semantic Web technologies.
The rst component is
Geotriples
[
24
], a tool for transforming
geospatial data from their original formats into RDF. In our case,
it converts into RDF the descriptions of areas with changes in
land cover or land use (from change detection) as well as the event
summaries (from event detection). We selected GeoTriples, as it is an
established system that supports a wide variety of data formats [
23
].
The output of GeoTriples is stored into Strabon [22], a state-of-the-art open-source spatio-temporal triplestore that efficiently executes GeoSPARQL and stSPARQL queries. Strabon supports spatial datatypes, enabling the serialization of geometric objects in the OGC standards WKT and Geography Markup Language (GML). It has been implemented by extending the established RDF store Sesame (now called RDF4J¹⁹), using the spatially-enabled database PostGIS²⁰. Strabon is the most efficient spatio-temporal RDF store available today, as demonstrated by thorough experiments [6, 14].
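As an illustration of how events could be joined with a changed area through a spatial query, the sketch below assembles a GeoSPARQL query string. The `geo:` and `geof:` terms belong to the OGC GeoSPARQL vocabulary, whereas the `ex:` namespace, the `ex:Event` class and the `ex:title` property are placeholders invented here; the actual GeoSensor schema is not given in the text.

```python
def events_in_area_query(area_wkt):
    """Build a GeoSPARQL query for events whose geometry intersects area_wkt."""
    return f"""
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX ex: <http://example.org/geosensor#>

SELECT ?event ?title WHERE {{
  ?event a ex:Event ;
         ex:title ?title ;
         geo:hasGeometry ?geometry .
  ?geometry geo:asWKT ?eventWkt .
  FILTER(geof:sfIntersects(?eventWkt, "{area_wkt}"^^geo:wktLiteral))
}}
""".strip()

# Hypothetical changed area, given as a WKT polygon.
changed_area = "POLYGON ((92.1 21.2, 92.2 21.2, 92.2 21.3, 92.1 21.3, 92.1 21.2))"
print(events_in_area_query(changed_area))
```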
The third component of this layer is SemaGrow [9], a query processing system that provides a single SPARQL endpoint for federating multiple remote SPARQL endpoints. It is also capable of transparently optimizing queries and dynamically integrating heterogeneous data models by applying the appropriate vocabulary transformations. To boost federated query execution, it employs vocabulary mapping techniques and a balanced query optimizer, considering instance statistics from the federated bases, where available. SemaGrow is highly efficient, consistently outperforming the state-of-the-art in federated query processing [9]. In our case, SemaGrow federates Cassandra and Strabon, offering a unified SPARQL endpoint for both of them to GeoSensor’s user interface. In this way, GeoSensor gains in query performance (with respect to other systems, e.g., FedX and SPLENDID) and has increased extensibility, in case new sources need to be added in the future.
19 http://rdf4j.org
20 https://postgis.net
Figure 7: User criteria for triggering (a) Change Detection, and (b) Event Detection.
GeoSensor’s interface is offered by Sextant [29], a web-based application for exploring, interacting with and visualizing time-evolving linked geospatial data. Sextant is also capable of creating, sharing, searching and collaboratively editing maps and of producing statistical charts out of statistically enhanced data sets. It relies heavily on Semantic Web technologies but offers an intuitive interface that allows both domain experts and lay users to exploit all available features. In order to cover all requirements of GeoSensor, Sextant has been extended with three new functionalities:
(i) Core functionality. Sextant provides an intuitive interface for initiating the event and the change detection processes of GeoSensor. The window for launching change detection appears in Figure 7(a). The user selects an area of interest either by typing its name (with the help of auto-complete), or by highlighting it on the Earth Map. The credentials for Copernicus Open Access Hub are also required, along with the reference and the target date. For event detection, Figure 7(b) depicts the window that prompts users to define up to three optional search criteria: an area of interest, a time window defined by two dates, and a keyword that pertains to events of interest. The last criterion can be a combination of location or entity names, or any other words that are likely to appear in an event title. Users can also search for events by setting as the area of interest one that appears in the results of change detection.
(ii) Authorization/authentication. To support history over each user’s actions, Sextant implements a sign-up and login functionality. At its core lies a database located on GeoSensor’s server that holds all account information along with the encrypted passwords. To ensure security over the network, Sextant can be deployed using the HTTPS protocol. When GeoSensor first loads, the user is prompted to create a new account, or to log in using an existing one. Three types of users are supported: (a) The administrators have full access to all the supported functionality, including the history panel, and are responsible for accepting or declining sign-up requests by new users. (b) The classified users are the main users of the application and have full access to all the supported functionality, including the history panel. (c) The unclassified users are potential trial or occasional users that are limited in their use of the supported functionality: they lack a history panel, they cannot search for events using keywords, and their event detection searches return up to 5 events. They are also deprived of the "SMART" buttons that alternate change and event detection.
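The three user types can be summarized as a small permission table. The sketch below is one hypothetical way to encode the rules described above; all class, field and function names are invented for illustration and do not come from Sextant's code base.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Role(Enum):
    ADMINISTRATOR = "administrator"
    CLASSIFIED = "classified"
    UNCLASSIFIED = "unclassified"


@dataclass(frozen=True)
class Permissions:
    history_panel: bool
    keyword_search: bool
    smart_buttons: bool
    approve_signups: bool
    max_events: Optional[int]  # None means no cap on returned events


# Assumption: only administrators handle sign-up requests, since the text
# names them as the ones responsible for accepting or declining requests.
PERMISSIONS = {
    Role.ADMINISTRATOR: Permissions(True, True, True, True, None),
    Role.CLASSIFIED: Permissions(True, True, True, False, None),
    Role.UNCLASSIFIED: Permissions(False, False, False, False, 5),
}


def visible_events(role: Role, events: List[str]) -> List[str]:
    """Truncate an event result list to the cap of the given role, if any."""
    cap = PERMISSIONS[role].max_events
    return events if cap is None else events[:cap]
```

With this table, an unclassified user who triggers a search that matches eight events sees only the first five, while the other two roles see all eight.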
(iii) Live Twitter keyword search. To further clarify the map visualization with the latest raw information, overcoming the processing delay of the event detection layer, Sextant offers an embedded Twitter keyword search function that supports all Twitter API filters, such as "#" or "@". Using up to five keywords in the search field, Sextant can fetch and update relevant tweets in descending chronological order, presented via an efficient infinite scroll technique.
GeoSensor SAC’19, April 8-12, 2019, Limassol, Cyprus
Figure 8: Execution times (in minutes) for parallelization approaches of the change detection workflow, for the Los Angeles and Saudi Arabia image pairs (series: SNAP 2-threads, 2VMs, 4VMs).
4 EXPERIMENTS
We now present a preliminary experimental evaluation of GeoSensor’s main functionalities, namely the change and the event detection workflows. Note that our evaluation focuses on time efficiency, aiming to assess the response time of each workflow. In other words, effectiveness lies out of the scope of this evaluation, as GeoSensor employs unsupervised state-of-the-art methods for each operation.
4.1 Change Detection
For change detection, we evaluate the time efficiency of two different approaches: (i) the Change Detector, which uses Apache Spark to parallelize the process depicted in Figure 4, and (ii) the baseline approach, which corresponds to the multi-threaded implementation of the same workflow, as provided by ESA’s SNAP Toolbox.
Data. As test data, we use two pairs of Sentinel-1A images: one comprising two images of Los Angeles, with file sizes of 508 MB and 504 MB, and one consisting of two images of Saudi Arabia, with file sizes of 524 MB and 526 MB.
Experimental Setup. All experiments were performed on a server with Ubuntu 12.04, 132 GB RAM and 4 AMD Opteron 6320 processors, each having 4 physical cores and 8 logical cores at 2.80 GHz. For the Spark implementation, we created 4 virtual machines (VMs), each one comprising two cores and 20 GB RAM. For each pair of images, we used 2 and 4 VMs. In each case, one VM was the master and the rest were used as workers. The multi-threaded implementation of SNAP was run using 2 cores on the same server. For each method and configuration, we took 3 measurements of the execution time and report the average in Figure 8.
Time Efficiency & Scalability. As shown in Figure 8, the 2-VMs Spark implementation is three times faster than the multi-threaded one. This shows that the communication overhead of Spark is negligible in comparison to the processing time and does not affect the execution times. Furthermore, as we add more worker nodes to the Spark implementation, the execution times decrease consistently. We are working, though, on further improving this performance so as to achieve a linear speedup.
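The notions of speedup and linear scaling used here can be made explicit with a short calculation. The timings in the sketch below are hypothetical placeholders chosen only so that the first ratio equals the reported factor of three; they are not the measurements of Figure 8. The worker counts follow the setup described above, where one VM of each configuration acts as the master.

```python
def speedup(baseline_time, parallel_time):
    """Ratio of baseline to parallel execution time (higher is better)."""
    return baseline_time / parallel_time


def scaling_efficiency(time_few, time_many, workers_few, workers_many):
    """Achieved speedup divided by the ideal (linear) speedup.

    A value of 1.0 means linear scaling when going from workers_few
    to workers_many workers.
    """
    ideal = workers_many / workers_few
    return speedup(time_few, time_many) / ideal


# Hypothetical timings in minutes, NOT taken from Figure 8.
snap_two_threads = 60.0
spark_two_vms = 20.0   # 1 master + 1 worker
spark_four_vms = 10.0  # 1 master + 3 workers

print(speedup(snap_two_threads, spark_two_vms))
print(scaling_efficiency(spark_two_vms, spark_four_vms, 1, 3))
```

Under these placeholder numbers, tripling the number of workers only halves the execution time, so the scaling efficiency is below 1.0; a linear speedup would bring it to exactly 1.0.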
Figure 9: Execution times for parallelization approaches of the event detection workflow.
4.2 Event Detection
For event detection, we perform an empirical evaluation of the
runtime performance of two approaches: (i) the Event Detector,
which implements the Spark-based distributed similarity mapping
pipeline illustrated in Figure 5, and (ii) the baseline approach, which
parallelizes the same pipeline using the Java multi-threading library.
Data. We use the Reuters 21K news articles dataset²¹. Preprocessing discards everything but the clean text, title and publication date information, storing all data in Cassandra.
Experimental Setup. We run a set of experiments for two different input sizes, namely for input batches of 4,000 and 8,000 articles to be clustered into events. These sizes correspond to approximately 16 and 64 million article pairs (n² for a batch of n articles), i.e., roughly 8 and 32 million unique unordered pairs. For each batch size, we apply the baseline approach using 2 threads, while for the Event Detector we vary the number of Spark partitions p ∈ {2, 4}. For each configuration, we perform 5 experiments and compute the mean execution time. We run all experiments on a single, 8-core 2.6 GHz Ubuntu 14.04 virtual machine with 32 GB of memory. For data storage, we use a Cassandra 2.2.4 Docker container.
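The relation between batch size and the number of pairwise comparisons can be checked with a short calculation. The figures of 16 and 64 million match n² for n = 4,000 and n = 8,000, while the number of unordered pairs of distinct articles, n(n-1)/2, is about half of that. The function names below are illustrative.

```python
def ordered_pairs(n):
    """Number of (a, b) combinations over n articles, self-pairs included."""
    return n * n


def unordered_pairs(n):
    """Number of distinct unordered pairs {a, b} with a != b."""
    return n * (n - 1) // 2


for batch_size in (4000, 8000):
    print(batch_size, ordered_pairs(batch_size), unordered_pairs(batch_size))
```

Doubling the batch size therefore quadruples the number of comparisons, which is why the larger batch is the more demanding test of parallelization.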
Time Efficiency & Scalability. Figure 9 depicts the execution time results per configuration. For the Event Detector, we observe that the runtime drops significantly as we increase the number of Spark partitions, i.e., the number of tasks run in parallel. Yet, the baseline approach is significantly slower only for the larger batch size. The reason is that for a small number of small texts, as in Reuters 21K, Spark’s parallelization overhead is higher than the speedup it achieves. We are working on improving the Event Detector so that its performance is competitive for small workloads.
5 CONCLUSIONS
We have presented GeoSensor, an open-source system we developed as a contribution to the H2020 BigDataEurope project. To the best of our knowledge, GeoSensor constitutes the first system that applies Semantic Web technologies to a combination of remote and social sensing, in an open-source implementation. The RDF data model in particular is crucial for GeoSensor’s functionality, as it offers two major advantages compared to traditional, semantic-free approaches. First, it allows for effectively dealing with Variety, seamlessly combining all data processed by GeoSensor towards meaningful analysis. It also facilitates the use of ontologies together with reasoning techniques so as to derive new facts that are not explicitly expressed in the available data. The second advantage comes from the power of linked open data and semantics. Transforming GeoSensor’s data into RDF allows for effortlessly interlinking it with other data sources and for discovering hidden links between entities that assist in data analysis. This process not only provides richer data, but also allows building fully automated workflows using machine learning algorithms.
21 http://www.daviddlewis.com/resources/testcollections/reuters21578/
Moreover, GeoSensor can be easily deployed in any cluster. All its components are provided as Docker images, publicly available through the BDE repository²², and the whole system can be launched through a single docker-compose file, which runs the individual components as Docker containers within Docker Swarm²³. To further enhance its usability, GeoSensor offers an intuitive UI, suitable for both expert and lay users, despite the rich information it processes. In fact, GeoSensor provides a hands-off functionality in the sense that all its operations are fully automatic, requiring no specialized input or domain knowledge from its users. In this way, GeoSensor makes a big step forward in the exploration and visualization of big data in the context of remote sensing. Our preliminary experimental study also demonstrated the high time efficiency of our system.
In the future, we will test GeoSensor in rigorous, operational scenarios where fast, easy-to-use tools are crucial to decision-making.
Acknowledgements. This work has been supported by the projects "LEO" and "BigDataEurope", which have been funded by the EU FP7 and Horizon 2020 programmes under grant agreements No. 611141 and 644564, respectively. It has also been supported by the program of Industrial Scholarships of the Stavros Niarchos Foundation²⁴. Finally, it has been supported by the project “APOLLONIS: Greek Infrastructure for Digital Arts, Humanities and Language Research and Innovation” (MIS 5002738), implemented under the Action “Reinforcement of the Research and Innovation Infrastructure”, funded by the Operational Programme “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014-2020) and co-financed by Greece and the EU (European Regional Development Fund).
22 https://github.com/big-data-europe/pilot-sc7-cycle3
23 https://docs.docker.com/engine/swarm
24 https://www.snf.org/
REFERENCES
[1] Hamed Abdelhaq, Christian Sengstock, and Michael Gertz. 2013. Eventweet: Online localized event detection from twitter. PVLDB 6, 12 (2013), 1326–1329.
[2] Charu C Aggarwal and Karthik Subbian. 2012. Event detection in social streams. In SDM. 624–635.
[3] Fotis Aisopos, George Papadakis, Konstantinos Tserpes, and Theodora A. Varvarigou. 2012. Content vs. context for sentiment analysis: a comparative analysis over microblogs. In ACM Conference on Hypertext and Social Media. 187–196.
[4] Farzindar Atefeh and Wael Khreich. 2015. A survey of techniques for event detection in twitter. Computational Intelligence 31, 1 (2015), 132–164.
[5] Sören Auer, Simon Scerri, Aad Versteden, et al. 2017. The BigDataEurope Platform – Supporting the Variety Dimension of Big Data. In ICWE. 41–59.
[6] Konstantina Bereta, Panayiotis Smeros, and Manolis Koubarakis. 2013. Representation and Querying of Valid Time of Triples in Linked Geospatial Data. In ESWC. 259–274.
[7] Francesca Bovolo and Lorenzo Bruzzone. 2015. The time variable in data fusion: A change detection perspective. IEEE Geosc. Remote Sensing Mag. 3, 3 (2015), 8–26.
[8] Grégoire Burel, Hassan Saif, and Harith Alani. 2017. Semantic Wide and Deep Learning for Detecting Crisis-Information Categories on Social Media. In ISWC. 138–155.
[9] Angelos Charalambidis, Antonis Troumpoukis, and Stasinos Konstantopoulos. 2015. Semagrow: Optimizing federated SPARQL queries. In SEMANTiCS. 121–128.
[10] Ping Chen, Soo Chin Liew, and Leong Keong Kwoh. 2017. Mangrove mapping and change detection using satellite imagery. In IGARSS. 5717–5720.
[11] Yubo Chen, Liheng Xu, Kang Liu, Daojian Zeng, and Jun Zhao. 2015. Event extraction via dynamic multi-pooling convolutional neural networks. In ACL. 167–176.
[12] Daniela Espinoza-Molina, Reza Bahmanyar, Ricardo Díaz-Delgado, Javier Bustamante, and Mihai Datcu. 2017. Land-cover change detection using local feature descriptors extracted from spectral indices. In IGARSS. 1938–1941.
[13] Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. 1996. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In KDD. 226–231.
[14] George Garbis, Kostis Kyzirakos, and Manolis Koubarakis. 2013. Geographica: A Benchmark for Geospatial RDF Stores (Long Version). In ISWC. 343–359.
[15] George Giannakopoulos, Vangelis Karkaletsis, George A. Vouros, and Panagiotis Stamatopoulos. 2008. Summarization system evaluation revisited: N-gram graphs. TSLP 5, 3 (2008), 5:1–5:39.
[16] George Giannakopoulos, George Kiomourtzis, and Vangelis Karkaletsis. 2014. NewSum: N-Gram Graph-Based Summarization in the Real World. In Innovative Document Summarization Techniques: Revolutionizing Knowledge Understanding.
[17] George Giannakopoulos, Jeff Kubina, John Conroy, Josef Steinberger, Benoit Favre, et al. 2015. Multiling 2015: multilingual summarization of single and multi-documents, on-line fora, and call-center conversations. In SIGDIAL. 270–274.
[18] George Giannakopoulos, Petra Mavridi, Georgios Paliouras, George Papadakis, and Konstantinos Tserpes. 2012. Representation models for text classification: a comparative analysis over three web document types. In WIMS. 1–12.
[19] Maoguo Gong, Tao Zhan, Puzhao Zhang, and Qiguang Miao. 2017. Superpixel-Based Difference Representation Learning for Change Detection in Multispectral Remote Sensing Images. IEEE Geosci. Remote Sensing 55, 5 (2017), 2658–2673.
[20] Salman Hameed Khan, Xuming He, Fatih Porikli, and Mohammed Bennamoun. 2017. Forest Change Detection in Incomplete Satellite Images With Deep Neural Networks. IEEE Trans. Geoscience and Remote Sensing 55, 9 (2017), 5407–5423.
[21] Shamanth Kumar, Huan Liu, Sameep Mehta, and L Venkata Subramaniam. 2014. From tweets to events: Exploring a scalable solution for Twitter streams. arXiv preprint:1405.1392 (2014).
[22] Kostis Kyzirakos, Manos Karpathiotakis, and Manolis Koubarakis. 2012. Strabon: A Semantic Geospatial DBMS. In ISWC. 295–311.
[23] Kostis Kyzirakos, Dimitrianos Savva, Ioannis Vlachopoulos, Alexandros Vasileiou, Nikolaos Karalis, Manolis Koubarakis, and Stefan Manegold. 2018. GeoTriples: Transforming Geospatial Data into RDF Graphs Using R2RML and RML Mappings. Web Semantics: Science, Services and Agents on the World Wide Web (2018).
[24] Kostis Kyzirakos, Ioannis Vlachopoulos, Dimitrianos Savva, Stefan Manegold, and Manolis Koubarakis. 2014. GeoTriples: a Tool for Publishing Geospatial Data as RDF Graphs Using R2RML Mappings. In ISWC. 393–396.
[25] Dengsheng Lu, P Mausel, E Brondizio, and Emilio Moran. 2004. Change detection techniques. International journal of remote sensing 25, 12 (2004), 2365–2401.
[26] Juha Makkonen, Helena Ahonen-Myka, and Marko Salmenkivi. 2002. Applying semantic classes in event detection and tracking. In ICON 2002. 175–183.
[27] Juha Makkonen, Helena Ahonen-Myka, and Marko Salmenkivi. 2004. Simple semantics in topic detection and tracking. Inf. Retr. 7, 3-4 (2004), 347–368.
[28] Thien Huu Nguyen and Ralph Grishman. 2015. Event detection and domain adaptation with convolutional neural networks. In ACL. 365–371.
[29] Charalampos Nikolaou, Kallirroi Dogani, Konstantina Bereta, George Garbis, Manos Karpathiotakis, Kostis Kyzirakos, and Manolis Koubarakis. 2015. Sextant: Visualizing time-evolving linked geospatial data. J. Web Sem. 35 (2015), 35–52.
[30] Ozer Ozdikis, Pinar Senkul, and Halit Oguztuzun. 2012. Semantic expansion of tweet contents for enhanced event detection in twitter. In ASONAM. 20–24.
[31] Nikolaos Panagiotou, Ioannis Katakis, and Dimitrios Gunopulos. 2016. Detecting events in online social networks: Definitions, trends and challenges. In Solving Large Scale Learning Tasks. Challenges and Algorithms. 42–84.
[32] George Papadakis, George Giannakopoulos, and Georgios Paliouras. 2016. Graph vs. bag representation models for the topic classification of web documents. WWW 19, 5 (2016), 887–920.
[33] Richard J. Radke, Srinivas Andra, Omar Al-Kofahi, and Badrinath Roysam. 2005. Image change detection algorithms: a systematic survey. IEEE Trans. Image Process. 14, 3 (2005), 294–307.
[34] Jagan Sankaranarayanan, Hanan Samet, Benjamin E Teitler, Michael D Lieberman, and Jon Sperling. 2009. Twitterstand: news in tweets. In SIGSPATIAL. 42–51.
[35] Thomas Schandl and Andreas Blumauer. 2010. PoolParty: SKOS Thesaurus Management Utilizing Linked Data. In ESWC. 421–425.
[36] Nestor Yague-Martinez, Francesco De Zan, and Pau Prats-Iraola. 2017. Coregistration of Interferometric Stacks of Sentinel-1 TOPS Data. IEEE Geosci. Remote Sensing 14, 7 (2017), 1002–1006.
[37] Yiming Yang, Tom Pierce, and Jaime Carbonell. 1998. A study of retrospective and on-line event detection. In SIGIR. 28–36.
... Les travaux réalisés par [Pittaras et al. 2019] ont pour but d'enrichir la détection de changement obtenue par images satellitaires avec des données ouvertes. Pour La détection de changements entre les images est ensuite réalisée par un module de la suite logicielle SNAP Toolbox 31 . ...
... Cette analyse nous permet ainsi de préciser le sujet de la thèse, dont l'objectif est donc de définir comment représenter et exploiter, à l'aide des technologies du Web sémantique, des données géospatiales identifiées par analyse d'images satellitaires, en particulier des données liées à des changements. L'objectif est très proche de celui de GeoSensor [Pittaras et al. 2019] mais nous supposons déjà réalisée la collecte des images et le calcul des changements, pour nous focaliser sur la représentation sémantique et l'ajout de données contextuelles. ...
... De ce fait, nous avons choisi une approche alternative, où l'on ne représente pas chaque pixel, mais des groupes de pixels comme dans [Pittaras et al. 2019]. En effet, pour les images des satellites Sentinel-2 que nous utilisons, une grille contient plus d'un million de pixels pour un seul raster. ...
Thesis
La détection de changements à partir d’images satellitaires d’observation de la Terre est une tâche utile pour surveiller des évolutions naturelles ou liées à l’activité humaines, mais aussi l’impact d’événements ponctuels (incendies, inondations, etc.). L’utilisation d’apprentissage automatique pour détecter les zones de changements produit des résultats de plus en plus précis, sans toutefois fournir d’information sur la nature ou la cause de ces changements. Pour répondre à ce type de besoin, notre thèse vise à développer des solutions fondées sur des vocabulaires formels et des ontologies, sur la représentation des connaissances, l'annotation et l'intégration sémantique de méta-données associées aux images satellitaires et plus généralement aux données géo-localisées. En effet, les techniques de sémantisation sont un moyen d’offrir une interprétation intelligente des données. Le cas d’étude retenu pour illustrer l’apport de cette approche est le suivi des changements à différents pas de temps et différentes échelles de restitution. L'objectif est d'enrichir les méta-données issues des flux d'images en leur associant des catégories conceptuelles qui leur donnent du sens, de les coupler à des pré-traitements qui répondent aux exigences spécifiques à l’étude des changements et de les intégrer à d'autres informations géographiques disponibles (données climatiques et météorologiques, démographiques, etc. suivant les besoins). La thèse vise donc à montrer l'apport des méta-données sémantiques pour la surveillance des changements mais aussi à évaluer le passage à l'échelle des techniques de sémantisation, notamment de la représentation des données dans les entrepôts sémantiques.Nous proposons un processus qui gère le cycle complet de génération et exploitation de graphes de connaissances à partir de rasters issus de la télédétection et de données issues de l’open data. Les caractéristiques innovantes de ce processus sont les suivantes :i. 
Un algorithme permettant l’identification automatique de régions d’intérêts (ROI) associées à des valeurs similaires d’un indicateur calculé à partir d’une image satellite, et assurant ainsi un découpage géographique précis comme référence pour l’intégration de données.ii. Une approche orientée sémantique pour la génération de graphes de connaissances depuis différentes sources.
... The fusion of social media and satellite imagery addresses , collect social media data on natural disasters, linking it with remotely sensed data. Other approaches, such as CrisMap [7] and GeoSensor [8], enhance change detection over satellite images with event content from social media. The multisource data framework proposed by Xu et al. [9] integrates Weibo data, RS images, and historical geographic information for waterlogging probability assessment. ...
... Furthermore, Bischke et al. introduced a scalable satellite imagery contextual enrichment system with multimedia content from social media [3]. Pittaras et al. introduced GeoSensor, a novel, open-source system that improves change detection over satellite images with event content from social media [13]. Avvenuti et al. introduced CrisMap[2] a crisis mapping system, that incorporates damage detection and geoparsing. ...
Chapter
Full-text available
Deep learning has revolutionized event detection in social media and remote sensing, allowing for more precise and efficient analysis of immense amounts of data to unveil hidden patterns and insights. Moreover, Merging both data sources has proven efficient despite the absence of multi-sensed datasets. In today’s data-driven globe, it is becoming increasingly critical to process and explore heterogeneous data and to design models handling such data. This paper proposes a new multi-sensed fusion approach that leverages satellite images and tweets as input. We combined two open datasets to obtain a multi-sensed dataset concerning the 2017 hurricane Harvey. We extracted features from satellite imagery using Resnet34 and generated embeddings from tweets using Bert. We fused the embeddings and the features using a selective attention module incorporating cross and self-attention. Our module can filter misleading features from weak modalities on a sample-by-sample basis. We demonstrate that our approach surpasses unimodal models based on tweets or satellite imagery. We compared our results to a few baselines associated with hurricane Harvey and proved that our model surpasses them in accuracy, precision, recall, and F1 measure.
... In the same line, [10] proposes a CNN-based learning approach that simultaneously performs change detection and land cover mapping, while using the predicted land cover information to help to predict changes. Also close to our study, [22] enriches change detection over Sentinel-1-A images with event detected in media content (news, posts). Social sensing applies event detection techniques to cluster together news items and social media posts that pertain to the same real-world event and are located in the area where changes were detected. ...
... Figure 2 illustrates the enrichment of satellite imagery with photos from social media in MediaEval 2017. Pittaras et al. introduced GeoSensor, a novel, open-source system that enriches change detection over satellite images with event content from social media [23]. ...
Conference Paper
Full-text available
Remote sensing is a powerful technology for earth observation. However, the spatial, spectral, and temporal resolution of the imagery are imposing various limits. Lately, with the rise of the internet and smart mobile devices, social media with location-based information has been rapidly emerging. These circumstances led to the prevailing of new scenarios where fine-grained details of social bookmarking websites are enhanced with the wide coverage of satellites. Social media and satellites are both valuable sources of data. An event-driven data, designating either normal common events or unusual suspicious ones that may threaten human lives or damage the infrastructure. In this paper, we provide an insight into the present state of knowledge to better address the task of local suspicious event detection and linking social media with satellite imagery. Also, to track suspicious local events, we treated the detection problem as a retrospective problem by training different classifiers on the crisisLexT26 dataset. Furthermore, we introduced how to use the available geo-locations in the dataset to construct a geo-social dataset by linking it with remote sensing and retrieving satellite imagery before and after the event occurrence.
... While the debate regarding mathematical biology education continues, technological advances have brought big data to biology and medicine research, thus significantly accelerating the pace of scientific discovery. Big data approaches already assist in high-throughput technologies (e.g., mass spectrometry-based techniques [18]), medical imaging (e.g., X-Ray, CT scan, MRI, Ultrasound [19], [20], [21]), remote sensing (from personalized devices -e.g., heart rate, blood glucose [22], [23]; imagery -aerial photography and satellite [24], [25]), time-course data (e.g., EKG, EEG [26], [27]), integrative brain modeling [28], [29] -Statistical challenges of big brain network data [30], identifying anti-aging compounds for humans [31], [32], and many others. Thus, in the era of big data, biology has become more interdisciplinary and more quantitative than ever: discovering new knowledge hidden in petabytes and zetabytes of data depends on using mathematics, statistics, computer science and technological innovation -all attributes that define the emerging field of data science. ...
Article
We live in a data-rich world with rapidly growing databases with zettabytes of data. Innovation, computation, and technological advances have now tremendously accelerated the pace of discovery, providing driverless cars, robotic devices, expert healthcare systems, precision medicine, and automated discovery to mention a few. Even though the definition of the term data science continues to evolve, the sweeping impact it has already produced on society is undeniable. We are at a point when new discoveries through data science have enormous potential to advance progress but also to be used maliciously, with harmful ethical and social consequences. Perhaps nowhere is this more clearly exemplified than in the biological and medical sciences. The confluence of (1) machine learning, (2) mathematical modeling, (3) computation/simulation, and (4) big data have moved us from the sequencing of genomes to gene editing and individualized medicine; yet, unsettled policies regarding data privacy and ethical norms could potentially open doors for serious negative repercussions. The data science revolution has amplified the urgent need for a paradigm shift in undergraduate biology education. It has reaffirmed that data science education interacts and enhances mathematical education in advancing quantitative conceptual and skill development for the new generation of biologists. These connections encourage us to strive to cultivate a broadly skilled workforce of technologically savvy problem-solvers, skilled at handling the unique challenges pertaining to biological data, and capable of collaborating across various disciplines in the sciences, the humanities, and the social sciences. To accomplish this, we suggest development of open curricula that extend beyond the job certification rhetoric and combine data acumen with modeling, experimental, and computational methods through engaging projects, while also providing awareness and deep exploration of their societal implications. 
This process would benefit from embracing the pedagogy of experiential learning and involve students in open-ended explorations derived from authentic inquiries and ongoing research. On this foundation, we encourage development of flexible data science initiatives for the education of life science undergraduates within and across existing models.
Article
Full-text available
This review focuses on recent research literature on the use of Semantic Web Technologies (SWT) in city planning. The review foregrounds representational, evaluative, projective, and synthetical meta-practices as constituent practices of city planning. We structure our review around these four meta-practices that we consider fundamental to those processes. We find that significant research exists in all four metapractices. Linking across domains by combining various methods of semantic knowledge generation, processing, and management is necessary to bridge gaps between these meta-practices and will enable future Semantic City Planning Systems.
Preprint
Full-text available
Pre-print as published on https://como.ceb.cam.ac.uk/preprints/270/.
Chapter
This chapter describes the evolution of a real, multi-document, multilingual news summarization methodology and application, named NewSum, the research problems behind it, as well as the steps taken to solve these problems. The system uses the representation of n-gram graphs to perform sentence selection and redundancy removal towards summary generation. In addition, it tackles problems related to topic and subtopic detection (via clustering), demonstrates multi-lingual applicability, and—through recent advances—scalability to big data. Furthermore, recent developments over the algorithm allow it to utilize semantic information to better identify and outline events, so as to offer an overall improvement over the base approach.
Conference Paper
Full-text available
When crises hit, many flog to social media to share or consume information related to the event. Social media posts during crises tend to provide valuable reports on affected people, donation offers, help requests, advice provision, etc. Automatically identifying the category of information (e.g., reports on affected individuals, donations and volunteers) contained in these posts is vital for their efficient handling and consumption by effected communities and concerned organisations. In this paper, we introduce Sem-CNN; a wide and deep Convolutional Neural Network (CNN) model designed for identifying the category of information contained in crisis-related social media content. Unlike previous models, which mainly rely on the lexical representations of words in the text, the proposed model integrates an additional layer of semantics that represents the named entities in the text, into a wide and deep CNN network. Results show that the Sem-CNN model consistently outperforms the baselines which consist of statistical and non-semantic deep learning models.
Conference Paper
Full-text available
An effective monitoring and analysis of ecosystems requires developing new tools and knowledge. In this paper, we propose an approach for detecting land-cover changes using satellite Image Time Series. This approach represents each image by spectral indices and then extracts local features of these representations. Next, a clustering technique (e.g., k-means) is applied to the extracted features, where the resulting clusters are assumed to refer to land-cover classes. The land-cover change is then obtained by counting the number of times an assigned class to each point changes along the time series. For our experiments, we use a collection of Landsat-5 images captured every second month from October 2009 to August 2010 over the protected area of the Donana National Park in southwestern Spain, which is the largest sanctuary for migratory birds in western Europe. Results demonstrate that the proposed approach can detect the occurring changes in the main land-cover categories along the assessed time series
Article
Land cover change monitoring is an important task from the perspective of regional resource monitoring, disaster management, land development, and environmental planning. In this paper, we analyze imagery data from remote sensing satellites to detect forest cover changes over a period of 29 years (1987-2015). Since the original data are severely incomplete and contaminated with artifacts, we first devise a spatiotemporal inpainting mechanism to recover the missing surface reflectance information. The spatial filling process makes use of the available data of the nearby temporal instances followed by a sparse encoding-based reconstruction. We formulate the change detection task as a region classification problem. We build a multiresolution profile (MRP) of the target area and generate a candidate set of bounding-box proposals that enclose potential change regions. In contrast to existing methods that use handcrafted features, we automatically learn region representations using a deep neural network in a data-driven fashion. Based on these highly discriminative representations, we determine forest changes and predict their onset and offset timings by labeling the candidate set of proposals. Our approach achieves the state-of-the-art average patch classification rate of 91.6% (an improvement of ~16%) and the mean onset/offset prediction error of 4.9 months (an error reduction of five months) compared with a strong baseline. We also qualitatively analyze the detected changes in the unlabeled image regions, which demonstrate that the proposed forest change detection approach is scalable to new regions.
Article
The coregistration of synthetic aperture radar images is of fundamental importance for the generation of interferograms. The high azimuth coregistration requirements imposed by the TOPS acquisition mode imply that an advanced approach for the coregistration of stacked time series images is needed due to temporal decorrelation effects. In some scenarios, the conventional approach of estimating the shifts pairwise with respect to the same master might prove insufficient. Therefore, a joint estimation is proposed here, which jointly exploits all interferograms in order to retrieve more accurate results. Simulated data and Sentinel-1A images acquired in IW mode are used to validate this procedure, demonstrating the better performance of the joint approach when compared to the standard single-master approach.
Conference Paper
The management and analysis of large-scale datasets, described with the term Big Data, involves the three classic dimensions: volume, velocity and variety. While the former two are well supported by a plethora of software components, the variety dimension is still rather neglected. We present the BDE platform, an easy-to-deploy, easy-to-use and adaptable (cluster-based and standalone) platform for the execution of big data components and tools like Hadoop, Spark, Flink, Flume and Cassandra. The BDE platform was designed based upon the requirements gathered from seven of the societal challenges put forward by the European Commission in the Horizon 2020 programme and targeted by the BigDataEurope pilots. As a result, the BDE platform supports a variety of Big Data flow tasks like message passing, storage, analysis or publishing. To facilitate the processing of heterogeneous data, a particular innovation of the platform is the Semantic Layer, which makes it possible to process RDF data directly and to map and transform arbitrary data into RDF. The advantages of the BDE platform are demonstrated through seven pilots, each focusing on a major societal challenge.
Chapter
Event detection is a research area that has attracted attention in recent years due to the widespread availability of social media data. The problem of event detection has been examined in multiple social media sources like Twitter, Flickr, YouTube and Facebook. The task comprises many challenges, including the processing of large volumes of data and high levels of noise. In this article, we present a wide range of event detection algorithms, architectures and evaluation methodologies. In addition, we extensively discuss available datasets, potential applications and open research issues. The main objective is to provide a compact representation of the recent developments in the field and aid the reader in understanding the main challenges tackled so far as well as identifying interesting future research directions.
Chapter
This chapter describes a real, multi-document, multilingual news summarization application, named NewSum, the research problems behind it, and the novel methods proposed and tested to solve these problems. The system uses the representation of n-gram graphs in a novel manner to perform sentence selection and redundancy removal for the summaries. It also faces problems related to topic and subtopic detection (via clustering) and multilingual applicability, which are caused by the nature of the real-world news summarization sources.