
GeoSensor: Semantifying Change and Event Detection over Big Data

April 2019


DOI:10.1145/3297280.3297504

Conference: ACM SAC 2019
At: Limassol, Cyprus

Authors:

Nikiforos Pittaras

National and Kapodistrian University of Athens

George A. Papadakis

National and Kapodistrian University of Athens

George Stamoulis

National and Kapodistrian University of Athens

Giorgos Argyriou

University of Cyprus




GeoSensor: Semantifying Change and Event Detection over Big Data

Nikiforos Pittaras¹,², George Papadakis², George Stamoulis², Giorgos Argyriou², E Karra Taniskidou², Emmanouil Thanos², George Giannakopoulos¹, Leonidas Tsekouras¹ and Manolis Koubarakis²

¹ NCSR Demokritos, Greece {pittarasnikif, ggianna, ltsekouras}@iit.demokritos.gr

² Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Greece {npittaras, gpapadis, gstam, gioargyr, ekarra, ethanos, koubarak}@di.uoa.gr

ABSTRACT

GeoSensor is a novel, open-source system that enriches change detection over satellite images with event detection over news items and social media content. GeoSensor combines these two orthogonal operations through state-of-the-art Semantic Web technologies. At its core lies the open-source, semantics-enabled Big Data infrastructure developed by the EU H2020 BigDataEurope project. This allows GeoSensor to offer an on-line functionality, despite facing three major challenges of Big Data: Volume (a single satellite image typically occupies a few GBs), Variety (its data sources include two different types of satellite images and various types of user-generated content) and Veracity, as the accuracy of the end result is crucial for the usefulness of our system. We present GeoSensor’s architecture in detail, highlighting the advantages of using semantics for making the most of the knowledge extracted from news items and Earth Observation products. We also verify GeoSensor’s efficiency through a preliminary experimental study.

KEYWORDS

big data, satellite data, linked data, change detection, event detection

ACM Reference Format:

Nikiforos Pittaras, George Papadakis, George Stamoulis, Giorgos Argyriou, E Karra Taniskidou, Emmanouil Thanos, George Giannakopoulos, Leonidas Tsekouras and Manolis Koubarakis. 2019. GeoSensor: Semantifying Change and Event Detection over Big Data. In Proceedings of ACM SAC Conference (SAC’19). ACM, New York, NY, USA, Article 4, 8 pages. https://doi.org/10.1145/3297280.3297504

1 INTRODUCTION

In remote sensing, change detection is the process of comparing two or more satellite images that depict the same area on the Earth surface, but are taken at different points in time [ ]. Its goal is to identify differences between the images in the form of areas with changes in land cover or land use (e.g., an area that was an olive grove in the past is now occupied by buildings). This is a crucial task, as it provides useful information for many applications, e.g., studying land cover evolution, monitoring natural disasters or supporting crisis management. As an example, consider Figures 1(a) and (b), which depict snapshots of Ukhiya, Chittagong, Bangladesh before and after the settlement of Rohingya refugees in October 2017. In situations like this, change detection allows for fast and accurate estimation of natural or man-made changes on the Earth surface, providing valuable support to decision-makers. In our example, the outcomes of change detection appear in Figure 1(c). Modern satellite technology makes this possible even for remote areas with humanitarian or security issues that are difficult to reach.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

SAC’19, April 8-12, 2019, Limassol, Cyprus

ACM ISBN 978-1-4503-5933-7/19/04. . . $15.00

https://doi.org/10.1145/3297280.3297504

Interest in change detection using satellite images has grown recently, due to the availability of long time series of images by flagship Earth observation programmes, such as the US Landsat program¹ and the EU Copernicus Programme². The latter is currently the world’s largest Earth observation programme with almost 20 satellites, called Sentinels, expected to be in orbit by 2030. It consists of a set of complex systems that collect data from satellites as well as in-situ sensors, providing reliable and up-to-date information on a range of environmental and security issues under a free, full and open data policy. Information extracted from this data is also made freely available to users through the Copernicus services³, which address six thematic areas: land, marine, atmosphere, climate, emergency and security. Techniques for change detection using time series of satellite images are important in all of these areas [ ].

To the best of our knowledge, though, there is no open-source system that addresses the following three Vs of Big Satellite Data:

• Volume stems from the combined effect of the inherently quadratic time complexity of change detection and the large size of satellite images. In the worst case, all pixels of one image have to be compared with all pixels of the other image, yielding a rather time-consuming procedure for a common pair of images: each image typically occupies a few GBs, containing millions of pixels of low resolution (i.e., each pixel corresponds to tens of square meters on the Earth surface). Clearly, change detection poses a quite challenging computational task for commodity hardware.

• Veracity requires that decision makers are able to assess the quality and correctness of the intelligence extracted from satellite images, based on relevant news content. In practice, this means that collateral information about news should provide reliable insights into the detected changes, ideally in real time.

1 https://landsat.usgs.gov

2 http://www.copernicus.eu

3 http://www.copernicus.eu/main/services


Figure 1: Satellite images showing Ukhiya, Chittagong, Bangladesh (a) before, and (b) after the Rohingya refugee crisis in October 2017. (c) shows the main areas with changes in land cover or land use as identified by GeoSensor.

• Variety emanates from the diverse types of images that are produced by each satellite constellation. The two polar-orbiting satellites of the Sentinel-1 constellation are equipped with C-band Synthetic Aperture Radar (SAR) imaging systems, which enable image acquisitions regardless of weather and light conditions (i.e., the sensor is able to acquire images in the presence of clouds and during night time). In contrast, the two polar-orbiting satellites of the Sentinel-2 mission provide High-Resolution Optical data, acquired by a wide swath high-resolution multispectral sensor. Their images have 12 spectral bands, covering the spectrum from the visible domain to the short wavelength infrared domain. Being an optical passive system, imaging is sensitive to weather conditions and depends on external illumination. Variety further increases due to the textual data that are necessary for addressing Veracity.

In this work, we present GeoSensor, a geospatial system that applies change detection to Copernicus data in a way that addresses these three Vs of Big Satellite Data. In essence, GeoSensor integrates a remote sensing component with a social sensing one into a highly scalable processing chain. Remote sensing applies change detection techniques to SAR images from Sentinel-1, while using optical Sentinel-2 images for the validation of the end result. Social sensing applies event detection techniques to cluster together news items and social media posts that pertain to the same real-world event and are located in the area where change detection took place. For example, Figure 2(a) depicts a cluster of news items that elucidates the changes appearing in Figure 1(c). The integration of these two orthogonal components relies on Semantic Web technologies.

The rest of the paper is structured as follows: Section 2 briefly discusses related work, while Section 3 delves into GeoSensor’s architecture, highlighting the three workflows that lie at its core. In Section 4, we present preliminary experiments over real-world data that demonstrate the scalability of our system and, in Section 5, we conclude the paper along with directions for future work.

2 RELATED WORK

Change Detection. Earth observation is the use of remote sensing technologies to monitor land, sea and atmosphere. Satellite-based Earth observation relies on the use of satellite-mounted payloads to gather imaging data about Earth characteristics. We can distinguish two kinds of remote sensing. (i) In passive remote sensing, the satellite instruments monitor the energy received from the Earth, due to the reflection and re-emission of the Sun’s energy by the Earth’s surface or atmosphere. Optical or thermal sensors are commonly used passive sensors (e.g., the ones producing Sentinel-2 images). (ii) In active remote sensing, the satellite sends energy to Earth and monitors the energy received back from the Earth’s surface or atmosphere, enabling day and night monitoring during all weather conditions. Commonly used active sensors are lasers and radars, such as the SAR instruments that produce the Sentinel-1 images.

Recent works on change detection use Deep Neural Networks [ ] in a data-driven fashion, performing classification to detect changes in pixels or areas in the images. Other works use hierarchical object-based classification methods [ ]. Such supervised algorithms, though, lie out of our scope, due to the lack of publicly available labeled datasets. Developing such datasets from scratch is a rigorous process that requires heavy human involvement, even in-situ inspection of identified changes.

Instead, GeoSensor considers unsupervised algorithms for change detection. At the moment, it is equipped with the established approach implemented in ESA’s SNAP Toolbox⁴. Yet, its modular architecture allows for seamlessly extending it with additional state-of-the-art approaches, like the clustering technique in [12].

Event Detection. A review of text event detection is presented in [ ], with more recent surveys covering a large variety of detection methods that are crafted for social media [ ]. In [ ], the authors utilize a semantically-enabled convolutional neural network (CNN) to categorize social media posts, reporting that their model outperforms TF-IDF and Word2Vec pre-trained embeddings. Other works incorporate CNNs for the joint detection of events and topics [ ]. Yet, these methods rely on supervised learning, requiring a labeled dataset, unlike our unsupervised approach.

4 http://step.esa.int/main/toolboxes/snap

Figure 2: (a) A set of news items referring to the Rohingya refugee crisis, (b) the corresponding event created by GeoSensor, and (c) the menu providing access to the individual news items of the event.

On another line of research, several works use unsupervised, semantically-aware clustering for event detection. For example, a semantically rich multiple-vector representation is used in [ ], while [ ] uses a co-occurrence-based semantic expansion of words to produce event groups. These works report superior performance over non-semantic baselines. In [ ], the authors employ a classification-based cleaning phase that is followed by content- and temporal-based clustering. [ ] performs a clustering on keyword-based features over tweets, while the structure of the underlying social network lies at the core of the approaches presented in [ ]. However, all these works mainly rely on vector space features that capture frequency-related statistics, ignoring the positional information of tokens in the source text (i.e., bag of n-grams). In contrast, our approach relies on graphs of n-grams, which effectively capture token context both in long, curated documents like news articles and in short, noisy texts like tweets [3, 18, 32].

3 APPROACH

We now present GeoSensor, explaining how it addresses the above three Vs of Big Satellite Data.

To tackle Variety, GeoSensor relies heavily on state-of-the-art Semantic Web technologies, which provide time-efficient, unified access to the outcomes of the remote and the social sensing components. In this way, it is capable of seamlessly processing a rich diversity of data sources, which range from the graphic information in SAR and optical satellite images to the textual information of news articles and social media posts.

To address Volume, GeoSensor exploits the distributed processing of a cluster based on the BDI platform [ ], the open-source, semantics-enabled Big Data infrastructure that was developed in the context of the EU BigDataEurope⁵ project. The BDI platform combines the massive parallelization capabilities offered by Apache Spark⁶ with an inherent support for Semantic Web technologies.

Figure 3: The system architecture of GeoSensor.

To tackle Veracity, GeoSensor uses Sentinel-2 images in combination with the knowledge extracted from the social sensing component for the verification of changes detected from Sentinel-1 images. This is enhanced by the ability to fetch the latest social media data through the live Twitter keyword search offered by its GUI.

Figure 3 depicts GeoSensor’s architecture. It consists of 11 components that are organized into 3 workflows, one for each horizontal layer: the change detection layer is formed by the components at the bottom (i.e., Image Aggregator, HDFS and Change Detector), while the event detection layer is implemented by the components at the top (i.e., News Crawler, Apache Cassandra, Event Detector, Lookup Service and Entity Extractor). The rest of the components comprise the semantic layer, which acts as GeoSensor’s backbone. Next, we describe the functionality of each layer in detail.

3.1 Change Detection Layer

This layer implements the gist of GeoSensor, retrieving and comparing pairs of satellite images in order to detect changes in land cover or land use. It consists of three components.

The first one is the Image Aggregator, a RESTful web service that downloads from ESA’s Copernicus Open Access Hub⁷ the pairs of Sentinel-1 and Sentinel-2 images with the largest overlap with the user-defined area of interest. In our example, the Image Aggregator is responsible for downloading the images in Figure 1(a) and (b), after the user specifies Ukhiya, Chittagong, Bangladesh as the area of interest. This process also requires the user to define temporal acquisition criteria, in the form of the images’ sensing dates, i.e., the time of interest together with a reference date in the past, before the change took place. In our example, the time of interest - for Figure 1(b) - is October 26, 2017, while the reference date - for Figure 1(a) - is anything before June 2017. The downloaded Sentinel images are then stored in the Hadoop Distributed File System (HDFS), which distributes parts of the images to all cluster nodes, facilitating scalable and fault-tolerant parallel image processing.

5 https://www.big-data-europe.eu

6 https://spark.apache.org

7 https://scihub.copernicus.eu/

Figure 4: The workflow implemented by Change Detector.

Finally, the Change Detector applies the workflow depicted in Figure 4, which implements in parallel the state-of-the-art unsupervised approach offered by ESA’s SNAP Toolbox. Its goal is to compare the downloaded images in order to identify the changes in land cover or land use. This workflow consists of three stages: (i) Pre-processing uses co-registration [ ] to ensure that the selected images have identical dimensions and correspond to the same geolocation. (ii) Main processing compares the individual pixels in the images to assess their difference. (iii) Post-processing clusters together the pixels with high likelihood of changes, forming broader areas with changes in a way that reduces false alarms, i.e., it excludes outliers caused by noise, which is either inherent in the satellite images or introduced by inaccuracies of previous steps.

In more detail, we call master image the one corresponding to the earliest date - Figure 1(a) in our example - and slave image the one corresponding to the latest date - Figure 1(b). Typically, their dimensions and characteristics are quite different, because they were taken under different settings, such as the angle of the satellite. Therefore, pre-processing (co-registration) is indispensable for aligning the two images in such a way that each pair of corresponding pixels represents the same point on the Earth surface.

Given that individual satellite images typically cover a very large area on Earth, the subset operator crops the original satellite images to the borders of the user-defined area of interest. This operation curtails the running time to a significant extent, restricting the computational cost to the absolutely essential parts of satellite images. Its complexity is very low, requiring no parallelization.

The cropped images are given as input to the collocate operator, which resamples the pixels of the slave image into the geographical raster of the master. This operator requires accurate geopositioning information for both images in the form of ground control points (GCPs), i.e., markers for certain geographical positions within a geo-referenced image that are described by their geo-coordinates and by textual descriptions in the image meta-data.

Next, the GCP selection operator generates a set of uniformly spaced GCPs in the master image and computes their corresponding GCPs in the slave image. This is done through an iterative process: for each master GCP, the corresponding slave GCP is approximated based on their geo-coordinates. Using a predetermined window size, the areas surrounding each GCP are cross-correlated in order to adjust the slave GCP to a more accurate position. This procedure is repeated until the new slave GCP is located within acceptable limits, or a maximum number of iterations is carried out.

Based on the selected GCPs, the warp operator computes the warp function, which will be used for mapping the pixels of the slave image into the co-registered image. This is a linear function that is estimated by repeating the following process until convergence: a warp function is initially computed using the available master-slave GCP pairs. The resulting function is used to map the master GCPs to the slave image. Then, the residuals between the mapped master and the corresponding slave GCPs are computed along with the root mean square (RMS) and the standard deviation of all residuals. Next, the master-slave GCP pairs are filtered to eliminate those exceeding the mean RMS. Upon completion of this process, the remaining master-slave GCP pairs are filtered with a predetermined RMS threshold and the warp function is derived from the retained pairs.
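The iterative estimation described above can be illustrated with a minimal sketch. It assumes an affine (first-order) warp fitted by ordinary least squares and GCP pairs given as `((x, y), (u, v))` tuples; the function names, the default threshold and the stopping rule are illustrative assumptions, not the interface of GeoSensor or of the SNAP Toolbox.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        s = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - s) / m[r][r]
    return x

def fit_affine(pairs):
    """Least-squares fit of u = a0 + a1*x + a2*y and v = b0 + b1*x + b2*y."""
    rows = [(1.0, x, y) for (x, y), _ in pairs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for k in (0, 1):  # k = 0 fits u, k = 1 fits v
        atb = [sum(r[i] * t[k] for r, (_, t) in zip(rows, pairs)) for i in range(3)]
        coeffs.append(solve3(ata, atb))
    return coeffs

def apply_warp(coeffs, point):
    """Map a master point to slave coordinates."""
    x, y = point
    return tuple(c[0] + c[1] * x + c[2] * y for c in coeffs)

def residual(coeffs, pair):
    """Distance between the mapped master GCP and its slave GCP."""
    (u, v), (su, sv) = apply_warp(coeffs, pair[0]), pair[1]
    return math.hypot(u - su, v - sv)

def estimate_warp(pairs, rms_threshold=0.5, max_iter=10):
    """Refit repeatedly, dropping pairs whose residual exceeds the current RMS."""
    pairs = list(pairs)
    for _ in range(max_iter):
        coeffs = fit_affine(pairs)
        res = [residual(coeffs, p) for p in pairs]
        rms = math.sqrt(sum(r * r for r in res) / len(res))
        if rms <= rms_threshold:  # assumed convergence criterion
            break
        kept = [p for p, r in zip(pairs, res) if r <= rms]
        if len(kept) < 3 or len(kept) == len(pairs):
            break
        pairs = kept
    # Final filtering with the predetermined RMS threshold.
    coeffs = fit_affine(pairs)
    final = [p for p in pairs if residual(coeffs, p) <= rms_threshold]
    if len(final) >= 3:
        pairs = final
    return fit_affine(pairs), pairs
```

With a regular grid of GCPs that differ by a constant shift and one grossly misplaced pair, the sketch drops the misplaced pair in the first pass and recovers the shift from the remaining pairs.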

Finally, the co-registered image is generated using the resulting warp function in combination with bilinear interpolation. This means that every point of the original slave image is projected to a point in the master image as the weighted sum of the warp projection of its four surrounding pixels.
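As a rough illustration of bilinear interpolation, the sketch below resamples a slave image onto the master raster. It assumes images stored as lists of rows and a hypothetical `warp(col, row)` callable that returns the fractional slave coordinates of a master pixel; this inverse-lookup formulation and all names are assumptions made for the example.

```python
import math

def bilinear_sample(image, x, y):
    """Sample an image (list of rows) at fractional column x and row y."""
    h, w = len(image), len(image[0])
    # Clamp to the image borders so that edge pixels stay defined.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Weighted sum of the four surrounding pixels.
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom

def resample(slave, warp, height, width):
    """Build a co-registered image on a raster of the given size."""
    return [[bilinear_sample(slave, *warp(col, row)) for col in range(width)]
            for row in range(height)]
```

For instance, `resample(slave, lambda c, r: (c + 0.5, r), h, w)` shifts the slave image by half a pixel along the columns.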

Using the master and the co-registered image as input, the change detection algorithm computes the ratio of the corresponding pixels in the two images. The pixels exhibiting very large or very low ratios indicate candidate areas with changes.
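The ratio test can be sketched as follows. Taking the logarithm of the ratio, so that very large and very low ratios are handled by one symmetric threshold, is a choice made for this example; the threshold value and the small `eps` guard against division by zero are likewise assumptions.

```python
import math

def log_ratio_changes(master, coregistered, threshold=1.0, eps=1e-6):
    """Return the (row, col) positions whose |log ratio| exceeds the threshold."""
    candidates = []
    for r, (mrow, crow) in enumerate(zip(master, coregistered)):
        for c, (m, s) in enumerate(zip(mrow, crow)):
            ratio = (s + eps) / (m + eps)
            # A large |log ratio| means a very large or a very low ratio.
            if abs(math.log(ratio)) > threshold:
                candidates.append((r, c))
    return candidates
```

Each pixel is examined independently of the others, which is what makes this step straightforward to parallelize.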

Lastly, DBScan [ ] is applied for post-processing the set of candidate areas with changes. DBScan groups together pixels closely packed together (i.e., with many nearby change indicators), while treating as outliers pixels that lie alone in low-density regions, with their nearest neighbors located far away. The end result is a set of areas with changes in land cover or land use. In our example, DBScan produces the image in Figure 1(c), yielding the 7 yellow clusters that correspond to such areas.
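The density-based grouping can be illustrated with a compact, single-machine version of the textbook DBSCAN algorithm. It is a sketch only: it uses a quadratic neighbor search and says nothing about how GeoSensor parallelizes this step; `eps` and `min_pts` are the usual DBSCAN parameters.

```python
def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps * eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbors(j)
            if len(nj) >= min_pts:  # core point: expand the cluster
                queue.extend(nj)
    return labels
```

Applied to the coordinates of candidate pixels, dense groups become change areas and isolated pixels are labeled as noise.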

Due to the high time complexity of all processes (except the Subset operator), they are massively parallelized in Apache Spark. Due to space limitations, we omit the parallelization details.

3.2 Event Detection Layer

To address Veracity, this layer attaches a set of recent events to every area with identified changes in land cover or land use, providing users with a possible explanation and verification of the detected changes. This functionality is offered by the five components at the top layer of GeoSensor’s architecture in Figure 3.

The first component is the News Crawler, which scans specific social media sources and news agencies at half-hour intervals for the latest news items (posts and news articles, respectively). For the time being, these sources include most of the RSS feeds that are freely provided by Reuters in English⁸ as well as several selected public accounts in Twitter⁹, also in English. The crawler structure, though, is extensible, facilitating the integration of more information sources, or even the extension with other operation modes. For example, it has been used as a basic data collection infrastructure in a summarization application [ ] and in the EU project “NOMAD”¹⁰. In our running example, the News Crawler is responsible for gathering the news items in Figure 2(a).

8 https://www.reuters.com/tools/rss

9 https://twitter.com

10 http://www.nomad-project.eu


All data gathered by the News Crawler are stored in the second component, namely Apache Cassandra¹¹. We opted for this particular data management system, due to its capacity to store a large volume of information, while offering linear scalability and fault-tolerance (i.e., it provides high availability with no single point of failure). In fact, Cassandra is crafted for large-scale infrastructures like the BDI platform, offering robust support for clusters with multiple commodity servers. Besides, it is an open-source NoSQL database that is compatible with the SemaGrow component, which is used by the semantic layer for federated access to the details of individual news items or entire events (cf. Section 3.3).

The news items stored in Cassandra are processed by the Event Detector module at half-hour intervals. They are grouped into real-world events by a modified version of NewSum [16], a summarization algorithm providing commercial-grade performance. NewSum uses n-gram graphs [ ] to model its textual input, a representation that has been shown to be effective in noisy settings in multiple genres (i.e., blogs, articles, microblogging and social media) [ ]. In addition, NewSum is robust to multilingual data, ranking among the top performers in multilingual, multi-document summarization tasks [17].
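To give an intuition for the n-gram graph representation, the sketch below builds a simplified character n-gram graph and compares two graphs by the overlap of their weighted edges. It is a loose illustration under stated assumptions: the directed edges, the window size and the similarity formula are choices made for this example and are not taken from NewSum’s implementation.

```python
from collections import Counter

def ngram_graph(text, n=3, window=3):
    """Map each pair of nearby character n-grams to its co-occurrence count."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, g in enumerate(grams):
        # Connect each n-gram with the n-grams that follow it within the window.
        for h in grams[i + 1:i + 1 + window]:
            edges[(g, h)] += 1
    return edges

def value_similarity(g1, g2):
    """Sum of weight ratios over shared edges, normalized by the larger graph."""
    if not g1 or not g2:
        return 0.0
    shared = g1.keys() & g2.keys()
    total = sum(min(g1[e], g2[e]) / max(g1[e], g2[e]) for e in shared)
    return total / max(len(g1), len(g2))
```

Unlike a bag of n-grams, the edges record which n-grams occur near each other, so two texts with the same n-grams in a different order yield different graphs.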

In more detail, Event Detector first builds a coarse-grained set of events. Pairs of news articles are compared with each other using their n-gram graphs representation and the corresponding graph-based textual similarity measures [ ]. Appropriate thresholding is then applied to retain only the pairs with high similarity. Those pairs are then grouped into larger sets (pools) of news articles based on a transitivity analysis that forms clusters from connected components in the similarity graph. The pools of news articles with a very low support are discarded, whereas the remaining pools are considered as “real-world events”. Due to its high time complexity, this process is parallelized in Apache Spark, as shown in Figure 5. The same procedure is applied independently to Twitter data, yielding a set of tweet pools. Each tweet pool is then compared with every pool of news articles. If their similarity exceeds a predefined threshold, the tweet pool is added to the pool of news articles. Then, every pool of news articles goes through a summarization process that builds its event description (e.g., title selection) and enriches it with relevant metadata, i.e., spatiotemporal information, named entities as well as image elements from its member documents. These metadata are extracted from its content directly, or with the help of RESTful-based tools and services, internal (Lookup Service and Entity Extractor) and external ones (PoolParty¹³ and Flickr¹⁴).
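The pooling step (pairwise comparison, thresholding, connected components, support filtering) can be sketched on a single machine as follows. The word-level Jaccard similarity is only a stand-in for the graph-based similarity measures, and the threshold and minimum support are illustrative values.

```python
from itertools import combinations

def jaccard(a, b):
    """Stand-in similarity: word overlap between two texts."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def detect_events(items, similarity=jaccard, threshold=0.3, min_support=2):
    """Group items into pools via connected components of the similarity graph."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Keep only the highly similar pairs and merge their components.
    for i, j in combinations(range(len(items)), 2):
        if similarity(items[i], items[j]) >= threshold:
            parent[find(i)] = find(j)

    pools = {}
    for i in range(len(items)):
        pools.setdefault(find(i), []).append(items[i])
    # Discard pools with very low support.
    return [p for p in pools.values() if len(p) >= min_support]
```

Because pools are connected components, two items can end up in the same event without being directly similar, as long as a chain of similar items links them.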

Figure 5: The Spark-based implementation of Event Detector.

11 http://cassandra.apache.org

12 https://github.com/scify

13 https://www.poolparty.biz

14 https://www.flickr.com

15 https://opennlp.apache.org/

In more detail, the Lookup Service associates the location names from news items with their actual geo-coordinates so that they can be joined with areas with detected changes in land cover or land use. The location names are identified and extracted from the text data in each news item using Apache openNLP¹⁵. In the example of Figure 2(a), the location of Kutupalong refugee camp (Ukhiya, Chittagong, Bangladesh) will be converted into the following geo-coordinates: POLYGON ((92.0455551147462 21.3476104736329, 92.2031173706055 21.3476104736329, 92.2031173706055 21.1280899047852, 92.0455551147462 21.1280899047852, 92.0455551147462 21.3476104736329)). Note that the output is in the form of the OGC¹⁶ standard Well Known Text (WKT).
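The polygon above is a closed ring whose first and last points coincide. As a small illustration, the hypothetical helper below (not part of GeoSensor) produces such a WKT string from a bounding box, listing longitude before latitude as in the example.

```python
def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Serialize a bounding box as a closed WKT polygon ring (lon lat order)."""
    ring = [
        (min_lon, max_lat),
        (max_lon, max_lat),
        (max_lon, min_lat),
        (min_lon, min_lat),
        (min_lon, max_lat),  # repeat the first point to close the ring
    ]
    return "POLYGON ((" + ", ".join(f"{lon!r} {lat!r}" for lon, lat in ring) + "))"
```

Calling it with the corner coordinates of the example reproduces the polygon string quoted in the text.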

This conversion may seem a trivial task, given that there is little ambiguity in our example. In reality, though, location names typically suffer from high levels of noise. There are homonymous locations (e.g., London, UK and London, Ontario, Canada) as well as spelling mistakes (e.g., Landon), due to errors in the extraction process. To address both challenges, the Lookup Service poses every place name as a keyword query to an Apache Lucene¹⁷ index that contains about 180,000 location names of administrative areas worldwide (GADM dataset¹⁸). Lucene’s fuzzy query functionality deals with spelling mistakes, while homonymy is addressed by ranking the candidates in decreasing order of the ratio "string similarity/area". The WKT polygon coordinates corresponding to the top ranked location are finally returned as output.
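The ranking rule can be sketched as below. Python’s `difflib.SequenceMatcher` stands in for Lucene’s fuzzy matching, the gazetteer is a plain list of dictionaries, and the minimum similarity is an assumed cut-off; none of these reflect the actual Lookup Service code, and the entries in the usage example are made-up placeholders.

```python
from difflib import SequenceMatcher

def lookup(name, gazetteer, min_similarity=0.7):
    """Return the WKT of the candidate with the highest similarity/area ratio."""
    scored = []
    for entry in gazetteer:
        sim = SequenceMatcher(None, name.lower(), entry["name"].lower()).ratio()
        if sim >= min_similarity:  # tolerate spelling mistakes
            scored.append((sim / entry["area"], entry))
    if not scored:
        return None
    # Homonyms are ranked in decreasing order of "string similarity/area".
    return max(scored, key=lambda t: t[0])[1]["wkt"]

# Usage with placeholder entries (areas and polygons are not real data).
gazetteer = [
    {"name": "London", "area": 2.0, "wkt": "POLYGON A"},
    {"name": "London", "area": 0.5, "wkt": "POLYGON B"},
    {"name": "Paris", "area": 0.1, "wkt": "POLYGON C"},
]
best = lookup("Landon", gazetteer)
```

In this toy gazetteer the misspelled query matches both entries named London equally well, so the area in the denominator decides which one is returned.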

Valuable metadata are also provided by the Entity Extractor, which enriches the event description with named entities that are extracted from its textual content, thus empowering a Semantic Web view of the produced information. This view allows for improved indexing and disambiguation of the main players in an event, based on the URIs mapped to each extracted entity. At the core of this functionality lies the PoolParty Semantic Suite, which constitutes a state-of-the-art thesaurus management tool that is based on Linked Data [ ]. Specifically, a “Famous People” thesaurus was constructed, containing almost half a million entities of well-known actual and fictitious personalities, each grounded to a URI. Two RESTful APIs were implemented and hosted by PoolParty. Given an input text or a news item URL, the first endpoint, called Extractor API, provides a list with entities deemed relevant to the supplied content. The entity URIs are stored in Cassandra, along with their corresponding thesaurus id. The second endpoint, called Metadata API, retrieves descriptive metadata related to the entity whose URI is given as input. These procedures are illustrated in Figure 6.

The Entity Extractor also associates every detected event with publicly available images from Flickr. Using the Flickr search API, it retrieves photographs geo-tagged within the geolocation(s) of each event that have been uploaded at a close enough date.

Finally, all event descriptions, including their metadata, are stored into Cassandra in the appropriate tables that distinguish them from individual news items. Duplicate events are discarded and Strabon is notified for the new entries (see below for details).

16 The Open Geospatial Consortium - http://www.opengeospatial.org

17 https://lucene.apache.org

18 http://www.gadm.org

Figure 6: Entity extraction example, illustrating the Extractor and Metadata API.

3.3 Semantic Layer

This layer constitutes GeoSensor’s backbone, bridging the gap between the two orthogonal operations of change and event detection. This is achieved by the four components in the middle of Figure 3, which encapsulate state-of-the-art Semantic Web technologies.

The first component is GeoTriples [ ], a tool for transforming geospatial data from their original formats into RDF. In our case, it converts into RDF the descriptions of areas with changes in land cover or land use (from change detection) as well as the event summaries (from event detection). We selected GeoTriples, as it is an established system that supports a wide variety of data formats [ ].

The output of GeoTriples is stored in Strabon [ ], a state-of-the-art open-source spatio-temporal triplestore that efficiently executes GeoSPARQL and stSPARQL queries. Strabon supports spatial datatypes, enabling the serialization of geometric objects in the OGC standards Well-Known Text (WKT) and Geography Markup Language (GML). It has been implemented by extending the established RDF store Sesame (now called RDF4J), using the spatially-enabled database PostGIS. According to thorough experiments [6, 14], Strabon is the most efficient spatio-temporal RDF store available at the time of writing.
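To make the role of GeoSPARQL and WKT more concrete, the sketch below builds a query that selects stored geometries intersecting a given polygon. The `geo:` and `geof:` terms are standard GeoSPARQL vocabulary; the `ex:` namespace, the class `ex:ChangedArea` and the function name are hypothetical, since the actual vocabulary used by GeoSensor is not given here.

```python
def changes_intersecting_query(wkt_polygon, limit=100):
    """Build a GeoSPARQL query for changed areas that intersect a WKT polygon.

    ex:ChangedArea is an assumed class name used only for illustration.
    """
    return f"""
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX ex: <http://example.org/geosensor#>

SELECT ?area ?wkt WHERE {{
  ?area a ex:ChangedArea ;
        geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  FILTER(geof:sfIntersects(?wkt, "{wkt_polygon}"^^geo:wktLiteral))
}}
LIMIT {limit}
""".strip()


if __name__ == "__main__":
    polygon = "POLYGON((92.0 21.0, 92.3 21.0, 92.3 21.4, 92.0 21.4, 92.0 21.0))"
    print(changes_intersecting_query(polygon, limit=10))
```

The resulting string could be sent to any GeoSPARQL-capable endpoint; how GeoSensor's interface actually issues its queries is not specified in the text.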

The third component of this layer is SemaGrow [ ], a query processing system that provides a single SPARQL endpoint for federating multiple remote SPARQL endpoints. It is also capable of transparently optimizing queries and dynamically integrating heterogeneous data models by applying the appropriate vocabulary transformations. To boost federated query execution, it employs vocabulary mapping techniques and a balanced query optimizer that considers instance statistics from the federated bases, where available. SemaGrow is highly efficient, consistently outperforming the state-of-the-art in federated query processing [ ]. In our case, SemaGrow federates Cassandra and Strabon, offering a unified SPARQL endpoint for both of them to GeoSensor’s user interface. In this way, GeoSensor gains in query performance (with respect to other systems, e.g., FedX and SPLENDID) and is easier to extend, in case new sources need to be added in the future.

19 http://rdf4j.org

20 https://postgis.net

Figure 7: User criteria for triggering (a) Change Detection, and (b) Event Detection.

GeoSensor’s interface is offered by Sextant [ ], a web-based application for exploring, interacting with and visualizing time-evolving linked geospatial data. Sextant is also capable of creating, sharing, searching and collaboratively editing maps and of producing statistical charts out of statistically enhanced data sets. It relies heavily on Semantic Web technologies but offers an intuitive interface that allows both domain experts and lay users to exploit all available features. To cover all requirements of GeoSensor, Sextant has been extended with three new functionalities:

(i) Core functionality. Sextant provides an intuitive interface for initiating the event and the change detection processes of GeoSensor. The window for launching change detection appears in Figure 7(a). The user selects an area of interest either by typing its name (with the help of auto-complete), or by highlighting it on the Earth Map. The credentials for Copernicus Open Access Hub are also required, along with the reference and the target date. For event detection, Figure 7(b) depicts the window that prompts users to define three optional search criteria: an area of interest, a time window defined by two dates, and a keyword that pertains to events of interest. The last criterion can be a combination of location or entity names, or any other words that are likely to appear in an event title. Users can also search for events by setting as the area of interest one that appears in the results of change detection.

(ii) Authorization/authentication. To support history over each user’s actions, Sextant implements a sign-up and login functionality. At its core lies a database on GeoSensor’s server that holds all account information along with the encrypted passwords. To ensure security over the network, Sextant can be deployed using the HTTPS protocol. When GeoSensor first loads, the user is prompted to create a new account or to log in using an existing one. Three types of users are supported: (a) The administrators have full access to all the supported functionality, including the history panel, and are responsible for accepting or declining sign-up requests by new users. (b) The classified users are the main users of the application and have full access to all the supported functionality, including the history panel. (c) The unclassified users are potential trial or occasional users with limited access to the supported functionality: they lack a history panel, they cannot search for events using keywords, and their event detection searches return up to 5 events. They are also deprived of the "SMART" buttons that alternate between change and event detection.

(iii) Live Twitter keyword search. To further clarify the map visualization with the latest raw information, overcoming the processing delay of the event detection layer, Sextant offers an embedded Twitter keyword search function that supports all Twitter API filters, such as "#" or "@". Using up to five keywords in the search field, Sextant can fetch and update relevant tweets in descending chronological order, presented via an efficient infinite scroll technique.

Figure 8: Execution times for parallelization approaches of the change detection workflow. (Panels: Los Angeles, Saudi Arabia; vertical axis: time in min; series: SNAP 2-threads, 2 VMs, 4 VMs.)
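The optional event search criteria of item (i) and the restrictions on unclassified users of item (ii) can be combined in a short sketch. It is a simplified illustration, not Sextant's code: events are plain dictionaries, the area of interest is approximated by a bounding box, keyword matching is a case-insensitive substring test on the title, and all function and field names are assumptions. Only the limit of 5 events and the ban on keyword search for unclassified users come from the text.

```python
from datetime import date

MAX_EVENTS_UNCLASSIFIED = 5  # limit stated in the text for unclassified users
ROLES = {"administrator", "classified", "unclassified"}


def in_bbox(event, bbox):
    """Return True if the event lies inside bbox = (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= event["lon"] <= max_lon and min_lat <= event["lat"] <= max_lat


def search_events(events, role, area=None, window=None, keyword=None):
    """Filter events by the optional criteria, then apply role-based restrictions."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    if role == "unclassified" and keyword is not None:
        raise PermissionError("keyword search is not available to unclassified users")
    hits = []
    for event in events:
        if area is not None and not in_bbox(event, area):
            continue
        if window is not None and not (window[0] <= event["date"] <= window[1]):
            continue
        if keyword is not None and keyword.lower() not in event["title"].lower():
            continue
        hits.append(event)
    if role == "unclassified":
        hits = hits[:MAX_EVENTS_UNCLASSIFIED]
    return hits


if __name__ == "__main__":
    events = [
        {"title": f"Event {i}", "lon": 92.1, "lat": 21.2, "date": date(2017, 10, i + 1)}
        for i in range(7)
    ]
    print(len(search_events(events, "classified")))
    print(len(search_events(events, "unclassified")))
```

Every criterion defaults to `None`, which reflects that all three are optional; a criterion only narrows the result when the user supplies it.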

4 EXPERIMENTS

We now present a preliminary experimental evaluation of GeoSensor’s main functionalities, namely the change and the event detection workflows. Note that our evaluation focuses on time efficiency, aiming to assess the response time of each workflow. In other words, effectiveness lies outside the scope of this evaluation, as GeoSensor employs unsupervised state-of-the-art methods for each operation.

4.1 Change detection

For change detection, we evaluate the time efficiency of two different approaches: (i) the Change Detector, which uses Apache Spark to parallelize the process depicted in Figure 4, and (ii) the baseline approach, which corresponds to the multi-threaded implementation of the same workflow, as provided by ESA’s SNAP Toolbox.

Data. As test data, we use two pairs of Sentinel-1A images: one comprising two images of Los Angeles, with file sizes of 508MB and 504MB, and one consisting of two images of Saudi Arabia, with file sizes of 524MB and 526MB.

Experimental Setup. All experiments were performed on a server with Ubuntu 12.04, 132GB RAM and 4 AMD Opteron 6320 processors, each having 4 physical cores and 8 logical cores at 2.80GHz. For the Spark implementation, we created 4 virtual machines (VMs), each one comprising two cores and 20GB RAM. For each pair of images, we used 2 and 4 VMs. In each case, one VM was the master and the rest were used as slaves. The multi-threaded implementation of SNAP was run using 2 cores on the same server. For each method and configuration, we took 3 measurements of the execution time and report the average in Figure 8.

Time Efficiency & Scalability. As shown in Figure 8, the 2-VMs Spark implementation is three times faster than the multi-threaded one. This shows that the communication overhead of Spark is negligible in comparison to the processing time and does not affect the execution times. Furthermore, as we add more slave nodes to the Spark implementation, the execution times decrease consistently. We are working, though, on further improving this performance so as to achieve a linear speedup.
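As a reading aid, the two notions of speedup used in this paragraph can be written out explicitly. The symbols $T_{\mathrm{SNAP}}$, $T_p$ and $S$ are notation introduced here for illustration and do not appear in the paper; $T_p$ denotes the execution time of the Spark implementation with $p$ VMs, and "linear" is read here as linear in the number of VMs, which is an assumption.

```latex
\frac{T_{\mathrm{SNAP}}}{T_{2}} \approx 3
\quad\text{(2-VM Spark vs. 2-thread SNAP)},
\qquad
S(p) = \frac{T_{2}}{T_{p}},
\qquad
\text{linear speedup: } S(p) = \frac{p}{2},\ \text{e.g. } S(4) = 2 .
```

Under this reading, the reported behaviour corresponds to $1 < S(4) < 2$: the execution time keeps decreasing as VMs are added, but doubling the VMs does not yet halve it.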

Figure 9: Execution times for parallelization approaches of the event detection workflow.

4.2 Event Detection

For event detection, we perform an empirical evaluation of the runtime performance of two approaches: (i) the Event Detector, which implements the Spark-based distributed similarity mapping pipeline illustrated in Figure 5, and (ii) the baseline approach, which parallelizes the same pipeline using the Java multi-threading library.

Data. We use the Reuters 21K news articles dataset. Preprocessing discards everything but the clean text, title and publication date information, storing all data in Cassandra.

Experimental Setup. We run a set of experiments for two different input sizes, namely input batches of 4,000 and 8,000 articles to be clustered into events. These sizes correspond to approximately 16 and 64 million article pairs, i.e., n² pairs for a batch of n articles. For each batch size, we apply the baseline approach using 2 threads, while for the Event Detector we vary the number of Spark partitions p ∈ {…}. For each configuration, we perform 5 experiments and compute the mean execution time. We run all experiments on a single 8-core, 2.6 GHz Ubuntu 14.04 virtual machine with 32 GB of memory. For data storage, we use a Cassandra 2.2.4 Docker container.
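The quoted figures of roughly 16 and 64 million pairs can be checked with a few lines of arithmetic. The sketch below shows two ways of counting; the function names are illustrative, and which of the two counts the pipeline actually evaluates is not stated in the text.

```python
def ordered_pairs(n):
    """All (i, j) combinations of n articles, including (i, i) and both orders."""
    return n * n


def unordered_pairs(n):
    """Distinct pairs {i, j} with i != j, each counted once."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for batch in (4000, 8000):
        print(batch, ordered_pairs(batch), unordered_pairs(batch))
```

The quoted figures match the n² count (16,000,000 and 64,000,000). If the similarity measure is symmetric and each pair of distinct articles is compared only once, the number of comparisons is about half of that (7,998,000 and 31,996,000). In both cases the workload grows quadratically, so doubling the batch size roughly quadruples the number of comparisons.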

Time Efficiency & Scalability. Figure 9 depicts the execution time results per configuration. For the Event Detector, we observe that the runtime drops significantly as we increase the number of Spark partitions, i.e., the number of jobs run in parallel. Yet, the baseline approach is significantly slower only for the larger batch size. The reason is that for a small number of short texts, as in Reuters 21K, Spark’s parallelization overhead outweighs the speedup it achieves. We are working on improving the Event Detector so that its performance is competitive for small workloads.

5 CONCLUSIONS

We have presented GeoSensor, an open-source system we developed as a contribution to the H2020 BigDataEurope project. To the best of our knowledge, GeoSensor constitutes the first system that applies Semantic Web technologies to a combination of remote and social sensing, in an open-source implementation. The RDF data model in particular is crucial for GeoSensor’s functionality, as it offers two major advantages compared to traditional, semantic-free approaches. First, it allows for effectively dealing with Variety, seamlessly combining all data processed by GeoSensor towards meaningful analysis. It also facilitates the use of ontologies together with reasoning techniques so as to derive new facts that are not explicitly expressed in the available data. The second advantage comes from the power of linked open data and semantics. Transforming GeoSensor’s data into RDF allows for effortlessly interlinking it with other data sources and for discovering hidden links between entities that assist in data analysis. This process not only provides richer data, but also allows building fully automated workflows using machine learning algorithms.

21 http://www.daviddlewis.com/resources/testcollections/reuters21578/

Moreover, GeoSensor can be easily deployed in any cluster. All its components are provided as Docker images, publicly available through the BDE repository, and the whole system can be launched through a single docker-compose file, running the individual components as Docker containers within Docker Swarm. To further enhance its usability, GeoSensor offers an intuitive UI, suitable for both expert and lay users, despite the rich information it processes. In fact, GeoSensor provides a hands-off functionality in the sense that all its operations are fully automatic, requiring no specialized input or domain knowledge from its users. In this way, GeoSensor makes a big step forward in the exploration and visualization of big data in the context of remote sensing. Our preliminary experimental study also demonstrated the high time efficiency of our system.

In the future, we will test GeoSensor in rigorous, operational scenarios where fast, easy-to-use tools are crucial to decision-making.

Acknowledgements. This work has been supported by the projects "LEO" and "BigDataEurope", which have been funded by the EU FP7 and Horizon 2020 programmes under grant agreements No. 611141 and 644564, respectively. It has also been supported by the program of Industrial Scholarships of the Stavros Niarchos Foundation. Finally, it has been supported by the project “APOLLONIS: Greek Infrastructure for Digital Arts, Humanities and Language Research and Innovation” (MIS 5002738), implemented under the Action “Reinforcement of the Research and Innovation Infrastructure”, funded by the Operational Programme “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014-2020) and co-financed by Greece and the EU (European Regional Development Fund).

REFERENCES

[1] Hamed Abdelhaq, Christian Sengstock, and Michael Gertz. 2013. Eventweet: Online localized event detection from twitter. PVLDB 6, 12 (2013), 1326–1329.

[2] Charu C Aggarwal and Karthik Subbian. 2012. Event detection in social streams. In SDM. 624–635.

[3] Fotis Aisopos, George Papadakis, Konstantinos Tserpes, and Theodora A. Varvarigou. 2012. Content vs. context for sentiment analysis: a comparative analysis over microblogs. In ACM Conference on Hypertext and Social Media. 187–196.

[4] Farzindar Atefeh and Wael Khreich. 2015. A survey of techniques for event detection in twitter. Computational Intelligence 31, 1 (2015), 132–164.

[5] Sören Auer, Simon Scerri, Aad Versteden, et al. 2017. The BigDataEurope Platform – Supporting the Variety Dimension of Big Data. In ICWE. 41–59.

[6] Konstantina Bereta, Panayiotis Smeros, and Manolis Koubarakis. 2013. Representation and Querying of Valid Time of Triples in Linked Geospatial Data. In ESWC. 259–274.

[7] Francesca Bovolo and Lorenzo Bruzzone. 2015. The time variable in data fusion: A change detection perspective. IEEE Geosc. Remote Sensing Mag. 3, 3 (2015), 8–26.

[8] Grégoire Burel, Hassan Saif, and Harith Alani. 2017. Semantic Wide and Deep Learning for Detecting Crisis-Information Categories on Social Media. In ISWC. 138–155.

[9] Angelos Charalambidis, Antonis Troumpoukis, and Stasinos Konstantopoulos. 2015. Semagrow: Optimizing federated SPARQL queries. In SEMANTiCS. 121–128.

22 https://github.com/big-data-europe/pilot-sc7-cycle3

23 https://docs.docker.com/engine/swarm

24 https://www.snf.org/

[10] Ping Chen, Soo Chin Liew, and Leong Keong Kwoh. 2017. Mangrove mapping and change detection using satellite imagery. In IGARSS. 5717–5720.

[11] Yubo Chen, Liheng Xu, Kang Liu, Daojian Zeng, and Jun Zhao. 2015. Event extraction via dynamic multi-pooling convolutional neural networks. In ACL. 167–176.

[12] Daniela Espinoza-Molina, Reza Bahmanyar, Ricardo Díaz-Delgado, Javier Bustamante, and Mihai Datcu. 2017. Land-cover change detection using local feature descriptors extracted from spectral indices. In IGARSS. 1938–1941.

[13] Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. 1996. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In KDD. 226–231.

[14] George Garbis, Kostis Kyzirakos, and Manolis Koubarakis. 2013. Geographica: A Benchmark for Geospatial RDF Stores (Long Version). In ISWC. 343–359.

[15] George Giannakopoulos, Vangelis Karkaletsis, George A. Vouros, and Panagiotis Stamatopoulos. 2008. Summarization system evaluation revisited: N-gram graphs. TSLP 5, 3 (2008), 5:1–5:39.

[16] George Giannakopoulos, George Kiomourtzis, and Vangelis Karkaletsis. 2014. NewSum: N-Gram Graph-Based Summarization in the Real World. In Innovative Document Summarization Techniques: Revolutionizing Knowledge Understanding.

[17] George Giannakopoulos, Jeff Kubina, John Conroy, Josef Steinberger, Benoit Favre, et al. 2015. Multiling 2015: multilingual summarization of single and multi-documents, on-line fora, and call-center conversations. In SIGDIAL. 270–274.

[18] George Giannakopoulos, Petra Mavridi, Georgios Paliouras, George Papadakis, and Konstantinos Tserpes. 2012. Representation models for text classification: a comparative analysis over three web document types. In WIMS. 1–12.

[19] Maoguo Gong, Tao Zhan, Puzhao Zhang, and Qiguang Miao. 2017. Superpixel-Based Difference Representation Learning for Change Detection in Multispectral Remote Sensing Images. IEEE Geosci. Remote Sensing 55, 5 (2017), 2658–2673.

[20] Salman Hameed Khan, Xuming He, Fatih Porikli, and Mohammed Bennamoun. 2017. Forest Change Detection in Incomplete Satellite Images With Deep Neural Networks. IEEE Trans. Geoscience and Remote Sensing 55, 9 (2017), 5407–5423.

[21] Shamanth Kumar, Huan Liu, Sameep Mehta, and L Venkata Subramaniam. 2014. From tweets to events: Exploring a scalable solution for Twitter streams. arXiv preprint:1405.1392 (2014).

[22] Kostis Kyzirakos, Manos Karpathiotakis, and Manolis Koubarakis. 2012. Strabon: A Semantic Geospatial DBMS. In ISWC. 295–311.

[23] Kostis Kyzirakos, Dimitrianos Savva, Ioannis Vlachopoulos, Alexandros Vasileiou, Nikolaos Karalis, Manolis Koubarakis, and Stefan Manegold. 2018. GeoTriples: Transforming Geospatial Data into RDF Graphs Using R2RML and RML Mappings. Web Semantics: Science, Services and Agents on the World Wide Web (2018).

[24] Kostis Kyzirakos, Ioannis Vlachopoulos, Dimitrianos Savva, Stefan Manegold, and Manolis Koubarakis. 2014. GeoTriples: a Tool for Publishing Geospatial Data as RDF Graphs Using R2RML Mappings. In ISWC. 393–396.

[25] Dengsheng Lu, P Mausel, E Brondizio, and Emilio Moran. 2004. Change detection techniques. International journal of remote sensing 25, 12 (2004), 2365–2401.

[26] Juha Makkonen, Helena Ahonen-Myka, and Marko Salmenkivi. 2002. Applying semantic classes in event detection and tracking. In ICON 2002. 175–183.

[27] Juha Makkonen, Helena Ahonen-Myka, and Marko Salmenkivi. 2004. Simple semantics in topic detection and tracking. Inf. Retr. 7, 3-4 (2004), 347–368.

[28] Thien Huu Nguyen and Ralph Grishman. 2015. Event detection and domain adaptation with convolutional neural networks. In ACL. 365–371.

[29] Charalampos Nikolaou, Kallirroi Dogani, Konstantina Bereta, George Garbis, Manos Karpathiotakis, Kostis Kyzirakos, and Manolis Koubarakis. 2015. Sextant: Visualizing time-evolving linked geospatial data. J. Web Sem. 35 (2015), 35–52.

[30] Ozer Ozdikis, Pinar Senkul, and Halit Oguztuzun. 2012. Semantic expansion of tweet contents for enhanced event detection in twitter. In ASONAM. 20–24.

[31] Nikolaos Panagiotou, Ioannis Katakis, and Dimitrios Gunopulos. 2016. Detecting events in online social networks: Definitions, trends and challenges. In Solving Large Scale Learning Tasks. Challenges and Algorithms. 42–84.

[32] George Papadakis, George Giannakopoulos, and Georgios Paliouras. 2016. Graph vs. bag representation models for the topic classification of web documents. WWW 19, 5 (2016), 887–920.

[33] Richard J. Radke, Srinivas Andra, Omar Al-Kofahi, and Badrinath Roysam. 2005. Image change detection algorithms: a systematic survey. IEEE Trans. Image Process. 14, 3 (2005), 294–307.

[34] Jagan Sankaranarayanan, Hanan Samet, Benjamin E Teitler, Michael D Lieberman, and Jon Sperling. 2009. Twitterstand: news in tweets. In SIGSPATIAL. 42–51.

[35] Thomas Schandl and Andreas Blumauer. 2010. PoolParty: SKOS Thesaurus Management Utilizing Linked Data. In ESWC. 421–425.

[36] Nestor Yague-Martinez, Francesco De Zan, and Pau Prats-Iraola. 2017. Coregistration of Interferometric Stacks of Sentinel-1 TOPS Data. IEEE Geosci. Remote Sensing 14, 7 (2017), 1002–1006.

[37] Yiming Yang, Tom Pierce, and Jaime Carbonell. 1998. A study of retrospective and on-line event detection. In SIGIR. 28–36.

Représentation sémantique de données géospaciales au service de l'analyse de changements

Thesis

Oct 2021

Jordane Dorne

La détection de changements à partir d’images satellitaires d’observation de la Terre est une tâche utile pour surveiller des évolutions naturelles ou liées à l’activité humaines, mais aussi l’impact d’événements ponctuels (incendies, inondations, etc.). L’utilisation d’apprentissage automatique pour détecter les zones de changements produit des résultats de plus en plus précis, sans toutefois fournir d’information sur la nature ou la cause de ces changements. Pour répondre à ce type de besoin, notre thèse vise à développer des solutions fondées sur des vocabulaires formels et des ontologies, sur la représentation des connaissances, l'annotation et l'intégration sémantique de méta-données associées aux images satellitaires et plus généralement aux données géo-localisées. En effet, les techniques de sémantisation sont un moyen d’offrir une interprétation intelligente des données. Le cas d’étude retenu pour illustrer l’apport de cette approche est le suivi des changements à différents pas de temps et différentes échelles de restitution. L'objectif est d'enrichir les méta-données issues des flux d'images en leur associant des catégories conceptuelles qui leur donnent du sens, de les coupler à des pré-traitements qui répondent aux exigences spécifiques à l’étude des changements et de les intégrer à d'autres informations géographiques disponibles (données climatiques et météorologiques, démographiques, etc. suivant les besoins). La thèse vise donc à montrer l'apport des méta-données sémantiques pour la surveillance des changements mais aussi à évaluer le passage à l'échelle des techniques de sémantisation, notamment de la représentation des données dans les entrepôts sémantiques.Nous proposons un processus qui gère le cycle complet de génération et exploitation de graphes de connaissances à partir de rasters issus de la télédétection et de données issues de l’open data. Les caractéristiques innovantes de ce processus sont les suivantes :i. 
Un algorithme permettant l’identification automatique de régions d’intérêts (ROI) associées à des valeurs similaires d’un indicateur calculé à partir d’une image satellite, et assurant ainsi un découpage géographique précis comme référence pour l’intégration de données.ii. Une approche orientée sémantique pour la génération de graphes de connaissances depuis différentes sources.

Detecting Local Crisis Events: A Case Study on the Colorado Wildfires through Social Media and Satellite Imagery

Conference Paper

Full-text available

Apr 2024

Crisis Detection by Social and Remote Sensing Fusion: A Selective Attention Approach

Chapter

Full-text available

Sep 2023

Deep learning has revolutionized event detection in social media and remote sensing, allowing for more precise and efficient analysis of immense amounts of data to unveil hidden patterns and insights. Moreover, Merging both data sources has proven efficient despite the absence of multi-sensed datasets. In today’s data-driven globe, it is becoming increasingly critical to process and explore heterogeneous data and to design models handling such data. This paper proposes a new multi-sensed fusion approach that leverages satellite images and tweets as input. We combined two open datasets to obtain a multi-sensed dataset concerning the 2017 hurricane Harvey. We extracted features from satellite imagery using Resnet34 and generated embeddings from tweets using Bert. We fused the embeddings and the features using a selective attention module incorporating cross and self-attention. Our module can filter misleading features from weak modalities on a sample-by-sample basis. We demonstrate that our approach surpasses unimodal models based on tweets or satellite imagery. We compared our results to a few baselines associated with hurricane Harvey and proved that our model surpasses them in accuracy, precision, recall, and F1 measure.

Giving meaning to unsupervised EO change detection rasters: a semantic-driven approach

Conference Paper

Full-text available

Nov 2020

Suspicious Local Event Detection in Social Media and Remote Sensing: Towards a Geosocial Dataset Construction

Conference Paper

Full-text available

Oct 2020

Remote sensing is a powerful technology for earth observation. However, the spatial, spectral, and temporal resolution of the imagery are imposing various limits. Lately, with the rise of the internet and smart mobile devices, social media with location-based information has been rapidly emerging. These circumstances led to the prevailing of new scenarios where fine-grained details of social bookmarking websites are enhanced with the wide coverage of satellites. Social media and satellites are both valuable sources of data. An event-driven data, designating either normal common events or unusual suspicious ones that may threaten human lives or damage the infrastructure. In this paper, we provide an insight into the present state of knowledge to better address the task of local suspicious event detection and linking social media with satellite imagery. Also, to track suspicious local events, we treated the detection problem as a retrospective problem by training different classifiers on the crisisLexT26 dataset. Furthermore, we introduced how to use the available geo-locations in the dataset to construct a geo-social dataset by linking it with remote sensing and retrieving satellite imagery before and after the event occurrence.

Changing the Nature of Quantitative Biology Education: Data Science as a Driver

Article

Sep 2020
B MATH BIOL

We live in a data-rich world with rapidly growing databases with zettabytes of data. Innovation, computation, and technological advances have now tremendously accelerated the pace of discovery, providing driverless cars, robotic devices, expert healthcare systems, precision medicine, and automated discovery to mention a few. Even though the definition of the term data science continues to evolve, the sweeping impact it has already produced on society is undeniable. We are at a point when new discoveries through data science have enormous potential to advance progress but also to be used maliciously, with harmful ethical and social consequences. Perhaps nowhere is this more clearly exemplified than in the biological and medical sciences. The confluence of (1) machine learning, (2) mathematical modeling, (3) computation/simulation, and (4) big data have moved us from the sequencing of genomes to gene editing and individualized medicine; yet, unsettled policies regarding data privacy and ethical norms could potentially open doors for serious negative repercussions. The data science revolution has amplified the urgent need for a paradigm shift in undergraduate biology education. It has reaffirmed that data science education interacts and enhances mathematical education in advancing quantitative conceptual and skill development for the new generation of biologists. These connections encourage us to strive to cultivate a broadly skilled workforce of technologically savvy problem-solvers, skilled at handling the unique challenges pertaining to biological data, and capable of collaborating across various disciplines in the sciences, the humanities, and the social sciences. To accomplish this, we suggest development of open curricula that extend beyond the job certification rhetoric and combine data acumen with modeling, experimental, and computational methods through engaging projects, while also providing awareness and deep exploration of their societal implications. 
This process would benefit from embracing the pedagogy of experiential learning and involve students in open-ended explorations derived from authentic inquiries and ongoing research. On this foundation, we encourage development of flexible data science initiatives for the education of life science undergraduates within and across existing models.

Semantic City Planning Systems (SCPS): A Literature Review

Article

Full-text available

Jan 2022
J PLAN LIT

This review focuses on recent research literature on the use of Semantic Web Technologies (SWT) in city planning. The review foregrounds representational, evaluative, projective, and synthetical meta-practices as constituent practices of city planning. We structure our review around these four meta-practices that we consider fundamental to those processes. We find that significant research exists in all four metapractices. Linking across domains by combining various methods of semantic knowledge generation, processing, and management is necessary to bridge gaps between these meta-practices and will enable future Semantic City Planning Systems.

Semantic City Planning Systems (SCPS): A Literature Review

Preprint

Full-text available

Apr 2021

Pre-print as published on https://como.ceb.cam.ac.uk/preprints/270/.

Scaling and Semantically-Enriching Language-Agnostic Summarization

Chapter

Jan 2020

This chapter describes the evolution of a real, multi-document, multilingual news summarization methodology and application, named NewSum, the research problems behind it, as well as the steps taken to solve these problems. The system uses the representation of n-gram graphs to perform sentence selection and redundancy removal towards summary generation. In addition, it tackles problems related to topic and subtopic detection (via clustering), demonstrates multi-lingual applicability, and—through recent advances—scalability to big data. Furthermore, recent developments over the algorithm allow it to utilize semantic information to better identify and outline events, so as to offer an overall improvement over the base approach.

Semantic Wide and Deep Learning for Detecting Crisis-Information Categories on Social Media

Conference Paper

Full-text available

Oct 2017

When crises hit, many flog to social media to share or consume information related to the event. Social media posts during crises tend to provide valuable reports on affected people, donation offers, help requests, advice provision, etc. Automatically identifying the category of information (e.g., reports on affected individuals, donations and volunteers) contained in these posts is vital for their efficient handling and consumption by effected communities and concerned organisations. In this paper, we introduce Sem-CNN; a wide and deep Convolutional Neural Network (CNN) model designed for identifying the category of information contained in crisis-related social media content. Unlike previous models, which mainly rely on the lexical representations of words in the text, the proposed model integrates an additional layer of semantics that represents the named entities in the text, into a wide and deep CNN network. Results show that the Sem-CNN model consistently outperforms the baselines which consist of statistical and non-semantic deep learning models.

Land-Cover Change Detection Using Local Feature Descriptors Extracted From Spectral Indices

Conference Paper

Full-text available

Jul 2017

An effective monitoring and analysis of ecosystems requires developing new tools and knowledge. In this paper, we propose an approach for detecting land-cover changes using satellite Image Time Series. This approach represents each image by spectral indices and then extracts local features of these representations. Next, a clustering technique (e.g., k-means) is applied to the extracted features, where the resulting clusters are assumed to refer to land-cover classes. The land-cover change is then obtained by counting the number of times an assigned class to each point changes along the time series. For our experiments, we use a collection of Landsat-5 images captured every second month from October 2009 to August 2010 over the protected area of the Donana National Park in southwestern Spain, which is the largest sanctuary for migratory birds in western Europe. Results demonstrate that the proposed approach can detect the occurring changes in the main land-cover categories along the assessed time series

Forest Change Detection in Incomplete Satellite Images With Deep Neural Networks

Article

Full-text available

Jun 2017

Land cover change monitoring is an important task from the perspective of regional resource monitoring, disaster management, land development, and environmental planning. In this paper, we analyze imagery data from remote sensing satellites to detect forest cover changes over a period of 29 years (1987-2015). Since the original data are severely incomplete and contaminated with artifacts, we first devise a spatiotemporal inpainting mechanism to recover the missing surface reflectance information. The spatial filling process makes use of the available data of the nearby temporal instances followed by a sparse encoding-based reconstruction. We formulate the change detection task as a region classification problem. We build a multiresolution profile (MRP) of the target area and generate a candidate set of bounding-box proposals that enclose potential change regions. In contrast to existing methods that use handcrafted features, we automatically learn region representations using a deep neural network in a data-driven fashion. Based on these highly discriminative representations, we determine forest changes and predict their onset and offset timings by labeling the candidate set of proposals. Our approach achieves the state-of-the-art average patch classification rate of 91.6% (an improvement of ~16%) and the mean onset/offset prediction error of 4.9 months (an error reduction of five months) compared with a strong baseline. We also qualitatively analyze the detected changes in the unlabeled image regions, which demonstrate that the proposed forest change detection approach is scalable to new regions.

Coregistration of Interferometric Stacks of Sentinel-1 TOPS Data

Article

May 2017

The coregistration of synthetic aperture radar images is of fundamental importance for the generation of interferograms. The high azimuth coregistration requirements imposed by the TOPS acquisition mode imply that an advanced approach for the coregistration of stacked time series images is needed due to temporal decorrelation effects. In some scenarios, the conventional approach of estimating the shifts pairwise with respect to the same master might prove insufficient. Therefore, a joint estimation is proposed here, which jointly exploits all interferograms in order to retrieve more accurate results. Simulated data and Sentinel-1A images acquired in IW mode are used to validate this procedure, demonstrating the better performance of the joint approach when compared to the standard single-master approach.
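One way to read the contrast between single-master and joint estimation is as a least-squares problem over all image pairs. The sketch below is an assumption for illustration, not the method of the cited article: it assumes each pair (i, j) yields a measured relative shift d_ij that approximates s_j - s_i, fixes the shift of image 0 to zero as the reference, and solves for the remaining shifts with ordinary least squares. The function name and the measurement format are hypothetical.

```python
import numpy as np

def joint_shift_estimate(n_images, pair_measurements):
    """Estimate per-image shifts from pairwise shift measurements.

    pair_measurements: iterable of (i, j, d_ij), where d_ij is assumed
                       to approximate s_j - s_i.
    Returns an array of n_images shifts with s_0 fixed to 0.
    """
    rows, rhs = [], []
    for i, j, d in pair_measurements:
        row = np.zeros(n_images)
        row[i] = -1.0
        row[j] = 1.0
        rows.append(row)
        rhs.append(d)
    # Drop column 0 so that image 0 acts as the zero-shift reference
    A = np.array(rows)[:, 1:]
    b = np.array(rhs)
    solution, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.concatenate([[0.0], solution])

# Toy example: three images, all three pairs measured
measurements = [(0, 1, 0.1), (0, 2, 0.3), (1, 2, 0.2)]
print(joint_shift_estimate(3, measurements))
```

With only the pairs that involve image 0, this reduces to the single-master case; adding the remaining pairs gives redundant measurements that the least-squares fit can average over.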

The BigDataEurope Platform – Supporting the Variety Dimension of Big Data

Conference Paper

Jun 2017
Lect Notes Comput Sci

The management and analysis of large-scale datasets, described with the term Big Data, involves the three classic dimensions of volume, velocity and variety. While the former two are well supported by a plethora of software components, the variety dimension is still rather neglected. We present the BDE platform, an easy-to-deploy, easy-to-use and adaptable (cluster-based and standalone) platform for the execution of big data components and tools like Hadoop, Spark, Flink, Flume and Cassandra. The BDE platform was designed based upon the requirements gathered from seven of the societal challenges put forward by the European Commission in the Horizon 2020 programme and targeted by the BigDataEurope pilots. As a result, the BDE platform allows users to perform a variety of Big Data flow tasks like message passing, storage, analysis or publishing. To facilitate the processing of heterogeneous data, a particular innovation of the platform is the Semantic Layer, which allows RDF data to be processed directly and arbitrary data to be mapped and transformed into RDF. The advantages of the BDE platform are demonstrated through seven pilots, each focusing on a major societal challenge.

GeoTriples: Transforming Geospatial Data into RDF Graphs Using R2RML and RML Mappings

Article

Jan 2018

Mangrove mapping and change detection using satellite imagery

Conference Paper

Jul 2017

Superpixel-Based Difference Representation Learning for Change Detection in Multispectral Remote Sensing Images

Article

Feb 2017

Detecting Events in Online Social Networks: Definitions, Trends and Challenges

Chapter

Jul 2016

Event detection is a research area that has attracted attention in recent years due to the widespread availability of social media data. The problem of event detection has been examined in multiple social media sources like Twitter, Flickr, YouTube and Facebook. The task comprises many challenges, including the processing of large volumes of data and high levels of noise. In this article, we present a wide range of event detection algorithms, architectures and evaluation methodologies. In addition, we extensively discuss available datasets, potential applications and open research issues. The main objective is to provide a compact representation of the recent developments in the field and aid the reader in understanding the main challenges tackled so far as well as identifying interesting future research directions.

NewSum: “N-Gram Graph”-Based Summarization in the Real World

Chapter

Jan 2014

This chapter describes a real, multi-document, multilingual news summarization application, named NewSum, the research problems behind it, as well as the novel methods proposed and tested to solve these problems. The system uses the representation of n-gram graphs in a novel manner to perform sentence selection and redundancy removal for the summaries. It also faces problems related to topic and subtopic detection (via clustering) and multilingual applicability, which are caused by the nature of real-world news summarization sources.
