
IoTSE-based open database vulnerability inspection in three Baltic countries: ShoBEVODSDT sees you

December 2021


DOI:10.1109/IOTSMS53705.2021.9704952

Conference: International Conference on Internet of Things: Systems, Management and Security
At: Spain (web-based)

Authors:

Anastasija Nikiforova

University of Tartu

Artjoms Daskevics

University of Latvia



This paper has been accepted for publishing in Proceedings of 8th International Conference on Internet of Things: Systems, Management and Security (IOTSMS2021). The final authenticated version is available online at https://doi.org/10.1109/IOTSMS53705.2021.9704952

Please cite this paper as:

Daskevics A. and Nikiforova A. "IoTSE-based open database vulnerability inspection in three Baltic countries: ShoBEVODSDT sees you," 2021 8th International Conference on Internet of Things: Systems, Management and Security (IOTSMS), 2021, pp. 1-8, doi: 10.1109/IOTSMS53705.2021.9704952.



Artjoms Daskevics

Faculty of Computing

University of Latvia

Riga, Latvia

artjoms.daskevics@gmail.com

Anastasija Nikiforova

Faculty of Computing, Innovation laboratory

University of Latvia

Riga, Latvia

anastasija.nikiforova@lu.lv, ORCID: 0000-0002-0532-3488

Abstract—This study aims to analyze the state of security of open databases, i.e., databases accessible from outside the organization, representing both relational and NoSQL databases, in three Baltic countries: Latvia, Lithuania and Estonia. This is done using a previously proposed tool for the non-intrusive detection of vulnerable data sources called ShoBEVODSDT (Shodan- and Binary Edge-based vulnerable open data sources detection tool). ShoBEVODSDT is based on the use of Internet of Things Search Engines (IoTSE). It is suitable for this study since it conducts a passive assessment, which means that its use does not harm the databases but rather checks for potentially existing bottlenecks or weaknesses that could be exposed if an attack took place. It allows both a comprehensive analysis of all unprotected data sources falling into the list of predefined data sources (MySQL, PostgreSQL, MongoDB, Redis, Elasticsearch, CouchDB, Cassandra and Memcached) and the definition of an IP range to examine what can be seen about a data source from outside the organization. Although some data sources can be described as following the security-by-design principle, others face serious challenges in this respect. The study carries out a cross-country comparative analysis of 8 data sources. We inspect both (1) the most vulnerable data sources and (2) the countries characterized by the highest number of open data sources and the highest degree of “value” of the data available to external actors.

Keywords— Internet of Things Search Engine (IoTSE), Shodan, BinaryEdge, Internet of Things (IoT), database, NoSQL, vulnerability

I. INTRODUCTION

Nowadays, billions of interconnected devices form the Internet of Things (IoT) ecosystem. As the number of devices and systems in use increases, so does the risk of security breaches [1-2]. One of these risks is posed by open data sources, i.e., open databases. By this we do not mean databases that are deliberately open to others, but databases that are not properly protected and are therefore available and accessible to external actors outside the organization. Surprising as it may sound, the number of such databases is enormous. In many cases this is caused by misconfiguration, for which the database holders are responsible. In other cases, there are vulnerabilities in the products and services themselves, where additional security mechanisms are needed on top of proper configuration. But how can one find out whether a database is visible, or even accessible, from outside the organization? What information (if any) can be gathered from it? Are stronger security mechanisms needed? Is the vulnerability related to the internal configuration or to the database in use?

Although some of these questions may be partly answered by referring to Common Vulnerabilities and Exposures (CVE) Details and other data sources summarizing vulnerabilities and patches for different services, this information may be too general. Therefore, testing, and more precisely penetration testing, could provide insight into the current state of a specific artifact, i.e., a specific system, a set of systems, or a region. In our previous study [3], we presented a tool for the non-intrusive detection of vulnerable data sources called ShoBEVODSDT (Shodan- and Binary Edge-based vulnerable open data sources detection tool). In this study, we apply ShoBEVODSDT to three countries of the Baltic region, namely Latvia, Lithuania and Estonia, to carry out an extensive investigation of the current state of data sources and their security in a country context.

The aim of this study is threefold: (1) to validate the tool in real-life circumstances, thus following up on the previous study, (2) to identify similarities and differences in the patterns of the three Baltic countries (Latvia, Lithuania and Estonia), i.e., whether the technological development of Estonia is also reflected in this respect, and (3) to draw more objective conclusions on the relationship between the vulnerability of open data sources and the specific data source in use, i.e., to detect data sources that are less “protected by design”.

Thus, the following research questions (RQ) are posed:

(RQ1.1) Which of the eight analyzed data sources is the most likely to be an open database?
(RQ1.2) Which data source is the most likely to be vulnerable?
(RQ2.1) Which country has the most open data sources?
(RQ2.2) Which country has the most vulnerable open data sources?

The paper is structured as follows: background and related studies (Section 2), methodology (Section 3), results of the analysis (Section 4), discussion and limitations (Section 5) and conclusions (Section 6).

II. BACKGROUND

Today, security, and database security in particular, is topical for at least two reasons. First, databases are part of every system, and systems have only become more widespread with the advent of the Internet of Things and its integration into our daily lives. Second, the popularity of NoSQL databases and their relatively weak security have drawn significant attention to this topic. The main security concern is that most NoSQL databases, despite their benefits and advantages for users, are less likely to provide security measures, sometimes even very basic ones such as authentication and authorization [1, 4]. This also applies to data encryption. Perhaps the most notorious database in this respect is MongoDB: in 2018, 54 000 MongoDB databases were accessible on the Internet, which resulted in the leakage of data of 2.4 million patients of a telemedicine vendor [5]. Although there have been improvements in this respect in recent years, it remains a problem. However, while the vulnerability of NoSQL databases is widely debated, this does not mean that SQL databases are secure or that their holders do not risk data leaks.

According to the lists of major security breaches in 2020 drawn up by Bekker [5] and Identity Force, a large number of data leaks occur due to unsecured databases. For example:

• Estee Lauder – 440 million customer records;
• Whisper – 900 million user records;
• Key Ring digital wallet – 14 million user records;
• Prestige Software hotel reservation platform – over 10 million hotel guests, including those of services such as Expedia, Hotels.com, Booking.com, Agoda, etc.;
• Paay card payments database – 2.5 million card transactions;
• Slickwraps – 850 000 customer records;
• an unnamed U.K.-based security firm, which had gathered data belonging to Adobe, Twitter, Tumblr, LinkedIn, etc. and their users, with a total of over 5 billion records;
• Marijuana Dispensaries – 85 000 medical marijuana patient and recreational user records, etc.

This section briefly covers existing studies on this topic, which can typically be divided into (1) registries that allow identifying the level of security or vulnerability of the service (more precisely, the database) in use, and (2) approaches to test the current state of the service used in a particular system.

As for registries that can be used to identify the weakest areas of a service, Common Vulnerabilities and Exposures (CVE) Details (https://www.cvedetails.com/) is probably the most popular index, covering a variety of services. CVE Details collects and provides to every stakeholder an index of registered vulnerabilities of various services, including databases, dividing vulnerabilities into 13 categories: Denial of Service, Code Execution, Overflow, Memory Corruption, SQL Injection, XSS, Directory Traversal, HTTP Response Splitting, Bypass Something, Gain Information, Gain Privileges, CSRF and File Inclusion, where the “Gain Information” category is the closest to the aspect we inspect.

Another registry is VulDB, i.e., the vulnerability database, documenting and explaining security vulnerabilities, threats and exploits for more than 50 years. Like CVE Details, it covers more than databases. However, the number of databases it covers is limited: databases such as Memcached and ElasticSearch, which have been characterized by a high number of vulnerabilities and leaks in recent years (also in line with [5]), as well as some other databases covered by our study, are not present. This registry can therefore be used as a complementary source, but in many cases it will not be applicable.

Equally popular is the NVD (National Vulnerability Database), the U.S. government repository that includes databases of security checklist references, security-related software flaws, misconfigurations, product names and impact metrics.

Although these sources are indisputably valuable, they are rather static and general, i.e., they provide general information on the vulnerabilities of the databases used by specific organizations / systems. This alone is clearly not enough, since it does not provide insight into what can be seen from outside the organization. Approaches that test the current state of the service in use can help here.

According to Bada et al. [1], testing tools such as security or vulnerability scanners, which present the threats and risks found, are an essential part of the vulnerability assessment process. They typically allow defining, identifying and classifying security holes. According to CERN [7], vulnerability scanners are divided by the type of tests executed into intrusive and non-intrusive tests. An intrusive test tries to expose the vulnerability, which can crash the remote target. A non-intrusive test attempts not to cause any harm to the target system; such tests usually check the remote service version or whether the service is configured insecurely. This concept is close to the central object of this study, i.e., the identification of open / unprotected databases. Intrusive tests are indisputably more accurate, but they cannot be carried out legally in a production environment. Although a non-intrusive test cannot determine for sure whether a service is vulnerable, it points to the possibility that it is, which is important and valuable.
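To illustrate what “non-intrusive” can mean in practice, a version check can be reduced to inspecting metadata that a search engine has already indexed, without sending anything to the target. The sketch below assumes a hypothetical `BannerRecord` with `product` and `version` fields and hypothetical minimum versions per product; it is not part of ShoBEVODSDT, and, like any non-intrusive test, a positive result only points to a possible vulnerability.

```python
from dataclasses import dataclass

# Hypothetical minimum "acceptable" versions, for illustration only;
# real thresholds would have to come from a registry such as CVE Details or NVD.
MIN_VERSION = {
    "MySQL": (8, 0, 0),
    "Redis": (6, 0, 0),
}


@dataclass
class BannerRecord:
    """Hypothetical metadata record already indexed by an IoT search engine."""
    ip: str
    product: str
    version: str


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '5.7.33-log' into (5, 7, 33), ignoring non-numeric suffixes."""
    parts = []
    for chunk in text.split("."):
        digits = ""
        for ch in chunk:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def possibly_vulnerable(record: BannerRecord) -> bool:
    """Compare the advertised version with a threshold.

    Nothing is sent to record.ip; True only means the host *may* be vulnerable.
    """
    minimum = MIN_VERSION.get(record.product)
    if minimum is None:
        return False  # no rule defined for this product
    return parse_version(record.version) < minimum
```

For example, a record advertising MySQL `5.7.33-log` is flagged, while `8.0.26` is not; confirming either case would still require an (intrusive) test that is out of scope here.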

This is where the concepts of Open Source Intelligence (OSINT) and Internet of Things Search Engines (IoTSE) come into play. IoTSE search for and index publicly available and accessible IoT devices, thereby showing how publicly available and accessible specific devices are [8]. OSINT is defined as a concept describing the search, collection, analysis and use of information from open sources, and the methods and tools used [9]. In more general terms, Williams et al. describe the activities of OSINT in four stages: (1) collection, (2) processing, (3) exploitation and (4) production [10], where processing and exploitation may take place in parallel rather than strictly sequentially. They describe these stages as acquiring or obtaining information, validating this information, determining its value, and providing the corresponding results to customers.

ShoBEVODSDT, presented in [3] and used in this study, follows this paradigm.

The popularity of both concepts is increasing in a variety of areas, including the detection of open databases and the assessment of their vulnerabilities and leaks [1, 11]. Such assessment can be carried out at different levels, i.e., (1) at the system level for a single organization, or (2) more comprehensively, when an overall insight into the state of the art is gained that is not limited to a particular organization.

While there are some IoTSE-based tools, such as LeakLooker, LeakLooker X and Lampyre, that allow automating information gathering, our previous study [3] found that they have a number of limitations, e.g., a limited list of databases, the inability to perform a country-based analysis, and a result format that is difficult to process. We have therefore proposed our own IoTSE-based tool called ShoBEVODSDT, or Shodan- and Binary Edge-based vulnerable open data sources detection tool. The next section, the primary aim of which is to present the methodology of the study, covers it.

III. METHODOLOGY

This section sets out the main concepts of the study methodology, covering the data sources to be analyzed, the main features of ShoBEVODSDT, and the classification of the results obtained.

A. Data Sources

This study is closely related to our previous study [3], in which we presented ShoBEVODSDT; consequently, the number and nature of the data sources we cover are the same. Since we designed ShoBEVODSDT to search for eight predefined open data sources (MySQL, PostgreSQL, MongoDB, Redis, Elasticsearch, CouchDB, Cassandra and Memcached), let us briefly cover them here.

First, these data sources represent both (1) relational SQL databases and (2) NoSQL databases. To be more precise, three types of sources are covered, namely (1) relational databases, (2) NoSQL databases and (3) data stores, to ensure a broader view of the state of play (for a more detailed classification of these data sources by type, see Table I). The list is based on three factors: (1) popularity, where the list of the most popular databases is taken from a survey of developers conducted in mid-2020 [12]; (2) the different types of data storage, where, apart from relational databases, NoSQL databases were selected to represent document-oriented, column-oriented and key-value databases; and (3) our own experience with these data storages, which is important because the specificities of many of them affect the entire testing process, so at least basic knowledge of and skills in working with them are beneficial.

Although the list of the most popular databases [12] contradicts other statistics, such as [13], where Oracle and MS SQL are dominant, the list we have used overlaps with it significantly and, moreover, it comes from developers. However, since not all of the most popular databases have been covered, we leave this as future work. ShoBEVODSDT is extensible and its source code is available (https://github.com/zhmyh/Open-Databases), i.e., the list of data sources to be analyzed may be extended, which allows anyone to adapt the scope of the tool to their needs if necessary.

B. ShoBEVODSDT or Shodan- and Binary Edge-based Vulnerable Open Data Sources Detection Tool

ShoBEVODSDT has already been presented in [3]; thus, we do not cover it in detail here. Instead, we briefly describe its main actions and the output that is further analyzed for the purpose of this study.

ShoBEVODSDT supports the detection of vulnerabilities at early, non-intrusive security assessment phases, which makes it possible to apply it both to one's own system and to the whole ecosystem of a specific country or region. In this paper, we address the second case and apply it to the Baltic region, represented by three countries: Latvia, Estonia and Lithuania.
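As an illustration of how a country-level search over the eight services might be expressed, the sketch below builds one search string per service and country using Shodan's `port:` and `country:` filters together with each service's default TCP port. This is an assumption about what such queries could look like, not ShoBEVODSDT's actual query set (the repository holds that); BinaryEdge queries are omitted.

```python
# Default TCP ports of the eight data sources covered by the study.
DEFAULT_PORTS = {
    "mysql": 3306,
    "postgresql": 5432,
    "mongodb": 27017,
    "redis": 6379,
    "elasticsearch": 9200,
    "couchdb": 5984,
    "cassandra": 9042,
    "memcached": 11211,
}

# ISO 3166-1 alpha-2 codes of the three Baltic countries.
BALTIC_COUNTRIES = {"Latvia": "LV", "Estonia": "EE", "Lithuania": "LT"}


def shodan_query(service: str, country_code: str) -> str:
    """Build a Shodan search string such as 'port:3306 country:LV'."""
    return f"port:{DEFAULT_PORTS[service]} country:{country_code}"


def all_queries() -> dict[tuple[str, str], str]:
    """One query per (service, country) pair: 8 services x 3 countries = 24."""
    return {
        (service, country): shodan_query(service, code)
        for service in DEFAULT_PORTS
        for country, code in BALTIC_COUNTRIES.items()
    }
```

Each string could then be passed to a search client, e.g. `shodan.Shodan(api_key).search(query)` in the official Shodan Python library. Matching on default ports is a simplification: it misses services exposed on non-standard ports, which product-based filters could partly compensate for.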

While many studies use only one IoT search engine, such as Shodan, which is considered a de facto OSINT tool [14-15], ShoBEVODSDT is based on two of them: Binary Edge and Shodan. Using two engines should improve the correctness and completeness of the results, help determine the potential attack surface more effectively, and support a targeted vulnerability assessment.
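Combining two engines means that the same host may be reported twice. A minimal sketch of such a merge step follows, assuming each engine's results have already been reduced to a list of IP address strings. The function names are hypothetical; only the `parsed/<service_name>_<country>.txt` layout mirrors the naming ShoBEVODSDT uses for its combined results.

```python
import ipaddress
from pathlib import Path


def merge_results(shodan_ips: list[str], binaryedge_ips: list[str]) -> list[str]:
    """Union of both engines' results with duplicates removed.

    Addresses are parsed (malformed entries raise ValueError) and sorted by
    IP version first, then numerically, so IPv4 and IPv6 can be mixed.
    """
    unique = {ipaddress.ip_address(ip.strip()) for ip in shodan_ips + binaryedge_ips}
    ordered = sorted(unique, key=lambda ip: (ip.version, int(ip)))
    return [str(ip) for ip in ordered]


def save_parsed(ips: list[str], service: str, country: str, root: Path) -> Path:
    """Write one IP per line to <root>/parsed/<service>_<country>.txt."""
    path = root / "parsed" / f"{service}_{country}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(f"{ip}\n" for ip in ips), encoding="utf-8")
    return path
```

Parsing the strings with `ipaddress` rather than comparing raw text avoids treating differently formatted spellings of the same address (e.g. IPv6 with or without zero compression) as distinct hosts.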

TABLE I. DATA SOURCES AND THEIR MODELS [BASED ON [13]]

Database | Primary database model | Secondary database model
MySQL | Relational DBMS | document store, spatial DBMS
PostgreSQL | Relational DBMS | document store, spatial DBMS
MongoDB | Document store | spatial DBMS, search engine
Redis | Key-value store | document store, graph DBMS, spatial DBMS, search engine, time series DBMS
Elasticsearch | Search engine | document store, spatial DBMS
CouchDB | Document store | spatial DBMS
Cassandra | Wide-column store | –
Memcached | Key-value store | –

Compared to an individual IoTSE, ShoBEVODSDT extends the list of features provided by Binary Edge and Shodan and allows a more categorized analysis of the data obtained. The latter aspect is essential to our study, since we intend to cover three countries and carry out their comparative analysis.

ShoBEVODSDT operation can be characterized as follows:

• ShoBEVODSDT searches for IP addresses of open data sources that belong to a user-defined country, using the filters available in Shodan and BinaryEdge. The results are combined by eliminating duplicates (if any) and saved in “parsed/<service_name>_<country>.txt”;
• when an open data source is found, ShoBEVODSDT gathers the available data from the relevant IP addresses and verifies whether it is possible to retrieve data from the system, which may be possible due to a weak level of security. ShoBEVODSDT checks the IP addresses found by classifying them by:
(a) service, i.e., MySQL, PostgreSQL, MongoDB, Redis, ElasticSearch, CouchDB, Cassandra and Memcached, and
(b) country, i.e., Latvia, Lithuania and Estonia.
By classification we mean sorting the results into matching folders, which are created when an item belonging to that classification is found. This is done by the “check” class method associated with the service. If the connection to the database has been successful, the IP address is stored in “good/<service_name>_<country>.txt”; otherwise, the IP address and the error information are stored in “bad/<service_name>_<country>.txt”. In total, up to 48 folders (good/bad × eight services × three countries) can be created if IP addresses associated with all eight services have been found for all three countries;
• ShoBEVODSDT retrieves data from the data sources to which it has managed to connect. This is done by searching the “parsed/good” folder for the files that correspond to the service and country to be checked; the process of downloading database content differs from service to service. It is predefined in ShoBEVODSDT for the eight services mentioned above; if another service is to be added, the source code of ShoBEVODSDT has to be modified. All information is stored in the “parsed” folder, in a file called “<IP_ADRESE>.txt”.
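The sorting of IP addresses into “good” and “bad” outputs described in the list above can be sketched as follows. Here `check` stands in for the per-service “check” class method: it is a hypothetical callable that returns normally on a successful connection and raises an exception otherwise, and the connection logic itself is deliberately left out. `root` could point at the `parsed` folder. The helper `possible_result_files` simply enumerates the 48 possible good/bad outputs (2 × 8 × 3).

```python
from itertools import product
from pathlib import Path
from typing import Callable

SERVICES = ["mysql", "postgresql", "mongodb", "redis",
            "elasticsearch", "couchdb", "cassandra", "memcached"]
COUNTRIES = ["Latvia", "Lithuania", "Estonia"]


def sort_by_connection(ips: list[str], service: str, country: str,
                       check: Callable[[str], None], root: Path) -> tuple[int, int]:
    """Append each IP to good/ or bad/<service>_<country>.txt under root.

    `check` is a hypothetical per-service probe that raises on failure.
    Returns (number of good IPs, number of bad IPs).
    """
    good = root / "good" / f"{service}_{country}.txt"
    bad = root / "bad" / f"{service}_{country}.txt"
    good.parent.mkdir(parents=True, exist_ok=True)
    bad.parent.mkdir(parents=True, exist_ok=True)
    n_good = n_bad = 0
    for ip in ips:
        try:
            check(ip)
        except Exception as err:  # store the error text next to the IP
            with bad.open("a", encoding="utf-8") as f:
                f.write(f"{ip}\t{err}\n")
            n_bad += 1
        else:
            with good.open("a", encoding="utf-8") as f:
                f.write(f"{ip}\n")
            n_good += 1
    return n_good, n_bad


def possible_result_files() -> list[str]:
    """All good/bad file names that could appear: 2 x 8 x 3 = 48."""
    return [f"{outcome}/{s}_{c}.txt"
            for outcome, s, c in product(["good", "bad"], SERVICES, COUNTRIES)]
```

Passing the probe in as a parameter keeps the sorting logic independent of any particular database driver, which matches the idea that adding a service mainly means adding its own check and download routine.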

Once the data are retrieved and classified, the next stage, the assessment of the data obtained, takes place to evaluate their “value”.

C. Classification of Data Obtained by ShoBEVODSDT

Although the data obtained by ShoBEVODSDT are automatically categorized by the service and the country to which the specific IP address belongs, the data gathered from open data sources should also be analyzed and classified, so that the sensitivity of the data and the risk they may pose to the organization can be analyzed and compared. That is why we have designed a very simple classification, where IP addresses are divided into (1) IP addresses to which ShoBEVODSDT has managed to connect and (2) IP addresses to which ShoBEVODSDT has failed to connect. We then consider only the first category and classify these IP addresses according to the “value” of the information that can be obtained from the corresponding data sources. The classification introduced is shown in Table II. As in [3], each category is assigned from 0 to 5 points: the higher the risk, the higher the number of points assigned to it.

TABLE II. IP ADDRESS CLASSIFICATION [3]

Category (points) | Description
0 | failed to connect
1 | has managed to connect but failed to gather data
2 | has managed to connect, but the database is empty
3 | has managed to connect by gathering system data or non-sensitive information
4 | has managed to connect and gather sensitive data
5 | compromised database
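The point scale of Table II can be encoded directly, for instance as an `IntEnum` so that categories compare and sort by risk. In the sketch below, the boolean flags passed to `classify` are hypothetical names for what a retrieval step might report. Checking “compromised” first is an assumption, made because a wiped database with a ransom note should outrank the other outcomes.

```python
from enum import IntEnum


class Category(IntEnum):
    """Table II categories; the value is the number of points (risk)."""
    FAILED_TO_CONNECT = 0
    CONNECTED_NO_DATA = 1
    CONNECTED_EMPTY_DB = 2
    SYSTEM_OR_NON_SENSITIVE = 3
    SENSITIVE_DATA = 4
    COMPROMISED = 5


def classify(connected: bool, data_retrieved: bool, empty: bool,
             sensitive: bool, compromised: bool) -> Category:
    """Map hypothetical probe flags to a Table II category."""
    if not connected:
        return Category.FAILED_TO_CONNECT
    if compromised:
        return Category.COMPROMISED
    if not data_retrieved:
        return Category.CONNECTED_NO_DATA
    if empty:
        return Category.CONNECTED_EMPTY_DB
    if sensitive:
        return Category.SENSITIVE_DATA
    return Category.SYSTEM_OR_NON_SENSITIVE
```

Because `IntEnum` members behave as integers, per-country or per-service scores can be summed or compared directly, e.g. `Category.SENSITIVE_DATA > Category.CONNECTED_EMPTY_DB`.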

The categories reflect the nature of the data that we gathered while testing our tool. Therefore, “the database is empty” is derived as a separate category: it is widespread, and it is more valuable than “has managed to connect but failed to gather data or information” but less valuable than “has managed to connect by gathering system data or non-sensitive information”. The data obtained may, however, contain sensitive information or database information. In addition, a database can be compromised. By this we mean databases in which all records have been deleted and a note has been left stating that all data have been backed up and that the database holders have to pay a ransom (in Bitcoin) to recover them. Otherwise, the fraudsters threaten to report a breach of the General Data Protection Regulation (GDPR), for which the database holder would be penalized, because the data were not protected and a leak took place.
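Compromised databases of this kind can often be recognized from what the attackers leave behind: the original collections, indices or keys are gone and only a note remains. A rough heuristic sketch follows. The marker keywords are an assumption made for illustration, not the rule used by ShoBEVODSDT, and any match would still need manual confirmation.

```python
import re

# Hypothetical markers typical of ransom notes left in wiped databases.
RANSOM_MARKERS = re.compile(
    r"read[_-]?me|warning|ransom|bitcoin|btc|recover|backup", re.IGNORECASE
)
PAYMENT = re.compile(r"bitcoin|btc", re.IGNORECASE)


def looks_compromised(names: list[str], sample_text: str = "") -> bool:
    """Flag a database whose only remaining objects look like a ransom note.

    `names` are collection/index/key names retrieved from the database and
    `sample_text` is content read from them. Every remaining name must match
    a marker AND the content must mention a payment, so an ordinary database
    that merely contains a 'backup' table is not flagged.
    """
    if not names:
        return False
    all_names_match = all(RANSOM_MARKERS.search(n) for n in names)
    return bool(all_names_match and PAYMENT.search(sample_text))
```

Requiring both conditions trades recall for precision: notes written without the word “Bitcoin” or “BTC” would be missed, which is preferable to labeling legitimate databases as compromised.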

IV. ANALYSIS AND RESULTS: SHOBEVODSDT IN USE

All the data provided in this section have been collected by ShoBEVODSDT. Although the source code of ShoBEVODSDT is publicly available, thereby supporting the principles of open science, the data gathered are not published, since they could potentially provide information usable in the attack phase of penetration testing. This is particularly risky when very sensitive information is detected, but we consider our solution a “white-hat hacking” tool.

In total, ShoBEVODSDT processed 15 180 IP addresses, the majority of which belong to Lithuania (7 453), followed by Estonia (5 352) and Latvia (2 375). ShoBEVODSDT failed to connect to 98.43% of the addresses; therefore, further actions were carried out with only 1.57%, or 238 IP addresses.
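The shares reported in this section can be reproduced from the per-country counts given in the country subsections (IP addresses found and IP addresses that were open). The short computation below is a consistency check on those figures, not part of the tool.

```python
# Per-country counts reported in this section: (IP addresses found, open).
COUNTS = {
    "Latvia": (2375, 50),
    "Estonia": (5352, 45),
    "Lithuania": (7453, 143),
}


def share(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100 * part / whole, 2)


total_found = sum(found for found, _ in COUNTS.values())
total_open = sum(opened for _, opened in COUNTS.values())

for country, (found, opened) in COUNTS.items():
    print(f"{country}: {opened}/{found} = {share(opened, found)}%")
print(f"Total: {total_open}/{total_found} = {share(total_open, total_found)}%")
```

Running it gives 2.11%, 0.84% and 1.92% for Latvia, Estonia and Lithuania, and 238/15 180 = 1.57% overall, i.e., 98.43% failed connections.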

In terms of data sources / databases, the most popular service (at least in the Baltic States) is MySQL, followed by PostgreSQL. The third most popular data source, however, varies from country to country and is covered in the following subsections. The least popular service is Cassandra. These trends hold for all three countries analyzed. This may be because MySQL is commonly used as a website database and various website deployment services offer a MySQL database free of charge when a server is rented, while Cassandra is designed to store big data across multiple servers. Fig. 1 presents statistics by service, showing the percentage of each connection status for every analyzed data source. Let us now turn to an overview of the results by country.

A. Latvia

ShoBEVODSDT found 2 375 IP addresses, of which 2 325 were protected, so ShoBEVODSDT failed to connect to them. The remaining 50 (2.11%) were open, which is significantly higher than the average of 1.57%.

[Figure: bar chart “Distribution of IP addresses by successful connection to them (by service)”, showing for MySQL, PostgreSQL, MongoDB, Redis, Memcached, ElasticSearch, CouchDB and Cassandra the percentage of connection failed (bad) vs. successful connection (good).]

Fig. 1. Distribution of network hosts with IP addresses attempted to connect (by service).

In terms of the total number of IP addresses found, Latvia's result is the lowest of the three countries; however, this cannot be extrapolated to the subsequent results, i.e., the number of unprotected and vulnerable data sources (50 IPs). As for the services in use, MySQL and PostgreSQL are the most popular, followed by Memcached, MongoDB and Redis.

Fig. 2 shows the general distribution of the successful connections that ShoBEVODSDT has managed to make: Memcached was the most common database among those to which a connection was successful, followed by ElasticSearch and MySQL. However, relative to the total number of databases found per service, the share of Cassandra and ElasticSearch databases to which ShoBEVODSDT managed to connect is the highest (all databases found were open), followed by Memcached (82.5%) and Redis (6.25%). There are also two services to which ShoBEVODSDT did not manage to connect at all, namely PostgreSQL and CouchDB.

As for the “value” of the data gathered by ShoBEVODSDT, Fig. 3 shows that Memcached has the highest number of cases in which the data gathered can be classified as system data (3 points) and even sensitive data (4 points). ShoBEVODSDT has identified 4 databases that have already been compromised. Although Memcached has the highest number of vulnerable databases, none of the Memcached databases has been compromised. The highest number of compromised databases was found for ElasticSearch (2 databases), followed by MySQL and MongoDB (1 database each). The most common case for the Latvian databases found is ShoBEVODSDT's ability to connect and gather system data or non-sensitive information (3 points).

[Figure: chart “Latvia: distribution of successful connections by service” (MySQL, PostgreSQL, MongoDB, Redis, Memcached, ElasticSearch, CouchDB, Cassandra).]

Fig. 2. Distribution of successful connections by service for Latvia.

[Figure: chart “Latvia: classification of IP addresses by service and gathered data 'value' (from 1 to 5 points)”; x-axis: data source (MySQL, PostgreSQL, MongoDB, Redis, Memcached, ElasticSearch, CouchDB, Cassandra); y-axis: number of data sources; legend: categories 1–5 of Table II.]

Fig. 3. Classification of IPs by service and gathered data “value” (Latvia).

B. Estonia

For Estonia, ShoBEVODSDT found 5 352 IP addresses, of which 5 307 were protected, so ShoBEVODSDT failed to connect to them. The remaining 45 IP addresses (0.84%) were open, and further actions were carried out with them.

Although the number of IP addresses found by ShoBEVODSDT is more than twice as high as for Latvia, the ratio of the number of IP addresses accessible from outside to the total number of detected IP addresses is about 2.5 times lower (0.84% vs. 2.11%).

As for the services in use, MySQL and PostgreSQL are the most popular, followed by ElasticSearch, MongoDB and Redis. While MongoDB and Redis were also among the most popular services in Latvia, ElasticSearch is new in this list, whereas Memcached, which holds third place in Latvia, is significantly less popular.

Fig. 4 shows the general distribution of the successful connections that ShoBEVODSDT has managed to make: ElasticSearch was the most common database among those to which a connection was successful, followed by MySQL and Memcached. Relative to the total number of databases found per service, the share of ElasticSearch databases to which ShoBEVODSDT managed to connect is also the highest (all databases found were open), followed by Memcached (87.5%) and MongoDB (15.8%).

As regards the “value” of the data gathered by ShoBEVODSDT, Fig. 5 shows that in the case of Estonia, MySQL, followed by ElasticSearch, has the highest number of cases in which the data gathered can be classified as system data (3 points). As regards sensitive data (4 points), ElasticSearch, followed by Memcached and Redis, leads this negative trend. Moreover, 8 databases have been classified as compromised: 4 ElasticSearch databases, 2 MongoDB databases, and 1 PostgreSQL and 1 Memcached database.

[Figure: chart “Estonia: distribution of successful connections by service” (MySQL, PostgreSQL, MongoDB, Redis, Memcached, ElasticSearch, CouchDB, Cassandra).]

Fig. 4. Distribution of successful connections by service for Estonia.

[Figure: chart “Estonia: classification of IP addresses by service and gathered data 'value' (from 1 to 5 points)” for MySQL, PostgreSQL, MongoDB, Redis, Memcached, ElasticSearch, CouchDB and Cassandra; legend: categories 1–5 of Table II.]

Fig. 5. Classification of IPs by service and gathered data “value” (Estonia).

Here we can see that, although the total number of compromised databases is not very high, ElasticSearch has the highest number of compromised databases, as was the case for Latvia.

The most common case for the Estonian databases found is the same as for Latvia: the ability to connect and gather system data or non-sensitive information (3 points).

C. Lithuania

In the case of Lithuania, ShoBEVODSDT found 7 453 IP addresses, of which 7 310 were protected. Thus, further actions were carried out with the 143 open IP addresses (1.92%), which is more than half of all IP addresses processed further. Although the number of IP addresses to which we managed to connect is the highest, the ratio is lower than in the Latvian case but still more than twice as high as in the case of Estonia.

As for the services in use, MySQL and PostgreSQL are the most popular, followed by ElasticSearch, MongoDB and Redis. Interestingly, Lithuania combines the results of both countries mentioned above and their most popular services.

Fig. 6 shows the overall distribution of successful connections made by ShoBEVODSDT: ElasticSearch was the most common database among those to which a connection succeeded, followed by Memcached and MongoDB. Relative to the total number of databases of each type found, ElasticSearch also has the highest share of successful connections (all ElasticSearch databases found were open), followed by Memcached (77.6%) and MongoDB (14.4%).

As regards the “value” of the data gathered, Fig. 7 shows that in Lithuania ElasticSearch, followed by Memcached, accounts for the highest number of data gatherings.

Fig. 6. Distribution of successful connections by service for Lithuania. (Pie chart “Lithuania: distribution of successful connections by service”; services: MySQL, PostgreSQL, MongoDB, Redis, Memcached, ElasticSearch, CouchDB, Cassandra.)

Fig. 7. Classification of IPs by service and gathered data “value” (Lithuania). (Chart “Lithuania: classification of IP addresses by service and gathered data “value” (from 1 to 5 points)” for MySQL, PostgreSQL, MongoDB, Redis, Memcached, ElasticSearch, CouchDB and Cassandra. Scale: 1 - has managed to connect but failed to gather data or information; 2 - has managed to connect, but the DB is empty; 3 - has managed to connect by gathering system data or non-sensitive information; 4 - has managed to connect and gather sensitive data; 5 - compromised database.)

Lithuania also has the highest number of compromised databases (5 points) among the three countries. They mainly belong to ElasticSearch and MongoDB (17 databases each), plus one Memcached database. This means that about ¼ of all open databases have been compromised.

The most common case is the same as for the countries above: the ability to connect and gather system data or non-sensitive information (3 points). In this case, however, it is a less pronounced leader and is followed by compromised databases (5 points) and empty databases (2 points), with slightly fewer databases from which sensitive data have been gathered (4 points).

A. Summary of Results in the Country-by-country Context

Among the databases to which ShoBEVODSDT was able to connect, the highest share of compromised databases belongs to Lithuania, where 24.5% of all databases were compromised. Surprisingly, it is followed by Estonia with 17.8% compromised databases, while in Latvia only 8% of the databases to which we connected were compromised.

However, this trend does not apply to the share of cases where sensitive data have been gathered: here the most negative trend is shown by Latvia (20%), followed by Lithuania (18.9%), with the best result for Estonia (13.3%).

As regards system and non-sensitive data, Estonia shows the most negative trend, with 46.7% of all databases falling into this category (3 points), followed by Latvia (44%) and Lithuania (35%).

Overall, the average “value” of the gathered data for the three countries is 3.22, i.e. closer to the critical level. The worst result is shown by Lithuania with 3.45 of 5 points, followed by Estonia with 3.18 and Latvia with 3.02 points. A summary of the analysis is provided in Table III, which gives, by country, the total number of IP addresses found, the successful connections and the distribution of the gathered data “value” (the most negative trends are highlighted in red).

TABLE III. SUMMARY OF RESULTS BY COUNTRY

Category | Latvia | Estonia | Lithuania
Total found | 2375 | 5352 | 7453
Connection successful | 50 (2.1%) | 43 (0.8%) | 143 (1.9%)
Compromised DB (5 points) | 4 (8%) | 8 (18.6%) | 35 (24.5%)
Sensitive data (4 points) | 10 (20%) | 21 (48.8%) | 27 (18.9%)
System or non-sensitive data (3 points) | 22 (44%) | 21 (48.8%) | 50 (35%)
DB is empty (2 points) | 11 (22%) | 7 (16.3%) | 29 (20.3%)
Failed to gather data (1 point) | 3 (6%) | 3 (7%) | 3 (2.1%)
AVG data “value” | 3.02 | 3.18 | 3.45
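The average “value” in Table III can be reproduced as a count-weighted mean of the 1-5 point scale. The minimal Python sketch below assumes that Latvia has 10 databases with sensitive data, i.e. the 20% reported in the text, which makes Latvia's row sum to its 50 successful connections. It also suggests that the overall figure of 3.22 is the unweighted mean of the three country averages, whereas weighting each country by its number of successful connections gives a somewhat higher value. All names are illustrative and are not taken from ShoBEVODSDT.

```python
from typing import Dict

# Latvia's distribution of "value" points (points -> number of databases).
# Assumption: 10 databases with sensitive data (20% of 50), as stated in the text.
LATVIA: Dict[int, int] = {5: 4, 4: 10, 3: 22, 2: 11, 1: 3}

# Per-country averages and successful connections as reported in Table III.
COUNTRY_AVG: Dict[str, float] = {"Latvia": 3.02, "Estonia": 3.18, "Lithuania": 3.45}
CONNECTED: Dict[str, int] = {"Latvia": 50, "Estonia": 43, "Lithuania": 143}


def average_value(distribution: Dict[int, int]) -> float:
    """Count-weighted mean of the 1-5 "value" scale."""
    total = sum(distribution.values())
    return sum(points * count for points, count in distribution.items()) / total


def unweighted_mean() -> float:
    """Plain mean of the three country averages."""
    return sum(COUNTRY_AVG.values()) / len(COUNTRY_AVG)


def weighted_mean() -> float:
    """Country averages weighted by their number of successful connections."""
    total = sum(CONNECTED.values())
    return sum(COUNTRY_AVG[c] * CONNECTED[c] for c in COUNTRY_AVG) / total


if __name__ == "__main__":
    print(f"Latvia: {average_value(LATVIA):.2f}")  # 151 / 50
    print(f"Unweighted mean of countries: {unweighted_mean():.2f}")
    print(f"Connection-weighted mean: {weighted_mean():.2f}")
```

With these inputs the sketch gives 3.02 for Latvia, 3.22 for the unweighted mean and about 3.31 for the connection-weighted mean, since Lithuania contributes 143 of the 236 open databases.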

Fig. 8. Sensitivity of gathered data by data source. (Stacked bar chart “Sensitivity of gathered data by service (1 to 5 points)” for MySQL, PostgreSQL, MongoDB, Redis, Memcached, ElasticSearch, CouchDB and Cassandra; horizontal axis from 0% to 100%. Scale: 1 - has managed to connect but failed to gather data or information; 2 - has managed to connect, but the DB is empty; 3 - has managed to connect by gathering system data or non-sensitive information; 4 - has managed to connect and gather sensitive data; 5 - compromised database.)

B. Results in the Context of Data Source

We have already found that Memcached and

ElasticSearch were the leading data sources to which

ShoBEVODSDT has managed to connect.

Let us now turn to a brief summary of the sensitivity of the data, because knowing whether a database can be connected to from outside the organization is not enough to draw conclusions about its security.

Fig. 8 provides statistics on the “value” of the data we have gathered, classified according to Table II. Without division by service, the most common category is “has managed to connect by gathering system data or non-sensitive information” (45%), followed by “has managed to connect, but the database is empty” (21%). This could be considered a positive result: while these data sources are visible to external actors, they are not of very high value to attackers, although they can facilitate attacks. However, 18% of these data sources contain data that could be used by attackers, and 12% of them have already been compromised [3].
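The 1-5 “value” scale of Table II can be expressed as an ordered enumeration plus a rule that selects the most severe applicable category. The Python sketch below is hypothetical: the Observation fields and the precedence order (compromised first) are assumptions made for illustration, not ShoBEVODSDT's actual logic.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List


class DataValue(IntEnum):
    """The 1-5 "value" scale of gathered data."""
    CONNECT_FAILED_TO_GATHER = 1  # connected, but failed to gather data or information
    EMPTY = 2                     # connected, but the DB is empty
    NON_SENSITIVE = 3             # connected and gathered system or non-sensitive data
    SENSITIVE = 4                 # connected and gathered sensitive data
    COMPROMISED = 5               # compromised database


@dataclass
class Observation:
    """Hypothetical summary of one open database after a successful connection."""
    gathered: bool      # could any data or information be retrieved?
    empty: bool         # did the database turn out to contain no data?
    sensitive: bool     # did the retrieved data include sensitive content?
    compromised: bool   # was the database found to be already compromised?


def classify(obs: Observation) -> DataValue:
    """Return the most severe applicable category (assumed precedence)."""
    if obs.compromised:
        return DataValue.COMPROMISED
    if not obs.gathered:
        return DataValue.CONNECT_FAILED_TO_GATHER
    if obs.empty:
        return DataValue.EMPTY
    if obs.sensitive:
        return DataValue.SENSITIVE
    return DataValue.NON_SENSITIVE


def shares(values: List[DataValue]) -> Dict[DataValue, float]:
    """Percentage of databases per category (values must be non-empty)."""
    counts = Counter(values)
    return {v: 100.0 * counts[v] / len(values) for v in DataValue}
```

Because IntEnum members compare as integers, the same objects can be averaged directly to obtain the mean “value” used in the tables.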

As regards compromised databases, it is known that in 2020 fraudsters attacked more than 22 000 MongoDB databases [16]; however, our experiment shows that MongoDB is not the only database type affected. Relative to the number of databases of each type to which ShoBEVODSDT managed to connect, the highest share of compromised databases belongs to Elasticsearch (27% of all Elasticsearch databases), followed by MongoDB (11% of all MongoDB databases); compromised databases were also found among PostgreSQL (0.08% of all PostgreSQL databases) and Memcached (2% of all Memcached databases).

For databases from which sensitive data have been gathered, the leader of this negative trend is Redis: for 83.3% of the open Redis databases to which ShoBEVODSDT managed to connect, it was possible to obtain sensitive data that could be used to mount an attack on them. MySQL and Memcached are also among the leaders in this respect.

To sum up:
• PostgreSQL is mainly characterized by compromised databases and databases from which non-sensitive or system data can be gathered;
• MongoDB is characterized by a high number of compromised databases (83.3%), followed by data sources from which sensitive data can be gathered (4.2%) and some data sources from which system and non-sensitive data can be gathered. This finding is also in line with [1, 5];
• Cassandra can be characterized as a data source to which we managed to connect, but the database was empty;
• Redis can be characterized as a data source from which sensitive data may be gathered; in some cases, the relevant databases are empty;
• Memcached can be characterized as a data source from which system and non-sensitive data were gathered most frequently (61.3%), followed by sensitive data (22.6%), empty databases (12.9%) and compromised databases (2.1%);
• MySQL is characterized by a prevalence of databases from which non-sensitive or system data can be gathered (52.6%), with 21.1% of databases from which sensitive data can be gathered and 5.3% compromised databases. However, MySQL is also a database for which data gathering failed even though ShoBEVODSDT managed to connect;
• ElasticSearch databases fall into all categories: the largest number of databases is empty, followed by a large number of already compromised databases (26.7%), which is also in line with [6]. In addition, 24.4% of ElasticSearch databases contain non-sensitive or system data and 8.1% contain sensitive data. It is also one of the two database types for which, although ShoBEVODSDT was able to connect, data gathering was unsuccessful.

A summary by service (excluding CouchDB, for which ShoBEVODSDT found no vulnerabilities) is provided in Table IV, which gives the total number of IP addresses found, the successful connections and the distribution of the gathered data “value” for categories “5”, “4” and “1” (the most and least vulnerable), with the most negative trends highlighted in red.

Overall, the average “value” of the gathered data for the eight services in question is 2.83. The worst result is shown by MongoDB with 4.5 of 5 points, followed by PostgreSQL with 3.7, and ElasticSearch and Memcached with 3.17 and 3.16 points, respectively.

TABLE IV. SUMMARY OF RESULTS BY SERVICE

Services: MySQL, PostgreSQL, MongoDB, Redis, Memcached, ElasticSearch, Cassandra
Total found: 1347; 1187; 177; 122; 116
Connection successful: 0.14; 0.3%; 7.9%; 9.8%; 80%; 100; 14%
Compromised DB (5 points): 5.3%; 33%; 71%; 2.2%; 27%
Sensitive data (4 points): 7.1%; 83%; 24%
Failed to gather data (1 point): 21%; 17%; 3.5%
AVG data “value”: 2.7; 3.67; 4.5; 3.5; 3.15; 3.17

These results can be explained not only by the database holders' awareness of data security, but also by the default security mechanisms of the respective data sources: data sources with weaker mechanisms are more likely to be vulnerable. Our examination of the data sources in question led us to conclude that Redis and Memcached offer no authentication out of the box, while MongoDB and ElasticSearch allow authentication to be enabled but do not enable it by default. In contrast, MySQL, CouchDB and Cassandra require authentication data and show better results when ShoBEVODSDT is used.

This observation suggests that, in many cases, even an approach as simple as proper authentication leads to a significant reduction in the risk of data leakage and intrusion.
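To illustrate how a database holder might check whether one of their own Redis instances requires authentication, the minimal Python sketch below sends an inline PING and classifies the first line of the reply. Unlike ShoBEVODSDT's passive assessment, this is an active check and should only be run against systems one is authorized to test. It assumes the reply prefixes used by recent Redis versions: +PONG when no password is required, -NOAUTH when a password is configured, and -DENIED when protected mode refuses a remote connection.

```python
import socket


def classify_redis_reply(reply: bytes) -> str:
    """Classify the first line of a Redis reply to an unauthenticated PING."""
    first_line = reply.split(b"\r\n", 1)[0]
    if first_line == b"+PONG":
        return "no-auth"          # commands accepted without authentication
    if first_line.startswith(b"-NOAUTH"):
        return "auth-required"    # a password is configured
    if first_line.startswith(b"-DENIED"):
        return "protected-mode"   # remote access refused because no password is set
    return "unknown"


def check_redis(host: str, port: int = 6379, timeout: float = 3.0) -> str:
    """Send an inline PING to your own Redis instance and classify the answer."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"PING\r\n")
        return classify_redis_reply(sock.recv(1024))
```

On a default local installation, check_redis("127.0.0.1") would typically report "no-auth", which is the situation the paragraph above warns against once such an instance becomes reachable from outside the organization.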

V. DISCUSSION AND LIMITATIONS

First, in this study we use our self-developed tool ShoBEVODSDT [3], which relies on passive assessment characterized by a low level of intrusiveness [17]. As a result, the data sources are not thoroughly tested to confirm that the identified vulnerabilities actually exist; the results only point to that possibility.
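Passive assessment means that the tool queries the index of an IoT search engine rather than contacting the databases themselves. The Python sketch below assumes, hypothetically, that each data source is identified by its default port and restricted with Shodan's port: and country: search filters; ShoBEVODSDT's real queries are described in [3] and may differ. Only the query construction is self-contained; the optional count function requires the third-party shodan package and an API key.

```python
from typing import Dict

# Default ports of the eight data sources (assumption: services use standard ports).
DEFAULT_PORTS: Dict[str, int] = {
    "MySQL": 3306,
    "PostgreSQL": 5432,
    "MongoDB": 27017,
    "Redis": 6379,
    "Memcached": 11211,
    "ElasticSearch": 9200,
    "CouchDB": 5984,
    "Cassandra": 9042,
}
COUNTRY_CODES: Dict[str, str] = {"Latvia": "LV", "Estonia": "EE", "Lithuania": "LT"}


def build_query(service: str, country: str) -> str:
    """Build a Shodan search query for one service in one country."""
    return f"port:{DEFAULT_PORTS[service]} country:{COUNTRY_CODES[country]}"


def count_indexed_hosts(api_key: str, service: str, country: str) -> int:
    """Ask Shodan how many indexed hosts match; the hosts themselves are not contacted."""
    import shodan  # third-party; imported here so build_query works without it

    api = shodan.Shodan(api_key)
    return api.count(build_query(service, country))["total"]


if __name__ == "__main__":
    for country in COUNTRY_CODES:
        for service in DEFAULT_PORTS:
            print(country, service, build_query(service, country))
```

Port-based queries are a simplification: services on non-standard ports are missed, and an open port alone says nothing about whether authentication is enforced, which is why the results point to possible rather than confirmed vulnerabilities.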

Secondly, the number of services inspected is limited and the number of databases per service is not balanced, which does not allow us to state with a high degree of confidence that a particular service is highly vulnerable while another is totally secure. Thus, although the results indicate that there are no vulnerabilities among open CouchDB databases, this cannot be generalized, because ShoBEVODSDT found only 14 CouchDB IP addresses, whereas for some other databases this number exceeds 1 000. To draw more generalizable conclusions about services, the sample should therefore be balanced. However, this was not the main aim of this study, which focused on examining the state of the art in the three countries.
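To make the sample-size argument concrete, the sketch below computes a 95% Wilson score interval for the share of vulnerable databases when none are observed; the choice of the Wilson interval is ours, not the authors'. With 0 vulnerable databases out of 14 CouchDB addresses the upper bound is roughly 21.5%, whereas 0 out of 1 000 would bound the share below about 0.4%.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion (z=1.96 -> ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    for n in (14, 1000):
        low, high = wilson_interval(0, n)
        print(f"0 vulnerable out of {n}: 95% CI {low:.3f} to {high:.3f}")
```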

In addition, in future work we plan to compare the results obtained with CVE Details, to verify whether there is a relationship between the registered “Gain Information” vulnerabilities and the data that we have managed to collect. A similar approach was applied by Genge and Enăchescu [15], and we expect it to yield more generalizable results for the services in question.

VI. CONCLUSIONS

More and more studies highlight the risks posed by IoT devices and stress the need for action to ensure the security of the IoT ecosystem at a wide range of levels [1-2, 18]. In this paper, we have applied the IoTSE-based tool ShoBEVODSDT, presented in our previous study [3], to inspect the state of play in three Baltic countries, namely Latvia, Estonia and Lithuania, with regard to unprotected open databases accessible from outside the organization and the “value” of the data that can be gathered from them in the case of a successful connection. We have inspected eight data sources for vulnerabilities and their extent. We conclude that, although open databases accessible from outside the organization make up less than 2% of the data sources scanned, some data sources may pose risks to organizations. Moreover, 12% of the open data sources have already been compromised.

We conclude that the weakest results are demonstrated by

Lithuania with 3.45 of 5 points, followed by Estonia with

3.18 and Latvia with 3.02 points. For the services under

question, the worst results are demonstrated by MongoDB,

followed by PostgreSQL, ElasticSearch and Memcached.

We argue that ShoBEVODSDT can be useful for (1) individual organizations, to determine whether their data sources are visible and even accessible from outside the organization; (2) testers, to effectively map the potential attack surface and advance targeted vulnerability assessments, followed by further inspection and the development of preventive activities and security mechanisms; (3) scientists and developers, to carry out comprehensive, multidimensional and longitudinal analyses of unprotected data sources; and (4) countries and their governments, to define guidelines and laws based on the country-level state of the art that would promote technological development and better protection.

REFERENCES

[1] M. Bada, I. Pete, “An exploration of the cybercrime ecosystem around Shodan,” in 2020 7th International Conference on Internet of Things: Systems, Management and Security (IOTSMS), pp. 1-8. IEEE, December 2020.

[2] M. Al-Ruithe, S. Mthunzi, E. Benkhelifa, “Data governance for security in IoT & cloud converged environments,” in 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA), pp. 1-8. IEEE, November 2016.

[3] A. Daskevics, A. Nikiforova, “ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detection tool or what Internet of Things Search Engines know about you,” 2021 (in print).

[4] E. Sahafizadeh, M. A. Nematbakhsh, “A survey on security issues in Big Data and NoSQL,” Advances in Computer Science: an International Journal, 4(4), 68-72, 2015.

[5] J. Davis, “Telemedicine vendor breaches the data of 2.4 million patients in Mexico,” 2018. [Online]. Available: https://www.healthcareitnews.com/news/telemedicine-vendor-breaches-data-24-million-patients-mexico

[6] E. Bekker, “2020 data breaches: The most significant breaches of the year,” IdentityForce, a Sontiq brand, 2020.

[7] B. Burns, D. Killion, N. Beauchesne, E. Moret, J. Sobrier, M. Lynn,... P. Guersch, “Security power tools,” O'Reilly Media, Inc., 2007.

[8] S. Samtani, M. Kantarcioglu, H. Chen, “Trailblazing the Artificial Intelligence for Cybersecurity Discipline: A Multi-Disciplinary Research Roadmap,” ACM Trans. Manage. Inf. Syst., 11(4), Article 17, December 2020, doi: 10.1145/3430360.

[9] J. R. G. Evangelista, R. J. Sassi, M. Romero, D. Napolitano, “Systematic literature review to investigate the application of open source intelligence (OSINT) with artificial intelligence,” Journal of Applied Security Research, 1-25, 2020.

[10] H. J. Williams, I. Blum, “Defining second generation open source intelligence (OSINT) for the defense enterprise,” RAND Corporation, Santa Monica, United States, 2018.

[11] A. Oganesyan, “How researchers discover MongoDB and Elasticsearch open databases,” DeviceLock Inc., 2019. [Online]. Available: https://m.devicelock.com/blog/how-researchers-discover-mongodb-and-elasticsearch-open-databases.html

[12] O. Valin, “Most popular databases in 2020 and new trends,” 2020. [Online]. Available: https://www.eversql.com/most-popular-databases-in-2020/, last accessed 22.06.2021.

[13] DB-Engines, 2021. [Online]. Available: https://db-engines.com/en/ranking, last accessed 22.06.2021.

[14] P. D. C. de Sousa Rodrigues, “An OSINT Approach to Automated Asset Discovery and Monitoring,” 2019.

[15] B. Genge, C. Enăchescu, “ShoVAT: Shodan-based vulnerability assessment tool for Internet-facing services,” Security and Communication Networks, 9(15), 2696-2714, 2016.

[16] A. Bizga, “Bad Actors Target MongoDB Databases, Threatening to Contact GDPR Legislators Unless Ransom is Paid.” [Online]. Available: https://securityboulevard.com/2020/07/bad-actors-target-mongodb-databases-threatening-to-contact-gdpr-legislators-unless-ransom-is-paid/, last accessed 11.07.2021.

[17] S. Samtani, S. Yu, H. Zhu, M. Patton, H. Chen, “Identifying SCADA vulnerabilities using passive and active vulnerability assessment techniques,” in 2016 IEEE Conference on Intelligence and Security Informatics (ISI), pp. 25-30. IEEE, September 2016.

[18] Y. Jararweh, M. Al-Ayyoub, E. Benkhelifa, M. Vouk, A. Rindos, “SDIoT: a software defined based internet of things framework,” Journal of Ambient Intelligence and Humanized Computing, 6(4), 453-461, 2015.
