
IoTSE-based open database vulnerability inspection in three Baltic countries: ShoBEVODSDT sees you

Abstract

This study analyzes the security of open databases, i.e. databases accessible from outside the organization, covering both relational and NoSQL databases in three Baltic countries: Latvia, Lithuania and Estonia. This is done using a previously proposed tool for non-intrusive detection of vulnerable data sources called ShoBEVODSDT (Shodan- and Binary Edge-based vulnerable open data sources detection tool). ShoBEVODSDT is based on Internet of Things Search Engines (IoTSE). It is suitable for this study because it conducts a passive assessment: its use does not harm the databases but rather checks for potential bottlenecks or weaknesses that could be exposed if an attack took place. It allows either a comprehensive analysis of all unprotected data sources falling into a list of predefined data sources (MySQL, PostgreSQL, MongoDB, Redis, Elasticsearch, CouchDB, Cassandra and Memcached) or the definition of an IP range to examine what can be seen about a data source from outside the organization. Although some data sources can be described as following the security-by-design principle, others face serious challenges in this respect. The study carries out a cross-country comparison of 8 data sources. We inspect both (1) the most vulnerable data sources and (2) the countries characterized by the highest number of open data sources and the highest "value" of data available to external actors.
This paper has been accepted for publishing in Proceedings of 8th International Conference on Internet of Things: Systems, Management
and Security (IOTSMS2021). The final authenticated version is available online at https://doi.org/10.1109/IOTSMS53705.2021.9704952
Please, cite this paper as:
Daskevics A. and Nikiforova A. "IoTSE-based open database vulnerability inspection in three Baltic countries:
ShoBEVODSDT sees you," 2021 8th International Conference on Internet of Things: Systems, Management and
Security (IOTSMS), 2021, pp. 1-8, doi: 10.1109/IOTSMS53705.2021.9704952.
Artjoms Daskevics
Faculty of Computing
University of Latvia
Riga, Latvia
artjoms.daskevics@gmail.com
Anastasija Nikiforova
Faculty of Computing, Innovation laboratory
University of Latvia
Riga, Latvia
anastasija.nikiforova@lu.lv, ORCID: 0000-0002-0532-3488
Keywords: Internet of Things Search Engine (IoTSE),
Shodan, BinaryEdge, Internet of Things (IoT), database, NoSQL,
vulnerability
I. INTRODUCTION
Nowadays, billions of interconnected devices form an
Internet of Things (IoT) ecosystem. With an increasing
number of devices and systems in use, the risk of security
breaches increases as well [1-2]. One of these risks is posed
by open data sources, i.e. open databases, by which we mean
not databases that are deliberately open to others but
databases that are not properly protected and are therefore
available and accessible to external actors outside the
organization. Surprising as it may sound, the number of such
databases is enormous. In many cases this is caused by
misconfiguration, where the responsibility falls on the
database holders; in other cases there are vulnerabilities in
the products and services, where additional security
mechanisms are needed apart from proper configuration. But
how can one find out whether a database is visible and even
accessible outside the organization? What information (if
any) may be gathered from it? Are stronger security
mechanisms needed? Is the vulnerability related to the
internal configuration or to the database in use?
Although some questions may be partly answered by
referring to Common Vulnerability and Exposure (CVE)
Details and other sources summarizing vulnerabilities
and patches for different services, this information may be
too general. Therefore, testing, and more precisely penetration
testing, can provide insight into the current state of a
specific artifact, i.e. a specific system or set of systems, or a
region. In our previous study [3] we presented a tool for
non-intrusive detection of vulnerable data sources called
ShoBEVODSDT (Shodan- and Binary Edge-based vulnerable
open data sources detection tool). This time, we apply
ShoBEVODSDT to three countries of the Baltic region, namely
Latvia, Lithuania and Estonia, to carry out an extensive
investigation of the current state of data sources and their
security in a country context.
The aim of this study is threefold: (1) to validate the tool
in real-life circumstances, thus continuing the previous study,
(2) to draw conclusions on similarities or differences in the
patterns of the three Baltic countries - Latvia, Lithuania and
Estonia - i.e. whether the technological development of
Estonia is also visible in this matter, and (3) to draw more
objective conclusions on which specific data sources are
more likely to be vulnerable when open, i.e. to detect data
sources that are less protected by design.
Thus, the following research questions (RQ) are posed:
(RQ1.1) Which data source is the most likely to be an open
database among the eight analyzed?
(RQ1.2) Which data source is the most likely to be
vulnerable?
(RQ2.1) Which country has the most open data sources?
(RQ2.2) Which country has the most vulnerable open data
sources?
The paper is structured as follows: a background and
related studies (Section 2), methodology (Section 3), results
of analysis (Section 4), discussion and limitations (Section 5)
and conclusions (Section 6).
II. BACKGROUND
Today, security, and database security in particular, is
topical for at least a few reasons. First, databases are part of
every system, and they have only become more widespread
with the rise of the Internet of Things and the integration of
this concept in our daily lives. Secondly, the popularity of
NoSQL databases and their relatively weak security have
significantly increased interest in this topic. The main security
concern is that most NoSQL databases, while offering a list of
benefits and advantages for users, are less likely to provide
security measures, sometimes even very primitive and
simple ones such as authentication and authorization [1, 4].
This also applies to data encryption. Perhaps the most
notorious database in this respect is MongoDB: in 2018
there were 54 000 databases accessible on the Internet,
which resulted in the leakage of data on 2.4 million patients
of a telemedicine vendor [5]. While there have been
improvements in this respect in recent years, it remains a
problem. However, while the vulnerability of NoSQL
databases is widely debated, this does not mean that SQL
databases are secure and that their holders do not risk their
data leaking.
According to a list drawn up by Bekker [5] and Identity
Force on major security breaches in 2020, a large number of
data leaks occur due to unsecured databases. For example:
- Estee Lauder: 440 million customer records;
- Whisper: 900 million user records;
- Key Ring digital wallet: 14 million user records;
- Prestige Software hotel reservation platform: over
  10 million hotel guests, including services such as
  Expedia, Hotels.com, Booking.com, Agoda etc.;
- Paay card payments database: 2.5 million card
  transactions;
- Slickwraps: 850 000 customer records;
- an unnamed U.K.-based security firm holding data
  belonging to Adobe, Twitter, Tumblr, LinkedIn etc. and
  their users, with a total of over 5 billion records;
- marijuana dispensaries: 85 000 medical marijuana
  patient and recreational user records, etc.
This section briefly covers existing studies on this
topic, which are typically divided into (1) registries that
allow identifying the level of security or vulnerability of the
service, more precisely the database, in use and (2) approaches
to testing the current state of the service used in a particular
system.
Among registries used to identify the weakest areas of
a service, Common Vulnerability and Exposure (CVE)
Details (https://www.cvedetails.com/) is probably the most
popular index, used for a variety of services. CVE Details
collects and provides to every stakeholder an index of
registered vulnerabilities of various services, including
databases, dividing vulnerabilities into 13 categories: Denial of
Service, Code Execution, Overflow, Memory Corruption,
SQL injection, XSS, Directory Traversal, HTTP response
splitting, Bypass something, Gain information, Gain
Privileges, CSRF and File Inclusion, where the “Gain
Information” category is close to the aspect we inspect.
Another registry is VulDB, i.e. the vulnerability database,
documenting and explaining security vulnerabilities, threats,
and exploits for more than 50 years. Like CVE Details, it
provides data not only on databases. However, the number of
databases it covers is limited: databases such as
Memcached and ElasticSearch, characterized by a high number
of vulnerabilities and leaks in recent years (also in line with
[5]), and some other databases covered by our study are not
represented. This registry can therefore be used as a
complementary source, but in many cases it will not be
applicable.
No less popular is the National Vulnerability Database
(NVD), the U.S. government repository that includes
databases of security checklist references, security-related
software flaws, misconfigurations, product names, and
impact metrics.
Although these sources are indisputably valuable, they
are rather static and general, i.e. they provide general
information on the vulnerabilities of the databases used by
specific organizations / systems. However, this alone is
not enough: it does not give insight into what can be
seen from outside the organization. Approaches that test the
current state of the service in use can help here.
According to Bada et al. [1], testing tools such as security
or vulnerability scanners, which present the threats and risks
found, are an essential part of the vulnerability assessment
process. They typically allow security holes to be defined,
identified, and classified. According to CERN [7], vulnerability
scanners are divided by the type of tests executed into
intrusive and non-intrusive. An intrusive test tries to expose
the vulnerability, which can crash the remote target. A
non-intrusive test attempts not to cause any harm to the target
system; it usually checks the remote service version, or
whether the service is configured insecurely. This concept is
close to the central object of this study, the identification of
open / unprotected databases. Intrusive tests are indisputably
more accurate, but they cannot be carried out legally in a
production environment. Although a non-intrusive test cannot
determine for sure whether a service is vulnerable, it points to
the possibility that it is vulnerable, which is important
and valuable.
This is where the concepts of Open Source Intelligence
(OSINT) and Internet of Things Search Engines (IoTSE) come
in. IoTSE search for and index publicly available and
accessible IoT devices, thereby making it possible to
understand how publicly available and accessible specific
devices are [8]. OSINT is defined as a concept describing the
search, collection, analysis and use of information from open
sources, together with the methods and tools used [9]. In more
general terms, Williams et al. describe the activities of OSINT
in four stages: (1) collection, (2) processing, (3) exploitation,
and (4) production [10], where processing and exploitation can
take place in parallel rather than strictly sequentially. They
describe these stages as acquiring or obtaining information,
validating this information, determining its value and
providing corresponding results to customers.
ShoBEVODSDT presented in [3] and used in this study
follows this paradigm.
The popularity of both concepts is increasing in a variety
of areas, including the detection of open databases and the
assessment of their vulnerabilities and leaks [1, 11], which
can be carried out at different levels, i.e. (1) at system level
for a single organization or (2) more comprehensively, when
an overall insight into the state of the art is gained without
being limited to a particular organization.
While there are some IoTSE-based tools such as
LeakLooker, LeakLooker X and Lampyre that automate
information gathering, our previous study [3] found
that they have a number of limitations, for instance, a limited
list of databases, the inability to perform a country-based
analysis, and a format of results that is difficult to process. We
have therefore proposed our own IoTSE-based tool called
ShoBEVODSDT, or Shodan- and Binary Edge-based
vulnerable open data sources detection tool. The next section,
whose primary aim is to present the methodology of the
study, covers it.
III. METHODOLOGY
The section sets out the main concepts related to the
study methodology, covering the data sources to be analyzed,
the main features of ShoBEVODSDT and introducing a
classification of the results obtained.
A. Data Sources
This study is closely related to our previous study [3],
in which we presented ShoBEVODSDT. This close link
means that the number and nature of the data sources
we cover are the same. Since we have designed
ShoBEVODSDT to search for eight predefined open data
sources - MySQL, PostgreSQL, MongoDB, Redis,
Elasticsearch, CouchDB, Cassandra and Memcached - let us
briefly cover them here.
First, these data sources represent both (1) relational SQL
databases and (2) NoSQL databases. To be more precise,
three types of sources, (1) relational databases, (2) NoSQL
databases and (3) data stores, are covered to ensure a broader
view of the state of play (for a more detailed classification
of these data sources by type, see Table I). This list is
based on three factors: (1) the most popular databases,
according to the results of a survey of developers conducted
in mid-2020 [12], (2) the different types of data storage,
where, apart from relational databases, NoSQL databases
were selected to represent the document-oriented,
column-oriented and key-value types, (3) our own experience
of working with these data storages, which is important
because the specificities of many of them affect the entire
testing process, so at least basic knowledge and skills in
working with them are beneficial.
Although the list of the most popular databases [12]
contradicts other statistics, such as [13], where Oracle and MS
SQL are dominant, the list we have used overlaps with them
significantly and, moreover, comes from developers.
However, given this limitation, i.e. that not all of the most
popular databases have been covered, we leave this as future
work. ShoBEVODSDT is scalable and its source code is
available (https://github.com/zhmyh/Open-Databases), i.e.
the list of data sources to be analyzed may be extended,
which allows anyone to extend the scope of the tool to their
needs.
B. ShoBEVODSDT or Shodan- and Binary Edge- based
Vulnerable Open Data Sources Detection Tool
ShoBEVODSDT has already been presented in [3]; thus,
we will not cover it in great detail. Instead, we will briefly
cover its main actions and the output to be further analyzed for
the purpose of this study.
ShoBEVODSDT supports the detection of vulnerabilities
at early, non-intrusive security assessment phases, which
makes it possible to apply it both to one's own system and to
the whole ecosystem of a specific country or region. In this
paper we refer to the second case and apply it to the Baltic
region, represented by three countries: Latvia, Estonia and
Lithuania.
While many studies use only one IoT search engine, such
as Shodan, which is considered a de facto OSINT tool [14-15],
ShoBEVODSDT is based on two of them, Binary Edge
and Shodan. This should contribute to the correctness and
completeness of the results, help to effectively determine the
potential attack surface, and contribute to a targeted
assessment of vulnerability.
TABLE I. DATA SOURCES AND THEIR MODELS [BASED ON [13]]
Database      | Primary database model | Secondary database model
--------------|------------------------|--------------------------------------------------------------------------
MySQL         | Relational DBMS        | document store, spatial DBMS
PostgreSQL    | Relational DBMS        | document store, spatial DBMS
MongoDB       | Document store         | spatial DBMS, search engine
Redis         | Key-value store        | document store, graph DBMS, spatial DBMS, search engine, time series DBMS
Elasticsearch | Search engine          | document store, spatial DBMS
CouchDB       | Document store         | spatial DBMS
Cassandra     | Wide-column store      | -
Memcached     | Key-value store        | -
Compared to the individual IoTSE, ShoBEVODSDT extends
the list of features provided by Binary Edge and Shodan and
allows for a more categorized analysis of the data obtained. The
latter aspect is essential to our study, since we intend to cover
three countries and carry out their comparative analysis.
ShoBEVODSDT operation can be characterized as
follows:
- ShoBEVODSDT searches for IP addresses of open
  data sources that belong to a user-defined country,
  using the available filters of Shodan and BinaryEdge.
  The results are combined, eliminating duplicates (if
  any), and saved in parsed/<service_name>_<country>.txt;
- when an open data source is found, ShoBEVODSDT
  gathers the available data from the relevant IP addresses
  and verifies whether data can be retrieved from the
  system, which may be possible due to a weak level of
  security. ShoBEVODSDT checks the IP addresses found,
  classifying them by:
  (a) service, i.e. MySQL, PostgreSQL,
      MongoDB, Redis, ElasticSearch,
      CouchDB, Cassandra and Memcached, and
  (b) country, i.e. Latvia, Lithuania and
      Estonia.
  By classification we mean sorting the results into
  matching folders, which are created when an item
  matching that classification is found. This is done by
  the “check” class method associated with the service.
  If the connection to the database is successful, the IP
  address is stored in “good/<service_name>_<country>.txt”;
  otherwise, the IP address and error information are
  stored in “bad/<service_name>_<country>.txt”. In
  total, up to 48 folders can be created if IP addresses
  associated with all eight services in all three
  countries are found;
- ShoBEVODSDT retrieves data from the data sources to
  which it has managed to connect. This is done by
  searching “parsed/good” for the file that corresponds
  to the service and country to be checked. The process
  of downloading database content differs from one
  service to another and is predefined in ShoBEVODSDT
  for the abovementioned eight services. If another
  service is to be added, the source code of
  ShoBEVODSDT has to be modified. All information is
  stored in the “parsed” folder in a file called
  “<IP_ADRESE>.txt”.
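The first two steps above (merging the results of the two search engines and filing each IP address under "good" or "bad") can be illustrated with a minimal sketch. It is not taken from the ShoBEVODSDT source code: the function names are hypothetical, the `check` argument stands in for the per-service "check" method, and the IP addresses in the usage note are documentation-range placeholders. No network access is performed.

```python
from itertools import product
from pathlib import PurePosixPath

# Services and countries as listed in the text.
SERVICES = ["MySQL", "PostgreSQL", "MongoDB", "Redis",
            "ElasticSearch", "CouchDB", "Cassandra", "Memcached"]
COUNTRIES = ["Latvia", "Lithuania", "Estonia"]


def merge_results(shodan_ips, binaryedge_ips):
    """Combine the two search-engine result lists, dropping duplicates."""
    return sorted(set(shodan_ips) | set(binaryedge_ips))


def result_file(connected, service, country):
    """Return the good/ or bad/ result file for one service and country."""
    folder = "good" if connected else "bad"
    return PurePosixPath(folder) / f"{service}_{country}.txt"


def sort_by_connection(ips, service, country, check):
    """Group IPs by result file; check(ip) -> bool is supplied by the caller
    (hypothetical stand-in for the per-service connection check)."""
    grouped = {}
    for ip in ips:
        target = result_file(check(ip), service, country)
        grouped.setdefault(target, []).append(ip)
    return grouped


# Every possible result location: 2 statuses x 8 services x 3 countries.
all_files = {
    result_file(ok, service, country)
    for ok, service, country in product([True, False], SERVICES, COUNTRIES)
}
```

For example, `merge_results(["192.0.2.1", "192.0.2.2"], ["192.0.2.2"])` keeps each address once, and `len(all_files)` reproduces the upper bound of 48 result locations mentioned above.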
When the data are retrieved and classified, the next stage,
the assessment of the data obtained, takes place to evaluate
the “value” of these data.
C. Classification of Data Obtained by ShoBEVODSDT
Although the data obtained by ShoBEVODSDT are
automatically categorized by the service and the country to
which the specific IP address belongs, the data gathered from
open data sources should also be analyzed and classified to
enable analysis and comparison of the sensitivity of the data
and the risk they may pose to the organization. That is why we
have designed a very simple classification, where IP addresses
are divided into (1) IP addresses to which ShoBEVODSDT has
managed to connect and (2) IP addresses to which
ShoBEVODSDT has failed to connect. We then refer only to the
first category and classify the IP addresses according to the
“value” of the information that can be obtained from these data
sources. The classification is given in Table II. As in
[3], each category is assigned from 0 to 5 points:
the higher the risk, the higher the number
of points assigned.
TABLE II. IP ADDRESS CLASSIFICATION [3]
Category | Description
---------|--------------------------------------------------------------------------
0        | failed to connect
1        | has managed to connect but failed to gather data
2        | has managed to connect, but the database is empty
3        | has managed to connect by gathering system data or non-sensitive information
4        | has managed to connect and gather sensitive data
5        | compromised database
The nature of the categories is explained by the nature of
the data that we gathered during the approbation of our
tool. Thus, “the database is empty” is singled out as a
separate category because it is widespread; in line with
Table II, it is rated above “has managed to connect but failed
to gather data” (1 point) but below “has managed to connect
by gathering system data or non-sensitive information”
(3 points). The data obtained, however, may contain sensitive
information or database information. In addition, the database
can be compromised, by which we mean a database where all
records have been deleted and a note has been left stating
that all data have a backup copy and that the database
holder has to pay a ransom (in Bitcoin) to recover the data;
otherwise, the fraudsters will report the breach
of the General Data Protection Regulation (GDPR) and the
database holder will be penalized because the data were not
protected and a data leakage took place.
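The point scale of Table II can be written down as a small enumeration, which makes the ordering of the categories explicit. The sketch below is illustrative only: the paper does not describe how a data source is judged empty, sensitive or compromised, so the boolean flags of `classify` are hypothetical inputs that would have to be supplied by whoever inspects the gathered data.

```python
from enum import IntEnum


class Category(IntEnum):
    """Point values of Table II; a higher value means a higher risk."""
    FAILED_TO_CONNECT = 0
    CONNECTED_NO_DATA = 1
    EMPTY_DATABASE = 2
    SYSTEM_OR_NON_SENSITIVE_DATA = 3
    SENSITIVE_DATA = 4
    COMPROMISED = 5


def classify(connected, readable=False, empty=False,
             sensitive=False, ransom_note=False):
    """Map hypothetical inspection flags to a Table II category.

    connected   - a connection to the data source succeeded
    readable    - the content of the data source could be retrieved
    empty       - the retrieved content contained no records
    sensitive   - the retrieved content was judged sensitive
    ransom_note - records were replaced by a ransom message
    """
    if not connected:
        return Category.FAILED_TO_CONNECT
    if ransom_note:
        return Category.COMPROMISED
    if not readable:
        return Category.CONNECTED_NO_DATA
    if empty:
        return Category.EMPTY_DATABASE
    if sensitive:
        return Category.SENSITIVE_DATA
    return Category.SYSTEM_OR_NON_SENSITIVE_DATA
```

Because `IntEnum` members compare as integers, the highest-risk category among several data sources can be found with `max(...)`.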
IV. ANALYSIS AND RESULTS: SHOBEVODSDT IN USE
All the data provided in this section have been collected
by ShoBEVODSDT. Although the source code of
ShoBEVODSDT is publicly available, thereby supporting the
principles of open science, the data gathered are not
published since they could potentially provide information
that can be used in the attack phase of penetration
testing. This is particularly risky when very sensitive
information is detected, and we consider our solution a
“white-hat hacking” tool.
In total, ShoBEVODSDT processed 15 180 IP addresses,
with the largest share belonging to Lithuania (7 453),
followed by Estonia (5 352) and Latvia (2 375). Connection to
98.43% of the addresses failed. Therefore, further actions took
place with only 1.57% of them, i.e. 238 IP addresses.
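These totals can be cross-checked against the per-country counts reported in the country subsections (IP addresses found and IP addresses that were protected). The short sketch below only redoes the arithmetic with the figures given in the paper; the variable names are illustrative.

```python
# Counts reported in the text: IP addresses found and protected per country.
found = {"Lithuania": 7453, "Estonia": 5352, "Latvia": 2375}
protected = {"Lithuania": 7310, "Estonia": 5307, "Latvia": 2325}

# Open addresses are those found but not protected.
open_ips = {country: found[country] - protected[country] for country in found}

total_found = sum(found.values())
total_open = sum(open_ips.values())

# Share of open addresses overall and per country, in percent.
overall_open_pct = round(100 * total_open / total_found, 2)
open_pct = {
    country: round(100 * open_ips[country] / found[country], 2)
    for country in found
}

print(total_found, total_open, overall_open_pct)
print(open_pct)
```

The per-country shares computed this way are the ones quoted in the subsections for Latvia, Estonia and Lithuania.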
In terms of data source / database, the most popular
service (at least in the Baltic States) is MySQL, followed by
PostgreSQL. The third most popular data source
varies from country to country and is covered in the
following subsections. The least popular service is Cassandra.
This trend holds for all three countries analyzed. This
may be because MySQL is intended as a website
database and various website deployment services offer
a MySQL database for free when renting a server, while
Cassandra is meant for storing Big Data across multiple
servers. Figure 1 shows statistics on services, with the
percentage of each connection status displayed per analyzed
data source. Let us now turn to an overview of the results by
country.
A. Latvia
ShoBEVODSDT has managed to find 2 375 IP addresses,
of which 2 325 were protected, i.e. ShoBEVODSDT failed to
connect to them. However, 50 IP addresses (2.11%) were open,
which is noticeably higher than the average across the three
countries (1.57%).
[Figure: bar chart "Distribution of IP addresses by successful connection to them (by service)"; bars for MySQL, PostgreSQL, MongoDB, Redis, Memcached, ElasticSearch, CouchDB and Cassandra on a 0-100% axis; series: connection failed (bad), successful connection (good); data labels as extracted, in order: 13452, 1184, 153, 110, 23, 14, 6, 19, 3, 24, 12, 93, 86, 1.]
Fig. 1. Distribution of network hosts with IP addresses attempted to
connect (by service).
In terms of the total number of IP addresses found, this
result is the lowest of the three countries, but this cannot be
extrapolated to the subsequent results, i.e. the number of
unprotected and vulnerable data sources (50 IPs). For services
in use, MySQL and PostgreSQL are the most popular, followed
by Memcached, MongoDB and Redis.
Fig. 2 shows the general distribution of successful
connections that ShoBEVODSDT has managed to make.
Memcached was the most common database among
those to which a connection was successful, followed by
ElasticSearch and MySQL. However, relative to the total
number of databases found per service, the share of Cassandra
and ElasticSearch databases to which ShoBEVODSDT has
managed to connect is the highest, as all databases found were
open, followed by Memcached with 82.5% of Memcached
databases and Redis with 6.25%. There are also two services
for which ShoBEVODSDT has not managed to make any
connection, namely PostgreSQL and CouchDB.
As regards the “value” of the data gathered by
ShoBEVODSDT, Fig. 3 shows that Memcached is characterized
by the highest number of cases where the data gathered
can be classified as system data (3 points) and even sensitive
data (4 points). ShoBEVODSDT has identified 4 databases
that have already been compromised. Although Memcached
is characterized by the highest number of vulnerable
databases, this type of vulnerability has not been identified
for any Memcached database. Instead, the highest number
of compromised databases was found for ElasticSearch (2
databases), with 1 more each for MySQL and MongoDB. The
most common case for the Latvian databases found is
ShoBEVODSDT's ability to connect and gather system data
or non-sensitive information (3 points).
[Figure: chart "Latvia: distribution of successful connections by service"; legend: MySQL, PostgreSQL, MongoDB, Redis, Memcached, ElasticSearch, CouchDB, Cassandra; slice labels as extracted: 8%, 2%, 2%, 66%, 20%, 2%.]
Fig. 2. Distribution of successful connections by service for Latvia.
[Figure: bar chart "Latvia: classification of IP addresses by service and gathered data "value" (from 1 to 5 points)"; x-axis: data source (MySQL, PostgreSQL, MongoDB, Redis, Memcached, ElasticSearch, CouchDB, Cassandra); y-axis: number of data sources; series: categories 1 to 5 of Table II.]
Fig. 3. Classification of IPs by service and gathered data “value” (Latvia).
B. Estonia
For Estonia, ShoBEVODSDT has managed to find 5 352
IP addresses, of which 5 307 were protected, i.e.
ShoBEVODSDT failed to connect to them. 45 IP addresses
(0.84%) were open, and further action took place with them.
Although the number of IP addresses found by
ShoBEVODSDT is more than twice as high as in the case of
Latvia, the ratio of the number of IP addresses to
which it is possible to connect from outside to the total
number of detected IP addresses is about 2.5 times
lower (0.84% versus 2.11%).
For services in use, MySQL and PostgreSQL are the most
popular, followed by ElasticSearch, MongoDB and Redis.
Although MongoDB and Redis were also among the most
popular services for Latvia, ElasticSearch is new here,
while Memcached, which holds third place in Latvia, is
significantly less popular.
Fig. 4 shows the general distribution of successful
connections that ShoBEVODSDT has managed to make.
ElasticSearch was the most common database among
those to which a connection was successful, followed by
MySQL and Memcached. However, relative to the total
number of databases found per service, the share of
ElasticSearch databases to which ShoBEVODSDT has managed
to connect is the highest, as all databases found were open,
followed by Memcached with 87.5% and MongoDB with 15.8%.
As regards the “value” of the data gathered by
ShoBEVODSDT, Fig. 5 shows that in the case of Estonia
MySQL, followed by ElasticSearch, is characterized by
the highest number of both compromised databases and cases
where the data gathered can be classified as system data (3
points). As regards sensitive data (4 points), ElasticSearch,
followed by Memcached and Redis, leads this
negative trend. Moreover, 8 databases have been classified as
compromised: 4 ElasticSearch databases, 2 MongoDB
databases, 1 PostgreSQL database and 1 Memcached database.
Fig. 4. Distribution of successful connections by service for Estonia.
[Figure: bar chart "Estonia: classification of IP addresses by service and gathered data "value" (from 1 to 5 points)"; x-axis: data source (MySQL, PostgreSQL, MongoDB, Redis, Memcached, ElasticSearch, CouchDB, Cassandra); series: categories 1 to 5 of Table II.]
Fig. 5. Classification of IPs by service and gathered data “value”
(Estonia).
Here we can see that although the total number of
compromised databases is not very high, ElasticSearch has
the highest number of compromised databases,
as was the case for Latvia.
The most common case for the Estonian databases found
is the same as for Latvia: the ability to connect and
gather system data or non-sensitive information (3
points).
C. Lithuania
In the case of Lithuania, ShoBEVODSDT has managed to
find 7 453 IP addresses, of which 7 310 were protected. Thus,
further actions took place with the 143 IP addresses (1.92%)
that were open. This is more than half of all the IP
addresses we process further. Although the number of IP
addresses to which we have managed to connect is the highest,
the ratio is lower than in the Latvian case but still more
than twice as high as in the case of Estonia.
For services in use, MySQL and PostgreSQL are the most
popular, followed by ElasticSearch, MongoDB and Redis.
An interesting point here is that Lithuania combines the
results of both abovementioned countries and their most
popular services.
Fig. 6 shows the general distribution of successful
connections that ShoBEVODSDT has managed to make.
ElasticSearch was the most common database among
those to which a connection was successful, followed by
Memcached and MongoDB. However, relative to the
total number of databases found per service, the share of
ElasticSearch databases to which ShoBEVODSDT has managed
to connect is the highest, as all databases were open, followed
by Memcached with 77.6% and MongoDB with 14.4%.
As regards the “value” of the data gathered, Fig. 7 shows
that in the case of Lithuania ElasticSearch, followed by
Memcached, is characterized by the highest number of
cases where data were gathered.
[Figure: chart "Lithuania: distribution of successful connections by service"; legend: MySQL, PostgreSQL, MongoDB, Redis, Memcached, ElasticSearch, CouchDB, Cassandra; slice labels as extracted: 3%, 1%, 14%, 7%, 36%, 38%.]
Fig. 6. Distribution of successful connections by service for Lithuania.
[Figure: bar chart "Lithuania: classification of IP addresses by service and gathered data "value" (from 1 to 5 points)"; x-axis: data source (MySQL, PostgreSQL, MongoDB, Redis, Memcached, ElasticSearch, CouchDB, Cassandra); series: categories 1 to 5 of Table II.]
Fig. 7. Classification of IPs by service and gathered data “value”
(Lithuania).
Here we also observe the highest number of
compromised databases (5 points), which mainly belong to
ElasticSearch and MongoDB (17 databases per database
type), plus one Memcached database. This means that
about ¼ of all open databases have been compromised.
The most common case is the same as for the countries
mentioned above, namely the ability to connect and gather
system data or non-sensitive information (3 points), but in
this case it is not such a pronounced leader. It is followed
by compromised databases (5 points) and empty databases
(2 points), with slightly fewer databases from which
sensitive data have been gathered (4 points).
A. Summary of Results in the Country-by-country Context
For the databases to which ShoBEVODSDT has been
able to connect, the highest ratio of compromised
databases belongs to Lithuania, where a total of 24.5%
of all databases were compromised. Somewhat surprisingly,
it is followed by Estonia with 17.8% compromised
databases, while for Latvia only 8% of all databases to which
we have connected were compromised.
However, this trend does not apply to the ratio of cases
where sensitive data have been gathered, as the most
negative trend is shown by Latvia (20%), followed by
Lithuania (18.9%), with the best results for Estonia (13.3%).
As regards the gathering of system and non-sensitive data,
Estonia demonstrates the most negative trend, with 46.7%
of all databases falling into this category (3 points), followed
by Latvia (44%) and Lithuania (35%).
Overall, the “value” of the gathered data for the three
countries is 3.22, i.e. closer to the critical level. The
worst results are demonstrated by Lithuania with 3.45 of 5
points, followed by Estonia with 3.18 and Latvia with 3.02
points. A summary of the analysis is provided in Table III,
which gives the total IP addresses found, the
successful connections and the distribution of the gathered data
“value” by country (the most negative trends are
highlighted in red).
TABLE III. SUMMARY OF RESULTS BY COUNTRY
| | Latvia | Estonia | Lithuania |
|---|---|---|---|
| Total found | 2375 | 5352 | 7453 |
| Connection successful | 50 (2.1%) | 43 (0.8%) | 143 (1.9%) |
| Compromised DB (5 points) | 4 (8%) | 8 (18.6%) | 35 (24.5%) |
| Sensitive data (4 points) | 22 (40%) | 21 (48.8%) | 27 (18.9%) |
| System or non-sensitive data (3 points) | 22 (44%) | 21 (48.8%) | 50 (35%) |
| DB is empty (2 points) | 11 (22%) | 7 (16.3%) | 29 (20.3%) |
| Failed to gather data (1 point) | 3 (6%) | 3 (7%) | 3 (2.1%) |
| AVG data “value” | 3.02 | 3.18 | 3.45 |
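The average data “value” can be read as a mean of the 1 to 5 point scores weighted by the number of databases at each level. The sketch below illustrates this for Latvia under stated assumptions: the 4-point count of 10 is an assumption inferred from the 20% share reported in the text, not a value read from the table, and the function name is hypothetical. The overall figure of 3.22 is consistent with an unweighted mean of the three country averages, which is also only an assumption about how it was obtained.

```python
# Number of databases per "value" level
# (5 = compromised database ... 1 = failed to gather data).
# ASSUMPTION: the 4-point count of 10 is inferred from the 20% share stated
# in the text; the remaining counts are the ones listed for Latvia.
LATVIA = {5: 4, 4: 10, 3: 22, 2: 11, 1: 3}


def average_value(counts: dict) -> float:
    """Mean score, weighting each point level by its number of databases."""
    total = sum(counts.values())
    return sum(points * n for points, n in counts.items()) / total


latvia_avg = round(average_value(LATVIA), 2)

# ASSUMPTION: the overall figure is the unweighted mean of the country averages.
COUNTRY_AVGS = [3.02, 3.18, 3.45]
overall = round(sum(COUNTRY_AVGS) / len(COUNTRY_AVGS), 2)

print(latvia_avg, overall)
```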
[Figure: stacked bar chart "Sensitivity of gathered data by service (1 to 5 points)". Services: MySQL, PostgreSQL, MongoDB, Redis, Memcached, ElasticSearch, CouchDB, Cassandra; axis from 0% to 100%. Legend: 1 - has managed to connect but failed to gather data or information; 2 - has managed to connect, but the DB is empty; 3 - has managed to connect by gathering system data or non-sensitive information; 4 - has managed to connect and gather sensitive data; 5 - compromised database.]
Fig. 8. Sensitivity of gathered data by data source.
B. Results in the Context of Data Source
We have already found that Memcached and
ElasticSearch were the leading data sources to which
ShoBEVODSDT has managed to connect.
Let us now turn to a brief summary of the sensitivity of
the data, because knowing whether it is possible to
connect to a database from outside the organization is not
enough to draw conclusions about its security.
Fig. 8 provides statistics on the “value” of the data we have
gathered, classified according to Table II. Overall, without
division by service, the most popular category is
“has managed to connect by gathering system data or
non-sensitive information” (45%), followed by “has managed to
connect, but the database is empty” (21%). This could be
considered a positive result, i.e. while these data sources
are visible to external actors, they are not of very high value
to attackers, although they can facilitate attacks. However,
18% of these data sources contain data that could be used by
attackers, and 12% of them have already been compromised
[3].
As regards compromised databases, it is known that in 2020
fraudsters attacked more than 22 000 MongoDB databases
[16]; however, our experiment shows that MongoDB is not
the only database that was compromised. Relating the
number of compromised databases to the total number of
databases of a particular type to which ShoBEVODSDT has
managed to connect, the most compromised databases
belong to Elasticsearch (27% of all
Elasticsearch), followed by MongoDB (11% of all
MongoDB), PostgreSQL (0.08% of all PostgreSQL) and
Memcached (2% of all Memcached databases are
compromised).
For databases from which sensitive data have been
gathered, the leader of this negative trend is Redis: for
83.3% of the open databases to which ShoBEVODSDT has
managed to connect, it was possible to gain sensitive data
that can be used to mount an attack on them. MySQL and
Memcached are also among the leaders in this respect.
To sum up:
• PostgreSQL is mainly characterized by compromised databases and databases from which non-sensitive data or system data can be gathered;
• MongoDB is characterized by a high number of cases where databases are compromised (83.3%), followed by data sources from which sensitive data can be gathered (4.2%) and some data sources from which system and non-sensitive data can be gathered. This finding is also in line with [1, 5];
• Cassandra in this case can be characterized as a data source to which we have managed to connect, but the database was empty;
• Redis can be characterized as a data source from which sensitive data may be gathered. In some cases, the relevant databases are empty;
• Memcached can be characterized as a data source where system data and non-sensitive data were gathered most frequently (61.3%), followed by sensitive data gatherings (22.6%), empty databases (12.9%) and 2.1% compromised databases;
• MySQL is characterized by a prevailing number of databases from which non-sensitive or system data can be gathered (52.6%), with 21.1% of databases from which sensitive data can be gathered and 5.3% compromised databases. However, MySQL has also proved to be a database where, although ShoBEVODSDT has managed to connect to the database, data gathering has failed;
• ElasticSearch databases represent all categories: the largest number of databases is empty, followed by a large number of databases that are already compromised (26.7%), which is also in line with [6]. In addition, 24.4% of ElasticSearch databases contain non-sensitive or system data, and 8.1% contain sensitive data. However, it remains one of the two database types where, although ShoBEVODSDT was able to connect, data gathering was unsuccessful.
A summary by service (excluding CouchDB, for which
ShoBEVODSDT has found no vulnerabilities) is provided in
Table IV, which gives the total IP addresses
found, the successful connections and the distribution of the
gathered data “value” for categories “5”, “4” and “1” (most
and least vulnerable), highlighting the most negative trends
in red.
Overall, the average “value” of the gathered data for the eight
services in question is 2.83, where the worst results are
demonstrated by MongoDB with 4.5 of 5 points, followed by
PostgreSQL with 3.7 and ElasticSearch and Memcached with
3.17 and 3.16 points, respectively.
TABLE IV. SUMMARY OF RESULTS BY SERVICE
| | MySQL | PostgreSQL | MongoDB | Redis | Memcached | ElasticSearch | Cassandra |
|---|---|---|---|---|---|---|---|
| Total found | 13471 | 1187 | 177 | 122 | 116 | 86 | 7 |
| Connection successful | 0.14% | 0.3% | 7.9% | 9.8% | 80% | 100% | 14% |
| Compromised DB (5 points) | 5.3% | 33% | 71% | 0 | 2.2% | 27% | 0 |
| Sensitive data (4 points) | 0 | 0 | 7.1% | 83% | 24% | 8% | 0 |
| Failed to gather data (1 point) | 21% | 0 | 0 | 17% | 0 | 3.5% | 0 |
| AVG data “value” | 2.7 | 3.67 | 4.5 | 3.5 | 3.15 | 3.17 | 2 |
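Because Table IV reports successful connections only as percentages, it can help to translate them back into approximate absolute numbers before comparing services. The following sketch does so under stated assumptions: the MySQL total is taken as 13 471, the published percentages are rounded so the results are estimates only, and the names used are illustrative.

```python
# (total IP addresses found, successful connections in %) per service,
# as listed in Table IV. ASSUMPTION: the MySQL total is read as 13471.
TABLE_IV = {
    "MySQL": (13471, 0.14),
    "PostgreSQL": (1187, 0.3),
    "MongoDB": (177, 7.9),
    "Redis": (122, 9.8),
    "Memcached": (116, 80),
    "ElasticSearch": (86, 100),
    "Cassandra": (7, 14),
}


def estimated_connections(found: int, percent: float) -> int:
    """Approximate number of reachable instances; an estimate, since the
    published percentages are rounded."""
    return round(found * percent / 100)


estimates = {name: estimated_connections(*row) for name, row in TABLE_IV.items()}

for name, count in estimates.items():
    print(f"{name}: ~{count} successful connections")
```

The estimates mainly illustrate that several services are represented by only a handful of reachable instances, so per-service percentages for them rest on very small samples.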
These results can be explained not only by the database
holders’ awareness of their data security, but also by the
default security mechanisms of the relevant data sources. In
other words, data sources with weaker mechanisms are more likely
to be vulnerable. Our examination of the data sources in
question leads us to the conclusion that Redis and Memcached
have no authentication mechanisms, while MongoDB and
ElasticSearch allow them to be enabled but do not have them
enabled by default. MySQL, CouchDB and Cassandra, however,
require authentication data and show better results
when ShoBEVODSDT is used.
This observation makes it possible to state that, in many
cases, even such a primitive and simple measure as a proper
authentication mechanism leads to a significant reduction in
the risk of data leakage and intrusion.
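To illustrate how the presence of an authentication mechanism shows up from the outside, the following minimal sketch checks whether the root endpoint of an HTTP-based data source such as ElasticSearch answers without credentials. It is not the ShoBEVODSDT implementation: the function names, the labels and the mapping of status codes to labels are assumptions made for illustration, and such a request should only be sent to hosts one administers.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status code from the root endpoint to a coarse label."""
    if status == 200:
        return "open"            # answered without any credentials
    if status in (401, 403):
        return "auth-required"   # credentials or permissions were demanded
    return "unknown"


def probe(base_url: str, timeout: float = 5.0) -> str:
    """Issue a single GET request and classify the answer."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:   # raised for 4xx/5xx answers
        return classify_status(err.code)
    except OSError:                          # refused, timed out, unresolved
        return "unreachable"


if __name__ == "__main__":
    # 9200 is the default ElasticSearch HTTP port; the host is a placeholder.
    print(probe("http://localhost:9200/"))
```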
V. DISCUSSION AND LIMITATIONS
First, in this study we use our self-developed tool
ShoBEVODSDT [3], which relies on passive assessment
characterized by a low level of intrusiveness [17]. The
respective data sources are therefore not thoroughly tested
to confirm that the identified vulnerabilities actually exist;
the results rather point to such a possibility.
Secondly, the number of services inspected is limited and
the number of databases per service is not balanced,
which does not allow us to state with a high degree of
confidence that a particular service is highly vulnerable
while another one is totally secure. Thus, although the
results indicate that there are no vulnerabilities among open
CouchDB databases, this cannot be generalized because
ShoBEVODSDT has found only 14 IP addresses, whereas for
other databases this number exceeds 1 000. In order to draw
more generalizable conclusions on services, the sample should
be balanced. However, this was not the main aim of this study,
which mainly examines the state of play in three
countries.
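The effect of the small CouchDB sample can be made concrete with a standard statistical bound that is not part of this study's method: when no event is observed in n independent trials, the exact one-sided upper confidence bound on the underlying rate p follows from (1 - p)^n = 1 - confidence. The sketch below applies it to the 14 CouchDB IP addresses; the function name and the 95% level are assumptions chosen for illustration.

```python
def zero_event_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on a rate when 0 events occur in n trials.

    Solves (1 - p) ** n == 1 - confidence for p.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


# 14 CouchDB IP addresses were found and none proved vulnerable.
couchdb_bound = zero_event_upper_bound(14)

# For comparison: a hypothetical service with 1 000 clean observations.
large_sample_bound = zero_event_upper_bound(1000)

print(f"n=14:   rate could still be up to {couchdb_bound:.1%}")
print(f"n=1000: rate could still be up to {large_sample_bound:.2%}")
```

With only 14 observations, a vulnerability rate of roughly one in five cannot be excluded at this confidence level, which is why the CouchDB result should not be generalized.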
In addition, in the future we plan to compare
the results obtained with CVE Details in order
to verify whether there is a relationship between the
registered “Gain Information” vulnerabilities and the data
that we have managed to collect. A similar approach was
applied by Genge et al. [15], and we expect it to yield
more generalizable results on the services in
question.
VI. CONCLUSIONS
More and more studies highlight the risks posed by IoT
devices and stress the need for actions to ensure the security
of the IoT ecosystem at a wide range of levels [1-2, 18]. In this
paper, we have applied the IoTSE-based tool
ShoBEVODSDT, presented in our previous study [3],
to inspect the state of play in three countries of the Baltic
region, namely Latvia, Estonia and Lithuania, with regard to
unprotected open databases accessible from outside the
organization and the “value” of the data that can be gathered
from them in the case of a successful connection. We have
inspected eight data sources for vulnerabilities and their
extent. We conclude that although the total number of open
databases accessible from outside the organization is less than 2%
of the data sources scanned, there are data sources that may
pose risks to organizations. Moreover, 12% of the open data
sources have already been compromised.
We conclude that the weakest results are demonstrated by
Lithuania with 3.45 of 5 points, followed by Estonia with
3.18 and Latvia with 3.02 points. For the services under
question, the worst results are demonstrated by MongoDB,
followed by PostgreSQL, ElasticSearch and Memcached.
We argue that ShoBEVODSDT can be useful for (1)
individual organizations, to determine whether their data
sources are visible and even accessible from outside the
organization, (2) testers, to effectively map the potential
attack surface and advance targeted vulnerability
assessments, with their further inspection and development
of preventive activities and security mechanisms, (3)
scientists and developers, to carry out a comprehensive
multidimensional and longitudinal analysis of unprotected
data sources, and (4) countries and their governments, to define
guidelines and laws according to the state of the art at the country
level that would promote technological development and
better protection.
REFERENCES
[1] M. Bada, I. Pete, “An exploration of the cybercrime ecosystem around Shodan,” In 2020 7th International Conference on Internet of Things: Systems, Management and Security (IOTSMS) (pp. 1-8). IEEE, December, 2020.
[2] M. Al-Ruithe, S. Mthunzi, E. Benkhelifa, “Data governance for security in IoT & cloud converged environments,” In 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA) (pp. 1-8). IEEE, November, 2016.
[3] A. Daskevics, A. Nikiforova, “ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detection tool or what Internet of Things Search Engines know about you,” 2021 (in print).
[4] E. Sahafizadeh, M. A. Nematbakhsh, “A survey on security issues in Big Data and NoSQL,” Advances in Computer Science: an International Journal, 4(4), 68-72, 2015.
[5] J. Davis, “Telemedicine vendor breaches the data of 2.4 million patients in Mexico,” 2018. [Online]. Available: https://www.healthcareitnews.com/news/telemedicine-vendor-breaches-data-24-million-patients-mexico
[6] E. Bekker, “2020 data breaches. The most significant breaches of the year,” IdentityForce, a Sontiq brand, 2020.
[7] B. Burns, D. Killion, N. Beauchesne, E. Moret, J. Sobrier, M. Lynn, ... P. Guersch, “Security power tools,” O'Reilly Media, Inc., 2007.
[8] S. Samtani, M. Kantarcioglu, H. Chen, “Trailblazing the Artificial Intelligence for Cybersecurity Discipline: A Multi-Disciplinary Research Roadmap,” ACM Trans. Manage. Inf. Syst. 11, 4, Article 17, December 2020, DOI: https://doi.org/10.1145/3430360.
[9] J. R. G. Evangelista, R. J. Sassi, M. Romero, D. Napolitano, “Systematic literature review to investigate the application of open source intelligence (OSINT) with artificial intelligence,” Journal of Applied Security Research, 1-25, 2020.
[10] H. J. Williams, I. Blum, “Defining second generation open source intelligence (OSINT) for the defense enterprise,” RAND Corporation, Santa Monica, United States, 2018.
[11] A. Oganesyan, DeviceLock Inc., “How Researchers Discover MongoDB and Elasticsearch Open Databases,” 2019. [Online]. Available: https://m.devicelock.com/blog/how-researchers-discover-mongodb-and-elasticsearch-open-databases.html
[12] O. Valin, “Most popular databases in 2020 and new trends,” 2020. [Online]. Available: https://www.eversql.com/most-popular-databases-in-2020/, last accessed: 22.06.2021
[13] DB-engines, 2021. [Online]. Available: https://db-engines.com/en/ranking, last accessed: 22.06.2021
[14] P. D. C. de Sousa Rodrigues, “An OSINT Approach to Automated Asset Discovery and Monitoring,” 2019.
[15] B. Genge, C. Enăchescu, “ShoVAT: Shodan-based vulnerability assessment tool for Internet-facing services,” Security and Communication Networks, 9(15), 2696-2714, 2016.
[16] A. Bizga, “Bad Actors Target MongoDB Databases, Threatening to Contact GDPR Legislators Unless Ransom is Paid.” [Online]. Available: https://securityboulevard.com/2020/07/bad-actors-target-mongodb-databases-threatening-to-contact-gdpr-legislators-unless-ransom-is-paid/, last accessed: 11.07.2021
[17] S. Samtani, S. Yu, H. Zhu, M. Patton, H. Chen, “Identifying SCADA vulnerabilities using passive and active vulnerability assessment techniques,” In 2016 IEEE Conference on Intelligence and Security Informatics (ISI) (pp. 25-30). IEEE, September, 2016.
[18] Y. Jararweh, M. Al-Ayyoub, E. Benkhelifa, M. Vouk, A. Rindos, “SDIoT: a software defined based internet of things framework,” Journal of Ambient Intelligence and Humanized Computing, 6(4), 453-461, 2015.