A System Architecture for Monitoring the Reliability of IoT
Radu BONCEA* **, Ioan BACIVAROV**
*Romania Top Level Domain, National Institute for Research and Development
in Informatics-ICI Bucharest
** University Politehnica of Bucharest, Faculty of Electronics,
Telecommunications and Information Technology
* radu@rotld.ro, **bacivaro@euroqual.pub.ro
Abstract
The Internet of Things has gained momentum in recent years, supported by new technologies and
computing paradigms such as Cloud Computing and Service Oriented Architecture and an
increasing demand from the enterprise. With hundreds of billions of devices to be connected in the
near future, IoT will need new methods for addressing key challenges in security and reliability.
One particular challenge we will focus on is the ability of the system to prevent itself from failing
by continuously introspecting its own state and taking decisions without human intervention. We will
demonstrate how this can be achieved using new time series databases and monitoring systems
such as Prometheus, InfluxDB, OpenTSDB and Graphite. By logging performance and other
transaction metrics, the system can use specific algorithms to predict potential issues and react.
We will then show how machine-learning algorithms could be used to reveal new insights,
patterns and relationships across data.
Keywords: IoT, monitoring, reliability, self-management, time series, automation, Prometheus,
OpenTSDB, InfluxDB
1. INTRODUCTION
The Internet of Things is a vision in which every object
in the world has the potential to connect to the
Internet and provide its data, so that actionable
insights can be derived on its own or through other
connected objects [1].
With support from Do It Yourself communities,
IoT has emerged as a key enabling technology for the
4th Industrial Revolution [2] along with Internet of
Services, Cloud Computing, Machine-to-Machine,
RFID, Cyber-Physical Systems, Autonomic Systems,
Systems of Systems, Robotics, Software Agents,
Cooperating Objects [3] and Machine Learning.
Recent studies estimate that the number of IoT devices
connected to the Internet will reach 38.5 billion by 2020 [4].
The classic methods of monitoring application
performance rely on tools such as Nagios, Cacti or
Zabbix to log application metrics and a lot of human
engineering to interpret these metrics and make
appropriate decisions. There are multiple layers the
applications run on, so system administrators,
network administrators and application developers
each do the monitoring at regular intervals of time.
Cloud computing has been the first technology to
challenge this model with its increased number of
applications and services that needed to be monitored,
from infrastructure components (servers, routers,
storage) to cloud computing services and the user
experience. Cloud computing vendors like VMware or
Microsoft have integrated the active monitoring into
centralized management and analytics solutions. In
cloud computing, the computational resources are
monitored for both the physical and virtual layers
using agents like VMware vSphere Hypervisor which
reports metrics of the physical machine or host
and VMware View Agent which is addressing
virtual machine metrics as pictured in Fig.1. All
metric values are pulled from agents and pushed to a
time-series database to which an analytics platform
has access.
Figure 1 - Monitoring computational resources in Cloud Computing
The IoT ecosystem is composed of 4 layers
(Fig.2): the Edge where IoT devices are located; the
Gateway, where sensor data is initially stored, filtered
and curated; the Cloud Platform is the layer where
data is enriched and processed by analytics tools; the
last layer is the Presentation where data-centric
business services are offered to end users [5].
Figure 2 - IoT generic architecture
The monitoring of the applications in the
Presentation layer is based on the classic model with
Nagios, Zabbix and Cacti largely used. The Cloud
Platform monitoring is done using vendor-oriented
solutions such as VMware vSphere or cloud operating
systems such as OpenStack.
Because the IoT devices are generally resource-
constrained, the monitoring is done at the Gateway
layer along with the monitoring of other Gateway
applications. There is also a challenge regarding the
number of devices per gateway and the number of
gateways in a typical IoT ecosystem, as is the case
with plant and soil monitoring over a large area, where
there would be tens of thousands of sensors and
hundreds of gateways deployed in a star-of-stars
topology. Manually monitoring the performance and
reliability of so many devices would be expensive and
inefficient. The system must be able to monitor itself
and react, either by sending alerts or executing series
of operations. In this paper we will discuss the
monitoring of the devices at the Edge and the
applications deployed on gateways using prediction
models and trends based on time-series data.
2. TIME-SERIES DATABASES
There are specific non-functional requirements
for IoT time-series databases that are deployable on
gateways:
labeling and tagging data points is a must due
to the large variety of devices;
labels should be indexable so filtering by a
specific tag or label should be done at
database engine level;
high-resolution data points;
the engine should be optimized for intensive
writing, almost no updates and the deletes are
done in bulk;
compressed storage;
support service integration through an HTTP
API as the gateways would accommodate a
service oriented architecture with a plethora
of microservices deployed.
One observation worth noting is the fact that there is no
requirement for long-term retention of data. This is
because the gateways are pushing data further to the
Cloud Platform where the historical data is analyzed
in a greater context. The data at the gateway is used
only for near real-time analysis.
We will analyze four modern monitoring systems
that implement the above-described
requirements: Prometheus, InfluxData, OpenTSDB
and Graphite.
2.1. Prometheus
Prometheus is an open-source systems monitoring
and alerting toolkit, using LevelDB as a time-series
database (TSDB) and featuring:
a multi-dimensional data model with support
for labels;
a flexible query language that lets the user
aggregate time series data in real time;
no reliance on distributed storage; single
server nodes are autonomous;
time series collection happens via a pull
model over HTTP;
pushing time series is supported via an
intermediary gateway;
targets are discovered via service discovery
or static configuration;
multiple modes of graphing and
dashboarding support;
HTTP API;
alert manager;
2.2. InfluxData
InfluxData provides a robust, open source and
fully customizable time-series data management
platform. It uses InfluxDB for storing metrics and IoT
sensor data. It features:
support for labels and data annotations, but
unlike Prometheus, InfluxDB attaches the
metadata to each event/row, thus increasing
the overall overhead and disk space required;
high availability with InfluxDB Relay;
expressive SQL-like query language;
continuous queries automatically compute
aggregate data to make frequent queries more
efficient;
it implements the push model where agents
are sending the metrics to InfluxDB;
downsampling and resolution adjustment
over time;
HTTP API.
2.3. OpenTSDB
OpenTSDB is a time-series database running on
top of Hadoop and HBase, designed specifically for
long retention of raw data and greater scalability. It
features:
millisecond resolution;
HTTP API;
variable length encoding - use less storage
space for smaller integer values;
support for both synchronous and
asynchronous writes;
support for labels, annotations and metadata.
2.4. Graphite
Graphite is an enterprise-scale monitoring system
composed of a daemon listening for time-series data, a
fixed-size database similar to RRD (round-robin-
database) and a dashboard-like web application. It
features:
long-term retention but at the expense of
storage efficiency;
multi-archive storage;
average-like aggregation with functions such
as average, sum, min, max, last;
support for labels, annotations and metadata.
InfluxDB, OpenTSDB and Graphite are passive
databases, in the sense that the agents are pushing
metrics to the database’s interface, while Prometheus
adopts a pull model, “scraping” metrics from
applications.
Another major difference is that Prometheus has built-
in aggregation functions and an alert manager subsystem.
In this regard, Prometheus is a full monitoring and
trending system that includes built-in and active
scraping, storing, querying, graphing, and alerting
based on time series data.
If we take into consideration that the gateways are
relatively light computational devices, at least when
compared with cloud computing performance, we note
that Prometheus has an edge over the competition:
InfluxDB and Graphite require more storage
and have limited aggregation functions,
functions which otherwise would have to be
implemented on the client side, consequently
requiring more computational resources;
OpenTSDB storage is implemented on top of
Hadoop and HBase, requiring the complex
deployment of a cluster with multiple nodes
from the beginning.
Thus, we will focus on Prometheus as it supports
greater autonomy and performs well in resource-
constrained environments.
3. DEPLOYING PROMETHEUS ON GATEWAYS
Prometheus consists of multiple components,
some of them optional:
the Prometheus server scrapes and stores the
time-series data; it supports a query language
which allows for a wide range of operations
including aggregation, slicing and dicing,
prediction and joins;
the push gateway allows ephemeral and batch
jobs to expose their metrics to Prometheus;
since these kinds of jobs may not exist long
enough to be scraped, they can instead push
their metrics to a push gateway;
a browser-based dashboard builder based on
Rails/SQL;
a large variety of special-purpose exporters;
an exporter is basically an HTTP resource
identified by a URL that contains
metrics (key, tags and values) in a specific
format;
an alert manager which takes care of de-
duplicating, grouping, silencing and routing
alerts to the correct receiver integration such
as email, PagerDuty or OpsGenie.
Figure 3 - Prometheus overall architecture.
Source: www.prometheus.io
Prometheus can be compiled from source, or
precompiled binaries for common operating systems
can be downloaded and installed. Docker images are
available as well.
There are two ways of telling Prometheus what
targets to use (data scraping locations): either using
file-based local configuration if a high level of
autonomy is desired or solutions that support service
discovery like Kubernetes and Consul.io for
centralized system architectures.
When deploying Prometheus at gateway level, we
should consider the gateway itself as the first target (in
IoT we associate a gateway with single-board computers
like the Raspberry Pi). To achieve this, we can use the
Prometheus node exporter to expose thousands of
different types of metrics specific to machines running
a Unix-like OS. These metrics cover statistics about
cpu, diskstats, conntrack, available entropy, file
descriptors, network, hardware devices, virtual
devices, vmstat, interrupts, network connections, etc.
The node exporter can be started as a background
process as below:
$ nohup node_exporter <flags> &
Then we can add the target to Prometheus
configuration in YAML format, specifying the target
URL, the job’s name and the scrape interval. By
default the node exporter will listen on port 9100.
scrape_configs:
  - job_name: "node"
    scrape_interval: "15s"
    target_groups:
      - targets: ['localhost:9100']
The next step would be to start the Prometheus
server. One way to do it is to start it as a subsystem by
placing a script similar to the one below in /etc/init.d.
Figure 4 - Example of script for starting Prometheus
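A minimal sketch of such an init-style script, assuming the Prometheus binary and data directory live under /opt/prometheus; the paths, log file location and flags are illustrative assumptions, not taken from the figure:

#!/bin/sh
# Minimal /etc/init.d-style script sketch for Prometheus 1.x.
# Paths and flags below are illustrative assumptions.
NAME=prometheus
DAEMON=/opt/prometheus/prometheus
ARGS="-config.file=/opt/prometheus/prometheus.yml -storage.local.path=/opt/prometheus/data"
PIDFILE=/var/run/$NAME.pid
LOGFILE=/var/log/$NAME.log

case "$1" in
  start)
    echo "Starting $NAME"
    nohup $DAEMON $ARGS > $LOGFILE 2>&1 &
    echo $! > $PIDFILE
    ;;
  stop)
    echo "Stopping $NAME"
    [ -f $PIDFILE ] && kill "$(cat $PIDFILE)" && rm -f $PIDFILE
    ;;
  restart)
    "$0" stop
    "$0" start
    ;;
  *)
    echo "Usage: $0 {start|stop|restart}"
    exit 1
    ;;
esac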
There are several arguments Prometheus accepts
that are very important when considering deployment
on gateways (an example invocation follows the list):
storage.local.chunk-encoding-version: the
type 1 encoding allows a faster random
access at the expense of storage (3 bytes per
sample), type 2 has better compression (1.3
bytes) but causes more CPU usage and
increased query latency.
storage.local.retention: measured in hours, it
allows you to configure the retention time for
samples. Because the gateway is used to push
curated data upstream, this parameter should
have small values, e.g. 720 hours (30 days) or less.
storage.local.memory-chunks: there should
be 3 memory chunks per series.
storage.local.series-file-shrink-ratio: a greater
value minimizes rewrites but at the cost of
more disk space.
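An example invocation combining these flags might look as follows; the values are illustrative assumptions for a gateway-class device rather than recommendations from the paper:

$ nohup prometheus -config.file=prometheus.yml \
    -storage.local.retention=360h \
    -storage.local.chunk-encoding-version=2 \
    -storage.local.memory-chunks=100000 \
    -storage.local.series-file-shrink-ratio=0.3 &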
4. QUERYING
Prometheus provides a functional expression
language that lets the user select and aggregate time
series data in real time. The result of an expression can
either be shown as a graph, viewed as tabular data in
Prometheus's expression browser, or consumed by
external systems via the HTTP API.
There are four data types in Prometheus
expression language (PromQL):
instant vector - a set of time series containing
a single sample for each time series, all
sharing the same timestamp;
range vector - a set of time series containing
a range of data points over time for each time
series;
scalar - a simple numeric floating point value;
string - a simple string value; currently
unused.
Besides arithmetic, comparison and logical
operators, PromQL supports:
vector matching: operations between vectors
attempt to find a matching element in the
right-hand-side vector for each entry in the
left-hand side;
aggregation operators like sum, min, max,
avg, stddev (standard deviation over dimensions),
stdvar (standard variance over dimensions),
count, bottomk (smallest k elements by sample value),
topk (largest k elements by sample value),
count_values (count number of elements with the
same value); a short example follows this list.
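For illustration, two aggregation expressions over node exporter metrics; the choice of metrics and the job label are assumptions about a typical deployment:

# total CPU time, summed per job
sum(process_cpu_seconds_total) by (job)
# the three filesystems with the least available space
bottomk(3, node_filesystem_avail)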
An instant vector can be obtained by simply
calling the metric name. For instance, the node
exporter has a metric called
process_cpu_seconds_total which is a counter telling
us the total user and system CPU time spent in
seconds. The instant vector is
process_cpu_seconds_total.
A range vector works like an instant vector,
except that it selects a range of samples back from the
current instant. The range duration can be appended in
square brackets to the end of the vector name. For
instance, at a scrape interval of 15 seconds,
process_cpu_seconds_total[1m] will return 4 values
recorded in the last 1 minute.
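The same expressions can also be evaluated over the HTTP API; a minimal sketch, assuming the Prometheus server listens on its default port 9090 on the gateway:

$ curl 'http://localhost:9090/api/v1/query?query=process_cpu_seconds_total'
$ curl 'http://localhost:9090/api/v1/query?query=process_cpu_seconds_total[1m]'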
Prometheus also comes with more than 30 built-in
functions that operate on vectors (Table 1).
abs(v vector): returns the input vector with all sample values converted to their absolute value
absent(v vector): returns an empty vector if the vector passed to it has any elements and a 1-element vector with the value 1 if the vector passed to it has no elements
ceil(v instant-vector): rounds the sample values of all elements in v up to the nearest integer
changes(v range-vector): for each input time series, returns the number of times its value has changed within the provided time range as an instant vector
clamp_max(v instant-vector, max scalar): clamps the sample values of all elements in v to have an upper limit of max
clamp_min(v instant-vector, min scalar): clamps the sample values of all elements in v to have a lower limit of min
count_scalar(v instant-vector): returns the number of elements in a time series vector as a scalar
delta(v range-vector): calculates the difference between the first and last value of each time series element in a range vector v
deriv(v range-vector): calculates the per-second derivative of the time series in a range vector v, using simple linear regression
drop_common_labels(instant-vector): drops all labels that have the same name and value across all series in the input vector
exp(v instant-vector): calculates the exponential function for all elements in v
floor(v instant-vector): rounds the sample values of all elements in v down to the nearest integer
histogram_quantile(φ float, b instant-vector): calculates the φ-quantile (0 ≤ φ ≤ 1) from the buckets b of a histogram
holt_winters(v range-vector, sf scalar, tf scalar): produces a smoothed value for time series based on the range in v
increase(v range-vector): calculates the increase in the time series in the range vector
irate(v range-vector): calculates the per-second instant rate of increase of the time series in the range vector
ln(v instant-vector): calculates the natural logarithm for all elements in v
log2(v instant-vector): calculates the binary logarithm for all elements in v
log10(v instant-vector): calculates the decimal logarithm for all elements in v
predict_linear(v range-vector, t scalar): predicts the value of time series t seconds from now, based on the range vector v, using simple linear regression
rate(v range-vector): calculates the per-second average rate of increase of the time series in the range vector
resets(v range-vector): returns the number of counter resets within the provided time range as an instant vector
round(v instant-vector, to_nearest=1 scalar): rounds the sample values of all elements in v to the nearest integer
scalar(v instant-vector): returns the sample value of that single element as a scalar
sort(v instant-vector): returns vector elements sorted by their sample values, in ascending order
sort_desc(v instant-vector): returns vector elements sorted by their sample values, in descending order
sqrt(v instant-vector): calculates the square root of all elements in v
avg|min|max|sum|count_over_time(v range-vector): the average|minimum|maximum|sum|count value of all points in the specified interval
Table 1 - Prometheus built-in functions
To demonstrate the usage of these functions, let
us consider this example: we want to predict
how much disk space we will have 1 day from now on
the root filesystem partition, mount point "/", on
the machine identified by the label instance="serv1". The
Prometheus function that does that is predict_linear,
which accepts as arguments a range vector (we will
take ranges of 1 minute) and a scalar for the interval in
seconds.
The PromQL query for our use case is:
predict_linear(node_filesystem_avail{instance="serv1",mountpoint="/"}[1m], 86400)
For frequent and computationally expensive
queries, Prometheus supports precomputing results and
saving them as new time series based on explicit
recording rules, as in the following example:
job:http_inprogress_requests:sum = sum(http_inprogress_requests)
by (job)
Here, the recording rule is evaluated at the
interval specified by the evaluation_interval field in
the Prometheus configuration. During each evaluation
cycle, the right-hand-side expression of the rule
statement is evaluated at the current instant in time
and the resulting sample vector is stored as a new set
of time series with the current timestamp and a new
metric name (job:http_inprogress_requests:sum).
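As a sketch of how this is wired together, the rule above could be placed in a rules file and referenced from the main configuration; the file name and interval below are illustrative:

# recording.rules
job:http_inprogress_requests:sum = sum(http_inprogress_requests) by (job)

# prometheus.yml (fragment)
global:
  evaluation_interval: "15s"
rule_files:
  - "recording.rules"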
5. ALERTS
Alerting with Prometheus is separated into two
parts. Alerting rules in Prometheus servers send alerts
to an Alertmanager. The Alertmanager then manages
those alerts, including silencing, inhibition,
aggregation and sending out notifications via methods
such as email, PagerDuty and HipChat.
The Alertmanager can be started similarly to
Prometheus:
$> nohup alertmanager -config.file=config.yml
The configuration file holds information about the
notification integrations (e.g. email, hipchat,
slack, webhook, pagerduty, pushover), routing rules
and inhibition rules.
A route block defines a node in a routing tree and
its children. Its optional configuration parameters are
inherited from its parent node if not set. That way,
when an alert enters the tree at the configuration top-
level route, it will traverse the child nodes until it
“hits” a matching node and consequently a
notification is fired.
An inhibition rule is a rule that mutes an alert
matching a set of matchers under the condition that an
alert exists that matches another set of matchers. Both
alerts must have a set of equal labels.
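A minimal sketch of such a configuration file, with one e-mail receiver, a simple routing tree and one inhibition rule; the addresses, label values and SMTP host are illustrative assumptions:

# config.yml (fragment)
route:
  receiver: 'gateway-ops'
  group_by: ['alertname', 'instance']
receivers:
  - name: 'gateway-ops'
    email_configs:
      - to: 'ops@example.org'
        from: 'alertmanager@example.org'
        smarthost: 'smtp.example.org:25'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['instance']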
The alerting rules are defined similarly to recording
rules and reloaded by sending Prometheus a
SIGHUP signal. A rule has the following syntax:
ALERT <alert name>
  IF <expression>
  [ FOR <duration> ]
  [ LABELS <label set> ]
  [ ANNOTATIONS <label set> ]
The optional FOR clause causes Prometheus to
wait for a certain duration between first encountering
a new expression output vector element and counting
an alert as firing for this element.
The LABELS clause allows specifying a set of
additional labels to be attached to the alert.
The ANNOTATIONS clause specifies another set
of labels that are not identifying for an alert instance.
They are used to store longer additional information
such as alert descriptions.
In our example with the prediction of the disk
space available tomorrow, we want to create an alarm
that would be fired (sent to the Alertmanager for
dispatching) if we will run out of free space tomorrow.
The rule is based on predict_linear as shown below:
ALERT WeWillRunOutOfSpace
  IF predict_linear(node_filesystem_avail{instance="srv1",mountpoint="/"}[1m], 86400) < 1
  FOR 1m
  ANNOTATIONS {
    summary = "No more free disk space tomorrow on {{ $labels.instance }}",
    description = "{{ $labels.instance }} will run out of space (current value: {{ $value }})",
  }
6. CONCLUSIONS
Time-series databases facilitate predictive
forecasting, which has long been a goal for reliability
engineers. With a service oriented architecture, a
solution such as Prometheus can be used to automate
the reaction of the system to certain predictions or to
new data insights. We can create alerting rules and
have them routed by an alert manager to message
brokers such as ActiveMQ or Redis. Or we can create
our own reactive manager with more complex rules
and functions than Prometheus offers. Monitoring
services, part of the reactive applications (see Figure
5), which are subscribers to the notification stream, can
then execute explicit instructions for specific events.
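As a sketch, such a bridge can be attached through the Alertmanager's generic webhook receiver; the endpoint below stands in for a hypothetical reactive manager or broker adapter and is not part of the paper's deployment:

# config.yml (fragment)
route:
  receiver: 'reactive-manager'
receivers:
  - name: 'reactive-manager'
    webhook_configs:
      - url: 'http://localhost:8080/alerts'   # hypothetical endpoint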
Prometheus comes with many useful functions
that process the time-series. However, they are limited
to simple arithmetic and logic operations. For more
complex use cases, Machine Learning can be used to
classify time-series events based on historical data.
Such a solution is TensorFlow, an open source
software library for numerical computation and machine
learning. Given multiple time-series that have causal
connections, we can use TensorFlow to train logistic
regression models to identify (classify) events that
impact the performance of the applications. For
instance, the system can be trained to know that an
increase in memory usage over time by a certain
application signals a memory leak. Of course, in this
case we could also use rather much simpler arithmetic
operations based on certain thresholds, but machine
learning allows more precision makes it easier to
avoid false alarms. Also once trained, the system will
be able to dynamically classify based on patterns
without human intervention.
Figure 5 - The architecture for automatic monitoring
REFERENCES
1. Balani, Naveen. Enterprise IoT: A Definitive Handbook. ISBN 1518790860.
2. Acatech - National Academy of Science and Engineering. 2016.
3. Vermesan, Ovidiu and Friess, Peter. Internet of Things: Converging Technologies for Smart Environments and Integrated Ecosystems. River Publishers. ISBN 978-87-92982-73-5.
4. Juniper Research. Internet of Things' connected devices to almost triple to over 38 billion units by 2020. [Online] http://www.juniperresearch.com/press/press-releases/iot-connected-devices-to-triple-to-38-bn-by-2020.
5. Boncea, Radu and Bacivarov, Ioan C. Security in Internet of Things: Mitigating the Top Vulnerabilities. Asigurarea Calităţii - Quality Assurance, January-March 2016, Vol. XXII, No. 85, pp. 11-17.
6. Prometheus - Monitoring system & time series database. [Online] [Cited: June 20, 2016.] https://prometheus.io.
7. Pelkonen, Tuomas, Franklin, Scott, Cavallaro, Paul, Huang, Qi, Meza, Justin, Teller, Justin and Veeraraghavan, Kaushik. Gorilla: A Fast, Scalable, In-Memory Time Series Database. Proceedings of the VLDB Endowment, 2014-2015, Vol. 8, pp. 1816-1827.
8. Gilchrist, Alasdair. The Technical and Business Innovators of the Industrial Internet. In: Industry 4.0. Apress, pp. 33-64.
9. Andreolini, Mauro, Pietri, Marcello, Tosi, Stefania and Lancellotti, Riccardo. A Scalable Monitor for Large Systems. Cloud Computing and Services Sciences. Springer International Publishing, 2015, pp. 100-116.