A System Architecture for Monitoring the Reliability of IoT
Radu BONCEA* **, Ioan BACIVAROV**
*Romania Top Level Domain, National Institute for Research and Development
in Informatics-ICI Bucharest
** University Politehnica of Bucharest, Faculty of Electronics,
Telecommunications and Information Technology
* radu@rotld.ro, **bacivaro@euroqual.pub.ro
Abstract
The Internet of Things has gained momentum in recent years, supported by new technologies and
computing paradigms such as Cloud Computing and Service Oriented Architecture and an
increasing demand from the enterprise. With hundreds of billions of devices to be connected in the
near future, IoT will need new methods for addressing key challenges in security and reliability.
One particular challenge we will focus on is the ability of the system to prevent itself from failing
by continuously introspecting its own state and making decisions without human intervention. We will
demonstrate how this can be achieved using new time series databases and monitoring systems
such as Prometheus, InfluxDB, OpenTSDB and Graphite. By logging performance and other
transaction metrics, the system can use specific algorithms to predict potential issues and react.
We will then show how machine-learning algorithms could be used to reveal new insights,
patterns and relationships across data.
Keywords: IoT, monitoring, reliability, self-management, time series, automation, Prometheus,
OpenTSDB, InfluxDB
1. INTRODUCTION
The Internet of Things is a vision in which every object in the world has the potential to connect to the Internet and provide its data, so that actionable insights can be derived on its own or through other connected objects [1].
With support from Do It Yourself communities,
IoT has emerged as a key enabling technology for the
4th Industrial Revolution [2] along with Internet of
Services, Cloud Computing, Machine-to-Machine,
RFID, Cyber-Physical Systems, Autonomic Systems,
Systems of Systems, Robotics, Software Agents,
Cooperating Objects [3] and Machine Learning.
Recent studies estimate that the number of IoT devices connected to the Internet will reach 38.5 billion by 2020 [4].
The classic methods of monitoring application
performance rely on tools such as Nagios, Cacti or
Zabbix to log application metrics and a lot of human
engineering to interpret these metrics and make
appropriate decisions. Because applications run on multiple layers, system administrators, network administrators and application developers each do the monitoring at regular intervals.
Cloud computing was the first technology to challenge this model, with its increased number of applications and services that need to be monitored, from infrastructure components (servers, routers, storage) to cloud computing services and the user experience. Cloud computing vendors like VMware or Microsoft have integrated active monitoring into centralized management and analytics solutions. In cloud computing, the computational resources are monitored at both the physical and virtual layers using agents such as the VMware vSphere Hypervisor, which reports metrics of the physical machine or host, and the VMware View Agent, which addresses virtual machine metrics, as pictured in Fig. 1. All metric values are pulled from agents and pushed to a time-series database to which an analytics platform has access.
Figure 1 - Monitoring computational resources in Cloud Computing
The IoT ecosystem is composed of four layers (Fig. 2): the Edge, where IoT devices are located; the Gateway, where sensor data is initially stored, filtered and curated; the Cloud Platform, where data is enriched and processed by analytics tools; and the Presentation layer, where data-centric business services are offered to end users [5].
Figure 2 - IoT generic architecture
The monitoring of the applications in the Presentation layer is based on the classic model, with Nagios, Zabbix and Cacti largely used. The Cloud Platform monitoring is done using vendor-oriented solutions such as VMware vSphere or cloud operating systems such as OpenStack.
Because the IoT devices are generally resource-
constrained, the monitoring is done at the Gateway
layer along with the monitoring of other Gateway
applications. There is also a challenge regarding the
number of devices per gateway and the number of
gateways in a typical IoT ecosystem. In plant and soil monitoring over a large area, for example, there would be tens of thousands of sensors and hundreds of gateways deployed in a star-of-stars topology. Manually monitoring the performance and
reliability of so many devices would be expensive and
inefficient. The system must be able to monitor itself
and react, either by sending alerts or executing series
of operations. In this paper we will discuss the
monitoring of the devices at the Edge and the
applications deployed on gateways using prediction
models and trends based on time-series data.
2. TIME-SERIES DATABASES
There are specific non-functional requirements
for IoT time-series databases that are deployable on
gateways:
• labeling and tagging data points is a must due
to the large variety of devices;
• labels should be indexable, so that filtering by a specific tag or label can be done at the database-engine level;
• high-resolution data points;
• the engine should be optimized for intensive writes, with almost no updates and with deletes done in bulk;
• compressed storage;
• support for service integration through an HTTP API, as the gateways would accommodate a service-oriented architecture with a plethora of microservices deployed (a sample labeled data point is sketched after this list).
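As an illustration only (the metric and label names are hypothetical), a labeled data point exposed by a gateway service over HTTP in the Prometheus text exposition format could look like the line below, with labels carrying the device identity so that the engine can index and filter by them:

soil_moisture_percent{device="sensor-0042",field="north",gateway="gw-07"} 31.5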
One observation worth noting is that there is no requirement for long-term retention of data. This is because the gateways push data further to the Cloud Platform, where the historical data is analyzed in a greater context. The data at the gateway is used only for near real-time analysis.
We will analyze four modern monitoring systems that implement the requirements described above: Prometheus, InfluxData, OpenTSDB and Graphite.
2.1. Prometheus
Prometheus is an open-source systems monitoring
and alerting toolkit, using LevelDB as a time-series
database (TSDB) and featuring:
• a multi-dimensional data model with support
for labels;
• a flexible query language that lets the user aggregate time series data in real time;
• no reliance on distributed storage; single
server nodes are autonomous;
• time series collection happens via a pull
model over HTTP;
• pushing time series is supported via an
intermediary gateway;
• targets are discovered via service discovery
or static configuration;
• multiple modes of graphing and
dashboarding support;
• HTTP API;
• alert manager.
2.2. InfluxData
InfluxData provides a robust, open source and
fully customizable time-series data management
platform. It uses InfluxDB for storing metrics and IoT
sensor data. It features:
• support for labels and data annotations, but unlike Prometheus, InfluxDB attaches the metadata to each event/row, thus increasing the overall overhead and disk space required;
• high availability with InfluxDB Relay;
• expressive SQL-like query language;
• continuous queries automatically compute
aggregate data to make frequent queries more
efficient;
• it implements the push model, where agents send the metrics to InfluxDB (a minimal push example is sketched after this list);
• downsampling and resolution adjustment
over time;
• HTTP API.
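As a minimal sketch of this push model (the database name, measurement and tags are hypothetical, and the agent is assumed to reach InfluxDB on its default HTTP port 8086), an agent on the gateway could write a tagged data point through the HTTP API with a request similar to:

$ curl -XPOST 'http://localhost:8086/write?db=iot' \
    --data-binary 'soil_moisture,device=sensor-0042,gateway=gw-07 value=31.5'

Because the tags travel with every written point, this also illustrates why per-row metadata increases the overhead noted above.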
2.3. OpenTSDB
OpenTSDB is a time-series database running on
top of Hadoop and HBase, designed specifically for
long retention of raw data and greater scalability. It
features:
• millisecond resolution;
• HTTP API;
• variable-length encoding, which uses less storage space for smaller integer values;
• support for both synchronous and asynchronous writes;
• support for labels, annotations and metadata.
2.4. Graphite
Graphite is an enterprise-scale monitoring system
composed of a daemon listening for time-series data, a
fixed-size database similar to RRD (round-robin-
database) and a dashboard-like web application. It
features:
• long-term retention, but at the expense of storage efficiency;
• multi-archive storage;
• average-like aggregation with functions such
as average, sum, min, max, last;
• support for labels, annotations and metadata.
InfluxDB, OpenTSDB and Graphite are passive databases, in the sense that the agents push metrics to the database's interface, while Prometheus adopts a pull model, "scraping" metrics from applications. Another major difference is that Prometheus has built-in aggregation functions and an alert manager subsystem. In this regard, Prometheus is a full monitoring and trending system that includes built-in and active scraping, storing, querying, graphing, and alerting based on time series data.
If we take into consideration that the gateways are relatively light computational devices, at least when compared with cloud computing performance, we note that Prometheus has an edge over the competition:
• InfluxDB and Graphite require more storage and have limited aggregation functions, functions which would otherwise have to be implemented on the client side, consequently requiring more computational resources;
• OpenTSDB storage is implemented on top of
Hadoop and HBase, requiring the complex
deployment of a cluster with multiple nodes
from the beginning.
Thus, we will focus on Prometheus, as it supports greater autonomy and performs well in resource-constrained environments.
3. DEPLOYING PROMETHEUS ON GATEWAYS
Prometheus consists of multiple components,
some of them optional:
• the Prometheus server scrapes and stores the
time-series data; it supports a query language
which allows for a wide range of operations
including aggregation, slicing and dicing,
prediction and joins;
• the push gateway allows ephemeral and batch
jobs to expose their metrics to Prometheus;
since these kinds of jobs may not exist long
enough to be scraped, they can instead push
their metrics to a push gateway;
• a browser-based dashboard builder based on
Rails/SQL;
• a large variety of special-purpose exporters; an exporter is basically an HTTP resource, identified by a URL, which contains metrics (key, tags and values) in a specific format;
• an alert manager, which takes care of de-duplicating, grouping and silencing alerts, and routing them to the correct receiver integration such as email, PagerDuty or OpsGenie.
Figure 3 – Prometheus overall architecture.
Source: www.prometheus.io
Prometheus can be compiled from source, or precompiled binaries for common operating systems can be downloaded and installed. Docker images are available as well.
There are two ways of telling Prometheus what targets to use (data scraping locations): either file-based local configuration, if a high level of autonomy is desired, or solutions that support service discovery, such as Kubernetes and Consul.io, for centralized system architectures.
When deploying Prometheus at the gateway level, the first target we should consider is the gateway itself (in IoT we associate a gateway with single-board computers like the Raspberry Pi). To achieve this, we can use the Prometheus node exporter to expose thousands of different types of metrics specific to machines running a Unix-like OS. These metrics cover statistics about CPU, diskstats, conntrack, available entropy, file
descriptors, network, hardware devices, virtual
devices, vmstat, interrupts, network connections, etc.
The node exporter can be started as a background process as below:
$ nohup node_exporter <flags> &
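Once it is running, a quick way to verify that metrics are being exposed, before wiring it into Prometheus, is to query the exporter's HTTP endpoint directly, for example:

$ curl -s http://localhost:9100/metrics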
Then we can add the target to the Prometheus configuration in YAML format, specifying the target URL, the job's name and the scrape interval. By default the node exporter listens on port 9100.
scrape_configs:
- job_name: "node"
scrape_interval: "15s"
target_groups:
- targets: ['localhost:9100']
The next step would be to start the Prometheus server. One way to do it is to start it as a subsystem by placing a script similar to the one below in /etc/init.d.
Figure 4 - Example of script for starting Prometheus
There are several arguments Prometheus accepts that are very important when considering deployment on gateways (a combined invocation is sketched after this list):
• storage.local.chunk-encoding-version: the type 1 encoding allows faster random access at the expense of storage (3 bytes per sample); type 2 has better compression (1.3 bytes) but causes more CPU usage and increased query latency.
• storage.local.retention: measured in hours, it
allows you to configure the retention time for
samples. Because the gateway is used to push
curated data upstream, this parameter should
have small values, like 30 days or less.
• storage.local.memory-chunks: there should
be 3 memory chunks per series.
• storage.local.series-file-shrink-ratio: a greater value minimizes rewrites, but at the cost of more disk space.
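A combined invocation on a gateway might look like the sketch below; the values are purely illustrative and would have to be tuned to the gateway's memory, storage and retention needs:

$ nohup prometheus \
    -config.file=/etc/prometheus/prometheus.yml \
    -storage.local.retention=720h \
    -storage.local.chunk-encoding-version=2 \
    -storage.local.memory-chunks=262144 \
    -storage.local.series-file-shrink-ratio=0.3 &

Here the retention is kept to 30 days (720 hours) and the type 2 chunk encoding is chosen to favour storage over CPU, in line with the trade-offs listed above.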
4. QUERYING
Prometheus provides a functional expression
language that lets the user select and aggregate time
series data in real time. The result of an expression can
either be shown as a graph, viewed as tabular data in
Prometheus's expression browser, or consumed by
external systems via the HTTP API.
There are four data types in Prometheus
expression language (PromQL):
• instant vector - a set of time series containing
a single sample for each time series, all
sharing the same timestamp;
• range vector - a set of time series containing
a range of data points over time for each time
series;
• scalar - a simple numeric floating point
value;
• string - a simple string value; currently
unused.
Besides arithmetic, comparison and logical operators, PromQL supports:
• vector matching: operations between vectors attempt to find a matching element in the right-hand-side vector for each entry in the left-hand side;
• aggregation operators like sum, min, max, avg, stddev (standard deviation over dimensions), stdvar (standard variance over dimensions), count, bottomk (smallest k elements by sample value), topk (largest k elements by sample value), count_values (count number of elements with the same value).
An instant vector can be obtained by simply
calling the metric name. For instance, the node
exporter has a metric called
process_cpu_seconds_total which is a counter telling
us the total user and system CPU time spent in
seconds. The instant vector is
process_cpu_seconds_total.
A range vector works like an instant vector,
except that it selects a range of samples back from the
current instant. The range duration can be appended in
square brackets to the end of the vector name. For
instance, at a scrape interval of 15 seconds,
process_cpu_seconds_total[1m] will return 4 values
recorded in the last 1 minute.
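As a brief example, the following expression applies rate() to the same counter and would return the per-second average rate of increase over the last minute, effectively the CPU usage of the process:

rate(process_cpu_seconds_total[1m])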
Prometheus also comes with more than 30 built-in functions that operate on vectors (Table 1).
Function name and arguments Description
abs(v vector) returns the input vector with all sample values converted to their absolute value
absent(v vector) returns an empty vector if the vector passed to it has any elements and a 1-
element vector with the value 1 if the vector passed to it has no elements
ceil(v instant-vector) rounds the sample values of all elements in v up to the nearest integer
changes(v range-vector) for each input time series, the function returns the number of times its value has
changed within the provided time range as an instant vector
clamp_max(v instant-vector, max scalar) clamps the sample values of all elements in v to have an upper limit of max
clamp_min(v instant-vector, min scalar) clamps the sample values of all elements in v to have a lower limit of min
count_scalar(v instant-vector) returns the number of elements in a time series vector as a scalar
delta(v range-vector) calculates the difference between the first and last value of each time series
element in a range vector v
deriv(v range-vector) calculates the per-second derivative of the time series in a range vector v, using
simple linear regression
drop_common_labels(instant-vector) drops all labels that have the same name and value across all series in the input
vector
exp(v instant-vector) calculates the exponential function for all elements in v
floor(v instant-vector) rounds the sample values of all elements in v down to the nearest integer
histogram_quantile(φ float, b instant-vector) calculates the φ-quantile (0 ≤ φ ≤ 1) from the buckets b of a histogram
holt_winters(v range-vector, sf scalar, tf scalar) produces a smoothed value for time series based on the range in v
increase(v range-vector) calculates the increase in the time series in the range vector
irate(v range-vector) calculates the per-second instant rate of increase of the time series in the range
vector
ln(v instant-vector) calculates the natural logarithm for all elements in v
log2(v instant-vector) calculates the binary logarithm for all elements in v
log10(v instant-vector) calculates the decimal logarithm for all elements in v
predict_linear(v range-vector, t scalar) predicts the value of time series t seconds from now, based on the range vector v,
using simple linear regression
rate(v range-vector) calculates the per-second average rate of increase of the time series in the range
vector
resets(v range-vector) returns the number of counter resets within the provided time range as an instant
vector
round(v instant-vector, to_nearest=1 scalar) rounds the sample values of all elements in v to the nearest integer
scalar(v instant-vector) returns the sample value of that single element as a scalar
sort(v instant-vector) returns vector elements sorted by their sample values, in ascending order
sort_desc(v instant-vector) returns vector elements sorted by their sample values, in descending order
sqrt(v instant-vector) calculates the square root of all elements in v
avg|min|max|sum|count_over_time(v range-vector) the average|minimum|maximum|sum|count value of all points in the specified interval
Table 1 - Prometheus built-in functions
To demonstrate the usage of these functions, let us consider this example: we want to predict how much disk space we will have 1 day from now on the root filesystem partition, mount point "/", on the machine identified by the label instance="serv1". The Prometheus function that does that is predict_linear, which accepts as arguments a range vector (we will take ranges of 1 minute) and a scalar for the interval in seconds.
The PromQL query for our use case is:
predict_linear(node_filesystem_avail{instance="
serv1",mountpoint="/"}[1m],86400)
For frequent and computationally expensive queries, Prometheus can precompute results and save them as new time series based on explicit recording rules, as in the following example:
job:http_inprogress_requests:sum = sum(http_inprogress_requests)
by (job)
Here, the recording rule is evaluated at the
interval specified by the evaluation_interval field in
the Prometheus configuration. During each evaluation
cycle, the right-hand-side expression of the rule
statement is evaluated at the current instant in time
and the resulting sample vector is stored as a new set
of time series with the current timestamp and a new
metric name (job:http_inprogress_requests:sum).
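As a sketch of how this might be wired up (the file path and the interval value are illustrative), such a rule would typically be saved in a rules file that is referenced from the Prometheus configuration, where evaluation_interval controls how often it is recomputed:

global:
  evaluation_interval: 30s

rule_files:
  - "/etc/prometheus/recording.rules"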
5. ALERTS
Alerting with Prometheus is separated into two
parts. Alerting rules in Prometheus servers send alerts
to an Alertmanager. The Alertmanager then manages
those alerts, including silencing, inhibition,
aggregation and sending out notifications via methods
such as email, PagerDuty and HipChat.
The Alertmanager can be started similarly to Prometheus:
$ nohup alertmanager -config.file=config.yml &
The configuration file holds information about the notification integrations (e.g. email, hipchat, slack, webhook, pagerduty, pushover), routing rules and inhibition rules.
A route block defines a node in a routing tree and
its children. Its optional configuration parameters are
inherited from its parent node if not set. That way,
when an alert enters the tree at the configuration top-
level route, it will traverse the child nodes until it
“hits” a matching node and consequently a
notification is fired.
An inhibition rule is a rule that mutes an alert
matching a set of matchers under the condition that an
alert exists that matches another set of matchers. Both
alerts must have a set of equal labels.
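A minimal Alertmanager configuration illustrating both concepts might look like the sketch below (receiver names, addresses, the webhook URL and the label matchers are hypothetical, and global SMTP settings are omitted): the top-level route sends everything to an e-mail receiver, a child route overrides the receiver for critical alerts, and the inhibition rule mutes warnings on an instance for which a critical alert is already firing.

route:
  receiver: gateway-admins
  routes:
    - match:
        severity: critical
      receiver: gateway-webhook

receivers:
  - name: gateway-admins
    email_configs:
      - to: admins@example.org
  - name: gateway-webhook
    webhook_configs:
      - url: http://localhost:5001/notify

inhibit_rules:
  - source_match:
      severity: critical
    target_match:
      severity: warning
    equal: ['instance']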
The alerting rules are defined similarly to recording rules and are reloaded by Prometheus when it receives a SIGHUP signal. A rule has the following syntax:
ALERT <alert name>
IF <expression>
[ FOR <duration> ]
[ LABELS <label set>]
[ ANNOTATIONS <label set>]
The optional FOR clause causes Prometheus to
wait for a certain duration between first encountering
a new expression output vector element and counting
an alert as firing for this element.
The LABELS clause allows specifying a set of
additional labels to be attached to the alert.
The ANNOTATIONS clause specifies another set of labels that are not identifying for an alert instance. They are used to store longer additional information, such as alert descriptions.
In our example with the prediction of the disk space available tomorrow, we want to create an alert that would be fired (sent to the Alertmanager for dispatching) if we will run out of free space tomorrow. The rule is based on predict_linear, as shown below:
ALERT WeWillRunOutOfSpace
  IF predict_linear(node_filesystem_avail{instance="srv1", mountpoint="/"}[1m], 86400) < 1
  FOR 1m
  ANNOTATIONS {
    summary = "No more free disk space tomorrow on {{ $labels.instance }}",
    description = "{{ $labels.instance }} will run out of space (current value: {{ $value }})",
  }
6. CONCLUSIONS
Time-series databases facilitate predictive forecasting, which has long been a goal for reliability engineers. With a service-oriented architecture, a solution such as Prometheus can be used to automate the reaction of the system to certain predictions or to new data insights. We can create alerting rules and have them routed by an alert manager to message brokers such as ActiveMQ or Redis. Or we can create our own reactive manager that has more complex rules and functions than Prometheus offers. Monitoring services, part of the reactive applications (see Figure 5), which subscribe to the notification stream, can then execute explicit instructions for specific events.
Prometheus comes with many useful functions
that process the time-series. However, they are limited
to simple arithmetic and logic operations. For more
complex use cases, Machine Learning can be used to
classify time-series events based on historical data.
Such a solution is TensorFlow, an open source software library for numerical computation and machine learning. Given multiple time series that have causal connections, we can use TensorFlow to train logistic regression models to identify (classify) events that impact the performance of the applications. For instance, the system can be trained to know that an increase in memory usage over time by a certain application signals a memory leak. Of course, in this case we could also use much simpler arithmetic operations based on certain thresholds (a sketch of such a rule is given below), but machine learning allows more precision and makes it easier to avoid false alarms. Also, once trained, the system will be able to dynamically classify based on patterns without human intervention.
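As an illustration of the simpler, threshold-based alternative (the metric name, job label and durations are hypothetical), a Prometheus alerting rule could flag a process whose memory usage keeps growing by applying deriv() to a memory gauge and requiring the upward trend to persist:

ALERT PossibleMemoryLeak
  IF deriv(app_memory_usage_bytes{job="gateway-app"}[30m]) > 0
  FOR 2h
  ANNOTATIONS {
    summary = "Memory usage of {{ $labels.job }} on {{ $labels.instance }} has been growing for two hours",
  }

A trained classifier, by contrast, could weigh several such series together and distinguish a genuine leak from a legitimate workload increase.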
Figure 5 - The architecture for automatic monitoring
REFERENCES
1. Balani, Naveen. Enterprise IoT: A Definitive Handbook. ISBN
1518790860.
2. Acatech. NATIONAL ACADEMY OF SCIENCE AND
ENGINEERING. 2016.
3. Vermesan, Ovidiu and Friess, Peter. Internet of Things: Converging Technologies for Smart Environments and Integrated Ecosystems. s.l.: River Publishers. ISBN: 978-87-92982-73-5.
4. Juniper Research. Internet of things' connected devices to almost triple to over 38 billion units by 2020. [Online] http://www.juniperresearch.com/press/press-releases/iot-connected-devices-to-triple-to-38-bn-by-2020.
5. Boncea, Radu, Bacivarov, Ioan C. Security in Internet of Things: Mitigating the Top Vulnerabilities. Asigurarea Calităţii – Quality Assurance. January-March 2016, Vol. XXII, 85, pp. 11-17.
6. Prom eth eus - Monitoring system & time series database. [Online]
[Cited: 06 20, 2016.] https://prometheus.io.
7. Gorilla: A Fast, Scalable, In-Memory Time Series Database.
Tuomas Pelkonen, Scott Franklin, Paul Cavallaro, Qi Huang, Justin
Meza, Justin Teller, Kaushik Veeraraghavan. 2014-2015,
Proceedings of the VLDB Endowment, Vol. 8, pp. 1816 - 1827.
8. Gilchrist, Alasdair. The Technical and Business Innovators of the Industrial Internet. Industry 4.0. s.l.: Apress, pp. 33-64.
9. Mauro Andreolini, Marcello Pietri, Stefania Tosi, Riccardo Lancellotti. A Scalable Monitor for Large Systems. Cloud Computing and Services Sciences. 2015: Springer International Publishing, pp. 100-116.