A System Architecture for Monitoring the Reliability of IoT
Radu BONCEA* **, Ioan BACIVAROV**
*Romania Top Level Domain, National Institute for Research and Development
in Informatics-ICI Bucharest
** University Politehnica of Bucharest, Faculty of Electronics,
Telecommunications and Information Technology
* radu@rotld.ro, **bacivaro@euroqual.pub.ro
Abstract
The Internet of Things has gained momentum in recent years, supported by new technologies and
computing paradigms such as Cloud Computing and Service Oriented Architecture and an
increasing demand from the enterprise. With hundreds of billions of devices to be connected in the
near future, IoT will need new methods for addressing key challenges in security and reliability.
One particular challenge we will focus on is the ability of the system to prevent itself from failing
by continuously introspecting its own state and making decisions without human intervention. We will
demonstrate how this can be achieved using new time series databases and monitoring systems
such as Prometheus, InfluxDB, OpenTSDB and Graphite. By logging performance and other
transaction metrics, the system can use specific algorithms to predict potential issues and react.
We will then show how machine-learning algorithms could be used to reveal new insights,
patterns and relationships across data.
Keywords: IoT, monitoring, reliability, self-management, time series, automation, Prometheus,
OpenTSDB, InfluxDB
1. INTRODUCTION
The Internet of Things is a vision in which every object in the world has the potential to connect to the Internet and provide its data, so that actionable insights can be derived on its own or through other connected objects [1].
With support from Do It Yourself communities,
IoT has emerged as a key enabling technology for the
4th Industrial Revolution [2] along with Internet of
Services, Cloud Computing, Machine-to-Machine,
RFID, Cyber-Physical Systems, Autonomic Systems,
Systems of Systems, Robotics, Software Agents,
Cooperating Objects [3] and Machine Learning.
Recent studies estimate that the number of IoT devices connected to the Internet will reach 38.5 billion by 2020 [4].
The classic methods of monitoring application
performance rely on tools such as Nagios, Cacti or
Zabbix to log application metrics and a lot of human
engineering to interpret these metrics and make
appropriate decisions. Because applications run on multiple layers, system administrators, network administrators and application developers each do the monitoring at regular intervals.
Cloud computing was the first technology to challenge this model, with its increased number of applications and services that need to be monitored, from infrastructure components (servers, routers, storage) to cloud computing services and the user experience. Cloud computing vendors like VMware or Microsoft have integrated active monitoring into centralized management and analytics solutions. In cloud computing, the computational resources are monitored at both the physical and virtual layers using agents such as the VMware vSphere Hypervisor, which reports metrics of the physical machine or host, and the VMware View Agent, which addresses virtual machine metrics, as pictured in Fig. 1. All metric values are pulled from agents and pushed to a time-series database to which an analytics platform has access.
Figure 1 - Monitoring computational resources in Cloud Computing
The IoT ecosystem is composed of four layers (Fig. 2): the Edge, where IoT devices are located; the Gateway, where sensor data is initially stored, filtered and curated; the Cloud Platform, where data is enriched and processed by analytics tools; and the Presentation layer, where data-centric business services are offered to end users [5].
Figure 2 - IoT generic architecture
The monitoring of the applications in the Presentation layer is based on the classic model, with Nagios, Zabbix and Cacti largely used. The Cloud Platform monitoring is done using vendor-oriented solutions such as VMware vSphere or cloud operating systems such as OpenStack.
Because the IoT devices are generally resource-
constrained, the monitoring is done at the Gateway
layer along with the monitoring of other Gateway
applications. There is also a challenge regarding the
number of devices per gateway and the number of
gateways in a typical IoT ecosystem. In plant and soil monitoring over a large area, for example, there would be tens of thousands of sensors and hundreds of gateways deployed in a star-of-stars topology. Manually monitoring the performance and
reliability of so many devices would be expensive and
inefficient. The system must be able to monitor itself
and react, either by sending alerts or executing series
of operations. In this paper we will discuss the
monitoring of the devices at the Edge and the
applications deployed on gateways using prediction
models and trends based on time-series data.
2. TIME-SERIES DATABASES
There are specific non-functional requirements
for IoT time-series databases that are deployable on
gateways:
• labeling and tagging data points is a must due
to the large variety of devices;
• labels should be indexable, so that filtering by a specific tag or label can be done at the database-engine level;
• high-resolution data points;
• the engine should be optimized for intensive writes, with almost no updates and with deletes done in bulk;
• compressed storage;
• support for service integration through an HTTP API, as the gateways would accommodate a service-oriented architecture with a plethora of microservices deployed (a sample labeled data point is sketched after this list).
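As an illustration only (the metric and label names are hypothetical), a labeled data point exposed by a gateway service over HTTP in the Prometheus text exposition format could look like the line below, with labels carrying the device identity so that the engine can index and filter by them:

soil_moisture_percent{device="sensor-0042",field="north",gateway="gw-07"} 31.5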
One observation worth noting is that there is no requirement for long-term retention of data. This is because the gateways push data further to the Cloud Platform, where the historical data is analyzed in a greater context. The data at the gateway is used only for near real-time analysis.
We will analyze four modern monitoring systems that implement the requirements described above: Prometheus, InfluxData, OpenTSDB and Graphite.
2.1. Prometheus
Prometheus is an open-source systems monitoring
and alerting toolkit, using LevelDB as a time-series
database (TSDB) and featuring:
• a multi-dimensional data model with support
for labels;
• a flexible query language that lets the user aggregate time series data in real time;
• no reliance on distributed storage; single
server nodes are autonomous;
• time series collection happens via a pull
model over HTTP;
• pushing time series is supported via an
intermediary gateway;
• targets are discovered via service discovery
or static configuration;
• multiple modes of graphing and
dashboarding support;
• HTTP API;
• alert manager.
2.2. InfluxData
InfluxData provides a robust, open source and
fully customizable time-series data management
platform. It uses InfluxDB for storing metrics and IoT
sensor data. It features:
• support for labels and data annotations, but unlike Prometheus, InfluxDB attaches the metadata to each event/row, thus increasing the overall overhead and disk space required;
• high availability with InfluxDB Relay;
• expressive SQL-like query language;
• continuous queries automatically compute
aggregate data to make frequent queries more
efficient;
• it implements the push model, where agents send the metrics to InfluxDB (a minimal push example is sketched after this list);
• downsampling and resolution adjustment
over time;
• HTTP API.
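As a minimal sketch of this push model (the database name, measurement and tags are hypothetical, and the agent is assumed to reach InfluxDB on its default HTTP port 8086), an agent on the gateway could write a tagged data point through the HTTP API with a request similar to:

$ curl -XPOST 'http://localhost:8086/write?db=iot' \
    --data-binary 'soil_moisture,device=sensor-0042,gateway=gw-07 value=31.5'

Because the tags travel with every written point, this also illustrates why per-row metadata increases the overhead noted above.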
2.3. OpenTSDB
OpenTSDB is a time-series database running on
top of Hadoop and HBase, designed specifically for
long retention of raw data and greater scalability. It
features:
• millisecond resolution;
• HTTP API;
• variable-length encoding, which uses less storage space for smaller integer values;
• support for both synchronous and asynchronous writes;
• support for labels, annotations and metadata.
2.4. Graphite
Graphite is an enterprise-scale monitoring system
composed of a daemon listening for time-series data, a
fixed-size database similar to RRD (round-robin-
database) and a dashboard-like web application. It
features:
• long-term retention, but at the expense of storage efficiency;
• multi-archive storage;
• average-like aggregation with functions such
as average, sum, min, max, last;
• support for labels, annotations and metadata.
InfluxDB, OpenTSDB and Graphite are passive databases, in the sense that the agents push metrics to the database's interface, while Prometheus adopts a pull model, "scraping" metrics from applications. Another major difference is that Prometheus has built-in aggregation functions and an alert manager subsystem. In this regard, Prometheus is a full monitoring and trending system that includes built-in and active scraping, storing, querying, graphing, and alerting based on time series data.
If we take into consideration that the gateways are relatively light computational devices, at least when compared with cloud computing performance, we note that Prometheus has an edge over the competition:
• InfluxDB and Graphite require more storage and have limited aggregation functions, functions which would otherwise have to be implemented on the client side, consequently requiring more computational resources;
• OpenTSDB storage is implemented on top of
Hadoop and HBase, requiring the complex
deployment of a cluster with multiple nodes
from the beginning.
Thus, we will focus on Prometheus, as it supports greater autonomy and performs well in resource-constrained environments.
3. DEPLOYING PROMETHEUS ON GATEWAYS
Prometheus consists of multiple components,
some of them optional:
• the Prometheus server scrapes and stores the
time-series data; it supports a query language
which allows for a wide range of operations
including aggregation, slicing and dicing,
prediction and joins;
• the push gateway allows ephemeral and batch
jobs to expose their metrics to Prometheus;
since these kinds of jobs may not exist long
enough to be scraped, they can instead push
their metrics to a push gateway;
• a browser-based dashboard builder based on
Rails/SQL;
• a large variety of special-purpose exporters; an exporter is basically an HTTP resource, identified by a URL, which contains metrics (key, tags and values) in a specific format;
• an alert manager, which takes care of de-duplicating, grouping and silencing alerts, and routing them to the correct receiver integration such as email, PagerDuty or OpsGenie.
Figure 3 – Prometheus overall architecture.
Source: www.prometheus.io
Prometheus can be compiled from source, or precompiled binaries for common operating systems can be downloaded and installed. Docker images are available as well.
There are two ways of telling Prometheus what targets to use (data scraping locations): either file-based local configuration, if a high level of autonomy is desired, or solutions that support service discovery, such as Kubernetes and Consul.io, for centralized system architectures.
When deploying Prometheus at the gateway level, the first target we should consider is the gateway itself (in IoT we associate a gateway with single-board computers like the Raspberry Pi). To achieve this, we can use the Prometheus node exporter to expose thousands of different types of metrics specific to machines running a Unix-like OS. These metrics cover statistics about CPU, diskstats, conntrack, available entropy, file
descriptors, network, hardware devices, virtual
devices, vmstat, interrupts, network connections, etc.
The node exporter can be started as a background process as below:
$ nohup node_exporter <flags> &
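Once it is running, a quick way to verify that metrics are being exposed, before wiring it into Prometheus, is to query the exporter's HTTP endpoint directly, for example:

$ curl -s http://localhost:9100/metrics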
Then we can add the target to the Prometheus configuration in YAML format, specifying the target URL, the job's name and the scrape interval. By default the node exporter listens on port 9100.
scrape_configs:
- job_name: "node"
scrape_interval: "15s"
target_groups:
- targets: ['localhost:9100']
The next step would be to start the Prometheus server. One way to do it is to start it as a subsystem by placing a script similar to the one below in /etc/init.d.
Figure 4 - Example of script for starting Prometheus
There are several arguments Prometheus accepts that are very important when considering deployment on gateways (a combined invocation is sketched after this list):
• storage.local.chunk-encoding-version: the type 1 encoding allows faster random access at the expense of storage (3 bytes per sample); type 2 has better compression (1.3 bytes) but causes more CPU usage and increased query latency.
• storage.local.retention: measured in hours, it
allows you to configure the retention time for
samples. Because the gateway is used to push
curated data upstream, this parameter should
have small values, like 30 days or less.
• storage.local.memory-chunks: there should
be 3 memory chunks per series.
• storage.local.series-file-shrink-ratio: a greater value minimizes rewrites, but at the cost of more disk space.
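A combined invocation on a gateway might look like the sketch below; the values are purely illustrative and would have to be tuned to the gateway's memory, storage and retention needs:

$ nohup prometheus \
    -config.file=/etc/prometheus/prometheus.yml \
    -storage.local.retention=720h \
    -storage.local.chunk-encoding-version=2 \
    -storage.local.memory-chunks=262144 \
    -storage.local.series-file-shrink-ratio=0.3 &

Here the retention is kept to 30 days (720 hours) and the type 2 chunk encoding is chosen to favour storage over CPU, in line with the trade-offs listed above.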
4. QUERYING
Prometheus provides a functional expression
language that lets the user select and aggregate time
series data in real time. The result of an expression can
either be shown as a graph, viewed as tabular data in
Prometheus's expression browser, or consumed by
external systems via the HTTP API.
There are four data types in Prometheus
expression language (PromQL):
• instant vector - a set of time series containing
a single sample for each time series, all
sharing the same timestamp;
• range vector - a set of time series containing
a range of data points over time for each time
series;
• scalar - a simple numeric floating point
value;
• string - a simple string value; currently
unused.
Besides arithmetic, comparison and logical operators, PromQL supports:
• vector matching: operations between vectors attempt to find a matching element in the right-hand-side vector for each entry in the left-hand side;
• aggregation operators like sum, min, max, avg, stddev (standard deviation over dimensions), stdvar (standard variance over dimensions), count, bottomk (smallest k elements by sample value), topk (largest k elements by sample value), count_values (count number of elements with the same value).
An instant vector can be obtained by simply
calling the metric name. For instance, the node
exporter has a metric called
process_cpu_seconds_total which is a counter telling
us the total user and system CPU time spent in
seconds. The instant vector is
process_cpu_seconds_total.
A range vector works like an instant vector,
except that it selects a range of samples back from the
current instant. The range duration can be appended in
square brackets to the end of the vector name. For
instance, at a scrape interval of 15 seconds,
process_cpu_seconds_total[1m] will return 4 values
recorded in the last 1 minute.
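As a brief example, the following expression applies rate() to the same counter and would return the per-second average rate of increase over the last minute, effectively the CPU usage of the process:

rate(process_cpu_seconds_total[1m])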
Prometheus also comes with more than 30 built-in functions that operate on vectors (Table 1).
Function name and arguments Description
abs(v vector) returns the input vector with all sample values converted to their absolute value
absent(v vector) returns an empty vector if the vector passed to it has any elements and a 1-
element vector with the value 1 if the vector passed to it has no elements
ceil(v instant-vector) rounds the sample values of all elements in v up to the nearest integer
changes(v range-vector) for each input time series, the function returns the number of times its value has
changed within the provided time range as an instant vector
clamp_max(v instant-vector, max scalar) clamps the sample values of all elements in v to have an upper limit of max
clamp_min(v instant-vector, min scalar) clamps the sample values of all elements in v to have a lower limit of min
count_scalar(v instant-vector) returns the number of elements in a time series vector as a scalar
delta(v range-vector) calculates the difference between the first and last value of each time series
element in a range vector v
deriv(v range-vector) calculates the per-second derivative of the time series in a range vector v, using
simple linear regression
drop_common_labels(instant-vector) drops all labels that have the same name and value across all series in the input
vector
exp(v instant-vector) calculates the exponential function for all elements in v
floor(v instant-vector) rounds the sample values of all elements in v down to the nearest integer
histogram_quantile(φ float, b instant-vector) calculates the φ-quantile (0 ≤ φ ≤ 1) from the buckets b of a histogram
holt_winters(v range-vector, sf scalar, tf scalar) produces a smoothed value for time series based on the range in v
increase(v range-vector) calculates the increase in the time series in the range vector
irate(v range-vector) calculates the per-second instant rate of increase of the time series in the range
vector
ln(v instant-vector) calculates the natural logarithm for all elements in v
log2(v instant-vector) calculates the binary logarithm for all elements in v
log10(v instant-vector) calculates the decimal logarithm for all elements in v
predict_linear(v range-vector, t scalar) predicts the value of time series t seconds from now, based on the range vector v,
using simple linear regression
rate(v range-vector) calculates the per-second average rate of increase of the time series in the range
vector
resets(v range-vector) returns the number of counter resets within the provided time range as an instant
vector
round(v instant-vector, to_nearest=1 scalar) rounds the sample values of all elements in v to the nearest integer
scalar(v instant-vector) returns the sample value of that single element as a scalar
sort(v instant-vector) returns vector elements sorted by their sample values, in ascending order
sort_desc(v instant-vector) returns vector elements sorted by their sample values, in descending order
sqrt(v instant-vector) calculates the square root of all elements in v
avg|min|max|sum|count_over_time(v range-vector) the average|minimum|maximum|sum|count value of all points in the specified interval
Table 1 - Prometheus built-in functions
To demonstrate the usage of these functions, let us consider this example: we want to predict how much disk space we will have 1 day from now on the root filesystem partition, mount point "/", on the machine identified by the label instance="serv1". The Prometheus function that does that is predict_linear, which accepts as arguments a range vector (we will take ranges of 1 minute) and a scalar for the interval in seconds.
The PromQL query for our use case is:
predict_linear(node_filesystem_avail{instance="
serv1",mountpoint="/"}[1m],86400)
For frequent and computationally expensive queries, Prometheus can precompute results and save them as new time series based on explicit recording rules, as in the following example:
job:http_inprogress_requests:sum = sum(http_inprogress_requests)
by (job)
Here, the recording rule is evaluated at the
interval specified by the evaluation_interval field in
the Prometheus configuration. During each evaluation
cycle, the right-hand-side expression of the rule
statement is evaluated at the current instant in time
and the resulting sample vector is stored as a new set
of time series with the current timestamp and a new
metric name (job:http_inprogress_requests:sum).
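As a sketch of how this might be wired up (the file path and the interval value are illustrative), such a rule would typically be saved in a rules file that is referenced from the Prometheus configuration, where evaluation_interval controls how often it is recomputed:

global:
  evaluation_interval: 30s

rule_files:
  - "/etc/prometheus/recording.rules"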
5. ALERTS
Alerting with Prometheus is separated into two
parts. Alerting rules in Prometheus servers send alerts
to an Alertmanager. The Alertmanager then manages
those alerts, including silencing, inhibition,
aggregation and sending out notifications via methods
such as email, PagerDuty and HipChat.
The Alertmanager can be started similarly to Prometheus:
$ nohup alertmanager -config.file=config.yml &
The configuration file holds information about the notification integrations (e.g. email, hipchat, slack, webhook, pagerduty, pushover), routing rules and inhibition rules.
A route block defines a node in a routing tree and
its children. Its optional configuration parameters are
inherited from its parent node if not set. That way,
when an alert enters the tree at the configuration top-
level route, it will traverse the child nodes until it
“hits” a matching node and consequently a
notification is fired.
An inhibition rule is a rule that mutes an alert
matching a set of matchers under the condition that an
alert exists that matches another set of matchers. Both
alerts must have a set of equal labels.
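A minimal Alertmanager configuration illustrating both concepts might look like the sketch below (receiver names, addresses, the webhook URL and the label matchers are hypothetical, and global SMTP settings are omitted): the top-level route sends everything to an e-mail receiver, a child route overrides the receiver for critical alerts, and the inhibition rule mutes warnings on an instance for which a critical alert is already firing.

route:
  receiver: gateway-admins
  routes:
    - match:
        severity: critical
      receiver: gateway-webhook

receivers:
  - name: gateway-admins
    email_configs:
      - to: admins@example.org
  - name: gateway-webhook
    webhook_configs:
      - url: http://localhost:5001/notify

inhibit_rules:
  - source_match:
      severity: critical
    target_match:
      severity: warning
    equal: ['instance']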
The alerting rules are defined similarly to recording rules and are reloaded by Prometheus when it receives a SIGHUP signal. A rule has the following syntax:
ALERT <alert name>
IF <expression>
[ FOR <duration> ]
[ LABELS <label set>]
[ ANNOTATIONS <label set>]
The optional FOR clause causes Prometheus to
wait for a certain duration between first encountering
a new expression output vector element and counting
an alert as firing for this element.
The LABELS clause allows specifying a set of
additional labels to be attached to the alert.
The ANNOTATIONS clause specifies another set of labels that are not identifying for an alert instance. They are used to store longer additional information, such as alert descriptions.
In our example with the prediction of the disk space available tomorrow, we want to create an alert that would be fired (sent to the Alertmanager for dispatching) if we will run out of free space tomorrow. The rule is based on predict_linear, as shown below:
ALERT WeWillRunOutOfSpace
  IF predict_linear(node_filesystem_avail{instance="srv1", mountpoint="/"}[1m], 86400) < 1
  FOR 1m
  ANNOTATIONS {
    summary = "No more free disk space tomorrow on {{ $labels.instance }}",
    description = "{{ $labels.instance }} will run out of space (current value: {{ $value }})",
  }
6. CONCLUSIONS
Time-series databases facilitate predictive forecasting, which has long been a goal for reliability engineers. With a service-oriented architecture, a solution such as Prometheus can be used to automate the reaction of the system to certain predictions or to new data insights. We can create alerting rules and have them routed by an alert manager to message brokers such as ActiveMQ or Redis. Or we can create our own reactive manager that has more complex rules and functions than Prometheus offers. Monitoring services, part of the reactive applications (see Figure 5), which subscribe to the notification stream, can then execute explicit instructions for specific events.
Prometheus comes with many useful functions
that process the time-series. However, they are limited
to simple arithmetic and logic operations. For more
complex use cases, Machine Learning can be used to
classify time-series events based on historical data.
Such a solution is TensorFlow, an open source software library for numerical computation and machine learning. Given multiple time series that have causal connections, we can use TensorFlow to train logistic regression models to identify (classify) events that impact the performance of the applications. For instance, the system can be trained to know that an increase in memory usage over time by a certain application signals a memory leak. Of course, in this case we could also use much simpler arithmetic operations based on certain thresholds (a sketch of such a rule is given below), but machine learning allows more precision and makes it easier to avoid false alarms. Also, once trained, the system will be able to dynamically classify based on patterns without human intervention.
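As an illustration of the simpler, threshold-based alternative (the metric name, job label and durations are hypothetical), a Prometheus alerting rule could flag a process whose memory usage keeps growing by applying deriv() to a memory gauge and requiring the upward trend to persist:

ALERT PossibleMemoryLeak
  IF deriv(app_memory_usage_bytes{job="gateway-app"}[30m]) > 0
  FOR 2h
  ANNOTATIONS {
    summary = "Memory usage of {{ $labels.job }} on {{ $labels.instance }} has been growing for two hours",
  }

A trained classifier, by contrast, could weigh several such series together and distinguish a genuine leak from a legitimate workload increase.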
Figure 5 - The architecture for automatic monitoring
REFERENCES
1. Balani, Naveen. Enterprise IoT: A Definitive Handbook. ISBN
1518790860.
2. Acatech. NATIONAL ACADEMY OF SCIENCE AND
ENGINEERING. 2016.
3. Vermesan, Ovidiu and Friess, Peter. Internet of Things: Converging Technologies for Smart Environments and Integrated Ecosystems. s.l.: River Publishers. ISBN: 978-87-92982-73-5.
4. Juniper Research. Internet of things' connected devices to almost triple to over 38 billion units by 2020. [Online] http://www.juniperresearch.com/press/press-releases/iot-connected-devices-to-triple-to-38-bn-by-2020.
5. Boncea, Radu, Bacivarov, Ioan C. Security in Internet of Things: Mitigating the Top Vulnerabilities. Asigurarea Calităţii – Quality Assurance. January-March 2016, Vol. XXII, 85, pp. 11-17.
6. Prom eth eus - Monitoring system & time series database. [Online]
[Cited: 06 20, 2016.] https://prometheus.io.
7. Gorilla: A Fast, Scalable, In-Memory Time Series Database.
Tuomas Pelkonen, Scott Franklin, Paul Cavallaro, Qi Huang, Justin
Meza, Justin Teller, Kaushik Veeraraghavan. 2014-2015,
Proceedings of the VLDB Endowment, Vol. 8, pp. 1816 - 1827.
8. Gilchrist, Alasdair. The Technical and Business Innovators of the Industrial Internet. Industry 4.0. s.l.: Apress, pp. 33-64.
9. Mauro Andreolini, Marcello Pietri, Stefania Tosi, Riccardo Lancellotti. A Scalable Monitor for Large Systems. Cloud Computing and Services Sciences. 2015: Springer International Publishing, pp. 100-116.