Fault-Tolerant IoT: A Systematic Mapping Study

Mahyar Tourchi Moghaddam and Henry Muccini
University of L’Aquila, Via Vetoio 1, L’Aquila, Italy
{mahtou,henry.muccini}@univaq.it
Abstract. Failures may occur at every architectural level of Internet
of Things (IoT) applications: sensor and actuator nodes can go missing,
network links can go down, and processing and storage components can
fail to perform properly. For this reason, fault-tolerance (FT) has
become a crucial concern for IoT systems.
Our study aims at identifying and classifying the existing FT mechanisms
that allow IoT systems to tolerate failures. Following a systematic
mapping study selection procedure, we picked out 60 papers from over
2300 candidate studies. We then applied a rigorous classification and
extraction framework to select and analyze the most influential
domain-related information. Our analysis revealed the following main
findings: (i) whilst researchers tend to study fault-tolerant IoT (FT-IoT)
at the cloud level only, several studies extend the application to fog and
edge computing; (ii) there is a growing scientific interest in using the
microservices architecture to address FT in IoT systems; (iii) the
distribution of IoT components, their collaboration, and the location of
intelligent elements impact system resiliency. This study provides a
foundation for classifying existing and future approaches to fault-tolerant
IoT, by classifying a set of methods, techniques and architectures that
can potentially reduce IoT system failures.
Keywords: Fault-tolerance · Internet of Things · Software architecture · Systematic mapping study
1 Introduction
IoT is the internal/external communication of intelligent elements via the Internet to
provide smart services [1]. A dependable IoT system should provide reliable and
fault-free services. A fault is a defect within a hardware or software system that
impairs its correct functionality. It is particularly difficult to establish a pattern
for FT in IoT, since IoT devices are heterogeneous, highly distributed,
battery-powered, reliant on wireless communication, and affected by scalability.
The distribution of IoT devices causes the system to suffer from, e.g., server crashes,
server omissions, incorrect responses and arbitrary failures. Wireless and battery
dependency makes IoT devices barely recoverable. Furthermore, being
exposed to new devices and services impacts system performance.
© Springer Nature Switzerland AG 2019
R. Calinescu and F. Di Giandomenico (Eds.): SERENE 2019, LNCS 11732, pp. 67–84, 2019.
https://doi.org/10.1007/978-3-030-30856-8_5
Although the IoT was introduced more than a decade ago, the research
and industry communities are still trying to define its different aspects and
Quality of Service (QoS) attributes such as FT. Hence, the goal of this research is to identify
and classify the state of the art of the domain and to highlight the methods, techniques
and architectures that are potentially suitable to model an FT-IoT system. In order to
achieve this goal, a systematic mapping study has been performed. The primary
studies have been chosen based on accurate inclusion and exclusion criteria
and a deep analysis. The main contributions of this study are: (i) providing
an up-to-date classification of the state of the art in fault-tolerant IoT modeling, which can
be used as a reference for future research and implementation; (ii) investigating
an IoT reference architecture and assessing the impact of such a software design
on FT; (iii) identifying current characteristics, challenges and publication trends
with respect to the FT-IoT approach.
The audience of this study comprises both the research and industry communities
interested in improving their knowledge and selecting suitable methods to design their IoT
systems.
The paper is organized as follows. Section 2 describes the design of this systematic
study. Section 3 presents a reference IoT architecture and analyzes its associated
FT aspects. Sections 4, 5, 6, 7 and 8 elaborate on the obtained results, while Sect. 9
analyses threats to validity. Section 10 closes the paper and discusses future work.
2 Research Method
The goal of this research is formulated based on the Goal-Question-Metric
perspectives [2,3] as follows:
Purpose: to provide a deep understanding of fault-tolerant IoT systems
Issue: by identifying, classifying and analyzing different methods, techniques
and architectures
Object: based on existing approaches to IoT systems
Viewpoint: from both research and industry viewpoints.
2.1 Search Strategy
To achieve the aforementioned goal, we formulated the following research questions:
RQ1: What IoT architectural styles and patterns are able to make the system
resilient to faults?
RQ2: What traditional and novel techniques and methods can protect IoT
systems against failure?
RQ3: What are the quality attributes associated with fault-tolerance in IoT
systems?
RQ4: What are the trends and evolution that can be deduced from the scientific
publications on FT-IoT?
Furthermore, a good search strategy should provide effective solutions to the
following questions [4]:
Which approaches? The search strategy consists of two phases: (i) an
automatic search on academic databases; and (ii) snowballing. The first step
has been performed using the search string below. The selection criteria have been
subsequently applied to the set of results. Then a snowballing procedure has been
applied to the included results of the automatic search to obtain the final
set of primary studies.
(IoT OR “Internet of Things” OR “Internet-of-Things”) AND (“Fault tolerant”
OR “Fault-tolerant” OR “Fault tolerance” OR “Fault-tolerance”)
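To make the Boolean structure of the search string concrete, the following minimal Python sketch checks whether a title or abstract satisfies it. The term groups are copied from the string above; the function names and the case-insensitive whole-word matching are assumptions made for illustration, since each digital library applies its own matching rules.

```python
import re

# Term groups copied from the search string; the matching logic is an assumption.
IOT_TERMS = ("IoT", "Internet of Things", "Internet-of-Things")
FT_TERMS = ("Fault tolerant", "Fault-tolerant", "Fault tolerance", "Fault-tolerance")


def contains_any(text: str, terms) -> bool:
    """Return True if any term occurs in text as a whole word or phrase (case-insensitive)."""
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE)
        for term in terms
    )


def matches_search_string(text: str) -> bool:
    """Evaluate (IoT terms) AND (fault-tolerance terms) on a piece of text."""
    return contains_any(text, IOT_TERMS) and contains_any(text, FT_TERMS)


if __name__ == "__main__":
    print(matches_search_string("A fault-tolerant routing protocol for IoT networks"))
```

A record is retained by this sketch only when at least one term from each group is present, which mirrors the AND between the two parenthesized groups.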
Where to search? The electronic databases that we used for the automatic
search (ACM, IEEE, Elsevier, Springer, ISI Web of Science, and Wiley InterScience)
are known as the main sources of potentially relevant literature
on software engineering.
When and what time span to search? We did not consider publication
year as a criterion for the search and selection steps. Thus, all studies
coming from the selection steps, until May 2019, were included regardless of their
publication time.
2.2 Selection Strategy
A multi-stage selection process (Fig. 1) has been designed to give full control
over the number and characteristics of the studies coming from the different stages
(see footnote 1).
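[Figure 1: flow diagram. Initial search on ACM Digital Library, IEEE Xplore, Springer, ISI Web of Science, Wiley InterScience and Science Direct (per-database result counts shown in the figure: 236, 334, 205, 228, 345 and 1288), followed by merge and duplicates removal (2374 studies), selection criteria application (54 studies) and snowballing (total: 60 studies).]
Fig. 1. Search and selection process.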
1 It is worth mentioning that we considered “Software Engineering” as the Search
Topic, since the original search led to 193,000 results.
Afterwards, we considered all the collected studies and filtered them according
to a set of well-defined inclusion and exclusion criteria (Table 1). In line with
established standards, the definition of the inclusion/exclusion criteria has been guided by two
main drivers: (i) keeping the focus of the selected papers on the scope of the study;
and (ii) avoiding gray or non-scientific works. The inclusion/exclusion criteria
are therefore aligned with the research questions. We included studies that satisfied all
inclusion criteria, and discarded studies that met any exclusion criterion.
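The decision rule stated above (all inclusion criteria satisfied, no exclusion criterion met) can be sketched as follows. The `Study` record and its field names are hypothetical; they merely paraphrase the criteria of Table 1 and are not the data model used in the replication package.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical record of one candidate study; fields paraphrase Table 1."""
    title: str
    proposes_ft_iot_solution: bool      # inclusion criterion (i)
    peer_reviewed: bool                 # inclusion criterion (ii)
    english_full_text: bool             # inclusion criterion (iii)
    lacks_ft_or_iot_focus: bool = False   # exclusion criterion (i)
    secondary_or_tertiary: bool = False   # exclusion criterion (ii)
    tutorial_or_editorial: bool = False   # exclusion criterion (iii)


def is_selected(study: Study) -> bool:
    """Keep a study only if it meets ALL inclusion and NO exclusion criteria."""
    inclusion = (
        study.proposes_ft_iot_solution,
        study.peer_reviewed,
        study.english_full_text,
    )
    exclusion = (
        study.lacks_ft_or_iot_focus,
        study.secondary_or_tertiary,
        study.tutorial_or_editorial,
    )
    return all(inclusion) and not any(exclusion)
```

For instance, a peer-reviewed survey on FT-IoT would satisfy the three inclusion criteria but would still be discarded, because it meets the exclusion criterion on secondary studies.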
On the 2,374 potentially relevant papers, we performed a first manual step,
applying the selection criteria to the title and abstract of each paper. Afterwards,
a second manual step of reading the full text of the initially selected papers was
performed, followed by snowballing. We obtained only
60 primary studies out of 2,374 potentially relevant papers because: (i) our search
string was quite inclusive (to avoid missing any potentially relevant paper); and (ii)
the selection criteria were applied carefully, so as to
exclude papers that fall outside the scope of the research. In order to
minimize bias, the procedure was performed by the first researcher and the
results were double-checked by the other researcher.
Table 1. Inclusion and exclusion criteria.
Inclusion criteria:
(i) Studies that propose, leverage, or analyze software and hardware solutions, methods, techniques and architectures to design fault-tolerant IoT systems
(ii) Studies subject to peer review (e.g., journal papers, papers published as part of conference proceedings, workshop papers, and book chapters)
(iii) Studies written in English language and available in full-text
Exclusion criteria:
(i) Studies that, while focusing on IoT, do not focus on its fault-tolerance aspects (e.g., studies focusing only on technological aspects of IoT) or vice versa
(ii) Secondary or tertiary studies (e.g., systematic literature reviews, surveys, etc.)
(iii) Studies in the form of tutorial papers, editorials, etc. because they do not provide enough information
After selection of a final set of primary studies, the data has been extracted
to answer the research questions.
Study Replicability. A replication package is provided to tackle the page limits
of a workshop paper:
https://www.dropbox.com/s/ansb75ncdoqpc9f/DATA-SERENE-2019.xlsx?dl=0
The package is available as an Excel file with different
sheets that include all necessary information such as search results, primary
studies distribution, data extraction and validity examination.
3 Background on IoT Architectures
In this section, we present a reference software architecture for Internet of
Things applications [5–7].
[Figure 2: diagram showing a CLOUD node, two FOG nodes and several MPU nodes.]
Fig. 2. IoT reference architecture (MPU refers to microprocessor unit).
IoT applications typically consist of a set of software
components including perception, data processing and storage (P&S) and actuation,
which are distributed across network(s). Since this paper
focuses on fault-tolerant data transmission and analysis, we define our
architecture based on the following P&S modeling characteristics:
Distribution: this aspect specifies whether data analysis software ought to be
deployed on a single node or on several nodes that are distributed across the
IoT system. In other words, distribution refers to the deployment of
the IoT P&S software onto hardware. With a distributed style, latency
is potentially reduced because data traffic and bandwidth consumption
are minimized. Such a rapid response time facilitates real-time and fault-tolerant
IoT applications. Furthermore, in distributed systems, a faulty P&S component
does not make the IoT system unavailable, since the faulty component can be replaced by
another one.
Localization: depending on data size and required analysis complexity, P&S
can be executed locally or remotely. This is the point at which the centralized
cloud and the distributed edge and fog concepts become relevant. The advantage
of using a central cloud is that processing on a cloud component facilitates
long-term data analysis for systems that have no constraints on response time.
For applications with massive P&S requirements, executing the task on the
powerful cloud is the only solution.
Fog nodes are intermediate P&S components, which bring a degree of cloud
functionality to the network edge. Fog is not tied to a particular device,
so it can be freely located between the device edge and the cloud. The analysis
capacity of fog is lower than that of the cloud, but fog removes a significant point of failure
by spreading computation over more than one component. However, fog
only operates locally, so it does not have global coverage of a major
IoT system. It is worth mentioning that some IoT devices are able to perform
simple P&S by themselves. Performing P&S on the IoT device edge refers
to computation capabilities embedded in a smart device, which enable it to gather
and analyze environmental data.
Collaboration: the aforementioned computation components may interact to
form and empower IoT services. This collaboration may appear as a level of
information sharing, coordinated analysis and/or planning, or synchronized
actuation. Each IoT sensor network may provide data for many collaborative
P&S components, both locally and remotely. The advantage here is that if
the local P&S node fails, the local service remains accessible.
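The interplay of the three characteristics above can be illustrated with a minimal Python sketch in which a sensor reading is handed to an ordered list of collaborating P&S nodes (for example a local fog node, a neighbouring fog node, then the cloud). The class `PSNode`, the function `process_with_fallback` and the node names are hypothetical and only serve the illustration; they do not come from any primary study.

```python
class PSNode:
    """Hypothetical processing-and-storage (P&S) node: fog or cloud."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy

    def process(self, reading: float) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        return f"{self.name}:{reading}"


def process_with_fallback(reading: float, candidates) -> str:
    """Try the collaborating P&S nodes in order and return the first answer.

    The order encodes localization: local nodes first, remote nodes last.
    """
    for node in candidates:
        try:
            return node.process(reading)
        except ConnectionError:
            continue  # this node failed, fall back to the next collaborator
    raise RuntimeError("no P&S node available")
```

With a failed local fog node, the reading is served by the next fog node rather than being lost, which is the availability benefit attributed to distribution and collaboration.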
Considering the above definitions, we designed our reference IoT architecture
(Fig. 2). The architecture is composed of a physical layer and several P&S
layers. The physical layer is made up of two sub-layers, namely perception and
application. The perception sub-layer hosts a large number of heterogeneous
sensors and the application sub-layer consists of various types of actuators. The
P&S layers store and analyze data gathered by the perception components to
provide the required IoT service.
Each primary study addresses FT for specific
layer(s) of the IoT architecture. As shown in Fig. 3, whilst faults usually
occur in the sense (26/60) and actuation (12/60) sub-layers, the primary studies
recognized the importance of the network (38/60) and P&S (33/60) layers for FT-IoT
systems. The reason is that handling FT is the responsibility of the P&S
nodes and is based on the transmitted data coming from the physical layer. In
Sect. 5, we discuss various FT strategies and techniques for IoT systems.
[Figure 3: bar chart of the number of primary studies focusing on each architectural layer.]
Network (38): P1, P2, P3, P4, P5, P6, P9, P11, P12, P14, P15, P17, P20, P24, P26, P27, P28, P29, P30, P31, P33, P34, P40, P41, P42, P43, P44, P45, P46, P47, P48, P51, P54, P56, P57, P58, P59, P60
Processing and Storage (33): P1, P2, P3, P6, P8, P10, P11, P13, P16, P19, P21, P22, P23, P31, P33, P35, P36, P37, P38, P39, P40, P41, P44, P45, P48, P49, P50, P51, P53, P55, P56, P58, P59
Sense (26): P1, P4, P5, P6, P7, P9, P10, P12, P13, P14, P15, P16, P17, P19, P20, P28, P32, P35, P36, P42, P45, P50, P52, P55, P58, P59
Actuate (12): P1, P6, P7, P13, P19, P21, P25, P32, P35, P55, P57, P58
Fig. 3. The primary studies focus on each architectural layer.
4 Fault-Tolerant IoT Architectural Patterns and Styles (RQ1)
This section discusses the specific characteristics of the primary studies related to
FT-IoT architectural design. The primary studies used one or more overlaid
style(s) to design their software system. Among the various IoT architectural
styles, the layered architecture (32/60) was the clear winner, as reported in
Fig. 4. In the layered view, the system is seen as a complex heterogeneous entity
that can be decomposed into interacting parts. The primary studies designed
their layered architectures in different ways, ranging from 3 layers (with a central P&S
component only) to 5 layers (including edge and fog) (see Fig. 2).
The cloud-based architecture (28/60) ranked second. Fog, which is a
significant extension of the cloud environment, is addressed in 15 studies as well. Few
studies (4/60) used the device edge concept to design their FT-IoT architecture.
Minimizing the impact of a failed component within an integrated fog-cloud
platform requires a common agreement protocol that is able to bring the system
to a consistent state with a minimum number of message-exchange rounds.
[Figure 4: bar chart of the number of primary studies per architectural style and per architectural pattern.]
Architectural styles:
Layered (32): P2, P3, P6, P8, P13, P14, P16, P17, P22, P25, P28, P31, P33, P35, P36, P37, P39, P40, P42, P43, P44, P47, P48, P49, P50, P51, P53, P54, P55, P57, P58, P60
Cloud-based (28): P2, P3, P6, P7, P8, P12, P15, P16, P19, P21, P22, P23, P24, P27, P31, P33, P35, P39, P41, P44, P45, P48, P50, P51, P52, P53, P56, P57
Service-oriented (SOA) (9): P6, P8, P10, P11, P13, P23, P27, P32, P45
Microservices (4): P7, P21, P40, P56
Publish/Subscribe (1): P2
Architectural patterns:
Hybrid (34): P2, P3, P4, P5, P6, P8, P9, P11, P15, P16, P21, P22, P23, P24, P28, P29, P30, P31, P33, P34, P36, P40, P41, P42, P44, P45, P46, P47, P48, P50, P51, P52, P53, P58
Centralized (13): P7, P13, P14, P19, P25, P26, P27, P36, P37, P39, P55, P56, P57
Distributed Collaborative (12): P10, P17, P18, P20, P24, P32, P35, P38, P43, P49, P59, P60
Fig. 4. FT-IoT architectural styles and patterns.
Service-oriented architectures (SOA) (9/60) put the service at the centre
of their IoT application design. In fact, the core application component makes
the service available to other IoT components over a network. Microservices
(4/60) and SOA have the same goal in IoT systems, that is, building one or
multiple applications from a set of different services. A microservice is a small
application with a single responsibility, which can be deployed, scaled and tested
independently.
P21 proposes a pluggable framework based on a microservices architecture
that implements FT support as two complementary microservices: one that uses
complex event processing for real-time FT detection, and another that uses
online machine learning to detect fault patterns and preemptively mitigate faults
before they are activated. P7 proposes a system based on container virtualisation
that allows IoT clouds to carry out fault-tolerance when a microservice running
on an IoT device fails. P40 presents a reactive microservices architecture and its application
in a fog computing case study to investigate FT challenges at the edge of the
network. P56 presents a microservices-based mobile cloud platform
that exploits containerization, which replaces heavyweight virtual machines,
to guarantee run-time FT.
On the other hand, as explained in Sect. 3, IoT distribution patterns classify
the architectures according to edge intelligence and elements collaboration.
Figure 4 shows the distribution patterns that are used by the primary studies.
Most studies used a Hybrid pattern (34/60), followed by the Centralized
(13/60) and the Distributed Collaborative (12/60) patterns.
In this section we showed that edge/cloud-based distributed architectures
are extensively used by the primary studies. The results confirm that a distributed
architecture provides a rapid response time and high availability, and makes the
system more resilient to faults.
5 Fault-Tolerance Techniques for Resilient IoT (RQ2)
As shown in Fig. 5, the primary studies adopt various techniques to make their
IoT system fault-tolerant. These techniques are explained below.
5.1 Replication
Replication is the process of sharing the data between redundant IoT HW/SW
components. Replication guarantees the data consistency, so that failure of a
component will not result in system failure. The main replication schemes are
known as active and passive [8].
In the active replication scheme (22/60), processes are replicated on multiple
processors to provide fault-tolerance. In the IoT context, active replication continuously
pushes a group of IoT resources (such as fog or cloud) to execute the same
process concurrently. In case of a fault, failover to other active resources can
happen within a very short period [P33]. In this way, extra processing occurs and
a redundant, duplicated dataset is sent to the endpoint. Although active
replication takes a lot of processing resources, it is failure transparent and its failure
discovery time is deterministic.
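A minimal sketch of active replication, under the assumption that each replica can be modelled as a Python callable, is given below. All replicas execute the same request concurrently and the caller receives the first successful result in replica order, so a failed replica stays invisible to the caller. The function name `run_active` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_active(replicas, payload):
    """Execute the same request on every replica concurrently.

    Returns the first successful result in replica order; raises only if
    all replicas fail. Every replica does the full work, which is the
    processing overhead of the active scheme.
    """
    errors = []
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(replica, payload) for replica in replicas]
        for future in futures:
            try:
                return future.result()
            except Exception as exc:  # a failed replica is masked by the others
                errors.append(exc)
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")
```

Since every replica has already computed the answer, masking a failure costs no additional recovery step, at the price of the redundant processing noted above.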
In passive replication (24/60), the primary processor performs the processing and the extra
IoT components remain idle until a failure occurs. The idle components, however,
contact the primary processor in order to be updated and to stay consistent. The
passive replication scheme imposes an additional cost of resources and suffers from
a slow response to failure.
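The primary/backup behaviour described above can be sketched as follows. The classes `Replica` and `PassiveGroup` are hypothetical; the sketch assumes a simple key-value state and models the update of idle backups as a copy of the primary state after each write.

```python
class Replica:
    """Hypothetical replica holding a key-value state."""

    def __init__(self, name: str):
        self.name = name
        self.state = {}
        self.alive = True

    def apply(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.state[key] = value


class PassiveGroup:
    """Primary/backup group: replicas[0] is the primary, the rest stay idle."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    @property
    def primary(self):
        return self.replicas[0]

    def write(self, key, value):
        # Fail over until a live primary accepts the update.
        while self.replicas:
            try:
                self.primary.apply(key, value)
                break
            except ConnectionError:
                self.replicas.pop(0)  # drop the failed primary, promote the next backup
        else:
            raise RuntimeError("no replica left to serve the request")
        # Idle backups pull the primary's state to stay consistent.
        for backup in self.replicas[1:]:
            if backup.alive:
                backup.state = dict(self.primary.state)
```

The failure is only noticed when a request reaches the dead primary, which illustrates why this scheme reacts to failures more slowly than active replication.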
5.2 Network Control
In the network control scheme (19/60), the IoT network is generally divided into various
clusters. A chosen cluster head (CH) periodically sends roll call requests
to the other nodes, and if it does not receive a reply message, the failure is
confirmed. However, the CH itself constitutes a single point of failure. Several
cluster-based routing protocols have been proposed by the primary studies. Some
primary studies took advantage of a bio-inspired particle multi-swarm optimization
routing algorithm to construct, recover, and select disjoint paths that tolerate
failures while satisfying the quality of service parameters. Some other studies
used virtual CH formation and flow graph modeling to efficiently tolerate the
failures of CHs. The multiple traveling salesman problem is also among the routing
approaches that are addressed by the primary studies.
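The roll call performed by a cluster head can be sketched as below. The class `ClusterHead`, the `ping` callback and the `max_missed` parameter are hypothetical; with the default `max_missed=1`, a single missing reply confirms the failure, as in the description above. The sketch does not address the failure of the cluster head itself.

```python
class ClusterHead:
    """Hypothetical cluster head that detects failed members by roll call."""

    def __init__(self, member_ids, max_missed: int = 1):
        self.missed = {member: 0 for member in member_ids}
        self.max_missed = max_missed

    def roll_call(self, ping):
        """Poll every member once.

        `ping(member_id)` returns True when the member replies.
        Returns the set of members whose failure is confirmed.
        """
        for member in self.missed:
            if ping(member):
                self.missed[member] = 0
            else:
                self.missed[member] += 1
        return {m for m, count in self.missed.items() if count >= self.max_missed}
```

Raising `max_missed` trades detection speed for robustness against replies lost on an unreliable wireless link.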
5.3 Distributed Recovery Block
In this method (8/60), a single program is concurrently executed on a node
pair, of which one node is active and the other is inactive. In a no-fault situation,
the main (active) node performs the task and the other node performs the same
task in shadow. Afterwards, both results are tested, and if the test is
passed, the results associated with the main node are delivered as the output.
If the primary node test fails, the shadow node becomes active and produces the
outputs. This method can protect the system only against a single point of
failure.
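The following minimal sketch illustrates the distributed recovery block under simplifying assumptions: the two nodes are modelled as Python callables, they are invoked one after the other rather than concurrently on separate hardware, and the acceptance test is a user-supplied predicate. All names are hypothetical.

```python
def try_run(node, data):
    """Run one node and report (completed, result) instead of raising."""
    try:
        return True, node(data)
    except Exception:
        return False, None


def distributed_recovery_block(primary, shadow, acceptance_test, data):
    """Return (role, result) of the first node whose result passes the test.

    Both nodes compute the same task; the primary's result is preferred and
    the shadow takes over only when the primary fails the acceptance test.
    """
    outcomes = [
        ("primary", try_run(primary, data)),
        ("shadow", try_run(shadow, data)),
    ]
    for role, (completed, result) in outcomes:
        if completed and acceptance_test(result):
            return role, result
    raise RuntimeError("both nodes failed the acceptance test")
```

The final `RuntimeError` makes the limitation stated above explicit: once both nodes of the pair fail, the block has no further fallback.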
5.4 Time Redundancy
Time redundancy (1/60) can be performed at both the instruction and task levels.
At the instruction level, the program is duplicated and the results are subsequently
compared to discover a potential error. At the task level, the software is run twice (or
more) to mitigate dynamic faults. Although this method does not impose the
cost of additional hardware, it increases the time needed to assure redundancy.
The method also reduces the computing performance and consumes more energy.
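Task-level time redundancy can be sketched as follows: the same task is executed twice on the same hardware and the results are compared; on a mismatch, the task is executed again until one result forms a majority. The function name, the `max_runs` bound and the majority rule are assumptions made for this illustration, and results are assumed to be hashable.

```python
from collections import Counter


def run_with_time_redundancy(task, data, max_runs: int = 5):
    """Run `task(data)` at least twice and return the agreed result.

    A mismatch between runs signals a transient fault; further runs are
    added until one result holds a strict majority or `max_runs` is reached.
    """
    results = [task(data), task(data)]
    while True:
        value, count = Counter(results).most_common(1)[0]
        if count >= 2 and count > len(results) / 2:
            return value
        if len(results) >= max_runs:
            raise RuntimeError("no agreement between repeated runs")
        results.append(task(data))  # extra run: costs time and energy, not hardware
```

Each additional run lengthens the response time, which reflects the performance and energy penalty mentioned above.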
It is worth mentioning that the whole IoT system can follow a Reactive or
Proactive strategy. Reactive FT starts to recover the system after the detection
of an error (using event processing methods). In proactive FT, the recovery
strategy is started even before the detection of an error (using machine learning
methods).
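The difference between the two strategies can be illustrated with a small sketch on a monitored value such as a device temperature. The reactive check fires once the limit has been exceeded; the proactive check fires when a forecast predicts that the limit will be exceeded. A plain linear extrapolation is used here as a stand-in for the machine learning methods mentioned above; the function names, the `limit` and the `horizon` parameter are hypothetical.

```python
def reactive_should_recover(readings, limit: float) -> bool:
    """Reactive FT: start recovery only after the error is observed."""
    return readings[-1] > limit


def proactive_should_recover(readings, limit: float, horizon: int = 3) -> bool:
    """Proactive FT: start recovery when a forecast crosses the limit.

    The forecast is a linear extrapolation over `horizon` future steps,
    a deliberately simple substitute for a learned fault model.
    """
    if len(readings) < 2:
        return False
    slope = (readings[-1] - readings[0]) / (len(readings) - 1)
    predicted = readings[-1] + slope * horizon
    return predicted > limit
```

For a rising series that is still below the limit, only the proactive check triggers, so recovery can begin before the error manifests.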
[Figure 5: bar chart of the number of primary studies per fault-tolerance technique.]
Passive (24): P3, P7, P10, P13, P14, P18, P19, P20, P21, P22, P23, P25, P26, P29, P34, P35, P39, P40, P41, P43, P45, P48, P50, P56
Active (22): P2, P3, P6, P8, P11, P15, P16, P17, P21, P24, P27, P32, P36, P37, P41, P46, P47, P51, P54, P55, P57, P58
Network Control (19): P4, P5, P9, P10, P12, P13, P28, P31, P34, P42, P44, P45, P46, P47, P48, P51, P53, P59, P60
Distributed Recovery Block (8): P3, P6, P7, P29, P30, P33, P38, P52
Time Redundancy (1): P52
Fig. 5. Fault-tolerance techniques.
6 Quality of IoT Service Associated with Fault-Tolerance (RQ3)
Quality attributes are categorized according to ISO 25010, complemented by
some IoT-specific attributes derived from keywording the primary studies.
An IoT system brings many challenges from a QoS perspective when FT is taken
into account. As shown in Fig. 6, the most recognized quality challenges
are related to performance (25/60), availability (20/60), security (20/60) and
scalability (16/60), whilst interoperability (8/60) and energy efficiency (2/60)
are of lesser concern.
[Figure 6: bar chart of the number of primary studies per quality attribute.]
Performance (25): P2, P3, P4, P5, P6, P7, P9, P11, P13, P16, P18, P19, P22, P23, P26, P31, P43, P44, P47, P48, P49, P52, P56, P58, P60
Availability (20): P2, P3, P5, P6, P10, P14, P15, P17, P18, P19, P21, P25, P27, P33, P35, P36, P39, P41, P44, P53
Security (20): P1, P6, P7, P8, P11, P12, P14, P16, P17, P20, P21, P26, P36, P39, P40, P44, P45, P51, P53, P58
Scalability (16): P5, P6, P9, P11, P14, P15, P16, P17, P19, P21, P23, P27, P40, P41, P47, P52
Interoperability (8): P8, P9, P11, P21, P24, P40, P41, P47
Energy Consumption (2): P5, P41
Fig. 6. QoS associated with FT-IoT.
The level of performance depends on how far the processing and storage
components are pushed to the edge in a decentralized way. Availability is the
ability of a system to be fully or partly operational as and when required. Clearly,
FT and availability are not identical, since a fault-tolerant system is supposed
to keep the system operational without interruption, whereas a highly available
system may experience service interruptions. However, a fault-tolerant system should
maintain a high level of system availability and performance as well.
In IoT systems, where different components and entities are connected to each
other through a network, security becomes a major concern. Scalability is also an
essential attribute, as IoT systems should be capable of performing properly with
a huge number of heterogeneous devices. Commenting on the scalability of
IoT as a whole system is difficult; however, it depends on how new resources can
be added on demand. A fault-tolerant system also requires enormous computational
effort to be run on distributed P&S components. Device heterogeneity
and the distribution of P&S elements make the system hard to scale.
Interoperability helps heterogeneous IoT components to work together efficiently.
It depends on the extent to which large-scale heterogeneous IoT devices
can communicate directly with each other to gather the required data without
having to go through the central/remote components. Since most IoT
devices are battery powered, energy efficiency, which is tied to many other quality
attributes (such as performance), becomes essential. However, wireless and battery
dependency makes IoT devices hard to recover, to scale
and to keep performant.
7 Horizontal Analysis
This section reports the results orthogonal to the vertical analysis presented
in the previous sections. For the purpose of this section, we cross-tabulated
and grouped the data, made comparisons between pairs of concepts of our
classification framework, and identified perspectives of interest.
7.1 FT Techniques vs Architectural Patterns
Here the question is: which architectural pattern is more often used for each FT
technique? As shown in Fig. 7, (11/60) studies used the hybrid pattern to facilitate
their passive FT techniques, whilst (15/60) used hybrid for active FT. In contrast,
centralized and collaborative architectural patterns are more often used to
address passive FT. The network control FT technique is best
addressed by a hybrid architectural pattern. In general, a hybrid architecture
supports FT-IoT, since if one fog node fails, the IoT system can shift the
computation to another fog to avoid the single point of failure.
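The cross-tabulation behind this comparison can be sketched in a few lines of Python. The sets below are transcribed from the primary-study lists reported with Fig. 4 (architectural patterns) and Fig. 5 (FT techniques), using the numeric part of each study identifier; the function name `cross_tabulate` is hypothetical and the sketch is not the authors' analysis script.

```python
# Primary-study identifiers (P2 -> 2, ...) transcribed from Figs. 4 and 5.
PATTERNS = {
    "Hybrid": {2, 3, 4, 5, 6, 8, 9, 11, 15, 16, 21, 22, 23, 24, 28, 29, 30, 31,
               33, 34, 36, 40, 41, 42, 44, 45, 46, 47, 48, 50, 51, 52, 53, 58},
    "Centralized": {7, 13, 14, 19, 25, 26, 27, 36, 37, 39, 55, 56, 57},
    "Distributed Collaborative": {10, 17, 18, 20, 24, 32, 35, 38, 43, 49, 59, 60},
}
TECHNIQUES = {
    "Passive": {3, 7, 10, 13, 14, 18, 19, 20, 21, 22, 23, 25, 26, 29, 34, 35,
                39, 40, 41, 43, 45, 48, 50, 56},
    "Active": {2, 3, 6, 8, 11, 15, 16, 17, 21, 24, 27, 32, 36, 37, 41, 46, 47,
               51, 54, 55, 57, 58},
}


def cross_tabulate(rows, cols):
    """Count the studies shared by every (row, column) pair of categories."""
    return {
        (row, col): len(row_ids & col_ids)
        for row, row_ids in rows.items()
        for col, col_ids in cols.items()
    }


TABLE = cross_tabulate(PATTERNS, TECHNIQUES)
```

With these lists, the hybrid/passive and hybrid/active cells evaluate to 11 and 15, which matches the counts quoted above.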
[Figure 7: bubble chart of fault-tolerance techniques (Passive, Active, Network Control, Distributed Recovery, Time Redundancy) against architectural patterns (Hybrid, Distributed Collaborative, Centralized).]
Fig. 7. FT techniques vs patterns.
7.2 FT Techniques vs Quality Attributes
What quality attributes are satisfied when a specific FT technique is adopted?
As shown in Fig. 8, the passive technique mostly takes into account performance
and availability, whilst the active technique gives more weight to security and
scalability. Furthermore, network control enhances performance besides
fault-tolerance. Given the rapid growth of devices at the
edge of the network, the performance of IoT should be maintained at an appropriate
level. Performance highly depends on how data storage and application logic
are distributed among edge and central servers. As mentioned before, fog computing
can pave the way to improving the performance of IoT systems.
[Figure 8: bubble chart of fault-tolerance techniques (Passive, Active, Network Control, Distributed Recovery, Time Redundancy) against quality attributes (Performance, Availability, Security, Scalability, Interoperability, Energy Consumption).]
Fig. 8. Techniques vs quality attributes.
8 Challenges and Emerging Trends (RQ4)
In this section, the emerging trends in resilience for FT-IoT are presented. To
this end, publication year, type and venue are first extracted, and an overall
discussion is subsequently provided.
8.1 Publication Year
Figure 9 shows the distribution of the FT-IoT literature. It clearly indicates that
the number of papers grows over time and that there is just one related paper published
before 2014. This result confirms the scientific interest in, and the need for research on,
FT-IoT issues in the last few years.
[Figure 9: bubble chart of the number of primary studies per publication year (2012 to 2019) and publication type (journal, conference, workshop).]
Fig. 9. Primary studies distribution by publication type.
8.2 Publication Type
The most common publication type is conference paper (40/60), followed by
journal (17/60), and workshop paper (3/60). Such a high number of journal and
conference papers may indicate that FT-IoT is maturing as a research topic,
even though it is still relatively young.
8.3 Publication Venues
From the extracted data we can notice that research on FT-IoT is spread across
many venues, mostly within the IoT (e.g. WF-IoT), computing (e.g. ICAC)
and networking (e.g. ICOIN) communities. The complete list of venues can be
found in the data extraction file. The focus on the aforementioned
communities indicates the significance of distributed computing and networking for
FT-IoT systems.
8.4 Emerging Trends in Resilience for FT-IoT
Our study reveals that some FT-IoT techniques are covered more rarely
than others, specifically distributed recovery block and time
redundancy. We clarify that this result by no means implies that there is limited
literature or support on such FT techniques, but rather that they appear to have a
more limited application in IoT. At the architectural level, we observed a significant
move toward adopting hybrid architectures, which make the IoT system more resilient to
faults. Furthermore, whilst growth in the use of service-oriented and microservices
architectures is perceived, their various aspects need to be better investigated
with regard to FT. The study showed that, among the FT-IoT architectural layers,
attention especially goes to the network and the processing and storage components.
Our study also reveals that performance and availability are closely tied
to the fault-tolerance of IoT systems. However, the trade-off between FT
and other IoT quality attributes such as scalability, interoperability and energy
consumption should be further investigated. Another result to be further evaluated
through a state of the practice analysis is that only few studies support the
interplay between FT techniques and collaborative architectures. These
aspects should be considered by future work in the domain.
9 Threats to Validity
According to the checklist of Petersen et al. [9], the quality of this
systematic mapping study was assessed at 73%. This value is the ratio of the
number of actions taken to the total number of actions listed in the quality
checklist. The quality score of our study is well above the scores obtained by
existing systematic mapping studies in the literature, which have a
distribution with a median of 33% and an absolute maximum of 48%. Nevertheless,
threats to validity are unavoidable. Below we briefly describe the main threats
to the validity of our study and the way we mitigated them.
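The quality score described above is a simple ratio, which the sketch below makes explicit. The text reports only the resulting percentage, so the counts used here (19 actions taken out of 26) are an illustrative assumption chosen because they round to 73%; the function name is likewise hypothetical.

```python
def quality_score(actions_taken: int, actions_total: int) -> float:
    """Percentage of checklist actions that were taken in the study."""
    if actions_total <= 0 or not 0 <= actions_taken <= actions_total:
        raise ValueError("expected 0 <= actions_taken <= actions_total and actions_total > 0")
    return 100 * actions_taken / actions_total


# ASSUMPTION: 19 of 26 actions is an illustrative pair of counts, not a figure
# stated in the text; the text only reports the 73% score.
score = quality_score(19, 26)

# Reference points reported in the text for existing mapping studies.
MEDIAN_REPORTED = 33
MAXIMUM_REPORTED = 48

print(f"score: {score:.0f}%")
print(f"above reported maximum: {score > MAXIMUM_REPORTED}")
```

Under these assumed counts, the score exceeds both the median and the maximum reported for earlier mapping studies, which matches the comparison made in the text.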
External validity: the most severe threat to external validity in our study is
that the set of primary studies may not be representative of the whole body of
research on FT-IoT. We mitigated this potential threat by (i) following a
search strategy that includes both automatic search and backward-forward
snowballing of the selected studies; and (ii) defining a set of inclusion and
exclusion criteria. Along the same lines, gray and non-English literature are
not included in our research, as we want to focus exclusively on the state of
the art presented in high-quality scientific studies in English.
Internal validity: it refers to the level of influence that extraneous variables
may have on the design of the study. We mitigated this potential threat to
validity by (i) rigorously defining and validating the structure of our study,
(ii) defining our classification framework by carefully following the
keywording process, and (iii) conducting a well-structured vertical analysis.
Construct validity: it concerns the validity of the extracted data with respect
to the research questions. We mitigated this potential source of threats by
(i) performing the automatic search on multiple databases to avoid potential
biases; (ii) using a strong and tested search string; (iii) complementing the
automatic search with snowballing; and (iv) rigorously screening the studies
according to the inclusion and exclusion criteria.
Conclusion validity: it concerns the relationship between the extracted data
and the obtained results. We mitigated potential threats to conclusion validity
by applying well-accepted systematic methods and processes throughout our
study and documenting all of them in the Excel package.
10 Conclusion
In this paper we presented a systematic mapping study with the goal of
identifying and classifying the state of the art in the domain and extracting a
set of FT-IoT methods and techniques. Starting from over 2300 potentially
relevant studies, we applied a rigorous selection procedure that resulted in 60
primary studies. The results of this study are oriented toward both research
and industry and are intended to provide a framework for future research in
FT-IoT related fields. As future work, we will assess how existing research can
be integrated into industrial-level IoT.
Primary Studies
P1: Toward a New Approach to IoT Fault Tolerance, https://doi.org/10.1109/MC.2016.238
P2: CEFIoT: A fault-tolerant IoT architecture for edge and cloud, https://doi.org/10.1109/WF-IoT.2018.8355149
P3: Reliable and Fault-Tolerant IoT-Edge Architecture, https://doi.org/10.1109/ICSENS.2018.8589624
P4: Efficient Fault-Tolerant Routing in IoT Wireless Sensor Networks Based on Bipartite-Flow Graph Modeling, https://doi.org/10.1109/ACCESS.2019.2894002
P5: Optimizing Multipath Routing With Guaranteed Fault Tolerance in Internet of Things, https://doi.org/10.1109/JSEN.2017.2739188
P6: Brume - A Horizontally Scalable and Fault Tolerant Building Operating System, https://doi.org/10.1109/IoTDI.2018.00018
P7: A Watchdog Service Making Container-Based Micro-services Reliable in IoT Clouds, https://doi.org/10.1109/FiCloud.2017.57
P8: Towards Fault Tolerant Fog Computing for IoT-Based Smart City Applications, https://doi.org/10.1109/CCWC.2019.8666447
P9: Device clustering for fault monitoring in Internet of Things systems, https://doi.org/10.1109/WF-IoT.2015.7389057
P10: Decentralized fault tolerance mechanism for intelligent IoT/M2M middleware, https://doi.org/10.1109/WF-IoT.2014.6803115
P11: Application of Blockchain in Collaborative Internet-of-Things Services, https://doi.org/10.1109/TCSS.2019.2913165
P12: A Review of Aggregation Algorithms for the Internet of Things, https://doi.org/10.1109/ICSEng.2017.43
P13: Supporting Service Adaptation in Fault Tolerant Internet of Things, https://doi.org/10.1109/SOCA.2015.38
P14: Fault tolerant and scalable IoT-based architecture for health monitoring, https://doi.org/10.1109/SAS.2015.7133626
P15: Fault tolerance capability of cloud data center, https://doi.org/10.1109/ICCP.2017.8117053
P16: Reaching Agreement in an Integrated Fog Cloud IoT, https://doi.org/10.1109/ACCESS.2018.2877609
P17: Byzantine Resilient Protocol for the IoT, https://doi.org/10.1109/JIOT.2018.2871157
P18: DRAW: Data Replication for Enhanced Data Availability in IoT-based Sensor Systems, https://doi.org/10.1109/DASC/PiCom/DataCom
P19: Power efficient, bandwidth optimized and fault tolerant sensor management for IOT in Smart Home, https://doi.org/10.1109/IADCC.2015.7154732
P20: Energy efficiency and robustness for IoT: Building a smart home security system, https://doi.org/10.1109/ICCP.2016.7737120
P21: A Microservices Architecture for Reactive and Proactive Fault Tolerance in IoT Systems, https://doi.org/10.1109/WoWMoM.2018.8449789
P22: Management of solar energy in microgrids using IoT-based dependable control, https://doi.org/10.1109/ICEMS.2017.8056441
P23: A hierarchical cloud architecture for integrated mobility, service, and trust management of service-oriented IoT systems, https://doi.org/10.1109/INTECH.2016.7845021
P24: Fault-Tolerant Real-Time Collaborative Network Edge Analytics for Industrial IoT and Cyber Physical Systems with Communication Network Diversity, https://doi.org/10.1109/CIC.2018.00052
P25: Fault-Tolerant mHealth Framework in the Context of IoT-Based Real-Time Wearable Health Data Sensors, https://doi.org/10.1109/ACCESS.2019.2910411
P26: SCONN: Design and Implement Dual-Band Wireless Networking Assisted Fault Tolerant Data Transmission in Intelligent Buildings, https://doi.org/10.1109/VTCFall.2018.8690787
P27: Fault-tolerant application placement in heterogeneous cloud environments, https://doi.org/10.1109/CNSM.2015.7367359
P28: A reliable and energy efficient IoT data transmission scheme for smart cities based on redundant residue based error correction coding, https://doi.org/10.1109/SECONW.2015.7328141
P29: Distributed Continuous-Time Fault Estimation Control for Multiple Devices in IoT Networks, https://doi.org/10.1109/ACCESS.2019.2892905
P30: Trend-adaptive multi-scale PCA for data fault detection in IoT networks, https://doi.org/10.1109/ICOIN.2018.8343217
P31: Adaptive and Fault-tolerant Data Processing in Healthcare IoT Based on Fog Computing, https://doi.org/10.1109/TNSE.2018.2859307
P32: Fault-Recovery and Coherence in Internet of Things Choreographies, https://doi.org/10.1109/WF-IoT.2014.6803224
P33: A Novel Data Reduction Technique with Fault-tolerance for Internet-of-things, https://doi.org/10.1145/3018896.3018971
P34: Performance Comparisons of Fault-Tolerant Routing Approaches for IoT Wireless Sensor Networks, https://doi.org/10.1145/3195106.3195168
P35: Rivulet: A Fault-tolerant Platform for Smart-home Applications, https://doi.org/10.1145/3135974.3135988
P36: Censorship Resistant Decentralized IoT Management Systems, https://doi.org/10.1145/3286978.3286979
P37: Towards a Foundation for a Collaborative Replicable Smart Cities IoT Architecture, https://doi.org/10.1145/3063386.3063763
P38: Responsible Objects: Towards Self-Healing Internet of Things Applications, https://doi.org/10.1109/ICAC.2015.60
P39: A Multi-agent System Architecture for Self-Healing Cloud Infrastructure, https://doi.org/10.1145/2896387.2896392
P40: Reactive Microservices for the Internet of Things: A Case Study in Fog Computing, https://doi.org/10.1145/3297280.3297402
P41: Fault Tolerance Techniques and Architectures in Cloud Computing - a Comparative Analysis, https://doi.org/10.1109/ICGCIoT.2015.7380625
P42: Energy Efficient Fault-tolerant Clustering Algorithm for Wireless Sensor Networks, https://doi.org/10.1109/ICGCIoT.2015.7380464
P43: Layered Fault Management Scheme for End-to-end Transmission in Internet of Things, https://doi.org/10.1007/s11036-012-0355-5
P44: An Architectural Mechanism for Resilient IoT Services, https://doi.org/10.1145/3137003.3137010
P45: Resilience of Stateful IoT Applications in a Dynamic Fog Environment, https://doi.org/10.1145/3286978.3287007
P46: The Optimal Generalized Byzantine Agreement in Cluster-based Wireless Sensor Networks, https://doi.org/10.1016/j.csi.2014.01.005
P47: A Reliable IoT System for Personal Healthcare Devices, https://doi.org/10.1016/j.future.2017.04.004
P48: Reliable Industrial IoT-based Distributed Automation, https://doi.org/10.1145/3302505.3310072
P49: Low-Cost Memory Fault Tolerance for IoT Devices, https://doi.org/10.1145/3126534
P50: Idea: A System for Efficient Failure Management in Smart IoT Environments, https://doi.org/10.1145/2906388.2906406
P51: Patterns for Things That Fail, https://www.hillside.net/plop/2017/papers/proceedings/papers/07-ramadas.pdf
P52: Fall-curve: A Novel Primitive for IoT Fault Detection and Isolation, https://doi.org/10.1145/3274783.3274853
P53: Multilevel IoT Model for Smart Cities Resilience, https://doi.org/10.1145/3095786.3095793
P54: Energy Efficient Device Discovery for Reliable Communication in 5G-based IoT and BSNs Using Unmanned Aerial Vehicles, https://doi.org/10.1016/j.jnca.2017.08.013
P55: A Programming Framework for Implementing Fault-Tolerant Mechanism in IoT Applications, https://doi.org/10.1007/978-3-319-27137-8_56
P56: Transient fault aware application partitioning computational offloading algorithm in microservices based mobile cloudlet networks, https://doi.org/10.1007/s00607-019-00733-4
P57: Channel Dependability of the ATM Communication Network Based on the Multilevel Distributed Cloud Technology, https://doi.org/10.1007/978-3-319-67642-5_49
P58: Design of compressed sensing fault-tolerant encryption scheme for key sharing in IoT Multi-cloudy environment(s), https://doi.org/10.1016/j.jisa.2019.04.004
P59: Fault-Tolerant Temperature Control Algorithm for IoT Networks in Smart Buildings, https://doi.org/10.3390/en11123430
P60: Virtualization in Wireless Sensor Networks: Fault Tolerant Embedding for Internet of Things, https://doi.org/10.1109/JIOT.2017.2717704
References
1. Muccini, H., Moghaddam, M.T.: IoT architectural styles. In: Cuesta, C.E., Garlan, D., Pérez, J. (eds.) ECSA 2018. LNCS, vol. 11048, pp. 68–85. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00761-4_5
2. Kitchenham, B., Brereton, P.: A systematic review of systematic review process research in software engineering. Inf. Softw. Technol. 55(12), 2049–2075 (2013)
3. Kitchenham, B.A., Charters, S.: Guidelines for performing systematic literature reviews in software engineering. Technical report, EBSE-2007-01 (2007)
4. Zhang, H., Babar, M.A., Tell, P.: Identifying relevant studies in software engineering. Inf. Softw. Technol. 53(6), 625–637 (2011). https://doi.org/10.1016/j.infsof.2010.12.010
5. Muccini, H., Spalazzese, R., Moghaddam, M.T., Sharaf, M.: Self-adaptive IoT architectures: an emergency handling case study. In: Proceedings of the 12th European Conference on Software Architecture: Companion Proceedings, p. 19. ACM (2018)
6. Muccini, H., Arbib, C., Davidsson, P., Tourchi Moghaddam, M.: An IoT software architecture for an evacuable building architecture. In: Proceedings of the 52nd Hawaii International Conference on System Sciences (2019)
7. Arbib, C., Arcelli, D., Dugdale, J., Moghaddam, M., Muccini, H.: Real-time emergency response through performant IoT architectures. In: International Conference on Information Systems for Crisis Response and Management (ISCRAM) (2019)
8. Fayyaz, M., Vladimirova, T.: Survey and future directions of fault-tolerant distributed computing on board spacecraft. Adv. Space Res. 58(11), 2352–2375 (2016)
9. Petersen, K., Vakkalanka, S., Kuzniarz, L.: Guidelines for conducting systematic mapping studies in software engineering: an update. Inf. Softw. Technol. 64, 1–18 (2015)