Routing Protocols Based on Reinforcement Learning for Wireless Sensor
Networks: A Comparative Study
Arafat Habib, Muhammad Yeasir Arafat, Sangman Moh*
Dept. of Computer Eng., Chosun Univ., 309 Pilmun-daero, Dong-gu, Gwangju, 61452 South Korea
akhtab007@gmail.com, yeasir08@yahoo.com, smmoh@chosun.ac.kr*
Corresponding author*: Phone: +82-62-230-6032
Abstract
A carefully designed routing protocol can significantly improve the performance of wireless sensor
networks (WSNs) in terms of end-to-end delay, energy consumption, and packet delivery ratio.
Because WSNs are often deployed for critical operations with limited, irreplaceable batteries, their
routing protocols need to be robust, reliable, and energy-efficient. The complex and dynamic
environment of WSNs also calls for intelligent routing algorithms. Over the past few years,
reinforcement learning (RL) algorithms have been used to design routing protocols in WSNs so that
energy consumption can be reduced and network performance can be improved. In this paper,
different RL-based routing protocols are surveyed and qualitatively compared with each other. That is,
the RL-based routing protocols are reviewed with respect to their operating principles and key
features, and they are compared with each other regarding their significant characteristics as well as
their pros and cons. A discussion of some challenging issues in designing and implementing RL-based
routing algorithms in WSNs is also presented.
Keywords: Wireless sensor network, routing protocol, reinforcement learning, Q-learning, energy
consumption, end-to-end delay, packet delivery ratio.
1. Introduction
A wireless sensor network (WSN) consists of autonomous devices containing small sensor nodes that are
spatially distributed for specific applications. The sensor nodes are stationary in most cases and are subject to
limited human intervention. WSNs have crucial and critical applications such as wildlife monitoring, battlefield
surveillance, disaster response, and radioactive radiation monitoring [1]. In WSNs, the sensed data flow
towards the base station or sink node. Fast and reliable communication is required in many WSN
applications. Limited resources such as battery energy and storage capacity should also be considered carefully.
Energy is the most prominent resource because the network lifetime depends on it.
In WSNs, reinforcement learning (RL) based routing protocols can improve the network lifetime as well as
general performance [2]. Some existing survey papers ([3], [4], and [5]) review routing protocols based on
artificial intelligence (AI) techniques. RL-based routing protocols are reviewed in part in [2], [3], and [6]. In [2] and
[3], however, no comparative study among the discussed protocols is provided. In [6], only five protocols are
addressed, with limited information regarding their features and characteristics.
In this paper, the latest RL-based routing protocols in WSNs are extensively surveyed. The effect of RL in
routing for WSNs is investigated, and the major features and operational characteristics of existing RL-based
routing protocols are reviewed protocol by protocol, followed by the comparative analysis of the discussed
protocols in a qualitative manner. Some challenging design issues are also discussed.
The rest of the paper is organized as follows: In the following section, RL for WSNs is introduced. In Section 3,
the latest RL-based routing protocols are presented with respect to functional principle, basic operation, and
distinctive characteristics. In Section 4, the reviewed protocols are compared qualitatively, and the comparison is
summarized and discussed. Section 5 covers the challenging design issues in brief. The conclusion of the paper is
drawn in Section 6.
2. Concept of Reinforcement Learning
Reinforcement learning is a category of machine learning in which an agent learns what to do through
step-by-step interactions with the environment that yield rewards. Its primary goal is to learn the
environment and find the optimal policy by maximizing the accumulated reward. Through this simple but
efficient concept of maximizing the reward for each chosen action, an RL agent learns by itself which
actions lead to optimality. This type of learning considers a scenario with a completely goal-directed agent
in which the environment is totally uncertain. Some important concepts of an RL system are discussed below:
Policy: A policy in RL specifies which action to take in each particular state. It denotes the behavior of an
agent as it learns the environment. Basically, it is a state-to-action mapping that the agent learns over time.
Policies are the core of RL and can be either deterministic or stochastic.
Reward Function: The reward function treats a state-action pair as a single entity and assigns to the
resulting state transition a certain reward, which represents the intrinsic desirability of that transition. The
single objective of an RL agent is to maximize the reward it accumulates over its learning period.
Value Function: The main difference between the reward function and the value function is that the reward
function captures only the immediate reward, whereas the value function captures the long-term reward.
The value of a state is the expected sum of rewards an agent gathers when it commences learning from that
very state. A state may yield a very low immediate reward but still be very important because it may lead to
states yielding higher rewards. The opposite is also possible.
Environment Model: The model consists of states and represents the behavior of a particular environment.
Through the model, an agent can anticipate its next state and reward, and decide on its next action.
Routing approaches based on RL in WSNs mostly rely on Q-learning, a kind of reinforcement
learning that can start from any random state. At each step, the agent chooses an action
according to the policy derived from its Q-values, observes the next state and reward, and updates the
corresponding Q-value. It loops and continues to learn the environment until it finds the optimal policy.
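To make this loop concrete, the following is a minimal tabular Q-learning sketch on a toy chain of nodes (the environment, the parameter values, and all names are our own illustrative assumptions, not taken from any protocol surveyed below):

```python
# Minimal tabular Q-learning on a toy chain: state 5 plays the role of a sink.
import random
from collections import defaultdict

N_STATES = 6            # states 0..5; state 5 is the goal
ACTIONS = [-1, +1]      # move one step left or right along the chain

def step(state, action):
    """Toy environment: reward 1 on reaching the goal, 0 otherwise."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

Q = defaultdict(float)                     # Q[(state, action)], initially 0
alpha, gamma, epsilon = 0.1, 0.9, 0.2      # learning rate, discount, exploration

for episode in range(500):
    s = 0
    done = False
    while not done:
        # epsilon-greedy: explore with probability epsilon, else exploit
        a = (random.choice(ACTIONS) if random.random() < epsilon
             else max(ACTIONS, key=lambda act: Q[(s, act)]))
        s2, r, done = step(s, a)
        # update Q(s, a) toward r + gamma * max_a' Q(s2, a')
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

print(max(ACTIONS, key=lambda act: Q[(0, act)]))   # learned action at state 0: +1
```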
3. RL-based Routing Protocols for WSNs
Routing protocols for WSNs should be aware of energy consumption, end-to-end delay, packet delivery ratio
(PDR), and prolonged network lifetime. RL-based approaches to routing in this type of network likewise consider
these routing metrics. In this section, the RL-based routing protocols are discussed and reviewed extensively.
3.1. Feedback Routing for Optimizing Multiple Sinks in WSNs with Reinforcement
Learning (FROMS)
In [7], a routing protocol based on Q-learning is presented. It elaborates the process of sharing a
node's local information with neighboring nodes as feedback without causing additional network overhead. The
network used for deploying this protocol has a multi-sink architecture, and the problem is formulated so that it can
be solved with Q-Routing [8].
A key strength of this routing protocol is that it considers both exploration and exploitation strategies to
find an optimal route. Exploitation alone may lead to a locally optimal solution; on the contrary, excessive
exploration overhead may lengthen the time elapsed for route discovery.
Figure 1. Routing scenario in FROMS
Figure 1 shows an experimental network consisting of two sink nodes and one source node. The solid
arrows in the figure represent the best shared route, and the dashed arrows represent point-to-point routes. In Figure
1, the leftmost source node s has three neighbor nodes (nodes 1 to 3). Data transmission from the source node
to the sinks has to be done via these neighbor nodes. If the source node chooses neighbor node 1 to transmit data, the
data travels three hops to reach sink node A and five hops to reach sink node B. If it chooses neighbor node 2, the
data travels four hops to reach sink node A and four hops to reach sink node B. Lastly, if it chooses neighbor
node 3, the data travels five hops to reach sink node A and three hops to reach sink node B. Considering the
hop counts and the portion of the route that the two sinks can share, choosing neighbor node 2 leads to the optimal route in Figure 1.
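Note that each neighbor yields the same raw total of eight hops in this example, so the advantage of node 2 comes from route sharing: hops that serve both sinks are transmitted only once. Below is a hedged sketch of such a shared-route cost; the per-neighbor shared-hop counts are invented for illustration and are not taken from [7]:

```python
# Toy version of the Figure 1 decision (not FROMS itself): each neighbor is
# scored by hops to sink A plus hops to sink B, minus the hops the two
# routes share, since a shared hop is transmitted only once.
routes = {            # neighbor -> (hops to A, hops to B, shared hops)
    1: (3, 5, 1),     # shared-hop values are invented for illustration
    2: (4, 4, 3),
    3: (5, 3, 1),
}

def shared_route_cost(h_a, h_b, shared):
    return h_a + h_b - shared

best = min(routes, key=lambda n: shared_route_cost(*routes[n]))
print(best, shared_route_cost(*routes[best]))   # neighbor 2, cost 5
```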
The unique contribution of FROMS [7] is that multiple sinks are taken into consideration in the network
design, which significantly reduces the network overhead. FROMS also has a recovery
process for node failures. On the other hand, FROMS remains sensitive to node failures, and sink mobility can lead to routing
errors.
3.2. Multi-agent Reinforcement Learning based Routing Protocol with Quality of
Service (QoS) Support for WSN (MRL-QRP)
In this routing protocol [9], QoS routes are cooperatively computed using a distributed value function.
Global optimization can be achieved through local information about the network and the exchange of state
values with the neighboring nodes. In [9], two things are checked before a node sends any data packet.
First, the node inspects the packet to look up its QoS requirements. Next, it checks the Q-value table. After that, the
packet is transmitted to the neighboring node having the highest Q-value among the candidate forwarding nodes.
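The following is a hedged sketch of this forwarding decision (the table layout, field names, and the delay-based QoS check are our own assumptions, since [9] does not prescribe a concrete data structure):

```python
# Sketch: filter neighbors by the packet's QoS requirement, then pick the
# neighbor with the highest Q-value (field names are assumptions).
def select_next_hop(neighbors, q_table, packet_deadline_ms):
    candidates = [n for n in neighbors
                  if q_table[n]["expected_delay_ms"] <= packet_deadline_ms]
    if not candidates:
        return None   # no neighbor satisfies the QoS requirement
    return max(candidates, key=lambda n: q_table[n]["q_value"])

q_table = {"n1": {"q_value": 0.8, "expected_delay_ms": 40},
           "n2": {"q_value": 0.6, "expected_delay_ms": 10}}
print(select_next_hop(["n1", "n2"], q_table, packet_deadline_ms=20))  # -> n2
```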
Figure 2. Multi-hop wireless sensor network for MRL-QRP
In Figure 2, a data packet is originated by sensor node i0. While the learning process is going on, i0 sends the
packet to node i1 by random selection. After that, i1 forwards the packet to i2. This process goes on until node iN
receives the data packet, where node iN is the last node attached to the sink in the routing process.
MRL-QRP is claimed to be superior to a well-established conventional protocol for
WSNs, the ad hoc on-demand distance vector (AODV) routing protocol [10]. MRL-QRP [9] performs
exceptionally well when the traffic load is heavy. It considers end-to-end delay, PDR, and energy consumption as
routing metrics.
3.3. Reinforcement Learning as Adaptive Network Routing of Mobile Agents
In this routing protocol, mobile agents that traverse the routing nodes seek the optimal path at each
time step in order to decrease the service processing time. The movements of the agents are designed so that
congestion is avoided. The Q-learning algorithm implemented in this protocol learns policies online. As mobile
agents are routed, changes in traffic patterns, network load levels, and topology are promptly incorporated
into the routing policies. The system topology is determined by the agent through network
discovery, and the gained information is then stored within the nodes. Q-routing exploits Q-learning and was first
proposed by Boyan and Littman [11].
Figure 3. Agent system
Figure 3 presents the agent system used in [8]. In the figure, R denotes the intelligent router nodes, MA stands
for mobile agents, and SP represents service providers. R forwards an MA from the input link towards the next R
depending on the concrete configuration, and it determines its output link by observing the attained Q-values.
Immediately upon receiving an MA, R sends a reply with the predicted route towards the previous R. The process goes on until the
MA finds the SP, and the SP then performs the requested task on behalf of the MA [8]. The right-pointing arrows in Figure 3
represent the movement of MAs, and the movement of replies starting from the SP is denoted by the left-pointing arrows. Each
R manages its input and output links. The last node in the figure is the SP, which performs the final processing of mobile
agents.
A strong point of this routing technique is that it can adapt to frequent topology changes and works well with
varying network traffic. Also, accurate measurement of the service processing time along a particular route makes the
protocol more efficient. In [8], however, no comparison with standardized routing protocols was shown to establish the
superiority of the proposed routing algorithm. Moreover, the average service processing time is high in this routing
approach because the whole network has to be learned in order to perform routing. The routing metrics taken into
consideration in [8] are PDR and latency.
3.4. Energy-aware QoS Routing using RL (EQR-RL)
In [12], routing decisions rely on the resource availability in the network and the QoS requirements. To
send a data packet to the sink node, the sending node decides which intermediate node to use based on its
routing table. For ease of explanation, let us denote the sending node by i. Node i considers a set of
neighboring nodes, denoted as s, whose members must fulfill the minimum QoS
requirements. After this, in [12], a node from the set s is chosen with the help of a load-balancing algorithm. The
proposed algorithm was implemented in a scenario with periodic, long-lasting transmissions to the
sink node. The algorithm in this protocol is also scalable. In [12], to retrieve information about its neighboring
nodes, a node can inspect the headers of the data packets it receives. The information includes the failure of
any node, the removal of any node from the neighborhood, and the link quality. Neighbors can likewise consult the data packet
header to update the routing table when new nodes are added. Also, a neighbor node is excluded from the routing
table if there is no response from it.
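As one plausible reading of this selection step, the sketch below first restricts the choice to the QoS-qualified set s and then load-balances across it; the energy-proportional selection rule is our own assumption, since [12] does not spell out the exact rule here:

```python
# Sketch: restrict the choice to the QoS-qualified set s, then load-balance.
# Energy-proportional random selection is our assumption, not the rule of [12].
import random

def pick_balanced(qualified, residual_energy):
    weights = [residual_energy[n] for n in qualified]      # favor fresher nodes
    return random.choices(qualified, weights=weights, k=1)[0]

residual_energy = {"n1": 0.9, "n2": 0.3, "n4": 0.6}
print(pick_balanced(["n1", "n2", "n4"], residual_energy))  # usually "n1"
```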
EQR-RL supports multiple QoS requirements, including latency, geographical distance, and the number of
hops for data delivery. It can also handle node mobility and failure recovery, and it supports mobile sink
nodes as well. To implement reinforcement learning for any network scenario, a balance between exploration and
exploitation strategies is needed. Pure exploration guarantees that an optimal
route will be found, but it leads to routing overhead. Relying only on exploitation leads to faster route finding in the network, but
with a considerable probability that the route will not be optimal. The authors in [12] consider exploration strategies
only. The network scenario in [12] consists of a few mobile nodes. It is claimed that the proposed protocol improves
the network lifetime, PDR, and end-to-end delay in comparison with [13], [14], and [9].
3.5. Distributed Adaptive Cooperative Routing Protocol (DACR)
In [15], the authors propose a protocol that can ensure QoS requirements by decreasing delay and increasing
reliability. In this protocol, a route from source to destination is established using the AODV protocol.
Data transmission is possible either in a direct manner or in a relayed manner, and one of these two
transmission modes is selected based on the energy level of the node. If relayed transmission is chosen, the protocol then
considers residual energy, reliability, and delay as the important criteria.
All routing nodes learn these criteria using lightweight RL. After this, the protocol uses lexicographic
optimization [16] (sketched below) to find the optimal relay. The contributions of this protocol are summarized below:
- No central control is needed for DACR, and nodes can be deployed in a distributed way. Global information
on the channel state is also unnecessary for this protocol. That is why DACR has relatively low
network overhead.
- It is also shown that selecting the relay in a proactive way is more efficient than selecting
relays in a reactive manner.
- Finally, the process of discovering routes and relays can save a significant amount of energy.
One notable flaw of this protocol is its complexity, because it uses both an RL algorithm and a transmission-mode
selection algorithm in the routing process. It considers energy consumption and network lifetime as routing metrics.
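To make the lexicographic step concrete, here is a hedged sketch: criteria are compared in strict priority order, so a later criterion only breaks ties in the earlier ones. The priority order and the numbers are our own illustrative assumptions, not taken from [15] or [16]:

```python
# Sketch of lexicographic relay selection: criteria are compared in strict
# priority order, so a later criterion only breaks ties in earlier ones.
relays = {
    "r1": {"reliability": 0.95, "delay_ms": 30, "energy": 0.7},
    "r2": {"reliability": 0.95, "delay_ms": 20, "energy": 0.4},
    "r3": {"reliability": 0.90, "delay_ms": 10, "energy": 0.9},
}
# assumed priority: maximize reliability, then minimize delay, then energy
best = max(relays, key=lambda r: (relays[r]["reliability"],
                                  -relays[r]["delay_ms"],
                                  relays[r]["energy"]))
print(best)   # r2: ties r1 on reliability and wins on delay
```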
3.6. Routing Protocol to Optimize Network Lifetime (RLLO)
The protocol proposed in [17] utilizes the strengths of RL. After receiving a data packet, a node analyzes
the carried information and then chooses among the following actions. If a forwarding neighbor has a valid route to the
destination, the packet is sent to the next forwarding node, and the node with the highest Q-value is always
selected among the possible next forwarding nodes. If no next forwarding node has a route to the destination
but the sink node is within the transmission range and the node has enough energy, the node sends the data
directly to the sink. Otherwise, the packet is dropped.
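A hedged sketch of this three-way decision, with helper names of our own choosing:

```python
# Sketch of the three-way decision (helper names are ours, not from [17]).
def rllo_action(q_value, neighbors_with_route, sink_in_range, enough_energy):
    if neighbors_with_route:
        # forward to the candidate next hop with the highest Q-value
        return ("forward", max(neighbors_with_route, key=q_value))
    if sink_in_range and enough_energy:
        return ("send_direct", "sink")   # no route, but the sink is reachable
    return ("drop", None)                # no route and sink unreachable

q = {"n1": 0.4, "n2": 0.9}.get
print(rllo_action(q, ["n1", "n2"], sink_in_range=False, enough_energy=True))
# -> ('forward', 'n2')
```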
When a node in RLLO executes an action, it receives a reward, which is used to update the node's Q-value.
The reward function is designed around residual energy, which leads to an optimal balance of energy usage.
The protocol proposed in [17] is highly flexible to topology changes. Also, it achieves global optimization of
the network without any additional cost. The biggest flaw of this routing approach is that it is prone to
network isolation, because nodes having low energy deliberately drop packets; this can also lead to additional
latency. To prove the superiority of [17], it was compared with [18], considering PDR and network lifetime as
performance metrics.
3.7. RL-based Routing Protocol for Multi-hop WSNs
The routing approach proposed in [19] extends the Q-routing algorithm [11] for application in WSNs.
Optimization of the network lifetime is achieved by balancing the routing effort among the sensor nodes. This routing
approach also minimizes the control overhead, and the current residual battery levels are taken into account. The routing
approach in [19] is designed and implemented for planetary exploration scenarios, with the goal of bringing satellite and
WSN technologies together in space. An example scenario for this routing technique is the SWIPE
project [20]. The main concept lies in deploying hundreds or thousands of small sensor nodes, some of which
have satellite communication capability. The others, which lack satellite communication capability, are
responsible for the on-surface ad hoc network. After processing, the retrieved data are sent first to the satellite
and then to Earth.
Q-routing updates the Q-value through the following function:

Q(s, a) ← Q(s, a) + α [R + γ max_a' Q(s', a') − Q(s, a)]                (1)

where s is the current state, a is the chosen action, R is the reward received for taking action a in state s, and
s' and a' are the next state and action after s and a. The learning rate is denoted by α, and it fixes how much the older
information is replaced by the new information; its value lies in the range 0 ≤ α ≤ 1. The discount
factor γ determines the importance of future rewards; its value lies in the range 0 ≤ γ ≤ 1. When a node i
transmits a packet to the next forwarding node, the routing table is updated: the neighbor node sends an
acknowledgment (ACK) back to node i, and the update is performed through the function described in (1). Node i
requires data from its neighbor nodes only. As the Q-function is updated, an estimate of the overall network is
obtained, provided the network topology remains unchanged. The proposed protocol was compared with different versions
of Q-routing.
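A hedged sketch of how the update in (1) can be driven by ACKs is shown below. Note that Q-routing [11] conventionally estimates delivery cost (e.g., delay) and minimizes it, which corresponds to (1) with negative rewards; the function and field names are our own:

```python
# Sketch: applying the update of Eq. (1) when a neighbor's ACK arrives.
# Q[(dest, nbr)] estimates the cost of delivering to `dest` via `nbr`;
# the ACK carries the neighbor's own best estimate, as in Q-routing [11].
alpha, gamma = 0.5, 1.0   # learning rate and discount factor

def on_ack(Q, dest, nbr, hop_cost, nbr_best_q):
    old = Q.get((dest, nbr), 0.0)
    target = hop_cost + gamma * nbr_best_q      # hop_cost acts as -R in Eq. (1)
    Q[(dest, nbr)] = old + alpha * (target - old)

Q = {}
on_ack(Q, dest="sink", nbr="n3", hop_cost=2.0, nbr_best_q=5.0)
print(Q)   # {('sink', 'n3'): 3.5}; forwarding then picks the min-cost neighbor
```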
4. Comparative Analysis of the Routing Protocols based on RL
In this section, the protocols discussed in Section 3 are qualitatively compared with each other. In WSNs,
energy is the most important issue [21]; the target is to reduce the energy wastage that affects the network
lifetime. An increased PDR provides more reliable communication, whereas a decreased end-to-end delay
removes unwanted latency. Routing metrics that are specific to certain protocols are given in the routing
metrics column of Table 1. There can be many different cases and network scenarios in which researchers may be
interested in applying a particular routing protocol; the outstanding features column of the table can be helpful for
that. Good exploration and exploitation strategies are necessary to implement RL algorithms for routing properly.
As implementing RL is not straightforward, it is also necessary to know whether a routing protocol incurs
more network overhead and routing complexity than usual due to the RL implementation. Though mobility is not a must
in WSNs, it is also worth observing how well an RL-based routing algorithm performs if mobility is
introduced to a few of the nodes. Some routing protocols can work better than others under particular routing
and QoS metrics, and some may create network isolation due to unresponsive nodes. In Table 1, the competitive
advantages and inherent limitations of each routing protocol are summarized.
Table 1: Comparative analysis of the routing protocols based on RL in WSNs

FROMS [7]
- Routing metrics: PDR, energy expenditure
- Outstanding features: (1) significantly decreases the network overhead; (2) two nodes can contact each other in a full-duplex manner.

Routing protocol for adaptive network environments using RL [8]
- Routing metrics: PDR and latency
- Outstanding features: (1) works well in frequently changing network topologies; (2) synchronized routing to achieve the shortest delivery and service processing times.

MRL-QRP [9]
- Routing metrics: PDR and delay
- Outstanding features: (1) sensor nodes cooperatively compute routes; (2) exploits each node to gain sufficient network knowledge for choosing the best hop; (3) sensor nodes adjust the probability model of the environment according to node mobility and wireless channel conditions.

EQR-RL [12]
- Routing metrics: remaining energy, PDR
- Outstanding features: (1) nodes are made aware of the data needed, and the sink node uses a control flooding algorithm; (2) supports scalability, flexibility, and load balancing; (3) nodes participating in data communication can use the packet header to keep the routing table up to date.

DACR [15]
- Routing metrics: energy consumption, PDR, network lifetime
- Outstanding features: (1) routing is done in a cooperative manner, and the protocol can adapt to frequent changes over time; (2) multi-hop path from source to destination; (3) low network overhead.

Routing protocol for balancing energy in multi-hop WSNs [19]
- Routing metrics: energy consumption and network lifetime
- Outstanding features: (1) designed and implemented for planetary exploration scenarios; (2) merges satellite and WSN technologies in space; (3) some nodes are equipped for satellite communication.

RLLO [17]
- Routing metrics: network lifetime and PDR
- Outstanding features: (1) uses the superiority of RL to achieve global optimization; (2) considers the hop count to the sink when searching for the routing path; (3) evaluates the approximate goodness of every action.
5. Challenging Open Issues
Routing in WSNs is always challenging. When it comes to applying RL to routing protocol design, there
are many challenges to be dealt with. These challenges are discussed in this section.
Fixing the learning rate: The learning rate in any RL technique plays a prime role in convergence, as it decides
how much new information replaces the older information. This value often needs to be varied while computing the
Q-values, and an inappropriate value of this parameter may hinder convergence.
Balance between exploration and exploitation strategies: This balance is highly important when applying RL
to any scenario, since optimal route computation is expected of any newly designed routing protocol.
Pure exploration will certainly find an optimal route, but it leads to routing overhead. Conversely, pure
exploitation may lead to a route that is not optimal. A common compromise is sketched below.
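For instance, an epsilon-greedy rule with a decaying exploration rate explores heavily in early episodes and gradually shifts to exploitation (a hedged sketch with assumed parameter values, not taken from any surveyed protocol):

```python
# Sketch: epsilon-greedy next-hop choice with a decaying exploration rate,
# so early episodes explore and later ones exploit (values are assumptions).
import random

def choose_next_hop(neighbors, q_values, episode,
                    eps_start=0.9, eps_min=0.05, decay=0.99):
    epsilon = max(eps_min, eps_start * decay ** episode)
    if random.random() < epsilon:
        return random.choice(neighbors)                   # explore
    return max(neighbors, key=lambda n: q_values[n])      # exploit

q_values = {"n1": 0.2, "n2": 0.7, "n3": 0.4}
print(choose_next_hop(list(q_values), q_values, episode=500))  # usually "n2"
```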
Convergence speed: Convergence of the RL algorithm is the most vital criterion for successfully
applying RL to any system. After an adequate number of learning episodes, a successfully converged
algorithm always generates the same optimal reward. Successful convergence also depends on the
formulation of the reward function. Convergence speed matters when fast route computation is needed.
Security: WSNs are prone to security problems such as DDoS attacks, eavesdropping, and sinkhole attacks. Data
confidentiality is one of the required security mechanisms for WSNs: sensor nodes should not allow their neighbors to have
unauthorized access to their readings [22]. The sinkhole attack is another security issue that is quite challenging to
handle through RL-based routing protocol design. In this kind of attack, a corrupted node looks more attractive
as a next forwarder because the routing information has already been manipulated by the attacker.
Data redundancy: As already discussed, WSNs are resource-constrained networks, and data
redundancy can lead to even more resource issues. To prevent this, it must be ensured that the acquired
data is fresh and that old data is not resent. Existing works that use RL-based algorithms to design routing
protocols do not cover the issue of data redundancy, and solving it can be a challenging research task.
Energy hole problem: Solving this issue through routing techniques is very important to keep a WSN
fully functional. Nodes close to the sink die faster than the other nodes, since they have to relay
data to the sink node more frequently [23].
QoS metrics: Some QoS metrics, such as reliability and packet loss, can be combined with the discussed
routing metrics. The protocols discussed in this paper hardly cover the issue of combining routing metrics
with QoS metrics.
6. Conclusion
In this paper, the latest RL-based routing protocols have been extensively reviewed in terms of their major
features and characteristics, and qualitatively compared with each other. Even though RL may introduce computation
overhead, the existing RL-based routing protocols are promising in providing better performance in terms
of major metrics such as energy consumption, end-to-end delay, PDR, and network lifetime. The
comparative discussion of different RL-based routing protocols in this paper can be effectively used for choosing a
routing protocol or for designing RL-based routing protocols for WSNs. Some challenging issues in designing and
implementing RL-based routing algorithms in WSNs have also been discussed. It can be concluded that a good
balance between exploration and exploitation strategies, fast convergence of the algorithm, consideration of
multiple routing metrics, QoS support, and resolution of security issues should all be considered together to
design a pragmatic RL-based routing protocol.
7. Acknowledgment
A preliminary short version of this work was presented at the 8th International Conference on Convergence
Technology, Jeju, Korea, July 2018 [24]. This work was supported in part by the Basic Science Research Program
through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-
2016R1D1A1A09918974), and it was also supported in part by the MSIT (Ministry of Science and ICT), Korea
under the National Program for Excellence in SW supervised by the IITP (Institute for Information and
Communications Technology Promotion) (2017-0-00137). Correspondence should be addressed to Sangman Moh
(smmoh@chosun.ac.kr).
8. References
1. Khan MI, Rinner B, Regazzoni CS. Energy-aware task scheduling in wireless sensor networks based on cooperative reinforcement learning. Proceedings of the 2014 IEEE International Conference on Communications Workshops (ICC). 2014; 871-877; Liverpool, UK. DOI: 10.1109/ICCW.2014.6881310.
2. Arya A, Malik A, Garg R. Reinforcement learning based routing protocols in WSN: A survey. IJCSET. 2013 Nov; 4(11): 1401-1404.
3. Kulkarni P, Munot H, Malathi P. Survey on computational intelligence based routing protocols in wireless sensor network. IJWMCIS. 2016 Feb; 3(2): 23-32. DOI: 10.21742/ijwmcis.2016.3.2.04.
4. Guo W, Zhang W. A survey on intelligent routing protocols in wireless sensor networks. J. Netw. Comput. Appl. 2014; 38: 185-201.
5. Alsheikh MA, Lin S, Niyato D, Tan HP. Machine learning in wireless sensor networks: Algorithms, strategies, and applications. IEEE Commun. Surveys Tuts. 2014; 16(4): 1996-2018. DOI: 10.1109/COMST.2014.2320099.
6. Habib A, Arafat MY, Moh S. A survey on reinforcement-learning-based routing protocols in wireless sensor networks. Proceedings of the 8th Int. Conf. on Convergence Technology (ICCT 2018). 2018 Jul; 8(1): 359-360; Jeju, Korea.
7. Förster A, Murphy AL. FROMS: Feedback routing for optimizing multiple sinks in WSN with reinforcement learning. Ad Hoc Netw. 2011 Jul; 9(5): 940-965. DOI: 10.1016/j.adhoc.2010.11.006.
8. Ouzecky D, Jevtic D. Reinforcement learning as adaptive network routing of mobile agents. Proceedings of the 33rd International Convention (MIPRO). 2010; Opatija, Croatia.
9. Liang X, Balasingham I, Byun SS. A multi-agent reinforcement learning based routing protocol for wireless sensor networks. Proceedings of the IEEE International Symposium on Wireless Communication Systems (ISWCS). 2008 Oct; 552-557; Reykjavik, Iceland. DOI: 10.1109/ISWCS.2008.4726117.
10. Perkins C, Belding-Royer E, Das S. Ad hoc on-demand distance vector (AODV) routing. IETF RFC 3561. 2003 Jul. Available from: https://www.ietf.org/rfc/rfc3561.txt.
11. Boyan J, Littman M. Packet routing in dynamically changing networks: A reinforcement learning approach. Proceedings of Advances in Neural Information Processing Systems. 1994; 671-678; San Francisco, USA.
12. Jafarzadeh SZ, Moghaddam MHY. Design of energy-aware QoS routing algorithm in wireless sensor networks using reinforcement learning. Proceedings of the 2014 4th International Conference on Computer and Knowledge Engineering (ICCKE). 2014; 722-727; Mashhad, Iran. DOI: 10.1109/ICCKE.2014.6993408.
13. Gerasimov I, Simon R. A bandwidth-reservation mechanism for on-demand ad hoc path finding. Proceedings of the 35th Annual Simulation Symposium. 2002; 27-34; San Diego, USA. DOI: 10.1109/SIMSYM.2002.1000079.
14. Maalej M, Cherif S, Besbes H. QoS and energy aware cooperative routing protocol for wildfire monitoring wireless sensor networks. The Scientific World Journal. 2013.
15. Razzaque MA, Alam MM, Rashid MM, Hong CS. Multi-constrained QoS geographic routing for heterogeneous traffic in sensor networks. IEICE Trans. Commun. 2008; E91-B(8): 2589-2601.
16. Zykina AV. A lexicographic optimization algorithm. Autom. Remote Control. 2004; 65: 363-368.
17. Guo W, Yan C, Gan Y, Lu T. An intelligent routing algorithm in wireless sensor networks based on reinforcement learning. Applied Mechanics and Materials. 2014; 678: 487-493.
18. Shah R, Rabaey J. Energy aware routing for low energy ad hoc sensor networks. Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC). 2002 Mar; 350-355; Orlando, FL, USA. DOI: 10.1109/WCNC.2002.993520.
19. Oddi G, Pietrabissa A, Liberati F. Energy balancing in multi-hop wireless sensor networks: An approach based on reinforcement learning. Proceedings of the 2014 NASA/ESA Conference on Adaptive Hardware and Systems (AHS). 2014 Jul; 262-269; Leicester, UK. DOI: 10.1109/AHS.2014.6880186.
20. European Commission. Space Wireless Sensor Networks for Planetary Exploration (SWIPE). Available from: https://cordis.europa.eu/project/rcn/108074_en.html [Accessed 24 Aug. 2018].
21. Shah K, Kumar M. Distributed Independent Reinforcement Learning (DIRL) approach to resource management in wireless sensor networks. Proceedings of the IEEE Conference on Mobile Ad-hoc and Sensor Systems (MASS). 2007 Oct; 1-9; Pisa, Italy. DOI: 10.1109/MOBHOC.2007.4428658.
22. Carman DW, Kruus PS, Matt BJ. Constraints and approaches for distributed sensor network security. NAI Labs Technical Report 00-010. 2000 Sep.
23. Pathak A, Zaheeruddin, Tiwari M. Minimizing the energy hole problem in wireless sensor networks by normal distribution of nodes and relaying range regulation. Proceedings of the 2012 Fourth International Conference on Computational Intelligence and Communication Networks (CICN). 2012 Nov; 154-157; Mathura, India. DOI: 10.1109/CICN.2012.148.
24. Habib A, Arafat MY, Moh S. A survey on reinforcement-learning-based routing protocols in wireless sensor networks. Proceedings of the 8th Int. Conf. on Convergence Technology (ICCT 2018). 2018 Jul; 8(1): 359-360; Jeju, Korea.