Routing Protocols Based on Reinforcement Learning for Wireless Sensor
Networks: A Comparative Study
Arafat Habib, Muhammad Yeasir Arafat, Sangman Moh*
Dept. of Computer Eng., Chosun Univ., 309 Pilmun-daero, Dong-gu, Gwangju, 61452 South Korea
akhtab007@gmail.com, yeasir08@yahoo.com, smmoh@chosun.ac.kr*
Corresponding author*: Phone: +82-62-230-6032
Abstract
A carefully designed routing protocol can significantly improve the performance of wireless sensor networks (WSNs) in terms of end-to-end delay, energy consumption, and packet delivery ratio.
Because WSNs are used in critical operations with limited, irreplaceable batteries, routing
protocols need to be dependable, reliable, and energy-efficient. The complex and dynamic
environment of WSNs also requires intelligent routing algorithms. Over the past few years,
reinforcement learning (RL) algorithms have been used for designing routing protocols in WSNs so that
energy consumption can be lessened and network performance can be improved. In this paper,
different RL-based routing protocols are surveyed and qualitatively compared with each other. That is,
the RL-based routing protocols are reviewed with respect to their operating principles and key
features, and compared with each other in terms of significant characteristics, advantages, and
limitations. A discussion of some challenging issues in designing and implementing RL-based routing
algorithms in WSNs is also presented.
Keywords: Wireless sensor network, routing protocol, reinforcement learning, Q-learning, energy
consumption, end-to-end delay, packet delivery ratio.
1. Introduction
A wireless sensor network (WSN) consists of autonomous devices containing small sensor nodes that are
distributed spatially for specific applications. The sensor nodes are stationary in most cases and are subject to
limited human intervention. WSNs have crucial applications such as wildlife monitoring, battlefield
surveillance, disaster response, and radioactive radiation monitoring [1]. In WSNs, the sensed data should
flow towards the base station or sink node. Fast and reliable communication is required in many WSN
applications, and limited resources such as battery energy and storage capacity should be considered carefully.
Energy is the most important resource because the network lifetime depends on it.
In WSNs, reinforcement learning (RL) based routing protocols can improve network lifetime as well as
general performance [2]. Several existing survey papers ([3], [4], and [5]) review routing protocols based on
artificial intelligence (AI) techniques. RL-based routing protocols are reviewed in part in [2], [3], and [6]. In [2] and
[3], however, no comparative study among the discussed protocols is provided. In [6], only five protocols are
addressed, with limited information regarding their features and characteristics.
In this paper, the latest RL-based routing protocols in WSNs are extensively surveyed. The effect of RL in
routing for WSNs is investigated, and the major features and operational characteristics of existing RL-based
routing protocols are reviewed protocol by protocol, followed by the comparative analysis of the discussed
protocols in a qualitative manner. Some challenging design issues are also discussed.
The rest of the paper is organized as follows: In the following section, RL for WSNs is introduced. In Section 3,
the latest RL-based routing protocols are presented with respect to their functional principles, basic operation, and
distinctive characteristics. In Section 4, the reviewed protocols are compared qualitatively, and the comparison is
summarized and discussed. Section 5 briefly covers the challenging design issues, and the conclusion of the paper is
drawn in Section 6.
2. Concept of Reinforcement Learning
Reinforcement learning is a class of machine learning algorithms in which an agent learns what to do,
step by step, by interacting with its environment and receiving rewards. Its primary goal is to learn
the environment and find an optimal policy by maximizing the cumulative reward. Through this simple
but effective concept of maximizing the reward of chosen actions, an RL agent learns by itself which
actions lead to optimal behavior. This type of learning considers a completely goal-directed agent in a
totally uncertain environment. Some important concepts of an RL system are discussed below:
Policy: A policy specifies the actions an agent takes in particular states; it defines the behavior of the
agent as it learns the environment. Essentially, it is a state-action mapping that the agent learns over
time. Policies are the core of RL and can be either deterministic or stochastic.
Reward Function: The reward function treats a state-action pair as a single entity and assigns to the
resulting transition a numerical reward, which represents the intrinsic desirability of taking that action in
that state. The sole objective of an RL agent is to maximize its cumulative reward over the learning
period.
Value Function: The main difference between the reward function and the value function is that the reward
function considers only the immediate reward, whereas the value function considers the long-term reward.
The value of a state is the sum of rewards an agent gathers when it commences learning from that state.
A state may yield a very low immediate reward but still be very important because it may lead to states
yielding higher rewards. The opposite is also possible.
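This distinction can be made concrete with a small sketch (an illustrative example, not from the paper): computing the discounted return of two reward sequences shows how a state with a low immediate reward can still have the higher value.

```python
# Illustrative sketch (not from the paper): the reward function scores only
# the immediate step, while the value of a state behaves like the discounted
# sum of rewards along the trajectory that starts there.
def discounted_return(rewards, gamma=0.9):
    """Return r_0 + gamma*r_1 + gamma^2*r_2 + ... for a reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

path_a = [0.1, 5.0, 5.0]   # low immediate reward, high rewards later
path_b = [1.0, 0.0, 0.0]   # high immediate reward only
# path_a has the higher value despite its lower immediate reward.
print(discounted_return(path_a) > discounted_return(path_b))  # True
```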
Environment Model: The model consists of the states of the environment and represents the behavior of that
environment. Through the model, an agent can predict its next state and the associated reward before acting.
Routing approaches based on RL in WSNs are mostly based on Q-learning, a form of reinforcement
learning that starts from an arbitrary state and takes actions in the environment. It chooses each action
according to the policy derived from the Q-values and observes the next state and reward. This loop
continues until the agent learns the environment and finds the optimal policy.
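The Q-learning loop just described can be sketched on a toy problem (an illustrative example with abstract states, not any of the surveyed protocols; all names and values here are assumptions for demonstration):

```python
import random

# Minimal tabular Q-learning sketch on a toy 5-state chain: the agent
# learns that moving right (+1) eventually reaches the rewarding goal.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)                      # move left or move right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration

def step(state, action):
    """Deterministic transition; reward 1 only on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

random.seed(0)
for _ in range(500):                    # learning episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy: explore at random sometimes, otherwise exploit
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r = step(s, a)
        # Q-learning update: move toward reward + discounted best next value
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in ACTIONS) - Q[(s, a)])
        s = s2
# After enough episodes the greedy policy moves right from every state.
```

Following the greedy action from each state then traces the shortest path to the goal, which is the optimal policy for this chain.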
3. RL-based Routing Protocols for WSNs
Routing protocols for WSNs should be aware of energy consumption, end-to-end delay, packet delivery
ratio (PDR), and network lifetime. RL-based routing approaches for this type of network also consider these
routing metrics. In this section, the RL-based routing protocols are discussed and reviewed extensively.
3.1. Feedback Routing for Optimizing Multiple Sinks in WSNs with Reinforcement
Learning (FROMS)
In [7], a routing protocol based on Q-learning is presented. It elaborates the process of sharing a
node's local information with neighboring nodes as feedback without causing any network overhead. The
network used for deploying this protocol has a multi-sink architecture, and the problem is formulated so
that it can be solved with Q-routing [8].
A key strength of this routing protocol is that it considers both exploration and exploitation strategies to
find an optimal route: exploitation alone may lead to a locally optimal solution, whereas excessive exploration
overhead may lengthen the time needed for route discovery.
Figure 1. Routing scenario in FROMS
Figure 1 shows an experimental network consisting of two sink nodes and one source node. The solid
arrows in the figure represent the best-shared-route and the dashed arrows represent point-to-point routes. In Figure
1, the leftmost source node s has three neighbor nodes (nodes 1 to 3). Data transmission to the sink node from the
source node has to be done via neighbor nodes. If the source node chooses neighbor node 1 to transmit data, the
data travels three hops to reach sink node A and five hops to sink node B. If it chooses neighbor node 2 to transmit
data, the data travels four hops to reach sink node A and four hops to sink node B. Lastly, if it chooses neighbor
node 3 to transmit data, the data travels five hops to reach sink node A and three hops to sink node B. Considering
hop count, choosing neighbor node 2 will lead to the optimal route in Figure 1.
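The hop-count reasoning above can be sketched in a few lines. The hop counts come from the description of Figure 1; the scoring rule shown here, minimizing the worst-case hop count to either sink, is an illustrative assumption that reproduces the choice of node 2, not the exact FROMS cost function.

```python
# Hop counts from Figure 1: neighbor -> (hops to sink A, hops to sink B).
hops = {1: (3, 5), 2: (4, 4), 3: (5, 3)}

# Score each neighbor by its worst-case hop count to either sink and pick
# the minimum; neighbor 2 balances the two sinks best (max of 4 vs. 5).
best = min(hops, key=lambda n: max(hops[n]))
print(best)  # 2
```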
The unique contribution of FROMS [7] is that multiple sinks are taken into consideration in the network
design, which significantly reduces network overhead. FROMS also has a recovery process for node
failures. On the other hand, FROMS remains sensitive to node failures, and sink mobility can lead to routing
errors.
3.2. Multi-agent Reinforcement Learning based Routing Protocol with Quality of
Service (QoS) Support for WSN (MRL-QRP)
In this routing protocol [9], QoS routes are cooperatively computed using a distributed value function.
Global optimization can be achieved through local information about the network and the exchange of values
regarding states with the neighboring nodes. In [9], two things are checked before a node sends any data packet.
First, the node inspects the packet to determine its QoS requirements. Next, it consults the Q-value table. The
packet is then transmitted to the neighboring node with the highest Q-value among the candidate forwarding nodes.
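These two checks can be sketched as follows (a hypothetical illustration: the function name, the delay-based QoS bound, and all values are assumptions, not taken from [9]):

```python
# Hypothetical sketch of the two checks: filter neighbors by the packet's
# QoS requirement, then forward to the neighbor with the highest Q-value.
def choose_next_hop(neighbors, q_table, max_delay):
    """neighbors: {node: estimated delay (ms)}; q_table: {node: Q-value}."""
    eligible = [n for n, d in neighbors.items() if d <= max_delay]  # QoS check
    if not eligible:
        return None                     # no neighbor satisfies the QoS bound
    return max(eligible, key=lambda n: q_table[n])   # highest Q-value wins

neighbors = {"n1": 12.0, "n2": 8.0, "n3": 25.0}
q_table = {"n1": 0.7, "n2": 0.4, "n3": 0.9}
print(choose_next_hop(neighbors, q_table, max_delay=15.0))  # n1
```

Note that n3 has the highest Q-value but is excluded by the QoS check, so the best eligible neighbor n1 is chosen.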
Figure 2. Multi-hop wireless sensor network for MRL-QRP
In Figure 2, a data packet originates at sensor node i0. During the learning process, i0 sends the
packet to node i1 by random selection. After that, i1 forwards the packet to i2. This process continues until
node iN receives the data packet, where iN is the last node attached to the sink in the routing path.
MRL-QRP is claimed to outperform the well-established ad hoc on-demand distance vector (AODV)
routing protocol [10], and it performs especially well when the traffic load is heavy. MRL-QRP [9]
considers end-to-end delay, PDR, and energy consumption as routing metrics.
3.3. Reinforcement Learning as Adaptive Network Routing of Mobile Agents
In this routing protocol, mobile agents that traverse routing nodes seek the optimal path at each
time step in order to decrease the service processing time. Movements of the agents are designed so that
congestion is avoided. The Q-learning algorithm implemented in this protocol learns policies online: as
mobile agents are routed, changes in traffic patterns, network load levels, and topology are promptly
incorporated into the routing policies. The system topology is determined by the agents through network
discovery, and the gained information is stored within the nodes. Q-routing, which exploits Q-learning, was first
proposed by Boyan and Littman [11].
Figure 3. Agent system
Figure 3 presents the agent system used in [8], where R denotes an intelligent router node, MA stands
for a mobile agent, and SP represents a service provider. Each R forwards an MA from its input link toward the
next R depending on the concrete configuration, determining its output link by observing the attained Q-values.
Immediately upon receiving an MA, an R sends a reply with the predicted route toward the previous R. The
process continues until the MA finds an SP, which then performs the requested task on behalf of the MA [8].
The right-pointing arrows in Figure 3 represent the movement of MAs, and the left-pointing arrows denote the
replies originating from the SP. Each R manages its own input and output links. The last node in the figure is the
SP, which performs the final processing of mobile agents.
A strength of this routing technique is that it adapts to frequent topology changes and works well under
varying network traffic. Accurate measurement of the service processing time along a particular route also makes
the protocol more efficient. However, in [8], no comparison with standardized routing protocols is shown to
establish the superiority of the proposed algorithm. Moreover, the average service processing time is high because
the whole network must be learned before routing can be performed. The routing metrics considered in [8] are
PDR and latency.
3.4. Energy-aware QoS Routing using RL (EQR-RL)
In [12], the routing decision relies on the resource availability in the network and the QoS requirements. To
send a data packet to the sink node, the sending node uses its routing table to decide which intermediate node to
forward to. For ease of explanation, let this node be i. Node i considers a set of neighboring nodes, denoted S,
whose members must fulfill the minimum QoS requirements. A node from the set S is then chosen with the help
of a load-balancing algorithm. The proposed algorithm was implemented in a scenario with periodic,
long-lasting transmissions to the sink node, and the algorithm is also scalable. In [12], to retrieve information
about its neighboring nodes, a node can inspect the headers of the data packets it receives. This information
includes node failures, removal of nodes from the neighborhood, and link quality. Neighbors likewise inspect the
packet header to update the routing table when new nodes are added, and a neighbor node is excluded from the
routing table if there is no response from it.
EQR-RL supports multiple QoS requirements, including latency, geographical distance, and the number of
hops for data delivery. It can also handle node mobility and failure recovery, and it supports mobile sink nodes.
Implementing reinforcement learning for any network scenario requires a balance between exploration and
exploitation: pure exploration guarantees that an optimal route will eventually be found but incurs routing
overhead, whereas pure exploitation finds routes faster at a considerable risk of suboptimality. The authors
of [12] consider exploration strategies only. The network scenario in [12] contains a few mobile nodes. The
proposed protocol is claimed to improve network lifetime, PDR, and end-to-end delay in comparison with
[13], [14], and [9].
3.5. Distributed Adaptive Cooperative Routing Protocol (DACR)
In [15], the authors propose a protocol that ensures QoS requirements by decreasing delay and increasing
reliability. A route from source to destination is established using the AODV protocol. Data transmission is
possible either directly or via a relay, and one of these two transmission modes is selected based on the energy
level of the node. If relayed transmission is chosen, the protocol then considers residual energy, reliability, and
delay as the important criteria.
All routing nodes learn these criteria using lightweight RL, after which a lexicographic optimization
[16] is used to find the optimal relay. The contributions of this protocol are summarized below:
• No central control is needed for DACR, and nodes can be deployed in a distributed manner. Global information
on channel state conditions is also unnecessary for this protocol, which is why DACR has relatively low
network overhead.
• Selecting the relay proactively is shown to be more efficient than selecting relays reactively.
• The process of discovering routes and relays can save a large amount of energy.
One notable flaw of this protocol is its complexity, since it uses both an RL algorithm and a transmission-mode
selection algorithm in the routing process. It considers energy consumption and network lifetime as routing metrics.
3.6. Routing Protocol to Optimize Network Lifetime (RLLO)
The protocol proposed in [17] exploits the strengths of RL. After receiving a data packet, a node analyzes
the information and then chooses between two possible actions. If a candidate forwarding node has a valid route
to the destination, the node forwards the packet to it; the node with the highest Q-value is always selected among
the possible next forwarding nodes. If no forwarding node has a route to the destination, the node sends the data
directly to the sink, provided that the sink is within its transmission range and it has enough energy; if the sink is
beyond the transmission range, the packet is dropped.
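This decision logic can be sketched as follows (a hedged illustration; the function name, tuple encoding, and threshold are assumptions, not taken from [17]):

```python
# Hedged sketch of the RLLO forwarding decision described above; names and
# thresholds are illustrative assumptions, not taken from [17].
def rllo_action(sink_in_range, energy, energy_threshold, candidates, q_values):
    """candidates: next-hop nodes that hold a valid route to the destination."""
    if candidates:
        # forward to the valid next hop with the highest Q-value
        return ("forward", max(candidates, key=lambda n: q_values[n]))
    if sink_in_range and energy >= energy_threshold:
        return ("send_direct", "sink")   # no route, but the sink is reachable
    return ("drop", None)                # no route and sink out of range

print(rllo_action(False, 0.8, 0.3, ["a", "b"], {"a": 0.2, "b": 0.6}))
# ('forward', 'b')
```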
When a node in RLLO executes an action, it receives a reward that is used to update its Q-value. The
reward function is designed around residual energy, which leads to an optimal balance of energy usage. The
protocol proposed in [17] is highly flexible to topology changes and achieves global optimization of the network
without any additional cost. The biggest flaw of this routing approach is that it is prone to network isolation,
because nodes with low energy deliberately drop packets; this can also add latency. To demonstrate the
superiority of [17], it was compared with [18], considering PDR and network lifetime as performance metrics.
3.7. RL-based Routing Protocol for Multi-hop WSNs
The routing approach proposed in [19] extends the Q-routing algorithm [11] for application in WSNs.
Network lifetime is optimized by balancing the routing effort among sensor nodes. This routing
approach also minimizes the control overhead, and the residual battery levels are taken into account. The
approach in [19] is designed and implemented for planetary exploration scenarios, with the goal of bringing
satellite and WSN technologies together in space; an example scenario for this routing technique is the SWIPE
project [20]. The main concept is to deploy hundreds or thousands of small sensor nodes, some of which are
capable of satellite communication. The remaining nodes, which lack satellite communication capability, form
the ad hoc network on the surface. After processing, the retrieved data are sent first to the satellite and then to
Earth.
Q-routing updates the Q-value through the following function:

Q(s, a) ← Q(s, a) + α [R + γ max_a′ Q(s′, a′) − Q(s, a)]     (1)

where s is a state, a is an action, and R is the reward obtained for taking action a in state s; s′ and a′ are the
next state and action after s and a. The learning rate α determines how much new information replaces older
information, and its value satisfies 0 ≤ α ≤ 1. The discount factor γ determines the importance of future
rewards, and its value satisfies 0 ≤ γ ≤ 1. When a node i transmits a packet to the next forwarding node, the
routing table is updated: the neighbor node sends an acknowledgment (ACK) back to node i, and the update is
performed through the function in (1). Node i requires data from its neighbor nodes only. As the Q-function is
updated, an estimate of the overall network is obtained, provided the network topology remains unchanged. The
proposed protocol was compared with different versions of Q-routing.
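As a sketch, the update rule in (1) can be transcribed directly into code (the state and action labels below are illustrative, not taken from [19]):

```python
# Q-routing update from (1): Q(s,a) += alpha * (R + gamma * max_a' Q(s',a') - Q(s,a)).
def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    best_next = max(Q[(s_next, a2)] for a2 in actions)   # max_a' Q(s', a')
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Toy table: forwarding from node-state 0 toward a neighbor in state 1.
Q = {(0, "fwd"): 0.0, (0, "drop"): 0.0, (1, "fwd"): 1.0, (1, "drop"): 0.0}
q_update(Q, s=0, a="fwd", r=0.0, s_next=1, actions=["fwd", "drop"])
print(Q[(0, "fwd")])  # 0.45 = 0.5 * (0 + 0.9 * 1.0 - 0)
```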
4. Comparative Analysis of the Routing Protocols based on RL
In this section, the protocols discussed in Section 3 are qualitatively compared with each other. In WSNs,
energy is the most important issue [21], and the target is to reduce the energy wastage that shortens the network
lifetime. Increased PDR provides more reliable communication, whereas decreased end-to-end delay removes
unwanted latency. Routing metrics specific to certain protocols are listed in the routing-metrics column of
Table 1. There are many different cases and network scenarios in which researchers may be interested in
applying a particular routing protocol; the column of outstanding features in the table can be helpful for that.
Good exploration and exploitation strategies are necessary to implement RL algorithms for routing properly.
Since implementing RL is not straightforward, it is also necessary to know whether a routing protocol incurs
more network overhead and routing complexity than usual due to the RL implementation. Although mobility
is not mandatory in WSNs, it is also worth observing how well an RL-based routing algorithm performs when
mobility is introduced to a few of the nodes. Some routing protocols work better than others under particular
routing and QoS metrics, and some may cause network isolation due to unresponsive nodes. In Table 1, the
competitive advantages and inherent limitations of each routing protocol are summarized.
Table 1: Comparative analysis of the routing protocols based on RL in WSNs

Protocol: FROMS [7]
Routing metrics: PDR, energy expenditure
Outstanding features: (1) Significantly decreases the network overhead. (2) Two nodes can contact each other in a full-duplex manner.
Advantages: (1) Consideration of multiple sink nodes. (2) Good exploration strategy. (3) Complete recovery process upon failure. (4) Greedy approach after convergence leads to faster route finding in the network. (5) Maintains a nearly perfect data rate after node failure by route switching.
Limitations: (1) Prone to node failure. (2) Sink mobility may lead to routing errors. (3) Data redundancy at multiple sink nodes may waste valuable energy. (4) Frequent route switching may prolong the time needed for route discovery.
Protocol: Routing protocol for adaptive network environments using RL [8]
Routing metrics: PDR, latency
Outstanding features: (1) Works well in frequently changing network topologies. (2) Synchronized routing to achieve the shortest delivery and service processing time.
Advantages: (1) Highly adaptive to network topology changes. (2) Adaptive to varying traffic conditions. (3) Service processing time along a particular route can be accurately estimated. (4) Efficient routes are easy to find, and no central routing control is needed.
Limitations: (1) No comparison shown with standardized protocols. (2) Average service processing time is high because the routing algorithm must learn the whole network. (3) Exhibits a path hysteresis problem: it has difficulty falling back to the optimal path once network routes improve.

Protocol: MRL-QRP [9]
Routing metrics: PDR, delay
Outstanding features: (1) Sensor nodes cooperatively compute routes. (2) Exploits each node to gain sufficient network knowledge to choose the best hop. (3) Sensor nodes adjust the probability model of the environment according to node mobility and wireless channel conditions.
Advantages: (1) Compared against a well-established conventional protocol (AODV). (2) Performs especially well when the traffic load is heavy. (3) Reduces network overhead. (4) Local information about neighboring nodes is enough to achieve near-optimal performance in a global manner.
Limitations: (1) Power consumption is not taken into consideration while computing routes. (2) The routing table formed at the beginning cannot ensure QoS. (3) Resource overuse and network overhead arise from fluctuating QoS requirements.

Protocol: EQR-RL [12]
Routing metrics: Remaining energy, PDR
Outstanding features: (1) Nodes are made aware of the data needed, and the sink node uses a controlled flooding algorithm. (2) Scalable, flexible, load-balancing routing protocol. (3) Nodes participating in data communication use the packet header to keep the routing table up to date.
Advantages: (1) Supports various QoS requirements. (2) Handles mobility and failure recovery. (3) Supports mobile sink nodes. (4) Unresponsive nodes are well avoided in the routing process.
Limitations: (1) Exploitation is not taken into consideration. (2) Route computation may take time. (3) Network isolation is likely because nodes that do not respond are excluded from the routing table. (4) Only a few nodes are considered mobile, leaving it unclear whether the protocol works well when all nodes are static or all are mobile. (5) Does not describe how mobile sinks cope with the frequently changing network.

Protocol: DACR [15]
Routing metrics: Energy consumption, PDR, network lifetime
Outstanding features: (1) Routing is done cooperatively, and the protocol adapts to frequent changes over time. (2) Multi-hop path from source to destination. (3) Low network overhead.
Advantages: (1) Cooperative routing decisions lead to reduced network delay and increased reliability. (2) QoS requirements are fulfilled with reduced network overhead owing to proactive relay selection. (3) Significantly increases network lifetime. (4) The transmitted signal achieves the best phase synchronization at the receiver end. (5) Cooperation among relay nodes introduces additional spatial diversity and increases the transmission reliability against channel fading.
Limitations: (1) Too complex, as it uses both an RL algorithm and a transmission-mode selection algorithm. (2) Memory is not considered as a resource to be used wisely. (3) Uses complex lexicographic optimization for relay selection.
5. Challenging Open Issues
Routing in WSNs is always challenging, and applying RL to routing protocol design brings many
challenges of its own. These challenges are discussed in this section.
Fixing the learning rate: The learning rate in any RL technique plays a prime role in achieving convergence,
as it decides how much new information replaces older information. This value often needs to be varied when
computing the Q-value, and an inappropriate value of this parameter may hinder convergence.
Balance between exploration and exploitation strategies: This balance is highly important when applying RL
to any scenario, since optimal route computation is expected of any newly designed routing protocol.
Exploration alone guarantees optimality but incurs routing overhead; conversely, exploitation alone may lead to
a route that is not optimal.
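One common way to strike this balance, sketched below as a general RL practice rather than a mechanism of any surveyed protocol, is epsilon-greedy selection with a decaying epsilon: the agent explores widely early and exploits the learned routes later.

```python
import random

# Epsilon-greedy selection with a decaying epsilon (illustrative values).
def select_action(q_values, epsilon):
    if random.random() < epsilon:
        return random.choice(list(q_values))   # explore: random action
    return max(q_values, key=q_values.get)     # exploit: best-known action

epsilon, decay, min_epsilon = 1.0, 0.99, 0.05
for episode in range(300):
    # ... one learning episode would use select_action(Q[state], epsilon) ...
    epsilon = max(min_epsilon, epsilon * decay)
print(epsilon)  # 0.05: decayed to the floor after about 300 episodes
```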
Convergence speed: Convergence of the RL algorithm is the most vital criterion for successfully
applying RL to any system. After an adequate number of learning episodes, a successfully converged algorithm
always generates the same optimal reward. Successful convergence also depends on the formulation of the
reward function. Convergence speed matters when routes must be computed quickly.
Security: WSNs are prone to security problems such as DDoS attacks, eavesdropping, and sinkhole attacks. Data
confidentiality is one of the security mechanisms for WSNs: sensor nodes should not allow their neighbors
unauthorized access to their readings [22]. The sinkhole attack is another security issue that is quite challenging to
handle through RL-based routing protocol design, since in this kind of attack a corrupted node looks more
beneficial as a next forwarder because the routing information has already been manipulated by the invader.
Data redundancy: As already discussed, WSNs are resource-constrained networks, and data
redundancy can lead to further resource issues. To prevent this, it must be ensured that the acquired
data are fresh and that old data are not resent. Existing works that use RL-based algorithms for routing
protocol design do not cover data redundancy, and solving this issue can be a challenging research task.
Energy hole problem: Solving this issue through routing techniques is very important for keeping any WSN
fully functional. Nodes close to the sink die faster than the other nodes because they have to relay
data to the sink more frequently [23].
QoS metrics: Some QoS metrics, such as reliability and packet loss, can be combined with the discussed
routing metrics. The protocols discussed in this paper hardly address this combination of routing and QoS
metrics.
Table 1 (continued)

Protocol: Routing protocol for balancing energy in multi-hop WSNs [19]
Routing metrics: Energy consumption, network lifetime
Outstanding features: (1) Designed and implemented for planetary exploration scenarios. (2) Merges satellite and WSN technologies in space. (3) Includes nodes equipped for satellite communication.
Advantages: (1) Minimizes control overhead. (2) Optimizes the use of sensor batteries. (3) Maximizes the node lifetime. (4) Performs efficiently for grid-based networks. (5) Adapts to any kind of network topology.
Limitations: (1) An estimate of the entire network is obtained only after the Q-function is frequently updated. (2) Does not describe the distinctive properties of the sensor nodes responsible for satellite communication, or how they interact with the normal sensor nodes responsible for sensing and data processing.

Protocol: RLLO [17]
Routing metrics: Network lifetime, PDR
Outstanding features: (1) Uses the strengths of RL to achieve global optimization. (2) Considers the hop count to the sink when searching for the routing path. (3) Evaluates the approximate goodness of every action.
Advantages: (1) Balances energy consumption and improves PDR. (2) Achieves global optimization without additional cost. (3) Highly flexible to topology changes.
Limitations: (1) Network isolation is likely because nodes with relatively low energy drop packets. (2) If there is no existing route to a forwarding node, the node transmits the data directly to the sink provided it has a certain energy level, which may result in extra energy loss. (3) The probability of network isolation is very high.
6. Conclusion
In this paper, the latest RL-based routing protocols have been extensively reviewed in terms of their major
features and characteristics, and qualitatively compared with each other. Even though RL may introduce
computational overhead, the existing RL-based routing protocols are promising in delivering better performance
with respect to major metrics such as energy consumption, end-to-end delay, PDR, and network lifetime. The
comparative discussion of different RL-based routing protocols in this paper can be used effectively when
choosing a routing protocol or designing new RL-based routing protocols for WSNs. Some challenging issues in
designing and implementing RL-based routing algorithms in WSNs have also been discussed. It can be concluded
that a good balance between exploration and exploitation, fast convergence of the algorithm, multiple routing
metrics, QoS support, and security must all be considered together to design a pragmatic RL-based routing
protocol.
7. Acknowledgment
A preliminary short version of this work was presented at the 8th International Conference on Convergence
Technology, Jeju, Korea, July 2018 [24]. This work was supported in part by the Basic Science Research Program
through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-
2016R1D1A1A09918974), and it was also supported in part by the MIST (Ministry of Science and ICT), Korea
under the National Program for Excellence in SW supervised by the IITP (Institute for Information and
Communications Technology Promotion) (2017-0-00137). Correspondence should be addressed to Sangman Moh
(smmoh@chosun.ac.kr).
8. References
1.
Khan MI, Rinner B, Regazzoni CS. Energy-aware task scheduling in wireless sensor networks based on
cooperative reinforcement learning. Proceedings of 2014 IEEE International Conference on Communications
Workshops (ICC). 2012 Sep; 871 – 77; Liverpool, UK. DOI: 10.1109/ICCW.2014.6881310.
2.
Arya A, Malik, A, Garg, R. Reinforcement Learning based Routing Protocols in WSN: A survey. IJCSET. 2013
Nov; 4(11): 1401 - 4, 2013.
3.
Kulkarni P, Munot H, Malathi P. Survey on Computational Intelligence Based Routing Protocols in Wireless
Sensor Network. IJWMCIS. 2016 Feb; 3(2): 23-32. DOI: 10.21742/ijwmcis.2016.3.2.04.
4.
Guo W, Zhang W, A survey on intelligent routing protocols in wireless sensor networks. IJWMCIS. 2014;
38:185 – 201.
5. Alsheikh MA, Lin S, Niyato D, Tan HP. Machine learning in wireless sensor networks: Algorithms,
strategies, and applications. IEEE Commun. Surveys Tuts. 2014; 16(4): 1996-2018. DOI:
10.1109/COMST.2014.2320099.
6. Habib A, Arafat Y, Moh S. A survey on reinforcement-learning-based routing protocols in wireless sensor
networks. Proceedings of 8th Int. Conf. on Convergence Technology (ICCT 2018). 2018 Jul; 8(1): 359-360;
Jeju, Korea.
7. Förster A, Murphy AL. FROMS: Feedback routing for optimizing multiple sinks in WSN with reinforcement
learning. Ad Hoc Netw. 2011 Jul; 9(5): 940-965. DOI: 10.1016/j.adhoc.2010.11.006.
8. Ouzecky D, Jevtic D. Reinforcement learning as adaptive network routing of mobile agents. Proceedings of
the 33rd International Convention (MIPRO 2010). 2010; Opatija, Croatia.
9. Liang X, Balasingham I, Byun SS. A multi-agent reinforcement learning based routing protocol for wireless
sensor networks. Proceedings of IEEE International Symposium on Wireless Communication Systems. 2008
Oct; 552-557; Reykjavik, Iceland. DOI: 10.1109/ISWCS.2008.4726117.
10. Perkins C, Belding-Royer E, Das S. Ad hoc on-demand distance vector (AODV) routing. IETF RFC 3561.
2003 Jul. Available from: https://www.ietf.org/rfc/rfc3561.txt.
11. Boyan J, Littman M. Packet routing in dynamically changing networks: A reinforcement learning approach.
Proceedings of Advances in Neural Information Processing Systems. 1994 Nov; 671-678; San Francisco, USA.
12. Jaforzadeh SZ, Mughaddam MHY. Design of energy-aware QoS routing algorithm in wireless sensor
networks using reinforcement learning. Proceedings of 2014 4th International Conference on Computer and
Knowledge Engineering. 2014 Dec; 722-727; Mashhad, Iran. DOI: 10.1109/ICCKE.2014.6993408.
13. Gerasimov I, Simon R. A bandwidth-reservation mechanism for on-demand ad hoc path finding. Proceedings
of 35th Annual Simulation Symposium. 2002; 27-34; San Diego, USA. DOI: 10.1109/SIMSYM.2002.1000079.
14. Maalej M, Cherif S, Besbes H. QoS and energy aware cooperative routing protocol for wildfire monitoring
wireless sensor networks. The Scientific World Journal. 2013.
15. Razzaque MA, Alam MM, Rashid MM, Hong CS. Multi-constrained QoS geographic routing for
heterogeneous traffic in sensor networks. IEICE Trans. Commun. 2008; E91-B(8): 2589-2601.
16. Zykina AV. A lexicographic optimization algorithm. Autom. Remote Control. 2004; 65: 363-368.
17. Guo W, Yan C, Gan Y, Lu T. An intelligent routing algorithm in wireless sensor networks based on
reinforcement learning. Applied Mechanics and Materials. 2014; 678: 487-493.
18. Shah R, Rabaey J. Energy aware routing for low energy ad hoc sensor networks. Proceedings of the IEEE
Wireless Communications and Networking Conference (WCNC). 2002 Mar; 350-355; Orlando, FL. DOI:
10.1109/WCNC.2002.993520.
19. Oddi G, Pietrabissa A, Liberati F. Energy balancing in multi-hop wireless sensor networks: An approach
based on reinforcement learning. Proceedings of 2014 NASA/ESA Conference on Adaptive Hardware and
Systems (AHS). 2014 Jul; 262-269; Leicester, UK. DOI: 10.1109/AHS.2014.6880186.
20. European Commission. Space wireless sensor networks for planetary exploration. Available from:
https://cordis.europa.eu/project/rcn/108074_en.html [Accessed 24 Aug. 2018].
21. Shah K, Kumar M. Distributed independent reinforcement learning (DIRL) approach to resource
management in wireless sensor networks. Proceedings of IEEE Conference on Mobile Ad-hoc and Sensor
Systems. 2007 Oct; 1-9; Pisa, Italy. DOI: 10.1109/MOBHOC.2007.4428658.
22. Carman DW, Krus PS, Matt BJ. Constraints and approaches for distributed sensor network security. NAI
Labs Technical Report 00-010. 2000 Sep; 5-126.
23. Pathak A, Zaheeruddin, Tiwari M. Minimizing the energy hole problem in wireless sensor networks by
normal distribution of nodes and relaying range regulation. Proceedings of 2012 Fourth International
Conference on Computational Intelligence and Communication Networks. 2012 Nov; 154-157; Mathura,
India. DOI: 10.1109/CICN.2012.148.
24. Habib A, Arafat Y, Moh S. A survey on reinforcement-learning-based routing protocols in wireless sensor
networks. Proceedings of 8th Int. Conf. on Convergence Technology (ICCT 2018). 2018 Jul; 8(1): 359-360;
Jeju, Korea.