Routing Protocols Based on Reinforcement Learning for Wireless Sensor
Networks: A Comparative Study
Arafat Habib, Muhammad Yeasir Arafat, Sangman Moh*
Dept. of Computer Eng., Chosun Univ., 309 Pilmun-daero, Dong-gu, Gwangju, 61452 South Korea
akhtab007@gmail.com, yeasir08@yahoo.com, smmoh@chosun.ac.kr*
Corresponding author*: Phone: +82-62-230-6032
Abstract
A carefully designed routing protocol can significantly improve the performance of wireless sensor networks (WSNs) in terms of end-to-end delay, energy consumption, and packet delivery ratio.
Because WSNs are used in critical operations with limited, irreplaceable batteries, routing
protocols need to be dependable, reliable, and energy-efficient. The complex and dynamic
environment of WSNs also requires intelligent routing algorithms. Over the past few years,
reinforcement learning (RL) algorithms have been used for designing routing protocols in WSNs so that
energy consumption can be lessened and network performance can be improved. In this paper,
different RL-based routing protocols are surveyed and qualitatively compared with each other. That is,
the RL-based routing protocols are reviewed with respect to their operating principles and key
features, and compared with each other in terms of significant characteristics, advantages, and
limitations. A discussion of some challenging issues in designing and implementing RL-based routing
algorithms in WSNs is also presented.
Keywords: Wireless sensor network, routing protocol, reinforcement learning, Q-learning, energy
consumption, end-to-end delay, packet delivery ratio.
1. Introduction
A wireless sensor network (WSN) consists of autonomous devices containing small sensor nodes that are
distributed spatially for specific applications. The sensor nodes are stationary in most cases and are subject to
limited human intervention. WSNs have crucial applications such as wildlife monitoring, battlefield
surveillance, disaster response, and radioactive radiation monitoring [1]. In WSNs, the sensed data should
flow towards the base station or sink node. Fast and reliable communication is required in many WSN
applications, and limited resources such as battery energy and storage capacity should be considered carefully.
Energy is the most important resource because the network lifetime depends on it.
In WSNs, reinforcement learning (RL) based routing protocols can improve network lifetime as well as
general performance [2]. Several existing survey papers ([3], [4], and [5]) review routing protocols based on
artificial intelligence (AI) techniques. RL-based routing protocols are reviewed in part in [2], [3], and [6]. In [2] and
[3], however, no comparative study among the discussed protocols is provided. In [6], only five protocols are
addressed, with limited information regarding their features and characteristics.
In this paper, the latest RL-based routing protocols in WSNs are extensively surveyed. The effect of RL in
routing for WSNs is investigated, and the major features and operational characteristics of existing RL-based
routing protocols are reviewed protocol by protocol, followed by the comparative analysis of the discussed
protocols in a qualitative manner. Some challenging design issues are also discussed.
The rest of the paper is organized as follows: In the following section, RL for WSNs is introduced. In Section 3,
the latest RL-based routing protocols are presented with respect to their functional principles, basic operation, and
distinctive characteristics. In Section 4, the reviewed protocols are compared qualitatively, and the comparison is
summarized and discussed. Section 5 briefly covers the challenging design issues, and the conclusion of the paper is
drawn in Section 6.
2. Concept of Reinforcement Learning
Reinforcement learning is a class of machine learning algorithms in which an agent learns what to do,
step by step, by interacting with its environment and receiving rewards. Its primary goal is to learn
the environment and find an optimal policy by maximizing the cumulative reward. Through this simple
but effective concept of maximizing the reward of chosen actions, an RL agent learns by itself which
actions lead to optimal behavior. This type of learning considers a completely goal-directed agent in a
totally uncertain environment. Some important concepts of an RL system are discussed below:
Policy: A policy specifies the actions an agent takes in particular states; it defines the behavior of the
agent as it learns the environment. Essentially, it is a state-action mapping that the agent learns over
time. Policies are the core of RL and can be either deterministic or stochastic.
Reward Function: The reward function treats a state-action pair as a single entity and assigns to the
resulting transition a numerical reward, which represents the intrinsic desirability of taking that action in
that state. The sole objective of an RL agent is to maximize its cumulative reward over the learning
period.
Value Function: The main difference between the reward function and the value function is that the reward
function considers only the immediate reward, whereas the value function considers the long-term reward.
The value of a state is the sum of rewards an agent gathers when it commences learning from that state.
A state may yield a very low immediate reward but still be very important because it may lead to states
yielding higher rewards. The opposite is also possible.
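This distinction can be made concrete with a small sketch (an illustrative example, not from the paper): computing the discounted return of two reward sequences shows how a state with a low immediate reward can still have the higher value.

```python
# Illustrative sketch (not from the paper): the reward function scores only
# the immediate step, while the value of a state behaves like the discounted
# sum of rewards along the trajectory that starts there.
def discounted_return(rewards, gamma=0.9):
    """Return r_0 + gamma*r_1 + gamma^2*r_2 + ... for a reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

path_a = [0.1, 5.0, 5.0]   # low immediate reward, high rewards later
path_b = [1.0, 0.0, 0.0]   # high immediate reward only
# path_a has the higher value despite its lower immediate reward.
print(discounted_return(path_a) > discounted_return(path_b))  # True
```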
Environment Model: The model consists of the states of the environment and represents the behavior of that
environment. Through the model, an agent can predict its next state and the associated reward before acting.
Routing approaches based on RL in WSNs are mostly based on Q-learning, a form of reinforcement
learning that starts from an arbitrary state and takes actions in the environment. It chooses each action
according to the policy derived from the Q-values and observes the next state and reward. This loop
continues until the agent learns the environment and finds the optimal policy.
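The Q-learning loop just described can be sketched on a toy problem (an illustrative example with abstract states, not any of the surveyed protocols; all names and values here are assumptions for demonstration):

```python
import random

# Minimal tabular Q-learning sketch on a toy 5-state chain: the agent
# learns that moving right (+1) eventually reaches the rewarding goal.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)                      # move left or move right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration

def step(state, action):
    """Deterministic transition; reward 1 only on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

random.seed(0)
for _ in range(500):                    # learning episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy: explore at random sometimes, otherwise exploit
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r = step(s, a)
        # Q-learning update: move toward reward + discounted best next value
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in ACTIONS) - Q[(s, a)])
        s = s2
# After enough episodes the greedy policy moves right from every state.
```

Following the greedy action from each state then traces the shortest path to the goal, which is the optimal policy for this chain.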
3. RL-based Routing Protocols for WSNs
Routing protocols for WSNs should be aware of energy consumption, end-to-end delay, packet delivery
ratio (PDR), and network lifetime. RL-based routing approaches for this type of network also consider these
routing metrics. In this section, the RL-based routing protocols are discussed and reviewed extensively.
3.1. Feedback Routing for Optimizing Multiple Sinks in WSNs with Reinforcement
Learning (FROMS)
In [7], a routing protocol based on Q-learning is presented. It elaborates the process of sharing a
node's local information with neighboring nodes as feedback without causing any network overhead. The
network used for deploying this protocol has a multi-sink architecture, and the problem is formulated so
that it can be solved with Q-routing [8].
A key strength of this routing protocol is that it considers both exploration and exploitation strategies to
find an optimal route: exploitation alone may lead to a locally optimal solution, whereas excessive exploration
overhead may lengthen the time needed for route discovery.
Figure 1. Routing scenario in FROMS
Figure 1 shows an experimental network consisting of two sink nodes and one source node. The solid
arrows in the figure represent the best-shared-route and the dashed arrows represent point-to-point routes. In Figure
1, the leftmost source node s has three neighbor nodes (nodes 1 to 3). Data transmission to the sink node from the
source node has to be done via neighbor nodes. If the source node chooses neighbor node 1 to transmit data, the
data travels three hops to reach sink node A and five hops to sink node B. If it chooses neighbor node 2 to transmit
data, the data travels four hops to reach sink node A and four hops to sink node B. Lastly, if it chooses neighbor
node 3 to transmit data, the data travels five hops to reach sink node A and three hops to sink node B. Considering
hop count, choosing neighbor node 2 will lead to the optimal route in Figure 1.
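The hop-count reasoning above can be sketched in a few lines. The hop counts come from the description of Figure 1; the scoring rule shown here, minimizing the worst-case hop count to either sink, is an illustrative assumption that reproduces the choice of node 2, not the exact FROMS cost function.

```python
# Hop counts from Figure 1: neighbor -> (hops to sink A, hops to sink B).
hops = {1: (3, 5), 2: (4, 4), 3: (5, 3)}

# Score each neighbor by its worst-case hop count to either sink and pick
# the minimum; neighbor 2 balances the two sinks best (max of 4 vs. 5).
best = min(hops, key=lambda n: max(hops[n]))
print(best)  # 2
```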
The unique contribution of FROMS [7] is that multiple sinks are taken into consideration in the network
design, which significantly reduces network overhead. FROMS also has a recovery process for node
failures. On the other hand, FROMS remains sensitive to node failures, and sink mobility can lead to routing
errors.
3.2. Multi-agent Reinforcement Learning based Routing Protocol with Quality of
Service (QoS) Support for WSN (MRL-QRP)
In this routing protocol [9], QoS routes are cooperatively computed using a distributed value function.
Global optimization can be achieved through local information about the network and the exchange of values
regarding states with the neighboring nodes. In [9], two things are checked before a node sends any data packet.
First, the node inspects the packet to determine its QoS requirements. Next, it consults the Q-value table. The
packet is then transmitted to the neighboring node with the highest Q-value among the candidate forwarding nodes.
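These two checks can be sketched as follows (a hypothetical illustration: the function name, the delay-based QoS bound, and all values are assumptions, not taken from [9]):

```python
# Hypothetical sketch of the two checks: filter neighbors by the packet's
# QoS requirement, then forward to the neighbor with the highest Q-value.
def choose_next_hop(neighbors, q_table, max_delay):
    """neighbors: {node: estimated delay (ms)}; q_table: {node: Q-value}."""
    eligible = [n for n, d in neighbors.items() if d <= max_delay]  # QoS check
    if not eligible:
        return None                     # no neighbor satisfies the QoS bound
    return max(eligible, key=lambda n: q_table[n])   # highest Q-value wins

neighbors = {"n1": 12.0, "n2": 8.0, "n3": 25.0}
q_table = {"n1": 0.7, "n2": 0.4, "n3": 0.9}
print(choose_next_hop(neighbors, q_table, max_delay=15.0))  # n1
```

Note that n3 has the highest Q-value but is excluded by the QoS check, so the best eligible neighbor n1 is chosen.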
Figure 2. Multi-hop wireless sensor network for MRL-QRP
In Figure 2, a data packet originates at sensor node i0. During the learning process, i0 sends the
packet to node i1 by random selection. After that, i1 forwards the packet to i2. This process continues until
node iN receives the data packet, where iN is the last node attached to the sink in the routing path.
MRL-QRP is claimed to outperform the well-established ad hoc on-demand distance vector (AODV)
routing protocol [10], and it performs especially well when the traffic load is heavy. MRL-QRP [9]
considers end-to-end delay, PDR, and energy consumption as routing metrics.
3.3. Reinforcement Learning as Adaptive Network Routing of Mobile Agents
In this routing protocol, mobile agents that traverse routing nodes seek the optimal path at each
time step in order to decrease the service processing time. Movements of the agents are designed so that
congestion is avoided. The Q-learning algorithm implemented in this protocol learns policies online: as
mobile agents are routed, changes in traffic patterns, network load levels, and topology are promptly
incorporated into the routing policies. The system topology is determined by the agents through network
discovery, and the gained information is stored within the nodes. Q-routing, which exploits Q-learning, was first
proposed by Boyan and Littman [11].
Figure 3. Agent system
Figure 3 presents the agent system used in [8], where R denotes an intelligent router node, MA stands
for a mobile agent, and SP represents a service provider. Each R forwards an MA from its input link toward the
next R depending on the concrete configuration, determining its output link by observing the attained Q-values.
Immediately upon receiving an MA, an R sends a reply with the predicted route toward the previous R. The
process continues until the MA finds an SP, which then performs the requested task on behalf of the MA [8].
The right-pointing arrows in Figure 3 represent the movement of MAs, and the left-pointing arrows denote the
replies originating from the SP. Each R manages its own input and output links. The last node in the figure is the
SP, which performs the final processing of mobile agents.
A strength of this routing technique is that it adapts to frequent topology changes and works well under
varying network traffic. Accurate measurement of the service processing time along a particular route also makes
the protocol more efficient. However, in [8], no comparison with standardized routing protocols is shown to
establish the superiority of the proposed algorithm. Moreover, the average service processing time is high because
the whole network must be learned before routing can be performed. The routing metrics considered in [8] are
PDR and latency.
3.4. Energy-aware QoS Routing using RL (EQR-RL)
In [12], the routing decision relies on the resource availability in the network and the QoS requirements. To
send a data packet to the sink node, the sending node uses its routing table to decide which intermediate node to
forward to. For ease of explanation, let this node be i. Node i considers a set of neighboring nodes, denoted S,
whose members must fulfill the minimum QoS requirements. A node from the set S is then chosen with the help
of a load-balancing algorithm. The proposed algorithm was implemented in a scenario with periodic,
long-lasting transmissions to the sink node, and the algorithm is also scalable. In [12], to retrieve information
about its neighboring nodes, a node can inspect the headers of the data packets it receives. This information
includes node failures, removal of nodes from the neighborhood, and link quality. Neighbors likewise inspect the
packet header to update the routing table when new nodes are added, and a neighbor node is excluded from the
routing table if there is no response from it.
EQR-RL supports multiple QoS requirements, including latency, geographical distance, and the number of
hops for data delivery. It can also handle node mobility and failure recovery, and it supports mobile sink nodes.
Implementing reinforcement learning for any network scenario requires a balance between exploration and
exploitation: pure exploration guarantees that an optimal route will eventually be found but incurs routing
overhead, whereas pure exploitation finds routes faster at a considerable risk of suboptimality. The authors
of [12] consider exploration strategies only. The network scenario in [12] contains a few mobile nodes. The
proposed protocol is claimed to improve network lifetime, PDR, and end-to-end delay in comparison with
[13], [14], and [9].
3.5. Distributed Adaptive Cooperative Routing Protocol (DACR)
In [15], the authors propose a protocol that ensures QoS requirements by decreasing delay and increasing
reliability. A route from source to destination is established using the AODV protocol. Data transmission is
possible either directly or via a relay, and one of these two transmission modes is selected based on the energy
level of the node. If relayed transmission is chosen, the protocol then considers residual energy, reliability, and
delay as the important criteria.
All routing nodes learn these criteria using lightweight RL, after which a lexicographic optimization
[16] is used to find the optimal relay. The contributions of this protocol are summarized below:
• No central control is needed for DACR, and nodes can be deployed in a distributed manner. Global information
on channel state conditions is also unnecessary for this protocol, which is why DACR has relatively low
network overhead.
• Selecting the relay proactively is shown to be more efficient than selecting relays reactively.
• The process of discovering routes and relays can save a large amount of energy.
One notable flaw of this protocol is its complexity, since it uses both an RL algorithm and a transmission-mode
selection algorithm in the routing process. It considers energy consumption and network lifetime as routing metrics.
3.6. Routing Protocol to Optimize Network Lifetime (RLLO)
The protocol proposed in [17] exploits the strengths of RL. After receiving a data packet, a node analyzes
the information and then chooses between two possible actions. If a candidate forwarding node has a valid route
to the destination, the node forwards the packet to it; the node with the highest Q-value is always selected among
the possible next forwarding nodes. If no forwarding node has a route to the destination, the node sends the data
directly to the sink, provided that the sink is within its transmission range and it has enough energy; if the sink is
beyond the transmission range, the packet is dropped.
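This decision logic can be sketched as follows (a hedged illustration; the function name, tuple encoding, and threshold are assumptions, not taken from [17]):

```python
# Hedged sketch of the RLLO forwarding decision described above; names and
# thresholds are illustrative assumptions, not taken from [17].
def rllo_action(sink_in_range, energy, energy_threshold, candidates, q_values):
    """candidates: next-hop nodes that hold a valid route to the destination."""
    if candidates:
        # forward to the valid next hop with the highest Q-value
        return ("forward", max(candidates, key=lambda n: q_values[n]))
    if sink_in_range and energy >= energy_threshold:
        return ("send_direct", "sink")   # no route, but the sink is reachable
    return ("drop", None)                # no route and sink out of range

print(rllo_action(False, 0.8, 0.3, ["a", "b"], {"a": 0.2, "b": 0.6}))
# ('forward', 'b')
```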
When a node in RLLO executes an action, it receives a reward that is used to update its Q-value. The
reward function is designed around residual energy, which leads to an optimal balance of energy usage. The
protocol proposed in [17] is highly flexible to topology changes and achieves global optimization of the network
without any additional cost. The biggest flaw of this routing approach is that it is prone to network isolation,
because nodes with low energy deliberately drop packets; this can also add latency. To demonstrate the
superiority of [17], it was compared with [18], considering PDR and network lifetime as performance metrics.
3.7. RL-based Routing Protocol for Multi-hop WSNs
The routing approach proposed in [19] extends the Q-routing algorithm [11] for application in WSNs.
Network lifetime is optimized by balancing the routing effort among sensor nodes. This routing
approach also minimizes the control overhead, and the residual battery levels are taken into account. The
approach in [19] is designed and implemented for planetary exploration scenarios, with the goal of bringing
satellite and WSN technologies together in space; an example scenario for this routing technique is the SWIPE
project [20]. The main concept is to deploy hundreds or thousands of small sensor nodes, some of which are
capable of satellite communication. The remaining nodes, which lack satellite communication capability, form
the ad hoc network on the surface. After processing, the retrieved data are sent first to the satellite and then to
Earth.
Q-routing updates the Q-value through the following function:

Q(s, a) ← Q(s, a) + α [R + γ max_a′ Q(s′, a′) − Q(s, a)]     (1)

where s is a state, a is an action, and R is the reward obtained for taking action a in state s; s′ and a′ are the
next state and action after s and a. The learning rate α determines how much new information replaces older
information, and its value satisfies 0 ≤ α ≤ 1. The discount factor γ determines the importance of future
rewards, and its value satisfies 0 ≤ γ ≤ 1. When a node i transmits a packet to the next forwarding node, the
routing table is updated: the neighbor node sends an acknowledgment (ACK) back to node i, and the update is
performed through the function in (1). Node i requires data from its neighbor nodes only. As the Q-function is
updated, an estimate of the overall network is obtained, provided the network topology remains unchanged. The
proposed protocol was compared with different versions of Q-routing.
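As a sketch, the update rule in (1) can be transcribed directly into code (the state and action labels below are illustrative, not taken from [19]):

```python
# Q-routing update from (1): Q(s,a) += alpha * (R + gamma * max_a' Q(s',a') - Q(s,a)).
def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    best_next = max(Q[(s_next, a2)] for a2 in actions)   # max_a' Q(s', a')
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Toy table: forwarding from node-state 0 toward a neighbor in state 1.
Q = {(0, "fwd"): 0.0, (0, "drop"): 0.0, (1, "fwd"): 1.0, (1, "drop"): 0.0}
q_update(Q, s=0, a="fwd", r=0.0, s_next=1, actions=["fwd", "drop"])
print(Q[(0, "fwd")])  # 0.45 = 0.5 * (0 + 0.9 * 1.0 - 0)
```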
4. Comparative Analysis of the Routing Protocols based on RL
In this section, the protocols discussed in Section 3 are qualitatively compared with each other. In WSNs,
energy is the most important issue [21], and the target is to reduce the energy wastage that shortens the network
lifetime. Increased PDR provides more reliable communication, whereas decreased end-to-end delay removes
unwanted latency. Routing metrics specific to certain protocols are listed in the routing-metrics column of
Table 1. There are many different cases and network scenarios in which researchers may be interested in
applying a particular routing protocol; the column of outstanding features in the table can be helpful for that.
Good exploration and exploitation strategies are necessary to implement RL algorithms for routing properly.
Since implementing RL is not straightforward, it is also necessary to know whether a routing protocol incurs
more network overhead and routing complexity than usual due to the RL implementation. Although mobility
is not mandatory in WSNs, it is also worth observing how well an RL-based routing algorithm performs when
mobility is introduced to a few of the nodes. Some routing protocols work better than others under particular
routing and QoS metrics, and some may cause network isolation due to unresponsive nodes. In Table 1, the
competitive advantages and inherent limitations of each routing protocol are summarized.
Table 1: Comparative analysis of the routing protocols based on RL in WSNs

Protocol: FROMS [7]
Routing metrics: PDR, energy expenditure
Outstanding features: (1) Significantly decreases the network overhead. (2) Two nodes can contact each other in a full-duplex manner.
Advantages: (1) Consideration of multiple sink nodes. (2) Good exploration strategy. (3) Complete recovery process upon failure. (4) Greedy approach after convergence leads to faster route finding in the network. (5) Maintains a nearly perfect data rate after node failure by route switching.
Limitations: (1) Prone to node failure. (2) Sink mobility may lead to routing errors. (3) Data redundancy at multiple sink nodes may waste valuable energy. (4) Frequent route switching may prolong the time needed for route discovery.
Protocol: Routing protocol for adaptive network environments using RL [8]
Routing metrics: PDR, latency
Outstanding features: (1) Works well in frequently changing network topologies. (2) Synchronized routing to achieve the shortest delivery and service processing time.
Advantages: (1) Highly adaptive to network topology changes. (2) Adaptive to varying traffic conditions. (3) Service processing time along a particular route can be accurately estimated. (4) Efficient routes are easy to find, and no central routing control is needed.
Limitations: (1) No comparison shown with standardized protocols. (2) Average service processing time is high because the routing algorithm must learn the whole network. (3) Exhibits a path hysteresis problem: it has difficulty falling back to the optimal path once network routes improve.

Protocol: MRL-QRP [9]
Routing metrics: PDR, delay
Outstanding features: (1) Sensor nodes cooperatively compute routes. (2) Exploits each node to gain sufficient network knowledge to choose the best hop. (3) Sensor nodes adjust the probability model of the environment according to node mobility and wireless channel conditions.
Advantages: (1) Compared against a well-established conventional protocol (AODV). (2) Performs especially well when the traffic load is heavy. (3) Reduces network overhead. (4) Local information about neighboring nodes is enough to achieve near-optimal performance in a global manner.
Limitations: (1) Power consumption is not taken into consideration while computing routes. (2) The routing table formed at the beginning cannot ensure QoS. (3) Resource overuse and network overhead arise from fluctuating QoS requirements.

Protocol: EQR-RL [12]
Routing metrics: Remaining energy, PDR
Outstanding features: (1) Nodes are made aware of the data needed, and the sink node uses a controlled flooding algorithm. (2) Scalable, flexible, load-balancing routing protocol. (3) Nodes participating in data communication use the packet header to keep the routing table up to date.
Advantages: (1) Supports various QoS requirements. (2) Handles mobility and failure recovery. (3) Supports mobile sink nodes. (4) Unresponsive nodes are well avoided in the routing process.
Limitations: (1) Exploitation is not taken into consideration. (2) Route computation may take time. (3) Network isolation is likely because nodes that do not respond are excluded from the routing table. (4) Only a few nodes are considered mobile, leaving it unclear whether the protocol works well when all nodes are static or all are mobile. (5) Does not describe how mobile sinks cope with the frequently changing network.

Protocol: DACR [15]
Routing metrics: Energy consumption, PDR, network lifetime
Outstanding features: (1) Routing is done cooperatively, and the protocol adapts to frequent changes over time. (2) Multi-hop path from source to destination. (3) Low network overhead.
Advantages: (1) Cooperative routing decisions lead to reduced network delay and increased reliability. (2) QoS requirements are fulfilled with reduced network overhead owing to proactive relay selection. (3) Significantly increases network lifetime. (4) The transmitted signal achieves the best phase synchronization at the receiver end. (5) Cooperation among relay nodes introduces additional spatial diversity and increases the transmission reliability against channel fading.
Limitations: (1) Too complex, as it uses both an RL algorithm and a transmission-mode selection algorithm. (2) Memory is not considered as a resource to be used wisely. (3) Uses complex lexicographic optimization for relay selection.
5. Challenging Open Issues
Routing in WSNs is always challenging, and applying RL to routing protocol design brings many
challenges of its own. These challenges are discussed in this section.
Fixing the learning rate: The learning rate in any RL technique plays a prime role in achieving convergence,
as it decides how much new information replaces older information. This value often needs to be varied when
computing the Q-value, and an inappropriate value of this parameter may hinder convergence.
Balance between exploration and exploitation strategies: This balance is highly important when applying RL
to any scenario, since optimal route computation is expected of any newly designed routing protocol.
Exploration alone guarantees optimality but incurs routing overhead; conversely, exploitation alone may lead to
a route that is not optimal.
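One common way to strike this balance, sketched below as a general RL practice rather than a mechanism of any surveyed protocol, is epsilon-greedy selection with a decaying epsilon: the agent explores widely early and exploits the learned routes later.

```python
import random

# Epsilon-greedy selection with a decaying epsilon (illustrative values).
def select_action(q_values, epsilon):
    if random.random() < epsilon:
        return random.choice(list(q_values))   # explore: random action
    return max(q_values, key=q_values.get)     # exploit: best-known action

epsilon, decay, min_epsilon = 1.0, 0.99, 0.05
for episode in range(300):
    # ... one learning episode would use select_action(Q[state], epsilon) ...
    epsilon = max(min_epsilon, epsilon * decay)
print(epsilon)  # 0.05: decayed to the floor after about 300 episodes
```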
Convergence speed: Convergence of the RL algorithm is the most vital criterion for successfully
applying RL to any system. After an adequate number of learning episodes, a successfully converged algorithm
always generates the same optimal reward. Successful convergence also depends on the formulation of the
reward function. Convergence speed matters when routes must be computed quickly.
Security: WSNs are prone to security problems such as DDoS attacks, eavesdropping, and sinkhole attacks. Data
confidentiality is one of the security mechanisms for WSNs: sensor nodes should not allow their neighbors
unauthorized access to their readings [22]. The sinkhole attack is another security issue that is quite challenging to
handle through RL-based routing protocol design, since in this kind of attack a corrupted node looks more
beneficial as a next forwarder because the routing information has already been manipulated by the invader.
Data redundancy: As already discussed, WSNs are resource-constrained networks, and data
redundancy can lead to further resource issues. To prevent this, it must be ensured that the acquired
data are fresh and that old data are not resent. Existing works that use RL-based algorithms for routing
protocol design do not cover data redundancy, and solving this issue can be a challenging research task.
Energy hole problem: Solving this issue through routing techniques is very important for keeping any WSN
fully functional. Nodes close to the sink die faster than the other nodes because they have to relay
data to the sink more frequently [23].
QoS metrics: Some QoS metrics, such as reliability and packet loss, can be combined with the discussed
routing metrics. The protocols discussed in this paper hardly address this combination of routing and QoS
metrics.
Table 1 (continued)

Protocol: Routing protocol for balancing energy in multi-hop WSNs [19]
Routing metrics: Energy consumption, network lifetime
Outstanding features: (1) Designed and implemented for planetary exploration scenarios. (2) Merges satellite and WSN technologies in space. (3) Includes nodes equipped for satellite communication.
Advantages: (1) Minimizes control overhead. (2) Optimizes the use of sensor batteries. (3) Maximizes the node lifetime. (4) Performs efficiently for grid-based networks. (5) Adapts to any kind of network topology.
Limitations: (1) An estimate of the entire network is obtained only after the Q-function is frequently updated. (2) Does not describe the distinctive properties of the sensor nodes responsible for satellite communication, or how they interact with the normal sensor nodes responsible for sensing and data processing.

Protocol: RLLO [17]
Routing metrics: Network lifetime, PDR
Outstanding features: (1) Uses the strengths of RL to achieve global optimization. (2) Considers the hop count to the sink when searching for the routing path. (3) Evaluates the approximate goodness of every action.
Advantages: (1) Balances energy consumption and improves PDR. (2) Achieves global optimization without additional cost. (3) Highly flexible to topology changes.
Limitations: (1) Network isolation is likely because nodes with relatively low energy drop packets. (2) If there is no existing route to a forwarding node, the node transmits the data directly to the sink provided it has a certain energy level, which may result in extra energy loss. (3) The probability of network isolation is very high.
6. Conclusion
In this paper, the latest RL-based routing protocols have been extensively reviewed in terms of their major
features and characteristics, and qualitatively compared with each other. Even though RL may introduce
computational overhead, the existing RL-based routing protocols are promising in delivering better performance
with respect to major metrics such as energy consumption, end-to-end delay, PDR, and network lifetime. The
comparative discussion of different RL-based routing protocols in this paper can be used effectively when
choosing a routing protocol or designing new RL-based routing protocols for WSNs. Some challenging issues in
designing and implementing RL-based routing algorithms in WSNs have also been discussed. It can be concluded
that a good balance between exploration and exploitation, fast convergence of the algorithm, multiple routing
metrics, QoS support, and security must all be considered together to design a pragmatic RL-based routing
protocol.
7. Acknowledgment
A preliminary short version of this work was presented at the 8th International Conference on Convergence
Technology, Jeju, Korea, July 2018 [24]. This work was supported in part by the Basic Science Research Program
through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-
2016R1D1A1A09918974), and it was also supported in part by the MIST (Ministry of Science and ICT), Korea
under the National Program for Excellence in SW supervised by the IITP (Institute for Information and
Communications Technology Promotion) (2017-0-00137). Correspondence should be addressed to Sangman Moh
(smmoh@chosun.ac.kr).
8. References
1.
Khan MI, Rinner B, Regazzoni CS. Energy-aware task scheduling in wireless sensor networks based on
cooperative reinforcement learning. Proceedings of 2014 IEEE International Conference on Communications
Workshops (ICC). 2012 Sep; 871 – 77; Liverpool, UK. DOI: 10.1109/ICCW.2014.6881310.
2.
Arya A, Malik, A, Garg, R. Reinforcement Learning based Routing Protocols in WSN: A survey. IJCSET. 2013
Nov; 4(11): 1401 - 4, 2013.
3.
Kulkarni P, Munot H, Malathi P. Survey on Computational Intelligence Based Routing Protocols in Wireless
Sensor Network. IJWMCIS. 2016 Feb; 3(2): 23-32. DOI: 10.21742/ijwmcis.2016.3.2.04.
4.
Guo W, Zhang W, A survey on intelligent routing protocols in wireless sensor networks. IJWMCIS. 2014;
38:185 – 201.
5. Alsheikh MA, Lin S, Niyato D, Tan HP. Machine learning in wireless sensor networks: Algorithms,
strategies, and applications. IEEE Commun. Surveys Tuts. 2014; 16(4): 1996-2018. DOI:
10.1109/COMST.2014.2320099.
6. Habib A, Arafat Y, Moh S. A survey on reinforcement-learning-based routing protocols in wireless sensor
networks. Proceedings of 8th Int. Conf. on Convergence Technology (ICCT 2018). 2018 Jul; 8(1): 359-360;
Jeju, Korea.
7. Förster A, Murphy AL. FROMS: Feedback routing for optimizing multiple sinks in WSN with reinforcement
learning. Ad Hoc Netw. 2011 Jul; 9(5): 940-965. DOI: 10.1016/j.adhoc.2010.11.006.
8. Ouzecky D, Jevtic D. Reinforcement learning as adaptive network routing of mobile agents. Proceedings of
the 33rd International Convention (MIPRO 2010). 2010; Opatija, Croatia.
9. Liang X, Balasingham I, Byun SS. A multi-agent reinforcement learning based routing protocol for wireless
sensor networks. Proceedings of IEEE International Symposium on Wireless Communication Systems. 2008
Oct; 552-557; Reykjavik, Iceland. DOI: 10.1109/ISWCS.2008.4726117.
10. Perkins C, Belding-Royer E, Das S. Ad hoc on-demand distance vector (AODV) routing. IETF RFC 3561.
2003 Jul. Available from: https://www.ietf.org/rfc/rfc3561.txt.
11. Boyan J, Littman M. Packet routing in dynamically changing networks: A reinforcement learning approach.
Proceedings of Advances in Neural Information Processing Systems. 1994 Nov; 671-678; San Francisco, USA.
12. Jaforzadeh SZ, Mughaddam MHY. Design of energy-aware QoS routing algorithm in wireless sensor
networks using reinforcement learning. Proceedings of 2014 4th International Conference on Computer and
Knowledge Engineering. 2014 Dec; 722-727; Mashhad, Iran. DOI: 10.1109/ICCKE.2014.6993408.
13. Gerasimov I, Simon R. A bandwidth-reservation mechanism for on-demand ad hoc path finding. Proceedings
of 35th Annual Simulation Symposium. 2002; 27-34; San Diego, USA. DOI: 10.1109/SIMSYM.2002.1000079.
14. Maalej M, Cherif S, Besbes H. QoS and energy aware cooperative routing protocol for wildfire monitoring
wireless sensor networks. The Scientific World Journal. 2013.
15. Razzaque MA, Alam MM, Rashid MM, Hong CS. Multi-constrained QoS geographic routing for
heterogeneous traffic in sensor networks. IEICE Trans. Commun. 2008; E91-B(8): 2589-2601.
16. Zykina AV. A lexicographic optimization algorithm. Autom. Remote Control. 2004; 65: 363-368.
17. Guo W, Yan C, Gan Y, Lu T. An intelligent routing algorithm in wireless sensor networks based on
reinforcement learning. Applied Mechanics and Materials. 2014; 678: 487-493.
18. Shah R, Rabaey J. Energy aware routing for low energy ad hoc sensor networks. Proceedings of the IEEE
Wireless Communications and Networking Conference (WCNC). 2002 Mar; 350-355; Orlando, FL. DOI:
10.1109/WCNC.2002.993520.
19. Oddi G, Pietrabissa A, Liberati F. Energy balancing in multi-hop wireless sensor networks: An approach
based on reinforcement learning. Proceedings of 2014 NASA/ESA Conference on Adaptive Hardware and
Systems (AHS). 2014 Jul; 262-269; Leicester, UK. DOI: 10.1109/AHS.2014.6880186.
20. European Commission. Space wireless sensor networks for planetary exploration. Available from:
https://cordis.europa.eu/project/rcn/108074_en.html [Accessed 24 Aug. 2018].
21. Shah K, Kumar M. Distributed independent reinforcement learning (DIRL) approach to resource
management in wireless sensor networks. Proceedings of IEEE Conference on Mobile Ad-hoc and Sensor
Systems. 2007 Oct; 1-9; Pisa, Italy. DOI: 10.1109/MOBHOC.2007.4428658.
22. Carman DW, Krus PS, Matt BJ. Constraints and approaches for distributed sensor network security. NAI
Labs Technical Report 00-010. 2000 Sep; 5-126.
23. Pathak A, Zaheeruddin, Tiwari M. Minimizing the energy hole problem in wireless sensor networks by
normal distribution of nodes and relaying range regulation. Proceedings of 2012 Fourth International
Conference on Computational Intelligence and Communication Networks. 2012 Nov; 154-157; Mathura,
India. DOI: 10.1109/CICN.2012.148.
24. Habib A, Arafat Y, Moh S. A survey on reinforcement-learning-based routing protocols in wireless sensor
networks. Proceedings of 8th Int. Conf. on Convergence Technology (ICCT 2018). 2018 Jul; 8(1): 359-360;
Jeju, Korea.