Multi-agent reinforcement learning schema.

Source publication
Article
Current trends in interconnecting myriad smart objects to monetize on Internet of Things applications have led to high-density communications in wireless sensor networks. This aggravates the already over-congested unlicensed radio bands, calling for new mechanisms to improve spectrum management and energy efficiency, such as transmission power cont...

Citations

... In real-world scenarios the network perturbation could represent any type of network disturbance; for instance, in our previous work [8] we defined the two probabilistic failures mentioned above, which might be used to simulate cascading failures (e.g., in power grids) [11], to confine the spreading of both viruses [12][13][14] and fake news [15,16], or to streamline the network through pruning (i.e., by removing those nodes that have minor impact on the overall connectivity) [17]. ...
Article
Despite the huge importance that centrality metrics have in understanding the topology of a network, too little is known about the effects that small alterations in the topology of the input graph induce in the norm of the vector that stores the node centralities. If these effects were small, it would be possible to avoid re-calculating the vector of centrality metrics whenever minimal changes occur in the network topology, which would allow for significant computational savings. Hence, after formalising the notion of centrality, three of the most basic metrics were considered (i.e., Degree, Eigenvector, and Katz centrality). To perform the simulations, two probabilistic failure models were used to describe alterations in network topology: Uniform (i.e., all nodes can be independently deleted from the network with a fixed probability) and Best Connected (i.e., the probability that a node is removed depends on its degree). Our analysis suggests that small variations in the topology of the input graph induce only small variations in Degree centrality, independently of the topological features of the input graph; conversely, both Eigenvector and Katz centralities can be extremely sensitive to changes in the topology of the input graph. In other words, if the input graph has some specific features, even small changes in its topology can have catastrophic effects on the Eigenvector or Katz centrality.
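As a concrete illustration of the setup described above, here is a minimal sketch (assumptions only, not the authors' code): it perturbs a random graph with the Uniform failure model and compares the norms of the Degree, Eigenvector, and Katz centrality vectors before and after; the graph size, failure probability, and the Katz attenuation factor are all assumed.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
G = nx.erdos_renyi_graph(n=200, p=0.05, seed=0)   # assumed test graph

def centrality_norms(H):
    """L2 norms of the Degree, Eigenvector, and Katz centrality vectors of H."""
    lam = max(abs(np.linalg.eigvals(nx.to_numpy_array(H))))    # spectral radius
    metrics = {
        "degree": nx.degree_centrality(H),
        "eigenvector": nx.eigenvector_centrality_numpy(H),
        "katz": nx.katz_centrality_numpy(H, alpha=0.9 / lam),  # needs alpha < 1/lambda_max
    }
    return {name: np.linalg.norm(list(c.values())) for name, c in metrics.items()}

def uniform_failure(H, p_fail=0.05):
    """Uniform model: delete each node independently with a fixed probability."""
    H2 = H.copy()
    H2.remove_nodes_from([v for v in H.nodes if rng.random() < p_fail])
    return H2

before = centrality_norms(G)
after = centrality_norms(uniform_failure(G))
for name in before:
    print(f"{name:>12}: {before[name]:.4f} -> {after[name]:.4f}")
```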
... QL-TPC [27] models the transmission power control problem as a Markov Decision Process (MDP). The authors apply Q-learning, a temporal-difference method, to find the optimal transmission power. ...
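For reference, tabular Q-learning schemes of this kind rely on the standard temporal-difference update (generic notation assumed here; the exact state and reward design of [27] is not reproduced):

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$

where $s_t$ could be a link-quality state, $a_t$ a discrete transmission power level, $r_{t+1}$ the observed reward, $\alpha$ the learning rate, and $\gamma$ the discount factor.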
Article
RPL—Routing Protocol for Low-Power and Lossy Networks (usually pronounced “ripple”)—is the de facto standard for IoT networks. However, it neglects to exploit IoT devices’ full capacity to optimize their transmission power, mainly because it is quite challenging to do so in parallel with the routing strategy, given the dynamic nature of wireless links and the typically constrained resources of IoT devices. Adapting the transmission power requires dynamically assessing many parameters, such as the probability of packet collisions, energy consumption, the number of hops, and interference. This paper introduces Adaptive Control of Transmission Power for RPL (ACTOR) for the dynamic optimization of transmission power. ACTOR aims to improve throughput in dense networks by passively exploring different transmission power levels. The classic solutions of bandit theory, including the Upper Confidence Bound (UCB) and Discounted UCB, accelerate the convergence of the exploration and guarantee its optimality. ACTOR is also enhanced via mechanisms to blacklist undesirable transmission power levels and to stabilize the topology via parent–child negotiations. The results of the experiments conducted on our 40-node and 12-node testbeds demonstrate that ACTOR achieves a higher packet delivery ratio by almost 20%, reduces the transmission power of nodes by up to 10 dBm, and maintains a stable topology with significantly fewer parent switches compared to the standard RPL and the selected benchmarks. These findings are consistent with simulations conducted across 7 different scenarios, where improvements of up to 50% were observed in end-to-end delay, packet delivery, and energy consumption.
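For intuition, a minimal UCB1 sketch over candidate transmission power levels follows; the power levels, the reward signal (treated as a per-transmission delivery outcome), and the toy environment are assumptions and do not reproduce ACTOR's implementation.

```python
import math
import random

POWER_LEVELS_DBM = [-10, -5, 0, 5, 10, 15]    # assumed candidate levels

counts = [0] * len(POWER_LEVELS_DBM)           # times each level has been tried
values = [0.0] * len(POWER_LEVELS_DBM)         # running mean reward per level

def select_power(t):
    """UCB1: try each arm once, then pick the arm with the largest upper bound."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    ucb = [values[i] + math.sqrt(2 * math.log(t) / counts[i])
           for i in range(len(counts))]
    return max(range(len(counts)), key=lambda i: ucb[i])

def update(i, reward):
    counts[i] += 1
    values[i] += (reward - values[i]) / counts[i]   # incremental mean

def observe_pdr(i):
    """Toy channel: higher power -> higher delivery probability (assumed)."""
    return 1.0 if random.random() < 0.5 + 0.08 * i else 0.0

for t in range(1, 2001):
    arm = select_power(t)
    update(arm, observe_pdr(arm))

best = max(range(len(counts)), key=lambda i: values[i])
print("most promising level:", POWER_LEVELS_DBM[best], "dBm")
```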
... The goal of the agent is to learn a policy that maximizes the expected reward over time, enabling it to make optimal decisions in complex environments [14] (Fig. 9: RL consists of three main components: state "S", reward "R", and action "A"; top: single-agent RL, bottom: multi-agent RL [26]). RL can be implemented using either a single agent or multiple agents. Single-agent RL involves a single agent interacting with an environment to learn a policy that maximizes its expected reward. ...
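A minimal sketch of the single-agent state/action/reward loop described above (generic structure only; multi-agent RL would run several such agents, each with its own policy, against a shared environment):

```python
import random

class ToyEnvironment:
    """Stands in for the environment; states, transitions, and rewards are invented."""
    def reset(self):
        return 0                                   # initial state S

    def step(self, action):
        next_state = random.randint(0, 3)          # toy transition
        reward = 1.0 if action == next_state % 2 else 0.0
        return next_state, reward                  # S', R

env = ToyEnvironment()
state = env.reset()
for _ in range(10):
    action = random.choice([0, 1])                 # agent chooses action A from state S
    state, reward = env.step(action)               # environment returns next state and reward
```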
... QL-TPC [24] models the transmission power (TP) control problem as a Markov Decision Process (MDP). The authors apply Q-learning, a temporal-difference method, to find the optimal transmission power. ...
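A minimal tabular Q-learning sketch in the spirit of such transmission power control schemes; the link-quality states, reward shaping, and toy environment below are invented for illustration and are not taken from QL-TPC [24].

```python
import random

POWER_LEVELS = [0, 1, 2, 3]           # indices of discrete power settings (assumed)
LINK_STATES = ["bad", "ok", "good"]   # assumed link-quality states

Q = {(s, a): 0.0 for s in LINK_STATES for a in POWER_LEVELS}
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def choose_action(state):
    if random.random() < epsilon:                              # explore
        return random.choice(POWER_LEVELS)
    return max(POWER_LEVELS, key=lambda a: Q[(state, a)])      # exploit

def td_update(s, a, r, s_next):
    best_next = max(Q[(s_next, a2)] for a2 in POWER_LEVELS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])   # temporal-difference step

def step(state, action):
    """Toy environment: higher power improves link quality but costs energy (assumed)."""
    p_good = 0.3 + 0.15 * action
    s_next = "good" if random.random() < p_good else random.choice(["bad", "ok"])
    reward = (1.0 if s_next == "good" else 0.0) - 0.1 * action  # delivery minus energy cost
    return s_next, reward

state = "ok"
for _ in range(5000):
    a = choose_action(state)
    state_next, r = step(state, a)
    td_update(state, a, r, state_next)
    state = state_next

print({s: max(POWER_LEVELS, key=lambda a: Q[(s, a)]) for s in LINK_STATES})
```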
Preprint
Routing Protocol for Low-Power and Lossy Networks (RPL), as the de facto routing protocol for IoT networks, neglects to exploit IoT devices' full capacity to tune their transmission power. One of the reasons is that optimizing the transmission power in parallel with the routing strategy is challenging, given the dynamic nature of wireless links and the constrained resources in IoT devices. Optimizing the transmission power requires evaluating the probability of packet collisions, energy consumption, the number of hops, and interference. We propose Adaptive Control of Transmission pOwer for RPL (ACTOR) for dynamic optimization of transmission power. ACTOR aims at improving throughput in dense networks by passively exploring different transmission power levels. The extent of resources used for this exploration significantly affects the network throughput. Thus, the exploration needs to adapt to dynamism in the environment. We formulate this exploration strategy using the Multi-Armed Bandit framework. The classic solutions of bandit theory, including Upper Confidence Bound and Discounted Upper Confidence Bound, accelerate the convergence of the exploration and guarantee its optimality. We also enhance ACTOR by mechanisms from RPL to blacklist undesirable transmission power levels and stabilize the topology. Results of the experiments on our 40-node testbed and simulations show that ACTOR achieves higher throughput (increasing the packet delivery ratio by 20%), while energy consumption, end-to-end delay, and the number of retransmissions are significantly improved compared with the standard RPL and the selected benchmark.
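To illustrate the discounted variant mentioned above, the sketch below (assumed constants; not ACTOR's code) applies Discounted UCB so that stale observations fade and the policy can track a non-stationary best power level.

```python
import math
import random

K = 5                        # number of candidate power levels (assumed)
GAMMA = 0.98                 # discount factor for past observations
XI = 0.6                     # exploration constant

disc_counts = [0.0] * K      # discounted play counts
disc_rewards = [0.0] * K     # discounted reward sums

def select_arm():
    for i in range(K):
        if disc_counts[i] == 0.0:
            return i                                   # try every arm once
    total = sum(disc_counts)
    ucb = [disc_rewards[i] / disc_counts[i]
           + math.sqrt(XI * math.log(total) / disc_counts[i])
           for i in range(K)]
    return max(range(K), key=lambda i: ucb[i])

def update(arm, reward):
    for i in range(K):                                 # decay everything, then credit the arm
        disc_counts[i] *= GAMMA
        disc_rewards[i] *= GAMMA
    disc_counts[arm] += 1.0
    disc_rewards[arm] += reward

# toy non-stationary channel: the best level changes halfway through (assumed)
for t in range(1, 4001):
    best = 4 if t <= 2000 else 1
    arm = select_arm()
    update(arm, 1.0 if random.random() < (0.8 if arm == best else 0.4) else 0.0)

print("estimated best level index:",
      max(range(K), key=lambda i: disc_rewards[i] / max(disc_counts[i], 1e-9)))
```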
... Robust communications [6] and optimised power consumption [7] are critical objectives considered for multi-radio implementations. As such, these themes form the basis of the approach presented in this work. ...
... Chincoli and Liotta [7] also employed Q-learning but for transmission power control of a single radio. The reward function proposed was a combination of discrete power levels and a linearly quantised packet reception rate. ...
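A hedged sketch of a reward of this kind; the 50/50 weighting, the number of power levels, and the quantisation granularity are assumptions, not the values used in [7].

```python
N_POWER_LEVELS = 8     # assumed number of discrete power settings
N_PRR_BINS = 10        # assumed number of linear PRR quantisation bins

def reward(power_level, prr):
    """Higher reward for lower power and for a higher quantised packet reception rate."""
    prr_bin = min(int(prr * N_PRR_BINS), N_PRR_BINS - 1)        # linear quantisation
    power_term = (N_POWER_LEVELS - 1 - power_level) / (N_POWER_LEVELS - 1)
    return 0.5 * power_term + 0.5 * prr_bin / (N_PRR_BINS - 1)

print(reward(power_level=2, prr=0.87))
```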
... The online learning performance of the WAMO-SARSA agent was investigated with two exploration strategies. The first strategy reduced the exploration parameter over time to a minimum value as used in [7,8] and is referred to as the decayed exploration rate. The second was the multi-objective VDBE proposed in Section 3.3, which is referred to as the adaptive exploration rate. ...
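For comparison, the two exploration schedules can be sketched as follows; this shows a decayed epsilon and a standard single-objective VDBE-style update with assumed constants, not the multi-objective extension proposed in the paper.

```python
import math

def decayed_epsilon(eps, decay=0.999, eps_min=0.01):
    """Decayed exploration: shrink epsilon every step down to a fixed floor."""
    return max(eps * decay, eps_min)

def vdbe_epsilon(eps, td_error, sigma=0.5, delta=0.1):
    """VDBE-style adaptation: raise epsilon when the TD error is large (value
    estimates are still changing), lower it once learning has settled."""
    x = math.exp(-abs(td_error) / sigma)
    f = (1.0 - x) / (1.0 + x)
    return delta * f + (1.0 - delta) * eps
```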
Article
The advent of the Internet of Things (IoT) has triggered an increased demand for sensing devices with multiple integrated wireless transceivers. These platforms often support the advantageous use of multiple radio technologies to exploit their differing characteristics. Intelligent radio selection techniques allow these systems to become highly adaptive, ensuring more robust and reliable communications under dynamic channel conditions. In this paper, we focus on the wireless links between devices equipped by deployed operating personnel and intermediary access-point infrastructure. We use multi-radio platforms and wireless devices with multiple and diverse transceiver technologies to produce robust and reliable links through the adaptive control of available transceivers. In this work, the term ‘robust’ refers to communications that can be maintained despite changes in the environmental and radio conditions, i.e., during periods of interference caused by non-cooperative actors or multi-path or fading conditions in the physical environment. In this paper, a multi-objective reinforcement learning (MORL) framework is applied to address a multi-radio selection and power control problem. We propose independent reward functions to manage the trade-off between the conflicting objectives of minimised power consumption and maximised bit rate. We also adopt an adaptive exploration strategy for learning a robust behaviour policy and compare its online performance to conventional methods. An extension to the multi-objective state–action–reward–state–action (SARSA) algorithm is proposed to implement this adaptive exploration strategy. When applying adaptive exploration to the extended multi-objective SARSA algorithm, we achieve a 20% increase in the F1 score in comparison to one with decayed exploration policies.
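As a rough illustration of the multi-objective setup, the sketch below keeps one Q-table per objective and scalarises them with fixed weights at action-selection time; this is one common MORL construction, not the paper's WAMO-SARSA algorithm, and the states, actions, rewards, and weights are all assumptions.

```python
import random

ACTIONS = [0, 1, 2]                  # e.g. radio/power configurations (assumed)
STATES = range(4)                    # abstract link states (assumed)
W = {"power": 0.5, "bitrate": 0.5}   # objective weights (assumed)

Q = {obj: {(s, a): 0.0 for s in STATES for a in ACTIONS} for obj in W}
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def scalarised(s, a):
    return sum(W[obj] * Q[obj][(s, a)] for obj in W)

def policy(s):
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: scalarised(s, a))

def sarsa_update(s, a, rewards, s2, a2):
    """One SARSA step per objective, each driven by its own reward signal."""
    for obj in W:
        td = rewards[obj] + gamma * Q[obj][(s2, a2)] - Q[obj][(s, a)]
        Q[obj][(s, a)] += alpha * td

# toy usage with invented transitions and per-objective rewards
s, a = 0, policy(0)
for _ in range(1000):
    s2 = random.choice(list(STATES))
    r = {"power": -0.1 * a, "bitrate": random.random() * (a + 1) / 3}
    a2 = policy(s2)
    sarsa_update(s, a, r, s2, a2)
    s, a = s2, a2
```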
... As a result, wireless sensor networks have found extensive applications in various aspects of our lives, such as healthcare, industrial manufacturing, and the Internet of Things (IoT) [1][2][3][4]. However, there are still issues that need to be addressed in wireless communication systems, such as equipment service life, complex environments, and the impact of channel states on energy harvesting and information transmission [5,6]. ...
Article
This paper investigates the problem of RF energy harvesting in wireless sensor networks, with the aim of finding a suitable communication protocol by comparing the performance of the system under different protocols. Network operation consists of two phases: at the beginning of each timeslot, the sensor nodes first harvest energy from the base station (BS), and then send packets to the BS using the harvested energy. For the energy-harvesting part of the wireless sensor network, we consider two methods: point-to-point and multi-point-to-point energy harvesting. For each method, we use two independent control protocols, namely head harvesting energy of each timeslot (HHT) and head harvesting energy of dedicated timeslot (HDT). Additionally, for complex channel states, we derive the cumulative distribution function (CDF) of the packet transmission time using selection combining (SC) and maximum ratio combining (MRC) techniques. Analytical expressions for system reliability and packet timeout probability are obtained. We also simulate the system using the Monte Carlo method and analyze both the numerical and simulation results. Results show that the HHT protocol performs better than the HDT protocol, and that, under the HHT protocol, MRC outperforms SC with respect to the energy-harvesting efficiency coefficient, sensor positions, transmit signal-to-noise ratio (SNR), and length of the energy-harvesting time.
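The SC/MRC comparison above can be reproduced in miniature with a short Monte Carlo sketch; Rayleigh fading with unit average branch SNR and the outage threshold are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N_BRANCHES, N_TRIALS = 3, 100_000

# exponentially distributed branch SNRs correspond to Rayleigh-faded amplitudes
branch_snr = rng.exponential(scale=1.0, size=(N_TRIALS, N_BRANCHES))

sc_snr = branch_snr.max(axis=1)      # SC: keep only the strongest branch
mrc_snr = branch_snr.sum(axis=1)     # MRC: coherently combine all branches

threshold = 1.0                       # assumed outage threshold
print("SC  outage probability:", np.mean(sc_snr < threshold))
print("MRC outage probability:", np.mean(mrc_snr < threshold))
```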
... In other words, when data transmission is blocked, Wireless Sensor Network nodes need to supply more power to the radio transceiver. Therefore, to overcome the blocking, the network needs self-learning power control [18]. ...
Article
The Wireless Sensor Network needs to become a dynamic and adaptive network to conserve the energy stored in the wireless sensor network node battery. Such a dynamic and adaptive network is sometimes called a SON (Self-Organizing Network). Several SON concepts have already been developed, such as routing, clustering, and intrusion detection; however, there is no SON concept for dynamic radio configuration. Therefore, the authors' contribution to this field is to propose a dynamic and adaptive radio configuration for Wireless Sensor Network nodes. The significance of the work lies in modelling a SON network built on measurements taken in a real-world jungle environment. The authors propose SNR, the distance between the transmitter and receiver, and frequency as static input parameters. As adaptive parameters, they propose bandwidth, spreading factor, and, most importantly, the transmission power. Using the Levenberg–Marquardt Artificial Neural Network (LM-ANN) self-organising network model, the transmission power can be reduced and optimised from 20 dBm to 14.9 dBm for an SNR of 3, to 11.5 dBm for an SNR of 6, and to 12.9 dBm for an SNR of 9, all within a 100-m range. With this result, the authors conclude that LM-ANN can be used for the wireless sensor network SON model in the jungle environment.
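In the spirit of the LM-ANN model above, the sketch below fits a tiny single-hidden-layer network with the Levenberg–Marquardt algorithm via scipy.optimize.least_squares; the synthetic data, network size, and normalisation are assumptions, and the paper's jungle measurements are not used.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(2)

# synthetic training set: features (SNR, distance [m]) -> target transmit power [dBm]
snr = rng.uniform(3, 9, size=200)
dist = rng.uniform(10, 100, size=200)
X = np.column_stack([snr / 10.0, dist / 100.0])                 # crude normalisation
y = 10.0 + 0.05 * dist - 0.5 * snr + rng.normal(0, 0.3, 200)    # assumed relationship

HIDDEN = 4
N_W1, N_B1, N_W2 = 2 * HIDDEN, HIDDEN, HIDDEN                   # parameter layout

def forward(params, X):
    W1 = params[:N_W1].reshape(2, HIDDEN)
    b1 = params[N_W1:N_W1 + N_B1]
    W2 = params[N_W1 + N_B1:N_W1 + N_B1 + N_W2]
    b2 = params[-1]
    return np.tanh(X @ W1 + b1) @ W2 + b2

def residuals(params):
    return forward(params, X) - y

p0 = rng.normal(0, 0.1, size=N_W1 + N_B1 + N_W2 + 1)
fit = least_squares(residuals, p0, method="lm")                 # Levenberg–Marquardt
print("training RMSE:", np.sqrt(np.mean(fit.fun ** 2)), "dBm")
```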
... The lack of a standardized parking scheme also causes problems [18,19]. The increasing popularity of linking disparate smart objects for the sake of Internet of Things applications has led to an increase in the communication density of wireless sensor networks. More people using the same unlicensed radio channels will create congestion, prompting research into better ways to manage the airwaves and save power, such as transmission power control. ...
... In [4], Dai et al. investigated the joint optimization of base station (BS) clustering and power control for non-orthogonal multiple access (NOMA)-enabled coordinated multipoint (CoMP) transmission in dense cellular networks, maximizing the sum rate of the system. In addition, in terms of wireless sensor networks (WSNs), Ref. [5] investigated how machine learning could be used to reduce the possible transmission power level of wireless nodes and, in turn, satisfy the quality requirements of the overall network. Reducing the transmission power has benefits in terms of both energy consumption and interference. ...
Article
The intensity of radio waves decays rapidly with increasing propagation distance, and an edge server’s antenna needs more power to form a larger signal coverage area. Therefore, the power of the edge server should be controlled to reduce energy consumption. In addition, edge servers with capacitated resources provide services for only a limited number of users to ensure the quality of service (QoS). We set the signal transmission power for the antenna of each edge server and formed a signal disk, ensuring that all users were covered by the edge server signal and minimizing the total power of the system. This scenario is a typical geometric set covering problem, and even simple cases without capacity limits are NP-hard problems. In this paper, we propose a primal–dual-based algorithm and obtain an m-approximation result. We compare our algorithm with two other algorithms through simulation experiments. The results show that our algorithm obtains a result close to the optimal value in polynomial time.
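To give a flavour of the primal-dual approach, the sketch below implements the classic primal-dual scheme for uncapacitated weighted set cover, where each set is the group of users one server covers at a given power level and the set cost is that power; this is the textbook scheme, not the paper's capacitated m-approximation algorithm.

```python
def primal_dual_set_cover(universe, sets, costs):
    """universe: elements to cover; sets: list of frozensets; costs: list of floats."""
    y = {e: 0.0 for e in universe}               # one dual variable per element
    slack = list(costs)                          # remaining slack of each set constraint
    chosen, uncovered = [], set(universe)
    while uncovered:
        e = next(iter(uncovered))                # pick any uncovered element
        containing = [i for i, s in enumerate(sets) if e in s]
        i_min = min(containing, key=lambda i: slack[i])
        raise_by = slack[i_min]                  # grow y[e] until one set goes tight
        y[e] += raise_by
        for i in containing:
            slack[i] -= raise_by
        chosen.append(i_min)                     # add the tight set to the cover
        uncovered -= sets[i_min]
    return chosen, y

# toy instance: 3 candidate (server, power level) disks covering 5 users
users = {1, 2, 3, 4, 5}
disks = [frozenset({1, 2, 3}), frozenset({3, 4}), frozenset({4, 5})]
power = [3.0, 2.0, 2.5]
picked, _ = primal_dual_set_cover(users, disks, power)
print("chosen disks:", picked, "total power:", sum(power[i] for i in picked))
```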
... In [4], Dai et al. investigated the joint optimization of base station (BS) clustering and power control for non-orthogonal multiple access (NOMA)-enabled coordinated multipoint (CoMP) transmission in dense cellular networks, maximizing the sum rate of the system. In addition, in terms of wireless sensor networks (WSNs), [5] investigated how machine learning could be used to reduce the possible transmission power level of wireless nodes and, in turn, satisfy the quality requirements of the overall network. Reducing the transmission power has benefits in terms of both energy consumption and interference. ...
Preprint
The intensity of radio waves decays rapidly with increasing propagation distance, and an edge server's antenna needs more power to form a larger signal coverage area. Therefore, the power of the edge server should be controlled to reduce energy consumption. In addition, edge servers with capacitated resources provide services for only a limited number of users to ensure the quality of service (QoS). We set the signal transmission power for the antenna of each edge server and formed a signal disk, ensuring that all users were covered by the edge server signal and minimizing the total power of the system. This scenario is a typical geometric set covering problem, and even simple cases without capacity limits are NP-hard problems. In this paper, we propose a primal-dual-based algorithm and obtain an $m$-approximation result. We compare our algorithm with two other algorithms through simulation experiments. The results show that our algorithm obtains a result close to the optimal value in polynomial time.