The system configuration for the proposed federated reinforcement learning.

Source publication
Article
Full-text available
Reinforcement learning has recently been studied in various fields and has also been used to optimally control IoT devices, which extend Internet connectivity beyond the usual standard devices. In this paper, we try to allow multiple reinforcement learning agents to learn optimal control policies on their own IoT devices of the same type but with...

Context in source publication

Context 1
... the simulation environment, most elements of the device are placed in cyberspace and implemented in software. However, as shown in Figure 1, the manufactured real IoT devices are positioned in the physical space, while the worker containing our reinforcement learning agent is usually placed in cyberspace because of the IoT device's constrained resources. It is therefore important that the elements of cyberspace and physical space interact in real time. ...
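To make the cyber-physical split concrete, below is a minimal hypothetical sketch of a cyberspace-side worker exchanging state and action messages with a physical device over TCP; the device address, the JSON line framing, and the `agent` interface are assumptions, not the paper's implementation.

```python
# Hypothetical sketch: RL worker in cyberspace, device in physical space.
import json
import socket

DEVICE_ADDR = ("192.168.0.42", 5000)  # assumed device endpoint

def run_worker(agent, steps=1000):
    with socket.create_connection(DEVICE_ADDR) as conn:
        f = conn.makefile("rw")
        for _ in range(steps):
            state = json.loads(f.readline())     # device reports its state
            action = agent.act(state["obs"])     # agent decides in cyberspace
            f.write(json.dumps({"action": action}) + "\n")
            f.flush()                            # actuation command back to device
```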

Citations

... FRL is expected to cope with the data hunger of centralized Reinforcement Learning (RL) while complying with the requirement of privacy preservation or data confidentiality [8]. It has demonstrated remarkable potential across a wide range of real-world systems, including robot navigation [4], resource management in networking [9], and control of IoT devices [10]. ...
... Liu et al. [4] propose a continual FRL method specifically designed for robot navigation problems, allowing new robots to utilize the shared policy to speed up training. Lim et al. [10] propose an FRL algorithm on top of Proximal Policy Optimization (PPO) [15] and FedAvg. Liang et al. [16] retrofit Deep Deterministic Policy Gradient (DDPG) [17] for FedAvg in the context of autonomous driving. ...
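Both excerpts describe layering FedAvg on top of a policy-gradient learner such as PPO or DDPG. As a rough illustration only, a server-side FedAvg step over flattened policy parameters might look like the sketch below; `local_update` and the sample-count weighting are assumptions, not the cited papers' code.

```python
import numpy as np

def fedavg(param_list, n_samples):
    """Sample-count-weighted average of flattened parameter vectors."""
    w = np.asarray(n_samples, dtype=float)
    w /= w.sum()
    return sum(wi * p for wi, p in zip(w, param_list))

def federation_round(global_theta, agents, local_update):
    # Each agent runs a few local PPO epochs from the global weights;
    # the server then averages the returned parameters.
    results = [local_update(a, global_theta.copy()) for a in agents]
    thetas, counts = zip(*results)  # (new params, #local samples) per agent
    return fedavg(thetas, counts)
```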
Preprint
Federated Reinforcement Learning (FRL) has been deemed as a promising solution for intelligent decision-making in the era of Artificial Internet of Things. However, existing FRL approaches often entail repeated interactions with the environment during local updating, which can be prohibitively expensive or even infeasible in many real-world domains. To overcome this challenge, this paper proposes a novel offline federated policy optimization algorithm, named DRPO, which enables distributed agents to collaboratively learn a decision policy only from private and static data without further environmental interactions. DRPO leverages dual regularization, incorporating both the local behavioral policy and the global aggregated policy, to judiciously cope with the intrinsic two-tier distributional shifts in offline FRL. Theoretical analysis characterizes the impact of the dual regularization on performance, demonstrating that by achieving the right balance thereof, DRPO can effectively counteract distributional shifts and ensure strict policy improvement in each federative learning round. Extensive experiments validate the significant performance gains of DRPO over baseline methods.
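A hedged sketch of the dual-regularization idea for a discrete action distribution: the local policy is penalized for drifting from both the behavioral policy that generated the static local data and the global aggregated policy. The surrogate form and the coefficients `alpha`/`beta` are assumptions; the paper's exact objective may differ.

```python
import numpy as np

def kl(p, q, eps=1e-8):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def dual_regularized_loss(advantage, pi, pi_behavior, pi_global,
                          alpha=0.1, beta=0.1):
    # Policy-gradient surrogate minus two KL penalties; returned negated
    # so a minimizer maximizes the regularized surrogate.
    surrogate = float(np.sum(pi * advantage))
    return -(surrogate - alpha * kl(pi, pi_behavior) - beta * kl(pi, pi_global))
```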
... Recently, to reconcile FL with ever-growing intelligent decision-making applications, there has been a surge of interest towards Federated Reinforcement Learning (FRL), whereby distributed agents collaborate to build a decision policy with no need to share their raw trajectories [4]- [8]. FRL has been deemed a practically appealing approach to address the data hunger of Reinforcement Learning (RL) [8], and has demonstrated remarkable potential in a wide range of real-world systems, including robotics [4], autonomous driving [9], resource management in networking [10], and control of IoT devices [11]. ...
... However, the majority of current studies in FRL heuristically repurpose well-established supervised FL methods for the RL setting, e.g., directly combining FedAvg with classical PG or Q-learning [5], [8], [11], neglecting a unique challenge embedded therein: the spatio-temporal non-stationarity of data distributions. That is, in contrast to supervised FL operating on fixed datasets, FRL's intrinsic trial-and-error learning process typically necessitates each agent to explore the environment and sample new data using the current policy in each local update, causing continually varying data distributions across participating agents and training rounds. ...
... Nadiger et al. [5] propose an FRL approach that combines DQN [20] and FedAvg [1] to obtain personalized policies for individual players in the Pong game by employing the smoothing average technique. Lim et al. [11] propose an FRL algorithm that combines Proximal Policy Optimization (PPO) [21] with FedAvg. Utilizing transfer learning, Liang et al. [9] adapt Deep Deterministic Policy Gradient (DDPG) [22] for FedAvg to operate in autonomous driving scenarios. ...
Preprint
Federated Reinforcement Learning (FRL) has garnered increasing attention recently. However, due to the intrinsic spatio-temporal non-stationarity of data distributions, current approaches typically suffer from high interaction and communication costs. In this paper, we introduce a new FRL algorithm, named \alg{}, that utilizes momentum, importance sampling, and additional server-side adjustment to control the shift of stochastic policy gradients and enhance the efficiency of data utilization. We prove that by proper selection of momentum parameters and interaction frequency, \alg{} can achieve $\tilde{\mathcal{O}}(H N^{-1}\epsilon^{-3/2})$ and $\tilde{\mathcal{O}}(\epsilon^{-1})$ interaction and communication complexities ($N$ represents the number of agents), where the interaction complexity achieves linear speedup with the number of agents, and the communication complexity matches the best achievable by existing first-order FL algorithms. Extensive experiments corroborate the substantial performance gains of \alg{} over existing methods on a suite of complex and high-dimensional benchmarks.
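As one plausible reading of the momentum-plus-importance-sampling mechanism, a STORM-style variance-reduced policy-gradient update is sketched below; the gradient estimators, the scalar `is_ratio`, and all step sizes are assumptions rather than the paper's algorithm.

```python
import numpy as np

def momentum_step(theta, v_prev, grad_new, grad_old, is_ratio,
                  eta=0.9, lr=1e-2):
    # v_t = g(theta_t) + (1 - eta) * (v_{t-1} - w_t * g(theta_{t-1})),
    # where w_t is an importance-sampling ratio correcting for the fact
    # that the old gradient is re-evaluated on trajectories from theta_t.
    v = grad_new + (1.0 - eta) * (v_prev - is_ratio * grad_old)
    return theta + lr * v, v  # ascent step on the expected return
```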
... Some methods break a complex problem down into smaller sub-problems so that it becomes more manageable. Examples include Federated Reinforcement Learning (FRL), which optimizes learning across multiple agents [11], and Hierarchical Reinforcement Learning (HRL), which is presented as a solution to overcome the limitations of RL. Micro-Macro States Combination (MMSC) successfully divides the job into two layers to overcome the curse of dimensionality in grid environments [12]. ...
Article
Full-text available
Recently, various Deep Actor-Critic Reinforcement Learning (DAC-RL) algorithms have been widely utilized for training mobile robots to acquire navigational policies. However, they usually need a prohibitively long learning time to achieve good policies. This research proposes a two-stage training mechanism infused with human common-sense prior knowledge, named Two Stages DAC-RL with incentive reward, to alleviate this problem. The actor-critic networks were pre-trained in a simple environment to acquire a basic policy. Afterward, the basic policy was transferred to initialize the training of a new navigational policy in more complex environments. This study also infused human common-sense prior knowledge to further reduce the RL learning burden by giving incentive rewards in situations beneficial to the navigation task. The experiments tested the proposed algorithms on navigation tasks in which the robot should efficiently reach designated goals. The tasks were made more challenging by requiring the robot to cross corridors to reach the goal while avoiding obstacles. The results showed that the proposed algorithm worked efficiently across various start-goal positions spanning the corridors.
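The two mechanisms described, transferring a pre-trained basic policy and adding an incentive reward in beneficial situations, can be illustrated with this hypothetical sketch; the distance-based incentive condition, the `bonus` value, and the checkpoint names are assumptions.

```python
def shaped_reward(env_reward, prev_dist_to_goal, dist_to_goal, bonus=0.1):
    """Add a small incentive whenever the robot moves closer to the goal."""
    incentive = bonus if dist_to_goal < prev_dist_to_goal else 0.0
    return env_reward + incentive

# Stage-2 initialization sketch (PyTorch-style, names assumed):
#   actor.load_state_dict(torch.load("stage1_actor.pt"))
#   critic.load_state_dict(torch.load("stage1_critic.pt"))
```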
... Predicting IoT DataStream accurately is crucial for informed decision-making in fields such as clinical medicine [14], finance [15], traffic flow [16], and human action analysis [17]. Unlike other prediction tasks, DataStream prediction is complicated by the sequential dependence among input variables. ...
Preprint
Full-text available
Real-time data stream processing presents a significant challenge in the rapidly changing Internet of Things (IoT) environment. Traditional centralized approaches face hurdles in handling the high velocity and volume of IoT data, especially in real-time scenarios. In order to improve IoT DataStream prediction performance, this paper introduces a novel framework that combines federated learning (FL) with a competitive random search optimizer (CRSO) of Long Short-Term Memory (LSTM) models based on attention. The proposed integration leverages distributed intelligence while employing competitive optimization for fine-tuning. The proposed framework not only addresses privacy and scalability concerns but also optimizes the model for precise IoT DataStream predictions. This federated approach empowers the system to derive insights from a spectrum of IoT data sources while adhering to stringent privacy standards. Experimental validation on a range of authentic IoT datasets underscores the framework's exceptional performance, further emphasizing its potential as a transformational asset in the realm of IoT DataStream prediction. Beyond predictive accuracy, the framework serves as a robust solution for privacy-conscious IoT applications, where data security remains paramount. Furthermore, its scalability and adaptability solidify its role as a crucial tool in dynamic IoT environments.
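One plausible reading of the competitive random search optimizer is a sample-compete-keep-the-best loop over hyperparameters of the attention-based LSTM; the search space, population size, and `evaluate` callback are illustrative assumptions, not the paper's exact CRSO.

```python
import random

# Assumed hyperparameter space for the attention-based LSTM.
SPACE = {"units": [32, 64, 128], "lr": [1e-3, 5e-4, 1e-4], "window": [12, 24, 48]}

def crso(evaluate, rounds=20, pop=8):
    """Competitive random search: candidates compete on validation loss."""
    best, best_score = None, float("inf")
    for _ in range(rounds):
        candidates = [{k: random.choice(v) for k, v in SPACE.items()}
                      for _ in range(pop)]
        for c in candidates:
            score = evaluate(c)          # e.g., validation loss of the model
            if score < best_score:
                best, best_score = c, score
    return best
```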
... In the RL setting, Federated Reinforcement Learning (FRL) aims to learn a common value function (Wang et al., 2023) or produce a better policy from multiple RL agents interacting with similar environments. In the recent survey paper (Qi et al., 2021), FRL has empirically shown great success in reducing the sample complexity in applications such as autonomous driving (Liang et al., 2022), IoT devices (Lim et al., 2020), and resource management in networking (Yu et al., 2020). ...
... FRL focuses on learning a common value function (Wang et al., 2023;Fabbro et al., 2023) or improving the policy by leveraging multiple RL agents interacting with similar environments. The empirical evidence presented in the survey paper (Qi et al., 2021) demonstrates the significant success of FRL in reducing sample complexity across various applications such as autonomous driving (Liang et al., 2022), IoT devices (Lim et al., 2020), resource management in networking (Yu et al., 2020), and communication efficiency (Gatsis, 2022). However, it is important to note that existing recent works in this field do not specifically tackle the challenge of finding a common and stabilizing optimal policy that is suitable for all RL agents in a heterogeneous setting. ...
Preprint
Full-text available
We study a model-free federated linear quadratic regulator (LQR) problem where M agents with unknown, distinct yet similar dynamics collaboratively learn an optimal policy to minimize an average quadratic cost while keeping their data private. To exploit the similarity of the agents' dynamics, we propose to use federated learning (FL) to allow the agents to periodically communicate with a central server to train policies by leveraging a larger dataset from all the agents. With this setup, we seek to understand the following questions: (i) Is the learned common policy stabilizing for all agents? (ii) How close is the learned common policy to each agent's own optimal policy? (iii) Can each agent learn its own optimal policy faster by leveraging data from all agents? To answer these questions, we propose a federated and model-free algorithm named FedLQR. Our analysis overcomes numerous technical challenges, such as heterogeneity in the agents' dynamics, multiple local updates, and stability concerns. We show that FedLQR produces a common policy that, at each iteration, is stabilizing for all agents. We provide bounds on the distance between the common policy and each agent's local optimal policy. Furthermore, we prove that when learning each agent's optimal policy, FedLQR achieves a sample complexity reduction proportional to the number of agents M in a low-heterogeneity regime, compared to the single-agent setting.
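A minimal sketch of one model-free federated LQR step under stated assumptions: each agent estimates the gradient of its quadratic cost by two-point zeroth-order perturbation of the gain matrix, and the server averages the gradients. The smoothing radius, direction count, and learning rate are assumptions, and FedLQR's multiple-local-update schedule is not reproduced.

```python
import numpy as np

def zo_gradient(cost, K, r=0.05, n_dirs=20):
    """Two-point zeroth-order estimate of grad cost(K)."""
    d = K.size
    g = np.zeros_like(K)
    for _ in range(n_dirs):
        U = np.random.randn(*K.shape)
        U /= np.linalg.norm(U)               # random unit direction
        g += (cost(K + r * U) - cost(K - r * U)) / (2 * r) * U
    return (d / n_dirs) * g

def federated_lqr_step(K, agent_costs, lr=1e-3):
    grads = [zo_gradient(c, K) for c in agent_costs]  # one per agent
    return K - lr * np.mean(grads, axis=0)            # server-side average
```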
... In the IoT domain, [9] used FL for anomaly detection and provided an experimental validation showing that the approach outperforms classic centralized ML while adding data privacy. Another approach that leverages FL is [10], which shows how to combine FL with RL to collaboratively learn an optimal control policy. The collaboration amongst devices proved to accelerate the learning process, mitigate training instability, and increase generalization. ...
... The collaboration amongst devices proved to accelerate the learning process, mitigate training instability, and increase generalization. Similar to the work proposed in [10], [11] provided evidence of optimizing defense with RL; however, its applicability in the MTD paradigm is not supported. Looking at the combination of RL and MTD, plenty of approaches can be found in the literature that were implemented and validated in real or simulated environments. ...
Preprint
Full-text available
The expansion of the Internet-of-Things (IoT) paradigm is inevitable, but vulnerabilities of IoT devices to malware incidents have become an increasing concern. Recent research has shown that the integration of Reinforcement Learning with Moving Target Defense (MTD) mechanisms can enhance cybersecurity in IoT devices. Nevertheless, the numerous new malware attacks and the time that agents take to learn and select effective MTD techniques make this approach impractical for real-world IoT scenarios. To tackle this issue, this work presents CyberForce, a framework that employs Federated Reinforcement Learning (FRL) to collectively and privately determine suitable MTD techniques for mitigating diverse zero-day attacks. CyberForce integrates device fingerprinting and anomaly detection to reward or penalize MTD mechanisms chosen by an FRL-based agent. The framework has been evaluated in a federation consisting of ten devices of a real IoT platform. A pool of experiments with six malware samples affecting the devices has demonstrated that CyberForce can precisely learn optimum MTD mitigation strategies. When all clients are affected by all attacks, the FRL agent exhibits high accuracy and reduced training time when compared to a centralized RL agent. In cases where different clients experience distinct attacks, the CyberForce clients gain benefits through the transfer of knowledge from other clients and similar attack behavior. Additionally, CyberForce showcases notable robustness against data poisoning attacks.
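The reward mechanism described, an anomaly detector judging the device fingerprint after an MTD action, can be sketched as follows; the detector interface and the ±1 reward values are assumptions.

```python
def mtd_reward(anomaly_detector, fingerprint_after_mtd):
    """+1 if the device behaves normally after deploying the chosen MTD
    technique, -1 if anomalous (attack still observable)."""
    return 1.0 if anomaly_detector.is_normal(fingerprint_after_mtd) else -1.0
```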
... In reference [86], the authors suggest a method called FRL for automatically controlling software-defined networking (SDN)-based IoT systems while prioritizing data security. This approach allows IoT devices to independently learn and adjust to the network's changing conditions without needing a central controller. ...
... In reference [86], the authors present an FRL architecture in which each agent operates independently on their respective IoT device and shares their learning experience with other agents in a decentralized way. This scheme addresses the security concerns related to training control policies for IoT devices that arise due to scalability. ...
... Federated reinforcement learning (FRL) overview [86,87]. ...
Article
Full-text available
The internet of things (IoT) represents a disruptive concept that has been changing society in several ways. There have been several successful applications of IoT in industry. For example, in transportation systems, the novel internet of vehicles (IoV) concept has enabled new research directions and automation solutions. Moreover, reinforcement learning (RL), federated learning (FL), and federated reinforcement learning (FRL) have demonstrated remarkable success in solving complex problems in different applications. In recent years, new solutions have been developed based on this combined framework (i.e., federated reinforcement learning). However, there is a lack of analysis concerning IoT applications and no standard view of the challenges and future directions of the current FRL landscape. Therefore, the main goal of this research is to present a literature review of federated reinforcement learning (FRL) applications in IoT from multiple perspectives. We focus on analyzing applications in multiple areas (e.g., security, sustainability and efficiency, vehicular solutions, and industrial services) to highlight existing solutions, their characteristics, and research gaps. Additionally, we identify key short- and long-term challenges leading to new opportunities in the field. This research intends to map the current FRL ecosystem in IoT to foster the development of new solutions based on existing challenges.
... For example, privacy is a major concern in autonomous driving [9,10], and sharing data among vehicles is not allowed. To this end, the recent success of FL [11,12,13], which enables multiple clients to jointly train a global model without violating user privacy, makes it an appealing solution for addressing the sample inefficiency and privacy issue of RL in innovative applications such as autonomous driving, IoT network, and healthcare [14]. As a result, Federated Reinforcement Learning (FRL) has attracted much research attention [15]. ...
... Among all policies, the optimal policy $\pi_I^*$ of the imaginary MDP is of particular interest, since its value function $V_I^{\pi_I^*} = V_I^*$ is the largest lower bound of the averaged value function $\bar{V}^\pi$. Since there are approximation errors (13) in each client, we can only obtain an ... [fragment of the paper's algorithm listing: "Synchronize the global policy $\pi_t$ to every client; for $n = 0, 1, \ldots, N$ do ..."] ...
Preprint
The development of Policy Iteration (PI) has inspired many recent algorithms for Reinforcement Learning (RL), including several policy gradient methods, that gained both theoretical soundness and empirical success on a variety of tasks. The theory of PI is rich in the context of centralized learning, but its study is still in the infant stage under the federated setting. This paper explores the federated version of Approximate PI (API) and derives its error bound, taking into account the approximation error introduced by environment heterogeneity. We theoretically prove that a proper client selection scheme can reduce this error bound. Based on the theoretical result, we propose a client selection algorithm to alleviate the additional approximation error caused by environment heterogeneity. Experiment results show that the proposed algorithm outperforms other biased and unbiased client selection methods on the federated mountain car problem by effectively selecting clients with a lower level of heterogeneity from the population distribution.
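A minimal sketch of heterogeneity-aware client selection consistent with the description above: prefer clients whose environments deviate least from the population distribution. How the heterogeneity scores are estimated is left abstract and assumed given.

```python
import numpy as np

def select_clients(heterogeneity, k):
    """Return indices of the k clients with the lowest heterogeneity."""
    return np.argsort(np.asarray(heterogeneity))[:k]

# e.g., select_clients([0.9, 0.1, 0.4, 0.2], k=2) -> indices [1, 3]
```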
... Several professionals in the field of mental health, such as psychologists and psychiatrists, employ play therapy; practitioners of physical therapy, social work, occupational therapy, and behavioural therapy are also engaged in this field [4]. ...
Article
Full-text available
Some of the most significant computational ideas in neuroscience for learning behavior in response to reward and penalty are reinforcement learning algorithms. This technique can be used to train an artificial intelligence (AI) agent to serve as a virtual assistant and helper. The goal of this study is to determine whether combining a reinforcement learning-based virtual AI assistant with play therapy can benefit wheelchair-bound youngsters with Down syndrome. This study aims to employ play therapy methods and Reinforcement Learning (RL) agents to aid children with Down syndrome and help them enhance their physical and mental skills by playing games with them. The agent is designed to be smart enough to analyze each patient's lack of ability and provide a specific set of in-game challenges to improve that ability. Increasing the game's difficulty can help players develop these skills. The agent should be able to assess each player's skill gap and tailor the game accordingly. The agent's job is not to make the patient victorious but to boost their morale and skill sets in areas such as physical activity, intelligence, and social interaction. The primary objective is to improve the player's physical activities such as muscle reflexes, motor control, and hand-eye coordination. The study concentrates on employing several distinct techniques for training various models, comparing reinforcement learning algorithms such as Deep Q-Network (DQN), QR-DQN, A3C, and PPO Actor-Critic. This study demonstrates that the AI helper agent performs best when trained with PPO Actor-Critic and A3C. The goal is to see whether wheelchair-bound children with Down syndrome can benefit from combining reinforcement learning with play therapy to increase their mobility.