The system configuration for the proposed federated reinforcement learning.

Source publication
Article
Full-text available
Reinforcement learning has recently been studied in various fields and has also been used to optimally control IoT devices, which extend Internet connectivity beyond the usual standard devices. In this paper, we try to allow multiple reinforcement learning agents to learn optimal control policies on their own IoT devices of the same type but with...

Context in source publication

Context 1
... the simulation environment, most elements of the device are placed in cyberspace and implemented in software. However, as shown in Figure 1, the manufactured real IoT devices are positioned in the physical space, while the worker containing our reinforcement learning agent is usually placed in cyberspace because of the IoT device's constrained resources. It is therefore important that the elements of cyberspace and physical space interact in real time. ...
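To make the cyber-physical split concrete, below is a minimal hypothetical sketch of a cyberspace-side worker exchanging state and action messages with a physical device over TCP; the device address, the JSON line framing, and the `agent` interface are assumptions, not the paper's implementation.

```python
# Hypothetical sketch: RL worker in cyberspace, device in physical space.
import json
import socket

DEVICE_ADDR = ("192.168.0.42", 5000)  # assumed device endpoint

def run_worker(agent, steps=1000):
    with socket.create_connection(DEVICE_ADDR) as conn:
        f = conn.makefile("rw")
        for _ in range(steps):
            state = json.loads(f.readline())     # device reports its state
            action = agent.act(state["obs"])     # agent decides in cyberspace
            f.write(json.dumps({"action": action}) + "\n")
            f.flush()                            # actuation command back to device
```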

Citations

... FRL is expected to cope with the data hunger of centralized Reinforcement Learning (RL) while complying with the requirement of privacy preservation or data confidentiality [8]. It has demonstrated remarkable potential across a wide range of real-world systems, including robot navigation [4], resource management in networking [9], and control of IoT devices [10]. ...
... Liu et al. [4] propose a continual FRL method specifically designed for robot navigation problems, allowing new robots to utilize the shared policy to speed up training. Lim et al. [10] propose an FRL algorithm on top of Proximal Policy Optimization (PPO) [15] and FedAvg. Liang et al. [16] retrofit Deep Deterministic Policy Gradient (DDPG) [17] for FedAvg in the context of autonomous driving. ...
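Both excerpts describe layering FedAvg on top of a policy-gradient learner such as PPO or DDPG. As a rough illustration only, a server-side FedAvg step over flattened policy parameters might look like the sketch below; `local_update` and the sample-count weighting are assumptions, not the cited papers' code.

```python
import numpy as np

def fedavg(param_list, n_samples):
    """Sample-count-weighted average of flattened parameter vectors."""
    w = np.asarray(n_samples, dtype=float)
    w /= w.sum()
    return sum(wi * p for wi, p in zip(w, param_list))

def federation_round(global_theta, agents, local_update):
    # Each agent runs a few local PPO epochs from the global weights;
    # the server then averages the returned parameters.
    results = [local_update(a, global_theta.copy()) for a in agents]
    thetas, counts = zip(*results)  # (new params, #local samples) per agent
    return fedavg(thetas, counts)
```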
Preprint
Federated Reinforcement Learning (FRL) has been deemed as a promising solution for intelligent decision-making in the era of Artificial Internet of Things. However, existing FRL approaches often entail repeated interactions with the environment during local updating, which can be prohibitively expensive or even infeasible in many real-world domains. To overcome this challenge, this paper proposes a novel offline federated policy optimization algorithm, named DRPO, which enables distributed agents to collaboratively learn a decision policy only from private and static data without further environmental interactions. DRPO leverages dual regularization, incorporating both the local behavioral policy and the global aggregated policy, to judiciously cope with the intrinsic two-tier distributional shifts in offline FRL. Theoretical analysis characterizes the impact of the dual regularization on performance, demonstrating that by achieving the right balance thereof, DRPO can effectively counteract distributional shifts and ensure strict policy improvement in each federative learning round. Extensive experiments validate the significant performance gains of DRPO over baseline methods.
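A hedged sketch of the dual-regularization idea for a discrete action distribution: the local policy is penalized for drifting from both the behavioral policy that generated the static local data and the global aggregated policy. The surrogate form and the coefficients `alpha`/`beta` are assumptions; the paper's exact objective may differ.

```python
import numpy as np

def kl(p, q, eps=1e-8):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def dual_regularized_loss(advantage, pi, pi_behavior, pi_global,
                          alpha=0.1, beta=0.1):
    # Policy-gradient surrogate minus two KL penalties; returned negated
    # so a minimizer maximizes the regularized surrogate.
    surrogate = float(np.sum(pi * advantage))
    return -(surrogate - alpha * kl(pi, pi_behavior) - beta * kl(pi, pi_global))
```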
... Recently, to reconcile FL with ever-growing intelligent decision-making applications, there has been a surge of interest towards Federated Reinforcement Learning (FRL), whereby distributed agents collaborate to build a decision policy with no need to share their raw trajectories [4]- [8]. FRL has been deemed a practically appealing approach to address the data hunger of Reinforcement Learning (RL) [8], and has demonstrated remarkable potential in a wide range of real-world systems, including robotics [4], autonomous driving [9], resource management in networking [10], and control of IoT devices [11]. ...
... However, the majority of current studies in FRL heuristically repurpose well-established supervised FL methods for the RL setting, e.g., directly combining FedAvg with classical PG or Q-learning [5], [8], [11], neglecting a unique challenge embedded therein: the spatio-temporal non-stationarity of data distributions. That is, in contrast to supervised FL operating on fixed datasets, FRL's intrinsic trial-and-error learning process typically necessitates each agent to explore the environment and sample new data using the current policy in each local update, causing continually varying data distributions across participating agents and training rounds. ...
... Nadiger et al. [5] propose an FRL approach that combines DQN [20] and FedAvg [1] to obtain personalized policies for individual players in the Pong game by employing the smoothing average technique. Lim et al. [11] propose an FRL algorithm that combines Proximal Policy Optimization (PPO) [21] with FedAvg. Utilizing transfer learning, Liang et al. [9] adapt Deep Deterministic Policy Gradient (DDPG) [22] for FedAvg to operate in autonomous driving scenarios. ...
Preprint
Federated Reinforcement Learning (FRL) has garnered increasing attention recently. However, due to the intrinsic spatio-temporal non-stationarity of data distributions, current approaches typically suffer from high interaction and communication costs. In this paper, we introduce a new FRL algorithm, named \alg{}, that utilizes momentum, importance sampling, and additional server-side adjustment to control the shift of stochastic policy gradients and enhance the efficiency of data utilization. We prove that by proper selection of momentum parameters and interaction frequency, \alg{} can achieve $\tilde{\mathcal{O}}(H N^{-1}\epsilon^{-3/2})$ and $\tilde{\mathcal{O}}(\epsilon^{-1})$ interaction and communication complexities ($N$ represents the number of agents), where the interaction complexity achieves linear speedup with the number of agents, and the communication complexity matches the best achievable by existing first-order FL algorithms. Extensive experiments corroborate the substantial performance gains of \alg{} over existing methods on a suite of complex and high-dimensional benchmarks.
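As one plausible reading of the momentum-plus-importance-sampling mechanism, a STORM-style variance-reduced policy-gradient update is sketched below; the gradient estimators, the scalar `is_ratio`, and all step sizes are assumptions rather than the paper's algorithm.

```python
import numpy as np

def momentum_step(theta, v_prev, grad_new, grad_old, is_ratio,
                  eta=0.9, lr=1e-2):
    # v_t = g(theta_t) + (1 - eta) * (v_{t-1} - w_t * g(theta_{t-1})),
    # where w_t is an importance-sampling ratio correcting for the fact
    # that the old gradient is re-evaluated on trajectories from theta_t.
    v = grad_new + (1.0 - eta) * (v_prev - is_ratio * grad_old)
    return theta + lr * v, v  # ascent step on the expected return
```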
... Some methods break a complex problem down into smaller sub-problems so that it becomes more manageable. Examples include Federated Reinforcement Learning (FRL), which optimizes learning across multiple agents [11], and Hierarchical Reinforcement Learning (HRL), which is presented as a solution to overcome the limitations of RL. Micro-Macro States Combination (MMSC) successfully divides the job into two layers to overcome the curse of dimensionality in grid environments [12]. ...
Article
Full-text available
Recently, various Deep Actor-Critic Reinforcement Learning (DAC-RL) algorithms have been widely utilized for training mobile robots to acquire navigational policies. However, they usually need a prohibitively long learning time to achieve good policies. This research proposes a two-stage training mechanism infused with human common-sense prior knowledge, named Two Stages DAC-RL with incentive reward, to alleviate this problem. The actor-critic networks were pre-trained in a simple environment to acquire a basic policy. Afterward, the basic policy was transferred to initialize the training of a new navigational policy in more complex environments. This study also infused human common-sense prior knowledge to further reduce the RL learning burden by giving incentive rewards in situations beneficial to the navigation task. The experiments tested the proposed algorithms on navigation tasks in which the robot should efficiently reach designated goals. The tasks were made more challenging by requiring the robot to cross corridors to reach the goal while avoiding obstacles. The results showed that the proposed algorithm worked efficiently across various start-goal positions spanning the corridors.
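The two mechanisms described, transferring a pre-trained basic policy and adding an incentive reward in beneficial situations, can be illustrated with this hypothetical sketch; the distance-based incentive condition, the `bonus` value, and the checkpoint names are assumptions.

```python
def shaped_reward(env_reward, prev_dist_to_goal, dist_to_goal, bonus=0.1):
    """Add a small incentive whenever the robot moves closer to the goal."""
    incentive = bonus if dist_to_goal < prev_dist_to_goal else 0.0
    return env_reward + incentive

# Stage-2 initialization sketch (PyTorch-style, names assumed):
#   actor.load_state_dict(torch.load("stage1_actor.pt"))
#   critic.load_state_dict(torch.load("stage1_critic.pt"))
```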
... Predicting IoT DataStream accurately is crucial for informed decision-making in fields such as clinical medicine [14], finance [15], traffic flow [16], and human action analysis [17]. Unlike other prediction tasks, DataStream prediction is complicated by the sequential dependence among input variables. ...
Preprint
Full-text available
Real-time data stream processing presents a significant challenge in the rapidly changing Internet of Things (IoT) environment. Traditional centralized approaches face hurdles in handling the high velocity and volume of IoT data, especially in real-time scenarios. In order to improve IoT DataStream prediction performance, this paper introduces a novel framework that combines federated learning (FL) with a competitive random search optimizer (CRSO) of Long Short-Term Memory (LSTM) models based on attention. The proposed integration leverages distributed intelligence while employing competitive optimization for fine-tuning. The proposed framework not only addresses privacy and scalability concerns but also optimizes the model for precise IoT DataStream predictions. This federated approach empowers the system to derive insights from a spectrum of IoT data sources while adhering to stringent privacy standards. Experimental validation on a range of authentic IoT datasets underscores the framework's exceptional performance, further emphasizing its potential as a transformational asset in the realm of IoT DataStream prediction. Beyond predictive accuracy, the framework serves as a robust solution for privacy-conscious IoT applications, where data security remains paramount. Furthermore, its scalability and adaptability solidify its role as a crucial tool in dynamic IoT environments.
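One plausible reading of the competitive random search optimizer is a sample-compete-keep-the-best loop over hyperparameters of the attention-based LSTM; the search space, population size, and `evaluate` callback are illustrative assumptions, not the paper's exact CRSO.

```python
import random

# Assumed hyperparameter space for the attention-based LSTM.
SPACE = {"units": [32, 64, 128], "lr": [1e-3, 5e-4, 1e-4], "window": [12, 24, 48]}

def crso(evaluate, rounds=20, pop=8):
    """Competitive random search: candidates compete on validation loss."""
    best, best_score = None, float("inf")
    for _ in range(rounds):
        candidates = [{k: random.choice(v) for k, v in SPACE.items()}
                      for _ in range(pop)]
        for c in candidates:
            score = evaluate(c)          # e.g., validation loss of the model
            if score < best_score:
                best, best_score = c, score
    return best
```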
... In the RL setting, Federated Reinforcement Learning (FRL) aims to learn a common value function (Wang et al., 2023) or produce a better policy from multiple RL agents interacting with similar environments. In the recent survey paper (Qi et al., 2021), FRL has empirically shown great success in reducing the sample complexity in applications such as autonomous driving (Liang et al., 2022), IoT devices (Lim et al., 2020), and resource management in networking (Yu et al., 2020). ...
... FRL focuses on learning a common value function (Wang et al., 2023;Fabbro et al., 2023) or improving the policy by leveraging multiple RL agents interacting with similar environments. The empirical evidence presented in the survey paper (Qi et al., 2021) demonstrates the significant success of FRL in reducing sample complexity across various applications such as autonomous driving (Liang et al., 2022), IoT devices (Lim et al., 2020), resource management in networking (Yu et al., 2020), and communication efficiency (Gatsis, 2022). However, it is important to note that existing recent works in this field do not specifically tackle the challenge of finding a common and stabilizing optimal policy that is suitable for all RL agents in a heterogeneous setting. ...
Preprint
Full-text available
We study a model-free federated linear quadratic regulator (LQR) problem where M agents with unknown, distinct yet similar dynamics collaboratively learn an optimal policy to minimize an average quadratic cost while keeping their data private. To exploit the similarity of the agents' dynamics, we propose to use federated learning (FL) to allow the agents to periodically communicate with a central server to train policies by leveraging a larger dataset from all the agents. With this setup, we seek to understand the following questions: (i) Is the learned common policy stabilizing for all agents? (ii) How close is the learned common policy to each agent's own optimal policy? (iii) Can each agent learn its own optimal policy faster by leveraging data from all agents? To answer these questions, we propose a federated and model-free algorithm named FedLQR. Our analysis overcomes numerous technical challenges, such as heterogeneity in the agents' dynamics, multiple local updates, and stability concerns. We show that FedLQR produces a common policy that, at each iteration, is stabilizing for all agents. We provide bounds on the distance between the common policy and each agent's local optimal policy. Furthermore, we prove that when learning each agent's optimal policy, FedLQR achieves a sample complexity reduction proportional to the number of agents M in a low-heterogeneity regime, compared to the single-agent setting.
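A minimal sketch of one model-free federated LQR step under stated assumptions: each agent estimates the gradient of its quadratic cost by two-point zeroth-order perturbation of the gain matrix, and the server averages the gradients. The smoothing radius, direction count, and learning rate are assumptions, and FedLQR's multiple-local-update schedule is not reproduced.

```python
import numpy as np

def zo_gradient(cost, K, r=0.05, n_dirs=20):
    """Two-point zeroth-order estimate of grad cost(K)."""
    d = K.size
    g = np.zeros_like(K)
    for _ in range(n_dirs):
        U = np.random.randn(*K.shape)
        U /= np.linalg.norm(U)               # random unit direction
        g += (cost(K + r * U) - cost(K - r * U)) / (2 * r) * U
    return (d / n_dirs) * g

def federated_lqr_step(K, agent_costs, lr=1e-3):
    grads = [zo_gradient(c, K) for c in agent_costs]  # one per agent
    return K - lr * np.mean(grads, axis=0)            # server-side average
```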
... In the IoT domain, [9] used FL for anomaly detection and provided an experimental validation showing that the approach outperforms classic centralized ML while adding data privacy. Another approach that leverages FL is [10], which shows how to combine FL with RL to collaboratively learn an optimal control policy. The collaboration amongst devices proved to accelerate the learning process, mitigate training instability, and increase generalization. ...
... The collaboration amongst devices proved to accelerate the learning process, mitigate training instability, and increase generalization. Similar to the work proposed in [10], [11] provided evidence of optimizing defense with RL; however, its applicability in the MTD paradigm is not supported. Looking at the combination of RL and MTD, plenty of approaches can be found in the literature that were implemented and validated in real or simulated environments. ...
Preprint
Full-text available
The expansion of the Internet-of-Things (IoT) paradigm is inevitable, but vulnerabilities of IoT devices to malware incidents have become an increasing concern. Recent research has shown that the integration of Reinforcement Learning with Moving Target Defense (MTD) mechanisms can enhance cybersecurity in IoT devices. Nevertheless, the numerous new malware attacks and the time that agents take to learn and select effective MTD techniques make this approach impractical for real-world IoT scenarios. To tackle this issue, this work presents CyberForce, a framework that employs Federated Reinforcement Learning (FRL) to collectively and privately determine suitable MTD techniques for mitigating diverse zero-day attacks. CyberForce integrates device fingerprinting and anomaly detection to reward or penalize MTD mechanisms chosen by an FRL-based agent. The framework has been evaluated in a federation consisting of ten devices of a real IoT platform. A pool of experiments with six malware samples affecting the devices has demonstrated that CyberForce can precisely learn optimum MTD mitigation strategies. When all clients are affected by all attacks, the FRL agent exhibits high accuracy and reduced training time when compared to a centralized RL agent. In cases where different clients experience distinct attacks, the CyberForce clients gain benefits through the transfer of knowledge from other clients and similar attack behavior. Additionally, CyberForce showcases notable robustness against data poisoning attacks.
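The reward mechanism described, an anomaly detector judging the device fingerprint after an MTD action, can be sketched as follows; the detector interface and the ±1 reward values are assumptions.

```python
def mtd_reward(anomaly_detector, fingerprint_after_mtd):
    """+1 if the device behaves normally after deploying the chosen MTD
    technique, -1 if anomalous (attack still observable)."""
    return 1.0 if anomaly_detector.is_normal(fingerprint_after_mtd) else -1.0
```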
... In reference [86], the authors suggest a method called FRL for automatically controlling software-defined networking (SDN)-based IoT systems while prioritizing data security. This approach allows IoT devices to independently learn and adjust to the network's changing conditions without needing a central controller. ...
... In reference [86], the authors present an FRL architecture in which each agent operates independently on their respective IoT device and shares their learning experience with other agents in a decentralized way. This scheme addresses the security concerns related to training control policies for IoT devices that arise due to scalability. ...
... Federated reinforcement learning (FRL) overview [86,87]. ...
Article
Full-text available
The internet of things (IoT) represents a disruptive concept that has been changing society in several ways. There have been several successful applications of IoT in industry. For example, in transportation systems, the novel internet of vehicles (IoV) concept has enabled new research directions and automation solutions. Moreover, reinforcement learning (RL), federated learning (FL), and federated reinforcement learning (FRL) have demonstrated remarkable success in solving complex problems in different applications. In recent years, new solutions have been developed based on this combined framework (i.e., federated reinforcement learning). However, there is a lack of analysis concerning IoT applications and no standard view of the challenges and future directions of the current FRL landscape. Therefore, the main goal of this research is to present a literature review of federated reinforcement learning (FRL) applications in IoT from multiple perspectives. We focus on analyzing applications in multiple areas (e.g., security, sustainability and efficiency, vehicular solutions, and industrial services) to highlight existing solutions, their characteristics, and research gaps. Additionally, we identify key short- and long-term challenges leading to new opportunities in the field. This research intends to map the current FRL ecosystem in IoT to foster the development of new solutions based on existing challenges.
... For example, privacy is a major concern in autonomous driving [9,10], and sharing data among vehicles is not allowed. To this end, the recent success of FL [11,12,13], which enables multiple clients to jointly train a global model without violating user privacy, makes it an appealing solution for addressing the sample inefficiency and privacy issue of RL in innovative applications such as autonomous driving, IoT network, and healthcare [14]. As a result, Federated Reinforcement Learning (FRL) has attracted much research attention [15]. ...
... Among all policies, the optimal policy $\pi_I^*$ of the imaginary MDP is of particular interest, since its value function $V_I^{\pi_I^*} = V_I^*$ is the largest lower bound of the averaged value function $\bar{V}^\pi$. Since there are approximation errors (13) in each client, we can only obtain an ... [fragment of the paper's algorithm listing: "Synchronize the global policy $\pi_t$ to every client; for $n = 0, 1, \ldots, N$ do ..."] ...
Preprint
The development of Policy Iteration (PI) has inspired many recent algorithms for Reinforcement Learning (RL), including several policy gradient methods, that gained both theoretical soundness and empirical success on a variety of tasks. The theory of PI is rich in the context of centralized learning, but its study is still in the infant stage under the federated setting. This paper explores the federated version of Approximate PI (API) and derives its error bound, taking into account the approximation error introduced by environment heterogeneity. We theoretically prove that a proper client selection scheme can reduce this error bound. Based on the theoretical result, we propose a client selection algorithm to alleviate the additional approximation error caused by environment heterogeneity. Experiment results show that the proposed algorithm outperforms other biased and unbiased client selection methods on the federated mountain car problem by effectively selecting clients with a lower level of heterogeneity from the population distribution.
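A minimal sketch of heterogeneity-aware client selection consistent with the description above: prefer clients whose environments deviate least from the population distribution. How the heterogeneity scores are estimated is left abstract and assumed given.

```python
import numpy as np

def select_clients(heterogeneity, k):
    """Return indices of the k clients with the lowest heterogeneity."""
    return np.argsort(np.asarray(heterogeneity))[:k]

# e.g., select_clients([0.9, 0.1, 0.4, 0.2], k=2) -> indices [1, 3]
```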
... Several professionals in the field of mental health, such as psychologists and psychiatrists, employ play therapy; practitioners of physical therapy, social work, occupational therapy, and behavioural therapy are also engaged in this field [4]. ...
Article
Full-text available
Some of the most significant computational ideas in neuroscience for learning behavior in response to reward and penalty are reinforcement learning algorithms. This technique can be used to train an artificial intelligence (AI) agent to serve as a virtual assistant and helper. The goal of this study is to determine whether combining a reinforcement learning-based virtual AI assistant with play therapy can benefit wheelchair-bound youngsters with Down syndrome. This study aims to employ play therapy methods and Reinforcement Learning (RL) agents to aid children with Down syndrome and help them enhance their physical and mental skills by playing games with them. The agent is designed to be smart enough to analyze each patient's lack of ability and provide a specific set of in-game challenges to improve that ability. Increasing the game's difficulty can help players develop these skills. The agent should be able to assess each player's skill gap and tailor the game accordingly. The agent's job is not to make the patient victorious but to boost their morale and skill sets in areas such as physical activity, intelligence, and social interaction. The primary objective is to improve the player's physical activities such as muscle reflexes, motor control, and hand-eye coordination. The study concentrates on employing several distinct techniques for training various models, comparing reinforcement learning algorithms such as Deep Q-Network (DQN), QR-DQN, A3C, and PPO Actor-Critic. This study demonstrates that the AI helper agent performs best when trained with PPO Actor-Critic and A3C. The goal is to see whether wheelchair-bound children with Down syndrome can benefit from combining reinforcement learning with play therapy to increase their mobility.