Figure 5: BlueSky user interface.

Source publication
Article
Full-text available
Deep reinforcement learning (DRL) has been widely adopted recently for its ability to solve decision-making problems that were previously out of reach due to a combination of nonlinearity and high dimensionality. In the last few years, it has spread in the field of air traffic control (ATC), particularly in conflict resolution. In this work, we conduc...

Context in source publication

Context 1
... is now a full-blown, user-friendly air traffic simulator that can run at higher update rates than expected for a high number of aircraft. The user interface is shown in Figure 5. To sum up, the authors recommend that the independent development environment be used to carry out the research in the free flight mode, and the environment based on the BlueSky platform be used to carry out the research in the en-route mode. ...

Citations

... Reinforcement learning has already been suggested, along with other approaches, for automating the conflict detection and resolution function in ATM [149]. In summary, the proposals currently have limitations in handling complex traffic conditions. ...
Article
Full-text available
In the contemporary landscape, the escalating deployment of drones across diverse industries has ushered in consequential concerns, chief among them the security of drone operations. These concerns extend to a spectrum of challenges, encompassing collisions with stationary and mobile obstacles and encounters with other drones. Moreover, the inherent limitations of drones, namely constraints on energy consumption, data storage capacity, and processing power, present formidable obstacles in developing collision avoidance algorithms. This review paper explores the challenges of ensuring safe drone operations, focusing on collision avoidance. We explore collision avoidance methods for UAVs from various perspectives, categorizing them into four main groups: obstacle detection and avoidance, collision avoidance algorithms, drone swarms, and path optimization. Additionally, our analysis delves into machine learning techniques, discusses metrics and simulation tools to validate collision avoidance systems, and delineates local and global algorithmic perspectives. Our evaluation reveals significant challenges in current drone collision prevention algorithms. Despite advancements, critical UAV network and communication challenges are often overlooked, prompting a reliance on simulation-based research due to cost and safety concerns. The challenges encompass precise detection of small and moving obstacles, minimizing path deviations at minimal cost, the high expense of machine learning and automation, the prohibitive cost of real testbeds, limited environmental comprehension, and security apprehensions. By addressing these key areas, future research can advance the field of drone collision avoidance and pave the way for safer and more efficient UAV operations.
... By learning from historical experiences in an interactive manner, RL can extract decision-making knowledge and generalize it to unseen scenarios, effectively tackling new, similar problems [26]-[30]. Learning-based methods have gained prominence recently in addressing aircraft conflict resolution challenges [31], [32]. These methods follow the framework of the Markov decision process (MDP), wherein the aircraft are modeled as interactive agents [33]. ...
Article
Full-text available
The escalating density of airspace has led to sharply increased conflicts between aircraft. Efficient and scalable conflict resolution methods are crucial to mitigate collision risks. Existing learning-based methods become less effective as the number of aircraft increases due to their redundant information representations. In this paper, to accommodate the increased airspace density, a novel graph reinforcement learning (GRL) method is presented to efficiently learn deconfliction strategies. A time-evolving conflict graph is exploited to represent the local state of individual aircraft and the global spatiotemporal relationships between them. Equipped with the conflict graph, GRL can efficiently learn deconfliction strategies by selectively aggregating aircraft state information through a multi-head attention-boosted graph neural network. Furthermore, a temporal regularization mechanism is proposed to enhance learning stability in highly dynamic environments. Comprehensive experimental studies have been conducted on an OpenAI Gym-based flight simulator. Compared with existing state-of-the-art learning-based methods, the results demonstrate that GRL saves considerable training time while achieving significantly better deconfliction strategies in terms of safety and efficiency metrics. In addition, GRL exhibits strong scalability and robustness as the number of aircraft increases.
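The core mechanism of this citing work, attention-weighted aggregation of neighbouring aircraft states, can be illustrated with a minimal sketch. The code below is not the authors' implementation: the state dimensions and random projection matrices stand in for learned parameters, and a single attention head replaces the paper's multi-head graph neural network.

```python
# Illustrative sketch (not the paper's code): scaled dot-product attention used
# to aggregate the states of a variable number of neighbouring aircraft into a
# fixed-size embedding for the ownship. Dimensions and weights are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def attention_aggregate(own_state, intruder_states, d_k=16):
    """Aggregate N intruder states into one context vector via attention."""
    d = own_state.shape[0]
    # Hypothetical learned projections; random here for illustration.
    W_q, W_k, W_v = (rng.standard_normal((d, d_k)) for _ in range(3))
    q = own_state @ W_q                      # (d_k,)  query from ownship
    K = intruder_states @ W_k                # (N, d_k) keys from intruders
    V = intruder_states @ W_v                # (N, d_k) values from intruders
    scores = K @ q / np.sqrt(d_k)            # (N,) similarity scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax attention weights
    return weights @ V                       # (d_k,) permutation-invariant summary

own = rng.standard_normal(6)                 # e.g. [x, y, z, vx, vy, vz]
intruders = rng.standard_normal((5, 6))      # works for any number of intruders
context = attention_aggregate(own, intruders)
print(context.shape)                         # (16,)
```

Because the softmax-weighted sum is permutation-invariant, the ownship embedding does not depend on the order in which intruders are listed, which is the property that makes graph attention attractive for variable traffic.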
... Conflict-avoidance strategies formulated as a reinforcement learning problem have been studied by numerous research groups, many of which represent the strategies as neural networks. A recent literature review is provided in [17]. In the present work, the maneuver strategy is computed offline and stored as a look-up table for the "value" function. ...
Technical Report
Full-text available
This report describes a novel approach to the development of a horizontal maneuver guidance strategy for Detect-and-Avoid systems. The maneuver guidance strategy provides a directive turn action that can be automatically executed by the vehicle's autopilot system, taking into account the cost of recapturing the flight plan path. Pairwise conflict scenarios with non-accelerating intruders are simulated to validate the effectiveness of the maneuver guidance strategy. Initial results suggest the strategy is more effective for a faster ownship than for a slower one, which is unable to avoid conflict in certain scenarios against fast intruders. These findings indicate the novel approach shows great potential, but its performance requires improvement, which will be addressed in future work.
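The look-up-table idea referenced in the excerpt above (an offline-computed value function queried online for a directive turn) can be sketched as follows. The state discretisation, action set, and value table here are hypothetical placeholders, not the report's actual tables.

```python
# Minimal sketch of an offline value table over a discretised relative geometry,
# queried online to pick the turn command with the best value. All bins and
# values are illustrative assumptions.
import numpy as np

bearings = np.linspace(-180, 180, 37)        # relative bearing bins (deg)
ranges = np.linspace(0.5, 10.0, 20)          # range bins (NM)
turns = np.array([-30, -15, 0, 15, 30])      # candidate turn actions (deg)

# Offline phase: fill V[s, a] by simulation/dynamic programming (stubbed here).
V = np.random.default_rng(1).random((len(bearings), len(ranges), len(turns)))

def best_turn(rel_bearing_deg, range_nm):
    """Online phase: nearest-bin lookup, then greedy action selection."""
    i = np.abs(bearings - rel_bearing_deg).argmin()
    j = np.abs(ranges - range_nm).argmin()
    return turns[V[i, j].argmax()]

print(best_turn(42.0, 3.2))                  # e.g. a directive turn in degrees
```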
... The reinforcement learning (RL) model uses the training information to evaluate the actions taken rather than to prescribe the correct actions [23]. This is what distinguishes the RL technique from other learning strategies. ...
Preprint
Full-text available
The improvement of energy and spectral efficiency in networks can be realized by seamlessly integrating energy harvesting, cognitive radio technologies, and NOMA techniques. These complementary strategies work together to optimize resource usage and address challenges related to energy consumption. Additionally, the adaptability and versatility of UAVs offer an innovative solution for enhancing coverage performance, improving not only connectivity but also overall efficiency and reliability. This study introduces a novel approach, the Deep Reinforcement Learning-Random Walrus (DRL-RW) algorithm, to enhance energy efficiency. The developed method combines deep reinforcement learning and the Random Walrus optimization technique to efficiently allocate spectrum resources and manage energy harvesting in a dynamic environment. The DRL-RW algorithm empowers UAVs to learn optimal spectrum sharing strategies and energy harvesting policies, while the Random Walrus optimization enhances the algorithm's adaptability and speed in exploring diverse solutions. Simulation results demonstrate the effectiveness of the DRL-RW algorithm, indicating improvements in various performance metrics, including reduced energy consumption, shorter computation time, faster convergence, better signal-to-noise ratio, increased throughput, longer network lifetime, more harvested energy, and overall superior network performance compared to baseline techniques. These findings highlight the efficacy of the DRL-RW approach in addressing challenges associated with energy management in cognitive radio networks. The integration of UAVs, NOMA networks, and the novel algorithm represents a promising direction for advancing energy-efficient communication systems.
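The excerpt above notes what sets RL apart: the learner receives evaluative feedback (a reward for the action it tried) rather than instructive feedback (the correct action). A toy ε-greedy bandit, with arbitrary arm means, makes the distinction concrete.

```python
# Toy sketch of evaluative feedback: the learner is never told the correct
# action, only a reward for the action it tried. Arm means and hyperparameters
# are arbitrary illustration values.
import numpy as np

rng = np.random.default_rng(2)
true_means = np.array([0.2, 0.5, 0.8])       # unknown to the agent
Q = np.zeros(3)                              # estimated action values
counts = np.zeros(3)
eps = 0.1

for _ in range(2000):
    a = rng.integers(3) if rng.random() < eps else int(Q.argmax())
    r = rng.normal(true_means[a], 0.1)       # evaluative feedback: a reward only
    counts[a] += 1
    Q[a] += (r - Q[a]) / counts[a]           # incremental mean update

print(Q.round(2))                            # estimates approach true_means
```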
... A detailed analysis of these methods can be found in our previous paper (Chen et al., 2023), where we also discussed the contributions of Guo et al. (2021), Brittain and Wei (2022), Pham et al. (2019), Yilmaz et al. (2021) and Ribeiro et al. (2020a) in the field of RL-based CR methods. Additionally, Wang et al. have comprehensively reviewed the research on RL-based CR methods up to 2021 (Wang et al., 2022). This paper will discuss the specific challenges that may arise while extending the 2D RL-based methods listed in Table 2 to the 3D space. ...
... In terms of success rate, it is challenging for RL-based CR methods to reach 100% (Wang et al., 2022). This is most likely because the agent does not know what the conflict detection model (or minimum safety separation) is, and it is simply trained to act as appropriately as possible based on feedback from the RL environment. ...
... 1. Our developed 2D model has achieved competitive results in terms of success rate and extra flight distance compared with state-of-the-art RL-based CR models in previous studies (Chen et al., 2023); 2. Neural network parameters for other state-of-the-art RL-based CR models are not publicly available; 3. The current research on RL methods in CR lacks a baseline (Wang et al., 2022); 4. 3D RL-based CR methods (especially those that consider flight uncertainty and wind) are rarely reported. ...
Article
Full-text available
Reinforcement learning (RL) techniques have been studied for solving the conflict resolution (CR) problem in air traffic management, leveraging their computational potential and ability to handle uncertainty. However, challenges remain that impede the application of RL methods to CR in practice, including three-dimensional manoeuvres, generalisation, trajectory recovery, and success rate. This paper proposes a general multi-agent reinforcement learning approach for real-time three-dimensional multi-aircraft conflict resolution, in which agents share a neural network and are deployed on each aircraft to form a distributed decision-making system. To address the challenges, several technologies are introduced, including a partial observation model based on imminent threats for generalisation, a safety separation relaxation model for multiple flight levels for three-dimensional manoeuvres, an adaptive manoeuvre strategy for trajectory recovery, and a conflict buffer model for success rate. The Rainbow Deep Q-learning Network (DQN) is used to enhance the efficiency of the RL process. A simulation environment that considers flight uncertainty (resulting from mechanical and navigation errors and wind) is constructed to train and evaluate the proposed approach. The experimental results demonstrate that the proposed method can resolve conflicts in scenarios with much higher traffic density than in today’s real-world situations.
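A plausible sketch of the "imminent threats" partial observation described in this abstract is shown below: it ranks intruders by distance at the closest point of approach (CPA) and packs the k most urgent into a fixed-size vector. The state layout, ranking criterion, and k are assumptions for illustration, not the authors' exact model.

```python
# Hedged sketch of a partial observation built from imminent threats, ranked by
# distance at the closest point of approach (CPA). Not the authors' code.
import numpy as np

def cpa(rel_pos, rel_vel):
    """Time to and distance at closest point of approach for one intruder."""
    v2 = rel_vel @ rel_vel
    t = 0.0 if v2 < 1e-9 else max(0.0, -(rel_pos @ rel_vel) / v2)
    return t, np.linalg.norm(rel_pos + t * rel_vel)

def partial_observation(own, intruders, k=3):
    """Fixed-size observation: the k intruders with the smallest CPA distance."""
    rel = [(i[:3] - own[:3], i[3:] - own[3:]) for i in intruders]
    scores = [cpa(p, v) for p, v in rel]
    order = np.argsort([d for _, d in scores])[:k]
    obs = np.zeros((k, 6))                   # zero-padded if fewer than k
    for row, idx in enumerate(order):
        obs[row] = intruders[idx] - own      # relative state of each threat
    return obs.ravel()

rng = np.random.default_rng(3)
own = rng.standard_normal(6)                 # [x, y, z, vx, vy, vz]
intruders = rng.standard_normal((8, 6))
print(partial_observation(own, intruders).shape)   # always (18,)
```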
... The last couple of years have shown increasing interest in research on utilizing Deep Reinforcement Learning (DRL) methods for conflict resolution and safe multi-agent navigation in Air Traffic Control (ATC) operations [1]. One issue with conventional neural networks underlying DRL methods, however, is the requirement of a fixed-length input vector. ...
... One issue with conventional neural networks underlying DRL methods, however, is the requirement of a fixed-length input vector. Most current research therefore pre-selects the number of aircraft that will be considered for the model's input or keeps the number of agents/aircraft in the environment artificially constant, which limits the application of trained models in environments with a variable number of aircraft [1]. ...
Conference Paper
Full-text available
Deep Reinforcement Learning has seen increasing usage in the field of Air Traffic Control over the last couple of years. As the number of aircraft in a given sector of airspace is not constant, there is a need for methods that are invariant to the number of agents in the system. Often this is done by making a selection of the aircraft that will be included in the state, which introduces human biases. Another option that has been used is Recurrent Neural Networks to process the entire sequence of aircraft present. These methods, however, are sequence-dependent and can give different results depending on the order in which the aircraft are given, which is undesirable. Methods that rely solely on attention mechanisms, such as Transformers, allow sequential data to be processed in a sequence-invariant manner by using multi-head attention. However, because traditional Transformers operate on individual tokens, this does not allow relative state information to be encoded into the hidden state. This paper shows that by performing a transformation operation on the key and value tokens, it is possible to use Transformers on relative states, at the cost of a factor of (N-1) additional attention computations, where N is the number of agents in the system. This adaptation allows relative-state Transformers to obtain significantly higher performance than standard Transformers. The results also showed that using attention mechanisms to construct the initial observation vector out of a total of 20 agents results in performance similar to, but slightly lower than, handcrafted observation vectors, without requiring manual selection of the important agents. Future research should investigate whether additional changes to the attention mechanisms and their training can result in higher performance.
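The sequence-invariance property this paper exploits can be demonstrated in a few lines: attention is computed over tokens built from relative states, so the ownship embedding is the same under any permutation of the other agents. The dimensions and random projections below are illustrative stand-ins for learned weights, and a single head replaces multi-head attention.

```python
# Illustrative sketch only: attention over per-pair *relative* states, so the
# ownship embedding is invariant to the order of the other aircraft and works
# for any number of agents N. Projections are random stand-ins for learned ones.
import numpy as np

rng = np.random.default_rng(4)
d, d_k = 4, 8
W_k, W_v = rng.standard_normal((d, d_k)), rng.standard_normal((d, d_k))
q = rng.standard_normal(d_k)                 # learned query for the ownship

def embed_own(own, others):
    rel = others - own                       # (N-1, d) relative-state tokens
    K, V = rel @ W_k, rel @ W_v              # keys/values from relative states
    s = K @ q / np.sqrt(d_k)
    w = np.exp(s - s.max()); w /= w.sum()    # softmax over the other agents
    return w @ V                             # order-invariant embedding

states = rng.standard_normal((6, d))         # 6 agents; any N works
emb = embed_own(states[0], states[1:])
shuffled = states[1:][rng.permutation(5)]
print(np.allclose(emb, embed_own(states[0], shuffled)))   # True
```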
... Alternatively, a learning method that is intrinsically motivated to minimize the number of secondary conflicts resulting from successive resolution manoeuvres could be used. Deep Reinforcement Learning (DRL) is one such method that has been researched in various studies for conflict resolution [5]. However, one main drawback of DRL is the 'black-box problem', which makes it challenging to certify and predict behaviour in all stages of flight. ...
Conference Paper
Full-text available
The number of unmanned aircraft operating in the airspace is expected to grow exponentially during the next decades. This will likely lead to traffic densities that are higher than those currently observed in civil and general aviation, and might require both a different airspace structure compared to conventional aviation and different conflict resolution methods. One of the main disadvantages of analytical conflict resolution methods, in high-traffic-density scenarios, is that they can cause instabilities of the airspace due to a domino effect of secondary conflicts. Therefore, many studies have also investigated other methods of conflict resolution, such as Deep Reinforcement Learning, which have shown positive results but tend to be hard to explain due to their black-box nature. This paper investigates whether it is possible to explain the behaviour of a Soft Actor-Critic model, trained for resolving vertical conflicts in a layered urban airspace, by interpreting the policy through a heat map of the selected actions. It was found that the model actively changes its policy depending on the degrees of freedom and has a tendency to adopt preventive behaviour on top of conflict resolution. This behaviour can be directly linked to a decrease in secondary conflicts when compared to analytical methods and can potentially be incorporated into these methods to improve them while maintaining explainability.
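The heat-map interpretation described in this abstract amounts to sweeping a policy over a grid of two state variables and colouring each cell by the selected action. The sketch below uses a stub policy and hypothetical state variables (relative altitude and time to conflict) purely to show the procedure; the trained Soft Actor-Critic model would replace the stub.

```python
# Sketch of the heat-map interpretation idea: evaluate a (stub) policy over a
# grid of two state variables and plot the selected action per cell. The state
# variables and policy are hypothetical placeholders, not the trained SAC model.
import numpy as np
import matplotlib.pyplot as plt

def policy(rel_alt, t_conflict):
    """Stub vertical-speed command; a trained actor network would go here."""
    return np.tanh(-rel_alt / 50.0) * np.exp(-t_conflict / 60.0)

rel_alts = np.linspace(-200, 200, 81)        # relative altitude (m)
times = np.linspace(0, 120, 61)              # time to conflict (s)
grid = np.array([[policy(a, t) for t in times] for a in rel_alts])

plt.imshow(grid, aspect="auto", origin="lower", cmap="coolwarm",
           extent=[times[0], times[-1], rel_alts[0], rel_alts[-1]])
plt.xlabel("time to conflict (s)"); plt.ylabel("relative altitude (m)")
plt.colorbar(label="commanded vertical action")
plt.show()
```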
... Recent advances in machine learning have led to the emergence of deep reinforcement learning (DRL) [27], a powerful technique that combines reinforcement learning and deep learning to solve a wide range of optimization problems [28][29][30][31][32]. The deep Q-network (DQN) was the first DRL algorithm [27]. ...
Article
Full-text available
The need for reliable wireless communication in remote areas has led to the adoption of unmanned aerial vehicles (UAVs) as flying base stations (FlyBSs). FlyBSs hover over a designated area to ensure continuous communication coverage for mobile users on the ground. Moreover, rate-splitting multiple access (RSMA) has emerged as a promising interference management scheme in multi-user communication systems. In this paper, we investigate an RSMA-enhanced FlyBS downlink communication system and formulate an optimization problem to maximize the sum-rate of users, taking into account the three-dimensional FlyBS trajectory and RSMA parameters. To address this continuous non-convex optimization problem, we propose a TD3-RFBS optimization framework based on the twin-delayed deep deterministic policy gradient (TD3). This framework overcomes the limitations associated with the overestimation issue encountered in the deep deterministic policy gradient (DDPG), a well-known deep reinforcement learning method. Our simulation results demonstrate that TD3-RFBS outperforms existing solutions for FlyBS downlink communication systems, indicating its potential as a solution for future wireless networks.
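The overestimation fix TD3 adds over DDPG, taking the minimum of two critics when forming the bootstrap target, can be sketched compactly. The critics below are random linear stubs, and the smoothing noise is applied to the critic input for brevity; in TD3 proper it perturbs the target action.

```python
# Minimal numpy sketch of the TD3 ingredient the abstract refers to: the target
# uses the minimum of two critics, which counteracts DDPG's overestimation.
# Networks are stubbed as random linear critics purely for illustration.
import numpy as np

rng = np.random.default_rng(5)
w1, w2 = rng.standard_normal(8), rng.standard_normal(8)  # two target critics

def td3_target(reward, next_sa, gamma=0.99, noise_std=0.2, clip=0.5):
    """Clipped target smoothing + clipped double-Q target."""
    noise = np.clip(rng.normal(0, noise_std, next_sa.shape), -clip, clip)
    sa = next_sa + noise                     # smoothed critic input (simplified)
    q1, q2 = sa @ w1, sa @ w2                # two independent Q estimates
    return reward + gamma * min(q1, q2)      # min(...) curbs overestimation

print(td3_target(reward=1.0, next_sa=rng.standard_normal(8)))
```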
... The autonomous AAM necessitates real-time decision-making to resolve conflicts while upholding safety and mission requirements. Deep reinforcement learning (DRL) algorithms offer a promising alternative, as they can handle real-time uncertainty and dynamic interactions among autonomous aircraft during conflict resolution [7], while leveraging real-time navigation information [8]. However, DRL is vulnerable to real-time data integrity issues arising from adversarial disturbances and communication constraints. ...
Conference Paper
The increasing utilization of unmanned aerial vehicles (UAVs) in advanced air mobility (AAM) necessitates highly automated conflict resolution and collision avoidance strategies. Consequently, reinforcement learning (RL) algorithms have gained popularity in addressing conflict resolution strategies among UAVs. However, increasing digitization introduces challenges related to packet drop constraints and various adversarial cyber threats, rendering AAM fragile. Adversaries can introduce perturbations into the system states, reducing the efficacy of learning algorithms. Therefore, it is crucial to systematically investigate the impact of increased digitization, including adversarial cyber threats and packet drop constraints, to study the fragile characteristics of AAM infrastructure. This study examines the performance of artificial intelligence (AI) based path planning and conflict resolution strategies under different adversarial and stochastic packet drop constraints in UAV systems. The fragility analysis focuses on the number of conflicts, collisions, and fuel consumption of the UAVs with respect to their mission, considering various adversarial attacks and packet drop constraint scenarios. A safe deep Q-network (DQN) architecture is utilized to navigate the UAVs, mitigating the adversarial threats, and is benchmarked against a vanilla DQN using the necessary metrics. The findings provide a foundation for investigating the modifications to learning paradigms necessary to develop antifragile strategies against emerging adversarial threats.
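The packet-drop constraint studied in this paper can be emulated with a simple hold-last-sample model: with some probability an observation update is lost and the agent keeps acting on the last state it received. The Bernoulli drop model and probability below are assumptions for illustration.

```python
# Hedged sketch of a packet-drop constraint: with probability p_drop an
# observation update is lost and the agent must act on the last successfully
# received state (a common hold-last-sample model; an assumption here).
import numpy as np

rng = np.random.default_rng(6)

def degraded_stream(true_states, p_drop=0.3):
    """Yield what the agent actually sees under Bernoulli packet loss."""
    last = true_states[0]
    for s in true_states:
        if rng.random() >= p_drop:           # packet delivered
            last = s
        yield last                           # else: stale observation persists

states = rng.standard_normal((10, 4))
for seen in degraded_stream(states):
    pass                                     # feed `seen` to the DQN agent here
```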
... Air traffic management (ATM) research traditionally focuses on macroscopic aspects of air transportation such as airspace design, traffic flow management, airport planning and scheduling, and more (Wu & Caves, 2002). Recently, with the development of new aerial vehicle concepts, including urban air mobility (UAM) and unmanned aircraft systems (UAS), there has been growing interest in ATM research topics such as conflict resolution using reinforcement learning (Wang et al., 2022), 4D-trajectory optimization (Tian et al., 2020), and even unmanned traffic management (UTM). Eurocontrol U-space (Barrado et al., 2020) and the FAA/NASA UTM project (Kopardekar et al., 2016) are examples of existing industry efforts in this area. ...