shows the movement of a ballistic missile M and an interceptor I in Earth-centered Earth-fixed (ECEF) coordinates. The position and velocity of both vehicles are known, as is the position of the expected re-entry point T. M and I rely on pulse motors to perform orbital maneuvers. The reaction forces, with amplitude fixed at F and directed perpendicular to the lower and upper sides of the body, are denoted F_d and F_u. The angle between the velocity direction of M and the line of sight from M to T is denoted S.

Source publication
Article
Full-text available
To maintain the survivability of ballistic missiles, this paper proposes using deep reinforcement learning to obtain a midcourse maneuver controller. First, the midcourse is abstracted as a Markov decision process (MDP) with an unknown system state equation. Then, a controller formed by the Dueling Double Deep Q (D3Q) neural network is used to appr...
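For readers unfamiliar with the D3Q architecture named in the abstract, below is a minimal sketch of a Dueling Double Deep Q update in PyTorch; the network sizes, state/action dimensions, and hyperparameters are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)         # state value V(s)
        self.adv = nn.Linear(hidden, n_actions)   # advantages A(s, a)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        h = self.trunk(s)
        a = self.adv(h)
        return self.value(h) + a - a.mean(dim=1, keepdim=True)

def d3q_loss(online, target, batch, gamma=0.99):
    """Double-DQN target: the online net selects a', the target net evaluates it."""
    s, a, r, s2, done = batch
    a2 = online(s2).argmax(dim=1, keepdim=True)
    q2 = target(s2).gather(1, a2).squeeze(1).detach()
    y = r + gamma * (1.0 - done) * q2
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    return nn.functional.smooth_l1_loss(q, y)
```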

Citations

... Subsequently, reinforcement learning has been further applied to missile active defense [23,24], spacecraft pursuit-evasion games [25][26][27][28], as well as exoatmospheric missile guidance [29][30][31]. Shalumov et al. [24] tried to find an optimal launch time of the defender and an optimal target guidance law before and after launch using DRL. A policy suggesting, at each decision time, the bang-bang target maneuver and whether or not to launch the defender was obtained and analyzed via simulations. ...
... For games that start outside the barrier, a learning algorithm for the capture-zone embedding strategy is presented based on deep reinforcement learning to help the game state cross the barrier surfaces. In [30], a reinforcement learning algorithm was applied to the midcourse penetration of extra-atmospheric ballistic missiles. In [31], reinforcement learning combined with meta-learning was applied to the guidance law of an extra-atmospheric interceptor. ...
Article
Full-text available
Due to the lack of aerodynamic forces, the available propulsion in the exoatmospheric pursuit-evasion problem is strictly limited, a setting that has not been thoroughly investigated. This paper focuses on evasion guidance in an exoatmospheric environment with a total energy limit. A Constrained Reinforcement Learning (CRL) method is proposed to solve the problem. Firstly, the acceleration commands of the evader are defined as cost and an Actor-Critic-Cost (AC2) network structure is established to predict the accumulated cost of a trajectory. The learning objective of the agent becomes to maximize cumulative rewards while satisfying the cost constraint. Secondly, a Maximum-Minimum Entropy Learning (M2EL) method is proposed to minimize the randomness of acceleration commands while preserving the agent's exploration capability. Our approaches address two challenges in the application of reinforcement learning: constraint specification and precise control. The well-trained agent is capable of generating accurate commands while satisfying the specified constraints. The simulation results indicate that the CRL and M2EL methods can effectively control the agent's energy consumption within the specified constraints. The robustness of the agent under information error is also validated.
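As a rough illustration of the cost-constrained objective described above (not the paper's AC2 implementation), a common way to enforce such a constraint is a Lagrangian relaxation with a learned multiplier; the names lambda_, cost_limit, and the update rule below are hypothetical.

```python
import torch

def lagrangian_actor_loss(log_probs, advantages, cost_advantages, lambda_):
    # Maximize the reward advantage while penalizing the cost advantage.
    return -(log_probs * (advantages - lambda_.detach() * cost_advantages)).mean()

def update_multiplier(lambda_, episode_cost, cost_limit, lr=1e-2):
    # Dual ascent: grow lambda when the constraint is violated, shrink otherwise.
    return torch.clamp(lambda_ + lr * (episode_cost - cost_limit), min=0.0)
```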
... Recently, with the improvement of computing power, artificial intelligence has developed rapidly [6], and investigations on the theory and applications of reinforcement learning (RL) [7][8][9][10][11][12][13] have become very important. In order to obtain the best operating action of the whole system, RL enables the intelligent agent to select the behaviors that gain the maximum reward from the environment state by learning the mappings from environmental states to behaviors. ...
... A planar evasive maneuvering strategy for aircraft was derived in [24], to name but a few. For more detail, one can refer to [11] and [25][26][27][28]. ...
Article
Full-text available
This paper considers maneuvering penetration methods for a missile that does not know the intercepting strategy of the interceptor beforehand. Based on reinforcement learning, online intelligent maneuvering penetration methods for the missile are derived. When the missile is locked onto by the interceptor, it carries out tentative maneuvers that provoke responses from the interceptor, exploiting the interceptor's tracking characteristics. Using the information on these responses gathered by the missile-borne detectors, online game-confrontation learning with a reinforcement learning algorithm is employed to increase the interceptor's miss distance in its guidance blind area, and the results are used to generate maneuvering strategies that allow the missile to achieve successful penetration. The simulation results show that, compared with non-maneuvering or random-maneuvering methods, the proposed methods not only present a higher probability of successful penetration but also require less overload and a lower command-switching frequency. Moreover, the effectiveness of these maneuvering penetration methods can be realized with a limited number of training episodes.
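The online confrontation idea can be pictured with a plain tabular Q-learning loop; the discretization, maneuver set, and reward (an estimate of the induced miss distance) below are placeholder assumptions, not the paper's actual formulation.

```python
import numpy as np

n_states, n_actions = 64, 5   # discretized engagement states, maneuver set
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1

def choose_maneuver(state: int, rng: np.random.Generator) -> int:
    # Epsilon-greedy: the occasional random pick plays the role of a
    # tentative maneuver that probes the interceptor's response.
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(Q[state].argmax())

def update(state, action, reward, next_state):
    # Here `reward` stands in for an online estimate of the miss distance
    # induced in the interceptor's guidance blind area.
    td_target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (td_target - Q[state, action])
```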
... An intensive instant reward function similar to that designed by Jiang et al. [39] is not used, since there is no experience to draw on for the HGV anti-interception problem. Furthermore, rather than a Bolza problem, this instant reward function is fully equivalent to the optimization goal, that is, the Mayer-type problem defined in Equation (12). ...
Article
Full-text available
Anti-interception guidance can enhance the survivability of a hypersonic glide vehicle (HGV) against multiple interceptors. In general, anti-interception guidance for aircraft can be divided into procedural guidance, fly-around guidance and active evading guidance. However, these guidance methods cannot be applied to an HGV's real-time penetration process, which is unknown in advance, due to limited intelligence information or on-board computing abilities. In this paper, an anti-interception guidance approach based on deep reinforcement learning (DRL) is proposed. First, the penetration process is conceptualized as a generalized three-body adversarial optimal (GTAO) problem. The problem is then modelled as a Markov decision process (MDP), and a DRL scheme consisting of an actor-critic architecture is designed to solve it. Reusing the same sample batch during training results in fewer serious estimation errors in the critic network (CN), which provides better gradients to the immature actor network (AN); we propose a new mechanism called repetitive batch training (RBT) to exploit this. In addition, the training data and test results confirm that RBT can improve traditional DDPG-based methods.
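A minimal sketch of what the abstract's repetitive batch training (RBT) could look like in a DDPG-style critic update: the same minibatch is reused for several gradient steps before the actor is updated. The repeat count and interfaces are assumptions.

```python
import torch

def rbt_critic_updates(critic, critic_opt, batch, repeats: int = 4):
    """Reuse one minibatch for several critic steps (the RBT idea).

    `critic` is any module mapping (state, action) -> Q-value; `batch`
    holds states s, actions a, and precomputed Bellman targets y."""
    s, a, y = batch
    for _ in range(repeats):
        critic_opt.zero_grad()
        loss = torch.nn.functional.mse_loss(critic(s, a), y)
        loss.backward()
        critic_opt.step()
    return loss.item()
```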
Article
Aiming at the problem of single-missile maneuver penetration during a multi-missile cooperative precision strike on the ground, this paper proposes a cooperative maneuver penetration guidance law based on line deviation control, which not only achieves a consistent attack time but also improves the penetration capability of each missile through a spiral maneuver while respecting overload constraints. Firstly, the concept of line deviation control is established, based on which the missile's maneuvering penetration trajectory is decomposed into a virtual guided trajectory and a relative maneuvering trajectory. Secondly, a time-coordinated guidance law is designed for the virtual guided trajectory. Thirdly, based on the line deviation command, a relative spiral maneuvering trajectory and a guidance law under the overload constraint are proposed. State constraints and interference estimation are realized using a command filter and an extended state observer (ESO), and the stability of the system is proved based on the Lyapunov stability theorem. Finally, numerical simulation results demonstrate the validity of the proposed cooperative maneuvering penetration guidance scheme.
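For the ESO mentioned in the abstract, a standard bandwidth-parameterized linear ESO for a single channel looks roughly like the sketch below; the gains, plant model, and step size are illustrative, not the paper's design.

```python
import numpy as np

def eso_step(z: np.ndarray, y: float, u: float, b0: float,
             dt: float, w0: float = 20.0) -> np.ndarray:
    """One Euler step of a 3-state linear ESO.

    z = [position, velocity, lumped disturbance] estimates; y is the
    measured output, u the control input, b0 the nominal input gain,
    and w0 the observer bandwidth."""
    l1, l2, l3 = 3 * w0, 3 * w0**2, w0**3   # bandwidth parameterization
    e = y - z[0]
    dz = np.array([z[1] + l1 * e,
                   z[2] + b0 * u + l2 * e,
                   l3 * e])
    return z + dt * dz
```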
Article
In order to analytically evaluate the effects of different maneuvering strategies, we derive a series of closed-form solutions of the miss distance that are applicable to higher-order guidance system models. Instead of adopting the simplified zero-lag missile model, our solutions are closer to reality. First, based on the linearization of the guidance loop, the closed-form solutions of various guidance models are derived: for the first-order system, the confluent hypergeometric function is introduced to derive the solutions of the miss distance due to heading error and target maneuver; for the higher-order system in binomial form, the frequency- and time-domain expressions for the miss distance are obtained through frequency-domain analysis and the inverse Laplace transform; for higher-order guidance systems of arbitrary form, the power-series solutions of the miss distance are given based on the adjoint system. Second, the effectiveness of step and weave maneuvers is analyzed based on the closed-form solutions, and the accuracy of these solutions is validated. Finally, an optimal maneuvering strategy based on the closed-form solutions is proposed. This strategy offers significantly better evasion performance than the conventional maneuvering strategy, as evidenced by both linear and nonlinear simulation results.
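The kind of linearized guidance loop these closed-form solutions describe can be cross-checked numerically. The sketch below simulates a first-order proportional-navigation loop against a step target maneuver and reads off the miss distance at intercept time; all parameters (N, tau, tf, a_T, the acceleration limit) are illustrative, not values from the article.

```python
import numpy as np

def miss_step_maneuver(N=4.0, tau=0.5, tf=10.0, a_T=3.0, dt=1e-3):
    """Miss distance of a lagged PN loop versus a step target maneuver."""
    y, v, a_M = 0.0, 0.0, 0.0   # relative position, velocity, missile accel
    t = 0.0
    while tf - t > dt:
        tgo = tf - t
        a_c = N * (y + v * tgo) / tgo**2         # PN command from zero-effort miss
        a_c = np.clip(a_c, -20 * a_T, 20 * a_T)  # crude acceleration limit
        a_M += dt * (a_c - a_M) / tau            # first-order autopilot lag
        v += dt * (a_T - a_M)                    # linearized dynamics: y'' = a_T - a_M
        y += dt * v
        t += dt
    return y                                     # residual y ~ miss distance

print(miss_step_maneuver())
```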
Chapter
This paper proposes a ballistic missile maneuver strategy based on the proximal policy optimization (PPO) reinforcement learning algorithm, which enables the ballistic missile to evade the interceptor in the midcourse. Firstly, the extra-atmospheric engagement process is modeled as a Markov decision process. The thrust of the ballistic missile is regarded as the action of the agent, and the energy consumption is regarded as the reward of the agent. Importantly, the observations consist only of the seeker angles and their rates of change, with no range, velocity, or acceleration estimates, making it possible to apply the method to passive seekers outside the atmosphere. Simulation shows that the ballistic missile agent based on the PPO algorithm can evade the interceptor with a 100% success rate. Compared with traditional differential game methods, reinforcement learning-based methods map the observations directly to the divert thruster command, which greatly reduces the real-time computation time.
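The angle-only observation the chapter emphasizes can be assembled as below; the variable names and the finite-difference rate estimate are assumptions for illustration.

```python
import numpy as np

def make_observation(az, el, prev_az, prev_el, dt):
    """Seeker azimuth/elevation angles plus finite-difference rates;
    no range, velocity, or acceleration estimates are required."""
    return np.array([az, el,
                     (az - prev_az) / dt,
                     (el - prev_el) / dt], dtype=np.float32)
```

A PPO policy would then map this four-dimensional vector directly to a discrete divert-thruster command.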
Article
Full-text available
This paper proposes an algorithm for missile manoeuvring based on a hierarchical proximal policy optimization (PPO) reinforcement learning algorithm, which enables a missile to guide to a target and evade an interceptor at the same time. Based on the idea of task hierarchy, the agent has a two-layer structure, in which low-level agents control basic actions and are controlled by a high-level agent. The low level has two agents called a guidance agent and an evasion agent, which are trained in simple scenarios and embedded in the high-level agent. The high level has a policy selector agent, which chooses one of the low-level agents to activate at each decision moment. The reward functions for each agent are different, considering the guidance accuracy, flight time, and energy consumption metrics, as well as a field-of-view constraint. Simulation shows that the PPO algorithm without a hierarchical structure cannot complete the task, while the hierarchical PPO algorithm has a 100% success rate on a test dataset. The agent shows good adaptability and strong robustness to the second-order lag of autopilot and measurement noises. Compared with a traditional guidance law, the reinforcement learning guidance law has satisfactory guidance accuracy and significant advantages in average time and average energy consumption.