Neural Network-based Flight Control Systems:
Present and Future
Seyyed Ali Emami^a, Paolo Castaldi^b, Afshin Banazadeh^a,*

^a Department of Aerospace Engineering, Sharif University of Technology, Tehran, Iran
^b Department of Electrical, Electronic and Information Engineering "Guglielmo Marconi", University of Bologna, Via Dell'Università 50, Cesena, Italy

* Corresponding author. Email address: banazadeh@sharif.edu (Afshin Banazadeh)
Abstract
As the first review in this field, this paper presents an in-depth mathematical view of Intelligent Flight Control Systems (IFCSs), particularly those based on artificial neural networks. The rapid evolution of IFCSs in the last two decades in both the methodological and technical aspects necessitates a comprehensive view of them to better demonstrate the current stage and the crucial remaining steps towards developing a truly intelligent flight management unit. To this end, in this paper, we will provide a detailed mathematical view of Neural Network (NN)-based flight control systems and the challenging problems that still remain. The paper will cover both the model-based and model-free IFCSs. The model-based methods consist of the basic feedback error learning scheme, the pseudocontrol strategy, and the neural backstepping method. Besides, different approaches to analyze the closed-loop stability in IFCSs, their requirements, and their limitations will be discussed in detail. Various supplementary features, which can be integrated with a basic IFCS, such as the fault-tolerance capability, the consideration of system constraints, and the combination of NNs with other robust and adaptive elements like disturbance observers, will be covered as well. On the other hand, concerning model-free flight controllers, both the indirect and direct adaptive control systems, including indirect adaptive control using NN-based system identification, approximate dynamic programming using NNs, and reinforcement learning-based adaptive optimal control, will be carefully addressed. Finally, by demonstrating a well-organized view of the current stage in the development of IFCSs, the challenging issues that are critical to be addressed in the future are thoroughly identified. As a result, this paper can be considered a comprehensive road map for all researchers interested in the design and development of intelligent control systems, particularly in the field of aerospace applications.
Keywords: Flight control, Intelligent control, Neural networks, Reinforcement learning
Contents

1 Introduction
  1.1 Intelligent control systems
  1.2 Direct versus indirect adaptive control
  1.3 Model-based versus model-free control
    1.3.1 Model-based approach
    1.3.2 Model-free approach
2 Foundations of model-based intelligent control
  2.1 Feedback Error Learning
  2.2 Pseudocontrol strategy
  2.3 Neural backstepping control
    2.3.1 Dynamic surface control
    2.3.2 Command filtered backstepping
    2.3.3 Backstepping augmented by First-Order Sliding Mode Differentiators (FOSMD)
    2.3.4 Direct neural-backstepping control
  2.4 How to analyze the closed-loop stability?
    2.4.1 Asymptotic stability
    2.4.2 Exponential stability
    2.4.3 Finite-time stability
3 Supplementary features in model-based IFCSs
  3.1 Output Feedback (OFB) control
  3.2 Minimal-learning parameter
  3.3 Systems with unknown control direction
  3.4 Neural networks and Disturbance Observers (DOs)
    3.4.1 Neural disturbance observer
    3.4.2 Combination of NN function approximation and DOs
  3.5 Fault-tolerant control
    3.5.1 FEL-based fault identification
    3.5.2 Using a separate neural fault detection and identification (FDI) block
    3.5.3 Multimodel approaches
  3.6 Consideration of input constraints
    3.6.1 Pseudocontrol Hedging (PCH)
    3.6.2 Employment of a modified tracking error
    3.6.3 Neuro-predictive control
    3.6.4 Using the Nussbaum function
  3.7 Consideration of state/output constraints
  3.8 Self-organizing neural networks
  3.9 Concerns with air vehicle characteristics
4 Towards truly model-free control systems
  4.1 Neural network-based system identification
    4.1.1 Single-hidden-layer neural networks
    4.1.2 Deep neural networks
  4.2 Neuroadaptive optimal control
    4.2.1 Optimal control formulation (HJB vs. HJI equations)
    4.2.2 Approximate dynamic programming (continuous-time systems)
  4.3 Direct adaptive control using Reinforcement Learning (RL)
    4.3.1 Approximate dynamic programming (discrete-time systems)
    4.3.2 Direct policy updating
5 Concluding remarks and future directions
1. Introduction
1.1. Intelligent control systems
Although the words Intelligence and Autonomy have been widely employed
interchangeably, there is an essential conceptual difference between them [1].
Different definitions have been given for both concepts in the literature [2, 3].
However, in a general view, intelligence may be defined as a very general
mental capability that involves the ability to reason, plan, solve problems, think
abstractly, comprehend complex ideas, learn quickly and learn from experience
[3]. On the other hand, the ability to generate one’s own purposes without any
instruction from outside can be interpreted as the autonomy of a system [1].
Nevertheless, concerning these two general definitions, in some cases, distin-
guishing the high level of autonomy from the low level of intelligence is not
trivial at all. Within the framework of the control theory, typically, the final
purpose is to develop an autonomous system (rather than an intelligent system),
which can fulfill a set of predefined missions in a satisfactory manner. However,
like the common literature, one may interpret the high level of autonomy of
some Unmanned Aerial Vehicles (UAVs) as intelligence. Accordingly, despite
the conceptual difference of these two words, in the current study, (like existing
literature,) we will use the words Intelligence and Autonomy, interchangeably.
Different metrics have been provided in the literature to specify the Level of Autonomy (LoA) of a system [4], particularly a UAV. Despite the lack of a unique categorization [5], a beneficial division has been given in [1, 6] in the case of UAVs. Considering it, the highest LoA for a single UAV (level 4) indicates the self-accomplishment of an assigned tactical plan, where the vehicle is capable of on-board trajectory replanning, event-driven self resource management, and
compensating for most faults and disturbances in different flight conditions.
This, in turn, requires different self-adaptive mechanisms in the entire control
system including the entire range from low-level control such as attitude control
to high-level control such as path planning. Such a scheme will be known as an
intelligent control system in the rest of the paper.
Nowadays, there are different types of Intelligent Flight Control Systems
(IFCSs) in the literature that have been designed by employing neural net-
works, fuzzy systems [7], behavior tree [8, 9], reinforcement learning [10], dif-
ferent data-driven approaches [11, 12], evolutionary algorithms [13, 14], etc.
Such a widespread and scattered use of IFCSs in the literature necessitates a
comprehensive survey, which can clearly demonstrate the evolution of IFCSs in
both theoretical and practical aspects in recent years. As one of the first reviews in this field, the authors of [15] addressed various technical and practical aspects of IFCSs, discussing different approaches including fuzzy inference systems, neural networks, genetic algorithms, swarm intelligence, and hybrid evolutionary systems. However, due to the breadth of the
subject, it could not provide an in-depth theoretical view of IFCSs. To deal
with such an issue, in the current survey, we will mainly focus on a specific
type of IFCSs, namely the Neural Network (NN)-based flight control systems
as the most commonly used approach in the literature within recent years. NNs
have been satisfactorily employed in both the dynamic model identification and
controller design process. Due to their universal approximation property, they
can be employed to estimate different nonlinearities in dynamic systems. In
addition, unlike the basic fuzzy control schemes, which highly depend on expert
knowledge and pure experiments to construct the fuzzy rule base [16], NNs can
effectively learn the system or controller dynamics with no prior information
about the system. They can also be integrated with different learning-based
methodologies and have been successfully utilized in both direct and indirect
control structures, which are discussed in the following. Further, due to their in-
herent property of parallel processing, neural networks can be suitably employed
in real-time implementations [17, 18].
Historically, the concept of IFCS was introduced in the 1990s by adopting NNs in the structure of flight control systems as a learning element to adapt to unexpected fault and flight conditions [19]. However, although the use of NNs in flight control systems dates back to the early 1990s, due to both technological and methodological limitations, dynamic NNs were not employed in practical flight control problems until 2001 [20]. As the largest project in this field, the IFCS program was conducted as a collaboration between NASA and Boeing from 1999 to 2009 [21, 18, 22]. This program consisted of two main phases. The first phase of the program focused on the development of an indirect adaptive flight control system. The first set of flights using a highly modified F-15B prototype occurred in 1999, where the stability and control derivatives of the air vehicle were estimated using pre-trained NNs. Subsequently, dynamic cell structure NNs were adopted in the control scheme for online modification of the estimated derivatives. The second set of flights, using such an online identification scheme, was performed in 2003 [23].
Although the obtained results in the closed-loop simulation were reported as a promising achievement, the online identified model was not utilized in the control structure in real flights, and so the control scheme was not yet truly adaptive [24]. On the other hand, the second phase of the program
dealt with developing a direct adaptive control architecture in which a dynamic
inversion block was augmented by an online NN [25]. Flight tests of the sec-
ond phase began in 2006 and continued into 2008. Flight tests consisted of
performance evaluation, with and without dynamic NN augmentation, in the
presence of structural damages and control surface faults. The evaluation was
based on performance measurements and pilot ratings. As reported in [26, 27],
for structural damages, NN augmentation was generally found to provide sig-
nificant improvements in overall pitch performance. However, control surface
faults led to mixed results from slight improvements in pitch rate response to a
propensity for roll pilot-induced oscillation. A modification was also introduced
in the designed control scheme employed in the second phase of the program.
The utilized modifications included the use of alternate NN inputs in the designed framework, which can satisfactorily tackle the high-correlation and high-gain
problems in the basic design, the adoption of a weight decay term (in updating
the NN weights) to avoid the overfitting problem, and using scalar dead-zones
in adaptation laws for simplicity. The results obtained by the modified control
scheme in 2009 indicated a significant improvement over the basic design [27].
With the retirement of the F-15B air vehicle in January 2009, the IFCS program
was finally finished [22].
The lesson learned from the IFCS program demonstrated that the high com-
plexity of the control design, as well as the unpredictable behavior of the control
scheme in the presence of unexpected flight and fault conditions, could be serious
concerns in utilizing adaptive control approaches in real applications, particu-
larly manned aircraft [28, 29]. Accordingly, another program was launched by NASA in 2009, namely the Integrated Resilient Aircraft Control (IRAC) project, where one of its main objectives was to investigate simple, yet effective, adaptive control methods to address the issue of verification and validation of adaptive flight controls to a safety-critical level. Addressing this project in more
detail is beyond the scope of this paper. Motivated by the above discussion, in
this survey, we will address both indirect and direct NN-based adaptive control
schemes and their evolution towards more reliable approaches with less compu-
tational complexity, in detail. However, as will be discussed in the following,
although the indirect and direct adaptive control approaches arise from two
different points of view, they can be formulated within a similar mathemati-
cal framework with the same updating rules for dynamic NNs in the case of
model-based approaches as in the IFCS project of NASA. Accordingly, we will
deal with IFCSs in a different primary categorization, i.e. the model-based and
model-free NN-based control methods. In addition, in the current research, we
will address a variety of model-free flight control systems that have been built
upon some other machine learning approaches such as Reinforcement Learning
(RL).
1.2. Direct versus indirect adaptive control
There are a variety of flight control systems in the literature in which neural
networks have been utilized to solve an online optimization problem within the
control block [30, 31] or to mimic the behavior of a classical controller [32]. In
this paper, we will focus on control methods where neural networks have been
directly adopted in the control design procedure as intelligent elements to bring
a degree of intelligence into the closed-loop behavior. This can be performed in
different manners: Various studies have attempted to employ NNs to estimate
model uncertainties, which are subsequently utilized in designing the control
command. Such a framework is known as indirect neural control. The training
process of NNs can be performed online using well-known learning algorithms
such as feedback error learning or offline to provide a pre-trained dynamic model
of the system. On the other hand, in the direct neural controller, NNs have been
utilized to directly construct the control command.
To be more specific, in the case of model-based control approaches, if the system dynamics can be formulated as $\dot{x} = f(x) + g(x)u$, where $f(x)$ and $g(x)$ denote unknown nonlinear functions of the system states, we have two general choices to design the control command $u$. In the first approach, we can estimate both the unknown functions $f$ and $g$ using NNs and then employ them in the designed control command. In the second approach, however, we attempt to directly design the control command by estimating $g^{-1}(\dot{x}_d - f)$, which is required in the control command, using a NN ($x_d$ represents the reference trajectory). In the literature, the first method is known as indirect adaptive control, while the second approach corresponds to a direct adaptive control scheme [33, 34]. However, as will be discussed in Section 2, concerning the mathematical formulation and the structure of the updating rules of NNs, there is no fundamental difference between these two model-based control approaches.
On the other hand, in the case of model-free adaptive control methods, it is
not easy to provide a general view of different types of indirect and direct adap-
tive control schemes employed in the literature. As one of the most commonly
used model-free indirect IFCSs, (different types of) NNs are used to identify
the entire system dynamics, and subsequently an adaptive control scheme is
designed based on the online identified model. In addition, the direct model-
free control could be originated from various design methodologies, while in the
current survey, two more common schemes, i.e. the adaptive optimal and the
RL-based control methods will be discussed in detail.
Although due to the simpler structure and less computational complexity
[35], direct adaptive control has been widely employed in different applications,
there are various concerns regarding its applicability in serious missions. Briefly,
increasing the learning rate in direct adaptive control, known as aggressive learn-
ing, is a typical approach to rapid reduction of the dynamic inversion error [36].
In this regard, high-gain control due to aggressive learning in direct adaptive
control is a problematic issue which can lead to actuator saturation, the exci-
tation of unmodeled dynamics, and other well-known problems of high learning
rates [37]. Besides, in the case of a damaged aircraft, the system dynamics
can significantly change, and the lack of reliable knowledge about the current
system dynamics may result in inefficient control commands, particularly, when
the control system consists of a nominal controller augmented by an adaptive
NN-based control command [38].
1.3. Model-based versus Model-free control
In this section, we present a more detailed classification of NN-based flight
control systems to be used in the remainder of the paper. Neural networks have
been extensively employed in the structure of flight control systems for the past
three decades [39]. They can be generally studied in two fundamentally different
categories, i.e. the model-based and the model-free control approaches.
1.3.1. Model-based approach
The model-based neural control, which utilizes a nominal model of the sys-
tem in the control design process, has significantly evolved during the last two
decades. Feedback Error Learning (FEL) is the most popular learning scheme,
which has been widely incorporated in intelligent control systems. By employing
the tracking error of the system, the prediction error of the model, the output
of a baseline controller, or a combination of them, an unsupervised learning
approach is developed in such a way that both the tracking error of the system
and the estimation error of the neural networks remain bounded (unsupervised
learning occurs when the NN is trained to respond to a certain pattern in the
absence of output examples [40]). Several variations of FEL-based IFCSs have
been proposed by researchers in recent years. In this method, the neural net-
work attempts either to estimate the model uncertainties (or/and external dis-
turbances) or to determine the control command. The first approach leads to
an indirect control structure, while the second one results in a direct adaptive
control scheme. In addition, different types of feedforward or recurrent neural
networks including Radial Basis Function (RBF) neural networks, multilayer
perceptron, High-Order NNs (HONNs) [41], Extreme Learning Machine (ELM)
[42], Elman NN, etc. have been employed in FEL-based control methods. The FEL scheme, its characteristics, and its different variants within the flight control framework will be studied closely in Section 2.
One of the main drawbacks of FEL-based control systems is that all the
uncertain terms in the model are typically estimated as a single term using
NNs. This may result in poor training performance, particularly in the absence of Persistently Exciting (PE) signals. In addition, in most of the FEL-based neural
controllers, a baseline control, which is designed based on a nominal model of
the system, is employed where the NN acts as an aid to the controller. This
may cause large control actions under severe structural damages or dynamic
changes. Such concerns and other design considerations result in incorporating
several features in the basic FEL-based control methods, which will be addressed
in Section 3.
1.3.2. Model-free approach
On the other hand, the model-free scheme does not require any prior infor-
mation about the system dynamics to be used in the control design procedure.
As a traditional model-free approach, the entire controller was modeled by a
single NN [43, 44], where the error back-propagation technique was typically
utilized to train the NN. Although such a control scheme, in some cases, could
provide an acceptable response even under severe external disturbances [45], the
stability of the closed-loop system could not be mathematically analyzed [46].
In addition, such a training method may occasionally converge to local minima.
Thus, this control approach could not be safely utilized in important missions.
In recent years, a class of model-free intelligent control systems has been
proposed in the literature using the concept of Approximate Dynamic Program-
ming (ADP) and Reinforcement Learning (RL). Indeed, although the introduced
schemes have originated from different scientific points of view (from the control
theory to machine learning and information theory), the principal methodologies employed by them are fundamentally similar. More specifically, in many of
such control structures, an actor-critic framework is defined (where NNs may
be used to estimate both the actor and critic functions). In this regard, the
critic corresponds to the cost-to-go function (or the value function), while the
actor determines the control input applied to the system. With a focus on the
RL framework, the entire control design process is typically transformed to a
Markov Decision Process (MDP). Accordingly, the value function demonstrates
an accumulative discounted reward function obtained by the system (from the
present time) using the current policy. The control objective is then to endeavor
to maximize the value function by changing the policy function. This can be
performed using the conventional policy-gradient method. Owing to the ad-
vancement of the computational power of processors, different RL-based IFCSs
have been proposed in few recent years, where the control design process has
been completely performed in the simulation environment and subsequently, the
designed controller is satisfactorily applied to a real application. Such a control
methodology will be discussed in detail in Sections 4.2 and 4.3.
There is also a variety of model-free indirect IFCSs in the literature in which a
separate identification process has been defined in the control design procedure.
Different types of neural networks, including the nonlinear autoregressive with exogenous inputs (NARX) network [43], Elman networks (as recurrent NNs), convolutional NNs, wavelet NNs, ELMs [47], and fuzzy NNs [48], have been utilized
in the identification step to identify different unmodeled dynamics. The identi-
fied models can also be updated online in order to adapt to dynamic changes in
the system. Subsequently, the control system is designed based on the identified
NN. Although the analysis of the closed-loop stability in such a multi-step con-
trol design process may be more challenging, this can result in a more efficient
control system in comparison with FEL-based control methods, especially in
the presence of severe dynamic changes. We will address this type of indirect
IFCS in Section 4.1.
Finally, a set of concluding remarks and possible future directions for NN-based IFCSs will be provided in Section 5, which attempts to illustrate the main gaps that remain before IFCSs can be employed in serious missions as reliable, effective, and truly intelligent control schemes with acceptable computational cost.
2. Foundations of model-based intelligent control
In this section, we deal with model-based NN-based flight control systems.
The dynamic models of air vehicles can be categorized in different manners. As a general classification, an air vehicle can be modeled by a nonlinear affine or nonaffine dynamic model. Concerning the affine model, we will pay
more attention to two types of more popular dynamic models: dynamic mod-
els with dim(x) = dim(u) (Section 2.1) and the dynamic models in the strict
feedback form (Section 2.3). Most of the current approaches in the literature
to control the aerial vehicles attempt to transform the system dynamics into
one of the above-mentioned variants by defining intermediate control variables,
designing multi-loop control systems, etc, where such techniques will also be
briefly discussed. Notably, both continuous-time and discrete-time models will be covered. In addition, the control of nonaffine systems,
mainly using the pseudocontrol strategy or similar methods, will be discussed
in Section 2.2.
Besides, regarding the consideration of model uncertainties, internal faults,
and external disturbances, it should be noted that they can be modeled using
either additive or multiplicative uncertain terms. Although both schemes have
been utilized in the literature, the employment of additive uncertain terms is
more general than the multiplicative case [49, 50]. Indeed, multiplicative uncer-
tainties can also be modeled using additive terms, though the unknown terms
may become a function of both the system inputs and states [51]. Accord-
ingly, in the following, we mainly focus on the control of dynamic systems with
additive uncertain terms and will transform the possible multiplicative uncer-
tainties into additive terms. In addition, here, we will consider uncertain terms
in the dynamic model as a lumped disturbance, which will be estimated by a
NN. However, dealing with different types of uncertain dynamics such as model
uncertainties, atmospheric disturbances, and operational faults may necessitate
different learning strategies with their own requirements, which are addressed
in the following section. More specifically, combined approaches that utilize a
combination of NNs, disturbance observers, and/or state estimators to tackle
different uncertain terms in the system dynamics will be discussed in Sections
3.4 and 3.5. Further, as the consideration of the multiplicative representation of
uncertain terms can be more beneficial in the case of identifying unknown gains
corresponding to actuator faults, we will also address this type of model uncer-
tainties in the framework of NN-based Fault-Tolerant Control (FTC) systems
in Section 3.5.
2.1. Feedback Error Learning
Here, we will introduce the fundamental theoretical basis for the most com-
monly used approach to incorporate NNs within the adaptive control design
process, i.e. the FEL method, while the application of such an algorithm in
flight control systems will be addressed in the following subsections. FEL can
effectively integrate the control design procedure and the online updating law
for the parameters of the NN, which is utilized to compensate for model un-
certainties and disturbances. Accordingly, in a general view, the control block
can consist of a conventional controller in the inner loop to stabilize the system
dynamics, and the neural controller acts as an aid to the controller to compen-
sate for model uncertainties. Thus, employing a composite Lyapunov function
including both the tracking error and the estimation error of NN parameters,
the closed-loop system can satisfy the Bounded-Input–Bounded-Output (BIBO)
stability requirement in the presence of model uncertainties and external dis-
turbances. To be more precise, consider the dynamic model of an aircraft (in
the affine form) as follows:
$$\dot{x} = F(x) + B(x)u + \Delta, \qquad (1)$$

where $x, u \in \mathbb{R}^n$ and $\Delta$ stands for model uncertainties and external disturbances. Defining the desired trajectory as $x_d$, the tracking error is obtained as $e = x - x_d$. Now, the control command can be defined as

$$u = B^{-1}\left(-F(x) + \dot{x}_d - k_1 e\right), \qquad (2)$$
where $k_1$ is a positive-definite matrix. However, the vector $\Delta$ is unknown. Thus, it is approximated by a NN (such as an RBFNN or a multilayer perceptron) as $\hat{\Delta} = \hat{W}^T\mu(x)$, where $\mu$ represents the vector of basis functions (corresponding to hidden layers of the NN) and $W$ indicates the matrix of unknown weights, which should be identified. Such a formulation can be used to represent different feedforward and recurrent NNs. Here, we will use such a general formulation and, for brevity, do not address different possible network structures and their advantages and disadvantages in flight control systems (for more details, see [52]). Accordingly, due to the universal approximation property of NNs, we have:

$$\Delta = W^{*T}\mu(x) + \varepsilon, \qquad (3)$$

where $W^*$ denotes the (unknown) optimal weight and $\varepsilon$ indicates the bounded approximation error ($\|\varepsilon\| \le \varepsilon_M$). The control command can now be constructed as follows:

$$u = B^{-1}\left(-F(x) - \hat{\Delta}(x) + \dot{x}_d - k_1 e\right). \qquad (4)$$
Now, consider a Lyapunov function as

$$V = \frac{1}{2}e^T e + \frac{1}{2}\,\mathrm{tr}\left(\tilde{W}^T\Gamma^{-1}\tilde{W}\right), \qquad (5)$$

where $\tilde{W} = \hat{W} - W^*$ and $\Gamma$ is a positive-definite matrix. Next, we have
$$\dot{V} = e^T\dot{e} + \mathrm{tr}\left(\tilde{W}^T\Gamma^{-1}\dot{\tilde{W}}\right) = e^T\left(-\tilde{\Delta} - k_1 e\right) + \mathrm{tr}\left(\tilde{W}^T\Gamma^{-1}\dot{\tilde{W}}\right), \qquad (6)$$

where $\tilde{\Delta} = \hat{\Delta} - \Delta$. Accordingly, defining

$$\dot{\hat{W}} = \Gamma\mu e^T, \qquad (7)$$

and considering $\dot{\tilde{W}} \equiv \dot{\hat{W}}$ (as a consequence of assuming a constant optimal weight $W^*$, while such an assumption is reasonable even in the case of a time-dependent uncertain term $\Delta = W^{*T}(t)\mu(x) + \varepsilon$ with $\dot{W}^* \ll \dot{\hat{W}}$), we have
$$\begin{aligned}
\dot{V} &= e^T\left(-\tilde{\Delta} - k_1 e\right) + \mathrm{tr}\left(\tilde{W}^T\Gamma^{-1}\dot{\tilde{W}}\right) \\
&= e^T\left(-\tilde{\Delta} - k_1 e\right) + \mathrm{tr}\left(\tilde{W}^T\mu e^T\right) \\
&= e^T\left(-\tilde{\Delta} - k_1 e\right) + e^T\tilde{W}^T\mu \\
&= -k_1 e^T e + e^T\varepsilon = -e^T\left(k_1 e - \varepsilon\right),
\end{aligned} \qquad (8)$$

which leads to $\dot{V} < 0$ for $\|k_1 e\| > \|\varepsilon\|$, thereby guaranteeing a bounded tracking error. The determination of optimal design parameters such as $k_1$ and $\Gamma$ is not an easy task, and it is typically done by trial and error. It is also possible to define an optimization problem in terms of these parameters and solve it using well-known optimization methods (such as evolutionary algorithms) to determine their optimal values according to predefined criteria [53].
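To fix ideas, the following is a minimal simulation sketch of the continuous-time FEL loop (1)–(8) on a scalar toy plant, using an RBF network and forward-Euler integration. The plant, the basis functions, and all gains here are illustrative assumptions made for this sketch, not a validated flight-control design.

```python
# Sketch of the FEL scheme (1)-(8) on an assumed scalar plant; everything
# numerical here (plant, basis, gains) is an illustrative assumption.
import numpy as np

def rbf(x, centers, width=1.0):
    """Gaussian basis vector mu(x) for a scalar state."""
    return np.exp(-((x - centers) ** 2) / (2.0 * width ** 2))

F = lambda x: -x                              # assumed nominal dynamics F(x)
B = 1.0                                       # assumed control gain B
Delta = lambda x: 0.5 * np.sin(2.0 * x)       # "true" uncertainty, unknown to the controller

centers = np.linspace(-2.0, 2.0, 9)           # RBF centers (design choice)
W_hat = np.zeros_like(centers)                # adaptive output weights W_hat
k1, Gamma, dt = 2.0, 5.0, 1e-3                # gains tuned by trial and error

x = 0.0
for i in range(int(10.0 / dt)):
    t = i * dt
    xd, xd_dot = np.sin(t), np.cos(t)         # reference trajectory x_d
    e = x - xd                                # tracking error
    mu = rbf(x, centers)
    Delta_hat = W_hat @ mu                    # NN estimate of the uncertainty
    # Control command (4)
    u = (1.0 / B) * (-F(x) - Delta_hat + xd_dot - k1 * e)
    # FEL updating rule (7): W_hat_dot = Gamma * mu * e (e is scalar here)
    W_hat += dt * Gamma * mu * e
    # Plant integration (forward Euler)
    x += dt * (F(x) + B * u + Delta(x))

print(f"tracking error at t=10: {x - np.sin(10.0):.4f}")
```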
In cases where the matrix $B$ is also uncertain, we have:

$$\dot{x} = F(x) + (B + \Delta B)u + \Delta, \qquad (9)$$

where $B$ represents the nominal part. Thus, it is possible to define

$$\bar{\Delta} := \Delta B\,u + \Delta \qquad (10)$$

and estimate it using an NN as $\hat{\bar{\Delta}}(x, u) = \hat{W}^T\mu(x, u)$. Consequently, the control command can be calculated as follows:

$$u = B^{-1}\left(-F(x) - \hat{\bar{\Delta}}(x, u) + \dot{x}_d - k_1 e\right). \qquad (11)$$
As seen, the control command results in an equation of the form $u = h(\cdot, u)$. The existence and uniqueness of a solution for $u$ require a contraction assumption [54]. Sufficient conditions for satisfying this assumption are given in [55]. Notably, this assumption implicitly requires the sign of the control gain function to be known [56]. Note that it is also possible to update the weights of the hidden layer to provide more effective learning. This can be performed using a similar Lyapunov stability analysis by taking advantage of the Taylor expansion of the hidden-layer output ($\mu(x)$) [57]. However, due to the more complicated formulation and excessive computational burden, in this paper, we will only update the output weights, $\hat{W}$, and the other parts of the NN remain unchanged.

Besides, one can replace the tracking error $e$ in (5) and (7) by a filtered tracking error $s = e + \lambda\int e\,dt$ (with $\lambda$ a positive constant or a positive-definite matrix) to compensate for the steady-state tracking error [58]. Further, the introduced FEL neural control scheme can be applied to a second-order system, i.e. $\ddot{x} = F(x) + B(x)u + \Delta$, by substituting $e$ with a filtered tracking error $s = \dot{e} + \lambda e$ [59, 60].
On the other hand, as the designed controller should be programmed on a digital processor in real applications, the development of the control system in the discrete-time domain makes more sense. Using a discrete-time controller, the dependence of the closed-loop performance on the sampling rate can also be eliminated. This is all the more relevant for NN-based control systems, in which the differential equations for updating the NN weights change to difference equations. Furthermore, in the case of discrete-time controllers, the NN weight updating rate that guarantees the convergence of the training rule can be computed analytically [61]. To illustrate the fundamental structure of a discrete-time FEL scheme, consider the equivalent discrete-time model of (1) as follows:

$$x(k+1) = F_d(x(k)) + B_d(x(k))u(k) + \Delta_d(k). \qquad (12)$$

Defining

$$u(k) = B_d(k)^{-1}\left(-F_d(k) - \hat{W}^T\mu(k) + c\,e(k) + x_d(k+1)\right), \qquad (13)$$

$$\Delta_d(k) = W^{*T}\mu(x(k)) + \varepsilon, \qquad (14)$$

$$e(k) = x(k) - x_d(k), \qquad (15)$$
where $0 < c < 1$, leads to the following equation:

$$e(k+1) = c\,e(k) - \tilde{W}^T\mu(k) + \varepsilon. \qquad (16)$$

By multiplying both sides of (16) by $e^T(k+1)$, we have

$$e^T(k+1)\tilde{W}^T\mu(k) = c\,e^T(k+1)e(k) - e^T(k+1)e(k+1) + e^T(k+1)\varepsilon. \qquad (17)$$

Using the Cauchy–Schwarz and Young's inequalities, it is obtained that

$$e^T(k+1)\tilde{W}^T\mu(k) \le \|e(k+1)\|^2\left(-1 + \rho_1 + \rho_2\right) + \frac{c^2}{4\rho_1}\|e(k)\|^2 + \frac{1}{4\rho_2}\varepsilon_M^2, \qquad (18)$$
with $\rho_1, \rho_2 > 0$. Thus, if a Lyapunov function is defined as in (5) (without the coefficient $1/2$), then using the following updating rule,

$$\hat{W}(k+1) = \hat{W}(k) + \Gamma\mu(k)e^T(k+1), \qquad (19)$$

the first difference of $V(k)$ is obtained as follows:

$$\begin{aligned}
\Delta V(k) &= V(k+1) - V(k) = \|e(k+1)\|^2 - \|e(k)\|^2 \\
&\quad + \mathrm{tr}\left(\tilde{W}^T(k+1)\Gamma^{-1}\tilde{W}(k+1) - \tilde{W}^T(k)\Gamma^{-1}\tilde{W}(k)\right) \\
&\le \|e(k+1)\|^2\left(-1 + 2(\rho_1 + \rho_2) + \|\mu\|_\Gamma^2\right) + \|e(k)\|^2\left(\frac{c^2}{4\rho_1} - 1\right) + \frac{1}{4\rho_2}\varepsilon_M^2.
\end{aligned} \qquad (20)$$

Accordingly, we have

$$\Delta V(k) \le -k_1\|e(k+1)\|^2 - k_2\|e(k)\|^2 + c_1, \qquad (21)$$
where

$$k_1 = 1 - 2\rho_1 - 2\rho_2 - \|\mu\|_\Gamma^2, \qquad (22)$$

$$k_2 = 1 - \frac{c^2}{4\rho_1}, \qquad (23)$$

$$c_1 = \frac{1}{4\rho_2}\varepsilon_M^2. \qquad (24)$$

Thus, assuming the boundedness of $\mu(x)$, it is possible to determine $\rho_1$, $\rho_2$, $c$, and $\Gamma$ such that $k_1, k_2 > 0$, thereby guaranteeing $\Delta V(k) < 0$ for $\|e(k+1)\|^2 > c_1/k_1$. As seen, although the updating rule (19) is similar to that of continuous-time systems, i.e. (7), the stability analysis of the discrete-time FEL neural control is quite different and more complex compared to that of continuous-time systems. Consequently, in the following, we mainly focus on the continuous-time formulation of control systems, while the discrete-time equivalents can be obtained in a similar manner as discussed above.
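The discrete-time recursion (12)–(19) can be sketched in the same spirit; the toy plant and all numerical values below are assumptions chosen only to make the update rule concrete.

```python
# Sketch of the discrete-time FEL scheme (12)-(19); plant and gains assumed.
import numpy as np

def rbf(x, centers, width=1.0):
    return np.exp(-((x - centers) ** 2) / (2.0 * width ** 2))

Fd = lambda x: 0.9 * x                     # assumed nominal discrete dynamics
Bd = 1.0
Delta_d = lambda x: 0.2 * np.sin(x)        # "true" lumped uncertainty, unknown
c, Gamma = 0.5, 0.05                       # 0 < c < 1; small Gamma keeps k1 > 0

centers = np.linspace(-2.0, 2.0, 7)
W_hat = np.zeros_like(centers)
x = 0.0
for k in range(5000):
    xd, xd_next = np.sin(0.01 * k), np.sin(0.01 * (k + 1))
    e = x - xd
    mu = rbf(x, centers)
    # Control law (13)
    u = (1.0 / Bd) * (-Fd(x) - W_hat @ mu + c * e + xd_next)
    x_next = Fd(x) + Bd * u + Delta_d(x)   # plant step (12)
    e_next = x_next - xd_next
    # Updating rule (19): uses the one-step-ahead error e(k+1)
    W_hat = W_hat + Gamma * mu * e_next
    x = x_next

print(f"terminal tracking error: {e_next:.4f}")
```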
It should be noted that, in the introduced (continuous/discrete) adaptive control scheme, the convergence of the NN weights to their ideal values is not trivial, and it requires Persistent Excitation (PE) [18]. More precisely, in the absence of persistently exciting input signals, the NN weight estimates might drift to very large values, which results in a variant of high-gain control [62, 63]. Different approaches have been proposed in the literature to prevent parameter drift in such conditions. Some of the more common methods are briefly introduced in the following (a code sketch of these modifications is given after the list).
1. Dead-zone: In this straightforward method, the previously mentioned updating rule is only used when the tracking error exceeds a predefined threshold [64]. Otherwise, the NN weights remain constant. Although such a method can successfully prevent parameter drift, as discussed in [65, 66], the determination of an appropriate threshold requires the bounds of the control gain function and the NN estimation error ($\varepsilon_M$), which may not be generally known.

2. Projection: The second simple method is to limit the NN weights to a predefined interval. It means that the time derivative of the parameters is set to zero when they reach the given bounds [67]. The main drawback of this method is the requirement of the lower and upper bounds of the NN parameters.

3. Sigma-modification: The third method, which was introduced by Ioannou and Kokotovic [68, 69], is a more useful approach. In this method, a modification term is incorporated in the updating rule of the NN parameters as $\dot{\hat{W}} = \Gamma\left(\mu e^T - \sigma\hat{W}\right)$, where $\sigma$ is a positive constant [70]. Such an approach has been employed in many NN-based flight control systems such as [71, 63, 72, 73].

4. e-modification: Another popular approach was introduced in [74], where the constant parameter $\sigma$ in the previous technique is replaced by a term proportional to $\|e\|$ [62, 75]. The boundedness of the NN parameters using the e-modification has been shown in [76]. Further, as a major advantage of the e-modification over the $\sigma$-modification, the modification term is effectively attenuated as the tracking error approaches zero, so that (in the absence of the estimation error $\varepsilon$) this method does not affect the convergence of the NN weights to their ideal values in the presence of persistently exciting training signals.

5. Alternate weights: This approach was first proposed in [77]. The e-modification method may not achieve acceptable performance in the presence of large oscillatory disturbances [78]. The basic idea of this method is that different sets of NN weights are capable of uniformly approximating the same nonlinear function. An alternate set of weights with a smaller magnitude than $\hat{W}$ can be used to improve the training. By keeping the NN weights close to the smaller alternate weights, it is possible to provide a more efficient compromise between the approximation performance and keeping the NN weights bounded, while there is a need for two distinct sets of NN weights and their corresponding updating rules. This method has been employed in [79] to design a flight control system for a quadrotor air vehicle under wind buffeting.
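The first four modifications above can be viewed as small variations of one update function. The sketch below assumes a scalar tracking error and a weight vector, with illustrative thresholds, bounds, and gains.

```python
# Sketches of the anti-drift modifications; thresholds/bounds/gains assumed.
import numpy as np

def dead_zone_update(W_hat, mu, e, Gamma, e_thresh=0.05):
    """Dead-zone: freeze adaptation while |e| is below the threshold."""
    if abs(e) <= e_thresh:
        return np.zeros_like(W_hat)
    return Gamma * mu * e

def projection_update(W_hat, mu, e, Gamma, W_max=10.0):
    """Projection: zero the derivative of weights pushing past [-W_max, W_max]."""
    dW = Gamma * mu * e
    blocked = (np.abs(W_hat) >= W_max) & (np.sign(dW) == np.sign(W_hat))
    dW[blocked] = 0.0
    return dW

def sigma_mod_update(W_hat, mu, e, Gamma, sigma=0.01):
    """Sigma-modification: constant leakage term -sigma * W_hat."""
    return Gamma * (mu * e - sigma * W_hat)

def e_mod_update(W_hat, mu, e, Gamma, kappa=0.01):
    """e-modification: leakage proportional to |e|, vanishing with the error."""
    return Gamma * (mu * e - kappa * abs(e) * W_hat)
```

In each case, the returned derivative would replace the basic term $\Gamma\mu e$ in the integration step of the earlier sketch.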
Although the aforementioned approaches satisfactorily result in bounded NN parameters, they do not ensure the convergence of the NN weights to their ideal values. Recently, a variety of modified learning approaches have been proposed in the literature to improve the training of the NN parameters. Some of the more attractive methods are as follows (a sketch of the concurrent-learning variant is given after the list):
1. Composite learning: Different composite learning approaches have been introduced in the literature; their fundamental idea is to include the estimation performance in the updating law [80, 81, 82]. This can lead to a faster learning speed as well as higher precision [83]. More specifically, a state estimate can be constructed as $\dot{\hat{x}} = F(x) + B(x)u + \hat{W}^T\mu(x) - \beta\tilde{x}$, where $\tilde{x} = \hat{x} - x$ and $\beta$ is a positive constant. Thus, the updating rule (7) can be modified as $\dot{\hat{W}} = \Gamma\mu\left(e^T - \Gamma_1\tilde{x}^T\right)$, where $\Gamma_1$ is a positive-definite matrix [83]. An improved learning method has been presented in [34], where the basic updating rule (7) is augmented by a novel prediction error signal constructed using online recorded data within a time interval $[t-\tau, t]$, which is equal to $\tilde{W}^T\int_{t-\tau}^{t}\mu(x)\,dt + \int_{t-\tau}^{t}\varepsilon\,dt$. As shown in [34], the proposed approach, which has been applied to the longitudinal model of a hypersonic aircraft, can lead to better tracking with less chattering.

2. Concurrent learning: A beneficial approach, introduced in [84, 56], utilizes a set of recorded data points concurrently with instantaneous data to improve the convergence of both the parameter and tracking errors. The main benefit of the concurrent learning method is that PE or high adaptation gains are not required. More precisely, in the case of nonlinear systems with parametric uncertainty (which can be modeled as $\Delta(x) = W^T\mu(x)$), it has been proved that, if the training input signal is exciting over a sufficiently large finite time interval, both the tracking error ($e$) and the NN weight estimation error ($\tilde{W}$) converge exponentially to zero. However, this approach requires a precise estimate of the time derivative of the system states, which may be impractical in some cases.

3. Reinforced learning: Another approach to improve the training performance is to reinforce the learning signal [85, 86]. This can be done by modifying the training rule (7) using the output of another NN (commonly known as the critic network) as $\dot{\hat{W}} = \Gamma\mu\left(e + \|e\|\hat{W}_c^T\mu_c\right)^T$, where $\hat{W}_c$ represents the output weights of the critic network, which is tuned in such a way that guarantees closed-loop stability [87]. The provided learning signal is more informative than in the basic training rule (7), thereby strengthening the control performance [88].
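As an illustration of the concurrent-learning idea in item 2, the sketch below replays a memory of recorded $(\mu_j, \Delta_j)$ pairs alongside the instantaneous FEL term. The memory-selection rule, the way each $\Delta_j$ is obtained (it needs an estimate of $\dot{x}$, which is the practical bottleneck noted above), and the gains are all simplifying assumptions.

```python
# Rough sketch of a concurrent-learning weight update; details assumed.
import numpy as np

class ConcurrentLearner:
    def __init__(self, n_basis, Gamma=1.0, max_memory=20):
        self.W_hat = np.zeros(n_basis)
        self.Gamma = Gamma
        self.memory = []                  # recorded (mu_j, Delta_j) pairs
        self.max_memory = max_memory

    def record(self, mu_j, Delta_j):
        """Store a data point; Delta_j must be reconstructed from an
        estimate of x_dot, as discussed in the text."""
        if len(self.memory) < self.max_memory:
            self.memory.append((mu_j, Delta_j))

    def derivative(self, mu, e):
        """Instantaneous FEL term (7) plus the replay of recorded points."""
        dW = self.Gamma * mu * e
        for mu_j, Delta_j in self.memory:
            # drive the stored prediction error Delta_j - W_hat^T mu_j to zero
            dW += self.Gamma * mu_j * (Delta_j - self.W_hat @ mu_j)
        return dW

    def step(self, mu, e, dt):
        self.W_hat += dt * self.derivative(mu, e)
```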
The above-mentioned formulation corresponds to indirect FEL-based control, where a NN attempts to identify model uncertainties and the control command is then constructed using the estimated uncertainty [89]. FEL can also be satisfactorily employed in the framework of direct adaptive control systems. In this regard, it is possible to directly estimate the entire control command $u$, or the uncertain term $B^{-1}\Delta$ in (2), by a NN. As a result, the updating rule of $\hat{W}$ will include the control gain matrix $B$. However, there is no fundamental difference between the direct and indirect approaches regarding the formulation of the updating rules and the stability analysis of the closed-loop system. Direct methods may also be preferred in cases where the control gain function is entirely unknown (see Section 2.3.4).
2.2. Pseudocontrol strategy
In a somewhat similar manner to the introduced direct FEL-based neural control, in the case of nonaffine models, the output of a baseline controller (such as a PID controller) may traditionally be used to train a NN, which augments the output of the baseline controller to learn the inverse dynamics of the system, while there is considerable complexity in the closed-loop stability analysis [90]. Accordingly, an auto-landing scheme has been proposed in [91] for an aircraft under external disturbances using a FEL-based, neural-aided $H_\infty$ control. Similarly, a combination of a classical trajectory tracking control (using the loop-shaping technique) with a FEL-based neural controller has been employed in [92] as a fault-tolerant auto-landing control method. Such a method has also
been adopted in [93] to control the attitude of a simplified model of a fighter
aircraft using fully-tuned growing RBFNNs.
By incorporating a similar framework, type-2 Fuzzy Neural Networks (T2-FNNs) have been employed in [94, 95] to augment a classical PD controller in the case of a set of SISO systems. The FEL algorithm has been adopted, where the updating rule corresponding to the consequent part of the T2-FNN has been derived by minimizing $\int(\dot{e} + \lambda e)^2\,dt$. Although in [94] it has been assumed that the intended system has a second-order stabilizable dynamic model, the stability of the closed-loop system has been analyzed without making any assumption on the system characteristics (even the system's stabilizability!). Apparently, this is a consequence of estimating an explicit function of time by a neural network of the form $W^T\mu(x)$, which is not generally feasible. This is a common issue in NN-based identification schemes (see Section 3.4). The proposed control system has been applied to the trajectory tracking control of a quadrotor UAV. A self-organizing neuro-fuzzy-based control has been introduced in [96], in which the consequent part of the fuzzy rules has been trained using a similar FEL scheme, and the designed controller has been applied to a hexacopter and a flapping-wing Micro Aerial Vehicle (MAV) to control the altitude and the attitude of the system. It has been claimed that the controller's performance does not depend on any features of the system. This is clearly an exaggerated statement, since the most obvious feature required of a controlled system is the system's controllability. Again, it seems that the generality of the stability analysis is due to the aforementioned concern regarding NN-based identification schemes. Different from [96], in [97], the stability analysis has been provided for an $n$th-order, affine, SISO model. A FEL scheme has been used to train the consequent parameters of a neuro-fuzzy control system, which augments a PID controller, while the updating rule is subject to the parameter drift issue.
On the other hand, a simpler and popular approach to the FEL-based direct adaptive control of nonaffine systems, known as the pseudocontrol strategy, has been widely employed in IFCSs. Generally speaking, in this approach, the control command is determined using a model inversion block, where a neural network is utilized to cancel out the inversion error [98]. To be more precise, consider a generic nonaffine nonlinear model of the system as follows:

$$\dot{x} = F(x, u). \qquad (25)$$
As seen, unlike the previous subsection, here there is no need for an affine model of the controlled system. Despite the possible complexities in the control of nonaffine systems, the following design would be more effective in the case of nonconventional air vehicles with highly nonlinear dynamics, which cannot be modeled satisfactorily in an affine form. In particular, such a method could be an optimal choice in the case of a Hypersonic Flight Vehicle (HFV), which possesses a completely nonaffine model [99]. Indeed, although, using some simplifications, HFVs are typically modeled by an approximate affine model (with the remaining nonlinear terms treated as model uncertainty) to facilitate the control design, such an approach results in a conservative control system. It should be noted that, in the case of a flight control problem, the dynamic model of the system is typically formulated as $\ddot{x} = F(\dot{x}, x, u)$. However, the following design can be applied to such a control problem as well, using a simple change of variables and employing a composite error function consisting of both $e$ and $\dot{e}$. Now, assuming the availability of an approximate inversion model, the control command can be computed as follows:

$$u = \hat{F}^{-1}(x, \nu), \qquad (26)$$

where $\nu$ denotes the pseudocontrol input, which should be designed. Notice that, although there is no need for an accurate inversion model, the chosen inversion model should capture the control assignment structure. It means that, for example, the inversion model should include the fact that the elevator deflection affects the pitch rate. In addition, it is assumed that $\hat{F}^{-1}(x, \nu)$ is a one-to-one function. This assumption can be realized if $\dim(u) = \dim(x)$ [100], which is reasonable in a typical flight control problem. Accordingly, the pseudocontrol input $\nu$ can be designed as follows [101]:

$$\nu = \dot{x}_d - ke - \nu_{ad}, \qquad (27)$$
where $k$ is a positive constant and $\nu_{ad}$ denotes an additional command to alleviate the inversion error. More precisely, defining $\Delta(x, u) = F(x, u) - \hat{F}(x, u)$, we have:

$$\dot{e} = -ke - \nu_{ad} + \Delta(x, u). \qquad (28)$$

Thus, if it were possible to set $\nu_{ad} = \Delta(x, u)$, the tracking error would converge asymptotically to zero. However, $\Delta(x, u)$ is unknown. So, we estimate it using the feedback error learning scheme. In this regard, using a NN to identify $\Delta(x, u)$, we have $\Delta(x, u) = W^{*T}\mu(x, u) + \varepsilon$. Subsequently, $\nu_{ad}$ can be determined as $\nu_{ad} = \hat{W}^T\mu(x, u)$. Introducing a Lyapunov function $V$ as

$$V = \frac{1}{2}e^T e + \frac{1}{2}\,\mathrm{tr}\left(\tilde{W}^T\Gamma^{-1}\tilde{W}\right), \qquad (29)$$

and using the updating rule $\dot{\tilde{W}} \equiv \dot{\hat{W}} = \Gamma\mu(x, u)e^T$, the time derivative of $V$ is obtained as the following equation:

$$\dot{V} = e^T\left(-ke - \tilde{W}^T\mu(x, u) + \varepsilon\right) + \mathrm{tr}\left(\tilde{W}^T\Gamma^{-1}\dot{\tilde{W}}\right) = -e^T\left(ke - \varepsilon\right). \qquad (30)$$

Accordingly, the introduced control strategy can satisfactorily ensure a bounded tracking error. Again, one of the above-mentioned modification techniques can be adopted in the proposed updating rule to prevent parameter drift. Notice that the proposed control framework results in a control law of the form $u = \hat{F}^{-1}\left(x, \dot{x}_d - ke - \hat{W}^T\mu(x, u)\right)$, thereby requiring the contraction assumption.
A modification to the introduced strategy has been given in [99] by taking advantage of the mean value theorem to relax this assumption, although the sign of $\partial F/\partial u$ should be known, and there are some concerns with the provided stability analysis. Besides, the authors in [99] have employed the pseudocontrol approach in the case of a SISO system in the normal feedback form with $\dim(x) > \dim(u)$, while all the system states are required to be available in the proposed control scheme. To this end, if we have $z_1 = e = x_1 - x_{1d}$, $z_2 = \dot{z}_1$, and $\ddot{x}_1 = f(x, u)$, one can define a filtered tracking error $s = \dot{e} + \lambda e$, which results in $\dot{s} = f(x, u) - \ddot{x}_{1d} + \lambda\dot{e}$. Thus, by replacing the real tracking error $e$ with the filtered tracking error $s$ in the introduced method, it is possible to design a similar pseudocontrol framework.
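The pseudocontrol loop (25)–(30) can be sketched for a scalar nonaffine toy plant as follows. The plant, the approximate inversion $\hat{F}^{-1}$, and the gains are assumptions made for illustration; note that $\mu$ is evaluated at the previous control value, a simple way to sidestep the fixed-point equation $u = h(\cdot, u)$ discussed above.

```python
# Sketch of the pseudocontrol strategy (25)-(30); plant and gains assumed.
import numpy as np

def rbf2(z, centers, width=1.0):
    """Gaussian basis over the (x, u) pair."""
    return np.exp(-np.sum((z - centers) ** 2, axis=1) / (2.0 * width ** 2))

F_true = lambda x, u: -x + u + 0.3 * np.tanh(u)   # "true" nonaffine plant
F_hat_inv = lambda x, nu: nu + x                  # inversion of F_hat(x,u) = -x + u

k, Gamma, dt = 2.0, 5.0, 1e-3
rng = np.random.default_rng(0)
centers = rng.uniform(-2.0, 2.0, size=(16, 2))    # basis centers over (x, u)
W_hat = np.zeros(16)

x, u = 0.0, 0.0
for i in range(int(10.0 / dt)):
    t = i * dt
    xd, xd_dot = np.sin(t), np.cos(t)
    e = x - xd
    mu = rbf2(np.array([x, u]), centers)          # uses the previous u
    nu_ad = W_hat @ mu                            # adaptive element nu_ad
    nu = xd_dot - k * e - nu_ad                   # pseudocontrol (27)
    u = F_hat_inv(x, nu)                          # approximate inversion (26)
    W_hat += dt * Gamma * mu * e                  # FEL update, as in (30)
    x += dt * F_true(x, u)                        # plant step

print(f"tracking error at t=10: {x - np.sin(10.0):.4f}")
```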
The pseudocontrol strategy has been employed in different flight control systems [102], such as the attitude control of a tailless fighter aircraft [103, 104, 105, 106], the trajectory tracking control of a helicopter [101], the attitude control of a tilt-rotor aircraft [75], etc. A similar direct adaptive control has been utilized in [107, 38] to control the trajectory of a conventional fixed-wing aircraft under structural damages. A hybrid direct-indirect adaptive control has also been developed therein, in which parallel FEL algorithms attempt to provide both the control augmentation signal and the estimated uncertain dynamics. In [100], an inner-loop attitude control block based on the pseudocontrol strategy has been employed within a fault-tolerant guidance and control system for a conventional fixed-wing air vehicle. An (outer-loop) acceleration guidance loop has been designed, which attempts to provide feasible acceleration commands in the presence of structural damages and actuator faults. The performance of the proposed approach was verified in the presence of severe structural and actuator damages.
As an alternative to the above-mentioned approach to controlling nonaffine systems, the authors in [108] attempted to directly estimate the desired control command, rather than the inversion model error, using a NN (in the case of a SISO system with stable zero dynamics). Under conservative assumptions on the value of $\partial F/\partial u$ (and its time derivative) and employing the implicit function theorem, one can assume that there is an ideal control command $u^*$ which ensures closed-loop stability, i.e. $F(x, u^*) = \dot{x}_d - ke$. Subsequently, in lieu of utilizing an approximate inversion model, the mean value theorem has been adopted to provide an expression for $F(x, u)$ in terms of $F(x, u^*)$. Using such a formulation, a NN with a typical FEL scheme can be employed to estimate $u^*$. Although in this method there is no need for an approximate inversion model of the system, different restrictive assumptions are required in the control design, which may not be satisfied in a practical flight control problem. A somewhat similar approach has been utilized in [109] in the framework of indirect adaptive control, where the singular perturbation theory has been adopted to move $u$ towards $u^*$ as $\epsilon\dot{u} = -\left(F(x, u) - F(x, u^*)\right)$, with $\epsilon$ a small positive constant. Such a method has been employed to control the longitudinal model of an HFV, where a set of NNs has been incorporated to estimate the unknown dynamics. In this regard, there is no need for a strict feedback model and a backstepping design, while, again, restrictive assumptions should be made to ensure closed-loop stability.
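The singular-perturbation construction can also be pictured numerically. In the sketch below, the control input is given fast first-order dynamics that drive $F(x, u)$ toward the desired pseudocontrol $\nu = \dot{x}_d - ke$; the toy plant, the assumption $\partial F/\partial u > 0$, and all constants are illustrative assumptions rather than the exact formulation of [109].

```python
# Sketch of the fast control-update (singular perturbation) idea; assumed plant.
import numpy as np

F = lambda x, u: -x + u + 0.3 * np.tanh(u)   # toy nonaffine plant, dF/du > 0
k, eps, dt = 2.0, 0.02, 1e-4                 # eps << 1 sets the fast time scale

x, u = 0.0, 0.0
for i in range(int(5.0 / dt)):
    t = i * dt
    xd, xd_dot = np.sin(t), np.cos(t)
    nu = xd_dot - k * (x - xd)               # desired value of F(x, u*)
    u += (dt / eps) * (nu - F(x, u))         # fast dynamics pulling u toward u*
    x += dt * F(x, u)                        # slow plant dynamics

print(f"tracking error at t=5: {x - np.sin(5.0):.4f}")
```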
2.3. Neural backstepping control
In the basic FEL scheme, it was assumed that $\dim(x) = \dim(u)$. Also, in both of the above-mentioned control structures, the entire dynamic model of the system is assumed invertible. However, in many cases, the dimension of the system inputs is less than that of the system states. The backstepping control method can be effectively employed in such circumstances when the dynamic model can be formulated in a strict feedback form. For simplicity, consider an uncertain nonlinear SISO system as follows:

$$\dot{x}_i = f_i(\bar{x}_i) + g_i(\bar{x}_i)x_{i+1} + \bar{\Delta}_i + d_i, \quad 1 \le i \le n-1, \qquad (31)$$

$$\dot{x}_n = f_n(\bar{x}_n) + g_n(\bar{x}_n)u + \bar{\Delta}_n + d_n, \qquad (32)$$

$$y = x_1, \qquad (33)$$

where $\bar{\Delta}_i$ and $d_i$ stand, respectively, for model uncertainties and external disturbances, and $\bar{x}_i = [x_1, \ldots, x_i]^T$. Without loss of generality, in the following, we assume that $n = 2$; the introduced control method can be simply applied to higher-order systems. Defining $\Delta_i = \bar{\Delta}_i + d_i$ and the desired output as $y_d$, we have:

$$\dot{e}_1 = \dot{y} - \dot{y}_d = f_1(\bar{x}_1) + g_1(\bar{x}_1)x_2 + \Delta_1 - \dot{y}_d. \qquad (34)$$

Thus, a virtual control can be defined for $x_2$ as

$$x_{2d} = g_1^{-1}(\bar{x}_1)\left(\dot{y}_d - k_1 e_1 - f_1(\bar{x}_1) - \hat{\Delta}_1\right), \qquad (35)$$
where $\hat{\Delta}_1$ represents the estimate of $\Delta_1$ and $k_1$ is a positive constant. Therefore, defining $e_2 = x_2 - x_{2d}$, we have:

$$\dot{e}_2 = \dot{x}_2 - \dot{x}_{2d} = f_2(\bar{x}_2) + g_2(\bar{x}_2)u + \Delta_2 - \dot{x}_{2d}. \qquad (36)$$

Finally, the control command can be defined as

$$u = g_2^{-1}(\bar{x}_2)\left(\dot{x}_{2d} - g_1(\bar{x}_1)e_1 - k_2 e_2 - f_2(\bar{x}_2) - \hat{\Delta}_2\right), \qquad (37)$$

where $\hat{\Delta}_2$ denotes the estimate of $\Delta_2$ and $k_2$ is a positive constant. Using feedforward NNs to estimate the $\Delta_i$'s, it is obtained that:

$$\Delta_i = W_i^T\mu_i(\bar{x}_i) + \varepsilon_i, \qquad (38)$$

such that $\|\varepsilon_i\| \le \varepsilon_{Mi}$. To derive the updating rules of the NNs' parameters, one can define a Lyapunov function as follows:

$$V = \frac{1}{2}\left(e_1^2 + e_2^2 + \tilde{W}_1^T\Gamma_1^{-1}\tilde{W}_1 + \tilde{W}_2^T\Gamma_2^{-1}\tilde{W}_2\right). \qquad (39)$$
The time derivative of $V$ is obtained as

$$\begin{aligned}
\dot{V} &= e_1\dot{e}_1 + e_2\dot{e}_2 + \tilde{W}_1^T\Gamma_1^{-1}\dot{\tilde{W}}_1 + \tilde{W}_2^T\Gamma_2^{-1}\dot{\tilde{W}}_2 \\
&= e_1\left(g_1(\bar{x}_1)e_2 - \tilde{\Delta}_1 - k_1 e_1\right) + e_2\left(-g_1(\bar{x}_1)e_1 - \tilde{\Delta}_2 - k_2 e_2\right) + \tilde{W}_1^T\Gamma_1^{-1}\dot{\tilde{W}}_1 + \tilde{W}_2^T\Gamma_2^{-1}\dot{\tilde{W}}_2 \\
&= -k_1 e_1^2 - k_2 e_2^2 + \tilde{W}_1^T\left(\Gamma_1^{-1}\dot{\tilde{W}}_1 - \mu_1(\bar{x}_1)e_1\right) + \tilde{W}_2^T\left(\Gamma_2^{-1}\dot{\tilde{W}}_2 - \mu_2(\bar{x}_2)e_2\right) + e_1\varepsilon_1 + e_2\varepsilon_2.
\end{aligned} \qquad (40)$$

Thus, assuming $\dot{\tilde{W}}_i = \dot{\hat{W}}_i$, the updating rules for the $\hat{W}_i$'s can be defined as follows:

$$\dot{\hat{W}}_i = \Gamma_i\left(\mu_i(\bar{x}_i)e_i - \sigma_i\hat{W}_i\right), \qquad (41)$$

where the second term on the right-hand side of the equation corresponds to the $\sigma$-modification. Using the updating rules (41), it is easy to show that $\dot{V} \le -kV + C$, with $k$ and $C$ positive constants. As will be discussed in Section 2.4, this ensures that all signals in the closed-loop system are uniformly ultimately bounded.
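A compact sketch of the two-step design (31)–(41) on a toy strict-feedback system is given below; $f_i$, $g_i$, the uncertainties, and the gains are illustrative assumptions. The crude numerical differentiation of $x_{2d}$ in the sketch is precisely the "explosion of terms" weakness discussed next, which the filtering techniques of Sections 2.3.1 and 2.3.2 are designed to remove.

```python
# Sketch of the two-step neural backstepping controller (31)-(41) on a toy
# strict-feedback system; f_i, g_i, the "true" uncertainties D_i, and all
# gains are illustrative assumptions.
import numpy as np

def rbf(x, centers, width=1.0):
    return np.exp(-((x - centers) ** 2) / (2.0 * width ** 2))

f1, g1 = (lambda x1: -x1), (lambda x1: 1.0)
f2, g2 = (lambda x1, x2: -x2), (lambda x1, x2: 1.0)
D1 = lambda x1: 0.3 * np.sin(x1)             # unknown lumped uncertainty, step 1
D2 = lambda x1, x2: 0.2 * np.cos(x2)         # unknown lumped uncertainty, step 2

k1, k2, sigma, dt = 2.0, 2.0, 0.01, 1e-3
centers = np.linspace(-2.0, 2.0, 9)
W1, W2 = np.zeros(9), np.zeros(9)            # adaptive weights for Delta_1, Delta_2
Gamma1, Gamma2 = 5.0, 5.0

x1, x2, x2d_prev = 0.0, 0.0, 0.0
for i in range(int(10.0 / dt)):
    t = i * dt
    yd, yd_dot = np.sin(t), np.cos(t)
    e1 = x1 - yd
    mu1 = rbf(x1, centers)
    # Virtual control (35)
    x2d = (yd_dot - k1 * e1 - f1(x1) - W1 @ mu1) / g1(x1)
    # Crude numerical derivative of x2d: this is the "explosion of terms" issue
    x2d_dot = (x2d - x2d_prev) / dt if i > 0 else 0.0
    x2d_prev = x2d
    e2 = x2 - x2d
    mu2 = rbf(x2, centers)
    # Control command (37)
    u = (x2d_dot - g1(x1) * e1 - k2 * e2 - f2(x1, x2) - W2 @ mu2) / g2(x1, x2)
    # Updating rules (41) with sigma-modification
    W1 += dt * Gamma1 * (mu1 * e1 - sigma * W1)
    W2 += dt * Gamma2 * (mu2 * e2 - sigma * W2)
    # Plant step (forward Euler), Eqs. (31)-(32) with n = 2
    dx1 = f1(x1) + g1(x1) * x2 + D1(x1)
    dx2 = f2(x1, x2) + g2(x1, x2) * u + D2(x1, x2)
    x1, x2 = x1 + dt * dx1, x2 + dt * dx2

print(f"output tracking error at t=10: {x1 - np.sin(10.0):.4f}")
```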
Such a control method can similarly be employed in cases where the $x_i$'s are vectors rather than scalars. A neural backstepping controller for an uncertain MIMO dynamic model of a helicopter has been introduced in [110] to control the attitude of the vehicle considering actuator dynamics, where each step of the design process deals with the control of a four-dimensional state vector. A neural backstepping controller has also been adopted in [111] to control a planar VTOL air vehicle, where a gradient descent training algorithm replaces the updating rule (41). Although it has been claimed that such a training method results in better control performance, it requires the exact value of the uncertain term estimated by the NN; in [111], this value has been computed by approximating the time derivatives of the system states and using the dynamic equations of the air vehicle.

It should be noted that although the proposed adaptive backstepping control leads to a bounded tracking error in the presence of model uncertainties and external disturbances, it suffers from the explosion of terms. More precisely, the control command (37) includes the term $\dot{x}_{2d}$, whose computation requires the time derivatives of $g_1(x_1)$, $f_1(x_1)$, and $\hat{\Delta}_1$. This issue becomes more problematic as the relative degree of the system increases.
2.3.1. Dynamic surface control
To solve the above-mentioned issue, Dynamic Surface Control (DSC) has been introduced in [112], in which the virtual control is passed through a first-order filter. More precisely, if $x_{2c}$ is defined by (35), then the desired value of $x_2$ is obtained as
$\tau\dot{x}_{2d} + x_{2d} = x_{2c}, \quad x_{2d}(0) = x_{2c}(0),$  (42)
where $\tau$ represents the filter time constant. Subsequently, the filtering error is also incorporated into the Lyapunov function of the system to be compensated by the designed control commands. Using such a technique, the problem of the explosion of terms in the traditional backstepping control can be effectively avoided, though at the cost of reducing the global stability obtained using the backstepping control to semi-global stability in the case of DSC [112].
Several NN-based DSC methods have been introduced in the literature for
different aerial vehicles [113, 114, 115, 116]. Such an approach has been proposed
in [117] to control the flight path angle and velocity of a flexible HFV, where the
employment of the integral of the tracking error in the control law improves the
tracking performance. DSC has been employed in [118] to control the attitude of
a Near-Space Vehicle (NSV) in which recurrent wavelet NNs have been utilized
at each step and trained using a composite learning method to compensate for
external disturbances and model uncertainties. Also, such a scheme has been
adopted in [119] to control the longitudinal dynamics of an air-breathing HFV
considering model uncertainties and external disturbances compensated by fully
tuned RBFNNs. In addition, DSC has been applied to the longitudinal mode
of an HFV in [33]. In comparison with conventional DSC, which results in a
semi-globally uniformly ultimately bounded stability, global tracking has been
achieved through aggregating the neural function approximation and a robust
term (using a switching function), which brings the system states into the neural
approximation domain from outside. The robust term has been designed to
estimate the upper bound of uncertain terms in a similar way as discussed in
Section 3.4.2. However, the determination of the active region of NNs (which is
required in designing the switching functions [82, 120]) is not trivial.
2.3.2. Command filtered backstepping
To simplify the stability analysis of DSC, a command filtered backstep-
ping has been proposed in [121] for a nonlinear system without uncertainty.
The introduced method attempts to eliminate the filter effects using a set of
compensating signals. This idea has been extended to nonlinear systems with
parametric uncertainties in [122]. To clarify the main idea, consider again the
aforementioned control problem. Assuming that the virtual control signals x2c
and x2dare defined, respectively, by (35) and (42), and by defining the auxiliary
29
variable ξ1as
˙
ξ1=k1ξ1+g1x1) (x2dx2c), ξ1(0) = 0,(43)
a compensated tracking error can be defined as ǫ1=yydξ1. Thus, we have:
$\dot{\epsilon}_1 = \dot{y} - \dot{y}_d - \dot{\xi}_1 = f_1(\bar{x}_1) + g_1(\bar{x}_1)x_2 + \Delta_1 - \dot{y}_d + k_1\xi_1 - g_1(\bar{x}_1)\left(x_{2d} - x_{2c}\right) = -k_1\epsilon_1 + g_1(\bar{x}_1)e_2 - \tilde{\Delta}_1.$  (44)
Accordingly, the control command can be defined as
$u = g_2^{-1}(\bar{x}_2)\left(\dot{x}_{2d} - g_1(\bar{x}_1)\epsilon_1 - k_2e_2 - f_2(\bar{x}_2) - \hat{\Delta}_2\right),$  (45)
which leads to
$\dot{e}_2 = \dot{x}_2 - \dot{x}_{2d} = -g_1(\bar{x}_1)\epsilon_1 - k_2e_2 - \tilde{\Delta}_2.$  (46)
Thus, using the following updating rules
$\dot{\hat{W}}_1 = \Gamma_1\left(\mu_1(\bar{x}_1)\epsilon_1 - \sigma_1\hat{W}_1\right),$  (47)
$\dot{\hat{W}}_2 = \Gamma_2\left(\mu_2(\bar{x}_2)e_2 - \sigma_2\hat{W}_2\right),$  (48)
and defining a Lyapunov function as
$V = \frac{1}{2}\left(\epsilon_1^2 + e_2^2 + \tilde{W}_1^T\Gamma_1^{-1}\tilde{W}_1 + \tilde{W}_2^T\Gamma_2^{-1}\tilde{W}_2\right),$  (49)
it can be shown that $\dot{V} \le -kV + C$, where $k$ and $C$ are positive constants. This results in bounded $\epsilon_1$ and $e_2$. As discussed in [122, 123], assuming that $g_1(\bar{x}_1)$ is bounded, it can be simply proved that $\xi_1$ is also bounded, thereby resulting in a bounded tracking error. A command filtered backstepping control has been designed in [123] for the longitudinal dynamics of an HFV considering input constraints and additive actuator faults. The control gain functions ($g_i$) have also been considered unknown, where it has been assumed that model uncertainties, as well as the control gain functions, can be written in a parametric form with partially unknown parameters. Considering the neural network-based representation, the above assumption means that the residual terms $\varepsilon_i$ in (38) are equal to zero, while in the case of complex air vehicles with nonparametric uncertainties, such an assumption becomes infeasible. Similarly, a command filtered backstepping control has been adopted in [124, 125] to control the trajectory of an F-16 fighter aircraft model with parametric uncertainties, where second-order filters have been used to impose both the magnitude and rate limits on the system states (see Section 3.6.2). Another analogous formulation has been presented in [80] in which the time derivative of $\xi_i$ consists of $\xi_{i+1}$, where $i$ represents the step of the backstepping control design process. Using this formulation, the control command $u$ can be written in terms of $e_1$ rather than $\epsilon_1$.
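The compensating-signal mechanism can be sketched in a few lines; the filter, gains, and placeholder error signal below are illustrative assumptions.
```python
import numpy as np

dt, tau, k1 = 1e-3, 0.05, 2.0
xi1, x2d = 0.0, 0.0
for k in range(int(5.0 / dt)):
    t = k * dt
    g1 = 1.0                                    # assumed control gain function
    x2c = np.sin(2 * t)                         # raw virtual control, eq. (35)
    x2d += dt * (x2c - x2d) / tau               # command filter, eq. (42)
    xi1 += dt * (-k1 * xi1 + g1 * (x2d - x2c))  # auxiliary variable, eq. (43)
    e1 = 0.1 * np.cos(t)                        # placeholder tracking error
    eps1 = e1 - xi1               # compensated error used in (44) and (47)
print("xi1 at end of run:", xi1)
```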
2.3.3. Backstepping augmented by the First-Order Sliding Mode Differentiators
(FOSMD)
Another improved approach to approximate the time derivative of the virtual
control signal x2dis to employ a first-order sliding mode differentiator rather
than employing a first-order filter. Using the FOSMD, the differentiation error
tends to zero or a compact neighborhood of zero (depending on the signal’s
characteristics) after a finite-time transient process [126]. Considering a known
function $l(t)$, the FOSMD formulation is obtained as follows:
$\dot{\varsigma}_0 = -\lambda_0|\varsigma_0 - l(t)|^{0.5}\,\mathrm{sign}(\varsigma_0 - l(t)) + \varsigma_1,$  (50)
$\dot{\varsigma}_1 = -\lambda_1\,\mathrm{sign}(\varsigma_1 - \dot{\varsigma}_0),$  (51)
where $\varsigma_0$ and $\varsigma_1$ represent the states of the differentiator, and $\lambda_0$ and $\lambda_1$ denote design parameters. Therefore, $\dot{\varsigma}_0 - \dot{l}(t)$ remains bounded if $\dot{\varsigma}_0(0) - \dot{l}(0)$ and $\varsigma_0(0) - l(0)$ are bounded.
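A minimal sketch of the differentiator (50)-(51), run on the test signal $l(t) = \sin t$, is given below; $\lambda_0$ and $\lambda_1$ are illustrative design parameters, and $\varsigma_1$ approximates $\dot{l}(t)$ after the finite-time transient.
```python
import numpy as np

dt, lam0, lam1 = 1e-4, 6.0, 8.0
vs0, vs1 = 0.0, 0.0          # differentiator states varsigma_0, varsigma_1
err = []
for k in range(int(5.0 / dt)):
    t = k * dt
    l = np.sin(t)
    vs0_dot = -lam0 * np.sqrt(abs(vs0 - l)) * np.sign(vs0 - l) + vs1  # eq. (50)
    vs1_dot = -lam1 * np.sign(vs1 - vs0_dot)                          # eq. (51)
    vs0 += dt * vs0_dot
    vs1 += dt * vs1_dot
    err.append(vs1 - np.cos(t))  # differentiation error vs. exact l_dot
print("max |error| over the last second:", np.abs(err[-int(1.0 / dt):]).max())
```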
This approach has been adopted in the backstepping control design in [34, 83]
to control the longitudinal mode of an HFV. As shown in [34], using the FOSMD,
the stability analysis is more concise compared to the traditional backstepping,
DSC, and command filtered design. Besides, a neural backstepping control ap-
proach using FOSMD has been proposed in [127] for the longitudinal dynamic
model of a sweep-back wings morphing aircraft subject to input–output con-
straints. It is notable that higher-order sliding mode differentiators (HOSMD),
which result in superior performance compared to FOSMDs [128], can also be
employed in the structure of the neural backstepping scheme [129].
2.3.4. Direct neural-backstepping control
In addition to the above techniques, there are a variety of direct adaptive
backstepping flight control systems in the literature, which can satisfactorily
prevent the problem of the explosion of terms. To be more precise, consider
again the nonlinear model (31)-(33) with $n = 2$. Defining
$x_{2d}^* = g_1^{-1}(\bar{x}_1)\left(\dot{y}_d - k_1e_1 - f_1(\bar{x}_1) - \Delta_1\right),$  (52)
$u^* = g_2^{-1}(\bar{x}_2)\left(\dot{x}_{2d} - g_1(\bar{x}_1)e_1 - k_2e_2 - f_2(\bar{x}_2) - \Delta_2\right),$  (53)
and using two distinct neural networks to identify them as $x_{2d}^* = W_1^T\mu_1(\bar{x}_1) + \varepsilon_1$ and $u^* = W_2^T\mu_2(\bar{x}_2) + \varepsilon_2$, we have:
$x_{2d} = \hat{W}_1^T\mu_1(\bar{x}_1) = x_{2d}^* - \varepsilon_1 + \tilde{W}_1^T\mu_1(\bar{x}_1),$  (54)
$u = \hat{W}_2^T\mu_2(\bar{x}_2) = u^* - \varepsilon_2 + \tilde{W}_2^T\mu_2(\bar{x}_2).$  (55)
Thus, considering a Lyapunov function candidate as (39), we have:
$\dot{V} = e_1\dot{e}_1 + e_2\dot{e}_2 + \tilde{W}_1^T\Gamma_1^{-1}\dot{\tilde{W}}_1 + \tilde{W}_2^T\Gamma_2^{-1}\dot{\tilde{W}}_2$
$= e_1\left(g_1(\bar{x}_1)\left(e_2 - \varepsilon_1 + \tilde{W}_1^T\mu_1(\bar{x}_1)\right) - k_1e_1\right) + e_2\left(-g_1(\bar{x}_1)e_1 - k_2e_2 + g_2(\bar{x}_2)\left(\tilde{W}_2^T\mu_2(\bar{x}_2) - \varepsilon_2\right)\right) + \tilde{W}_1^T\Gamma_1^{-1}\dot{\tilde{W}}_1 + \tilde{W}_2^T\Gamma_2^{-1}\dot{\tilde{W}}_2$
$= -k_1e_1\left(e_1 + \frac{g_1(\bar{x}_1)}{k_1}\varepsilon_1\right) - k_2e_2\left(e_2 + \frac{g_2(\bar{x}_2)}{k_2}\varepsilon_2\right) + \tilde{W}_1^T\left(\Gamma_1^{-1}\dot{\tilde{W}}_1 + \mu_1(\bar{x}_1)g_1(\bar{x}_1)e_1\right) + \tilde{W}_2^T\left(\Gamma_2^{-1}\dot{\tilde{W}}_2 + \mu_2(\bar{x}_2)g_2(\bar{x}_2)e_2\right).$  (56)
Accordingly, by introducing the following updating rules,
$\dot{\hat{W}}_i = -\Gamma_i\left(\mu_i(\bar{x}_i)g_i(\bar{x}_i)e_i + \sigma_i\hat{W}_i\right), \quad i = 1, 2,$  (57)
and assuming that the $g_i$'s are nonzero and bounded, again, it can be concluded that $\dot{V} < -kV + C$, which ensures that all signals in the closed-loop system remain bounded (see Section 2.4). As seen, despite the simpler formulation of the direct method compared to the previously proposed indirect backstepping schemes, the boundedness of the control gain functions ($g_i$'s) is necessary to guarantee the closed-loop stability.
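A minimal sketch of this direct scheme is given below: two NNs output $x_{2d}$ and $u$ themselves, per (54)-(55), and are tuned by (57), so no model inversion or analytic derivative of $x_{2d}$ appears. The plant, the choice of NN inputs, and all constants are illustrative assumptions; since the exact $g_i$'s are treated as unknown, only their (assumed positive) sign is used in the updating rule.
```python
import numpy as np

rng = np.random.default_rng(0)
C1, C2 = rng.uniform(-2, 2, (15, 3)), rng.uniform(-2, 2, (15, 3))

def rbf(z, C, w=1.5):
    # Gaussian RBF vector mu(z) over multivariate centers C
    return np.exp(-np.sum((C - z) ** 2, axis=1) / (2 * w ** 2))

W1, W2 = np.zeros(15), np.zeros(15)
Gam, sig, dt = 10.0, 0.01, 1e-3
x1, x2 = 0.5, 0.0
for k in range(int(10.0 / dt)):
    t = k * dt
    yd, yd_dot = np.sin(t), np.cos(t)
    e1 = x1 - yd
    mu1 = rbf(np.array([x1, yd, yd_dot]), C1)
    x2d = W1 @ mu1                        # eq. (54): NN gives x2d directly
    e2 = x2 - x2d
    mu2 = rbf(np.array([x1, x2, x2d]), C2)
    u = W2 @ mu2                          # eq. (55): NN gives u directly
    W1 += dt * (-Gam) * (mu1 * e1 + sig * W1)   # eq. (57), sign(g1) = +1
    W2 += dt * (-Gam) * (mu2 * e2 + sig * W2)   # eq. (57), sign(g2) = +1
    x1 += dt * (-x1 + x2 + 0.3 * np.sin(x1))    # illustrative plant
    x2 += dt * (-x2 - x1 + u)
print("final tracking error:", x1 - np.sin(10.0))
```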
It is notable that, using the aforementioned direct neural backstepping scheme, the control singularity problem in the control of dynamic systems with unknown $g_i$'s (induced by the $\hat{g}_i$'s approaching zero) is also avoided [130]. A direct neural backstepping control has been designed in [130] to control the longitudinal mode of an air-breathing HFV with unknown $g_i$'s, where it is necessary to have $g_i \ge \underline{g}_i > 0$ ($\underline{g}_i$ denotes a positive constant). To this end, in the corresponding Lyapunov function candidate, $\frac{1}{2}e_i^2$ is multiplied by $1/g_i$ to eliminate the requirement for $g_i$ in the updating rules, and the extra terms in $\dot{V}$ raised by this reformulation have been compensated by defining an appropriate ideal control command ($u^*$). Besides, a filtered tracking error including the integral of the error has been considered as the error function to remove the steady-state error, while the tracking error corresponding to the second step of the backstepping design was not considered in the first step.
On the other hand, by introducing an output feedback form and utilizing
High Gain Observers (HGOs) to estimate the time derivatives of the system
output (Section 3.1), it is possible to derive the control command with no re-
quirement for a backstepping scheme. In this regard, a filtered tracking error is
defined (as discussed in Section 2.2) to provide a unified error dynamic model.
Such a method has been adopted in [131, 132] to control an HFV, where only
one neural network is required in the altitude control block to determine the
actual control command. In a somewhat similar manner, a direct neuroadap-
tive control scheme has been integrated with the funnel control method in [133]
to control an air-breathing HFV considering non-affine dynamics. The alti-
tude subsystem has been transformed into a simplified normal output feedback
model, where only one NN is required to determine the control command. Fur-
ther, the non-affine dynamics of the vehicle have been handled by incorporating
a low-pass filter in the last step of the design to define a new virtual control
input in affine form.
In addition to the above-mentioned continuous-time backstepping control
methods, several studies in the literature have addressed the design of a discrete-
time neural backstepping controller. As discussed in Section 2.1, the general for-
mulation of the NN updating rules in the discrete-time domain is similar to that
of the continuous-time controller, while the stability analysis of the closed-loop
system is quite different [134]. A discrete-time direct neural backstepping con-
trol has been proposed in [135] by incorporating HONNs to estimate uncertain
terms in control commands. A similar control formulation has been given in [42]
using Extreme Learning Machines (ELMs). To simplify the control structure,
in [136, 137], the dynamic equations corresponding to the altitude dynamics of an HFV have been aggregated into a prediction model as $x_1(k+n) = \bar{f}(x) + \bar{g}(x)u$, where $\bar{f}$ and $\bar{g}$ represent uncertain nonlinear functions and $n$ denotes the system order. Subsequently, a single NN has been employed to tackle the uncertain terms in the control command. Such a method can be considered as the discrete-time equivalent of the above-mentioned approach to control output feedback models in the continuous-time domain. Alternatively, an equivalent prediction model of an HFV has been defined in [41] in which $x_i(k+n-i+1)$ is obtained as a function of $x_{i+1}(k+n-i)$. Using such a change in the formulation of system dynamics, all the information of the desired trajectory over the future $n$ steps is involved in designing the controller, thereby improving the closed-loop performance. The designed controller in all these papers has been applied to the longitudinal mode of an HFV model.
2.4. How to analyze the closed-loop stability?
As mentioned before, most of the current feedback error learning-neural
control schemes in the literature can only guarantee the Uniformly Ultimately
Bounded (UUB) stability of the closed-loop system. To be more precise, it is
not typically possible to prove the negative definiteness of the time derivative
of the Lyapunov function, but it can be proved that
$\dot{V} \le -kV + C,$  (58)
where $k$ and $C$ denote positive constants. As a result, integrating this inequality (the comparison lemma), it is obtained that [138, 110]
$V(t) \le \left(V(0) - \frac{C}{k}\right)e^{-kt} + \frac{C}{k}.$  (59)
In this regard, many studies in the literature have attempted to propose flight control systems that satisfy local [57, 70], semi-global [131, 139, 140, 110, 141], or global [33] UUB tracking. Fewer works have addressed more stringent stability criteria, including asymptotic, exponential, or finite-time stability.
2.4.1. Asymptotic stability
A variety of flight control systems are given in the literature that can prove the convergence of the tracking error to zero as time tends to infinity. As a straightforward approach, if we can assume that the estimation error $\varepsilon$ in (8) is zero (or negligible), it is obtained that $\dot{V} = -k_1e^Te$, which guarantees the asymptotic convergence of the tracking error to zero. Such an assumption is reasonable in the case of dynamic systems with parametric uncertainties. More precisely, in such a circumstance, it is possible to estimate uncertain terms as $\Delta(x) = W^T\mu(x)$, where $\mu(x)$ and $W$ represent the vector of appropriate basis functions and the matrix of unknown weights, respectively. According to this, a constrained adaptive backstepping control scheme has been proposed in [124] for a strict feedback system with parametric uncertainties. A set of modified tracking errors ($\bar{z}_i$) has been defined (Section 3.6.2), and it has been proved that the time derivative of the Lyapunov function is obtained as $\dot{V} = -\sum c_i\bar{z}_i^2$, with $c_i$ denoting positive constants. This leads to the convergence of the modified tracking errors to zero as time tends to infinity [142], while the actual tracking error may increase if the control inputs are saturated. Finally, the proposed control scheme was employed in [124] to control the attitude of a simplified model of an F-16 aircraft with multi-axis thrust vectoring, considering actuator
faults and symmetric structural damages (only in the simulation phase). A
similar approach has been presented in [125] to provide a trajectory tracking
control for an F-16 fighter aircraft under parametric uncertainties and system
constraints. An adaptive backstepping controller has been introduced in [143]
for the attitude control of an NSV in the presence of model uncertainties and
multiplicative actuator faults. For this purpose, first, an adaptive neural state
observer has been proposed (Section 3.4), and subsequently, the estimated states
have been utilized in a backstepping control scheme. The asymptotic stability of
the system has been proved assuming that the NN estimation error is negligible.
Obviously, such control approaches cannot ensure the asymptotic stability of the
system in the presence of nonparametric uncertainties and external disturbances,
which are usually present in practical flight control problems.
Several studies have been reported in the literature in which NNs are com-
bined with discontinuous feedback control methods such as variable structure
or Sliding Mode Controllers (SMC) to guarantee the asymptotic stability of the
closed-loop system. The fundamental idea of the combination of NN function
approximation and robust terms (such as in SMC), which can result in the
asymptotic stability of the closed-loop system, is given in Section 3.4.2. A ro-
bust output feedback control with neural network function approximation has
been designed in [139] for the attitude and altitude control of a quadrotor UAV
in the presence of model uncertainties and external disturbances, where the at-
titude dynamics are constructed in terms of the unit quaternion. Although the
asymptotic stability of the closed-loop system has been proved, the proposed
control command leads to high-amplitude and oscillatory thrust forces, and the
chattering phenomenon due to the employment of the signum function in the
control command. Indeed, such discontinuous controllers suffer from well-known
limitations including a requirement for an infinite control bandwidth and chat-
tering. Unfortunately, ad hoc fixes for these effects result in a loss of asymptotic
stability [144]. An adaptive SMC has been proposed in [145], where the param-
eters of the sliding surface were trained by NNs through error back-propagation
learning. The hyperbolic tangent function has replaced the signum function to
eliminate the chattering phenomenon, while the asymptotic stability of the sys-
tem has been achieved by neglecting model uncertainties in the control design
process.
To overcome the above-mentioned issues, a trajectory tracking control sys-
tem has been introduced in [146] for a rotorcraft UAV using Robust Integral of
the Signum of the Error (RISE) feedback [144], where a NN has been adopted to
compensate for uncertain dynamics. The RISE control scheme is a differentiable
control method that can compensate for additive disturbances and parametric
uncertainties. By combining it with an NN-based FEL method, there is no
need for linearity in the parameters to ensure the asymptotic stability of the
system. The proposed approach was employed in [146] in a multi-loop control
structure, where the desired attitude is determined in the outer loop and the
attitude tracking control has been addressed in the inner loop. The semi-global
asymptotic stability of the inner loop in tracking the desired attitude has been
proved assuming that the first four time derivatives of the reference trajectory
are bounded. Another alternative has been given in [138, 147], where the signum
function of ehas been substituted by e/ e2+ω2with ω(t) denotes a van-
ishing positive function satisfying R
0ω2(t)dt < . Accordingly, asymptotic
tracking can be achieved, while the updating rules are sub ject to possible pa-
rameter drift. A class of NN-based optimal control methods with guaranteed
asymptotic stability has also been introduced in the literature, which will be
addressed in Section 4.2.
2.4.2. Exponential stability
As stated in [148], in the case of a flight control problem, few studies have
claimed to achieve exponential convergence [149]. This is even less so regarding
uncertain dynamic systems. As mentioned earlier, an improved learning method
has been introduced in [56], which can ensure the exponential parameter and
tracking error convergence for a specific class of single-input nonlinear systems
with parametric uncertainties assuming that a precise estimation of the time
derivative of system states is available.
2.4.3. Finite-time stability
Moreover, a variety of indirect NN-based flight control systems has been
developed in the literature employing SMC-based methods, which can guarantee
the finite-time or practical finite-time stability of the closed-loop system. The
practical finite-time stability means that the tracking error converges to a small
neighborhood of the origin in finite time [150]. To clarify the principal idea of
the mentioned control methods, consider again the first control problem given
in Section 2.1. Let's define a sliding manifold as
$s = e + \eta\int_0^t \mathrm{sig}^r(e)\,dt,$  (60)
where $\mathrm{sig}^r(e) = \left[\mathrm{sign}(e_1)|e_1|^r, \cdots, \mathrm{sign}(e_n)|e_n|^r\right]^T$, $\eta > 0$, and $0 < r < 1$. Now, if the control command is computed as
$u = B^{-1}\left(\dot{x}_d - F(x) - \eta\,\mathrm{sig}^r(e) - \hat{\Delta}(x) - k_1s - k_2\,\mathrm{sig}^r(s)\right),$  (61)
where the same NN function approximation as (3) with an updating rule as $\dot{\hat{W}} = \Gamma\left(\mu s^T - \sigma\hat{W}\right)$ is incorporated, then using a Lyapunov function as
$V = \frac{1}{2}s^Ts + \frac{1}{2}\mathrm{tr}\left(\tilde{W}^T\Gamma^{-1}\tilde{W}\right),$  (62)
it is easy to prove the satisfaction of (58), thereby guaranteeing the boundedness of all signals in the closed-loop system. As a consequence, we have
$\dot{s} \le -k_1s - k_2\,\mathrm{sig}^r(s) + \rho_M I,$  (63)
where $I$ and $\rho_M$ denote the identity matrix and a positive constant satisfying
$\left\|\varepsilon - \tilde{W}^T\mu(x)\right\| \le \rho_M.$  (64)
Thus, by an appropriate choice of $k_1$ and $k_2$, one can simply show the convergence of $s$ to a compact neighborhood of the origin in finite time [151]. Such a method is extensively discussed within the framework of adaptive Terminal SMC (TSMC); a minimal numerical sketch is given below.
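The following sketch simulates the manifold (60) and the command (61) for a first-order illustrative plant with $\dim(x) = \dim(u) = 1$; the plant, the disturbance, and all gains are assumptions made for demonstration, and the NN estimate $\hat{\Delta}$ is simply set to zero for brevity.
```python
import numpy as np

def sig_r(z, r=0.6):
    # sig^r(z) = sign(z) * |z|^r, applied elementwise
    return np.sign(z) * np.abs(z) ** r

dt, eta, k1, k2 = 1e-4, 1.0, 2.0, 2.0
x, I = 1.0, 0.0                        # state and integral term of (60)
for k in range(int(5.0 / dt)):
    t = k * dt
    xd, xd_dot = np.sin(t), np.cos(t)
    e = x - xd
    I += dt * sig_r(e)                 # integral term of the manifold
    s = e + eta * I                    # eq. (60)
    F, B = -x, 1.0                     # nominal model terms
    Delta_hat = 0.0                    # NN estimate, omitted in this sketch
    u = (xd_dot - F - eta * sig_r(e) - Delta_hat
         - k1 * s - k2 * sig_r(s)) / B           # eq. (61)
    x += dt * (F + B * u + 0.3 * np.sin(2 * t))  # plant with disturbance
print("terminal value of |s|:", abs(s))
```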
Further, by incorporating a robust term into the control command (in a similar manner as discussed in Section 3.4.2) to compensate for $\rho_M$, and replacing the first term in the Lyapunov function (62) with $\|s\|$, it is possible to guarantee the finite-time stability of the closed-loop system [152]. An FTC has been introduced in [152] to control the longitudinal mode of a conventional aircraft using a similar SMC, which ensures the finite-time convergence of $s$ to zero under appropriate control gains. Self-constructing fuzzy neural networks have been utilized to estimate the bound of the uncertain dynamics caused by actuator faults and model uncertainties, while the minimum estimation error $\varepsilon^*$ under the optimal network weights ($W^*$) has been neglected. A TSMC augmented by
neural approximation and disturbance observers (as given in Section 3.4.2) has
been presented in [151] for the trajectory tracking control of a quadrotor aerial
robot under model uncertainties, input dead-zone, and external disturbances,
where the practical finite-time stability of the system has been ensured. A slid-
ing surface, which is equal to the time derivative of (60) due to the second-order
dynamics of the controlled system, has been utilized in the proposed design.
A formation flight control problem for a group of helicopter UAVs has been
dealt with in [153] using an analogous control scheme. TSMC combined with
neural approximation and the above-mentioned robust term has been adopted
in both the position and attitude control loops, which can ensure, respectively,
the finite-time and practical finite-time stability of the position and attitude
tracking error, while the control loops have been decoupled based on the multi-
ple time-scale assumption. The same problem has been addressed in [154] using
TSMC in the inner control loop and an adaptive NN-based control scheme in
the outer loop, though, again, the control loops have been analyzed separately.
The inter-vehicle collision avoidance has also been solved by incorporating an
exponential potential function into the design process. The proposed control
structure can guarantee the practical finite-time stability of the closed-loop sys-
tem, while it requires only the relative positions of the UAVs with respect to their adjacent vehicles.
In addition to the aforementioned indirect adaptive control schemes, Direct
T2-FNNs have been employed in [155, 48, 156] as an augmentation for a PD
controller in the case of SISO nonlinear systems, where the network parameters are updated by an SMC-based algorithm (with a sliding surface as $s = \dot{e} + \eta e$) using the output of the PD controller as the learning signal. The practical finite-time stability of the closed-loop system has been proved using a simple Lyapunov function as $V = \frac{1}{2}s^2$, assuming that a PD controller can stabilize
the system [48]. Six T2-FNNs have been used in [156] to control the trajectory
of a 6-DOF quadrotor air vehicle, where each T2-FNN corresponds to a distinct
system state. Due to the complexity of computing the gradient of the cost
function with respect to the antecedent parameters of the FNN, particle swarm
optimization has been adopted as a gradient-free approach to train them, while
the consequent parameters have been updated using the mentioned SMC-based
algorithm. Notice that, despite the superior convergence properties of the above-
mentioned approaches, as discussed earlier, they are subject to considerable
limitations of discontinuous control systems.
3. Supplementary features in model-based IFCSs
Several additional features may be required in different flight control systems
due to different additional requirements. In this section, we will address various
supplementary features, which have been widely incorporated in IFCSs. Again,
the focus of this section is on model-based IFCSs, while some of the introduced
elements such as self-organizing NNs can be effectively employed in model-free
approaches, as well.
3.1. Output Feedback (OFB) control
The basic feedback error learning method has been developed assuming all
the system states are measurable. This assumption is not feasible in many
applications. Accordingly, different modifications have been introduced in the
literature to effectively control an uncertain nonlinear system using only the
system inputs-outputs. Such an intention is typically fulfilled by employing
state observers [157], where a composite Lyapunov function is subsequently
incorporated to compensate for both the tracking error and the state estimation
error.
A common assumption in the design of OFB control methods is that the system is input-output linearizable with a specified relative degree [108]. More precisely, consider a SISO dynamic system as $\dot{x} = f(x) + g(x)u$, $y = h(x)$. Thus, we have:
$\dot{y} = \frac{\partial h}{\partial x}\left(f(x) + g(x)u\right) = L_fh(x) + L_gh(x)u,$  (65)
where $L_fh(x) = \frac{\partial h}{\partial x}f(x)$ is the Lie derivative of $h$ along $f$ [158]. Assuming the system has a relative degree $\rho$, we have $L_gL_f^{\rho-1}h(x) \neq 0$. Therefore, defining
$u = \frac{1}{L_gL_f^{\rho-1}h(x)}\left(-L_f^{\rho}h(x) + \nu\right),$  (66)
the dynamic model reduces to $y^{(\rho)} = \nu$ [159] (a similar definition can also be
provided for a nonaffine system). Using such an assumption and assuming that
the system is globally exponentially minimum phase, an adaptive OFB control
has been introduced in [72] for a nonaffine SISO system with an unknown (but
bounded) dimension. A linear observer of dimension 2ρ1 has been developed
to estimate the time derivatives of the tracking error signal. The estimated
vector is then used as the training signal for a Single-Hidden-Layer (SHL) NN
that attempts to compensate for the model inversion error in the framework
of the pseudocontrol strategy. Under the same assumption, a backstepping
control scheme has been designed in [108], where the time derivatives of yhave
been estimated using High Gain Observers (HGOs). Also, an adaptive neural
network has been developed to construct the control command in the presence of
model uncertainties. Finally, the proposed control scheme has been applied to a
helicopter model to control the altitude of the vehicle in vertical flight. Similarly,
HGOs have been used in [160] to provide the estimation of the time derivatives of
Euler angles, which are required in the attitude control of a flapping-wing micro
aerial vehicle. Analogously, a beneficial approach to control the longitudinal
model of HFVs has been introduced in [131, 132]. As discussed earlier, typically,
a backstepping control scheme is designed for an HFV in which the longitudinal
dynamics are transformed into the strict feedback form. Alternatively, a new
formulation has been given in [131, 132] to transform the altitude subsystem into
a normal output feedback form. More precisely, considering the longitudinal dynamics of an HFV and defining $z_1 = y = \gamma$, $z_2 = \dot{z}_1$, and $z_3 = \dot{z}_2$, we have
$\dot{z}_1 = z_2, \quad \dot{z}_2 = z_3, \quad \dot{z}_3 = a(X) + b(X)\delta_e, \quad y = z_1,$  (67)
where $X = [\gamma, \theta, q]^T$ and $a$ and $b$ are unknown. Here, $\gamma$, $\theta$, and $q$ represent the flight path angle, the pitch angle, and the pitch rate, respectively. Accordingly, $z_2$ and $z_3$ are the time derivatives of the system output and are unknown. Utilizing an HGO, the system states $Z = [z_1, z_2, z_3]^T$ can be estimated by $\hat{Z} = \left[\xi_1, \frac{\xi_2}{\varepsilon}, \frac{\xi_3}{\varepsilon^2}\right]^T$, where
$\dot{\xi}_1 = \frac{\xi_2}{\varepsilon},$  (68)
$\dot{\xi}_2 = \frac{\xi_3}{\varepsilon},$  (69)
$\dot{\xi}_3 = \frac{-d_1\xi_3 - d_2\xi_2 - \xi_1 + y(t)}{\varepsilon},$  (70)
and $\varepsilon$ is a small design constant and $d_1$ and $d_2$ are chosen such that $s^3 + d_1s^2 + d_2s + 1$ is Hurwitz. Consequently, there exist positive constants $h_s$ and $t_s$ such that $\forall t > t_s$, we have $|\hat{Z} - Z| \le \varepsilon h_s$ [161]. Afterward, an NN-based control command has been developed in [131, 132] to ensure the convergence of a filtered tracking error to a small neighborhood of zero.
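A minimal sketch of the HGO (68)-(70) is given below, estimating the first two derivatives of a measured output $y(t) = \sin t$; the values of $\varepsilon$, $d_1$, and $d_2$ are illustrative choices satisfying the Hurwitz condition.
```python
import numpy as np

dt, eps = 1e-5, 0.01
d1, d2 = 3.0, 3.0        # s^3 + 3s^2 + 3s + 1 = (s + 1)^3 is Hurwitz
xi = np.zeros(3)         # observer states xi_1, xi_2, xi_3
for k in range(int(3.0 / dt)):
    t = k * dt
    y = np.sin(t)
    xi_dot = np.array([xi[1] / eps,                                    # (68)
                       xi[2] / eps,                                    # (69)
                       (-d1 * xi[2] - d2 * xi[1] - xi[0] + y) / eps])  # (70)
    xi += dt * xi_dot
z2_hat, z3_hat = xi[1] / eps, xi[2] / eps ** 2   # estimates of y_dot, y_ddot
print("estimation errors:", z2_hat - np.cos(3.0), z3_hat + np.sin(3.0))
```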
An NN-based observer (see Section 3.4) has been designed in [162] to esti-
mate the angular and translational velocities of a quadrotor air vehicle, which are
subsequently utilized in the control loop. On the other hand, non-model-based
filters have been employed in [139] to provide an estimation of the unknown
angular velocity, which was required in the proposed OFB control.
3.2. Minimal-learning parameter
One of the major drawbacks of neural networks in the structure of the feed-
back error learning scheme is the excessive computational burden of the training
process due to the high number of parameters that should be identified. An
efficient identification technique with significantly fewer training parameters,
called the Minimal-Learning Parameter (MLP), has been widely employed by
researchers in recent years. MLP has been first introduced by Yang and his
colleagues and employed in the traditional backstepping control combined with
T-S fuzzy systems [163, 164] or RBFNNs [165]. Subsequently, it was effectively
integrated with DSC [166] (to solve the problem of the explosion of complexity
in classical backstepping control) and with direct adaptive fuzzy control [167]
to directly approximate the desired control input signals rather than unknown
system’s nonlinearities.
Generally speaking, this technique attempts to estimate the norm of the unknown weight vector (or matrix) rather than estimating its elements [168]. To be more precise, consider again the control problem given in Section 2.1 with the dynamic model as (1). Suppose that $\|W\|^2 \le \ell$, where $\ell$ denotes an unknown constant. Defining $\hat{\ell}$ as the estimation of $\ell$, a Lyapunov function can be defined as:
$V = \frac{1}{2}e^Te + \frac{1}{2\lambda}\tilde{\ell}^2,$  (71)
where $\tilde{\ell} = \hat{\ell} - \ell$ and $\lambda$ is a positive constant. Thus, we have:
$\dot{V} = e^T\left(F(x) + B(x)u - \dot{x}_d + W^T\mu + \varepsilon\right) + \frac{1}{\lambda}\tilde{\ell}\dot{\hat{\ell}}.$  (72)
Using the Cauchy–Schwarz and Young's inequalities, it is obtained that
$e^TW^T\mu \le \frac{a^2e^Te\,\ell\,\mu^T\mu}{2} + \frac{1}{2a^2},$  (73)
$e^T\varepsilon \le \frac{a^2e^Te}{2} + \frac{\varepsilon_M}{2a^2},$  (74)
where $a$ represents a positive design constant. Therefore,
$\dot{V} \le e^T\left(F(x) + B(x)u - \dot{x}_d\right) + \frac{\varepsilon_M}{2a^2} + \frac{1}{2a^2} + \frac{a^2e^Te\,\ell\,\mu^T\mu}{2} + \frac{a^2e^Te}{2} + \frac{1}{\lambda}\tilde{\ell}\dot{\hat{\ell}}$
$= e^T\left(F(x) + B(x)u - \dot{x}_d\right) + \frac{\varepsilon_M}{2a^2} + \frac{1}{2a^2} + \frac{a^2\hat{\ell}\,e^Te\,\mu^T\mu}{2} + \frac{a^2e^Te}{2} + \tilde{\ell}\left(\frac{1}{\lambda}\dot{\tilde{\ell}} - \frac{a^2e^Te\,\mu^T\mu}{2}\right).$  (75)
Thus, it is possible to define the control command and the updating rule of $\hat{\ell}$ as follows:
$u = B^{-1}\left(-F(x) + \dot{x}_d - \left(\frac{a^2}{2} + k_1\right)e - \frac{a^2e\,\hat{\ell}\,\mu^T\mu}{2}\right),$  (76)
$\dot{\hat{\ell}} = \frac{a^2\lambda\,e^Te\,\mu^T\mu}{2} - \sigma\lambda\hat{\ell},$  (77)
where $k_1$ and $\sigma$ denote positive design parameters, and the second term on the right-hand side of (77) represents the $\sigma$-modification term. Substituting (76) and (77) into (75) yields
$\dot{V} \le -k_1e^Te - \sigma\tilde{\ell}\hat{\ell} + \frac{\varepsilon_M}{2a^2} + \frac{1}{2a^2}.$  (78)
So, setting $\sigma = \frac{2k_1}{\lambda}$ and knowing that
$\tilde{\ell}\hat{\ell} \ge \frac{\tilde{\ell}^2}{2} - \frac{\ell^2}{2},$  (79)
finally, (78) can be written as follows:
$\dot{V} \le -k_1\left(e^Te + \frac{\tilde{\ell}^2}{\lambda}\right) + \frac{k_1\ell^2}{\lambda} + \frac{\varepsilon_M}{2a^2} + \frac{1}{2a^2} = -kV + C,$  (80)
where $k = 2k_1$ and $C = \frac{k_1\ell^2}{\lambda} + \frac{\varepsilon_M}{2a^2} + \frac{1}{2a^2}$. As discussed in Section 2.4, (80) leads
to the convergence of both the tracking error and the norm estimation error to a small neighborhood of zero, where the appropriate value of $a$ is determined considering the tradeoff between a larger steady-state error and more control effort. Accordingly, we should train only a scalar parameter ($\hat{\ell}$) rather than a matrix ($\hat{W}$), thereby considerably reducing the computational burden corresponding to the online tuning of the NN parameters. However, by comparing (77) with (7), it can be understood that such an achievement is obtained at the cost of a less efficient use of the error vector $e$ in the MLP technique. More precisely, here, we use only the norm of the tracking error (in a scalar updating rule) instead of using all its elements separately, which results in a conservative design. A similar formulation can also be given for an MLP technique in the case of direct adaptive control designs.
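A minimal sketch of the MLP control law (76) and the scalar updating rule (77) is given below: only $\hat{\ell}$, the estimate of $\|W\|^2$, is adapted instead of a full weight matrix. The scalar plant, the regressor, and the gains are illustrative assumptions.
```python
import numpy as np

def mu(x):
    # RBF regressor vector mu(x)
    c = np.linspace(-2.0, 2.0, 9)
    return np.exp(-((x - c) ** 2) / 2)

dt, k1, a, lam, sigma = 1e-3, 2.0, 1.0, 10.0, 0.05
x, ell_hat = 0.5, 0.0
for k in range(int(10.0 / dt)):
    t = k * dt
    xd, xd_dot = np.sin(t), np.cos(t)
    e = x - xd
    F, B = -x, 1.0
    m = mu(x)
    u = (-F + xd_dot - (a ** 2 / 2 + k1) * e
         - a ** 2 * e * ell_hat * (m @ m) / 2) / B      # eq. (76)
    ell_hat += dt * (a ** 2 * lam * e * e * (m @ m) / 2
                     - sigma * lam * ell_hat)           # eq. (77)
    x += dt * (F + B * u + 0.4 * np.tanh(x))  # plant with an uncertain term
print("final |e| and ell_hat:", abs(x - np.sin(10.0)), ell_hat)
```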
This approach can be employed, in a similar way, in the structure of the backstepping control method. The design has been enhanced in [114] by updating only one parameter in the attitude control block that corresponds to
the maximum of the norm of all the three RBFNNs employed in the control
system. However, this results in a more conservative design, thereby requiring
more control effort. On the other hand, as mentioned earlier, authors in [132]
have transformed the longitudinal dynamics of an HFV into the output feedback
form in which the new system states were approximated using HGOs. Thus,
there is a need for only one neural network in the proposed scheme, where the
MLP technique has been employed in the training phase. A similar formulation
has been utilized in [99] for a nonaffine model of an HFV. Both the velocity
and attitude control blocks have been designed using the pseudocontrol strat-
egy. Also, fuzzy wavelet neural networks have been employed to compensate for
model uncertainties, where, owing to the employment of the MLP technique,
only the norm of the weight matrix was tuned. Besides, in [115], DSC has been
integrated with the MLP technique in the case of an HFV, which is subject to
actuator bias fault. Considering unknown control gain functions ($g_i$'s), the MLP technique has been utilized to estimate $\lambda_i = g_{i\min}^{-1}\|W_i\|^2$ to avoid the control
singularity problem, while due to the unusual formulation of the employed NNs,
the small-gain theorem [169, 166] has been involved to ensure the UUB stability
of the system.
Also, regarding the application of the MLP technique in the backstepping
control of VTOL UAVs, such an approach has been utilized in a backstepping
trajectory tracking control scheme for a quadrotor air vehicle in [138] in which
the MLP technique has been applied to each of the six RBFNNs employed to
estimate model uncertainties. Further, authors in [116] have used the DSC along
with the MLP technique in the trajectory tracking control of a multi-rotor UAV
considering output constraints. A robust term has also been incorporated to
estimate the neural approximation error (Section 3.4). Besides, a neuroadaptive
control approach has been proposed in [60] for a quadrotor UAV under model
uncertainties and actuator faults. Compared to previously-mentioned studies,
a NN has been employed in this paper to estimate an upper bound for the norm
of model uncertainties (instead of estimating the uncertainty itself), where the
MLP technique has been adopted to estimate an upper bound for the norm of
the weight vector of that NN.
The MLP technique has also been employed in discrete-time neural backstepping controllers [170], where the updating rule of the norm of the NN weights and the stability analysis can be obtained similarly to the basic discrete-time FEL in Section 2.1. Such an MLP scheme has been used in [135] in a discrete-time neural backstepping controller applied to the longitudinal mode of an HFV. However, the introduced updating rule for $\ell$ in [170, 135] may result in negative values in some time intervals. Thus, the design has been improved in [137, 42] by assuming that $\|W\| \le \ell\,\mathrm{sign}(\ell)$, which allows $\ell$ to be either positive or negative.
3.3. Systems with unknown control direction
The design of a controller for a dynamic system with unknown control direc-
tion is a challenging problem. This is due to the fact that a control command
with incorrect direction can simply make the system unstable. In such cases,
an interesting idea would be to alternately change the control direction. Ac-
cordingly, if the control command is applied in the wrong direction, the systems
states get away from the desired trajectory until the control direction changes.
Subsequently, the amplitude of the control command should increase by increas-
ing the tracking error to get the system back to the desired trajectory. Such an
idea has been first introduced in [171], and a function with the above-mentioned
characteristics is known as a Nussbaum function. Nussbaum function has been
employed in different studies to provide acceptable closed-loop performance in
the case of complex systems with unknown control direction [172]. To clarify
the control design procedure using the Nussbaum function, consider a SISO dy-
namic model as $\dot{x} = f(x) + g(t)u$, where $g(t)$ is a time-varying control gain with unknown direction. To ensure the stabilizability of the system, assume that $g(t) \in I = [\underline{g}, \bar{g}]$, where $\underline{g}$ and $\bar{g}$ denote unknown constants and $0 \notin I$. Defining $e = x - x_d$, we have
$\dot{e} = \dot{x} - \dot{x}_d = f(x) + g(t)u - \dot{x}_d.$  (81)
If $g(t)$ were available, the control command could be computed as $u = g^{-1}\left(-f(x) + \dot{x}_d - ke\right)$, where $k$ is a positive constant. However, due to the unknown control direction, such a command is not feasible. Thus, we define a control command as follows:
$u = N(\zeta)\eta,$  (82)
$\dot{\zeta} = e\eta,$  (83)
$\eta = f(x) - \dot{x}_d + ke,$  (84)
where $N(\zeta)$ represents a Nussbaum function like $N(\zeta) = \exp(\zeta^2)\cos(\pi\zeta/2)$.
Defining a Lyapunov function as $V = \frac{1}{2}e^2$, we have
$\dot{V} = e\dot{e} = e\left(f(x) + gN(\zeta)\eta - \dot{x}_d\right).$  (85)
By adding and subtracting $\dot{\zeta}$ on the right side of the equation, $\dot{V}$ is obtained as
$\dot{V} = e\left(f(x) + gN(\zeta)\eta - \dot{x}_d\right) + \dot{\zeta} - \dot{\zeta} = gN(\zeta)\dot{\zeta} + \dot{\zeta} - ke^2.$  (86)
Now, by multiplying both sides of the equation by $\exp(ct)$, where $c = 2k$, the following equation is obtained:
$\frac{d}{dt}\left(Ve^{ct}\right) = \left(gN(\zeta)\dot{\zeta} + \dot{\zeta}\right)e^{ct}.$  (87)
Thus, we have
$V = e^{-ct}\int_0^t\left(gN(\zeta) + 1\right)\dot{\zeta}\,e^{c\tau}\,d\tau.$  (88)
Consequently, according to Lemma 2 in [173], it is proved that $V(t)$ and $\zeta(t)$ are bounded, thereby guaranteeing a bounded tracking error. In cases where $f(x)$ is also an unknown function, as discussed earlier, we can simply substitute $f(x)$ in (84) with its estimation $\hat{f}(x) = \hat{W}^T\mu(x)$ and include an additional term $\frac{1}{2}\tilde{W}^T\Gamma_1^{-1}\tilde{W}$ in the Lyapunov function to ensure the closed-loop stability. A similar approach has been employed in [174] in the framework of DSC to control the longitudinal mode of an HFV considering dead-zone input nonlinearity, where a set of NNs has been used to estimate uncertain terms in the control command.
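A minimal simulation of the scheme (82)-(84) is sketched below for a scalar plant whose control gain has an unknown (here negative) sign; the plant, the gain $k$, and the actual value of $g$ are illustrative assumptions, and the controller never uses the sign of $g$.
```python
import numpy as np

def N(zeta):
    # a Nussbaum function, as in the text
    return np.exp(zeta ** 2) * np.cos(np.pi * zeta / 2)

dt, k_gain = 1e-4, 2.0
x, zeta = 0.5, 0.0
for k in range(int(10.0 / dt)):
    t = k * dt
    xd, xd_dot = np.sin(t), np.cos(t)
    e = x - xd
    f = -x                             # assumed known drift term
    eta = f - xd_dot + k_gain * e      # eq. (84)
    u = N(zeta) * eta                  # eq. (82)
    zeta += dt * e * eta               # eq. (83)
    g = -1.5                           # unknown control direction/gain
    x += dt * (f + g * u)              # plant: x_dot = f(x) + g(t) u
print("final tracking error:", x - np.sin(10.0))
```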
3.4. Neural networks and Disturbance Observers (DOs)
3.4.1. Neural disturbance observer
As discussed earlier, NNs can be effectively employed in the closed-loop
control to estimate and compensate for model uncertainties, external distur-
bances, and also complex parts of the control command. In addition to the
above-mentioned control systems, NNs can also be utilized as a powerful DO in
an open-loop identification problem. To this end, consider again the nonlinear
model (1) where ∆(x, u) corresponds to the effect of model uncertainties, actua-
tor faults, and external disturbances on the system dynamics [118]. Notice that
an external disturbance is generally an explicit function of time (not the system
states and inputs). Thus, the identification of as a function of system states
(and inputs) requires an implicit assumption that external disturbances can be
formulated as a function of the system states (and inputs). Although such an
assumption makes sense in the case of some types of external disturbances, in
a general case, it is not reasonable. In such circumstances, it may be possible
to estimate by a NN with time-dependent weights (or even time-dependent
structure). This brings new challenges to the convergence analysis of the NN,
which would be an interesting research direction. Another idea would be assum-
ing that external disturbances are smaller than an unknown bounded function
of system states, i.e., $|d(t)| \le W^T\mu(x)$ [165], while it may lead to a conservative
design, thereby significantly increasing the control effort in the case of control
problems.
Here, assuming that the uncertain terms in the dynamic model can be formulated as a function of system states (and inputs), we can introduce a new state-space model as
$\dot{\hat{x}} = F(x) + B(x)u + \hat{\Delta} + \kappa(x - \hat{x}),$  (89)
where $\kappa$ represents a positive constant (which is tuned according to the compromise between the convergence rate of the introduced observer and its sensitivity to measurement noises [175]), and $\hat{\Delta} = \hat{W}^T\mu(x, u)$ denotes the estimation of $\Delta$. Notice that different types of feedforward and recurrent NNs can be formulated in such a compact form [118]. Now, by defining $e_D = \hat{x} - x$ and $\tilde{W} = \hat{W} - W$, a Lyapunov function can be proposed as
$V = \frac{1}{2}e_D^Te_D + \frac{1}{2}\mathrm{tr}\left(\tilde{W}^T\Gamma^{-1}\tilde{W}\right).$  (90)
Thus, we have
$\dot{V} = e_D^T\left(-\kappa e_D + \tilde{W}^T\mu(x, u) - \varepsilon\right) + \mathrm{tr}\left(\tilde{W}^T\Gamma^{-1}\dot{\hat{W}}\right),$  (91)
where $\varepsilon$ denotes the bounded estimation error of the NN. As a consequence, if the NN's parameters are updated as $\dot{\hat{W}} = -\Gamma\mu e_D^T$, it is obtained that $\dot{V} = -e_D^T\left(\kappa e_D + \varepsilon\right)$, which results in $\dot{V} < 0$ for $\|\kappa e_D\| > \|\varepsilon\|$, thereby guaranteeing a bounded estimation error. The employment of one of the modification techniques introduced in Section 2.1 in the updating rule is also recommended to avoid parameter drift. It is notable that, in the proposed DO, there is no need for an affine model, and we can simply substitute $F(x) + B(x)u$ in (89) with
$F(x, u)$. A similar method has been used in [176] to estimate uncertain terms in the dynamic model of a UAV, including external disturbances induced by different types of atmospheric disturbances, i.e., wind shear, wind gusts, and atmospheric turbulence. Owing to the presence of both $v$ and $\dot{v}$ in the disturbance term (where $v$ represents the wind velocity vector), a dynamic equation is then derived (using the estimated uncertainty) as $\dot{v} = \chi(x, u, v)$ to estimate the total wind velocity. Subsequently, an auto-landing control system has been proposed in [176] for the Sekwa UAV in the presence of external disturbances using a combination of backstepping control and dynamic inversion, while the designed scheme attempted to control six independent outputs with only four system inputs, which is not generally feasible. More precisely, the pseudo-inverse operator employed to compute the control command may result in inappropriate commands in the case of inconsistent control objectives.
Moreover, it should be noted that an analogous formulation can be developed to provide a neural state observer. A neural observer has been proposed in [162] by incorporating both the kinematic and dynamic equations of the system to estimate the translational and angular velocities of a quadrotor, knowing the position and attitude of the vehicle. Such a DO can also be utilized in the closed-loop control by substituting $e_D$ in the updating rule of the NN with $e_D + e$, where $e$ represents the tracking error. Indeed, this would be a variant of the composite learning method introduced in Section 2.1.
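A minimal open-loop sketch of the DO (89) with the updating rule $\dot{\hat{W}} = -\Gamma\mu e_D^T$ derived above is given below; the plant, the excitation input, and the observer constants are illustrative assumptions.
```python
import numpy as np

def mu(x, u):
    # regressor over the system state and input
    c = np.linspace(-2.0, 2.0, 11)
    return np.exp(-((x - c) ** 2 + u ** 2) / 2)

dt, kappa, Gam = 1e-3, 20.0, 50.0
x, x_hat = 0.3, 0.0
W_hat = np.zeros(11)
for k in range(int(10.0 / dt)):
    t = k * dt
    u = np.sin(0.5 * t)                       # excitation input
    F, B = -x, 1.0
    Delta = 0.5 * np.sin(x) + 0.2 * u ** 2    # "unknown" lumped uncertainty
    e_D = x_hat - x
    m = mu(x, u)
    Delta_hat = W_hat @ m
    x_hat += dt * (F + B * u + Delta_hat + kappa * (x - x_hat))  # eq. (89)
    W_hat += dt * (-Gam * m * e_D)            # updating rule of the DO
    x += dt * (F + B * u + Delta)             # true plant
print("final estimation error:", Delta_hat - Delta)
```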
3.4.2. Combination of NN function approximation and DOs
There are a variety of robust control approaches in the literature in which
a combination of DOs and NNs has been adopted to simultaneously compen-
sate for external disturbances and model uncertainties, respectively. Using such
an identification scheme, it is possible to distinguish between external distur-
bances, which are explicit functions of time, and internal disturbances, which
can be modeled as a function of system states (and inputs). In addition, using a
combination of DOs and NN-based estimators, DOs will be capable of compen-
sating for the estimation error of the NN. More specifically, consider a nonlinear
dynamic model as
$\dot{x} = F(x) + B(x)u + \Delta(x, u) + d(t),$  (92)
where $\Delta(x, u)$ and $d(t)$ represent model uncertainties and external disturbances, respectively. Now, considering the following definitions:
$\dot{\hat{x}} = F(x) + B(x)u + \hat{\Delta}(x, u) + \hat{D} + \kappa(x - \hat{x}),$  (93)
$\Delta = W^T\mu(x, u) + \varepsilon, \quad \hat{\Delta} = \hat{W}^T\mu(x, u),$  (94)
$D(t) = d(t) + \varepsilon,$  (95)
$e_D = \hat{x} - x, \quad e = x - x_d,$  (96)
and assuming $\dim(x) = \dim(u) = n$, an appropriate control command can be formulated as follows (the introduced approach can be used in the case of other types of dynamic systems and control methods in a similar manner):
$u = B(x)^{-1}\left(\dot{x}_d - F(x) - \hat{\Delta} - \hat{D} - k_1e\right).$  (97)
Subsequently, a Lyapunov function can be formulated as
$V = \frac{1}{2}\left(e^Tk_2e + e_D^Tk_3e_D + \tilde{D}^T\tilde{D} + \mathrm{tr}\left(\tilde{W}^T\Gamma^{-1}\tilde{W}\right)\right),$  (98)
where the $k_i$'s represent positive constants, and $\tilde{D} = \hat{D} - D$. Accordingly, we have
$\dot{V} = k_2e^T\left(-k_1e - \tilde{W}^T\mu - \tilde{D}\right) + \mathrm{tr}\left(\tilde{W}^T\Gamma^{-1}\dot{\tilde{W}}\right) + \tilde{D}^T\left(\dot{\hat{D}} - \dot{D}\right) + k_3e_D^T\left(-\kappa e_D + \tilde{W}^T\mu + \tilde{D}\right).$  (99)
Thus, the following updating rules can be defined:
$\dot{\tilde{W}} = \dot{\hat{W}} = \Gamma\left(\mu\left(k_2e^T - k_3e_D^T\right) - \sigma_W\hat{W}\right),$  (100)
$\dot{\hat{D}} = \left(k_2e - k_3e_D\right) - k_4\left[\dot{\hat{x}} - \dot{x} + \kappa e_D\right],$  (101)
where $k_4$ denotes a positive constant. Concerning the second term on the right-hand side of (101), using (93), we have
$\dot{\hat{x}} - \dot{x} + \kappa e_D = \tilde{W}^T\mu + \tilde{D}.$  (102)
Consequently, assuming that $\mu(x, u)$ and $\dot{D}$ are bounded, one can simply prove the satisfaction of (58), thereby ensuring the boundedness of all signals in the closed-loop system. Notice that, although the updating rule (101) contains $\dot{x}$, there is no need for it to compute $\hat{D}$, because the estimated disturbance ($\hat{D}$) is obtained as the integral of (101) (the integral of the other terms in the updating rule can be calculated using an auxiliary state variable [177]; see the sketch following this paragraph). Such a combination has been employed in [177] in a backstepping design. The same approach has also been utilized in [129] to provide decentralized attitude synchronization tracking of multi-UAVs in the presence of actuator faults and wind effects. Similarly, the trajectory tracking control of multiple trailing UAVs has been addressed in [178] using DSC, where an NN+DO has been adopted to compensate for unknown aerodynamic parameters, actuator faults, and wake vortices. A partially analogous scheme has been utilized in [160] to provide a trajectory tracking control for a flapping-wing micro aerial vehicle considering model uncertainties and external disturbances. In a similar manner, a combined NN and DO has been incorporated in [140] in the framework of an FTC to control the attitude of a 3-DOF helicopter. Further, an analogous scheme has been employed in [141] in a backstepping controller designed to control the attitude of an NSV. The same identification approach has also been adopted in [151], where a DO is utilized to compensate for external disturbances, the estimation error of the NNs, and the effect of unknown input dead-zone.
Another effective combination of NNs and DOs, with less complexity and no requirement to use the boundedness of $\mu(x, u)$ and $\dot{D}$ in the stability analysis, relies on the estimation of the upper bound of $D$ rather than that of its exact value. Such a method, which results in a conservative design, can be classified as a robust adaptive control. For this purpose, consider again the above-mentioned dynamic model and the following definitions (for simplicity, suppose that $x, u \in \mathbb{R}$, while the introduced approach can be extended to MIMO systems with a similar formulation):
$\dot{\hat{x}} = F(x) + B(x)u + \hat{\Delta}(x, u) + \upsilon + \kappa(x - \hat{x}),$  (103)
$\Delta = W^T\mu(x, u) + \varepsilon, \quad \hat{\Delta} = \hat{W}^T\mu(x, u),$  (104)
$D(t) = d(t) + \varepsilon, \quad \|D\| \le D_M,$  (105)
$e_D = \hat{x} - x, \quad e = x - x_d,$  (106)
$u = B(x)^{-1}\left(\dot{x}_d - F(x) - \hat{\Delta} - \upsilon - k_1e\right),$  (107)
where $\upsilon$ should be designed. Now, redefining the Lyapunov function as
$V = \frac{1}{2}\left(k_2e^2 + k_3e_D^2 + k_4\tilde{D}_M^2 + \tilde{W}^T\Gamma^{-1}\tilde{W}\right),$  (108)
we have
$\dot{V} = k_2e\left(-k_1e - \tilde{W}^T\mu + D - \upsilon\right) + \tilde{W}^T\Gamma^{-1}\dot{\tilde{W}} + k_4\tilde{D}_M\dot{\hat{D}}_M + k_3e_D\left(-\kappa e_D + \tilde{W}^T\mu + \upsilon - D\right).$  (109)
Thus, using the updating rule (100) and
$e_a = k_2e - k_3e_D,$  (110)
$\dot{\hat{D}}_M = \frac{1}{k_4}\left(e_a\tanh(e_a/\epsilon) - \sigma_M\hat{D}_M\right),$  (111)
$\upsilon = \hat{D}_M\tanh(e_a/\epsilon),$  (112)
where $\epsilon$ denotes a positive constant, it is obtained that
$\dot{V} = -k_2k_1e^2 - k_3\kappa e_D^2 + \tilde{D}_M\left(e_a\tanh(e_a/\epsilon) - \sigma_M\hat{D}_M\right) - \sigma_W\tilde{W}^T\hat{W} + e_aD - e_a\hat{D}_M\tanh(e_a/\epsilon).$  (113)
Having the following inequality [179] for any $\epsilon > 0$ and $z \in \mathbb{R}$,
$0 \le |z| - z\tanh(z/\epsilon) \le 0.2785\epsilon,$  (114)
it is easy to show that (58) is satisfied. The utilization of the hyperbolic tangent function instead of the signum function in the presented formulation is an effective way to avoid the chattering phenomenon, while the possible asymptotic stability of the closed-loop system reduces to UUB stability. To be more specific, if we simply employ the signum function in the introduced control scheme and eliminate the $\sigma$-modification terms from (100) and (111), one can simply prove the asymptotic convergence of the tracking error to zero, though at the cost of possible parameter drift and the previously discussed limitations of discontinuous control systems (Section 2.4.1). Such an approach has been utilized in [168]
to control the position and attitude of a helicopter with unknown inertia ma-
trix considering aerodynamic frictions. Accordingly, the unknown aerodynamic
forces and moments have been estimated using RBFNNs, where the upper bound
of the estimation error corresponding to NNs, as well as external disturbances,
has been compensated by the introduced DO.
The introduced identification scheme can be similarly employed in the back-
stepping control design. It has been employed in [180] in a backstepping tra-
jectory tracking control applied to a model-scaled helicopter in order to deal
with the NN’s estimation error, where a switching function has been adopted
to integrate the NN and the introduced DO. Further, a similar approach has
been used in the framework of DSC to control the longitudinal mode of an HFV
in the presence of model uncertainties, dead-zone input nonlinearity [174], and
actuator faults [115]. Analogously, in [118], recurrent wavelet neural networks
have been integrated with such a DO in a DSC to compensate for external dis-
turbances, model uncertainties, and the effect of input constraints in the case
of the attitude control of an NSV. An adaptive neural backstepping control has
been proposed in [83] for an HFV, where a similar approach has been utilized
in each step of the backstepping control to deal with model uncertainties and
estimation error of NNs. In [116], DSC has been applied to a multi-rotor UAV
to provide an attitude control system. A similar identification method has been
employed to compensate for model uncertainties and external disturbances.
On the other hand, a reverse combination of NN+DO has been introduced
in [181], where first, a DO attempts to estimate the entire model uncertainties
and external disturbances as a lumped disturbance, and subsequently, a NN
has been employed to compensate for estimation error of the DO. The proposed
scheme has been utilized to control the roll angle of an air vehicle considering
the wing rock phenomenon. However, using such an approach, the estimation
error of the NN is not identified and thus remains uncompensated.
3.5. Fault-tolerant control
As mentioned earlier, the introduced (direct and indirect) NN-based adaptive
controllers have been applied to faulty systems, as well. They include, but are not limited to, the basic FEL-based control [91, 182, 92], the pseudocontrol strategy
[105, 106, 183, 100], the neural backstepping design [70, 184, 124, 125, 185, 140,
147], and hybrid direct-indirect adaptive controllers [107, 38]. In this regard, a
typical approach to deal with operational faults is to incorporate the nonlinear
terms induced by the actuator faults (or structural damages) into the model
uncertainty and estimate (and compensate for) them as a lumped uncertainty
by FEL-based NNs [115, 152, 60, 186, 59]. Although the actuator faults (or
structural damages) suffer from the same issue as external disturbances (i.e. the
explicit dependence on time), by considering a sequence of abrupt faults, the
coefficients corresponding to system faults can be deemed as time-independent
functions between two sequential faults. Thus, the stability analysis can be
performed for a specific time interval between two sequential faults (see the
following subsection). Accordingly, the aforementioned combination of NNs
and DOs can also be utilized in fault-tolerant flight control systems [140, 129].
To provide more efficient FTC systems with less conservativeness, in addition
to the aforementioned generic adaptive neural control methods, there are various
NN-based controllers in the literature that have been customized to specifically
deal with actuator/sensor faults and structural damages. Some of the more
commonly used schemes in this field are given in the following.
3.5.1. FEL-based fault identification
It is possible to employ the FEL method to directly identify the coefficients
corresponding to actuator faults, while simultaneously estimating model uncer-
tainties in the closed-loop control. To clarify the basic idea, consider again the
dynamic model (1), and suppose that the actual plant input is determined as
$u(t) = \xi(t)u_c(t) + \delta(t),$  (115)
where $u_c \in \mathbb{R}^n$ represents the computed control command, and $\xi(t)$ and $\delta(t)$ denote an unknown diagonal matrix and an unknown vector corresponding to multiplicative and additive actuator faults, respectively. Such a formulation can represent different types of actuator faults, such as the stuck-type fault and the loss of effectiveness [187]. Considering a sequence of sudden actuator faults, $\xi$ and $\delta$ can be considered as piecewise constant functions. Accordingly, defining $t_i$ as the time of occurrence of the $i$th actuator fault, one can assume that $\xi$ and $\delta$ remain constant for $t \in (t_i, t_{i+1})$. In the following, we will focus on this time interval, while, due to the finite number of such time intervals, the design can be extended to the entire flight time assuming that the occurrence of the fault at $t_i$ does not violate the controllability of the system. Now, if the ideal system input ($u^*$) is defined as (4) with $\hat{\Delta} = \hat{W}^T\mu$, then a control command can be determined as follows:
$u_c = k_2u^* + k_3,$  (116)
where $k_2$ and $k_3$ represent unknown constants satisfying
$\xi k_2 = I, \quad \xi k_3 + \delta = 0,$  (117)
which requires $\xi$ to be invertible. Knowing that $\xi$ is a diagonal matrix, the invertibility implies that no actuator should be completely stuck. Owing to the unknown values of $k_2$ and $k_3$, their estimations are employed in the control command. Thus, we have:
$u = \xi\left(\hat{k}_2u^* + \hat{k}_3\right) + \delta = u^* + \xi\left(\tilde{k}_2u^* + \tilde{k}_3\right).$  (118)
Thus, using the following updating rules
$\dot{\hat{k}}_2 = -\Gamma_2\,u^*e^TB(x),$  (119)
$\dot{\hat{k}}_3 = -\Gamma_3\,B(x)^Te,$  (120)
and employing (7), one can define the following Lyapunov function:
$V = \frac{1}{2}\left(e^Te + \mathrm{tr}\left(\tilde{W}^T\Gamma^{-1}\tilde{W}\right) + \mathrm{tr}\left(\tilde{k}_2^T\xi\Gamma_2^{-1}\tilde{k}_2\right) + \tilde{k}_3^T\xi\Gamma_3^{-1}\tilde{k}_3\right).$  (121)
Here, we have assumed that $\xi$ is a positive definite matrix, which is a reasonable assumption due to the fact that $\xi$ corresponds to the effectiveness ratio of the actuators ($0 < \xi_{ii} \le 1$). Utilizing the above-mentioned updating rules, the time derivative of $V$ is obtained as (8), thereby ensuring a bounded tracking error. As discussed previously, it is recommended to incorporate a modification term in the updating rules (119) and (120) to avoid parameter drift in the absence of the PE condition.
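A minimal sketch of this fault-compensation loop for a scalar plant is given below: (116) with the updating rules (119)-(120) in scalar form, where the NN part of the ideal input $u^*$ is omitted for brevity. The plant, the fault scenario, and the gains are illustrative assumptions; the minus signs follow the convention $\tilde{k}_i = \hat{k}_i - k_i$ used above.
```python
import numpy as np

dt, k_e, Gam2, Gam3 = 1e-3, 3.0, 10.0, 10.0
x, k2_hat, k3_hat = 0.5, 1.0, 0.0
for k in range(int(10.0 / dt)):
    t = k * dt
    xd, xd_dot = np.sin(t), np.cos(t)
    e = x - xd
    F, B = -x, 1.0
    u_star = (xd_dot - F - k_e * e) / B      # ideal input, as in (4)
    uc = k2_hat * u_star + k3_hat            # eq. (116)
    # updating rules (119)-(120), scalar form
    k2_hat += dt * (-Gam2 * u_star * e * B)
    k3_hat += dt * (-Gam3 * B * e)
    # actual plant input (115): loss of effectiveness and bias after t = 3 s
    xi, delta = (0.6, 0.2) if t > 3.0 else (1.0, 0.0)
    u = xi * uc + delta
    x += dt * (F + B * u)
print("k2_hat (should approach 1/xi):", k2_hat)
print("k3_hat (should approach -delta/xi):", k3_hat)
```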
A similar approach has been used in [188] for a SISO system in the framework of traditional backstepping control. Similarly, in [123], such an approach has been utilized in a command filtered backstepping considering parametric uncertainty in both the internal dynamics and the control gain function, while in [83], the prediction error has also been involved in the NN updating rules. All these controllers have been applied to the longitudinal mode of an HFV. An analogous method has been employed in [189], where the designed controller has been applied to an HFV considering flexible dynamics and state constraints. Alternatively, by considering a similar plant in the framework of DSC, the authors in [190] attempt to estimate $1/\inf|\xi_{ii}|$, which makes it possible to deal with time-varying actuator faults at the expense of employing a conservative design. DSC has been utilized in [191] to control a skid-to-turn missile in the presence of partial state constraints and actuator faults. Compared to the above-mentioned studies, the additive fault $B(x)\delta$ has been aggregated with the model uncertainty into a single term, whose upper bound has been estimated by a NN. Further, instead of estimating $k_2$, the matrix $\xi$ has been estimated directly, while there is no basic difference between these two design methods.
3.5.2. Using a separate Neural fault detection and identification (FDI) block
Traditionally, NNs were utilized as a separate Fault Detection and Isolation
(FDI) scheme in the framework of FTCs. The main idea in it comes from
the comparison of the output of the system and pre-trained NNs, where the
residuals are interpreted as a fault if they exceed predefined thresholds [192].
However, such an approach cannot ensure closed-loop stability, and it may also
lead to false alarms in the presence of severe external disturbances or unexpected
damages.
There are other types of indirect fault identification approaches in the liter-
ature (which have been designed separately from the control system), as well.
The main concern about such a decentralized design is the challenges in an-
alyzing the closed-loop stability considering the estimation error of the fault
identification block (which is commonly neglected in the stability analysis). An
NN-based fault identification block has been proposed in [143] to estimate mul-
tiplicative actuator faults and model uncertainties, distinctly. The introduced
method is similar to a FEL-based fault identification scheme, while the tracking
error $e$ is substituted by the estimation error of a neural observer (see Section
3.4.1). The estimated model uncertainties and actuator faults have been subse-
quently employed in the structure of a backstepping attitude controller applied
to an NSV. Further, authors in [193], have attempted to identify the combina-
tion of fault dynamics and model uncertainties as a lumped uncertainty using
a neural state observer. Such an approach has been employed in the paper to
tackle sensor and actuator faults in the case of a satellite. The updating rules
are obtained using a FEL method considering the estimation error of the neural
observer as the learning signal.
In a somewhat similar fashion, a neural observer has been employed in [49,
50] within the framework of nonlinear geometric approach for fault detection and
identification [194]. The fundamental assumption (which could be a restrictive
assumption in flight control problems) in such an approach is the existence of
acoordinate change in the state space and the output space that provides an
57
observable subsystem, which is affected by a specific fault but not affected by
external disturbances and other faults. By exploiting such subsystems and using
the same neural observer as introduced in Section 3.4.1, authors in [49, 50] have
attempted to detect and identify different (but not simultaneous) sensor and
actuator faults in a satellite, while considering external disturbances. Finally,
the proposed scheme in [49] has been employed in an attitude control system
based on a typical LQG controller designed for a linear model of the satellite.
Alternatively, an RLS optimization has been adopted in several studies to
identify different operational faults of an air vehicle. In [195], multiplicative
actuator faults have been identified using a generalized Online Sequential Ex-
treme Learning Machine (OS-ELM) algorithm (for MIMO systems), which is
based on the RLS optimization (see Section 4.1). The model uncertainties and
external disturbances have been neglected at this stage, while they have been
compensated by a robust model predictive control, which is applied to a quadro-
tor UAV. A neural state observer has been introduced in [196, 197] in which
the NN weights have been updated using an Extended Kalman Filter (EKF),
which has a formulation similar to that of the RLS optimization. The
proposed approach has been evaluated in the presence of different faults such
as abrupt and intermittent faults. Besides, such a method has been adopted
in a dynamic inversion control in [197] to control the attitude of a fixed-wing
aircraft considering actuator faults.
According to the obtained results in [196, 195, 38], the use of an RLS
optimization-based updating rule results in faster convergence of the NN weights
and higher accuracy in comparison with FEL-based approaches (which are devel-
oped based on Lyapunov’s direct method), particularly in the case of an abrupt
actuator fault. In addition, unlike the RLS optimization-based approaches, a
fault identification block, which is developed using Lyapunov’s direct method
(such as in [193]), may result in severe changes in its estimation at the moment
of an abrupt actuator fault [196]. This phenomenon, which has not been ad-
dressed in typical stability analyses, can be a challenging issue in FEL-based
FTCs.
3.5.3. Multimodel approaches
A number of studies attempted to identify the dynamic model of the sys-
tem using an online identification problem employing recurrent NNs (such as
NARX NNs), and then design a controller for the identified model [198]. The
challenging issue with such a control system is analogous to that of previously
mentioned indirect FDI schemes. More precisely, the identification error is typ-
ically neglected in the stability analysis of the closed-loop system.
As a more reliable solution, a multimodel approach has been developed in
[199] to identify a 6-DOF model of a fixed-wing aircraft in the presence of
different actuator faults. To be more precise, a set of local NARX NNs have been
first trained considering different fault conditions, i.e., the elevator, aileron, and
rudder faults, where each local model corresponds to a specific fault condition.
Subsequently, the output of the entire model is computed by a weighted average
of the outputs of local NNs, where the relative weight of each local model is
determined using an OS-ELM-based optimization. It means that each local
model can be considered as a hidden node of an extended ELM, where the
output layer of this extended ELM is trained using the OS-ELM algorithm.
Accordingly, the entire model can be considered as a deep neural network with
two hidden layers, where the first layer (corresponding to local models) is trained
offline, and the second layer is trained by the OS-ELM approach. Such an
identification algorithm has been adopted in [200] to provide a reliable prediction
model for the system. The obtained model has been used in a Model Predictive
Control (MPC) to provide a trajectory tracking control for a fixed-wing aircraft.
As illustrated in [200], the proposed approach not only can deal with actuator
faults that have been considered in the offline training of local NNs, but it
can compensate for all the actuator faults and structural damages that can be
modeled as a combination of the local models. In this regard, the local NNs
can be considered as the basis vectors of a multidimensional space, which are
capable of representing all vectors in that space. Also, the prediction error of the
model has been tackled by a DO in the proposed model predictive controller.
The stability of the closed-loop system has been analyzed using a terminal
constraint in the MPC framework, while the feasibility of such a constrained
optimization problem is not trivial [201].
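The weighted-average mechanics can be sketched as follows; a plain RLS update stands in for the OS-ELM step that adapts the output weights, and the three "local models", gains, and data are illustrative placeholders.

import numpy as np

class MultiModelEnsemble:
    """Weighted average of pre-trained local models, in the two-layer view:
    the (frozen) local models act as hidden nodes and only the output
    weights are adapted online with a recursive least-squares (RLS) step."""

    def __init__(self, local_models, p0=1e3):
        self.models = local_models                 # list of callables x -> y
        n = len(local_models)
        self.w = np.ones(n) / n                    # initial relative weights
        self.P = p0 * np.eye(n)                    # RLS covariance

    def predict(self, x):
        phi = np.array([m(x) for m in self.models])
        return phi @ self.w, phi

    def update(self, x, y_meas):
        y_hat, phi = self.predict(x)
        k = self.P @ phi / (1.0 + phi @ self.P @ phi)
        self.w += k * (y_meas - y_hat)
        self.P -= np.outer(k, phi) @ self.P
        return y_hat

# toy usage: three "local models" trained for different (fault) conditions
models = [lambda x: np.sin(x), lambda x: x, lambda x: x ** 2]
ens = MultiModelEnsemble(models)
rng = np.random.default_rng(0)
for _ in range(300):
    x = rng.normal()
    y = np.sin(x) + 0.01 * rng.normal()   # plant currently matches model 0
    ens.update(x, y)
print("adapted weights:", np.round(ens.w, 2))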
3.6. Consideration of input constraints
Similar to FTC systems, a typical approach to overcome the input con-
straints is to consider nonlinear terms induced by input constraints (such as
dead-zone or saturation function) as an uncertain term, which is estimated and
compensated by NNs [136, 202]. Nevertheless, the same issue with neural DOs,
i.e. the estimation of an explicit function of time using a NN that is a function
of system states, exists here as well. In addition to such a control approach,
other types of NN-based control designs have been proposed, which can deal
with input constraints. The most commonly used approaches to this goal are
given in the following.
3.6.1. Pseudocontrol Hedging (PCH)
A traditional approach to deal with input constraints in the framework of the pseudocontrol strategy, called Pseudo-Control Hedging (PCH), is to prevent the adaptive elements in the control system from seeing the effects of input constraints by manipulating the reference trajectory [183, 55]. For this purpose, consider again the dynamic model (25). Given the desired trajectory $x_d$, a reference trajectory is defined for the system as $\dot{x}_r = \dot{x}_d + \nu_h$, where $\nu_h$, which represents a residual term induced by the input constraints, should be designed.
Defining $\bar{e} = x - x_r$, we have:
$\dot{\bar{e}} = F(x, u) - \dot{x}_r = \hat{F}(x, u) - \dot{x}_r + \Delta(x, u) = \hat{F}(x, u_c) - \dot{x}_r + \Delta(x, u) + \hat{F}(x, u) - \hat{F}(x, u_c), \quad (122)$
where $u_c$ denotes the desired control command and $\Delta(x, u) = F(x, u) - \hat{F}(x, u)$.
Thus, if we define
$\nu_h = \hat{F}(x, u) - \hat{F}(x, u_c), \quad (123)$
using the control command $u_c$ defined by (26)-(27), and employing the same procedure as given in Section 2.2 (by substituting $e$ by $\bar{e}$), it can be concluded that both the tracking error $\bar{e}$ and the estimation error of the weight matrix $W$ are bounded. In this regard, due to the substitution of $e$ by $\bar{e}$ in the updating rule of the NN weights, they can be satisfactorily updated even at the time of input saturation owing to the elimination of the effect of input constraints from $\bar{e}$ using the introduced term $\nu_h$. However, concerning the boundedness of the real tracking error $e = x - x_d$, there is a need for a restrictive assumption, i.e.,
$\left\| \int_0^t \nu_h(\tau)\, d\tau \right\| \leq \nu_M, \quad (124)$
where $\nu_M$ is a positive constant.
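As a toy illustration of this mechanism (a sketch under stated assumptions, not the designs of [183, 55]), consider a scalar plant $\dot{x} = \mathrm{sat}(u)$; with $\hat{F}(x, u) = u$, the hedging signal (123) reduces to $\nu_h = \mathrm{sat}(u_c) - u_c$. All gains and signals below are illustrative choices.

import numpy as np

# Minimal PCH sketch on a scalar plant xdot = sat(u). With F_hat(x,u) = u,
# the hedge nu_h = F_hat(x,u) - F_hat(x,u_c) reduces to sat(u_c) - u_c.
dt, k, u_max = 0.01, 2.0, 1.0
x = x_r = 0.0
log_e, log_ebar = [], []

for i in range(3000):
    t = i * dt
    x_d, xdot_d = 2.0 * np.sin(t), 2.0 * np.cos(t)   # partly infeasible demand
    e_bar = x - x_r                                   # hedged tracking error
    u_c = xdot_d - k * e_bar                          # desired control command
    u = np.clip(u_c, -u_max, u_max)                   # actuator saturation
    nu_h = u - u_c                                    # hedging signal
    x += dt * u                                       # plant
    x_r += dt * (xdot_d + nu_h)                       # hedged reference model
    log_e.append(abs(x - x_d)); log_ebar.append(abs(e_bar))

print(f"max |e| = {max(log_e):.3f},  max |e_bar| = {max(log_ebar):.4f}")

By construction, the hedged error dynamics here are $\dot{\bar{e}} = -k\bar{e}$, so $\bar{e}$ remains near zero even while the actuator saturates, whereas the real error $e = x - x_d$ temporarily grows during the infeasible portions of the command, mirroring the role of assumption (124).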
This approach has been utilized in [203, 204] to control the attitude of a
Reusable Launch Vehicle (RLV) considering actuator faults. As discussed in
[204], even in the absence of system controllability, the adaptation mechanism remains
satisfactory, which results in a rapid recovery once the system controllability is
retrieved. Similarly, PCH has been adopted in [71, 205, 206, 207] in a trajectory
tracking control problem applied to a helicopter, where the PCH technique has
been employed in both the inner and outer control loops. As a result, the
interaction between adaptive elements in the outer loop and the characteristics
of the inner loop can also be avoided. The same approach has been utilized
in [208] to control the trajectory of a ducted-fan VTOL UAV. Further, PCH
has been employed in [209] to overcome actuators’ nonlinearities in the landing
control of a fixed-wing aircraft.
3.6.2. Employment of a modified tracking error
Another effective approach to handle different types of input constraints
with less restrictive assumptions is to introduce an auxiliary state variable cor-
responding to a filtered version of the effect of input constraints. More precisely, consider the dynamic model (1). Suppose that the real system input ($u$) is obtained as $h(u_c)$, where $u_c$ and $h$ represent the computed (desired) control command and a known nonlinear function, respectively. Notice that $h(\cdot)$ can represent different types of input nonlinearities such as the saturation function, dead-zone nonlinearity, etc., or user-defined filters [210] to generate feasible control commands according to the physical constraints of the system. Now, we
define an auxiliary variable $\gamma$ as follows:
$\dot{\gamma} = -k\gamma + B(x)\delta u, \quad (125)$
where $\delta u = u - u_c$. Accordingly, a modified tracking error can be defined as
$z = x - x_d - \gamma = e - \gamma. \quad (126)$
Notice that, in the absence of input constraints, $\gamma$ tends to zero, and so the introduced modified tracking error reduces to the real tracking error. Besides, the introduced modified tracking error has a similar formulation to the compensated tracking error used in the command filtered backstepping control (Section 2.3.2). Considering the following definitions,
$u_c = B(x)^{-1}\left(-F(x) - \hat{\Delta} + \dot{x}_d - ke\right), \quad (127)$
$\Delta = W^T\mu(x) + \varepsilon, \quad \hat{\Delta} = \hat{W}^T\mu(x), \quad (128)$
$\dot{\hat{W}} = \Gamma\mu z^T, \quad (129)$
and by defining a Lyapunov function as
$V = \frac{1}{2}z^Tz + \frac{1}{2}\,\mathrm{tr}\left(\tilde{W}^T\Gamma^{-1}\tilde{W}\right), \quad (130)$
one can prove the boundedness of $z$. Thus, assuming that $B(x)$ and $\delta u$ are also bounded (the boundedness of $\delta u$ is a consequence of the system controllability), it is easy to see that $\gamma$ is also bounded, thereby resulting in a bounded real tracking error ($e$). Besides, even if the system controllability is lost at some time intervals, the updating rule of the NN is still stable thanks to the utilization of the bounded term $z$ rather than the real tracking error in (129).
By comparing the above-mentioned approach with the PCH technique, it
can be found that both methods attempt to eliminate the effects of input con-
straints from the tracking error that is involved in the updating rule of the NN
parameters, while the employment of the low-pass filter (125) in the current
scheme relaxes the necessary assumption on the residual term induced by input
constraints.
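A minimal numerical sketch of (125)-(129) is given below for a scalar plant $\dot{x} = u + \Delta(x)$ with input saturation; the RBF centers, gains, saturation level, and the uncertainty $\Delta(x) = 0.5\sin(x)$ are all illustrative assumptions.

import numpy as np

# Sketch of (125)-(129) on a scalar plant xdot = u + Delta(x) with
# Delta(x) = 0.5*sin(x) and input saturation; B = 1.
dt, k, Gamma, u_max = 0.01, 3.0, 20.0, 2.0
centers = np.linspace(-3.0, 3.0, 7)
mu = lambda x: np.exp(-(x - centers) ** 2)          # RBF regressor mu(x)

x, gamma_v = 0.0, 0.0
W_hat = np.zeros_like(centers)

for i in range(5000):
    t = i * dt
    x_d, xdot_d = np.sin(0.5 * t), 0.5 * np.cos(0.5 * t)
    e = x - x_d
    z = e - gamma_v                                  # modified tracking error (126)
    Delta_hat = W_hat @ mu(x)
    u_c = -Delta_hat + xdot_d - k * e                # control command (127)
    u = np.clip(u_c, -u_max, u_max)
    du = u - u_c
    # plant, auxiliary filter (125), and NN update (129)
    x += dt * (u + 0.5 * np.sin(x))
    gamma_v += dt * (-k * gamma_v + du)
    W_hat += dt * Gamma * mu(x) * z

print(f"final |e| = {abs(e):.4f}")

Note that the NN update is driven by the modified error $z$, so it remains well behaved during saturation, while the auxiliary filter state $\gamma$ absorbs the effect of $\delta u$.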
Such an approach has been employed in [124, 125] in the framework of the
command filtered backstepping control. The same technique has been adopted
in [117] to control the longitudinal mode of an HFV using a DSC design. Second-
order filters have been utilized in these papers ($h(\cdot)$ is defined as a linear second-
order transfer function) to deal with the magnitude, rate, and bandwidth limits
of the control commands. Notably, as discussed in [124], the constraints on the
system states can also be similarly taken into account by filtering the virtual
control commands in the backstepping control. Modified tracking errors have
also been utilized in [141] and [186], respectively, in a backstepping control and
an SMC to deal with input saturation, where the designed controller in [186]
has been applied to the longitudinal model of an air-breathing flexible HFV.
Similarly, a modified tracking error has been adopted in [211] to tackle input
saturation in a backstepping control scheme applied to the longitudinal dynamic
model of an HFV considering additive faults, which have been estimated and com-
pensated as a disturbance term using a NN. In addition, an analogous approach
has been employed in [127] in a DSC design to control the longitudinal dynamic
model of a morphing aircraft in the presence of input saturation. Besides, a
somewhat similar scheme, borrowed from [212], has been employed in [191] to
deal with partial state constraints in an integrated guidance and control design
for a skid-to-turn missile using DSC. More precisely, an auxiliary state variable $\gamma$ has been defined in [191] based on $\delta u$, where $\gamma$ has been involved in the desired
virtual control command instead of the tracking error, while the given stability
analysis in the paper requires some revision.
3.6.3. Neuro-predictive control
Model predictive control (MPC) is an advanced control method that can
satisfactorily deal with input, state, and output constraints. More precisely,
an optimization problem is constructed to minimize the tracking error within a
prediction horizon, as well as the control effort within a control horizon, while
considering the system constraints. The optimization problem is solved at each
time step. The first element in the computed control sequence is applied to
the system, and the entire process is repeated in future steps. Despite the
numerous advantages of MPC in dealing with nonlinear, MIMO, and constrained
system dynamics, there are significant concerns regarding the stability analysis
of the system and the high computational burden of MPC. The stability of the
closed-loop systems can be ensured using terminal costs and terminal constraints
[201]. However, such stabilizing terminal conditions can make the optimization
problem infeasible. In this regard, the recursive feasibility problem has been
extensively addressed by researchers to provide a feasible control design with
guaranteed stability [213, 214]. On the other hand, different practical MPC
schemes have been introduced in the literature to provide a computationally
efficient control system for real applications [215].
Concerning the application of MPC in IFCSs, it should be noted that NNs
can be adopted in the framework of MPC in different ways. A straightforward
way is the employment of a (typically recurrent) NN to learn the system dy-
namics as a prediction model and to utilize it in the structure of MPC. A NARX
NN, with an RLS optimization-based online adaptation, has been used in [216]
as the prediction model of a 6-DOF F-16 fighter aircraft, and afterward, it has
been incorporated in an MPC to control the vehicle’s attitude considering in-
put constraints. In [217], an adaptive feedforward NN has been employed to
estimate the translational acceleration of a fixed-wing aircraft in a moving time
window, where the identified model has been adopted in an MPC-based trajec-
tory tracking scheme in the presence of input constraints and model uncertain-
ties. However, the closed-loop stability has not been analyzed in these studies
owing to the complicated structure of the proposed nonlinear MPC. Multiple
model-based MPC using a set of local NARX NNs as the prediction model of
the system has also been introduced in [200], where both the system constraints
and actuator faults have been considered in the control design process.
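The receding-horizon mechanics can be sketched as follows; the one-hidden-layer predictor stands in for a trained NARX NN (its weights are random here), and the horizon, weights, and bounds are illustrative. For brevity the "plant" is taken to be the model itself, so the sketch only demonstrates the optimization loop.

import numpy as np
from scipy.optimize import minimize

# Sketch of a neuro-predictive controller: a one-hidden-layer model
# y(k+1) = f(y(k), u(k)) serves as the MPC prediction model.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)
W2 = rng.normal(size=8) / 8.0

def nn_step(y, u):
    h = np.tanh(W1 @ np.array([y, u]) + b1)
    return y + 0.1 * (W2 @ h)            # residual form of the predictor

def mpc_cost(u_seq, y0, y_ref, r=0.01):
    y, J = y0, 0.0
    for u in u_seq:
        y = nn_step(y, u)
        J += (y - y_ref) ** 2 + r * u ** 2
    return J

def mpc_control(y0, y_ref, horizon=10, u_max=1.0):
    res = minimize(mpc_cost, np.zeros(horizon), args=(y0, y_ref),
                   bounds=[(-u_max, u_max)] * horizon, method="SLSQP")
    return res.x[0]                      # apply only the first command

y = 0.0
for k in range(50):
    u = mpc_control(y, y_ref=0.5)
    y = nn_step(y, u)                    # here the "plant" is the model itself
print(f"output after 50 steps: {y:.3f}")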
On the other hand, regarding model-based approaches, a linear MPC has
been proposed in [218], where the linearization error and unmodeled dynamics
have been estimated by a feedforward NN in an offline identification problem.
The designed control system has been employed in the altitude control of a
helicopter, while the estimation error of the NN has not been considered in
the design process, and the closed-loop stability has not been analyzed. In
addition, a fault-tolerant MPC has been introduced in [195], where an OS-ELM-based algorithm has been adopted to identify actuator faults. Also, the input
constraints have been considered in the given design, and the estimation error of
the identification block has been compensated by a DO. Further, the closed-loop
stability has been proved using terminal constraints, while there are concerns
regarding the feasibility problem.
3.6.4. Using Nussbaum function
In [219], Nussbaum functions have been employed in a backstepping control
scheme to overcome the input saturation. To this end, the saturation function
is approximated by a smooth function g(v), and the approximation error is in-
cluded in an unknown disturbance term. Subsequently, a Nussbaum function
has been utilized to handle $\partial g/\partial v$, which appears in the last step of the back-
stepping control design. However, there are concerns about the stability analysis
of the proposed approach given in the paper. To be more precise, although the
Input-to-State Stability (ISS) assumption has been employed in the paper, the
boundedness of the introduced Lyapunov function has been proved considering
the input saturation without using the ISS condition, which is not a sound conclusion. Based on such a method, a DSC has been proposed in [118] to control
the attitude of an NSV considering the input saturation and external distur-
bances. Surprisingly, there is no assumption on the stabilizability of the system
to ensure closed-loop stability in the presence of input saturation. Apparently,
this is due to the employment of the aforementioned theorem in [219]. Sim-
ilarly, considering a more stringent problem, a backstepping control has been
developed in [177] for a SISO model of a helicopter to control the pitch angle
of the vehicle in the presence of input and output constraints. Neural networks
have been employed to identify model uncertainties, while disturbance observers
attempt to compensate for unknown external disturbances. Again, Nussbaum
functions have been used to deal with the input saturation, where there is no
assumption on the stabilizability of the air vehicle. In this regard, further in-
vestigations should be conducted by researchers to evaluate the application of
the Nussbaum function in the control of constrained systems. Nevertheless, similar to
the discussion presented in Section 3.3, Nussbaum functions have been success-
fully adopted in [174] to deal with dead-zone input nonlinearity as an unknown
control gain function.
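For reference, a classical Nussbaum function is $N(\zeta) = \zeta^2\cos(\zeta)$. The toy sketch below stabilizes a scalar plant $\dot{x} = ax + bu$ whose control direction $\mathrm{sign}(b)$ is unknown to the controller, using $u = N(\zeta)x$ with $\dot{\zeta} = x^2$; all numerical values are illustrative.

import numpy as np

# Toy Nussbaum-gain adaptive law: the controller never uses sign(b).
dt, a, b = 1e-3, 1.0, -0.8          # note: negative (unknown) control gain
x, zeta = 1.0, 0.0

for _ in range(20_000):
    N = zeta ** 2 * np.cos(zeta)    # Nussbaum function
    u = N * x
    x += dt * (a * x + b * u)
    zeta += dt * x ** 2

print(f"x = {x:.4f}, zeta = {zeta:.3f}, N(zeta) = {zeta**2*np.cos(zeta):.3f}")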
3.7. Consideration of state/output constraints
As discussed previously, the use of modified tracking errors and the MPC
framework can be beneficial in dealing with state constraints, as well. Mean-
while, there are other approaches in the literature to cope with state/output
constraints in the structure of IFCSs. The most commonly used method for
this purpose is the employment of Barrier Lyapunov Functions (BLFs). A bar-
rier function is defined as a function, f(z), which tends to infinity as its variable,
z, tends to a predefined bound [220]. Accordingly, considering a desired bound
$k_b$ for the tracking error $e = x - x_d$, a BLF can be defined as follows [221]:
$V_0 = \frac{1}{2}\ln\frac{k_b^2}{k_b^2 - e^Te}, \quad (131)$
which is a positive definite function for $\|e\| < k_b$ (it is assumed that $\|e(0)\| < k_b$). Thus, considering the dynamic model (1) and using the same NN function
approximation as given in Section 2.1, a Lyapunov function can be defined for
the system as
$V = V_0 + \frac{1}{2}\,\mathrm{tr}\left(\tilde{W}^T\Gamma^{-1}\tilde{W}\right). \quad (132)$
This results in the following equation.
$\dot{V} = \frac{e^T}{k_b^2 - e^Te}\left(F(x) + B(x)u + \Delta - \dot{x}_d\right) + \mathrm{tr}\left(\tilde{W}^T\Gamma^{-1}\dot{\tilde{W}}\right). \quad (133)$
Consequently, using the control command (4) and by defining the following
updating rule,
$\dot{\hat{W}} = \Gamma\mu\frac{e^T}{k_b^2 - e^Te}, \quad (134)$
we have:
$\dot{V} = \frac{e^T}{k_b^2 - e^Te}\left(-k_1e + \varepsilon\right), \quad (135)$
which ensures the negative definiteness of $\dot{V}$ for $\|k_1e\| > \|\varepsilon\|$, thereby guaranteeing the satisfaction of $\|e\| < k_b$ (assuming that $k_b > \|\varepsilon/k_1\|$).
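A minimal sketch of (131)-(134) on a scalar plant is given below; the barrier bound $k_b$, gains, RBF regressor, and the uncertainty $\Delta(x) = 0.3\cos(2x)$ are illustrative assumptions.

import numpy as np

# Sketch of the BLF-based law (131)-(134) on a scalar plant
# xdot = u + Delta(x), Delta(x) = 0.3*cos(2x); B = 1.
dt, k1, kb, Gamma = 0.001, 4.0, 0.2, 30.0
centers = np.linspace(-2.0, 2.0, 7)
mu = lambda x: np.exp(-(x - centers) ** 2)

x, W_hat = 0.1, np.zeros_like(centers)
max_abs_e = 0.0

for i in range(20_000):
    t = i * dt
    x_d, xdot_d = np.sin(t), np.cos(t)
    e = x - x_d
    u = -W_hat @ mu(x) + xdot_d - k1 * e            # control command
    # BLF-weighted adaptation (134): the factor grows as |e| -> k_b,
    # pushing the error back inside the barrier
    W_hat += dt * Gamma * mu(x) * e / (kb ** 2 - e ** 2)
    x += dt * (u + 0.3 * np.cos(2 * x))
    max_abs_e = max(max_abs_e, abs(e))

print(f"max |e| = {max_abs_e:.4f} (barrier k_b = {kb})")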
Such a control formulation can also be employed in the backstepping con-
trol design to impose both the state and output constraints on the controlled
system. In [138], a BLF has been adopted in a backstepping control scheme in
the position control loop to keep the trajectory tracking error of a quadrotor
UAV in a desired bound. Similarly, a BLF has been utilized in [177] within a
backstepping controller to control the pitch angle of a 3-DOF helicopter con-
sidering output constraints. The constraint on the angle of attack (AOA) has
also been dealt with by a BLF in [83] in a backstepping controller applied to
the longitudinal mode of an HFV. In addition, in [127], both the velocity and
altitude constraints have been considered in a backstepping design using BLFs,
where the designed control system is applied to the longitudinal dynamic model
of a morphing aircraft.
As discussed in [116], the satisfaction of output constraint using the BLF is
achieved at the expense of excessive control effort in the case of approaching the
tracking error to the boundaries of the permissible region. Accordingly, there
is a trade-off between choosing a narrow range for the outputs’ tracking error
and reducing the control effort. More specifically, as a typical BLF imposes
a constant upper bound on the system output, it may lead to large control
inputs at initial times. An asymmetric BLF with time-varying bounds has
been employed in [116] to deal with time-varying output constraints in which
the constant parameter kbin (131) is substituted by a function of time. The
introduced scheme has been utilized in a DSC design in case of the attitude
control of a multi-rotor UAV.
Another effective approach to tackle output constraints using a time-varying
funnel-like bound is known as the funnel control. The key point of the funnel
control is to construct a time-varying gain to control a dynamic system in such
a way that the (norm of the) tracking error falls within a funnel boundary $1/\varphi(t)$, where $\varphi(t)$ is a continuous bounded function [222]. To be more specific, the funnel control is somewhat similar to the BLF-based approach, where the Lyapunov function (131), in the case of a single-output system, is changed to
$V_0 = \frac{1}{2}\left(\frac{e}{\Phi(t) - |e|}\right)^2, \quad (136)$
where $\Phi(t) = 1/\varphi(t)$. Such an approach has been employed in [133] in a back-
stepping design to deal with both the velocity and altitude constraints in the
case of the longitudinal mode of an air-breathing HFV with a nonaffine model.
Similarly, a Lyapunov function has been introduced in [147] as
$V_0 = \frac{1}{2}\tan^2\left(\frac{\pi e}{2\xi(t)}\right), \quad (137)$
where ξ(t) represents a funnel-like function. This method has been utilized in
[147] to control an HFV using a typical backstepping control in the presence
of external disturbances and actuator faults. Although such approaches suffer
from the same issue as the BLF scheme, i.e. the excessive control effort in the
vicinity of the permissible output boundaries, the initial large control inputs
can be avoided due to the employment of a funnel boundary.
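For illustration, a common choice of funnel boundary is an exponentially shrinking bound; the short sketch below evaluates such a $\Phi(t)$ and the scaled error appearing in (136) along a sample error trajectory (all numbers are illustrative).

import numpy as np

# Illustrative funnel boundary: Phi(t) decays from a loose initial bound to
# a tight steady-state bound; the scaled error of (136) stays finite only
# while |e(t)| remains strictly inside the funnel.
def Phi(t, phi_inf=0.05, phi_0=1.0, decay=1.5):
    return phi_inf + (phi_0 - phi_inf) * np.exp(-decay * t)

t = np.linspace(0.0, 5.0, 6)
e = 0.4 * np.exp(-t)                         # a sample error trajectory
scaled = e / (Phi(t) - np.abs(e))            # transformed error in (136)
for ti, ei, si in zip(t, e, scaled):
    print(f"t={ti:.1f}  |e|={abs(ei):.3f}  Phi={Phi(ti):.3f}  e/(Phi-|e|)={si:+.3f}")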
3.8. Self-organizing neural networks
Although due to the universal approximation property, NNs (or FNNs) can
approximate almost all nonlinear functions with an acceptable estimation ac-
curacy, the determination of the appropriate number of hidden nodes (or fuzzy
rules) in the network is not an easy task. In addition, the development of a
variable structure NN rather than a fixed-structure NN (with only variable pa-
rameters) provides greater power to deal with time-varying characteristics of
dynamic systems. Several self-organizing NNs have been introduced in the lit-
erature to deal with such issues. Further, a self-organizing FNN can eliminate
the requirement for prior knowledge about the system [223]. Typically, a set of
growing and pruning algorithms are defined in a self-organizing network to add
or remove hidden nodes to (/from) the NN when necessary. As a result, a set of
modifications may be required in the network’s parameters (at the time of the
change in the network’s architecture) to ensure the continuity of the network
output.
As a traditional approach in this field, Minimal Resource Allocation Network
(MRAN) was introduced in [224], which has been developed based on RBFNNs.
The growing phase in MRAN is activated if 1) the incoming data is far away
from the center of existing hidden nodes, 2) the estimation error in the current
step is larger than a predefined threshold, and 3) the root-mean-square estimation error
over a moving window is larger than a predefined threshold. Such an approach
can be considered as a clustering problem. In this regard, the center of the
newly added node is set to the last incoming data, while the output weight of
that neuron is equal to the current estimation error of the network. On the
other hand, a hidden node is pruned from the network architecture if the nor-
malized output of that neuron becomes less than a predefined threshold in a
specific number of consecutive steps. In addition, the network parameters are
trained using either a Least Mean Squares (LMS) optimization or an EKF al-
gorithm. An extension to MRAN, called Extended MRAN (EMRAN), has also
been introduced in [225] in which only the parameters of the nearest neuron
to the current input data are updated at each step. This leads to a significant
reduction in the online computational burden of the algorithm.
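The growth logic can be sketched schematically as follows; the thresholds, window length, and target function are illustrative placeholders, and the weight adaptation (LMS/EKF) and pruning steps of a full MRAN are omitted.

import numpy as np

# Schematic MRAN-style growth test for an RBF network (criteria 1-3 above).
class MRANSketch:
    def __init__(self, dist_th=0.5, err_th=0.1, rms_th=0.08, window=20):
        self.centers, self.weights = [], []
        self.dist_th, self.err_th, self.rms_th = dist_th, err_th, rms_th
        self.recent_errors = []
        self.window = window

    def predict(self, x):
        if not self.centers:
            return 0.0
        phi = np.exp(-np.array([(x - c) ** 2 for c in self.centers]))
        return float(np.dot(self.weights, phi))

    def step(self, x, y):
        e = y - self.predict(x)
        self.recent_errors = (self.recent_errors + [e])[-self.window:]
        rms = np.sqrt(np.mean(np.square(self.recent_errors)))
        dist = min((abs(x - c) for c in self.centers), default=np.inf)
        if dist > self.dist_th and abs(e) > self.err_th and rms > self.rms_th:
            # grow: center at the newest sample, weight equal to the error
            self.centers.append(x)
            self.weights.append(e)
        # (a full MRAN would also update weights via LMS/EKF and prune
        # persistently insignificant neurons; omitted for brevity)

net = MRANSketch()
for x in np.random.uniform(-2, 2, 300):
    net.step(x, np.sin(2 * x))
print("hidden nodes allocated:", len(net.centers))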
Such a learning strategy has been adopted in several studies. In [91], an MRAN-aided $H_\infty$ control is incorporated in an auto-landing control problem of a conventional air-
craft considering external disturbances and actuator faults. The NN, which was
trained using the FEL method, augments the control command of the baseline
controller. Similarly, an EMRAN-aided controller has been proposed in [92] to
control a fighter aircraft in the landing phase considering actuator faults and
severe winds, where the NN attempted to learn the inverse dynamics model
of the system using a FEL scheme. However, the closed-loop stability has not
been analyzed in these two papers. In a similar manner, EMRAN has been
adopted in [226] to augment a baseline controller, combined with an SMC to
ensure the closed-loop stability, where the designed controller has been applied
to an auto-landing problem considering actuator faults and severe winds.
The ambiguity in how to determine the parameters employed in an MRAN is a challenging issue, since there is no explicit relationship between the design parameters and the estimation error of the NN. As an alternative, the concept
of the significance of a neuron has been employed in [227] to provide more ef-
ficient growing and pruning rules for RBFNNs. The significance of a neuron
is defined as the average of its output over all the input samples it has seen.
Accordingly, a new neuron is added to the network only if its significance is
greater than a chosen learning accuracy, while a neuron is pruned if its signifi-
cance becomes less than the learning accuracy. The main concern about such an
approach is the complexity of computing the significance of a neuron, which has
been determined in [227] assuming a uniform distribution for the input data.
Such a concept has been extended in [96] to develop the growing and pruning
rules within the framework of a Generic Evolving Neuro-Fuzzy Inference System
(GENEFIS), which was first introduced in [228]. An $\epsilon$-completeness criterion
has also been utilized in the paper to add a new rule when a new incoming
sample cannot be covered by any existing rules. According to this criterion,
which has been introduced in [229], the firing strength of at least one fuzzy rule corresponding to each data point in the operating range should not be less than $\epsilon$.
Subsequently, to update the antecedent parameters of the fuzzy rules, the Gen-
eralized Adaptive Resonance Theory+ (GART+), which uses the Bayes decision
theory, has been first employed to determine the winning rule corresponding to
each newly added data, and then, a vigilance test has been conducted to in-
vestigate the capability of the winning rule to accommodate the newest data.
Further, an SMC-based approach has been adopted to train the consequent pa-
rameters of the fuzzy rules. Alternatively, hyperplane-based clusters have been
employed in [97], which removes the antecedent parameters in the proposed
neuro-fuzzy system. More precisely, the membership function of each fuzzy rule
has been defined according to the distance between the current data point and
the corresponding hyperplane, where the hyperplane parameters are considered
as the consequent parameters of the network. The main idea in the introduced
self-organizing network has been borrowed from [223] in which a Parsimonious
Learning Machine (PALM) has been developed for data regression. However,
different from the basic PALM, which requires various predefined thresholds, the
growing and pruning mechanisms in [97] have been developed using the concept
of bias-variance. Accordingly, by defining the expected squared tracking error of
the system as the Network Significance (NS), the NS has been derived as a sum
of the bias and variance of the plant’s expected output. Then, a high variance of
the system outputs has been interpreted as the high complexity of the network,
which in turn activates the rule pruning mechanism. On the other hand, the
rule growing algorithm is activated in the presence of high output bias, which
is induced by an oversimplified network. Finally, similarly to [96], an SMC-
based training method has been adopted to update the consequent parameters
of the network. The proposed schemes in both the above-mentioned research
[96, 97] have been utilized in the altitude and attitude control blocks of a hexa-
copter and a flapping-wing micro aerial vehicle as an aid to a baseline controller.
Similarly, a self-constructing FNN has been introduced in [152], where the dis-
tance between the incoming data and existing clusters has been considered as
a measure to add a new rule, while the distance between the existing clusters
has been analyzed to prune insignificant rules. The obtained self-constructing
FNN has been utilized to approximate the upper bounds of model uncertain-
ties, while it has been employed in an FTC applied to a longitudinal model of
a fixed-wing aircraft. Notice that, despite the development of various effective
self-organizing NNs in the literature, significant concerns still remain about the
optimality of the network’s architecture. As a consequence, the development of
a truly generic approach to construct an optimal network structure depending
on different characteristics of the obtained data from a plant is an important
research direction, which must be addressed in future studies as a critical step
in developing a fully autonomous control scheme.
Finally, concerning multiple-model-based structures, a self-organizing multi-
model ensemble has been given in [230], in which a new local NN is added to
the proposed multi-model structure if the estimation error of the entire model
exceeds a predefined threshold. In addition, a local NN is considered as an
insignificant model and pruned from the entire model if the normalized weight
of the model in the entire scheme becomes less than a predefined threshold. The
proposed approach has been employed in the paper to identify the time-varying
dynamic model of a fixed-wing aircraft at different flight conditions.
3.9. Concerns with air vehicle’s characteristics
The position and attitude of an air vehicle can be determined using the kine-
matic equations based on the translational and angular velocities, respectively.
As a result, the position and attitude can be controlled indirectly in a back-
stepping scheme, where in the first step, the position (or attitude) controller
is designed, and the second step deals with the control of the translational (or
angular) velocity. Besides, the measurement noises or the simplifications in the
kinematic equations, which appear in the first step of the backstepping con-
troller, can be estimated and compensated (as an uncertain term) by NNs [116].
On the other hand, the dynamic equations of an aerial vehicle can be gen-
erally derived using either the Newton-Euler or Euler-Lagrange methods. Re-
garding conventional multi-rotor VTOL UAVs (with no tilt-rotor), the system
dynamics are divided into the rotational and translational equations, where, due
to the under-actuated dynamics of the vehicle, the translational motion (typ-
ically in the horizontal plane) should be indirectly controlled by the vehicle’s
attitude. Thus, a straightforward control method to deal with such a dynamic
model would be a multi-loop control design wherein the desired attitude, which
is controlled by the inner loop, is determined using the translational dynamics
in the outer control loop [32, 58, 60]. Such a framework may also be expressed
within a backstepping control scheme. More precisely, the first step of the
backstepping controller would deal with the translational dynamics, while the
attitude dynamics have been addressed in the second step [138, 162]. In this
regard, the desired attitude is indirectly determined (usually by employing the
inverse kinematics method) to provide the desired forces required in the outer
loop (or in the first step of the backstepping controller) [231, 95, 151]. Accord-
ingly, owing to the complicated relationship between the vehicle’s attitude and the translational dynamics, there may be a requirement for a control law (in the
inner loop) that can guarantee the asymptotic stability of the inner loop (rather
than a bounded tracking error). Otherwise, the possible tracking error caused
by the inner control loop should be considered in the outer loop, while it can
complicate the stability analysis. Further, as discussed earlier, uncertain forces
and moments induced by uncertain dynamics or external disturbances in the
translational and rotational dynamic models can be estimated by distinct NNs.
Besides, a similar framework can also be employed in the case of a helicopter.
In addition to the inverse kinematics method to determine the desired attitude
(or the desired angular velocity) [206, 232], it is also possible to define a vir-
tual control input for the attitude dynamics in a backstepping control scheme
[180]. Such a virtual control would be computed according to the translational
dynamics of the vehicle, which have been addressed in the previous steps of
the backstepping design, by taking into account the kinematic equations (cor-
responding to the rotation matrix or the quaternion) in such a way that the
closed-loop stability can be analyzed based on the Lyapunov stability theorem
[233]. Again, NNs can be adopted to compensate for model uncertainties, exter-
nal disturbances, or the inversion model error (in the pseudo-control strategy)
[101] in each loop. By employing a backstepping scheme in [110] for the attitude
control of a helicopter, the uncertain control gain matrix ($g_i$) in the dynamic model has been estimated by a distinct NN, while the extra terms due to the minimum estimation error of that NN ($\varepsilon$) have been considered in the last step
of the backstepping design corresponding to the actuator dynamics. However,
the proposed design suffers from the chattering phenomenon. Further, the issue
of unknown inertia matrix has been dealt with in [168] wherein an additional
adaptive rule has been defined to estimate it (while taking advantage of the
Cholesky decomposition).
A somewhat similar control framework can also be designed in the case of
conventional fixed-wing aircraft. A backstepping design has been employed in
[125], where the desired trajectory is first transformed to the desired velocity,
flight path, and heading angles using the kinematic equations and subsequently,
they are converted to the desired thrust force and angular velocity using dynamic
equations. Finally, in the last step, the control-surface deflections have been
determined according to the desired angular velocity, where unknown forces and
moments have been estimated using the FEL method. In such a framework, the
outer loop is typically considered as the guidance loop, while the inner loops are
known as the main control system. A similar approach has been utilized in [200],
where the desired trajectory is first transformed to the desired Euler angles (and
subsequently to the desired angular velocity) in the guidance loop, while in the
inner loop, the actuator deflections have been determined based on the desired
angular velocity. Another approach is to decouple the control problem of the
longitudinal and lateral-directional modes of a fixed-wing air vehicle (using some
simplifications) and address them separately [98]. In this regard, several studies
have addressed only one of these two subtasks and skipped the other part [152].
In the case of HFVs, almost all of the above-mentioned papers have investi-
gated only the longitudinal model of the vehicle, where the velocity and altitude
subsystems are typically decoupled, as well. To this end, the effect of thrust
force on the altitude subsystem should be neglected, and the change rate of the
velocity is considered much smaller than that of the altitude (known as time-
scale decomposition) [136]. As a result, the velocity subsystem is obtained as
a simple SISO model with a single state, while the altitude subsystem includes
four state variables including the altitude (h), the Flight Path Angle (FPA, γ),
the pitch angle (θ), and the pitch rate (q). Further, the consideration of flexible
modes results in introducing additional states, which are not directly involved
in the control design process [117, 114, 174], and they may be considered as
a disturbance compensated by a DO. In this regard, the wind effect, which
results in an excessive angle of attack, can also be considered as an unknown
disturbance [81]. In such a framework, the main system inputs are the throttle
setting and the elevator deflection, which directly influence the velocity and al-
titude subsystems, respectively. Other system inputs such as the diffuser area ratio and the canard deflection can also be considered in the design, while they are typically assumed to be constant or a linear function of other system inputs
[114]. Besides, a coordinate change has been employed in [82] to deal with the
non-minimum phase behaviour of the attitude subsystem (due to the coupling
between the lift force and the elevator deflection) that is typically eliminated by
the canard deflection as an additional control input in most studies. Knowing
that $\dot{h} = V\sin\gamma$, a typical method to design a controller for the altitude subsystem is incorporating an intermediate PID controller between $h$ and $\gamma$ to derive the desired FPA [115], and subsequently transforming the remaining system (including $x = [\gamma, \theta, q]^T$) into a strict feedback form, which can be controlled
using a backstepping design [135]. In this regard, both the direct and indirect
adaptive backstepping designs can be used to control such a strict feedback
system [33]. On the other hand, the backstepping design can be avoided by de-
riving a normal output-feedback model (in the case of continuous-time systems)
[131, 132] or a prediction model (in the discrete-time domain) [136]. In such
a circumstance, it may be possible to use only one NN to compensate for un-
certain terms in the control command. Such a normal feedback form has been
employed in a pseudocontrol strategy in [99] to deal with nonaffine dynamics
of the vehicle. There are also a few works in the literature that have addressed
the coupled dynamics of the velocity and attitude subsystems. A combination
of singular perturbation theory and implicit function theorem has been incor-
porated in [109] to deal with the longitudinal model of an HFV in a unified
manner, while conservative assumptions on the dynamic model are required in
the proposed control scheme. In a more effective way, a neural backstepping
controller has been proposed in [189] for a MIMO model of an HFV considering
the coupling between the velocity and attitude subsystems, where a combined
adaptive design and a DO (as discussed in Section 3.4.2) has been adopted.
4. Towards truly model-free control systems
4.1. Neural network-based system identification
As discussed earlier, in the basic indirect FEL-based control, a nominal
model of the system is derived and subsequently, a set of NNs are employed to
identify model uncertainties and external disturbances as a single term [209].
This type of dynamic modeling leads to a conservative control design with reduced efficiency. To be more precise, most difficulties in modeling a complex
dynamic system may originate from the existence of hidden states in the sys-
tem, not from the model uncertainties caused by a lumped disturbance [234].
In addition, in many of the above-mentioned papers, an initial controller was
designed based on a nominal model of the system, and then a control augmen-
tation was proposed considering instantaneous model uncertainties. However,
in the case of severe structural damages or significant dynamic changes, such
an approach may lead to excessive control effort or even closed-loop instability
[182].
In this regard, the development of a valid dynamic model of the system is a
crucial task in the control theory. Typically, this is performed by two different
approaches: the first-principles modeling and the system identification. Obtain-
ing an acceptable dynamic model of an air vehicle using the first-principles mod-
eling requires detailed information about the aerodynamic and propulsive forces
and moments acting on the vehicle, which may not be practical for complicated
systems. Alternatively, system identification attempts to fit a mathematical
model on the obtained system inputs-outputs. It can be effectively employed
to identify the system dynamics, particularly in the case of complex systems.
However, the employment of such a black-box model identification suffers from
the lack of interpretability of the obtained model [36]. There are various studies
in the literature that have demonstrated the superior performance of the inte-
gration of two methods compared to that of only one method [235, 236, 237],
while such an approach remains an open research topic in the framework of
IFCSs [52].
In contrast to the FEL method, there are a variety of intelligent controllers in
the literature that include separate identification and control design processes.
In this regard, the previously mentioned unique capabilities of NNs make them
an ideal candidate to be used in the identification process of such control sys-
tems. Different feedforward and Recurrent Neural Networks (RNNs) have been
employed for this purpose, where the training process of the network would
be performed in the framework of the supervised learning using either offline
or online approaches (or a combination of them). More precisely, the system
identification and the control design processes can be fulfilled sequentially or
simultaneously in an iterative manner, which is also known as iterative learning
control. As a result, the iterative learning control can effectively deal with time-
varying dynamic systems, and consequently, it can be classified as an intelligent
control system, while the employment of a pre-trained NN in the flight control
system may not be considered as an IFCS.
Further, the identification problem using a NN can be considered as an
optimization problem in which the NN’s parameters are determined by minimizing the prediction error of the NN with respect to its adjustable parameters. Therefore, different optimization algorithms (from traditional ap-
proaches such as the gradient-descent method, the Gauss-Newton method, the
Levenberg-Marquardt (LM) method, etc., to heuristic methods such as the ge-
netic algorithm [238]) can be effectively employed to train the NN parameters.
Such optimization algorithms have been thoroughly discussed in the literature
[40], and thus, in the following, only the Online Sequential Extreme Learning
Machine (OS-ELM) method will be briefly introduced as a conventional online
training algorithm in the structure of indirect adaptive flight control systems.
Besides, concerning the network structure, some of the most commonly used
NNs for identifying the system dynamics in the case of an IFCS will be dis-
cussed in the following.
4.1.1. Single-hidden-layer neural networks
As a special case of state-space modeling of a dynamic system, input-output
representation is a simpler popular approach to model nonlinear systems [198].
Using such a formulation, the system output (in the discrete-time domain) can
be represented as follows:
$y(k) = h\left(y(k-1), \cdots, y(k-P), u(k-1), \cdots, u(k-M), d(k)\right), \quad (138)$
where $u$ and $d$ denote the system input and the vector of disturbances, respectively. Also, $P$ and $M$ represent the number of past outputs and inputs employed in the modeling. Here, $h$ is an unknown function, which should be identified. Also, considering (138), the system states are $[y(k), \cdots, y(k-P)]^T$.
In this regard, the assumption on the influence of external disturbances and
noises on the system dynamics results in introducing two conventional model
structures. More specifically, in the presence of the state noise (which is also
known as the equation error), the dynamic model (138) can be simplified as
$y(k) = f\left(y(k-1), \cdots, y(k-P), u(k-1), \cdots, u(k-M)\right) + d(k), \quad (139)$
where $f$ and $d$ represent, respectively, an unknown nonlinear function and an
additive disturbance term. Such a model is known as a Nonlinear Autoregressive
with exogenous inputs (NARX) model, which can be identified by a NARX NN.
According to (139), a NARX NN employs the past measured system outputs
and system inputs as the network inputs. Consequently, a NARX NN can be
considered as a feedforward NN with tapped delay lines. The use of the NARX
structure in training the network is also known as series-parallel identification
[239].
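A minimal sketch of building a series-parallel (NARX) training set from measured data is given below; the orders $P$ and $M$, the toy plant, and the ridge-regularized linear-in-features regressor (standing in for a feedforward NN) are illustrative assumptions.

import numpy as np

# Series-parallel (NARX) regressor: past *measured* outputs and past
# inputs form the model input, as in (139).
def narx_dataset(y, u, P=2, M=2):
    rows, targets = [], []
    for k in range(max(P, M), len(y)):
        rows.append(np.concatenate([y[k - P:k][::-1], u[k - M:k][::-1]]))
        targets.append(y[k])
    return np.array(rows), np.array(targets)

# toy second-order plant with input delay and state noise
rng = np.random.default_rng(1)
u = rng.uniform(-1, 1, 500)
y = np.zeros(500)
for k in range(2, 500):
    y[k] = 1.5 * y[k-1] - 0.7 * y[k-2] + 0.5 * u[k-1] + 0.01 * rng.standard_normal()

X, t = narx_dataset(y, u)
theta = np.linalg.solve(X.T @ X + 1e-6 * np.eye(X.shape[1]), X.T @ t)
print("one-step-ahead RMS error:", np.sqrt(np.mean((X @ theta - t) ** 2)))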
On the other hand, in the presence of the output noise, the system dynamics
model can be formulated as
$x(k) = f\left(x(k-1), \cdots, x(k-P), u(k-1), \cdots, u(k-M)\right), \qquad y(k) = x(k) + d(k). \quad (140)$
Accordingly, the system output at each step is a function of the disturbance
that occurs at the same time step only. Nonlinear Output Error (NOE) NNs
can be utilized to identify such dynamic models, where the NN employs the
past network’s outputs and system inputs as the network input. Thus, an NOE
NN would be a recurrent network, and employing such a scheme in the training
process results in a parallel identification method [240, 241]. Further, a combi-
nation of (139) and (140) can be taken into account to consider both the state
and output noises in the dynamic model, simultaneously. Such a method, which
results in a Nonlinear Autoregressive with Moving average and exogenous in-
puts (NARMAX) model, requires both the past measured system outputs and
model outputs in the dynamic model [240]. Besides, a class of identification
techniques has been introduced in the literature to combine the advantages of
both the parallel and series-parallel approaches, which are typically developed
based on the average of the measured outputs and the predicted outputs.
Concerning the difference between these two types of NNs, the use of series-
parallel identification prevents several complexities of training a recurrent net-
work, thereby guaranteeing the training convergence. In addition, the series-
parallel structure, which is also known as the teacher forcing method, leads to
a faster training speed [242, 216]. On the other hand, parallel identification
suffers from stability problems and complicated training methods [243, 244].
However, it should be noted that a NARX NN can be used only in the case
of the single-step ahead prediction, while in multi-step ahead predictions, there
is a need for an NOE network. Although one can convert the NARX neural
network after the training process to an NOE network, the use of the series-
parallel approach in the training phase may lead to inaccurate predictions for
long prediction horizons [241].
NARX NNs have been used in [198] to identify the nonlinear model of an
F-16 aircraft using a hybrid offline-online training algorithm considering model
uncertainties. The NN’s parameters have been trained using the LM method.
Subsequently, the obtained model has been employed in a fault-tolerant NN-
based adaptive PID attitude control system. A similar identification technique
has been used in [216] for a similar aircraft model, where the identified model
has been utilized in a predictive attitude controller. To train the network param-
eters, an adaptive updating rule with exponential forgetting has been derived
based on a recursive formulation of the Gauss–Newton method, which is similar
to the OS-ELM algorithm introduced in the following.
On the other hand, concerning the training algorithm of a NN in an iden-
tification problem, in contrast to the FEL method, which has been developed
based on the Lyapunov stability theorem, a variety of online training algorithms
have been introduced in the framework of an open-loop identification problem,
which is typically developed based on the minimization of the mean squared
prediction error [40]. As a simple and popular method, OS-ELM, which has
been developed based on the Recursive Least Squares (RLS) optimization [245],
can be effectively employed in online identification problems. The use of such
an approach to identify the system dynamics (based on the RLS optimization)
can result in a significantly better performance (compared to the FEL-based
method) in the structure of the trajectory tracking control of a damaged air-
craft [38]. In the following, a brief description of the OS-ELM algorithm is given.
Extreme learning machine (ELM), which can be considered as a single-hidden-
layer feedforward neural network with random constant weights and biases in
the hidden layer, has been employed in several studies as a part of the control
system. This is due to the simple linear learning method of this type of NNs
in which only the output weights of the NN are trained during the identifica-
tion process [47]. Now, consider an ELM as $f(u) = W^T\mu(u)$ to identify the unknown mapping between the system inputs ($u$) and outputs ($y$) in the case of a SISO system (a similar formulation can be presented for MIMO systems [195]). Considering a set of system inputs-outputs $D = \{(u(k), y(k)) \,|\, k = 1, \ldots, K\}$ with $K$ distinct samples, the introduced ELM can be trained through the data set $D$. As a result, ideally, we should have $\Phi W = Y$, where
$\Phi = \left[\mu(u(1)) \,\cdots\, \mu(u(K))\right]^T, \quad (141)$
$Y = \left[y(1) \,\cdots\, y(K)\right]^T. \quad (142)$
Assuming $K > N$, $\Phi \in \mathbb{R}^{K \times N}$ becomes a non-square matrix. In such cases, the optimal weights of the ELM can be determined using the least-squares optimization as $\hat{W} = \Phi^{\dagger}Y$, where $\Phi^{\dagger} = (\Phi^T\Phi)^{-1}\Phi^T$ denotes the pseudo-inverse of $\Phi$ [246].
However, in the case of online training problems, the incoming data are obtained one by one. Thus, $\hat{W}$ can be updated at each iteration using a recursive formulation as follows:
$\hat{W}(k+1) = \hat{W}(k) + \kappa(k)e(k), \quad (143)$
$P(k+1) = \left(I - \kappa(k)\mu(u(k))^T\right)P(k), \quad (144)$
where
$e(k) = y(k) - \hat{W}(k)^T\mu(u(k)), \quad (145)$
$\kappa(k) = \frac{P(k)\mu(u(k))}{1 + \mu(u(k))^TP(k)\mu(u(k))}. \quad (146)$
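A minimal sketch of the recursion (143)-(146) is given below; the random feature map (the frozen ELM hidden layer), the network size, and the toy SISO target are illustrative assumptions.

import numpy as np

# Minimal OS-ELM sketch implementing the recursion (143)-(146).
rng = np.random.default_rng(2)
N = 20                                   # hidden nodes
A, b = rng.normal(size=(N, 1)), rng.normal(size=N)
mu = lambda u: np.tanh(A[:, 0] * u + b)  # random hidden layer, frozen

W_hat = np.zeros(N)
P = 1e3 * np.eye(N)                      # large initial covariance

for k in range(1000):
    u = rng.uniform(-2, 2)
    y = np.sin(2 * u) + 0.01 * rng.standard_normal()   # unknown SISO map
    phi = mu(u)
    e = y - W_hat @ phi                                  # (145)
    kappa = P @ phi / (1.0 + phi @ P @ phi)              # (146)
    W_hat = W_hat + kappa * e                            # (143)
    P = (np.eye(N) - np.outer(kappa, phi)) @ P           # (144)

u_test = np.linspace(-2, 2, 5)
print(np.round([W_hat @ mu(u) - np.sin(2 * u) for u in u_test], 3))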
As discussed in [35], there is a need for a persistently exciting regressor $\mu(u(k))$ to ensure the convergence of $\hat{W}$ to its optimal value. Various types of OS-ELM
have been proposed in the literature for different identification and control pur-
poses [247, 248]. OS-ELM with constant or variable forgetting mechanism has
been widely utilized by researchers to identify time-varying system dynamics
[249, 250]. Further, the OS-ELM algorithm can be adopted, in a similar man-
ner, to identify the vector of unknown parameters corresponding to linear-in-
parameters model uncertainty in the dynamic model or to relative weights of
local models in a multi-model ensemble. Such an approach has been utilized
in [195] to identify the unknown coefficients corresponding to actuator faults in
the case of a quadrotor UAV. Subsequently, a trajectory tracking method has
been proposed using an acceleration-based model predictive control, which en-
sures the bounded tracking error in the presence of system constraints. Besides,
a hybrid offline-online identification scheme has been presented in [199] for a
generic transport model in the presence of actuator faults. A set of local NARX
NNs has been first trained under specific flight conditions and actuator faults,
and subsequently, they have been aggregated as a single model using a set of
adaptive weights updated using an OS-ELM-like approach. A similar method
has been adopted in [200] to develop a fault-tolerant trajectory tracking control
based on a modified model predictive control. The proposed approach leads
to acceptable trajectory tracking even in the presence of unexpected actuator
faults and flight conditions.
Although, due to the universal approximation property, NNs can estimate al-
most all continuous dynamic systems using a sufficient number of hidden nodes,
increasing the hidden nodes may lead to the overfitting problem [251]. To be
more precise, the generalization capability of NNs in modeling the system dy-
namics is a crucial issue in utilizing them in a wide range of operating conditions,
which are not necessarily covered in the training stage [252]. This, in turn, may
lead to different considerations about the training of a NN, such as employ-
ing PE input signals, selecting appropriate frequency range for input signals
according to dynamic modes of the system, determining optimal network struc-
ture, etc. These concerns have been thoroughly addressed in the field of system
identification [253], which are beyond the scope of this paper.
4.1.2. Deep neural networks
As an alternative, deep NNs utilize more hidden layers rather than increas-
ing the hidden node in a single hidden layer. In this regard, Convolutional
Neural Networks (CNN) can be considered as one of the most important deep
NNs. CNN has a cascade connection structure. Each CNN cell has two lay-
ers: the convolution layer and the sub-sampling layer. Also, the last layer is
fully connected. The output of a CNN can be formulated as y(k) = VΦ(x(k)),
where Φ represents the operation of hidden layers and Vis the weight vector
of the final layer. As discussed in [254], CNN is an extremely powerful tool for
the identification of nonlinear systems. This is due to the following facts: the
convolution operation in CNN is the same as the input-output relation of the
linear time-invariant systems; a CNN employs sparse connectivity and shared
weights, thereby reducing the NN parameters and the risk of the over-fitting
issue; the multi-level pooling results in a robust identification scheme against
the measurement noises. However, despite the above-mentioned characteristics,
few researchers have addressed the development of flight control systems us-
ing a CNN-based identified model. CNN has been utilized in [232] to identify
uncertain terms induced by hidden states, varying inertia, and aerodynamic
disturbances in the dynamic model of a helicopter UAV. More precisely, the
dynamic model consists of a simple nominal dynamic model and a set of CNNs.
A two-step optimization process has been adopted. First, the parameters of a
nominal first-principles-based model have been optimized using the least-squares
method, where model uncertainties have been neglected at this stage. Subse-
quently, the parameters of deep CNNs have been determined in an open-loop
optimization using the Stochastic Gradient-Descent (SGD) method. The dy-
namic model has been trained and validated under different aerobatic maneu-
vers. Afterward, an adaptive backstepping controller has been designed for the
air vehicle which ensures semi-global UUB stability. The use of CNNs in the dy-
namic model to identify different types of model uncertainties results in a less
conservative control system compared to conventional FEL-based controllers,
which attempt to compensate for only bounded uncertain terms.
Moreover, concerning the direct employment of CNNs in the control system,
CNN is an appropriate choice for high-level control schemes (such as localization
and path planning), due to its excellent capability in extracting useful informa-
tion, particularly from images [255, 256, 257].
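A schematic CNN predictor of the form $y(k) = V\Phi(x(k))$ can be sketched as follows (in PyTorch), where $x(k)$ stacks a short window of past inputs and outputs; the layer sizes and window length are illustrative assumptions, not the architecture of [232].

import torch
import torch.nn as nn

# Schematic CNN one-step-ahead predictor y(k) = V * Phi(x(k)).
class ConvPredictor(nn.Module):
    def __init__(self, channels=2, window=32):
        super().__init__()
        self.phi = nn.Sequential(              # Phi: conv + sub-sampling cells
            nn.Conv1d(channels, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Flatten(),
        )
        self.V = nn.Linear(32 * (window // 4), 1)   # final fully-connected layer

    def forward(self, x):                      # x: (batch, channels, window)
        return self.V(self.phi(x)).squeeze(-1)

model = ConvPredictor()
x = torch.randn(8, 2, 32)                      # batch of input/output windows
print(model(x).shape)                          # torch.Size([8])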
4.2. Neuroadaptive optimal control
4.2.1. Optimal control formulation (HJB vs. HJI equations)
The feedback control law may be obtained using an $H_2$ or $H_\infty$ optimal control problem at each time step. To be more precise, considering a nonlinear affine model as $\dot{x} = F(x) + B(x)u$, we can define a cost-to-go function as follows:
$V(x) = \int_t^{\infty} L(x, u)\, d\tau, \quad (148)$
where L(x, u) represents a running cost function. The introduced cost function
is also known as a value function if the running cost L(x, u) is considered as a
reward function (this is the common notation in the framework of reinforcement
learning). Notice that, here, it has been assumed that $x_d = 0$. Thus, in the case of trajectory tracking problems, we should consider the system dynamics as $\dot{e} = F(x) + B(x)u - \dot{x}_d$ and substitute $x(t)$ in (148) by $e(t) = x(t) - x_d(t)$.
Now, defining the Hamiltonian as
$H(x, \lambda, u) = L(x, u) + \lambda^T(t)\left(F(x) + B(x)u\right), \quad (149)$
where $\lambda$ denotes the Lagrange multiplier, the optimal control law can be obtained within the framework of dynamic programming by the following equation [258]:
$0 = \min_u H\left(x, \frac{\partial V^*}{\partial x}, u\right), \quad (150)$
where the superscript $*$ stands for the optimal solution. In the literature, (150) is known as the Hamilton-Jacobi-Bellman (HJB) equation, while, in general, there is no analytic solution for it. It is notable that, in the case of unconstrained linear time-invariant (LTI) systems and using a quadratic running cost $L$, the
HJB equation reduces to the well-known algebraic Riccati equation [259, 260].
Now, using a quadratic running cost as $L(x, u) = x^TQx + u^TRu$, where $Q$ and $R$ denote positive definite matrices, one can obtain the optimal control law as
$u^*(x) = -\frac{1}{2}R^{-1}B^T(x)\frac{\partial V^*}{\partial x}. \quad (151)$
By substituting (151) in the HJB equation (150), it is obtained that:
xTQx 1
4VTBR1BTV+VTF(x) = 0,(152)
where V=∂V
∂x . Accordingly, the optimal cost function is determined by
solving the differential equation (152) considering the boundary conditions, and
subsequently, the optimal control law is computed using (151) at each time.
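To make the LTI special case mentioned above concrete, the following minimal Python sketch (with assumed system and weighting matrices) solves the algebraic Riccati equation to which (152) reduces, and evaluates the optimal control law (151) with $V^*(x) = x^T P x$, i.e. $\nabla V^* = 2Px$:

    import numpy as np
    from scipy.linalg import solve_continuous_are

    A = np.array([[0.0, 1.0],
                  [-1.0, -0.5]])   # assumed internal dynamics F(x) = A x
    B = np.array([[0.0],
                  [1.0]])          # assumed input matrix B(x) = B
    Q, R = np.eye(2), np.array([[1.0]])

    # Solves A^T P + P A - P B R^{-1} B^T P + Q = 0, the LTI form of (152)
    P = solve_continuous_are(A, B, Q, R)

    def u_star(x):
        # (151) with grad V* = 2 P x reduces to the familiar LQR law
        return -np.linalg.inv(R) @ B.T @ P @ x

    print(u_star(np.array([1.0, 0.0])))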
A similar discussion can be provided in the framework of an $H_\infty$ control problem, in which the control objective is to achieve closed-loop stability while attenuating external disturbances. More precisely, consider the nonlinear dynamic model $\dot{x} = F(x) + B(x)u + D(x)w$, where $w$ denotes external disturbances. Accordingly, considering a running cost function $L(x, u, w)$, the optimal control problem can be formulated as [261]
$$0 = \min_u \max_w H\left(x, \nabla V^*, u, w\right), \qquad (153)$$
where
$$H(x, \lambda, u, w) = L(x, u, w) + \lambda^T(t)\left(F(x) + B(x)u + D(x)w\right). \qquad (154)$$
Eq. (153), which is known as the Hamilton-Jacobi-Isaacs (HJI) equation, represents a minimax optimization problem. It can be viewed as a two-player differential game, where the player $u$ attempts to minimize the cost function while the player $w$ tries to maximize it [262]. Again, by defining a quadratic running cost as
$$L(x, u, w) = x^T Q x + u^T R u - \beta^2 w^T P w, \qquad (155)$$
with $\beta$ and $P$ representing, respectively, a positive constant and a positive definite matrix, the optimal control law and the worst-case disturbance can be obtained, respectively, as (151) and
$$w^*(x) = \frac{1}{2\beta^2} P^{-1} D^T(x)\, \nabla V^*. \qquad (156)$$
By substituting (151) and (156) in (153), the HJI equation becomes [263, 262]
$$x^T Q x + \frac{1}{4} (\nabla V^*)^T E\, \nabla V^* + (\nabla V^*)^T F(x) = 0, \qquad (157)$$
$$E = \frac{1}{\beta^2} D P^{-1} D^T - B R^{-1} B^T. \qquad (158)$$
In this regard, $\nabla V^*$ should be determined by solving the HJI Partial Differential Equation (PDE) (157) considering the boundary conditions, and the optimal control law is then computed using (151) at each time step. Unfortunately, finding the solution of the HJ PDEs (152) or (157) is generally not an easy task. Another challenging issue in such optimal control problems is that they require the complete system dynamics model, which may not be available in real applications.
4.2.2. Approximate dynamic programming (continuous-time systems)
Different approaches have been introduced in the literature to provide a nu-
merical approximation for these control problems [260]. These approaches are
typically addressed within the framework of approximate (or sometimes adap-
tive) dynamic programming (ADP) [264, 265, 266]. The principal difference
between such adaptive controllers and the previously proposed control methods
in Section 2 is that here, we attempt to determine the approximate optimal
control law, adaptively, while the previous control methods do not necessarily
satisfy the optimality condition. Policy iteration and Value iteration are the
well-known methods in the literature to determine the approximate solution of
HJ equations [267]. The policy iteration method consists of a policy evaluation
and a policy improvement step at each iteration. In the $i$th iteration, first, the value function $V^{(i)}(x)$ corresponding to the current control law $u^{(i)}(x)$ is computed by solving $H\left(x, \nabla V^{(i)}(x), u^{(i)}(x)\right) = 0$, while in the second step, the control law is updated using (151) (a similar approach can also be taken in the case of the HJI equation). Such an iterative method continues until the convergence of the policy function $u(x)$. As discussed in [268],
the policy iteration algorithm will converge to the optimal solution by having
an initial stabilizing control law (policy). On the other hand, the value iteration
method includes an iterative approach for finding the optimal value function,
and once the optimal value function is determined, the optimal policy can be
explicitly computed using (151) [269]. Unlike the policy iteration, the value iter-
ation does not require an initial stabilizing control law. In a more general view,
however, both methods can be expressed within the framework of the general-
ized policy iteration [269, 270]. The concept of the generalized policy iteration
can be defined as a set of interacting approximate policy evaluation and policy
improvement steps: in the first step, we do not completely evaluate the cost of a given control law, but only update the current cost estimate towards that value; similarly, in the policy improvement step, the control policy is not fully updated to the minimizing policy for the new cost estimate, but only moved towards that policy. Nevertheless, the convergence analysis
of such an ADP scheme, in a general case, is not trivial.
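For the LTI case, the policy iteration just described can be written in a few lines, since policy evaluation reduces to a Lyapunov equation and policy improvement to (151); this is known as Kleinman's algorithm. The matrices and the initial stabilizing gain below are assumed for illustration:

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    A = np.array([[0.0, 1.0], [2.0, -1.0]])
    B = np.array([[0.0], [1.0]])
    Q, R = np.eye(2), np.array([[1.0]])
    K = np.array([[3.0, 1.0]])      # assumed initial stabilizing policy u = -K x

    for i in range(50):
        Acl = A - B @ K
        # Policy evaluation: H(x, grad V_i, u_i) = 0 becomes the Lyapunov equation
        # Acl^T P + P Acl + Q + K^T R K = 0 for the cost matrix of the current policy
        P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
        # Policy improvement, cf. (151): u_{i+1}(x) = -R^{-1} B^T P x
        K_new = np.linalg.inv(R) @ B.T @ P
        if np.linalg.norm(K_new - K) < 1e-10:
            break
        K = K_new

    print(K)   # converges to the LQR gain, i.e. the HJB/Riccati solution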
Owing to the unique capabilities of NNs in learning different nonlinear func-
tions, traditionally, two NNs were employed as the actor and critic network to
approximate the optimal policy and value function, respectively [267]. Such
an approach can be categorized as a Heuristic Dynamic Programming (HDP)
scheme [271, 265]. In the following, we focus on solving the HJB equation, while
a similar discussion can be provided in the case of the HJI equation. Accordingly, we have
$$V(x) = W_v^{*T} \mu_v(x) + \varepsilon_v, \qquad \hat{V}(x) = \hat{W}_v^T \mu_v(x), \qquad (159)$$
$$u(x) = W_u^{*T} \mu_u(x) + \varepsilon_u, \qquad \hat{u}(x) = \hat{W}_u^T \mu_u(x), \qquad (160)$$
where $V(x)$ represents the value function corresponding to $u(x)$, which satisfies $H(x, \nabla V(x), u(x)) = 0$. Thus, it is obtained that
$$x^T Q x + u^T R u + (\nabla V(x))^T \left(F(x) + B(x)u\right) = 0, \qquad (161)$$
$$u(x) = -\frac{1}{2} R^{-1} B^T(x)\, \nabla V. \qquad (162)$$
Consequently, knowing that $H(x, \nabla V, u) = 0$, we can define
$$e_c = H(x, \nabla\hat{V}, u) - H(x, \nabla V, u) = x^T Q x + u^T R u + \hat{W}_v^T\, \nabla\mu_v(x)\left(F(x) + B(x)u\right), \qquad (163)$$
where the last equality is obtained using $\nabla\hat{V}(x) = (\nabla\mu_v(x))^T \hat{W}_v$. Therefore, an appropriate training rule may be obtained by minimizing $E_c = \frac{1}{2} e_c^2$. Using a normalized gradient descent algorithm, we have
$$\dot{\hat{W}}_v = \dot{\tilde{W}}_v = -\alpha_c \frac{\partial E_c/\partial \hat{W}_v}{(1 + \phi^T\phi)^2} = -\alpha_c \frac{\phi}{(1 + \phi^T\phi)^2}\, e_c = -\alpha_c \frac{\phi\phi^T}{(1 + \phi^T\phi)^2}\, \tilde{W}_v + \alpha_c \frac{\phi}{(1 + \phi^T\phi)^2} (\nabla\varepsilon_v(x))^T \left(F(x) + B(x)u\right), \qquad (164)$$
where $\phi = \nabla\mu_v(x)\left(F(x) + B(x)u\right)$ and $\tilde{W}_v = \hat{W}_v - W_v^*$. However, due to the unknown value of $u$, it should be substituted by $\hat{u}$. As can be observed in (164), such a training algorithm requires the PE condition to ensure $\lambda_{\min}(\phi\phi^T) > 0$ [272]. Further, the updating rule of $\hat{W}_u$ would be obtained according to (162). However, a nonstandard modification term, consisting of the cross-product of the actor and critic networks' weights, is needed in this update to ensure closed-loop stability. The resulting training rule, as well as the proof of the UUB stability of the system (under conservative assumptions), can be found in [270].
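A minimal simulation of the critic update (164) for an assumed scalar system is sketched below; the basis functions, gains, and probing noise are illustrative, the unknown $u$ is replaced by $\hat{u}$ as noted above, and the actor modification terms required by the stability proof of [270] are omitted:

    import numpy as np

    f = lambda x: -x + 0.5 * x**2                  # assumed internal dynamics F(x)
    b, Q, R = 1.0, 1.0, 1.0                        # assumed input gain and weights
    mu_grad = lambda x: np.array([2*x, 4*x**3])    # gradient of the basis [x^2, x^4]

    Wv = np.zeros(2)                               # critic weights
    alpha_c, dt, x = 5.0, 1e-3, 1.0
    for k in range(20000):
        u = -0.5 / R * b * (mu_grad(x) @ Wv)       # approximate control law, cf. (162)
        u += 0.3 * np.sin(5 * k * dt)              # probing signal for the PE condition
        phi = mu_grad(x) * (f(x) + b * u)
        e_c = Q * x**2 + R * u**2 + Wv @ phi       # Hamiltonian residual (163)
        Wv -= dt * alpha_c * phi / (1 + phi @ phi)**2 * e_c   # normalized GD step (164)
        x += dt * (f(x) + b * u)                   # propagate the state

    print(Wv)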
Alternatively, event-triggered optimal control schemes have been introduced
in the literature, where the control law is updated only at the time instants when a triggering condition is satisfied, while it remains constant at other times.
Such a control scheme can significantly reduce the online computational cost of
a controller. An event-triggered optimal control has been introduced in [273],
where the updating rule of the critic network has been derived similarly to (164).
In addition, the actor network's updating rule has been obtained using (162) by defining
$$e_u = \hat{W}_u^T \mu_u(x) + \frac{1}{2} R^{-1} B^T(x) (\nabla\mu_v(x))^T \hat{W}_v. \qquad (165)$$
Accordingly, by defining $E_u = \frac{1}{2} e_u^2$, the updating rule of $\hat{W}_u$ at the $j$th triggering instant $t_j$ is obtained as
$$\hat{W}_u(t_j^+) = \hat{W}_u(t_j) - \alpha_u \frac{\partial e_u}{\partial \hat{W}_u}\, e_u^T = \hat{W}_u(t_j) - \alpha_u\, \mu_u(x_j)\, e_u^T(t_j), \qquad (166)$$
where $\alpha_u$ denotes a positive constant. Similar to the above-mentioned design, a robustifying term is needed in the control law to ensure closed-loop stability, while several conservative assumptions are required. Such an approach
has been extended in [274] to a trajectory tracking control problem by defining
an augmented state, which consists of both the tracking error and the desired
trajectory. The designed controller has been subsequently applied to a linear
model of the elevation of a Quanser helicopter.
Another alternative training rule for the critic network can be derived based
on the method of weighted residuals [275, 276]. More precisely, at each step,
$\hat{W}_v$ can be obtained by projecting $e_c$ onto $\partial e_c/\partial \hat{W}_v$ and setting the result to zero, i.e.
$$\left\langle \frac{\partial e_c}{\partial \hat{W}_v},\; e_c \right\rangle = 0, \qquad (167)$$
where $\langle f, g \rangle = \int f^T g$. Thus, we have
$$\langle \phi, \phi \rangle\, \hat{W}_v + \langle L(x, u), \phi \rangle = 0, \qquad (168)$$
which leads to
$$\hat{W}_v = -\langle \phi, \phi \rangle^{-1} \langle L(x, u), \phi \rangle. \qquad (169)$$
Indeed, such an approach leads to the solution of a least-squares optimization. Subsequently, an improved control law can be obtained using $\hat{u}(x) = -\frac{1}{2} R^{-1} B^T(x) (\nabla\mu_v(x))^T \hat{W}_v$. This process continues until convergence. However, as discussed in [277], such an iterative optimization process still requires rich input signals to ensure the existence of $\langle\phi, \phi\rangle^{-1}$. In addition, computing the necessary integrals in (169) may be a complicated task in practice; thus, they are typically approximated by discretization. This method has been employed in [277] to successively solve the HJI equation, where the designed controller has been applied to a linear model of a fighter aircraft.
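The weighted-residuals solve (167)-(169) admits an equally compact sketch once the inner products are approximated by sums over sampled states (i.e. the discretization mentioned above); the dynamics, bases, and the current control law are assumed for illustration:

    import numpy as np

    f = lambda x: -x + 0.5 * x**2
    b, Q, R = 1.0, 1.0, 1.0
    mu_grad = lambda x: np.array([2*x, 4*x**3])
    u_pol = lambda x: -0.5 * x                     # assumed current (stabilizing) policy

    xs = np.linspace(-1.0, 1.0, 201)               # samples approximating the integrals
    Phi = np.stack([mu_grad(x) * (f(x) + b * u_pol(x)) for x in xs])  # rows phi(x)^T
    L = np.array([Q * x**2 + R * u_pol(x)**2 for x in xs])            # running cost

    # (169): Wv = -<phi, phi>^{-1} <L, phi>, with inner products as sample sums
    Wv = -np.linalg.solve(Phi.T @ Phi, Phi.T @ L)

    # Improved control law, cf. (151), to be used in the next iteration
    u_improved = lambda x: -0.5 / R * b * (mu_grad(x) @ Wv)
    print(Wv)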
It should be noted that the introduced actor-critic scheme can also be im-
plemented using a single NN [262]. This can be performed by employing a critic
NN to approximate the value function $V(x)$ and, subsequently, computing the approximate optimal control law as
$$\hat{u}(x) = -\frac{1}{2} R^{-1} B^T(x) (\nabla\mu_v(x))^T \hat{W}_v. \qquad (170)$$
Accordingly, there is a need for a modification term in the updating rule (164) to ensure closed-loop stability. The modification term is obtained by assuming that the optimal control law $u^*(x)$ can stabilize the system such that the following equation holds [278]:
$$\dot{J}_s(x) = (\nabla J_s)^T \left(F(x) + B(x)u^*\right) = -(\nabla J_s)^T \Lambda\, (\nabla J_s), \qquad (171)$$
where $J_s(x)$ and $\Lambda(x)$ represent a Lyapunov function of the system (as a polynomial) and a positive definite matrix, respectively [267]. Consequently, the modification term is obtained by preventing the function $J_s(x)$ from increasing, as follows:
$$\dot{\hat{W}}_v = -\alpha_s \frac{\partial \dot{J}_s(x)}{\partial \hat{W}_v} = -\alpha_s \frac{\partial\left[(\nabla J_s)^T \left(F(x) + B(x)\hat{u}(x)\right)\right]}{\partial \hat{W}_v}, \qquad (172)$$
where $\alpha_s$ represents a positive constant. A similar approach has been employed in [272] in an event-triggered $H_\infty$ control problem to solve an HJI equation,
where the proposed method has been applied to a linear model of an F-16
aircraft. Besides, such a scheme has been adopted in [279] in combination
with an NN-based state observer to provide a trajectory tracking controller for
a helicopter UAV, where the NNs have been trained online by an on-policy
learning method. In the on-policy learning, the control law that is applied to
the system (called the behavior policy) is the same as the control law, which is
evaluated and improved (called the estimation or target policy). On the other
hand, in the off-policy learning scheme, the behavior policy and the target policy
can be unrelated. The employment of off-policy learning in the control design
process provides considerable advantages in comparison with on-policy learning
schemes [277]. More specifically, in the on-policy $H_\infty$ control, the external disturbance should be obtained by (156), while specifying the disturbance term is typically impractical in real systems. In addition, the issue of the exploration
(which is partly related to the PE condition) is of significant importance to
guarantee the convergence of the control law to the optimal policy. However,
since in the on-policy learning, we should apply the target policy to the system,
the exploration would be limited by the UAV trajectory.
Further, a remaining issue with all the above-mentioned designs is that they
still depend on the system dynamics (i.e. F(x) and B(x)). To tackle such a
problem, model-free off-policy learning schemes have been introduced in the
literature to provide an acceptable approximate solution for the optimal control
problem. To this end, consider again the dynamic model of the system. By
adding and subtracting the target policy $u^{(i)}(x)$ (at the $i$th optimization iteration) to the model, it is obtained that
$$\dot{x} = F(x) + B(x)u^{(i)} + B(x)\left(u - u^{(i)}\right). \qquad (173)$$
Thus, considering the value function $V$ corresponding to $u^{(i)}(x)$, we can write
$$\dot{V} = (\nabla V)^T \dot{x} = (\nabla V)^T \left(F(x) + B(x)u^{(i)}\right) + (\nabla V)^T B(x)\left(u - u^{(i)}\right). \qquad (174)$$
As a result, the policy evaluation equation (161) can be reformulated as follows [277]:
$$\dot{V} = (\nabla V)^T B(x)\left(u - u^{(i)}\right) - x^T Q x - u^{(i)T} R\, u^{(i)}. \qquad (175)$$
By integrating both sides of (175) over a specific time interval, a policy evaluation equation is obtained which is independent of the internal dynamics $F(x)$.
Thus, we can redefine the design process by employing the policy evaluation
equation (175) rather than (161). Such an approach, which is similar to the
Integral Reinforcement Learning (IRL) scheme [280, 259], has been utilized in
[277] to approximately solve an HJI equation, where the designed controller has
been applied to a linear model of the longitudinal dynamics of an F-16 aircraft.
Meanwhile, the model-free approach to optimal control may be better expressed
within the framework of reinforcement learning, which will be discussed in the
following subsection.
Another concern with the above-mentioned ADP schemes is that the information obtained from the system input-output data is used to update only a scalar function, i.e. the value function. This results in an inefficient usage of data, which may slow down the convergence. To deal with such an issue, another actor-critic ADP method has been introduced in the literature, which attempts to approximate the derivative of the optimal value function, $\nabla V(x)$, (using the critic NN) rather than the value function itself. This method falls into the framework of
Dual Heuristic Programming (DHP) [281]. Two sets of NNs have been employed
in [282] as the actor and critic networks in a constrained minimum-time optimal
control problem, i.e. the control of the flight path angle of a missile given a final
Mach number. Indeed, instead of utilizing a single NN, a set of NNs have
been trained offline as the actor (and critic) NN, which have been employed
sequentially to determine u(x) and V(x) during the time. Also, to deal with
the free-final time, the dynamic equations of the system have been reformulated
considering the flight path angle as the independent variable, thereby providing
a fixed-final condition problem. A similar actor-critic method has been utilized
in [283], where an offline training stage has been performed using the linearized
model of the air vehicle, and an online training phase has been employed to
91
improve the closed-loop performance. The designed controller has been applied
to a fixed-wing aircraft considering model uncertainties, unmodeled dynamics,
and actuator faults. However, the closed-loop stability was not analyzed in these
papers.
4.3. Direct adaptive control using Reinforcement learning (RL)
The concept of adaptive optimal control, particularly for systems with un-
known system dynamics, can be presented within the framework of Reinforce-
ment Learning (RL) as well. Although the notion of RL and the optimal control
theory share a somewhat similar idea, i.e. moving towards the optimal solution
over time, they possess different mathematical notations due to their differ-
ent origins [284]. In a conventional RL problem, which is typically formulated
in the discrete-time domain, the objective is to search for an optimal control law (policy) for a dynamic system (agent) interacting with an uncertain environment, such that the total reward obtained during an episode is maximized.
Traditionally, the problem is formulated as a Markov Decision Process (MDP)
described by a four-tuple $(x, u, F, R)$. Here, $x$ and $u$ represent, respectively, the current system state and inputs. The system inputs are obtained according to a policy $\pi$, which, in turn, results in receiving a reward $R(x, u)$. Further, $F$ denotes the system dynamics model or (in a probabilistic formulation) a stationary transition distribution $F = P\left(x(k+1)\,\middle|\,x(k), u(k)\right)$, which satisfies the Markov property [281]
$$P\left(x(k+1)\,\middle|\,x(1), u(1), \cdots, x(k), u(k)\right) = P\left(x(k+1)\,\middle|\,x(k), u(k)\right). \qquad (176)$$
Thus, the choice of appropriate states to satisfy the Markov property is of
significant importance. However, although most of the theoretical achievements
within the framework of RL have been obtained under such a property, many
approaches can still work well for different practical problems which do not
satisfy the Markov property [285]. Now, similar to the common notation in the
92
RL framework, consider a discrete-time optimal control problem as follows:
$$\max_\pi\; \mathbb{E}_\pi\!\left[\sum_{k=t}^{\infty} \gamma^{k-t} R(x, u)\right], \qquad (177)$$
$$\text{subject to}\quad x(k+1) = F\left(x(k), u(k), d(k)\right).$$
Accordingly, the objective is to maximize the accumulated reward $R(x, u)$, where the expected value is computed considering the random external disturbance $d(k)$. Further, $\pi_k$ and $\gamma \in (0, 1)$ represent the current control command (policy) and the discount factor, respectively. Thus, the control command $u(k)$ is computed at each step using either a stochastic or a deterministic policy $\pi$ (in the case of stochastic policies, $\pi$ represents the conditional probability distribution of the control command, i.e. $\pi(u|x)$, while for deterministic policies, we have $u(k) = \pi(x)$). It is notable that a similar discussion can also be made on the basis of an average reward rather than a discounted reward, which eliminates the requirement for a discount factor (for more details, see [285]).
The focus of this section is on a model-free optimal control approach. Like
other adaptive control methods, we can employ either an indirect or a direct
control design procedure. To be more precise, it is possible to first derive an es-
timation of the system dynamics model $F$ and then attempt to (approximately)
solve the optimization problem (177), or try directly to develop an optimal con-
trol policy. Within the framework of the RL, the former approach is known as
the model-based RL, while the latter corresponds to the model-free RL. On the
other hand, owing to the unstable behavior of typical aerial vehicles and the
inherent trial and error scheme employed in RL, commonly, the learning phase
should be performed in a simulation environment on an existing model of the
system. Thus, even in the model-free RL, there is a requirement for a (simple)
dynamic model of the system to be used in the learning phase (in the simula-
tion environment). We will give a short insight into the method of eliminating
the requirement for a dynamic model in the RL-based flight control systems at
the end of this section. Concerning the model-based RL, however, the model
would be obtained by the system identification method as discussed in Section
4.1. Consequently, apart from the model identification phase (required in the
model-based RL), the learning process in the flight control design in both the
model-based and model-free RL schemes can be discussed in the same fashion.
Now, in a general view, we can solve an adaptive optimal control problem
within the RL framework through two different approaches: ADP and direct
policy updating [286], which will be addressed in detail in the following.
4.3.1. Approximate dynamic programming (discrete-time systems)
Within the ADP framework, we first attempt to estimate the action-value
function $Q(x, u)$, which is defined as follows [269]:
$$Q(x, u) = \mathbb{E}_\pi\!\left[\sum_{k=t}^{\infty} \gamma^{k-t} R(x, u) \,\middle|\, x(t) = x,\; u(t) = u\right]. \qquad (178)$$
Notice that, it is also possible to derive the RL-based control formulation using
the value function V(x) rather than the action-value function. Indeed, such
an approach would result in the discrete equivalent of the previously discussed
ADP scheme for continuous-time systems. However, as will be observed in the
following, the employment of the introduced action-value function instead of the
value function can help to develop an entirely model-free control system [281].
Now, using the concept of DP, one can obtain the Bellman optimality equation as follows:
$$Q^*(x, u) = R(x, u) + \gamma\, \mathbb{E}_\pi\!\left[\max_u Q^*(x(k+1), u)\right], \qquad (179)$$
where the superscript $*$ denotes the action-value function corresponding to the optimal policy. Such an approach can be utilized in both the on-policy and
off-policy iterative learning schemes to estimate the action-value function.
A traditional on-policy learning approach to iteratively estimate the action-
value function is known as Sarsa. In this regard, by incorporating the Temporal
difference (TD) error, which is equivalent to the Hamiltonian introduced in the
previous subsection for continuous-time systems, an on-policy learning rule can
be derived as
$$Q_{k+1}(x, u) = Q_k(x, u) + \eta\left(R(x, u) + \gamma Q_k(x(k+1), u(k+1)) - Q_k(x, u)\right), \qquad (180)$$
where $\eta \in \mathbb{R}^+$ denotes the learning rate, and the second term on the right-hand side of the equation corresponds to the TD error. Subsequently, an improved action is chosen at each step using
$$\pi_{k+1}(x) = \arg\max_u Q_{k+1}(x, u). \qquad (181)$$
Such an approach (using the multi-step TD, which is discussed in the following)
has been employed in [287] to control a 2-DOF Quanser helicopter. The opti-
mization problem has been presented for a linear model of the system, which
leads to the well-known algebraic Riccati equation. Sarsa has also been adopted
in [288] for a quite complex control problem, i.e. the control of glider soar-
ing in a turbulent environment (by taking advantage of turbulent fluctuations),
while such a problem has been dealt with in [289] by employing an off-policy
value-iteration method.
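A tabular version of (180)-(181) fits in a few lines; the toy chain MDP below (five states, two actions, reward at the right end) is an assumed example, and replacing the on-policy target $Q_k(x(k+1), u(k+1))$ by $\max_u Q_k(x(k+1), u)$ turns it into the Q-learning rule (183) discussed next:

    import numpy as np

    n_states, n_actions = 5, 2
    Q = np.zeros((n_states, n_actions))
    eta, gamma, eps = 0.1, 0.9, 0.1
    rng = np.random.default_rng(0)

    def step(x, u):
        # assumed dynamics: action 1 moves right, action 0 moves left
        x_next = min(max(x + (1 if u == 1 else -1), 0), n_states - 1)
        return x_next, (1.0 if x_next == n_states - 1 else 0.0)

    def policy(x):
        # epsilon-greedy improvement of the current estimate, cf. (181)
        return int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[x]))

    for episode in range(500):
        x = 0
        u = policy(x)
        for _ in range(50):
            x_next, r = step(x, u)
            u_next = policy(x_next)   # on-policy: next action from the same policy
            Q[x, u] += eta * (r + gamma * Q[x_next, u_next] - Q[x, u])  # TD step (180)
            x, u = x_next, u_next
            if x == n_states - 1:
                break

    print(np.argmax(Q, axis=1))   # greedy policy moves right in the visited states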
Accordingly, if we estimate the action-value function corresponding to the current control policy using a NN as $\hat{Q}(x, u) = \hat{W}_q^T \mu_q(x, u)$, the network weights $\hat{W}_q$ can be updated at each step using the gradient descent method as follows [269]:
$$\hat{W}_q(k+1) = \hat{W}_q(k) + \eta\, \mu_q(x, u)\left(R(x, u) + \gamma \hat{Q}(x(k+1), u(k+1)) - \hat{Q}(x(k), u(k))\right), \qquad (182)$$
where $\eta$ represents a positive learning rate, and $x$ and $u$ correspond to the current values of the system state and input. As seen, the proposed updating rule is similar to the training rule (7) employed in the FEL scheme, with the tracking error substituted by the TD error. A notable point, however, is that (182) is obtained using a semi-gradient method rather than a true gradient descent scheme. This is due to the employment of $R(x, u) + \gamma\hat{Q}(x(k+1), u(k+1))$ as the target value of the action-value function, which is itself a function of $\hat{W}_q$, although this dependence is not included in the gradient.
Thereafter, in the policy improvement step, (181) can be solved as $\partial\hat{Q}(x, u)/\partial u = 0$, which highlights a principal advantage of employing the action-value function instead of the value function: the improved policy can be determined at each step by simply maximizing the Q-function with respect to $u$, with no requirement for the system dynamics model.
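For a concrete instance of (182), the following sketch uses a linear-in-the-weights approximator and an assumed scalar discrete-time system; note the semi-gradient nature of the step, as the target is treated as a constant even though it depends on $\hat{W}_q$:

    import numpy as np

    def mu_q(x, u):
        # assumed feature vector for a scalar state and input
        return np.array([x*x, x*u, u*u, x, u, 1.0])

    Wq = np.zeros(6)
    eta, gamma = 0.05, 0.95
    rng = np.random.default_rng(1)

    x, u = 0.0, float(rng.normal())
    for k in range(5000):
        x_next = 0.9 * x + 0.1 * u + 0.01 * rng.normal()  # assumed dynamics
        r = -(x*x + 0.1 * u*u)                            # reward = negated running cost
        u_next = -0.5 * x_next + 0.3 * rng.normal()       # exploratory behavior policy
        td_error = r + gamma * (Wq @ mu_q(x_next, u_next)) - Wq @ mu_q(x, u)
        Wq += eta * mu_q(x, u) * td_error                 # semi-gradient step (182)
        x, u = x_next, u_next

    print(Wq)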
On the other hand, concerning off-policy learning methods, an off-policy
TD-based learning rule called Q-learning can be derived as
$$Q_{k+1}(x, u) = Q_k(x, u) + \eta\left(R(x, u) + \gamma \max_u Q_k(x(k+1), u) - Q_k(x, u)\right), \qquad (183)$$
while, again, the improved policy would be determined using (181). Such a learning method has been adopted in [290] to learn the optimal servoing gain in an Image-Based Visual Servoing (IBVS) design for the trajectory tracking control of a quadrotor UAV, while the learning rate $\eta$ has been updated using a fuzzy controller.
In a similar manner to Sarsa, the NN-based estimation can also be adopted
in the Q-learning algorithm. The corresponding updating rule is obtained by substituting $\hat{Q}(x(k+1), u(k+1))$ in (182) by $\max_u \hat{Q}(x(k+1), u)$, or by
$$\sum_u \pi(u|x(k+1))\, \hat{Q}(x(k+1), u)$$
in the case of a stochastic target policy $\pi$. Such a scheme has been employed in
[291] to control an airship in a 3D environment, where the scale of the state
space was reduced by a coordinate transformation. Different variants of the
Q-learning method have been presented in the literature, which are beyond the
scope of this paper (see for example [284, 292]).
In Q-learning, a common choice for the behavior policy is to choose either the current improved target policy (with a probability of $1-\epsilon$) or a random action (with a probability of $\epsilon$, where $\epsilon$ denotes a small positive constant). Such
a behavior policy results in a good exploration, which is critical in the conver-
gence of off-policy algorithms. As an alternative, an evolutionary exploration
algorithm has been introduced in [293] in which a set of random trajectories are
generated at each step while the mean and the variance of them are updated
considering the obtained reward corresponding to each trajectory in such a way
that the resultant behavior policy moves toward better trajectories. Such an ap-
proach has been adopted in [293] to train a flapping-wing aerial vehicle using the
Q-learning. Nevertheless, NN-based off-policy learning algorithms suffer from
convergence issues in various problems. Updating rules derived from the true gradient descent method (such as the gradient-TD method) can address this issue at the expense of excessive computational complexity, while their performance in real applications is still not clear. In this regard, a comprehensive comparison between the performance of semi-gradient methods and TD methods based on the true gradient descent, in the case of intelligent flight control systems, is a necessity for future research.
It is also possible to derive an (on-policy) updating rule by attempting to
eliminate the TD error at each time step using a least-squares optimization (or
an RLS optimization similar to the OS-ELM approach introduced in Section
4.1) to solve the following equation:
$$\hat{W}_q(k+1)^T \left(\mu_q(x, u) - \gamma\, \mu_q(x(k+1), u(k+1))\right) = R(x, u). \qquad (184)$$
To this end, the regression vector $\mu_q(x, u) - \gamma\, \mu_q(x(k+1), u(k+1))$ is required to be persistently exciting [281]. Such a method has been utilized in [294] to
generate the desired trajectory for a quadrotor aimed to transport a suspended
load.
As discussed, both Q-learning and Sarsa have been developed based on the TD method, whose iterative updating rule relies in part on the current estimate of $Q$. Thus, they are known as bootstrapping methods. Further,
notice that the proposed schemes are developed based on a simple one-step TD
error. More complex and effective learning rules can be derived by employing
multi-step TD. Multi-step TD learning is indeed a bridge between the simple
one-step TD learning and the Monte Carlo method wherein the updating rule
is derived using the entire sequence of rewards obtained from the current state
until the end of the episode. A detailed description of multi-step TD can be
found in [269]. Compared to the TD method, the Monte Carlo approach cannot be used in an online training scheme, because one should wait until the end of the episode to determine the rewards obtained under the
current policy. On the other hand, there are some concerns with the convergence
of the TD learning, which is a bootstrapping method, particularly under the
usage of neural approximation. The Monte Carlo method has been adopted
in [295] to maximize a value function in order to develop a controller for a
helicopter in low-speed aerobatic maneuvers, e.g. the inverted flight of the aerial
vehicle, where the optimization process has been performed in the simulation
environment using an identified stochastic, nonlinear model of the system. Using
the Monte Carlo method, a collision-avoidance control system has been proposed
in [296]. In this regard, after the training of the action-value function, which was
modeled by a CNN, the control command, i.e. the velocity direction of the UAV,
could be obtained by maximizing the Q-function at each step. An intelligent
trajectory generation approach has been proposed in [297] for a UAV aimed to
collect information from the environment considering the constraint on the total
energy consumption of the vehicle. A CNN has been utilized to estimate the
Q-function using an off-policy modified Deep RL (DRL) method. In contrast
to the TD and Monte Carlo methods, in the DRL method, a replay buffer
has been utilized, which stores a finite number of tuples of (xk, uk, rk, xk+1 )
obtained under an exploration (behavior) policy. Subsequently, at each step,
a mini-batch of samples is chosen uniformly from the entire buffer allowing for
a set of uncorrelated samples to be used in the training process. In addition,
a copy of the main NN called the target network has been generated, where
the target network, which provides the target value for training the main critic
network, is trained with a significantly smaller learning rate, thereby avoiding learning divergence. The two above-mentioned high-level control systems can
be considered as preliminary intelligent path planning designs, which could be
integrated with conventional IFCSs to provide a completely intelligent flight
control system. The development of such a combination would be a critical
step to develop a truly intelligent UAV, while, due to the complicated and high-
dimensional nature of the problem, it has not been thoroughly addressed by
researchers yet.
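The two DRL ingredients described above, i.e. the replay buffer of $(x_k, u_k, r_k, x_{k+1})$ tuples and the slowly updated target network, can be sketched independently of any particular network; the buffer size, batch size, and the Polyak factor below are assumed values:

    import random
    from collections import deque
    import numpy as np

    buffer = deque(maxlen=100_000)            # replay buffer of transition tuples

    def store(x, u, r, x_next):
        buffer.append((x, u, r, x_next))

    def sample_minibatch(batch_size=32):
        # uniform sampling decorrelates the training data
        batch = random.sample(list(buffer), batch_size)
        return [np.array(z) for z in zip(*batch)]

    tau = 0.005   # the target network tracks the main one with a much smaller rate

    def soft_update(target_weights, main_weights):
        # Polyak averaging: a slowly moving copy provides the TD target
        return [(1 - tau) * wt + tau * wm for wt, wm in zip(target_weights, main_weights)]

    for k in range(1000):                     # usage example with random data
        store(np.random.randn(3), np.random.randn(1), 0.0, np.random.randn(3))
    xs, us, rs, xns = sample_minibatch()
    print(xs.shape, xns.shape)                # (32, 3) (32, 3)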
Despite the simplicity of the introduced approaches to approximating the optimal
action-value function (and subsequently, the optimal policy), they still face fun-
damental challenges to ensure the convergence to the optimal solution (particu-
larly in the case of off-policy algorithms). More specifically, different impractical
assumptions (such as the requirement for visiting all possible state-action pairs
for an infinite number of times) have been adopted in the literature to achieve
the convergence property [269].
4.3.2. Direct policy updating
Another approach to solve the optimization problem (177) is to directly up-
date the approximate optimal policy rather than employing an estimated action-
value function to find the optimal policy. More precisely, here, we attempt to
directly find an appropriate updating rule for the approximate optimal policy, which is estimated by a NN as $\pi(x) = \hat{W}_\pi^T \mu_\pi(x)$, or $\pi(u|x) = \hat{W}_\pi^T \mu_\pi(x, u)$ in the case of a stochastic policy (for ease of notation, we do not use the $\hat{\cdot}$ symbol for the estimated optimal policy in the rest of this section). Such a direct policy parametrization brings a principal advantage to the control design process: prior knowledge of the optimal policy can be incorporated into the parametrization of the estimated optimal policy.
In this context, the most commonly used approach, called the policy gradient
method, attempts to update the network weights $\hat{W}_\pi$ by moving in the direction of the gradient of a performance function in order to improve $\pi(x)$. Typically, the value function $V_\pi(x)$ (the subscript $\pi$ indicates that the value function has been computed along the trajectory obtained by $\pi$) is chosen as the performance
function. In the following, we first give a brief introduction to the policy gradient
theorem for stochastic policies and then address the corresponding theorem of
deterministic policies as a special case. Now, defining the advantage function as
$$A_\pi(x, u) = Q_\pi(x, u) - V_\pi(x), \qquad (185)$$
one can derive an equation for the difference between the value functions corresponding to two different policies as follows [298, 299]:
$$V_\pi(x_0) - V_\ell(x_0) = -\sum_x \rho_\ell(x) \sum_u \ell(u|x)\, A_\pi(x, u), \qquad (186)$$
where
$$\rho_\ell(x) = \sum_k \gamma^k P\left(x(k) = x \,\middle|\, x(0) = x_0\right) \qquad (187)$$
denotes the unnormalized discounted visitation frequency, where actions are determined according to $\ell$. Let $\ell$ be a fixed policy (which may be considered as the behavior policy in off-policy methods) and let $\pi$ correspond to the estimated optimal policy. Thus, using the facts that $\sum_u \pi(u|x)\, A_\pi(x, u) = 0$ and
$$\sum_u \left(\ell(u|x) - \pi(u|x)\right) V_\pi(x) = 0,$$
we have [300]:
$$V_\pi(x_0) - V_\ell(x_0) = \sum_x \rho_\ell(x) \sum_u \left(\pi(u|x) - \ell(u|x)\right) Q_\pi(x, u) = \sum_x \rho_\ell(x) \sum_u \left(\pi(u|x) - \ell(u|x)\right) A_\pi(x, u). \qquad (188)$$
By estimating the target policy as $\pi(u|x) = \hat{W}_\pi^T \mu_\pi(x, u)$ and differentiating both sides of (188) with respect to $\hat{W}_\pi$, one can obtain the gradient of the performance function as follows:
$$\nabla V_\pi(x_0) = \sum_x \rho_\ell(x) \sum_u \left[\nabla\pi(u|x)\, Q_\pi(x, u) + \left(\pi(u|x) - \ell(u|x)\right) \nabla Q_\pi(x, u)\right], \qquad (189)$$
where $\nabla = \partial/\partial \hat{W}_\pi$. The obtained result is analogous to the off-policy actor-critic algorithm proposed in [301], while the second term on the right-hand side of (189) is neglected in [301]. A similar equation can also be derived by substituting $Q_\pi(x, u)$ in (189) by $A_\pi(x, u)$. Now, considering the special case $\pi = \ell$ in (189), it is obtained that
$$\nabla V_\pi(x_0) = \mathbb{E}_{\rho_\pi}\!\left[\frac{\nabla\pi(u|x)}{\pi(u|x)}\, Q_\pi(x, u)\right], \qquad (190)$$
$$\nabla V_\pi(x_0) = \mathbb{E}_{\rho_\pi}\!\left[\frac{\nabla\pi(u|x)}{\pi(u|x)}\, A_\pi(x, u)\right]. \qquad (191)$$
The first equation is known as the fundamental equation of the policy gradient theorem, while the second one is called the policy gradient with baseline, which in turn reduces the variance of the algorithm, thereby improving the performance. Now, the NN weights can be updated through either an off-policy or an on-policy method using each data sample as follows:
$$\hat{W}_\pi(k+1) = \hat{W}_\pi(k) + \eta\, \rho(x, u)\, \frac{\nabla\pi(u|x)}{\pi(u|x)}\, \hat{A}_\pi(x, u), \qquad (192)$$
where $\rho(x, u) = \frac{\pi(u|x)}{\ell(u|x)}$, called the importance sampling ratio, is employed to compensate for the fact that the data samples have been collected under the behavior policy $\ell(u|x)$ rather than the estimated target policy $\pi(u|x)$ (in on-policy learning, we have $\rho = 1$). Further,
$$\hat{A}_\pi(x, u) = R(x, u) + \gamma\, \hat{V}_\pi(x(k+1)) - \hat{V}_\pi(x) \qquad (193)$$
denotes an estimation of the advantage function. As seen, it requires the estimation of the value function, which can be obtained by a critic network using the learning schemes introduced in the previous section for the action-value function; in the case of off-policy learning of the value function, unlike the learning algorithm of the action-value function, the importance sampling ratio should again be employed in the updating rule [269].
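A compact instance of the off-policy update (192)-(193) is sketched below for a softmax policy over a small discrete action set, using $\nabla\pi(u|x)/\pi(u|x) = \nabla\log\pi(u|x)$; the dynamics, features, and the uniform behavior policy $\ell$ are assumptions, and the importance sampling ratio also appears in the critic step, as noted above:

    import numpy as np

    actions = np.array([-1.0, 0.0, 1.0])
    mu_pi = lambda x, a: np.array([x*a, a*a, a])   # assumed policy features
    mu_v = lambda x: np.array([x*x, x, 1.0])       # assumed value features

    Wpi, Wv = np.zeros(3), np.zeros(3)
    eta_a, eta_c, gamma = 0.01, 0.05, 0.95
    rng = np.random.default_rng(2)

    def pi_probs(x):
        logits = np.array([Wpi @ mu_pi(x, a) for a in actions])
        e = np.exp(logits - logits.max())
        return e / e.sum()

    x = 0.5
    for k in range(5000):
        b_probs = np.full(3, 1/3)                  # uniform behavior policy ell(u|x)
        i = int(rng.choice(3, p=b_probs))
        u = actions[i]
        x_next = 0.8 * x + 0.2 * u + 0.05 * rng.normal()
        r = -x * x
        p = pi_probs(x)
        rho = p[i] / b_probs[i]                    # importance sampling ratio
        A_hat = r + gamma * (Wv @ mu_v(x_next)) - Wv @ mu_v(x)   # advantage (193)
        Wv += eta_c * rho * A_hat * mu_v(x)        # off-policy critic TD step
        grad_log = mu_pi(x, u) - sum(p[j] * mu_pi(x, actions[j]) for j in range(3))
        Wpi += eta_a * rho * grad_log * A_hat      # actor step (192)
        x = x_next

    print(pi_probs(0.5))   # mass should shift toward the stabilizing action u = -1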
A variety of conservative approximate policy gradient approaches have been introduced in the literature to restrict the policy update at each step, thereby improving the performance of the method. This is due to the great effect of the magnitude of the update of $\hat{W}_\pi$ (which can also be controlled by the learning rate) in each iteration of the policy gradient on the performance and the convergence
of the algorithm. Trust Region Policy Optimization (TRPO) [298] and Prox-
imal Policy Optimization (PPO) [302] are two common methods in this field.
TRPO employs a constrained optimization problem in which an approximated
value function is optimized (through updating the target policy) subject to a
constraint on the KL divergence of the old policy and the new policy. The KL
divergence represents a measure of the divergence of a distribution from the
other one. On the other hand, PPO introduces a simplified design to keep the ratio of the new policy to the old policy within a permissible range. Such an approach has been utilized in [300] to develop a trajectory tracking controller for a quadrotor air vehicle.
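The PPO mechanism reduces to a one-line surrogate: the probability ratio of the new to the old policy is clipped so that a single update cannot exploit the advantage estimate too aggressively. The function below is a minimal sketch with assumed sample arrays:

    import numpy as np

    def ppo_clip_objective(ratio, advantage, eps=0.2):
        # ratio = pi_new(u|x) / pi_old(u|x); per-sample clipped surrogate
        unclipped = ratio * advantage
        clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
        return np.minimum(unclipped, clipped)   # pessimistic bound to be maximized

    # a positive-advantage sample gains nothing from ratios above 1 + eps
    print(ppo_clip_objective(np.array([0.5, 1.0, 1.5]), np.ones(3)))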
The policy gradient theorem can also be extended to deterministic policies, which is called the Deterministic Policy Gradient (DPG) [303]. To this end, consider again (188) while substituting the probability distribution $\pi(u|x)$ with the Dirac delta function, i.e. $\pi(u|x) \to \delta(u - \pi(x))$, which is equivalent to a deterministic policy. Subsequently, knowing that
$$\sum_u \delta\left(u - \pi(x)\right) Q_\pi(x, u) = Q_\pi(x, \pi(x)),$$
by estimating $\pi(x)$ as $\hat{W}_\pi^T \mu_\pi(x)$ and differentiating both sides of (188), it is obtained that
$$\nabla V_\pi(x_0) = \sum_x \rho_\ell(x) \left[\nabla Q_\pi(x, \pi(x)) + \sum_u \left(\delta(u - \pi(x)) - \ell(u|x)\right) \nabla Q_\pi(x, u)\right]. \qquad (194)$$
Thus, knowing that
$$\nabla Q_\pi(x, \pi(x)) = \nabla\pi(x)\, \nabla_u Q_\pi(x, u)\big|_{u=\pi(x)},$$
one can obtain the on-policy DPG algorithm by setting $\pi = \ell$ in (194) as follows:
$$\hat{W}_\pi(k+1) = \hat{W}_\pi(k) + \eta\, \nabla\pi(x)\, \nabla_u \hat{Q}_\pi(x, u)\big|_{u=\pi(x)}, \qquad (195)$$
where $\hat{Q}_\pi(x, u)$ is the estimated action-value function, which can be obtained using a critic network trained by the Sarsa algorithm through (182).
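A minimal realization of the on-policy DPG step (195) for a scalar input is sketched below, with the critic trained by the Sarsa rule (182); all features, gains, and the exploration noise are assumed for illustration:

    import numpy as np

    mu_pi = lambda x: np.array([x, 1.0])               # actor: pi(x) = Wpi @ mu_pi(x)
    mu_q = lambda x, u: np.array([x*x, x*u, u*u, 1.0])
    dmu_q_du = lambda x, u: np.array([0.0, x, 2*u, 0.0])

    Wpi, Wq = np.zeros(2), np.zeros(4)
    eta_a, eta_c, gamma = 0.01, 0.05, 0.95
    rng = np.random.default_rng(3)

    x = 1.0
    u = float(Wpi @ mu_pi(x))
    for k in range(5000):
        x_next = 0.9 * x + 0.1 * u + 0.01 * rng.normal()         # assumed dynamics
        r = -(x*x + 0.1 * u*u)
        u_next = float(Wpi @ mu_pi(x_next)) + 0.1 * rng.normal() # exploration noise
        # critic: semi-gradient Sarsa step (182)
        td = r + gamma * (Wq @ mu_q(x_next, u_next)) - Wq @ mu_q(x, u)
        Wq += eta_c * td * mu_q(x, u)
        # actor: DPG step (195), moving along the critic's input gradient
        u_pi = float(Wpi @ mu_pi(x))
        Wpi += eta_a * mu_pi(x) * (Wq @ dmu_q_du(x, u_pi))
        x, u = x_next, u_next

    print(Wpi)   # the state-feedback weight should tend toward a stabilizing value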
Concerning the off-policy DPG, note that there is an additional term in
(194), while similar to the stochastic policy gradient theorem, it is neglected in
[303]. Thus, the off-policy DPG equation is obtained again as (195), whereas
the estimated action-value function computed using the critic network should
be trained by the Q-learning method. Similar to the stochastic policy gradient,
it is also possible to derive the DPG algorithm using the advantage function
rather than the action-value function in (195).
DPG would be more convenient in the control design process since a stochas-
tic policy results in unpredictable behavior, which is not desirable in autonomous
vehicles. However, the exploration strategy in DPG is of significant importance
to avoid the convergence to local optima. A common choice to provide an ac-
ceptable exploration is adding white noise to the current optimized policy at
each step to obtain an exploratory behavior policy.
A simplified version of the introduced (on-policy) actor-critic scheme has
been employed in [304] and [305] to stabilize and control a nonlinear model of
an Apache helicopter, respectively, while in [305], three cascaded NNs have been
employed in the action network (equivalently to conventional multi-loop control
systems) to improve the training performance. An on-policy DPG employing
the Monte-Carlo method (rather than the TD method), which updates the ac-
tor and critic networks after the end of each episode, has been used in [10] to
control a quadrotor UAV, where a constrained optimization has been utilized
in the design to avoid large policy updates at each step, similarly to the TRPO
method. In addition, the natural gradient descent, which attempts to include the
effects of the performance function’s curvature induced by higher-order deriva-
tives into the updating rule [306], has been employed in the training rule instead
of the conventional gradient descent algorithm. The control scheme has been
subsequently applied to a real quadrotor air vehicle, while it suffers from the
huge computational cost of the (offline) training phase, which is performed in a
simulation environment.
Deep DPG (DDPG) has been introduced in [307] as a combination of the
DPG and DRL to employ (deep) NN in a stable manner as the actor and
critic estimators; here, target networks (which are employed in DRL for
the critic network) are defined for both the actor and critic networks. An off-
policy DDPG has been adopted in [308] to control a quadrotor UAV considering
external disturbances and actuator faults (only in the flight tests). Adopting the
concept of DRL results in more efficient training with improved stability, while
off-policy learning allows for utilizing an exploratory behavior policy, which
is independent of the estimated target policy. However, as discussed in the
paper, the combination of the (neural network) function approximation, the
bootstrapping scheme (due to the TD learning), and the off-policy learning can
lead to significant bias and variance in estimations (while in some cases, it may
result in the divergence of the algorithm [269]). To deal with such an issue, an
integrator has been placed at the input of the actor network, which significantly
reduces the steady-state error. Besides, a hybrid offline-online training method
has been employed to improve the target policy during the real flight, while no
experimental results have been included in the paper. DDPG has been utilized
in [309] to address the autonomous landing of a UAV on a moving platform,
while the problem has been dealt with in a 2D environment. As mentioned
in the paper, DDPG can be an optimal choice in control problems with low-
dimensional continuous states and actions. Further, the shaping method has
been utilized in the paper to design an appropriate reward function in which
the progress of the UAV in approaching the desired goal between two successive
time steps has been considered as the reward function. It has been claimed
that such a technique would result in a faster learning process [310], though at the cost of significant design effort and a possible change of the optimal solution.
This is similar to the reward shaping method introduced in [311], wherein a
potential-based function is summed with the basic reward function to speed
up the learning process with no effect on the optimal policy. To develop an
intelligent UAV navigation system in large-scale complex environments (with
no requirement for map reconstruction), the authors in [312] incorporated the concept of Partially Observable MDPs (POMDPs) within the framework of DRL. In a POMDP, at each step, we can observe only a part of the system state, denoted by $o_t$, which does not satisfy the Markov property; hence, the current policy requires the entire previous trajectory $\tau_t = (u_0, o_0, \cdots, u_{t-1}, o_{t-1})$ to determine
the control command. Such a framework provides the capability of capturing
the complex features of the environment by storing the previous trajectory of
the system. Accordingly, a deterministic policy gradient theorem, called the Fast-recurrent DPG, has been introduced in the paper to deal with POMDPs, in which $\nabla V_\pi$ is computed similarly to (194) except that the current state $x$ is replaced by $\tau_t$. In a similar manner, a combination of the POMDP and the deep Q-learning
concepts has been utilized in [313] to address the obstacle avoidance problem
in the case of a UAV with limited environment knowledge, where a recurrent
NN has been employed as the estimator of the Q-function to better estimate
the current system state using information from an arbitrarily long sequence of
observations.
As a notable shortcoming, almost all of the RL-based control strategies require a remarkable amount of time for offline training of NNs before being employed in a real application. A preliminary study has been given in [314] in which a quadrotor
can learn to hover by a relatively small amount of training data using the model-
based RL. The incoming data are first employed to build a dynamic model of
the system followed by a policy updating algorithm, which uses an MPC-like
cost function. However, the designed control system results in unstable behavior
after about five seconds.
Besides, a well-known issue in utilizing RL in flight control systems arises
from the fact that the learning process in RL relies on trial and error, which can
simply make the air vehicle unstable. Thus, the learning process (in the current
form) should be performed in a simulation environment, and subsequently, the
trained policy is employed in a real application. However, the employment
of a policy, which is trained in a simulation environment, in a real experiment
suffers from the well-known reality gap problem. Different approaches have been
proposed in the literature to overcome this issue [315]. The generalization of the
policy through learning in different simulation environments with different flight
conditions has been suggested in [296]. Further, the utilization of abstracted
inputs and outputs in the learning process would be an effective approach to
tackle the reality gap [316]. In this regard, there may be a need for a mapping (or an intermediate controller) between the abstracted inputs-outputs and the real signals in the control system. Besides, one can employ a dynamic model involving probabilistic uncertainties in order to evaluate and bound
the worst-case controller performance in real applications [317].
A similar idea can also be beneficial to deal with the issue of the stability
analysis in RL-based IFCSs. More specifically, a preliminary idea to analyze
the closed-loop stability under the framework of RL could be maximizing the
expected rewards at the neighborhood of an action sequence rather than that
of a specific action sequence [318]. It can be a starting point to develop a
probabilistic stability analysis framework in contrast to well-known approaches
to stability analysis (using the Lyapunov theorem or similar methods) to be
employed in the case of dynamic systems controlled by an RL-based scheme.
To develop such a framework, we should also provide appropriate answers to
principal questions regarding the quality and quantity of data samples required
in the learning process.
In addition to the above-mentioned RL scheme based on MDP, there are
other types of policy optimization algorithms that directly search for the optimal
policy as a black-box optimization without employing the estimated action-value
function into the optimization algorithm. Random search [319, 320], guided pol-
icy search [321], and evolutionary algorithms [322] are well-known approaches
in this category, while due to the lack of a solid mathematical foundation, they
are not widely employed in flight control systems yet. A guided policy search
based on MPC has been introduced in [323] in which a set of trajectories are
first generated at each step using Linear Quadratic Gaussian (LQG) controllers,
where their objective is to maximize a quadratic reward by penalizing the devi-
ation from the current policy. Subsequently, a modified MPC was designed in
the vicinity of the obtained trajectories, where the sampled data from trajectories generated by the MPC are then employed to train the policy network in a supervised learning framework. However, there is a need for an approximate
dynamic model in the proposed design. Such an approach has been utilized in
[323] to control a quadrotor trajectory in the presence of obstacles. While the
MPC in the training phase requires access to full state observation, the final NN
policy employs the data gathered by only the onboard sensors. Since in such a
guided policy search, the control commands, in the training phase, are obtained
using the MPC rather than a partially trained policy network, this is a beneficial approach to avoid a remarkable drawback of RL, i.e. the occurrence of catastrophic failures during the training. Accordingly, the training phase of RL
can be performed safely in a real environment to avoid the reality gap. An-
other approach to achieve this goal could be the design of a training scenario
using gradually increasing control commands to learn the optimal policy (in a
safe environment) to avoid the systems’ instability during the training. This
is conceptually similar to teaching a child to walk by his/her parents (with no
simulation environment!). Such an idea could be a starting point on the way to
a truly model-free RL-based IFCS.
Finally, it is notable that the concept of adaptive optimal control can also be
incorporated in the framework of the Stochastic Optimal Control (SOC) [324].
Since few studies have addressed the application of such a design in flight con-
trol systems, the mathematical details are not given here. Briefly, considering
an affine dynamic system and a quadratic cost with respect to system inputs, it
can be proved that the HJB equation for a stochastic model can be transformed
into a linear PDE by defining a desirability function as an exponential value
function. The solution of such a linear PDE, called the Cauchy problem, can be
represented in a probabilistic manner by applying the Feynman-Kac formula,
where the solution can be derived by an expectation over all possible system
paths. Accordingly, the Monte-Carlo method involving the importance sam-
pling technique is utilized to approximate it [325]. This problem can also be
formulated within the framework of the information theory by incorporating
the concepts of the free energy and the KL divergence, while there is no need
for mentioned restrictions (such as an affine model) in such a formulation [326].
To this end, the optimal probability distribution of the control command is first
determined, where the control problem is then converted to the minimization
of the KL divergence of the current probability distribution from the optimal
distribution. The solution is typically determined iteratively at each step con-
sidering a finite prediction horizon in the cost (value) function. Such a method
is also known as Model Predictive Path Integral (MPPI). Indeed, MPPI is a
variety of MPCs in which a set of trajectories are generated at each step by
adding noises to system inputs, and then, future control commands are im-
proved by utilizing a Monte-Carlo sampling and computing the corresponding
cost of each trajectory. The first control command in the computed sequence
is then applied to the system and the remaining terms are used as the baseline
in the next time step [327]. Such a method, which is somewhat similar to the guided policy search, results in more efficient exploration than MPCs based on random trajectories [328]. Consequently, it can be an efficient alterna-
tive to conventional RL-based control systems, thereby providing the significant
potential to be employed in IFCSs in the future. A vision-based MPPI has been
given in [327] in which a deep NN was utilized to learn the optical flow of each
pixel in the image, and then an MPC attempted to bring a target pixel to the
center of the camera field of view while controlling the UAV path. Note that,
as the MPC requires a prediction model, these methods are expressible within
the framework of the model-based RL. An iterative learning control has been
adopted within the introduced information-theoretic MPPI scheme in [326] and
[328] for obstacle-avoidance control of a quadrotor trajectory and to provide a
missile guidance law, respectively, where the system dynamics have been mod-
eled by feedforward NNs. In this regard, as a key requirement, there is a need
for a large number of samples in sampling-based MPCs, while the mathemati-
cal foundation for the analysis of the algorithm convergence and the closed-loop
stability in the above-mentioned method should still be strengthened.
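A bare-bones MPPI loop in the spirit described above is sketched next: perturb a nominal input sequence, roll out an (assumed) dynamics model, weight the sampled trajectories by the exponentiated negative cost, apply the first improved command, and shift the sequence; the model, horizon, and temperature are illustrative assumptions:

    import numpy as np

    def dynamics(x, u):
        # assumed double-integrator model: x = [position, velocity]
        return np.array([x[0] + 0.05 * x[1], x[1] + 0.05 * u])

    def cost(x, u):
        return x[0]**2 + 0.1 * x[1]**2 + 0.01 * u**2

    H, K, lam, sigma = 20, 256, 1.0, 0.5     # horizon, samples, temperature, noise
    U = np.zeros(H)                          # nominal input sequence
    rng = np.random.default_rng(4)
    x = np.array([1.0, 0.0])

    for t in range(100):
        noise = sigma * rng.normal(size=(K, H))
        costs = np.zeros(K)
        for k in range(K):                   # Monte-Carlo rollouts
            xk = x.copy()
            for h in range(H):
                u = U[h] + noise[k, h]
                costs[k] += cost(xk, u)
                xk = dynamics(xk, u)
        w = np.exp(-(costs - costs.min()) / lam)
        w /= w.sum()                         # importance-sampling weights
        U = U + w @ noise                    # improved input sequence
        x = dynamics(x, U[0])                # apply the first command
        U = np.roll(U, -1)
        U[-1] = 0.0                          # warm start for the next step

    print(x)   # the state should be regulated toward the origin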
Various challenges still remain in the way of efficient RL-based model-free control systems, and in some cases, a simple PID or LQR controller may behave more
effectively than existing RL-based control approaches [286]. However, RL has
provided a window into a new look at the control problem of complex systems
in complex environments, and it is expected that such a framework can lead to
a generic, fully autonomous, truly model-free, and safe control methodology in
the near future such that it can be reliably employed in the case of more complex
aerial vehicles (such as nonconventional aircraft) and more complex problems
(such as the presence of severe external disturbances and actuator faults).
Five tables are given in the following. Principal characteristics of some key
research addressing the NN-based control of VTOL aerial vehicles, HFVs (and
NSVs), fixed-wing aircraft, and nonconventional air vehicles are listed in Tables
1-4, respectively. Different specifications of each research, i.e., the control ob-
jective, the consideration of system constraints in the design, the use of an MLP
technique, the employment of an OFB control scheme, and the type of uncertain
dynamics considered in the model, as well as the main features and limitations
arising from each control methodology are briefly reported. The provided data
would be advantageous to identify considerable capabilities, complexities, and
limitations of each control strategy for each type of aerial vehicle, to compare
the importance and effectiveness of different control methods, and to figure out
the challenging issues remaining unsolved. On the other hand, as a separate
category, some novel high-level control systems incorporating NNs in their de-
sign are given in Table 5 in which the learning method, the control objective,
key features, and considerable limitations of each research are listed. The com-
bination of such high-level control strategies with existing low-level intelligent
control systems would be a critical research area to develop an intelligent flight
management unit.
Table 1: Principal characteristics of some of the introduced intelligent control systems for
VTOL aerial vehicles
Ref. Controller Characteristics* Main features Limitations/ Complexities
[180] Backstepping TD -Utilizing a switching func-
tion to integrate the NN
and DO
-Neglecting the approxima-
tion error of differentiators
[168] Neuroadaptive TMD -Considering aerodynamic
frictions in the model
-Considering unknown in-
ertia matrix
-Avoiding attitude singu-
larity problem using a BLF
rather than employing the
well-known quaternion for-
mulation
[95] Neuroadaptive TD -SMC-like-based training
algorithm for FNNs
-The system should be
decoupled into a set of
SISO models
-Applicable in second-
order systems
-Concerns with the stabil-
ity analysis
[274] ADP (H2
control)
T -Event-triggered control
-Using discounted cost
-On-policy learning
-The necessity for the PE
condition
-Several conservative as-
sumptions in the stability
analysis
-Requires entire dynamic
model
[10] On-policy
DPG
WR -Performing a wide range
of maneuvers, stably
-No stability analysis
-Huge off-line computa-
tional burden
[177] Backstepping ADIO -Adopting combined NN
and DO
-Using Nussbaum function
to deal with input satura-
tion
-Using BLF to tackle out-
put constraints
-Concerns with the stabil-
ity analysis
-Considering a SISO model
[116] Backstepping AOMD -DSC for multi-rotor UAV
-Adopting combined NN
and DO
-Using a time-varying BLF
-Large control actions
caused by BLF
[96] Neuroadaptive AR -SMC-like-based training
algorithm for FNNs
-Generality of the control
scheme
-The system should be de-
coupled into a set of SISO
models
-Chattering phenomenon
-The plant must be stabi-
lizable by a PID controller
-Concerns with the stabil-
ity analysis
[308] Off-policy
DDPG
WFR -Hybrid offline-online
learning algorithm
-Adopting integrators to
eliminate steady-state
error
-No stability analysis
-Significant off-line compu-
tational burden
[156] Neuroadaptive TR -SMC-like-based training
algorithm for FNNs
-The system should be de-
coupled into a set of SISO
models
-The plant must be stabi-
lizable by a PD controller
-Concerns with the stabil-
ity analysis
* Control objective: A: Attitude control, W: Waypoint tracking, T: Trajectory tracking, L: Longitudinal mode/ I: Consideration of input constraints, O: Consideration of output (state) constraints/ M: Minimal-learning parameter/ K: Output feedback control/ D: Disturbance or noise rejection, F: Fault-tolerant control, R: Model-free control
Table 2: Principal characteristics of some of the introduced intelligent control systems for
HFVs and NSVs
Ref. Controller Characteristics Main features Limitations/ Complexities
[118] Backstepping ADI -DSC with WNN-based
DO
-Using Nussbaum func-
tion to deal with input
saturation
-Concerns with stability analysis
[141] Backstepping ADI -Adopting combined NN
and DO
-Using a modified tracking
error to deal with input
saturation
-Control allocation using a
convex optimization solved
by an RNN
-Neglecting the control al-
location error in the stabil-
ity analysis
[174] Backstepping LI -DSC with direct neural
approximation
-Using Nussbaum function
to deal with dead-zone in-
put nonlinearity
[133] Backstepping LMO -Funnel control to guaran-
tee the transient perfor-
mance
-Consideration of flexible
states
-Many design parameters
-Availability of the third
derivative of the tracking
error
[127] Backstepping LMIO -FOSMD in the backstep-
ping design
-Morphing aircraft (with
pure-feedback model)
-Using Butterworth filter
to avoid algebraic loop in
the control design
-Consideration of only the
cruise phase
-Assuming the bounded fil-
tering error
[41] Backstepping L -Using a discrete-time
model
-Utilizing a prediction model
[130] Backstepping L -Direct neural-
backstepping scheme
-Using the integral of
tracking error to eliminate
the steady tracking error
[132] Neuroadaptive LMK -Defining an OFB model
and using HGOs to avoid
backstepping
-Large control commands
at early times
[115] Backstepping LMF -Avoiding singularity prob-
lem using direct DSC
-Considering only the bias
actuator fault
-Unusual formulation of
NNs
[99] Pseudocontrol LM -Avoiding the backstep-
ping design through trans-
forming the model into a
normal feedback form
-No requirement for the
contraction assumption
-Time derivatives of FPA
should be measurable
[83] Backstepping LFDO -FOSMD in the backstep-
ping design
-Consideration of AOA
constraint
-Neural fault identification
-Using a SISO model
[147] Backstepping LMFDO -Control of the transient
response
-Asymptotic tracking con-
trol
-Parameter drift in the up-
dating rules
-Excessive control effort at
the vicinity of permissible
output bounds
Table 3: Principal characteristics of some of the introduced intelligent control systems for
fixed-wing aircraft
Ref. Controller Characteristics Main features Limitations/ Complexities
[38] Pseudocontrol AF -Hybrid direct-indirect
adaptive control
-Considering (a specific)
structural damage
-No actuator dynamics
-Slow convergence of the
algorithm
[100] Pseudocontrol WF -Modification of guidance
commands to adapt to cur-
rent flight condition
-No stability analysis
[277] ADP (H∞
control)
WD -Off-policy learning
-Partially model-free con-
trol
-Employing single NN
-No stability analysis
[272] ADP (H∞
control)
WD -Event-triggered control
-Employing single NN
-On-policy learning
-Requires entire dynamic
model
-The necessity for the PE
condition
-Several conservative as-
sumptions in the stability
analysis
[197] Dynamic
inversion
AF -Indirect EKF-based fault
identification
-Concerns with the stabil-
ity analysis
[200] MPC TIFR -Multimodel FTC scheme
-Indirect RLS
optimization-based fault
identification
-Concerns with the feasibil-
ity of the proposed control
design
[129] Backstepping AF -Fractional-order backstep-
ping control
-Decentralized control of
multi-UAVs
-Adopting combined NN
and DO
-Concerns with employing
fractional-order control in
practice
-Conservative assumptions
on estimation errors
[178] Backstepping TF -DSC-based distributed
formation flight control
-Adopting combined NN
and DO
-Consideration of wake
vortices
-Some simplifications in
dynamic modeling
Table 4: Principal characteristics of some of the introduced intelligent control systems for
nonconventional air vehicles

Ref. | Controller | Characteristics | Main features | Limitations/Complexities
[160] | Neuroadaptive | TOD | Flapping-wing micro aerial vehicle control; combined NN and DO | -
[293] | Deep Q-learning | TR | Control of flapping-wing aerial vehicles using RL; utilizing an evolutionary exploration; maximizing the expected reward near an action sequence to improve robustness | No stability analysis
Table 5: Principal characteristics of some of the introduced intelligent high-level control
methods

Ref. | Method | Objective | Main features | Limitations/Complexities
[255] | Supervised learning | C | Using simple images for training, with no need to determine characteristic features of an object | Lack of a strong mathematical foundation; no stability analysis
[296] | Deep RL | C | Real flight experiments; using only monocular images as input | No stability analysis
[297] | Deep RL | DE | Training mobile charging stations to autonomously move to the charging point in an optimal manner | Considering the 2D problem; no stability analysis
[312] | Modified DPG | CW | Using the POMDP scheme; navigating in a large-scale complex environment | No stability analysis
[309] | DDPG | W | Auto-landing on a moving platform; real flight experiments | Considering the 2D problem; no stability analysis
[323] | Guided policy search | CW | MPC-based guided policy search; providing a safer training phase using MPC in the training | Requiring an approximate dynamic model; no stability analysis
[326] | MPPI | CW | Control of nonaffine dynamics; utilizing information-theoretic MPC with a generic cost function | Requiring the dynamic model; no stability analysis

* Obstacle or collision avoidance: C/ Data collection: D/ Waypoint tracking: W/ Consideration
of total energy constraint: E
5. Concluding remarks and future directions
Intelligent flight control systems have evolved significantly, particularly during
the last two decades. They are now able to deal satisfactorily with different
practical issues in real flight, e.g., atmospheric disturbances, operational
faults, model uncertainties, and unmodeled dynamics. In addition, concerning
model-free control methods, there has been remarkable progress in both indi-
rect adaptive controllers, which employ NNs to provide a valid dynamic model
of the system, and direct adaptive control systems using the optimal control
or the RL frameworks. Besides, intelligent approaches, particularly those based
on RL, have recently been adopted effectively in high-level control systems
to provide intelligent path planning and guidance loops in flight control systems.
This remarkable progress has resulted in aerial robots with an outstandingly
high level of autonomy. Despite all these advances, there is still a long way to
go before a generic intelligent flight control system can be introduced. In the
following, we address a set of crucial bottlenecks along with some suggestions
for the direction of future research in developing such an intelligent flight control
system.
1. Design parameters: The determination of appropriate design parameters
in the proposed IFCSs is a challenging issue, which is typically carried out by
trial and error. Although the training of the controller’s parameters (using
an additional learning loop [329, 47] or evolutionary algorithms [207]) or
the reduction of the design parameters (by incorporating self-organizing
[228] or new data analysis approaches [223]) can deal with such a problem
to a certain extent, the development of generic intelligent control systems
with no (or at least very few) design parameters is still an open problem
in the field of intelligent control. Thinking about more flexible control
structures organized by the incoming system data using machine learning
approaches can be a gateway to efficient solutions.
2. High-level control : An intelligent guidance loop that ensures a feasible
trajectory is commanded to the aircraft is critical to developing a reliable
flight control system in the presence of internal and external disturbances.
However, in classical IFCSs, this loop typically remains unchanged after a
fault occurs [100]. Adaptive estimation of the flight envelope in the presence
of operational faults would be the first step towards a feasible FTC system
[330]. In a more general view, intelligent trajectory generation (for different
purposes such as collision avoidance) is a challenging problem that has
received less attention from academia. Such a problem, considering different
design criteria such as obstacle/collision avoidance and optimal resource
allocation, has been addressed in [296, 297, 312] using RL, though there are
concerns about the definition of an appropriate reward function. In this
respect, there is a significant need to improve such high-level control systems
and to unify them consistently with conventional low-level IFCSs in order to
develop a fully autonomous flight management system. Further, with novel
machine learning algorithms and the growth of computing power, there will
be an opportunity to redefine an entire flight control problem (which, in the
existing framework, includes various control loops) as a new framework with
a more integrated and concise structure that can map high-level commands
to low-level inputs with less human intervention.
3. Evolutionary algorithms: Although evolutionary algorithms are not cur-
rently a mainstream topic in aerial robotics, they may be an appropriate
candidate in the near future for adoption in NN-based flight control
systems to enhance the effectiveness and efficiency of training algorithms,
to reduce the computational complexity, and to learn the networks’
architecture and learning hyperparameters [14, 316] (a minimal evolution-
strategy tuning loop is sketched after this list). To this end, a new
mathematical framework (maybe in a probabilistic representation) would
be required to analyze the convergence of such learning
approaches in order to develop a reliable control design procedure.
4. Controllability region: The stability analysis provided in almost all of the
existing literature introduces a region of stability whose boundaries are
determined by upper limits on a set of parameters that either have no
physical meaning or are not measurable [60]. This is a serious problem
when utilizing such adaptive controllers in a real application, because the
controllability region of the system cannot be determined based on physical
parameters. The problem becomes even more challenging in model-free
control systems. In this regard, there is a need to introduce a set of tangible
criteria for analyzing closed-loop stability. More specifically, borrowing
concepts from information theory to analyze the controllability of the
system according to different characteristics of the incoming data would be
an attractive idea for building a beneficial stability analysis framework, even
for model-free control systems [326].
5. Input-output constraints: The simultaneous consideration of input and
output constraints in the control design process is a challenging problem.
This is due to the fact that satisfying input constraints may lead to larger
tracking errors, and conversely, satisfying output constraints may necessi-
tate impractical control commands. The integration of funnel control with
an input saturation constraint has been addressed in [331] and [332] for a
linear minimum-phase system and a SISO nonlinear system, respectively,
without considering model uncertainty in the control problem, while the
problem becomes more challenging in the presence of uncertain dynamics
(a toy illustration of a funnel-type error transformation under input
saturation is given after this list). Reinforcement learning would be an
effective candidate for such control problems. More precisely, using RL, it
is possible to learn an optimal policy as a mapping from permissible system
inputs to desired outputs. Further, concerning fault-tolerant flight control
systems, an iterative learning control scheme integrated with RL methods
(such as MPPI [328]) would be an appropriate solution in future studies.
6. Structural constraints: Elastic modes of an air vehicle can result in a va-
riety of undesirable phenomena, such as flutter and control reversal, which
may affect the closed-loop performance. This can become more challeng-
ing in the case of damaged aircraft due to the uncertainty in structural
margins and the shifting of elastic modes (caused by changes in the struc-
tural stiffness and mass of the airframe) [38]. The consideration of such
structural constraints in the design process of IFCSs is a vital issue for
future studies.
7. RL computational complexity : The considerable computational cost of RL
is still a challenging issue, which is more problematic in the case of high-
dimensional problems [10]. In this regard, the development of integrated
multi-loop control systems, in which RL is employed in the outer-loop
design while existing classical intelligent controllers (discussed in Section 2)
are utilized in the inner loop, would be an effective way to reduce the
dimension of the action space, thereby significantly reducing the learning
complexity (see the corresponding sketch after this list).
8. Reward function in RL: The definition of an appropriate reward function
in RL-based control systems is of great importance, yet there is still no
well-established intelligent approach to defining it. Although reward shap-
ing is a well-known approach to speeding up the learning process, there are
significant concerns about the optimality of the computed policy and the
convergence of the algorithm [311] (a sketch of potential-based reward
shaping, which is known to preserve the optimal policy, is given after this
list). Inverse RL could be another solution, identifying an appropriate
reward function via learning from demonstration (i.e., the task of learning
from an expert) [333]. Further, the concept of learning from demonstration
(also known as apprenticeship learning) would be a useful approach to
training an intelligent trajectory generation scheme [334].
9. NNs’ adaptation speed: There are still considerable concerns about the
adaptation speed of NNs in both model-based and model-free control
systems, particularly in the case of time-varying systems and environments
with rapid changes [200, 195]. Dealing with this issue would require more
effective learning schemes with a faster convergence rate that do not
compromise the robustness of the closed-loop system. On the other hand,
the computation of the best adaptation rate in different flight conditions
is another complicated problem with no clear answer [335]. Concerning
the adaptive control of dynamic systems with parametric uncertainty, the
above-mentioned problems have been addressed in the past two decades
within the framework of L1-adaptive control, which attempts to decouple
the estimation loop from the control loop in order to decouple the
adaptation from the robustness [336]. However, several issues have been
reported in the literature regarding the claims made about the capabilities
of L1-adaptive control [37]. In this regard, there is a serious need to develop
novel learning frameworks for dynamic systems subject to rapid changes in
the dynamic model and the environment (considering both parametric and
nonparametric uncertainties).
10. NNs’ structure in RL: Typically, feedforward NNs are used as the actor
and critic networks within the actor-critic framework (particularly in RL
[308]). RNNs can be an efficient alternative to feedforward NNs within
such a framework to improve the closed-loop stability while reducing the
bias and variance of the learning process.
11. Time-dependent NNs: There are more complex NNs in the literature that
can be adopted in the control design procedure to enhance the modeling
and training performance. For instance, continuous-time RNNs [337] can
be used to incorporate time-varying sampling times [316]. Further, in
spiking NNs, which are closer to biological NNs, each neuron has a
membrane potential affected by incoming signals; the neuron emits a spike
once its membrane potential exceeds a specific threshold, and the membrane
potential is subsequently reset to a rest value (a minimal implementation of
this mechanism is sketched after this list). Due to their complex training
process, spiking NNs are currently employed mostly in relatively simple
control problems [338, 339]; nevertheless, they can be a beneficial choice to
imitate the complex behavior of intelligent systems, thereby providing a
significant potential for use in complex intelligent controllers. Further, as
these NNs encompass the concept of time in their models, they could be
appropriate solutions to deal with explicitly time-dependent model
uncertainties and external disturbances.
12. Complex NNs: Different types of deep NNs, wavelet NNs, and CNNs have
demonstrated superior performance in the identification of complex
nonlinear systems [254, 340]. Although several studies have addressed the
development of direct (and, more rarely, indirect) adaptive flight control
systems consisting of such complicated NNs [341, 342, 99, 232], their
effective employment in both the identification and control design steps,
which may also require training algorithms more efficient than existing
ones, can substantially reduce the NNs’ estimation error and, in turn,
significantly reduce the conservativeness of the designed controllers.
13. Aggressive maneuvers: Most of the introduced trajectory tracking control
schemes have been designed and validated on simple trajectories with no
aggressive maneuvers [293]. In recent years, some researchers have ad-
dressed the development of autonomous aircraft aerobatics employing the
concept of learning from demonstration [234]. Such an approach can be
employed effectively in the framework of IFCSs, particularly those based
on RL, to provide the ability to perform a wide range of maneuvers. In
the near future, IFCSs are expected to be able to fulfill more complex
tasks, such as take-off, landing, and different aggressive maneuvers, with
guaranteed performance.
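To make a few of these directions more concrete, we close this section with several illustrative sketches in Python. Each one is a deliberately minimal example built on toy dynamics and hypothetical parameter values of our own choosing; none of them reproduces a model or algorithm from the cited works. The first sketch relates to items 1 and 3: a basic (mu, lambda) evolution strategy that tunes the two gains of a simple pitch-attitude controller by minimizing a quadratic tracking cost, illustrating how an evolutionary loop can replace manual trial-and-error tuning of design parameters.

    import numpy as np

    # Illustrative (mu, lambda) evolution strategy tuning the two gains of a PD
    # pitch-attitude controller on a toy second-order model. The plant, cost,
    # and all constants are assumptions made for this sketch only.

    def tracking_cost(gains, dt=0.01, T=5.0):
        kp, kd = gains
        theta, q = 0.0, 0.0            # pitch angle and pitch rate
        cost = 0.0
        for _ in range(int(T / dt)):
            ref = np.deg2rad(10.0)     # 10-deg step command
            u = np.clip(kp * (ref - theta) - kd * q, -1.0, 1.0)  # saturation
            q += dt * (-0.8 * q - 2.0 * theta + 4.0 * u)  # toy short-period dynamics
            theta += dt * q
            cost += dt * ((ref - theta) ** 2 + 0.01 * u ** 2)
        return cost

    rng = np.random.default_rng(0)
    mu, lam, sigma = 5, 20, 0.3
    pop = rng.uniform(0.1, 5.0, size=(lam, 2))          # initial gain candidates
    for gen in range(30):
        costs = np.array([tracking_cost(g) for g in pop])
        parents = pop[np.argsort(costs)[:mu]]           # keep the mu best
        # Offspring: a randomly chosen parent plus Gaussian mutation.
        pop = parents[rng.integers(mu, size=lam)] + sigma * rng.normal(size=(lam, 2))
        pop = np.clip(pop, 0.0, 10.0)                   # keep gains in a plausible box
    print("tuned gains (kp, kd):", parents[0])

Here the controller structure is fixed and only two scalars are learned; the same loop extends, in principle, to NN weights and hyperparameters, at the cost of many more rollouts.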
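The next sketch relates to item 5. It combines a funnel-type error transformation, whose effective gain grows as the tracking error approaches a prescribed shrinking bound, with a hard input saturation. It is written in the spirit of, but is not identical to, the schemes of [331, 332]; the scalar plant, the funnel parameters, and the saturation limit are all illustrative assumptions, and the transformation presumes the error starts (and remains) inside the funnel.

    import numpy as np

    # Funnel-type error transformation under input saturation (illustrative).
    def funnel(t, psi0=2.0, psi_inf=0.1, a=1.0):
        # Prescribed performance bound: |e(t)| should stay below psi(t).
        return (psi0 - psi_inf) * np.exp(-a * t) + psi_inf

    dt, T = 1e-3, 8.0
    x, ref = 1.5, 0.0                   # scalar regulation task
    for k in range(int(T / dt)):
        t = k * dt
        e = x - ref
        # The gain grows as the error approaches the funnel boundary ...
        u = -e / (funnel(t) - abs(e))
        # ... but the delivered command is still limited by the actuator.
        u = np.clip(u, -3.0, 3.0)
        x += dt * (0.5 * x + u)         # unstable toy first-order dynamics
    print("final error:", abs(x - ref), "funnel bound:", funnel(T))

The tension discussed in item 5 is visible directly in this code: tightening the funnel raises the commanded gain, while the clipping line caps what the actuator can actually deliver.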
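Relating to item 7, the following sketch shows the loop nesting that shrinks the action space of the RL agent: an outer-loop policy (here an untrained linear placeholder) issues a one-dimensional attitude setpoint at a low rate, while a fixed classical inner loop of the kind discussed in Section 2 produces the actuator command at a high rate. All dynamics, gains, and rates are illustrative assumptions.

    import numpy as np

    def inner_pd_loop(theta_cmd, theta, q, kp=3.0, kd=1.2):
        # Fixed classical inner-loop attitude controller with saturation.
        return np.clip(kp * (theta_cmd - theta) - kd * q, -1.0, 1.0)

    def step_plant(state, u, dt=0.01):
        theta, q, h = state                  # pitch, pitch rate, altitude error
        q += dt * (-0.8 * q - 2.0 * theta + 4.0 * u)
        theta += dt * q
        h += dt * (20.0 * theta)             # crude climb kinematics
        return np.array([theta, q, h])

    def rollout(policy, state, outer_dt=0.1, inner_dt=0.01, T=10.0):
        # The agent acts every outer_dt with a 1-D action (theta_cmd); the
        # inner loop runs at inner_dt, so the learned action space stays small.
        ret = 0.0
        for _ in range(int(T / outer_dt)):
            theta_cmd = policy(state)        # 1-D outer-loop action
            for _ in range(int(outer_dt / inner_dt)):
                u = inner_pd_loop(theta_cmd, state[0], state[1])
                state = step_plant(state, u, inner_dt)
            ret += -state[2] ** 2            # reward: drive altitude error to zero
        return ret

    # Placeholder linear policy; in practice its parameters would be trained by RL.
    policy = lambda s: np.clip(-0.02 * s[2], -0.3, 0.3)
    print("return:", rollout(policy, np.array([0.0, 0.0, 50.0])))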
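For item 8, the sketch below contrasts a sparse task reward with a potential-based shaping term of the form gamma*Phi(s') - Phi(s), a standard construction that is known to leave the optimal policy unchanged while providing a dense learning signal; the waypoint task and the potential function are illustrative assumptions.

    import numpy as np

    # Potential-based reward shaping for a toy waypoint-tracking task.
    GAMMA = 0.99
    WAYPOINT = np.array([100.0, 50.0])

    def sparse_reward(pos_next):
        # Original task reward: a bonus only when the waypoint is reached.
        return 10.0 if np.linalg.norm(pos_next - WAYPOINT) < 1.0 else 0.0

    def potential(pos):
        # Negative distance to the waypoint: larger (less negative) when closer.
        return -np.linalg.norm(pos - WAYPOINT)

    def shaped_reward(pos, pos_next):
        # Adding gamma*Phi(s') - Phi(s) densifies the signal without changing
        # the optimal policy.
        return sparse_reward(pos_next) + GAMMA * potential(pos_next) - potential(pos)

    pos, pos_next = np.array([0.0, 0.0]), np.array([1.0, 0.5])
    print(shaped_reward(pos, pos_next))  # positive: this step moved toward the waypoint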
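Finally, for item 11, the following sketch implements the spiking mechanism described above in its simplest leaky integrate-and-fire form: the membrane potential integrates the incoming signal, a spike is emitted once the potential crosses a threshold, and the potential is then reset to its rest value. All time constants and thresholds are illustrative.

    import numpy as np

    # Minimal leaky integrate-and-fire neuron (illustrative constants).
    def lif_neuron(input_current, dt=1e-3, tau=0.02, v_rest=0.0,
                   v_thresh=1.0, resistance=1.0):
        v = v_rest
        spikes = []
        for i in input_current:
            # Leaky integration of the incoming signal.
            v += dt / tau * (-(v - v_rest) + resistance * i)
            if v >= v_thresh:          # threshold crossing: emit a spike
                spikes.append(1)
                v = v_rest             # reset to the rest value
            else:
                spikes.append(0)
        return np.array(spikes)

    current = 1.5 * np.ones(1000)      # constant supra-threshold input for 1 s
    train = lif_neuron(current)
    print("firing rate: %.1f Hz" % (train.sum() / 1.0))

Because the neuron's state evolves in (simulated) continuous time, representations of this kind naturally encode the explicitly time-dependent effects mentioned above.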
Acknowledgements
This work was supported by Iran National Science Foundation (INSF) and
Iran’s National Elites Foundation (INEF) grant 98027065.
References
[1] Bruce T. Clough. Metrics, Schmetrics! How The
Heck Do You Determine A UAV’s Autonomy Anyway? In Proceed-
ings of the 2002 Performance Metrics for Intelligent Systems Workshop,
Gaithersburg, MD, 2002.
[2] Linda S. Gottfredson. Mainstream Science on Intelligence: An Edito-
rial With 52 Signatories, History, and Bibliography. INTELLIGENCE,
24(1):13–23, 1997.
[3] Lyle Long and Troy Kelley. The Requirements and Possibilities of Creat-
ing Conscious Systems. In AIAA Infotech@ Aerospace Conference, Seattle,
Washington, 2009.
[4] Thaddeus Eze, Richard Anthony, Alan Soper, and Chris Walshaw. A
Generic Approach towards Measuring Level of Autonomicity in Adaptive
Systems. International Journal on Advances in Intelligent Systems, 5(3-
4), 2012.
[5] Marialena Vagia, Aksel A. Transeth, and Sigurd A. Fjerdingen. A liter-
ature review on the levels of automation during the years. What are the
different taxonomies that have been proposed? Applied ergonomics, 53
Pt A:190–202, 2016.
[6] Marco Protti and Riccardo Barzan. UAV Autonomy Which level is
desirable? Which level is acceptable? Alenia Aeronautica Viewpoint.
In Platform Innovations and System Integration for Unmanned Air, Land
and Sea Vehicles, Neuilly-sur-Seine, France, 2007.
[7] Andriy Sarabakha, Changhong Fu, Erdal Kayacan, and Tufan Kum-
basar. Type-2 Fuzzy Logic Controllers Made Even Simpler: From Design
to Deployment for UAVs. IEEE Transactions on Industrial Electronics,
65(6):5069–5077, 2018.
[8] Kirk Y. W. Scheper, Sjoerd Tijmons, Cornelis C. de Visser, and Guido C.
H. E. de Croon. Behavior Trees for Evolutionary Robotics. Artificial life,
22(1):23–48, 2016.
[9] G. C. H. E. de Croon, M. Perçin, B. D. W. Remes, C. de Wagter, and
R. Ruijsink. The DelFly: Design, Aerodynamics, and Artificial Intelligence
of a Flapping Wing Robot. Springer, Berlin, 2016.
[10] Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and Marco Hutter. Con-
trol of a Quadrotor With Reinforcement Learning. IEEE Robotics and
Automation Letters, 2(4):2096–2103, 2017.
[11] Lu Cheng, ChangSheng Jiang, and Ming Pu. Online-SVR-compensated
nonlinear generalized predictive control for hypersonic vehicles. Science
China Information Sciences, 54(3):551–562, 2011.
[12] Eunjung Ju, Jungdam Won, Jehee Lee, Byungkuk Choi, Junyong Noh,
and Min Gyu Choi. Data-driven control of flapping flight. ACM Trans-
actions on Graphics, 32(5):1–12, 2013.
[13] Dario Floreano, Jean-Christophe Zufferey, and Jean-Daniel Nicoud. From
wheels to wings with evolutionary spiking circuits. Artificial life, 11(1-
2):121–138, 2005.
[14] Fernando Silva, Miguel Duarte, Luís Correia, Sancho Moura Oliveira, and
Anders Lyhne Christensen. Open Issues in Evolutionary Robotics. Evo-
lutionary Computation, 24(2):205–236, 2016.
[15] Fendy Santoso, Matthew A. Garratt, and Sreenatha G. Anavatti. State-of-
the-Art Intelligent Flight Control Systems in Unmanned Aerial Vehicles.
IEEE Transactions on Automation Science and Engineering, 15(2):613–
627, 2018.
[16] C. C. Lee. Fuzzy logic in control systems: Fuzzy logic controller. I. IEEE
Transactions on Systems, Man, and Cybernetics, 20(2):404–418, 1990.
[17] Giampiero Campa, Mario L. Fravolini, Brad Seanor, Marcello R. Napoli-
tano, Diego Del Gobbo, Gu Yu, and Srikanth Gururajan. On-line learn-
ing neural networks for sensor validation for the flight control system of a
B777 research scale model. International Journal of Robust and Nonlinear
Control, 12(11):987–1007, 2002.
[18] Janusz Kacprzyk, Johann Schumann, and Yan Liu, editors. Applications
of Neural Networks in High Assurance Systems. Studies in Computational
Intelligence. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.
[19] James E. Tomayko. The Story of Self-Repairing Flight Control Systems.
NASA Dryden Flight Reasearch Center, 2003.
[20] M. Steinberg. Historical Overview of Research in Reconfigurable Flight
Control. Proceedings of the Institution of Mechanical Engineers, Part G:
Journal of Aerospace Engineering, 219(4):263–275, 2005.
[21] Johann Schumann, Pramod Gupta, and Yan Liu. Application of Neural
Networks in High Assurance Systems: A Survey. In Janusz Kacprzyk, Jo-
hann Schumann, and Yan Liu, editors, Applications of Neural Networks in
High Assurance Systems, volume 268 of Studies in Computational Intelli-
gence, pages 1–19. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.
[22] Markus Kaminski. A Rapid Robust Fault Detection Algorithm for Flight
Control Reconfiguration. Master Thesis, Germany, 2017.
[23] Peggy Williams. Selected Flight Test Results for Online Learning Neural
Network-Based Flight Control System. In AIAA 1st Intelligent Systems
Technical Conference, Chicago, Illinois, 2004. AIAA.
[24] Jacob Hageman, Mark Smith, and Susan Stachowiak. Integration of On-
line Parameter Identification and Neural Network for In-Flight Adaptive
Control. NASA Dryden Flight Research Center, 2003.
[25] Tim Smith, Jim Barhorst, and James M. Urnes. Design and Flight Test
of an Intelligent Flight Control System. In Janusz Kacprzyk, Johann
Schumann, and Yan Liu, editors, Applications of Neural Networks in High
Assurance Systems, volume 268 of Studies in Computational Intelligence,
pages 57–76. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.
[26] John T. Bosworth and Peggy S. Williams-Hayes. Flight Test Results
from the NF-15B Intelligent Flight Control System (IFCS) Project with
Adaptation to a Simulated Stabilator Failure. In AIAA Conference and
exhibit, AIAA infotech@Aerospace Conference and Exhibit, Rohnert Park,
California, 2007. AIAA.
[27] John Burken, Curt Hanson, Jim Lee, and John Kaneshige. Flight Test
Comparison of Different Adaptive Augmentations of Fault Tolerant Con-
trol Laws for a Modified F-15 Aircraft. In AIAA Infotech@Aerospace
Conference, Seattle, Washington, 2009.
[28] Curt Hanson, Jacob Schaefer, Marcus Johnson, and Nhan Nguyen. Design
of Low Complexity Model Reference Adaptive Controllers. NASA Dryden
Flight Research Center, 2012.
[29] Curt Hanson, Jacob Schaefer, John J. Burken, David Larson, and Marcus
Johnson. Complexity and Pilot Workload Metrics for the Evaluation of
Adaptive Flight Controls on a Full Scale Piloted Aircraft. NASA Dryden
Flight Research Center, 2014.
[30] Konstantinos Dalamagkidis, Kimon P. Valavanis, and Les A. Piegl. Non-
linear Model Predictive Control With Neural Network Optimization for
Autonomous Autorotation of Small Unmanned Helicopters. IEEE Trans-
actions on Control Systems Technology, 19(4):818–831, 2011.
[31] Zhijun Li, Jun Deng, Renquan Lu, Yong Xu, Jianjun Bai, and Chun-Yi
Su. Trajectory-Tracking Control of Mobile Robot Systems Incorporating
Neural-Dynamic Optimized Model Predictive Approach. IEEE Transac-
tions on Systems, Man, and Cybernetics: Systems, 46(6):740–749, 2016.
[32] Mehmet Önder Efe. Neural Network Assisted Computationally Simple
PID Control of a Quadrotor UAV. IEEE Transactions on Industrial In-
formatics, 7(2):354–361, 2011.
[33] Bin Xu, Chenguang Yang, and Yongping Pan. Global neural dynamic sur-
face tracking control of strict-feedback systems with application to hyper-
sonic flight vehicle. IEEE Transactions on Neural Networks and Learning
Systems, 26(10):2563–2575, 2015.
[34] Bin Xu, Daipeng Yang, Zhongke Shi, Yongping Pan, Badong Chen, and
Fuchun Sun. Online Recorded Data-Based Composite Neural Control
of Strict-Feedback Systems With Application to Hypersonic Flight Dy-
namics. IEEE Transactions on Neural Networks and Learning Systems,
29(8):3839–3849, 2018.
[35] Karl J. Åström and Björn Wittenmark. Adaptive control. Dover books on
engineering. Dover Publications, Mineola N.Y., 2nd ed., dover ed. edition,
2008.
[36] Weibin Gu, Kimon P. Valavanis, Matthew J. Rutherford, and Alessandro
Rizzo. A Survey of Artificial Neural Networks with Model-based Control
Techniques for Flight Control of Unmanned Aerial Vehicles. In 2019 In-
ternational Conference on Unmanned Aircraft Systems (ICUAS), pages
362–371, Atlanta, GA, USA, 2019. IEEE.
[37] P. A. Ioannou, A. M. Annaswamy, K. S. Narendra, S. Jafari, L. Rudd,
R. Ortega, and J. Boskovic. L1-Adaptive Control: Stability, Robust-
ness, and Interpretations. IEEE Transactions on Automatic Control,
59(11):3075–3080, 2014.
[38] Nhan Nguyen, Kalmanje Krishnakumar, John Kaneshige, and Pascal Ne-
speca. Flight dynamics and hybrid adaptive control of damaged aircraft.
Journal of Guidance, Control, and Dynamics, 31(3):751–764, 2008.
[39] A. J. Calise. Neural networks in nonlinear aircraft flight control. IEEE
Aerospace and Electronic Systems Magazine, 11(7):5–10, 1996.
[40] Martin T. Hagan, Howard B. Demuth, Mark Hudson Beale, and Orlando
de Jes´us. Neural network design. 2nd ed. edition, 2016.
[41] Bin Xu and Yu Zhang. Neural discrete back-stepping control of hypersonic
flight vehicle with equivalent prediction model. Neurocomputing, 154:337–
346, 2015.
[42] Bin Xu, Yongping Pan, Danwei Wang, and Fuchun Sun. Discrete-time
hypersonic flight control based on extreme learning machine. Neurocom-
puting, 128:232–241, 2014.
[43] F. Rinaldi, S. Chiesa, and F. Quagliotti. Linear Quadratic Control for
Quadrotors UAVs Dynamics and Formation Flight. Journal of Intelligent
& Robotic Systems, 70(1-4):203–220, 2013.
[44] Valeria Artale, Mario Collotta, Cristina Milazzo, Giovanni Pau, and An-
gela Ricciardello. An Integrated System for UAV Control Using a Neural
Network Implemented in a Prototyping Board. Journal of Intelligent and
Robotic Systems, 2016.
[45] Jih-Gau Juang and Kai-Chung Cheng. Application of Neural Networks
to Disturbances Encountered Landing Control. IEEE Transactions on
Intelligent Transportation Systems, 7(4):582–588, 2006.
[46] Sefer Kurnaz, Omer Cetin, and Okyay Kaynak. Adaptive neuro-fuzzy in-
ference system based autonomous flight control of unmanned air vehicles.
Expert Systems with Applications, 37(2):1229–1234, 2010.
[47] Seyyed Ali Emami and Alireza Roudbari. Multimodel ELM-Based
Identification of an Aircraft Dynamics in the Entire Flight Envelope.
IEEE Transactions on Aerospace and Electronic Systems, 55(5):2181–
2194, 2019.
[48] Erdal Kayacan. Fuzzy neural networks for real time control applica-
tions: Concepts, modeling and algorithms for fast learning. Elsevier and
Butterworth-Heinemann, Amsterdam and Oxford UK, 2016.
[49] P. Baldi, P. Castaldi, N. Mimmo, and S. Simani. Satellite attitude active
FTC based on Geometric Approach and RBF Neural Network. In 2013
Conference on Control and Fault-Tolerant Systems (SysTol), pages 667–
673, Nice, France, 2013. IEEE.
[50] P. Baldi, M. Blanke, P. Castaldi, N. Mimmo, and S. Simani. Combined
Geometric and Neural Network Approach to Generic Fault Diagnosis in
Satellite Actuators and Sensors. IFAC-PapersOnLine, 49(17):432–437,
2016.
[51] M. Verhaegen, S. Kanev, R. Hallouzi, C. Jones, J. Maciejowski, and
H. Smail. Fault Tolerant Flight Control - A Survey. In Christopher
Edwards, Thomas Lombaerts, and Hafid Smaili, editors, Fault tolerant
flight control, Lecture notes in control and information sciences, 0170-
8643. Springer, Berlin, 2010.
[52] Weibin Gu, Kimon P. Valavanis, Matthew J. Rutherford, and Alessandro
Rizzo. UAV Model-based Flight Control with Artificial Neural Networks:
A Survey. Journal of Intelligent and Robotic Systems, 100(3-4):1469–1491,
2020.
[53] Mohd Ariffanan Mohd Basri, Abdul Rashid Husain, and Kumeresan A.
Danapalasingam. Intelligent adaptive backstepping control for MIMO
uncertain non-linear quadrotor helicopter systems. Transactions of the
Institute of Measurement and Control, 37(3):345–361, 2015.
[54] Anthony J. Calise, Naira Hovakimyan, and Moshe Idan. Adaptive output
feedback control of nonlinear systems using neural networks. Automatica,
37(8):1201–1211, 2001.
[55] Nakwan Kim. Improved methods in neural network based adaptive output
feedback control, with applications to flight control. PhD Thesis, School of
Aerospace Engineering, 2003.
[56] Girish Chowdhary, Maximilian Mühlegg, and Eric Johnson. Exponential
parameter and tracking error convergence guarantees for adaptive con-
trollers without persistency of excitation. International Journal of Con-
trol, 87(8):1583–1603, 2014.
[57] Taeyoung Lee and Youdan Kim. Nonlinear Adaptive Flight Control Us-
ing Backstepping and Neural Networks Controller. Journal of Guidance,
Control, and Dynamics, 24(4):675–682, 2001.
[58] Shushuai Li, Yaonan Wang, Jianhao Tan, and Yan Zheng. Adaptive
RBFNNs/integral sliding mode control for a quadrotor aircraft. Neu-
rocomputing, 216:126–134, 2016.
[59] Samir Zeghlache, Hemza Mekki, Abderrahmen Bouguerra, and Ali Djeri-
oui. Actuator fault tolerant control using adaptive RBFNN fuzzy sliding
mode controller for coaxial octorotor UAV. ISA Transactions, 80:267–278,
2018.
[60] Yongduan Song, Liu He, Dong Zhang, Jiye Qian, and Jin Fu. Neuroad-
aptive fault-tolerant control of quadrotor UAVs: a more affordable so-
lution. IEEE Transactions on Neural Networks and Learning Systems,
30(7):1975–1983, 2019.
[61] Dong-Ho Shin and Youdan Kim. Nonlinear discrete-time reconfigurable
flight control law using neural networks. IEEE Transactions on Control
Systems Technology, 14(3):408–422, 2006.
[62] T. Zhang, S. S. Ge, and C. C. Hang. Design and performance analysis of a
direct adaptive controller for nonlinear systems. Automatica, 35(11):1809–
1817, 1999.
[63] S. S. Ge and Cong Wang. Direct adaptive NN control of a class of nonlinear
systems. IEEE Transactions on Neural Networks, 13(1):214–221, 2002.
[64] M. Vijaya Kumar, S. Suresh, S. N. Omkar, Ranjan Ganguli, and Prasad
Sampath. A direct adaptive neural command controller design for an
unstable helicopter. Engineering Applications of Artificial Intelligence,
22(2):181–191, 2009.
[65] E. Tzirkel-Hancock and F. Fallside. Stable control of nonlinear systems
using neural networks. International Journal of Robust and Nonlinear
Control, 2(1):63–86, 1992.
[66] S. Fabri and V. Kadirkamanathan. Dynamic structure neural networks
for stable adaptive control of nonlinear systems. IEEE Transactions on
Neural Networks, 7(5):1151–1167, 1996.
[67] Jovan D. Boskovic, Lingji Chen, and Raman K. Mehra. Adaptive Con-
trol Design for Nonaffine Models Arising in Flight Control. Journal of
Guidance, Control, and Dynamics, 27(2):209–217, 2004.
[68] Petros A. Ioannou and Petar V. Kokotovic, editors. Adaptive Systems with
Reduced Models, volume 47 of Lecture Notes in Control and Information
Sciences. Springer, Berlin and Heidelberg, 1983.
[69] P. A. Ioannou and Jing Sun. Robust adaptive control. Dover Publications
Inc, Mineola New York, 2012.
[70] D.-H. Shin and Y. Kim. Reconfigurable Flight Control System Design
Using Adaptive Neural Networks. IEEE Transactions on Control Systems
Technology, 12(1):87–100, 2004.
[71] Naira Hovakimyan, Nakwan Kim, Anthony Calise, and J.V.R. Prasad.
Adaptive Output Feedback for High-Bandwidth Control of an Unmanned
Helicopter. In AIAA Guidance, Navigation, and Control Conference,
Montreal, Canada, 2001. American Institute of Aeronautics and Astro-
nautics.
[72] N. Hovakimyan, F. Nardi, A. Calise, and Nakwan Kim. Adaptive output
feedback control of uncertain nonlinear systems using single-hidden-layer
neural networks. IEEE Transactions on Neural Networks, 13(6):1420–
1431, 2002.
[73] S. S. Ge, B. Ren, Keng Peng Tee, and T. H. Lee. Approximation-based
control of uncertain helicopter dynamics. IET Control Theory & Applica-
tions, 3(7):941–956, 2009.
[74] Kumpati S. Narendra and Anuradha M. Annaswamy. A New Adaptive
Law for Robust Adaptation without Persistent Excitation. IEEE Trans-
actions on Automatic Control, pages 1067–1072, 1987.
[75] R. Rysdyk and A. J. Calise. Robust nonlinear adaptive flight control
for consistent handling qualities. IEEE Transactions on Control Systems
Technology, 13(6):896–910, 2005.
[76] R. Rysdyk and A. Calise. Fault tolerant flight control via adaptive neural
network augmentation. In Guidance, Navigation, and Control Conference
and Exhibit, Guidance, Navigation, and Control Conference and Exhibit,
Boston, USA, 1998.
[77] C.J.B. Macnab. Robust Associative-Memory Adaptive Control in the
Presence of Persistent Oscillations. Neural Information Processing,
10(12):277–287, 2006.
[78] C. Nicol, C.J.B. Macnab, and A. Ramirez-Serranob. Robust neural net-
work control of a quadrotor helicopter. In Canadian Conference on Elec-
trical and Computer Engineering, Canadian Conference on Electrical and
Computer Engineering, Niagara Falls, Canada, 2008.
[79] C. Coza, C. Nicol, C.J.B. Macnab, and A. Ramirez-Serrano. Adaptive
fuzzy control for a quadrotor helicopter robust to wind buffeting. Journal
of Intelligent & Fuzzy Systems, 22(5,6):267–283, 2011.
[80] Bin Xu, Zhongke Shi, Chenguang Yang, and Fuchun Sun. Composite Neu-
ral Dynamic Surface Control of a Class of Uncertain Nonlinear Systems in
Strict-Feedback Form. IEEE Transactions on Cybernetics, 44(12):2626–
2634, 2014.
[81] Bin Xu, Danwei Wang, Youmin Zhang, and Zhongke Shi. DOB-Based
Neural Control of Flexible Hypersonic Flight Vehicle Considering Wind
Effects. IEEE Transactions on Industrial Electronics, 64(11):8676–8685,
2017.
[82] Bin Xu, Xia Wang, and Zhongke Shi. Robust Adaptive Neural Control
of Nonminimum Phase Hypersonic Vehicle Model. IEEE Transactions on
Systems, Man, and Cybernetics: Systems, 51(2):1107–1115, 2021.
[83] Bin Xu, Zhongke Shi, Fuchun Sun, and Wei He. Barrier Lyapunov Func-
tion Based Learning Control of Hypersonic Flight Vehicle With AOA
Constraint and Actuator Faults. IEEE Transactions on Cybernetics,
49(3):1047–1057, 2019.
[84] Girish V. Chowdhary and Eric N. Johnson. Theory and Flight-Test Val-
idation of a Concurrent-Learning Adaptive Controller. Journal of Guid-
ance, Control, and Dynamics, 34(2):592–607, 2011.
[85] Chuan-Kai Lin. Robust adaptive critic control of nonlinear systems using
fuzzy basis function networks: An LMI approach. Information Sciences,
177(22):4934–4946, 2007.
[86] Chuan-Kai Lin. H∞ reinforcement learning control of robot manipulators
using fuzzy wavelet networks. Fuzzy Sets and Systems, 160(12):1765–1786,
2009.
[87] Xiangwei Bu, Yu Xiao, and Humin Lei. An Adaptive Critic Design-
Based Fuzzy Neural Controller for Hypersonic Vehicles: Predefined Be-
havioral Nonaffine Control. IEEE/ASME Transactions on Mechatronics,
24(4):1871–1881, 2019.
[88] Yanhong Luo, Qiuye Sun, Huaguang Zhang, and Lili Cui. Adaptive critic
design-based robust neural network control for nonlinear distributed pa-
rameter systems with unknown dynamics. Neurocomputing, 148(2):200–
208, 2015.
[89] Haojian Xu, Maj Mirmirani, and Petros A. Ioannou. Robust Neural Adap-
tive Control of a Hypersonic Aircraft. In AIAA Guidance, Navigation, and
Control Conference and Exhibit, Austin, Texas, 2003.
[90] Hiroaki Gomi and Mitsuo Kawato. Neural network control for a closed-
loop System using Feedback-error-learning. Neural Networks, 6(7):933–
946, 1993.
[91] Yan Li, N. Sundararajan, P. Saratchandran, and Zhifeng Wang. Robust
neuro-H∞ controller design for aircraft auto-landing. IEEE Transactions
on Aerospace and Electronic Systems, 40(1):158–167, 2004.
[92] A. A. Pashilkar, N. Sundararajan, and P. Saratchandran. A fault-tolerant
neural aided controller for aircraft auto-landing. Aerospace Science and
Technology, 10(1):49–61, 2006.
[93] Y. Li, N. Sundararajan, and P. Saratchandran. Neuro-controller design
for nonlinear fighter aircraft maneuver using fully tuned RBF networks.
Automatica, 37(8):1293–1301, 2001.
[94] Mojtaba Ahmadieh Khanesar, Erdal Kayacan, and Okyay Kaynak. Op-
timal sliding mode type-2 TSK fuzzy control of a 2-DOF helicopter. In
2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE),
pages 1–6, Istanbul, Turkey, 2015. IEEE.
[95] Erdal Kayacan and Reinaldo Maslim. Type-2 Fuzzy Logic Trajectory
Tracking Control of Quadrotor VTOL Aircraft With Elliptic Membership
Functions. IEEE/ASME Transactions on Mechatronics, 22(1):339–348,
2017.
[96] Md Meftahul Ferdaus, Mahardhika Pratama, Sreenatha G. Anavatti,
Matthew A. Garratt, and Yongping Pan. Generic Evolving Self-
Organizing Neuro-Fuzzy Control of Bio-Inspired Unmanned Aerial Ve-
hicles. IEEE Transactions on Fuzzy Systems, 28(8):1542–1556, 2020.
[97] Md Meftahul Ferdaus, Mahardhika Pratama, Sreenatha G. Anavatti,
Matthew A. Garratt, and Edwin Lughofer. PAC: A novel self-adaptive
neuro-fuzzy controller for micro aerial vehicles. Information Sciences,
512(4):481–505, 2020.
[98] Byoung S. Kim and Anthony J. Calise. Nonlinear Flight Control Using
Neural Networks. Journal of Guidance, Control, and Dynamics, 20(1):26–
33, 1997.
[99] Xiangwei Bu and Humin Lei. A fuzzy wavelet neural network-based ap-
proach to hypersonic flight vehicle direct nonaffine hybrid control. Non-
linear Dynamics, 94(3):1657–1668, 2018.
[100] Girish Chowdhary, Eric N. Johnson, Rajeev Chandramohan, M. Scott
Kimbrell, and Anthony Calise. Guidance and Control of Airplanes Under
Actuator Failures and Severe Structural Damage. Journal of Guidance,
Control, and Dynamics, 36(4):1093–1104, 2013.
[101] S. Lee, C. Ha, and B. S. Kim. Adaptive nonlinear control system de-
sign for helicopter robust command augmentation. Aerospace Science and
Technology, 9(3):241–251, 2005.
[102] A. Rahideh, A. H. Bajodah, and M. H. Shaheed. Real time adaptive non-
linear model inversion control of a twin rotor MIMO system using neural
networks. Engineering Applications of Artificial Intelligence, 25(6):1289–
1297, 2012.
[103] Anthony Calise. Development of a reconfigurable flight control law for the
X-36 tailless fighter aircraft. In AIAA Guidance, Navigation, and Con-
trol Conference and Exhibit, Dever,CO,U.S.A, 2000. American Institute
of Aeronautics and Astronautics.
[104] Joseph Brinker and Kevin Wise. Flight testing of a reconfigurable flight
control law on the X-36 tailless fighter aircraft. In AIAA Guidance,
Navigation, and Control Conference and Exhibit, Dever,CO,U.S.A, 2000.
American Institute of Aeronautics and Astronautics.
[105] Anthony J. Calise, Seungjae Lee, and Manu Sharma. Development of a
Reconfigurable Flight Control Law for Tailless Aircraft. Journal of Guid-
ance, Control, and Dynamics, 24(5):896–902, 2001.
[106] Joseph S. Brinker and Kevin A. Wise. Flight Testing of Reconfigurable
Control Law on the X-36 Tailless Aircraft. Journal of Guidance, Control,
and Dynamics, 24(5):903–909, 2001.
[107] Nhan Nguyen, Kalmanje Krishnakumar, John Kaneshige, and Pascal Ne-
speca. Dynamics and adaptive control for stability recovery of damaged
asymmetric aircraft. In AIAA Guidance, Navigation, and Control Con-
ference and Exhibit, AIAA Guidance, Navigation, and Control Conference
and Exhibit, Keystone, Colorado, 2006. American Institute of Aeronautics
and Astronautics.
[108] Keng Peng Tee, Shuzhi Sam Ge, and F.E.H. Tay. Adaptive Neural Net-
work Control for Helicopters in Vertical Flight. IEEE Transactions on
Control Systems Technology, 16(4):753–762, 2008.
[109] DaoXiang Gao, ShiXing Wang, and Houjiang Zhang. A Singularly Per-
turbed System Approach to Adaptive Neural Back-stepping Control De-
sign of Hypersonic Vehicles. Journal of Intelligent & Robotic Systems,
73(1-4):249–259, 2014.
[110] S. S. Ge, B. Ren, and M. Chen. Robust attitude control of helicopters
with actuator dynamics using neural networks. IET Control Theory &
Applications, 4(12):2837–2854, 2010.
[111] Xiaolong Zheng and Xuebo Yang. Improved adaptive NN backstep-
ping control design for a perturbed PVTOL aircraft. Neurocomputing,
410(7):51–60, 2020.
[112] D. Swaroop, J. K. Hedrick, P. P. Yip, and J. C. Gerdes. Dynamic surface
control for a class of nonlinear systems. IEEE Transactions on Automatic
Control, 45(10):1893–1899, 2000.
[113] Waseem Aslam Butt, Lin Yan, and Amezquita S. Kendrick. Adaptive dy-
namic surface control of a hypersonic flight vehicle with improved tracking.
Asian Journal of Control, 15(2):594–605, 2013.
[114] Qun Zong, Fang Wang, Bailing Tian, and Rui Su. Robust adaptive dy-
namic surface control design for a flexible air-breathing hypersonic vehicle
with input constraints and uncertainty. Nonlinear Dynamics, 78(1):289–
315, 2014.
[115] Bin Xu, Qi Zhang, and Yongping Pan. Neural network based dynamic
surface control of hypersonic flight dynamics using small-gain theorem.
Neurocomputing, 173:690–699, 2016.
[116] Chunyang Fu, Wei Hong, Huiqiu Lu, Lei Zhang, Xiaojun Guo, and Yantao
Tian. Adaptive robust backstepping attitude control for a multi-rotor
unmanned aerial vehicle with time-varying output constraints. Aerospace
Science and Technology, 78:593–603, 2018.
[117] Waseem Aslam Butt, Lin Yan, and Kendrick Amezquita S. Adaptive in-
tegral dynamic surface control of a hypersonic flight vehicle. International
Journal of Systems Science, 46(10):1717–1728, 2013.
[118] Mou Chen, Yanlong Zhou, and William W. Guo. Robust tracking con-
trol for uncertain MIMO nonlinear systems with input saturation using
RWNNDO. Neurocomputing, 144(20):436–447, 2014.
[119] Li Zhou and Liping Yin. Dynamic surface control based on neural network
for an air-breathing hypersonic vehicle. Optimal Control Applications and
Methods, 36(6):774–793, 2015.
[120] Bin Xu, Xia Wang, Weisheng Chen, and Peng Shi. Robust Intelligent
Control of SISO Nonlinear Systems Using Switching Mechanism. IEEE
Transactions on Cybernetics, 51(8):3975–3987, 2021.
[121] J. A. Farrell, M. Polycarpou, M. Sharma, and Wenjie Dong. Com-
mand Filtered Backstepping. IEEE Transactions on Automatic Control,
54(6):1391–1395, 2009.
[122] Wenjie Dong, J. A. Farrell, M. M. Polycarpou, V. Djapic, and M. Sharma.
Command Filtered Adaptive Backstepping. IEEE Transactions on Con-
trol Systems Technology, 20(3):566–580, 2012.
[123] Bin Xu, Yuyan Guo, Yuan Yuan, Yonghua Fan, and Danwei Wang. Fault-
tolerant control using command-filtered adaptive back-stepping technique:
Application to hypersonic longitudinal flight dynamics. International
Journal of Adaptive Control and Signal Processing, 30(4):553–577, 2016.
[124] Lars Sonneveldt, Q. P. Chu, and J. A. Mulder. Nonlinear Flight Control
Design Using Constrained Adaptive Backstepping. Journal of Guidance,
Control, and Dynamics, 30(2):322–336, 2007.
[125] L. Sonneveldt, E. R. van Oort, Q. P. Chu, and J. A. Mulder. Nonlinear
adaptive trajectory control applied to an F-16 model. Journal of Guid-
ance, Control, and Dynamics, 32(1):25–39, 2009.
[126] Arie Levant. Robust exact differentiation via sliding mode technique.
Automatica, 34(3):379–384, 1998.
[127] Zhonghua Wu, Jingchao Lu, Qing Zhou, and Jingping Shi. Modified adap-
tive neural dynamic surface control for morphing aircraft with input and
output constraints. Nonlinear Dynamics, 87(4):2367–2383, 2017.
[128] Arie Levant. Higher-order sliding modes, differentiation and output-
feedback control. International Journal of Control, 76(9-10):924–941,
2003.
[129] Ziquan Yu, Youmin Zhang, Bin Jiang, Chun-Yi Su, Jun Fu, Ying Jin, and
Tianyou Chai. Decentralized fractional-order backstepping fault-tolerant
control of multi-UAVs against actuator faults and wind effects. Aerospace
Science and Technology, 104(6):105939, 2020.
[130] Xiangwei Bu, Xiaoyan Wu, Zhen Ma, and Rui Zhang. Nonsingular di-
rect neural control of air-breathing hypersonic vehicle via back-stepping.
Neurocomputing, 153:164–173, 2015.
[131] Bin Xu, DaoXiang Gao, and ShiXing Wang. Adaptive neural control
based on HGO for hypersonic flight vehicles. Science China Information
Sciences, 54(3):511–520, 2011.
[132] Bin Xu, Yonghua Fan, and Shangmin Zhang. Minimal-learning-parameter
technique based adaptive neural control of hypersonic flight dynamics
without back-stepping. Neurocomputing, 164:201–209, 2015.
[133] Xiangwei Bu. Air-Breathing Hypersonic Vehicles Funnel Control Using
Neural Approximation of Non-affine Dynamics. IEEE/ASME Transac-
tions on Mechatronics, 23(5):2099–2108, 2018.
[134] Bin Xu, Fuchun Sun, Chenguang Yang, DaoXiang Gao, and Jianxin Ren.
Adaptive discrete-time controller design with neural network for hyper-
sonic flight vehicle via back-stepping. International Journal of Control,
84(9):1543–1552, 2011.
[135] Bin Xu, Danwei Wang, Fuchun Sun, and Zhongke Shi. Direct neural dis-
crete control of hypersonic flight vehicle. Nonlinear Dynamics, 70(1):269–
278, 2012.
[136] Bin Xu, Zhongke Shi, Chenguang Yang, and ShiXing Wang. Neural con-
trol of hypersonic flight vehicle model via time-scale decomposition with
throttle setting constraint. Nonlinear Dynamics, 73(3):1849–1861, 2013.
[137] Bin Xu, Danwei Wang, Fuchun Sun, and Zhongke Shi. Direct neural
control of hypersonic flight vehicles with prediction model in discrete time.
Neurocomputing, 115:39–48, 2013.
[138] Zongyu Zuo and Chenliang Wang. Adaptive trajectory tracking control
of output constrained multi-rotors systems. IET Control Theory & Ap-
plications, 8(13):1163–1174, 2014.
[139] Bin Xian, Chen Diao, Bo Zhao, and Yao Zhang. Nonlinear robust output
feedback tracking control of a quadrotor UAV using quaternion represen-
tation. Nonlinear Dynamics, 79(4):2735–2752, 2015.
[140] Mou Chen, Peng Shi, and Cheng-Chew Lim. Adaptive neural fault-
tolerant control of a 3-DOF model helicopter system. IEEE Transactions
on Systems, Man, and Cybernetics: Systems, 46(2):260–270, 2016.
[141] Mou Chen. Constrained Control Allocation for Overactuated Aircraft
Using a Neurodynamic Model. IEEE Transactions on Systems, Man, and
Cybernetics: Systems, 2015.
[142] M. Krstić, I. Kanellakopoulos, and P. V. Kokotović. Adaptive nonlin-
ear control without overparametrization. Systems & Control Letters,
19(3):177–185, 1992.
[143] Peng Shi, Cheng Chew Lim, Bin Jiang, and Dezhi Xu. Adaptive neural
observer-based backstepping fault tolerant control for near space vehi-
cle under control effector damage. IET Control Theory & Applications,
8(9):658–666, 2014.
[144] Parag M. Patre, William MacKunis, Kent Kaiser, and Warren E. Dixon.
Asymptotic Tracking for Uncertain Dynamic Systems Via a Multilayer
Neural Network Feedforward and RISE Feedback Control Structure. IEEE
Transactions on Automatic Control, 53(9):2180–2185, 2008.
[145] Hadi Razmi and Sima Afshinfar. Neural network-based adaptive sliding
mode control design for position and attitude control of a quadrotor UAV.
Aerospace Science and Technology, 91(3):12–27, 2019.
[146] Jongho Shin, H. J. Kim, Youdan Kim, and Warren E. Dixon. Autonomous
Flight of the Rotorcraft-Based UAV Using RISE Feedback and NN Feed-
forward Terms. IEEE Transactions on Control Systems Technology, 20(5):1392–1399, 2012.
[147] Yajun Li, Mingshan Hou, Shuai Liang, and Gang Jiao. Predefined-time
adaptive fault-tolerant control of hypersonic flight vehicles without over-
parameterization. Aerospace Science and Technology, 104:105987, 2020.
[148] Hongwei Mo and Ghulam Farid. Nonlinear and Adaptive Intelligent Con-
trol Techniques for Quadrotor UAV A Survey. Asian Journal of Control,
21(2):989–1008, 2018.
[149] Zhijun Zhang, Lunan Zheng, and Qi Guo. A Varying-Parameter Conver-
gent Neural Dynamic Controller of Multirotor UAVs for Tracking Time-
Varying Tasks. IEEE Transactions on Vehicular Technology, 67(6):4793–
4805, 2018.
[150] Zheng Zhu, Yuanqing Xia, and Mengyin Fu. Attitude stabilization of rigid
spacecraft with finite-time convergence. International Journal of Robust
and Nonlinear Control, 21(6):686–702, 2011.
[151] Bin Xu. Composite Learning Finite-Time Control With Application to
Quadrotors. IEEE Transactions on Systems, Man, and Cybernetics: Sys-
tems, 48(10):1806–1815, 2018.
[152] Xiang Yu, Yu Fu, Peng Li, and Youmin Zhang. Fault-Tolerant Aircraft
Control Based on Self-Constructing Fuzzy Neural Networks and Multivari-
able SMC Under Actuator Faults. IEEE Transactions on Fuzzy Systems,
26(4):2324–2335, 2018.
[153] Dandan Wang, Qun Zong, Bailing Tian, Shikai Shao, Xiuyun Zhang, and
Xinyi Zhao. Neural network disturbance observer-based distributed finite-
time formation tracking control for multiple unmanned helicopters. ISA
Transactions, 73:208–226, 2018.
[154] Dandan Wang, Qun Zong, Bailing Tian, Hanchen Lu, and Jie Wang.
Adaptive finite-time reconfiguration control of unmanned aerial vehicles
with a moving leader. Nonlinear Dynamics, 95(2):1099–1116, 2019.
[155] E. Kayacan, O. Cigdem, and O. Kaynak. Sliding Mode Control Approach
for Online Learning as Applied to Type-2 Fuzzy Neural Networks and Its
Experimental Evaluation. IEEE Transactions on Industrial Electronics,
59(9):3510–3520, 2012.
[156] Efe Camci, Devesh Raju Kripalani, Linlu Ma, Erdal Kayacan, and Mo-
jtaba Ahmadieh Khanesar. An aerial robot for rice farm quality inspection
with type-2 fuzzy neural networks tuned by particle swarm optimization-
sliding mode control hybrid algorithm. Swarm and Evolutionary Compu-
tation, 41:1–8, 2018.
[157] S. Seshagiri and H. K. Khalil. Output feedback control of nonlinear sys-
tems using RBF neural networks. IEEE Transactions on Neural Networks,
11(1):69–79, 2000.
[158] Hassan K. Khalil. Nonlinear systems. Prentice Hall, Upper Saddle River,
N.J., 3rd ed. edition, 2002.
[159] Jongho Shin, H. Jin Kim, and Youdan Kim. Adaptive support vector
regression for UAV flight control. Neural Networks, 24(1):109–120, 2011.
[160] Wei He, Zichen Yan, Changyin Sun, and Yunan Chen. Adaptive Neural
Network Control of a Flapping Wing Micro Aerial Vehicle With Distur-
bance Observer. IEEE Transactions on Cybernetics, 47(10):3452–3465,
2017.
[161] Saman Behtash. Robust output tracking for non-linear systems. In-
ternational Journal of Control, 51(6):1381–1407, 2007.
[162] Travis Dierks and Sarangapani Jagannathan. Output feedback control of
a quadrotor UAV using neural networks. IEEE Transactions on Neural
Networks, 21(1):50–66, 2010.
[163] Yansheng Yang, Changjiu Zhou, and Jusheng Ren. Model reference adap-
tive robust fuzzy control for ship steering autopilot with uncertain non-
linear systems. Applied Soft Computing, 3(4):305–316, 2003.
[164] Y. Yang, G. Feng, and J. Ren. A Combined Backstepping and Small-
Gain Approach to Robust Adaptive Fuzzy Control for Strict-Feedback
Nonlinear Systems. IEEE Transactions on Systems, Man, and Cybernetics
- Part A: Systems and Humans, 34(3):406–420, 2004.
[165] Yansheng Yang, Tieshan Li, and Xiaofeng Wang. Robust Adaptive Neu-
ral Network Control for Strict-Feedback Nonlinear Systems Via Small-
Gain Approaches. In David Hutchison, Takeo Kanade, Josef Kittler, Bao-
Liang Lu, and Yin Hujun, editors, Advances in Neural Networks - ISNN
2006, Lecture Notes in Computer Science. Springer Berlin Heidelberg,
Berlin/Heidelberg, 2006.
[166] Tie-Shan Li, Dan Wang, Gang Feng, and Shao-Cheng Tong. A DSC
approach to robust adaptive NN tracking control for strict-feedback non-
linear systems. IEEE Transactions on Systems, Man, and Cybernetics,
Part B (Cybernetics), 40(3):915–927, 2010.
[167] Bing Chen, Xiaoping Liu, Kefu Liu, and Chong Lin. Direct adaptive fuzzy
control of nonlinear strict-feedback systems. Automatica, 45(6):1530–1535,
2009.
[168] Guanyu Lai, Zhi Liu, Yun Zhang, and C. L. Philip Chen. Adaptive Po-
sition/Attitude Tracking Control of Aerial Robot With Unknown Inertial
Matrix Based on a New Robust Neural Identifier. IEEE Transactions on
Neural Networks and Learning Systems, 27(1):18–31, 2016.
[169] Z.-P. Jiang, A. R. Teel, and L. Praly. Small-gain theorem for ISS systems
and applications. Mathematics of Control, Signals, and Systems, 7(2):95–
120, 1994.
[170] Yan-Jun Liu, Guo-Xing Wen, and Shao-Cheng Tong. Direct adaptive
NN control for a class of discrete-time nonlinear strict-feedback systems.
Neurocomputing, 73(13-15):2498–2505, 2010.
[171] Roger D. Nussbaum. Some remarks on a conjecture in parameter adaptive
control. Systems & Control Letters, 3(5):243–246, 1983.
[172] Zhiyong Chen. Nussbaum functions in adaptive control with time-varying
unknown control coefficients. Automatica, 102:72–79, 2019.
[173] Shuzhi Sam Ge and J. Wang. Robust adaptive tracking for time-varying
uncertain nonlinear systems with unknown control coefficients. IEEE
Transactions on Automatic Control, 48(8):1463–1469, 2003.
[174] Bin Xu. Robust adaptive neural control of flexible hypersonic flight vehicle
with dead-zone input nonlinearity. Nonlinear Dynamics, 80(3):1509–1520,
2015.
[175] P. Castaldi, N. Mimmo, R. Naldi, and L. Marconi. Robust Trajectory
Tracking for Underactuated VTOL Aerial Vehicles: Extended for Adap-
tive Disturbance Compensation. In Proceedings of the 19th IFAC World
Congress, Cape Town, South Africa, 2014. IFAC.
[176] Mihai Lungu. Auto-landing of UAVs with variable centre of mass using
the backstepping and dynamic inversion control. Aerospace Science and
Technology, 103(2):105912, 2020.
[177] Rong Li, Mou Chen, and Qingxian Wu. Adaptive neural tracking control
for uncertain nonlinear systems with input and output constraints using
disturbance observer. Neurocomputing, 235:27–37, 2017.
[178] Ziquan Yu, Youmin Zhang, Bin Jiang, Xiang Yu, Jun Fu, Ying Jin, and
Tianyou Chai. Distributed adaptive fault-tolerant close formation flight
control of multiple trailing fixed-wing UAVs. ISA Transactions, 106:181–
199, 2020.
[179] M. M. Polycarpou. Stable adaptive neural control scheme for nonlinear
systems. IEEE Transactions on Automatic Control, 41(3):447–451, 1996.
[180] Yao Zou and Zewei Zheng. A Robust Adaptive RBFNN Augmenting
Backstepping Control Approach for a Model-Scaled Helicopter. IEEE
Transactions on Control Systems Technology, 23(6):2344–2352, 2015.
[181] Zeng Lian Liu and J. Svoboda. A new control scheme for nonlinear systems
with disturbances. IEEE Transactions on Control Systems Technology,
14(1):176–181, 2006.
[182] S. Suresh, S. N. Omkar, V. Mani, and N. Sundararajan. Nonlinear Adap-
tive Neural Controller for Unstable Aircraft. Journal of Guidance, Con-
trol, and Dynamics, 28(6):1103–1111, 2005.
[183] Eric N. Johnson and Anthony Calise. Neural Network Adaptive Control
of Systems with Input Saturation. In Proceedings of the 2001 American
Control Conference, pages 3527–3532, Crystal Gateway Marriot, Arling-
ton, VA, USA, 2000. American Automatic Control Council.
[184] A. A. Pashilkar, N. Sundararajan, and P. Saratchandran. Adaptive back-
stepping neural controller for reconfigurable flight control systems. IEEE
Transactions on Control Systems Technology, 14(3):553–561, 2006.
[185] Wenxing Fu, Yuji Wang, Supeng Zhu, and Yingzhou Xia. Neural adaptive
control of hypersonic aircraft with actuator fault using randomly assigned
nodes. Neurocomputing, 174:1070–1076, 2016.
[186] Zian Cheng, Fuyang Chen, and Jingxiu Gong. Self-repairing control of air-
breathing hypersonic vehicle with actuator fault and backlash. Aerospace
Science and Technology, 97(10):105608, 2020.
[187] Heng Liu, Hongxing Wang, Jinde Cao, Ahmed Alsaedi, and Tasawar
Hayat. Composite learning adaptive sliding mode control of fractional-
order nonlinear systems with actuator faults. Journal of the Franklin
Institute, 356(16):9580–9599, 2019.
[188] Xidong Tang, Gang Tao, and Suresh M. Joshi. Adaptive actuator fail-
ure compensation for parametric strict feedback systems and an aircraft
application. Automatica, 39(11):1975–1982, 2003.
[189] Zhiyu Peng, Ruiyun Qi, and Bin Jiang. Adaptive fault tolerant control
for hypersonic flight vehicle system with state constraints. Journal of the
Franklin Institute, 357(14):9351–9377, 2020.
[190] Yuan Yuan, Zheng Wang, Lei Guo, and Huaping Liu. Barrier Lyapunov
Functions-Based Adaptive Fault Tolerant Control for Flexible Hypersonic
Flight Vehicles With Full State Constraints. IEEE Transactions on Sys-
tems, Man, and Cybernetics: Systems, 50(9):3391–3400, 2020.
[191] Wangkui Liu, Yiyin Wei, Mingzhe Hou, and Guangren Duan. Integrated
guidance and control with partial state constraints and actuator faults.
Journal of the Franklin Institute, 356(9):4785–4810, 2019.
[192] Marcello R. Napolitano, Younghwan An, and Brad A. Seanor. A fault
tolerant flight control system for sensor and actuator failures using neural
networks. Aircraft Design, 3(2):103–128, 2000.
[193] H. A. Talebi, K. Khorasani, and S. Tafazoli. A recurrent neural-network-
based sensor and actuator fault detection and isolation for nonlinear sys-
tems with application to the satellite’s attitude control subsystem. IEEE
Transactions on Neural Networks, 20(1):45–60, 2009.
[194] C. de Persis and A. Isidori. A geometric approach to nonlinear fault detec-
tion and isolation. IEEE Transactions on Automatic Control, 45(6):853–
865, 2001.
[195] Seyyed Ali Emami and Afshin Banazadeh. Fault-tolerant predictive tra-
jectory tracking of an air vehicle based on acceleration control. IET Con-
trol Theory & Applications, 14(5):750–762, 2020.
[196] Alireza Abbaspour, Payam Aboutalebi, Kang K. Yen, and Arman Sar-
golzaei. Neural adaptive observer-based sensor and actuator fault de-
tection in nonlinear systems: Application in UAV. ISA Transactions,
67:317–329, 2017.
[197] Alireza Abbaspour, Kang K. Yen, Parisa Forouzannezhad, and Arman
Sargolzaei. A Neural Adaptive Approach for Active Fault-Tolerant Con-
trol Design in UAV. IEEE Transactions on Systems, Man, and Cybernet-
ics: Systems, pages 1–11, 2018.
[198] Aydogan Savran, Ramazan Tasaltin, and Yasar Becerikli. Intelligent adap-
tive nonlinear flight control for a high performance aircraft with neural
networks. ISA Transactions, 45(2):225–247, 2006.
[199] S. A. Emami and A. Banazadeh. Online Identification of Aircraft Dynam-
ics in the Presence of Actuator Faults. Journal of Intelligent and Robotic
Systems, 96(3-4):541–553, 2019.
[200] Seyyed Ali Emami and Afshin Banazadeh. Intelligent trajectory tracking
of an aircraft in the presence of internal and external disturbances. In-
ternational Journal of Robust and Nonlinear Control, 29(16):5820–5844,
2019.
[201] David Mayne. An apologia for stabilising terminal conditions in model
predictive control. International Journal of Control, 86(11):2090–2095,
2013.
[202] Shuyi Shao, Mou Chen, and Youmin Zhang. Adaptive Discrete-Time
Flight Control Using Disturbance Observer and Neural Networks. IEEE
Transactions on Neural Networks and Learning Systems, 30(12):3708–
3721, 2019.
[203] Eric Johnson, Anthony Calise, Hesham El-Shirbiny, and Rolf Eysdyk.
Feedback linearization with Neural Network augmentation applied to X-33
attitude control. In AIAA Guidance, Navigation, and Control Conference
and Exhibit, Denver, CO, 2000.
[204] Eric N. Johnson and Anthony J. Calise. Limited Authority Adaptive
Flight Control for Reusable Launch Vehicles. Journal of Guidance, Con-
trol, and Dynamics, 26(6):906–913, 2003.
[205] Eric N. Johnson and Suresh K. Kannan. Adaptive Flight Control for an
Autonomous Unmanned Helicopter. In AIAA Guidance, Navigation, and
Control Conference and Exhibit, Monterey, California, 2002.
[206] Eric N. Johnson and Suresh K. Kannan. Adaptive Trajectory Control for
Autonomous Helicopters. Journal of Guidance, Control, and Dynamics,
28(3):524–538, 2005.
[207] Alireza Abaspour, Seyed Hossein Sadati, and Mohammad Sadeghi. Non-
linear optimized adaptive trajectory control of helicopter. Control Theory
and Technology, 13(4):297–310, 2015.
[208] Eric N. Johnson and Michael A. Turbe. Modeling, Control, and Flight
Testing of a Small-Ducted Fan Aircraft. Journal of Guidance, Control,
and Dynamics, 29(4):769–779, 2006.
[209] Mihai Lungu and Romulus Lungu. Landing Auto-Pilots for Aircraft Mo-
tion in Longitudinal Plane using Adaptive Control Laws Based on Neural
Networks and Dynamic Inversion. Asian Journal of Control, 2016.
[210] J. Farrell, M. Polycarpou, and M. Sharma. Adaptive backstepping with
magnitude, rate, and bandwidth constraints: Aircraft longitude control.
In Proceedings of the 2003 American Control Conference, pages 3898–
3904, Colorado, USA, 2003. IEEE.
[211] ShiXing Wang, Yu Zhang, YuQiang Jin, and YongQuan Zhang. Neural
control of hypersonic flight dynamics with actuator fault and constraint.
Science China Information Sciences, 58(7):1–10, 2015.
[212] Mou Chen, Shuzhi Sam Ge, and Beibei Ren. Adaptive tracking control of
uncertain MIMO nonlinear systems with input constraints. Automatica,
47(3):452–465, 2011.
[213] Thomas Besselmann, Johan Löfberg, and Manfred Morari. Explicit MPC
for LPV Systems: Stability and Optimality. IEEE Transactions on Auto-
matic Control, 57(9):2322–2332, 2012.
[214] De-Feng He, Hua Huang, and Qiu-Xia Chen. Quasi-min–max MPC for
constrained nonlinear systems with guaranteed input-to-state stability.
Journal of the Franklin Institute, 351(6):3405–3423, 2014.
[215] Maciej Ławryńczuk. Computationally efficient model predictive control
algorithms, volume 3. Springer International Publishing, Cham, 2014.
[216] Vincent A. Akpan and George D. Hassapis. Nonlinear model identifica-
tion and adaptive model predictive control using neural networks. ISA
Transactions, 50(2):177–194, 2011.
[217] Gonzalo Andres Garcia, Shawn Shahriar Keshmiri, and Thomas Stastny.
Robust and Adaptive Nonlinear Model Predictive Controller for Unsteady
and Highly Nonlinear Unmanned Aircraft. IEEE Transactions on Control
Systems Technology, 23(4):1620–1627, 2015.
[218] Zheng Yan and Jun Wang. Model Predictive Control of Nonlinear Systems
With Unmodeled Dynamics Based on Feedforward and Recurrent Neural
Networks. IEEE Transactions on Industrial Informatics, 8(4):746–756,
2012.
[219] Changyun Wen, Jing Zhou, Zhitao Liu, and Hongye Su. Robust Adaptive
Control of Uncertain Nonlinear Systems in the Presence of Input Satura-
tion and External Disturbance. IEEE Transactions on Automatic Control,
56(7):1672–1678, 2011.
[220] Khoi B. Ngo, Robert Mahony, and Zhong-Ping Jiang. Integrator Back-
stepping using Barrier Functions for Systems with Multiple State Con-
straints. In 44th IEEE Conference on Decision and Control, and Eu-
ropean Control Conference, pages 8306–8312, Seville, Spain, 2005. IEEE
Operations Center.
[221] Keng Peng Tee, Shuzhi Sam Ge, and Eng Hock Tay. Barrier Lyapunov
Functions for the control of output-constrained nonlinear systems. Auto-
matica, 45(4):918–927, 2009.
[222] Achim Ilchmann, Eugene P. Ryan, and Philip Townsend. Tracking with
Prescribed Transient Behavior for Nonlinear Systems of Known Relative
Degree. SIAM Journal on Control and Optimization, 46(1):210–230, 2007.
[223] Md Meftahul Ferdaus, Mahardhika Pratama, Sreenatha G. Anavatti, and
Matthew A. Garratt. PALM: An Incremental Construction of Hyper-
planes for Data Stream Regression. IEEE Transactions on Fuzzy Systems,
27(11):2115–2129, 2019.
[224] Y. Lu, N. Sundararajan, and P. Saratchandran. Performance evaluation of
a sequential minimal radial basis function (RBF) neural network learning
algorithm. IEEE Transactions on Neural Networks, 9(2):308–318, 1998.
[225] P. Saratchandran, N. Sundararajan, and Y. Li. Analysis of minimal radial
basis function network algorithm for real-time identification of nonlinear
dynamic systems. IEE Proceedings - Control Theory and Applications,
147(4):476–484, 2000.
[226] Shaik Ismail, Abhay A. Pashilkar, Ramakalyan Ayyagari, and Narasimhan
Sundararajan. Improved neural-aided sliding mode controller for au-
tolanding under actuator failures and severe winds. Aerospace Science
and Technology, 33(1):55–64, 2014.
[227] Guang-Bin Huang, P. Saratchandran, and Narasimhan Sundararajan.
An efficient sequential learning algorithm for growing and pruning RBF
(GAP-RBF) networks. IEEE Transactions on Systems, Man, and Cyber-
netics, Part B (Cybernetics), 34(6):2284–2292, 2004.
[228] Mahardhika Pratama, Sreenatha G. Anavatti, and Edwin Lughofer.
GENEFIS: Toward an Effective Localist Network. IEEE Transactions
on Fuzzy Systems, 22(3):547–562, 2014.
[229] Shiqian Wu, Meng Joo Er, and Yang Gao. A fast approach for automatic
generation of fuzzy rules by generalized dynamic fuzzy neural networks.
IEEE Transactions on Fuzzy Systems, 9(4):578–594, 2001.
[230] Seyyed Ali Emami and Kasra K. A. Ahmadi. A self-organizing multi-
model ensemble for identification of nonlinear time-varying dynamics of
aerial vehicles. Proceedings of the Institution of Mechanical Engineers,
Part I: Journal of Systems and Control Engineering, 235(7):1164–1178,
2021.
[231] Abhijit Das, Frank Lewis, and Kamesh Subbarao. Backstepping Approach
for Controlling a Quadrotor Using Lagrange Form Dynamics. Journal of
Intelligent and Robotic Systems, 56(1-2):127–151, 2009.
[232] Yu Kang, Shaofeng Chen, Xuefeng Wang, and Yang Cao. Deep Con-
volutional Identifier for Dynamic Modeling and Adaptive Control of Un-
manned Helicopter. IEEE Transactions on Neural Networks and Learning
Systems, 30(2):524–538, 2019.
[233] Chia-Wei Kuo, Ching-Chih Tsai, and Chi-Tai Lee. Intelligent Leader-
Following Consensus Formation Control Using Recurrent Neural Networks
for Small-Size Unmanned Helicopters. IEEE Transactions on Systems,
Man, and Cybernetics: Systems, 51(2):1288–1301, 2021.
[234] Pieter Abbeel, Adam Coates, and Andrew Y. Ng. Autonomous Helicopter
Aerobatics through Apprenticeship Learning. The International Journal
of Robotics Research, 29(13):1608–1639, 2010.
[235] Qianying Li. Grey-Box System Identification of a Quadrotor Unmanned
Aerial Vehicle. Master's thesis, Faculty of Mechanical, Maritime and
Materials Engineering, Delft University of Technology, 2014.
[236] Shuai Tang, Zhiqiang Zheng, Shaoke Qian, and Xinye Zhao. Nonlinear
system identification of a small-scale unmanned helicopter. Control Engi-
neering Practice, 25:1–15, 2014.
[237] Clément Hamel, Ruxandra Botez, and Margaux Ruby. Cessna Citation
X Airplane Grey-Box Model Identification without Preliminary Data. In
SAE 2014 Aerospace Systems and Technology Conference, SAE Technical
Paper Series. SAE International, Warrendale, PA, United States, 2014.
[238] F. H. F. Leung, H. K. Lam, S. H. Ling, and P. K. S. Tam. Tuning of the
structure and parameters of a neural network using an improved genetic
algorithm. IEEE Transactions on Neural Networks, 14(1):79–88, 2003.
[239] Francisco da Costa Lopes, Edson H. Watanabe, and Luis Guilherme B.
Rolim. A control-oriented model of a PEM fuel cell stack based on NARX
and NOE neural networks. IEEE Transactions on Industrial Electronics,
62(8):5155–5163, 2015.
[240] Gérard Dreyfus. Neural networks: Methodology and applications, vol-
ume 24. Springer, Berlin, 2005.
[241] Qinghua Zhang. Nonlinear system identification with output error model
through stabilized simulation. IFAC Proceedings Volumes, 37(13):501–
506, 2004.
[242] Heidar A. Talebi, Farzaneh Abdollahi, Rajni V. Patel, and Khashayar
Khorasani. Neural Network-Based State Estimation of Nonlinear Systems,
volume 395. Springer New York, New York, NY, 2010.
[243] Krzysztof Patan and Józef Korbicz. Nonlinear model predictive control
of a boiler unit: A fault tolerant control study. International Journal of
Applied Mathematics and Computer Science, 22(1):225–237, 2012.
[244] Krzysztof Patan. Neural network-based model predictive control: fault
tolerance and stability. IEEE Transactions on Control Systems Technol-
ogy, 23(3):1147–1155, 2015.
[245] Nan-Ying Liang, Guang-Bin Huang, P. Saratchandran, and N. Sundarara-
jan. A fast and accurate online sequential learning algorithm for feedfor-
ward networks. IEEE Transactions on Neural Networks, 17(6):1411–1423,
2006.
[246] Zheng Yan and Jun Wang. Robust model predictive control of nonlinear
systems with unmodeled dynamics and bounded uncertainties based on
neural networks. IEEE Transactions on Neural Networks and Learning
Systems, 25(3):457–469, 2014.
[247] Chao Jia, Xiaoli Li, Kang Wang, and Dawei Ding. Adaptive control of
nonlinear system using online error minimum neural networks. ISA Trans-
actions, 2016.
[248] Ning Wang, Jing-Chao Sun, Meng Joo Er, and Yan-Cheng Liu. A Novel
Extreme Learning Control Framework of Unmanned Surface Vehicles.
IEEE Transactions on Cybernetics, 46(5):1106–1117, 2016.
[249] Jianwei Zhao, Zhihui Wang, and Dong Sun Park. Online sequential
extreme learning machine with forgetting mechanism. Neurocomputing,
87:79–89, 2012.
[250] Symone G. Soares and Rui Araújo. An adaptive ensemble of on-line ex-
treme learning machines with variable forgetting factor for dynamic sys-
tem prediction. Neurocomputing, 171:693–707, 2016.
[251] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever,
and Ruslan Salakhutdinov. Dropout: A Simple Way to Prevent Neu-
ral Networks from Overfitting. Journal of Machine Learning Research,
15(1):1929–1958, 2014.
[252] Somil Bansal, K. Anayo Akametalu, Frank J. Jiang, Forrest Laine, and
Claire J. Tomlin. Learning Quadrotor Dynamics Using Neural Network for
Flight Control. In 2016 IEEE 55th Conference on Decision and Control
(CDC). IEEE, 2016.
[253] Mark B. Tischler and Robert K. Remple. Aircraft and Rotorcraft System
Identification: Engineering Methods with Flight-Test Examples. AIAA,
Blacksburg, Virginia, 2006.
[254] Wen Yu and Mario Pacheco. Impact of random weights on nonlinear
system identification using convolutional neural networks. Information
Sciences, 477:1–14, 2019.
[255] Alessandro Giusti, Jerome Guzzi, Dan C. Ciresan, Fang-Lin He, Juan P.
Rodriguez, Flavio Fontana, Matthias Faessler, Christian Forster, Jurgen
Schmidhuber, Gianni Di Caro, Davide Scaramuzza, and Luca M. Gam-
bardella. A Machine Learning Approach to Visual Perception of For-
est Trails for Mobile Robots. IEEE Robotics and Automation Letters,
1(2):661–667, 2016.
[256] Dong Ki Kim and Tsuhan Chen. Deep Neural Network for Real-Time
Autonomous Indoor Navigation. arXiv preprint arXiv:1511.04668, 2015.
[257] Adrian Carrio, Carlos Sampedro, Alejandro Rodriguez-Ramos, and Pas-
cual Campoy. A Review of Deep Learning Methods and Applications for
Unmanned Aerial Vehicles. Journal of Sensors, 2017(2):1–13, 2017.
[258] Arthur E. Bryson and Yu-Chi Ho. Applied optimal control. Taylor &
Francis, 1975.
[259] Lemei M. Zhu, Hamidreza Modares, Gan Oon Peen, Frank L. Lewis, and
Baozeng Yue. Adaptive Suboptimal Output-Feedback Control for Linear
Systems Using Integral Reinforcement Learning. IEEE Transactions on
Control Systems Technology, 23(1):264–273, 2015.
[260] Dante Kalise, Sudeep Kundu, and Karl Kunisch. Robust Feedback Con-
trol of Nonlinear PDEs by Numerical Approximation of High-Dimensional
Hamilton–Jacobi–Isaacs Equations. SIAM Journal on Applied Dynamical
Systems, 19(2):1496–1524, 2020.
[261] Murad Abu-Khalaf, Jie Huang, and Frank L. Lewis. Nonlinear H2/H∞
Constrained Feedback Control: A Practical Design Approach Using Neural
Networks. Advances in Industrial Control. Springer, London, 2006.
[262] Travis Dierks and Sarangapani Jagannathan. Optimal Control of Affine
Nonlinear Continuous-time Systems Using an Online Hamilton-Jacobi-
Isaacs Formulation. In 49th IEEE Conference on Decision and Control
(CDC), pages 3048–3053, Atlanta, GA, USA, 2010. IEEE.
[263] A. J. van der Schaft. L2-gain analysis of nonlinear systems and nonlinear
state-feedback H∞ control. IEEE Transactions on Automatic Control,
37(6):770–784, 1992.
[264] Randal W. Beard. Successive Galerkin approximation algorithms for non-
linear optimal and robust control. International Journal of Control,
71(5):717–743, 1998.
[265] Asma Al-Tamimi, Frank L. Lewis, and Murad Abu-Khalaf. Discrete-time
nonlinear HJB solution using approximate dynamic programming: Con-
vergence proof. IEEE Transactions on Systems, Man, and Cybernetics,
Part B (Cybernetics), 38(4):943–949, 2008.
[266] Jennie Si. Handbook of learning and approximate dynamic programming,
volume 2 of IEEE Press Series on Computational Intelligence. IEEE Press,
Hoboken, New Jersey, 2004.
[267] Derong Liu, Ding Wang, Fei-Yue Wang, Hongliang Li, and Xiong Yang.
Neural-network-based online HJB solution for optimal robust guaranteed
cost control of continuous-time uncertain nonlinear systems. IEEE Trans-
actions on Cybernetics, 44(12):2834–2847, 2014.
[268] Randal W. Beard, George N. Saridis, and John T. Wen. Galerkin approx-
imations of the generalized Hamilton-Jacobi-Bellman equation. Automat-
ica, 33(12):2159–2177, 1997.
[269] Richard S. Sutton and Andrew G. Barto. Reinforcement learning: An
introduction. Adaptive computation and machine learning. MIT Press,
Cambridge, Mass. and London, 1998.
[270] Kyriakos G. Vamvoudakis and Frank L. Lewis. Online actor–critic algo-
rithm to solve the continuous-time infinite horizon optimal control prob-
lem. Automatica, 46(5):878–888, 2010.
[271] Paul J. Werbos. Approximate dynamic programming for real-time control
and neural modeling. In David A. White and Donald A. Sofge, editors,
Handbook of intelligent control. Van Nostrand Reinhold, New York, 1992.
[272] Ding Wang, Haibo He, and Derong Liu. Improving the Critic Learning
for Event-Based Nonlinear H∞ Control Design. IEEE Transactions
on Cybernetics, 47(10):3417–3428, 2017.
[273] Kyriakos G. Vamvoudakis. Event-triggered optimal adaptive control al-
gorithm for continuous-time nonlinear systems. IEEE/CAA Journal of
Automatica Sinica, 1(3):282–293, 2014.
[274] Kyriakos G. Vamvoudakis, Arman Mojoodi, and Henrique Ferraz. Event-
triggered optimal tracking control of nonlinear systems. International
Journal of Robust and Nonlinear Control, 27(4):598–619, 2017.
[275] Murad Abu-Khalaf and Frank L. Lewis. Nearly optimal control laws for
nonlinear systems with saturating actuators using a neural network HJB
approach. Automatica, 41(5):779–791, 2005.
[276] Draguna Vrabie and Frank Lewis. Neural network approach to continuous-
time direct adaptive optimal control for partially unknown nonlinear sys-
tems. Neural Networks, 22(3):237–246, 2009.
[277] Biao Luo, Huai-Ning Wu, and Tingwen Huang. Off-policy reinforce-
ment learning for H∞ control design. IEEE Transactions on Cybernetics,
45(1):65–76, 2015.
[278] Shan Xue, Biao Luo, and Derong Liu. Event-Triggered Adaptive Dynamic
Programming for Unmatched Uncertain Nonlinear Continuous-Time Sys-
tems. IEEE Transactions on Neural Networks and Learning Systems,
32(7):2939–2951, 2021.
[279] David Nodland, Hassan Zargarzadeh, and Sarangapani Jagannathan.
Neural network-based optimal adaptive output feedback control of a heli-
copter UAV. IEEE Transactions on Neural Networks and Learning Sys-
tems, 24(7):1061–1073, 2013.
[280] Draguna Vrabie and Frank Lewis. Neural network approach to continuous-
time direct adaptive optimal control for partially unknown nonlinear sys-
tems. Neural Networks, 22(3):237–246, 2009.
[281] Frank L. Lewis and Draguna Vrabie. Reinforcement learning and adaptive
dynamic programming for feedback control. IEEE Circuits and Systems
Magazine, 9(3):32–50, 2009.
[282] Dongchen Han and S. N. Balakrishnan. State-constrained agile missile
control with adaptive-critic-based neural networks. IEEE Transactions
on Control Systems Technology, 10(4):481–489, 2002.
[283] Silvia Ferrari and Robert F. Stengel. Online Adaptive Critic Flight Con-
trol. Journal of Guidance, Control, and Dynamics, 27(5):777–786, 2004.
[284] Said G. Khan, Guido Herrmann, Frank L. Lewis, Tony Pipe, and Chris
Melhuish. Reinforcement learning and optimal adaptive control: An
overview and implementation examples. Annual Reviews in Control,
36(1):42–59, 2012.
[285] Jens Kober, J. Andrew Bagnell, and Jan Peters. Reinforcement learning
in robotics: A survey. The International Journal of Robotics Research,
32(11):1238–1274, 2013.
[286] Benjamin Recht. A Tour of Reinforcement Learning: The View from Con-
tinuous Control. Annual Review of Control, Robotics, and Autonomous
Systems, 2(1):253–279, 2019.
[287] Biao Luo, Huai-Ning Wu, and Tingwen Huang. Optimal Output Reg-
ulation for Model-Free Quanser Helicopter With Multistep Q-Learning.
IEEE Transactions on Industrial Electronics, 65(6):4953–4961, 2018.
[288] Gautam Reddy, Antonio Celani, Terrence J. Sejnowski, and Massimo
Vergassola. Learning to soar in turbulent environments. Proceedings
of the National Academy of Sciences of the United States of America,
113(33):E4877–E4884, 2016.
[289] Gautam Reddy, Jerome Wong-Ng, Antonio Celani, Terrence J. Sejnowski,
and Massimo Vergassola. Glider soaring via reinforcement learning in the
field. Nature, 562(7726):236–239, 2018.
[290] Haobin Shi, Xuesi Li, Kao-Shing Hwang, Wei Pan, and Genjiu Xu. De-
coupled Visual Servoing With Fuzzy Q-Learning. IEEE Transactions on
Industrial Informatics, 14(1):241–252, 2018.
[291] Chunyu Nie, Zewei Zheng, and Ming Zhu. Three-Dimensional Path-
Following Control of a Robotic Airship with Reinforcement Learning. In-
ternational Journal of Aerospace Engineering, 2019:1–12, 2019.
[292] Beakcheol Jang, Myeonghwi Kim, Gaspard Harerimana, and Jong Wook
Kim. Q-Learning Algorithms: A Comprehensive Classification and Ap-
plications. IEEE Access, 7:133653–133667, 2019.
[293] Jungdam Won, Jongho Park, Kwanyu Kim, and Jehee Lee. How to train
your dragon: Example-guided control of flapping flight. ACM Transactions
on Graphics, 36(6):1–13, 2017.
[294] Ivana Palunko, Aleksandra Faust, Patricio Cruz, Lydia Tapia, and Rafael
Fierro. A Reinforcement Learning Approach Towards Autonomous Sus-
pended Load Manipulation Using Aerial Robots. In IEEE International
Conference on Robotics and Automation, Karlsruhe, Germany, 2013.
IEEE.
[295] Andrew Y. Ng, Adam Coates, Mark Diel, Varun Ganapathi, Jamie
Schulte, Ben Tse, Eric Berger, and Eric Liang. Autonomous Inverted
Helicopter Flight via Reinforcement Learning. In Marcelo H. Ang and
Oussama Khatib, editors, Experimental Robotics IX, volume 21
of Springer Tracts in Advanced Robotics, pages 363–372. Springer, Secau-
cus, 2006.
[296] Fereshteh Sadeghi and Sergey Levine. CAD2RL: Real Single-Image Flight
without a Single Real Image. In Robotics: Science and Systems Conference
(RSS), Cambridge MA, USA, 2017.
[297] Bo Zhang, Chi Harold Liu, Jian Tang, Zhiyuan Xu, Jian Ma, and Wen-
dong Wang. Learning-Based Energy-Efficient Data Collection by Un-
manned Vehicles in Smart Cities. IEEE Transactions on Industrial Infor-
matics, 14(4):1666–1676, 2018.
[298] John Schulman, Sergey Levine, Philipp Moritz, Michael Jordan, and
Pieter Abbeel. Trust Region Policy Optimization. In 32nd International
Conference on Machine Learning, Lille, France, 2015.
[299] Sham Kakade and John Langford. Approximately optimal approximate re-
inforcement learning. In 19th International Conference on Machine Learn-
ing, pages 267–274, 2002.
[300] Chen-Huan Pi, Kai-Chun Hu, Stone Cheng, and I-Chen Wu. Low-level au-
tonomous control and tracking of quadrotor using reinforcement learning.
Control Engineering Practice, 95(4):104222, 2020.
[301] Thomas Degris, Martha White, and Richard S. Sutton. Off-Policy Actor-
Critic. In International Conference on Machine Learning, Scotland, UK,
2012.
[302] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg
Klimov. Proximal Policy Optimization Algorithms. arXiv preprint
arXiv:1707.06347, 2017.
[303] David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra,
and Martin Riedmiller. Deterministic Policy Gradient Algorithms. In
31st International Conference on Machine Learning, volume 32, Beijing,
China, 2014.
[304] Russell Enns and Jennie Si. Apache Helicopter Stabilization Using Neural
Dynamic Programming. Journal of Guidance, Control, and Dynamics,
25(1):19–25, 2002.
[305] R. Enns and Jennie Si. Helicopter trimming and tracking control using
direct neural dynamic programming. IEEE Transactions on Neural Net-
works, 14(4):929–939, 2003.
[306] Shun-ichi Amari. Natural Gradient Works Efficiently in Learning. Neural
Computation, 10(2):251–276, 1998.
[307] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess,
Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous
control with deep reinforcement learning. In International Conference on
Learning Representations, San Juan, Puerto Rico, 2016.
[308] Yuanda Wang, Jia Sun, Haibo He, and Changyin Sun. Deterministic Pol-
icy Gradient With Integral Compensator for Robust Quadrotor Control.
IEEE Transactions on Systems, Man, and Cybernetics: Systems, pages
1–13, 2019.
[309] Alejandro Rodriguez-Ramos, Carlos Sampedro, Hriday Bavle, Paloma
de La Puente, and Pascual Campoy. A Deep Reinforcement Learning
Strategy for UAV Autonomous Landing on a Moving Platform. Journal
of Intelligent & Robotic Systems, 93(1-2):351–366, 2019.
[310] Carlos Sampedro, Hriday Bavle, Alejandro Rodriguez-Ramos, Paloma
de La Puente, and Pascual Campoy. Laser-Based Reactive Navigation for
Multirotor Aerial Robots using Deep Reinforcement Learning. In Inter-
national Conference on Intelligent Robots and Systems (IROS), Madrid,
Spain, 2018. IEEE.
[311] Andrew Y. Ng, Daishi Harada, and Stuart Russell. Policy invariance under
reward transformations: Theory and application to reward shaping. In
Ivan Bratko and Sašo Džeroski, editors, Proceedings of the Sixteenth
International Conference on Machine Learning, Bled, Slovenia, 1999.
[312] Chao Wang, Jian Wang, Yuan Shen, and Xudong Zhang. Autonomous
Navigation of UAVs in Large-Scale Complex Environments: A Deep Re-
inforcement Learning Approach. IEEE Transactions on Vehicular Tech-
nology, 68(3):2124–2136, 2019.
[313] Abhik Singla, Sindhu Padakandla, and Shalabh Bhatnagar. Memory-
Based Deep Reinforcement Learning for Obstacle Avoidance in UAV
With Limited Environment Knowledge. IEEE Transactions on Intelli-
gent Transportation Systems, 22(1):107–118, 2021.
[314] Nathan O. Lambert, Daniel S. Drew, Joseph Yaconelli, Sergey Levine,
Roberto Calandra, and Kristofer S. J. Pister. Low-Level Control of
a Quadrotor With Deep Model-Based Reinforcement Learning. IEEE
Robotics and Automation Letters, 4(4):4224–4230, 2019.
[315] Sylvain Koos, Jean-Baptiste Mouret, and Stéphane Doncieux. The Trans-
ferability Approach: Crossing the Reality Gap in Evolutionary Robotics.
IEEE Transactions on Evolutionary Computation, 17(1):122–145, 2013.
[316] Kirk Y.W. Scheper and Guido C.H.E. de Croon. Evolution of robust
high speed optical-flow-based landing for autonomous MAVs. Robotics
and Autonomous Systems, 124:103380, 2020.
[317] J. Andrew Bagnell and Jeff G. Schneider. Autonomous Helicopter Con-
trol Using Reinforcement Learning Policy Search Methods. In Interna-
tional Conference on Robotics and Automation (ICRA), Seoul, Korea,
2001. IEEE.
[318] Jack M. Wang, David J. Fleet, and Aaron Hertzmann. Optimizing walking
controllers for uncertain inputs and environments. ACM Transactions on
Graphics, 29(4):1–8, 2010.
[319] Horia Mania, Aurelia Guy, and Benjamin Recht. Simple random search
provides a competitive approach to reinforcement learning. arXiv preprint
arXiv:1803.07055, 2018.
[320] S. L. Waslander, G. M. Hoffmann, Jung Soon Jang, and C. J. Tomlin.
Multi-agent quadrotor testbed control design: Integral sliding mode vs.
reinforcement learning. In 2005 IEEE/RSJ International Conference on
Intelligent Robots and Systems, pages 3712–3717, Edmonton, AB, Canada,
2005. IEEE.
[321] Sergey Levine and Vladlen Koltun. Guided Policy Search. In 30th Inter-
national Conference on Machine Learning, Atlanta, Georgia, USA, 2013.
[322] Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever.
Evolution Strategies as a Scalable Alternative to Reinforcement Learning.
arXiv preprint arXiv:1703.03864, 2017.
[323] Tianhao Zhang, Gregory Kahn, Sergey Levine, and Pieter Abbeel. Learn-
ing Deep Control Policies for Autonomous Aerial Vehicles with MPC-
Guided Policy Search. In Allison Okamura and Arianna Menciassi, edi-
tors, 2016 IEEE International Conference on Robotics and Automation,
pages 528–535. IEEE, 2016.
[324] H. J. Kappen. Path integrals and symmetry breaking for optimal con-
trol theory. Journal of Statistical Mechanics: Theory and Experiment,
2005(11):P11011, 2005.
[325] Jung-Su Ha, Soon-Seo Park, and Han-Lim Choi. Topology-guided path
integral approach for stochastic optimal control in cluttered environment.
Robotics and Autonomous Systems, 113:81–93, 2019.
[326] Grady Williams, Nolan Wagener, Brian Goldfain, Paul Drews, James M.
Rehg, Byron Boots, and Evangelos A. Theodorou. Information Theoretic
MPC for Model-Based Reinforcement Learning. In IEEE International
Conference on Robotics and Automation (ICRA), Singapore, 2017. IEEE.
[327] Keuntaek Lee, Jason Gibson, and Evangelos A. Theodorou. Aggressive
Perception-Aware Navigation Using Deep Optical Flow Dynamics and
PixelMPC. IEEE Robotics and Automation Letters, 5(2):1207–1214, 2020.
[328] Chen Liang, Weihong Wang, Zhenghua Liu, Chao Lai, and Benchun Zhou.
Learning to Guide: Guidance Law Based on Deep Meta-Learning and
Model Predictive Path Integral Control. IEEE Access, 7:47353–47365,
2019.
[329] Atilim Gunes Baydin, Robert Cornish, David Martinez Rubio, Mark
Schmidt, and Frank Wood. Online Learning Rate Adaptation with Hy-
pergradient Descent. ArXiv e-prints, pages 1–10, 2017.
[330] Liang Tang, Michael Roemer, Jianhua Ge, Agamemnon Crassidis, J.V.R.
Prasad, and Christine Belcastro. Methodologies for Adaptive Flight En-
velope Estimation and Protection. In AIAA Guidance, Navigation, and
Control Conference, Chicago, Illinois, 2009. AIAA.
[331] N. Hopfe, A. Ilchmann, and E. P. Ryan. Funnel Control With Satura-
tion: Linear MIMO Systems. IEEE Transactions on Automatic Control,
55(2):532–538, 2010.
[332] Norman Hopfe, Achim Ilchmann, and Eugene P. Ryan. Funnel Control
With Saturation: Nonlinear SISO Systems. IEEE Transactions on Auto-
matic Control, 55(9):2177–2182, 2010.
[333] Pieter Abbeel and Andrew Y. Ng. Apprenticeship learning via inverse
reinforcement learning. In Proceedings of the 21st International Conference
on Machine Learning, Banff, Canada, 2004.
[334] Adam Coates, Pieter Abbeel, and Andrew Y. Ng. Apprenticeship learning
for helicopter control. Communications of the ACM, 52(7):97–105, 2009.
[335] Naira Hovakimyan, Chengyu Cao, Evgeny Kharisov, Enric Xargay, and
Irene M. Gregory. L1 adaptive control for safety-critical systems. IEEE
Control Systems, 31(5):54–104, 2011.
[336] Naira Hovakimyan and Chengyu Cao. L1 adaptive control theory: Guar-
anteed robustness with fast adaptation, volume 21 of Advances in design
and control. Society for Industrial and Applied Mathematics, Philadel-
phia, 2010.
[337] Randall D. Beer. On the Dynamics of Small Continuous-Time Recurrent
Neural Networks. Adaptive Behavior, 3(4):469–509, 1995.
[338] Taylor Clawson, Silvia Ferrari, Sawyer Fuller, and Robert Wood. Spiking
Neural Network (SNN) Control of a Flapping Insect-Scale Robot. In 55th
Conference on Decision and Control (CDC), pages 3381–3388, Las Vegas,
USA, 2016. IEEE.
[339] Jesse J. Hagenaars, Federico Paredes-Vallés, Sander M. Bohte, and Guido
C. H. E. de Croon. Evolved Neuromorphic Control for High Speed
Divergence-Based Landings of MAVs. IEEE Robotics and Automation
Letters, 5(4):6239–6246, 2020.
[340] Antonios K. Alexandridis and Achilleas D. Zapranis. Wavelet neural net-
works: A practical guide. Neural Networks, 42:1–27, 2013.
[341] Chih-Min Lin, Ching-Fu Tai, and Chang-Chih Chung. Intelligent control
system design for UAV using a recurrent wavelet neural network. Neural
Computing and Applications, 24(2):487–496, 2014.
[342] Chih-Min Lin and Enkh-Amgalan Boldbaatar. Autolanding Control Us-
ing Recurrent Wavelet Elman Neural Network. IEEE Transactions on
Systems, Man, and Cybernetics: Systems, 45(9):1281–1291, 2015.