Differentiable Predictive Control: Constrained Deep Learning Alternative to Explicit Model Predictive Control for Unknown Nonlinear Systems
Ján Drgoňa, Karol Kiš, Aaron Tuor, Draguna Vrabie, Martin Klaučo
Abstract—We present differentiable predictive control as a deep learning alternative to explicit model predictive control for unknown nonlinear systems. The structure of the proposed neural architecture is inspired by the structure of a model predictive control problem: i) it uses a prediction model capturing the controlled system dynamics, ii) it predicts receding horizon optimal control actions, and iii) it enforces inequality constraints via penalty methods. In the presented framework, a neural state-space model is learned from time-series measurements of the unknown system. The control policy is then optimized via gradient descent by differentiating the closed-loop system model fully parametrized by neural networks. The proposed architecture allows us to train an explicit control policy that tracks a distribution of reference signals and handles time-varying constraints imposed on states and control actions. We experimentally demonstrate that it is possible to train constrained explicit control policies purely based on observations of the dynamics of the unknown nonlinear system. The proposed method is applied to a laboratory device in an embedded implementation using a Raspberry-Pi platform. We compare reference tracking and constraint satisfaction against explicit model predictive control and report pivotal efficiency gains in online computational demands, memory requirements, policy complexity, and construction time. We show that the differentiable predictive control method scales linearly, compared to the exponential scalability of explicit predictive control solved via multiparametric programming, hence opening the door to applications in nonlinear systems with a large number of variables, longer prediction horizons, and faster sampling rates that are beyond the reach of classical explicit predictive control.
Index Terms—constrained deep learning, differentiable predictive control, explicit model predictive control, neural state space model, nonlinear system identification
This work was funded by the Mathematics for Artificial Reasoning in Science (MARS) investment at the Pacific Northwest National Laboratory (PNNL). K. Kiš and M. Klaučo gratefully acknowledge the contribution of the Scientific Grant Agency of the Slovak Republic under the grants 1/0585/19 and 1/0545/20.
J. Drgoňa is with Pacific Northwest National Laboratory, Richland, Washington, USA (e-mail: jan.drgona@pnnl.gov).
K. Kiš is with Slovak University of Technology, Bratislava, Slovakia (e-mail: karol.kis@stuba.sk).
A. Tuor is with Pacific Northwest National Laboratory, Richland, Washington, USA (e-mail: aaron.tuor@pnnl.gov).
D. Vrabie is with Pacific Northwest National Laboratory, Richland, Washington, USA (e-mail: draguna.vrabie@pnnl.gov).
M. Klaučo is with Slovak University of Technology, Bratislava, Slovakia (e-mail: martin.klauco@stuba.sk).
I. INTRODUCTION
Incorporation of machine learning methods in control ap-
plications is becoming one of the leading research avenues in
the field of control theory. The design of many novel control
methods based on machine learning (ML) approaches is heavily
inspired by the benefits of model predictive control (MPC),
such as constraints handling and robustness. The substitution
of MPC with ML-based controllers was studied by several
well-established researchers in the control domain [1]–[5].
Furthermore, the application of neural networks as substitutes of
MPC behavior has been considered in practical applications as
well [6]–[9]. All aforementioned works fall into the category of so-called approximate MPC based on imitation learning of the original MPC. As such, these works share one significant disadvantage: the reliance on data sets collected from closed-loop experiments, i.e., the ML-based controllers are trained on experiments involving a fully implemented MPC.
On the other hand, the main advantage of these machine learning controllers is the explicit form in which they are implemented. In fact, the reduction of the computational and memory burden achieved by controllers based on neural networks boils down to the evaluation of an explicit function that amounts to a couple of kilobytes of source code [10]. In the case study presented in [10], an online MPC with hybrid dynamics was replaced by an explicit ML-based controller. Naturally, assembling hybrid explicit MPC is nearly impossible even for simple systems. Compared to traditional explicit MPC [11], [12], this is a significant advantage, since even simple explicit MPC strategies can occupy several hundred megabytes [13]. Even if we manage to produce a reasonably sized explicit MPC via several complexity reduction techniques [14], [15], the need for the MPC design alongside linear system identification is unavoidable.
Linear system identification [16] combined with linear MPC is a standard practice in various industrial applications. However, despite its wide-ranging uses, its deployment in connection with MPC is not practical in many settings. In particular, low-resource embedded systems controlling highly nonlinear dynamics present serious obstacles to an MPC controller implementation. Despite its compelling theory [17], linearization of the system dynamics may not be a sufficient approximation for most highly nonlinear systems. In other words, implicit MPC may be too slow, and explicit MPC too memory intensive, for an embedded system that must operate at high frequency due to highly nonlinear dynamics [18].
Given these promises and difficulties, research connecting neural networks and MPC is gaining interest in the optimization, controls, and machine learning communities. Neural networks trained to mimic MPC policies can offer fast approximations for low-resource settings, but often at the cost of constraint satisfaction and performance guarantees over longer time horizons. Alternative approaches, which integrate constraint satisfaction into neural network predictive control policies, are associated with other disadvantages, such as costly optimization routines and the need for a volume of data that may be unrealistic to gather in an operational setting.
Addressing these challenges, we present a novel control method called differentiable predictive control (DPC) for learning both nonlinear system dynamics and control policies end-to-end, without supervision from an expert controller, with constraint satisfaction capabilities, and in a sample-efficient way based only on the observed time-series data of the system dynamics. This is in stark contrast with the recent implementation in [1], where a trained ML-based controller runs alongside an online MPC strategy, making the algorithm unusable in many applications running in low computational resource settings. The presented DPC method is based on a neural parametrization of the closed-loop dynamical system with two building blocks: i) a neural system model, and ii) a deep learning formulation of the model predictive control policy. In particular, we combine system identification based on a structured neural state-space model with policy optimization under embedded inequality constraints via backpropagation of the control loss through the closed-loop system dynamics model. The conceptual methodology of DPC is illustrated in Fig. 1. The presented DPC method supports nonlinear systems with both input and state constraints and arbitrary reference signals, and represents a methodological extension of prior work on linear systems introduced in [19]. A method similar in spirit has recently been proposed by [20], where the authors use Log-Sum-Exp neural networks to approximate the MPC-related cost function given measurements of the system dynamics. However, in contrast to the proposed DPC method, the implementation of the method in [20] still relies on online nonlinear optimization, does not handle state constraints, and training on arbitrary reference signals poses computational challenges.

We demonstrate the capabilities of the proposed DPC method in experimental results using a laboratory device called FlexyAir with nonlinear dynamics and noisy measurements. We demonstrate several key features of DPC resulting in data-efficiency, scalability, and systematic constraint handling:
1) Solution of constrained optimal control problems for unknown nonlinear systems, given only a time series of measurements of the observed system dynamics and sampled boundary conditions.
2) Constrained neural control policies and state-space dynamics models which promote physically plausible outcomes.
3) In contrast with explicit MPC, our method supports dynamical constraints and trajectory preview capabilities.
4) Linear scalability in terms of the number of decision variables and the length of the prediction horizon, compared to the exponential scalability of explicit MPC solutions based on multi-parametric programming.
5) Our approach requires less online computation time and memory than explicit MPC solutions.
A. Related Work
a) Approximate MPC: Recently, several works have been
devoted to implementing MPC and its variations based on
machine learning. One set of these works includes mimicking
the MPC behavior by replacing the control law with a neural
network [7], [21], [22]. These approaches are known in the
machine learning community as imitation learning [23], or
MPC-guided policy search [24]. The underlying principle is to train the neural networks in a supervised learning fashion on a labeled data set of the system dynamics controlled by MPC. Several procedures have been designed to approximate the MPC behavior with neural networks, which significantly reduces the implementation requirements of the MPC with minimal impact on the control performance [7], [22]. A significant disadvantage of these imitation learning approaches lies in the lack of guarantees on constraint satisfaction and performance.
Constraint satisfaction using neural network-based approxi-
mations of MPC control laws is a new and active research area.
To tackle this issue, some authors proposed including additional
layers in the policy network projecting the control inputs onto
the constrained region of the state and action spaces [25], [26].
Authors in [1] use an additional dual policy neural network
to estimate the sub-optimality of the learned control law with
probabilistic guarantees. Others employ learning bounds for
empirically validating constraints handling capabilities after the
network is trained [5]. However, all previous works rely on several factors which are prerequisites for their successful implementation. For example, they need to construct the MPC strategy first, which requires prior knowledge of the system dynamics model and a relatively costly optimization for solving the associated MPC, constraint satisfaction, or projection problem. This means that prior knowledge about the underlying system dynamics is necessary, which is often expert-dependent and time-consuming to obtain. For a more comprehensive overview of approximate MPC, or learning-based MPC (LBMPC) methods, we refer the reader to [27], [28].
b) Neural Models in MPC: To tackle the modeling prob-
lem, some researchers focus on training neural networks as
prediction models for MPC [29]. Authors in [30] use low-rank
features of the high-dimensional system to train a recurrent
neural network (RNN) to predict the control relevant quantities
for MPC. Other authors have used structured neural network
models inspired by classical linear time-varying state-space
models [31], whereas some have proposed using convex neural
architectures [32], graph neural networks [33], or stable neural
networks based on Lyapunov functions [34].
Fig. 1: Conceptual methodology of the proposed constrained nonlinear differentiable predictive control (DPC).

c) Automatic Differentiation in Control: The use of automatic differentiation (AD) for control and optimization is a well-established method. For instance, the CasADi toolbox [35] uses known system dynamics and constraints to construct a computational graph and computes the gradients for nonlinear optimization solvers. From the perspective of machine learning, the authors in [32] investigated the idea of solving the optimal control problem by backpropagation through a learned system model parametrized via convex neural networks [36]. Others have developed domain-specific differentiable physics models for robotics [37], [38]. Learning linear model predictive control (MPC) policies by differentiating the KKT conditions of the convex approximation at a fixed point was introduced in [39]. However, a generic method for the explicit solution of constrained nonlinear optimal control problems with unknown dynamics using AD is still lacking.
d) Constrained and structured deep learning: Incorporating constraints into deep learning presents multiple challenges, such as non-convexity, convergence, and stability of the learning process [40]. Penalty methods and loss function regularizations are the most straightforward way of imposing constraints on deep neural network outputs and parameters [41]–[43]. In those methods, the loss function is augmented with additional terms penalizing the violations of soft constraints via slack variables, which typically works well in practice, often outperforming hard constraint methods [44], [45]. Authors in [46], [47] use barrier methods combined with Lyapunov functions to enforce output constraints, stability, and boundedness of the neural network controller. An alternative to penalty and barrier methods are neural network architectures imposing hard constraints, such as linear operator constraints [48], or architectures with Hamiltonian [49] and Lagrangian [50] structural priors for enforcing energy conservation laws.
II. PRELIMINARIES
A. System Dynamics
We assume an unknown partially observable nonlinear dynamical system in discrete time:

$$x_{k+1} = f(x_k, u_k) \tag{1a}$$
$$y_k = g(x_k) \tag{1b}$$

where $x_k \in \mathbb{R}^{n_x}$ is the unknown system state, $y_k \in \mathbb{R}^{n_y}$ is the observed output, and $u_k \in \mathbb{R}^{n_u}$ is the control input at time $k$. We assume we have access to the data generated by the system in the form of input-output tuples:

$$\Xi = \{ (y^i_k, u^i_k), (y^i_{k+1}, u^i_{k+1}), \ldots, (y^i_{k+N}, u^i_{k+N}) \}, \quad i \in \mathbb{N}_1^n \tag{2}$$

where $n$ is the number of sampled trajectories, each with $N$ time steps.
B. Model Predictive Control
In this work, we consider a well-known model predictive controller (MPC) [51]. We follow the standard formulation of linear MPC as a quadratic optimization problem, specifically:

$$\min_{u_0, \ldots, u_{N-1}} \sum_{k=0}^{N-1} \|y_k - r\|_{Q_r}^2 + \|u_k - u_{k-1}\|_{Q_{du}}^2 \tag{3a}$$
$$\text{s.t.} \quad x_{k+1} = A x_k + B u_k, \tag{3b}$$
$$y_k = C x_k + D u_k, \tag{3c}$$
$$\underline{u}_k \le u_k \le \overline{u}_k, \tag{3d}$$
$$\underline{x}_k \le x_k \le \overline{x}_k, \tag{3e}$$
$$x_0 = x(t), \tag{3f}$$
$$u_{-1} = u(t - T_s). \tag{3g}$$
Here, the objective function is defined as the finite sum of two terms over a prediction horizon $N$. Both terms are weighted squared norms, i.e., $\|a\|_Q^2 = a^\top Q a$. Note that to ensure feasibility of problem (3), the weighting factors $Q_{du}$ and $Q_r$ must be chosen as positive definite and positive semi-definite matrices, respectively. The constraints (3b)-(3e) are enforced for $k \in \{0, 1, \ldots, N-1\}$. Moreover, to enforce reference tracking, the minimization objective (3a) includes a reference tracking error term, $\|y_k - r_k\|_{Q_r}^2$, as well as a control rate penalization term, $\|u_k - u_{k-1}\|_{Q_{du}}^2$, which is a standard way to enforce offset-free tracking [52]. The optimization problem is initialized with the current state measurement $x(t)$, as in (3f), with the value of the control action from the previous sampling instant $u(t - T_s)$, as described in (3g), and with the reference value $r$ in (3a). Note that the notation $u_{-1}$ applies for $k = 0$, which is necessary to compute the value of the objective function in the initial prediction step.
The control strategy is implemented in the receding horizon fashion, where we solve (3) to global optimality, yielding an optimal sequence of control actions $U^\star = [u_0^{\star\top}, \ldots, u_{N-1}^{\star\top}]^\top$, while only the first action is applied to the system. Such an implementation is visualized in Fig. 2.

Fig. 2: Realisation of the closed-loop control system with an unknown dynamical system.
We formulate the quadratic optimization problem (3) in MATLAB with the YALMIP toolbox [53]. The problem is then solved numerically with the GUROBI solver. Due to the fast dynamics of the controlled system, and the fact that the numerical solution of the MPC problem takes longer than the sampling period $T_s$, we consider a parametric solution to the optimization problem (3). The parametric solution allows us to evaluate the control law and obtain the optimal control action within the allotted time.
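For illustration, a minimal sketch of the QP (3) in Python with cvxpy is given below. This is not the authors' implementation (the paper builds the problem in MATLAB with YALMIP and solves it with GUROBI); the model matrices, bounds, and initial conditions are placeholder values, and the direct feedthrough term is taken as $D = 0$ for simplicity. The horizon and tuning weights mirror the eMPC settings reported in Section IV-D.

```python
import cvxpy as cp
import numpy as np

# Illustrative reformulation of problem (3); A, B, C and the bounds are
# placeholder values, not the identified model from the paper.
N, nx, nu, ny = 5, 2, 1, 1
A, B, C = np.eye(nx), np.ones((nx, nu)), np.ones((ny, nx))
Qr, Qdu = 3.0, 4.0                       # eMPC tuning from Section IV-D
x0, u_prev, r = np.zeros(nx), np.zeros(nu), np.ones(ny)

x = cp.Variable((N + 1, nx))
u = cp.Variable((N, nu))
cost, constraints = 0, [x[0] == x0]      # (3f) initial state
for k in range(N):
    du = u[k] - (u_prev if k == 0 else u[k - 1])   # (3g) for k = 0
    cost += Qr * cp.sum_squares(C @ x[k] - r)      # tracking term (3a)
    cost += Qdu * cp.sum_squares(du)               # control rate term (3a)
    constraints += [x[k + 1] == A @ x[k] + B @ u[k],   # dynamics (3b)
                    0.0 <= u[k], u[k] <= 1.0]          # input bounds (3d)

prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve()
print(u.value[0])   # receding horizon: apply only the first action
```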
C. Explicit Model Predictive Control
Parametric optimization theory allows us to create an explicit map between the initial conditions of the optimal control problem (3) and the optimal solution $U^\star$. Specifically, the vector of parameters, i.e., the initial conditions, is defined as

$$\hat{\xi} = \begin{bmatrix} x(t)^\top & r^\top & u(t - T_s)^\top \end{bmatrix}^\top. \tag{4}$$
After applying elementary matrix operations, presented in [54], the OCP from (3) can be reformulated as

$$\min_U \; U^\top H U + \hat{\xi}^\top F U \tag{5a}$$
$$\text{s.t.} \quad G U \le w + S \hat{\xi}, \tag{5b}$$

which constitutes a parametric quadratic optimization problem (PQP). The solution to this problem can be obtained via standard procedures of parametric programming. Namely, the result is represented by a piecewise affine (PWA) function, given as

$$U^\star(\hat{\xi}) = \begin{cases} \alpha_1 \hat{\xi} + \beta_1 & \text{if } \hat{\xi} \in \mathcal{R}_1 \\ \quad \vdots \\ \alpha_{n_R} \hat{\xi} + \beta_{n_R} & \text{if } \hat{\xi} \in \mathcal{R}_{n_R}. \end{cases} \tag{6}$$
Here, the variable $\hat{\xi}$ stands for the vector of parameters, $n_R$ denotes the total number of regions, while $\alpha_i$ and $\beta_i$ define the specific control law with respect to a region $\mathcal{R}_i$. The regions are defined as polyhedral sets, namely

$$\mathcal{R}_i = \{ \hat{\xi} \mid \Gamma_i \hat{\xi} \le \gamma_i \}, \quad i = 1, \ldots, n_R, \tag{7}$$

where the matrices $\Gamma_i$ and vectors $\gamma_i$ denote the half-space representation of the regions. Since the optimization problem (3) is a quadratic problem with linear constraints, all regions are defined by linear inequalities.
Note that the procedure of obtaining a numerical representation of the matrices $\Gamma_i$ and vectors $\gamma_i$ is done by means of the Multi-Parametric Toolbox [55] in MATLAB. To use the explicit MPC in connection with the laboratory device, we export the control law into Python source code. The export to Python source code contains two parts: the first part consists of the coefficients from (6) and (7), and the second part is the algorithm that evaluates the control law. The algorithm is based on the well-known sequential search method from [56].
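A minimal Python sketch of such a sequential search evaluation is given below; the function name and the data layout (lists of NumPy arrays holding $\Gamma_i$, $\gamma_i$, $\alpha_i$, $\beta_i$ exported from the toolbox) are our assumptions for illustration, not the exported code itself.

```python
import numpy as np

def empc_evaluate(xi, Gamma, gamma, alpha, beta):
    """Sequential search over regions (7): return U*(xi) from the PWA law (6)."""
    for G_i, g_i, a_i, b_i in zip(Gamma, gamma, alpha, beta):
        if np.all(G_i @ xi <= g_i):       # is xi inside region R_i?
            return a_i @ xi + b_i         # affine law of region R_i
    raise ValueError("parameter vector lies outside the feasible parametric space")
```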
III. METHOD
This section presents Differentiable Predictive Control (DPC), a constrained neural network-based method for learning nonlinear state-space models and optimal control policies for unknown dynamical systems represented by time-series data. Our system identification method is based on a recently proposed block-structured neural state-space model architecture [57], allowing us to impose constraints on the model structure or variables to enforce physically realistic predictions. These nonlinear system models are then combined with a neural control policy, forming a fully parametrized differentiable closed-loop system dynamics model. This generic architecture allows us to learn a wide range of constrained control policies using end-to-end auto-differentiation of MPC-like loss functions and gradient descent optimization. Similar to explicit MPC, DPC optimizes control policies offline, using $N$-step ahead predictions of the closed-loop system dynamics model generated in response to a distribution of synthetically generated control features $\xi$. After training, analogous to MPC, DPC is deployed in the receding horizon control (RHC) fashion: at each time step, a sequence of $N$ optimal control actions is predicted by the policy, but only the first is applied to the system.

In the proposed methodology, both equality and inequality constraints play a crucial role. The former can be modeled by the feedforward nature of the neural component blocks, as shown in the sections on the dynamics model and closed-loop system architectures. To tackle the latter in the context of neural networks, we leverage the well-known penalty method for constrained optimization.
A. Penalty Constraints
To impose inequality constraints on a variable $y$, one can use a penalty function $p(y)$ that evaluates the constraint violations via slack variables $s$. In principle, the penalty method can be used to model arbitrary nonlinear inequality constraints. In this work, we employ penalty functions for time-varying lower and upper bounds $\underline{y}_k$, $\overline{y}_k$, respectively, given as follows:

$$p(y_k, \underline{y}_k) = \max(0, -y_k + \underline{y}_k) \tag{8a}$$
$$p(y_k, \overline{y}_k) = \max(0, y_k - \overline{y}_k) \tag{8b}$$

Remark. Penalty functions in the form (8) can be straightforwardly implemented in modern deep-learning libraries such as PyTorch or TensorFlow using standard ReLU activation functions.
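To make the remark concrete, a minimal PyTorch sketch of the penalty functions (8) follows; the function names are ours.

```python
import torch
import torch.nn.functional as F

def penalty_lower(y, y_min):
    """Lower-bound penalty (8a): nonzero only where y < y_min."""
    return F.relu(-y + y_min)

def penalty_upper(y, y_max):
    """Upper-bound penalty (8b): nonzero only where y > y_max."""
    return F.relu(y - y_max)
```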
B. Constrained Neural Dynamics Models
We aim to learn a constrained neural representation of the
unknown system dynamics, given the input-output time-series
dataset (2) obtained from system observation.
a) Model architecture: We present a generic block neural state-space model (BN-SSM) to represent and learn the partially observable unknown nonlinear system dynamics (1), given the labeled dataset (2). The BN-SSM architecture is shown in Fig. 3, with the corresponding equations given as follows:

$$x_{k+1} = f_x(x_k) + f_u(u_k) \tag{9a}$$
$$y_{k+1} = f_y(x_{k+1}) \tag{9b}$$
$$x_0 = f_o(y_{1-N}, \ldots, y_0) \tag{9c}$$
$$k \in \mathbb{N}_0^N \tag{9d}$$
where $k$ defines the discrete time step and $N$ defines the prediction horizon, i.e., the number of rollout steps of the recurrent model. The individual block components $f_x$, $f_u$, $f_y$, and $f_o$ are represented by neural networks. The blocks $f_x$ and $f_u$ define the hidden state and input dynamics, replacing the $A$ and $B$ matrices in the classical linear state-space model, respectively. The block $f_y$ defines the output mapping from hidden states $x_k$ to observables $y_k$, replacing the $C$ matrix in the linear model. The observer block $f_o$ maps the past output trajectories $Y_p = \{y_{1-N}, \ldots, y_0\}$ onto the initial states $x_0$, which is necessary for handling partially observable systems. We now compactly represent the $N$-step ahead rollout of the model (9) as $\{Y, X\} = f_\theta^N(Y_p, U)$ with lumped parameters $\theta$.
Fig. 3: System identification with the block-structured neural state-space model (BN-SSM). Here $y$ in red and blue represent observed and predicted system outputs, respectively, $x$ are hidden states, and $u$ are observed control action trajectories.
Remark. The proposed BN-SSM architecture (9) represents a generalization of a family of neural state-space models [58]–[65]. Depending on the choice of the neural blocks $f_x$, $f_u$, $f_y$, and $f_o$, one can represent fully nonlinear, Hammerstein-Wiener, Wiener, Hammerstein, or simply linear dynamics models with or without internal feedback. Additionally, the block architecture allows us to impose local regularizations on the block structure or constraints on internal block-generated variables.
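A minimal PyTorch sketch of the BN-SSM rollout (9) is given below, assuming simple MLP blocks; the paper's implementation uses residual networks for $f_x$ and $f_u$ and a linear decoder $f_y$ (see Section IV-D), so this is an illustrative simplification rather than the exact model.

```python
import torch
import torch.nn as nn

class BlockNSSM(nn.Module):
    """Simplified block neural state-space model (9) with MLP blocks."""
    def __init__(self, nx=30, nu=1, ny=1, npast=1):
        super().__init__()
        self.fx = nn.Sequential(nn.Linear(nx, nx), nn.GELU(), nn.Linear(nx, nx))
        self.fu = nn.Sequential(nn.Linear(nu, nx), nn.GELU(), nn.Linear(nx, nx))
        self.fy = nn.Linear(nx, ny)            # linear output decoder
        self.fo = nn.Linear(npast * ny, nx)    # observer: past outputs -> x0

    def forward(self, Yp, U):
        # Yp: (batch, npast*ny) past outputs; U: (N, batch, nu) control inputs
        x = self.fo(Yp)                        # (9c) initial state estimate
        Y, X = [], []
        for u in U:                            # N-step recurrent rollout
            x = self.fx(x) + self.fu(u)        # (9a) state update
            Y.append(self.fy(x))               # (9b) output map
            X.append(x)
        return torch.stack(Y), torch.stack(X)
```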
b) System identification loss: We train the neural state-space dynamics (9) on sampled input-output trajectories (2) of the observed system dynamics. The multi-term system identification loss is given as follows:

$$\begin{aligned} \mathcal{L}_{\mathrm{MSE}}(Y^{\mathrm{true}}, Y, X, \underline{Y}, \overline{Y}, U \mid \theta) = \frac{1}{nN} \sum_{i=1}^{n} \sum_{k=1}^{N} \Big( & \|y_k^{\mathrm{true},i} - y_k^i\|_2^2 + Q_{dx} \|x_k^i - x_{k-1}^i\|_2^2 \\ & + Q_y \|p(y_k^i, \underline{y}_k^i)\|_2^2 + Q_y \|p(y_k^i, \overline{y}_k^i)\|_2^2 \\ & + Q_u \|p(f_u(u_k^i), \underline{f_u})\|_2^2 + Q_u \|p(f_u(u_k^i), \overline{f_u})\|_2^2 \Big) \end{aligned} \tag{10}$$
Here $k$ represents the time step of the prediction horizon $N$, and $i$ is the batch index of $n$ sampled trajectories. The first term represents the trajectory tracking loss, defined as the two-norm over a vector of residuals between the true $Y^{\mathrm{true}} = \{y_1^{\mathrm{true},i}, \ldots, y_N^{\mathrm{true},i}\}$ and predicted $Y = \{y_1^i, \ldots, y_N^i\}$ output trajectories over $N$ steps. The second term is a regularization for smoothing the trajectories by penalizing the one-time-step difference between successive hidden states $x$. The third and fourth terms impose box constraints on the output phase space. Output constraints in the system identification loss can help learn models with physically meaningful trajectories outside the training set's distribution. Furthermore, we can leverage the structure of the proposed block neural state-space model (9) and impose similar constraints on the influence of the control input dynamics component $f_u(u_k)$.
Remark. Input dynamics constraints can be leveraged during the system identification phase in case of prior knowledge or assumptions about the maximal $\overline{f_u}$ and minimal $\underline{f_u}$ temporal contribution of the control actions to the one-time-step state differences $\Delta x_k = x_k - x_{k-1}$. In practice, these bounds can be estimated from data by means of residuals between perturbed and non-perturbed system dynamic responses. In our experimental case study presented in Section IV, we used $\overline{f_u} = 0.5$, $\underline{f_u} = -0.5$.
C. Constrained Differentiable Predictive Control
The objective is to learn a constrained differentiable predictive control (DPC) policy to govern the unknown dynamical system (1), given the learned neural state-space model (9).

a) Neural control policy: The input to the neural control policy is a vector of selected control parameters $\xi = [Y_p^\top \; R^\top \; \underline{Y}^\top \; \overline{Y}^\top]^\top$. The neural control policy map is given as follows:

$$U = \pi_\Theta(\xi) \tag{11}$$

where $U = \{u_1, \ldots, u_N\}$ is an optimal control trajectory, $Y_p = \{y_{1-N}, \ldots, y_0\}$ represents observed output trajectories $N$ steps into the past, $R = \{r_1, \ldots, r_N\}$ is a tensor of reference trajectories, while $\underline{Y} = \{\underline{y}_1, \ldots, \underline{y}_N\}$ and $\overline{Y} = \{\overline{y}_1, \ldots, \overline{y}_N\}$ are tensors of imposed lower and upper bounds for future output trajectories, respectively.
In this paper, we assume $\pi_\Theta(\xi): \mathbb{R}^m \to \mathbb{R}^n$ to be a fully connected neural network architecture with $l \in \mathbb{N}_1^L$ layers, given as:

$$\pi_\Theta(\xi) = W_L h_L + b_L \tag{12a}$$
$$h_l = \sigma(W_{l-1} h_{l-1} + b_{l-1}) \tag{12b}$$
$$h_0 = \xi \tag{12c}$$

parametrized by $\Theta = \{W_l, b_l \mid \forall l \in \mathbb{N}_1^L\}$ with weights $W_l$ and biases $b_l$, and a nonlinear activation function $\sigma: \mathbb{R}^{n_h} \to \mathbb{R}^{n_h}$. In DPC, the neural control policy (11) replaces the PWA control law (6) of explicit MPC described in Section II-C; the feature vector $\xi$ of the neural policy thus represents an expanded parametric space compared to the lower-dimensional parametric space of explicit MPC given by (4).
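As an illustration, a sketch of the policy (12) with the dimensions reported in Section IV-D ($\xi \in \mathbb{R}^{128}$, $N = 32$ actions, three hidden layers of 20 neurons, GELU activations) could be written in PyTorch as follows; the exact layer bookkeeping of the paper's implementation may differ.

```python
import torch.nn as nn

# Hypothetical instantiation of the fully connected policy (12).
policy = nn.Sequential(
    nn.Linear(128, 20), nn.GELU(),   # h1
    nn.Linear(20, 20), nn.GELU(),    # h2
    nn.Linear(20, 20), nn.GELU(),    # h3
    nn.Linear(20, 32),               # N = 32 control actions over the horizon
)
```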
b) Differentiable closed-loop system architecture: To train the constrained control policy (11), we design a neural representation of the closed-loop dynamics using the learned neural state-space model $f_\theta^N$ (9):

$$U = \pi_\Theta(\xi) \tag{13a}$$
$$Y = f_\theta^N(Y_p, U) \tag{13b}$$

The system dynamics model $f_\theta^N$ (9) is used to predict the $N$-step ahead future output trajectories $Y$, given the control action trajectories $U$ generated by the neural control policy (11). The corresponding network architecture is shown in Fig. 4. Here, the closed-loop model is constructed by connecting the learned system dynamics model (9) with the control policy (11) through the control action trajectories $U$. Hence, the proposed control policy represents a predictive control strategy with a preview of future constraints and reference signals. The policy is optimized by differentiating the closed-loop neural dynamics model on sampled past output trajectories $Y_p$, given the forecast of lower and upper constraint trajectories $\underline{Y}$ and $\overline{Y}$. The parameters of the pre-trained neural state-space model (9) representing the open-loop system dynamics are fixed during the policy optimization. The distribution of past output trajectories $Y_p$ represents a sampling of initial conditions, while the distribution of time-varying constraints and reference signals represents a sampling of different operational scenarios and tasks.

Fig. 4: Differentiable predictive control (DPC) architecture. Here $y$ represents controlled outputs of the system, $\underline{y}$ and $\overline{y}$ represent lower and upper output constraints, $r$ are sampled reference trajectories, and $u$ are control actions generated by the neural policy optimized with the MPC-inspired loss function.

Remark. The proposed control policy optimization procedure does not require interaction with the real system or its emulator model. Instead, the policy is trained by sampling the closed-loop system using the trained dynamics model. Moreover, the training is extremely data-efficient, as all sampled trajectories can be generated synthetically.
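The following hedged sketch illustrates this optimization scheme: the pre-trained dynamics model is frozen and only the policy parameters receive gradients through the closed-loop rollout (13). Here `model` and `policy` follow the earlier sketches, while `loader` (yielding synthetically sampled features) and `dpc_loss` (implementing the loss (14), sketched in the next subsection) are assumed placeholders, not the paper's code.

```python
import torch

def train_policy(model, policy, loader, dpc_loss, epochs=1, lr=1e-3):
    """Sketch of DPC policy optimization over the frozen closed-loop model (13)."""
    for p in model.parameters():
        p.requires_grad_(False)                    # fix dynamics model parameters
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for Yp, xi, R, Ymin, Ymax, Umin, Umax in loader:
            U = policy(xi).transpose(0, 1).unsqueeze(-1)  # (N, batch, nu=1), (13a)
            Y, _ = model(Yp, U)                    # differentiable rollout (13b)
            loss = dpc_loss(R, Y, U, Ymin, Ymax, Umin, Umax)
            opt.zero_grad()
            loss.backward()                        # gradients flow through (13)
            opt.step()
```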
c) MPC-inspired loss function: The closed-loop system parametrization (13) now allows us to simulate the effect of varying control features $\xi$ on the system's output dynamics $Y$. This simulation capability, together with the differentiability of the closed-loop model (13), is a key feature of the proposed control method, which allows us to perform data-driven optimization of the neural control policy (11). We train the policy parameters by sampling the distribution of the control features $\xi$ and backpropagating the gradients of the loss function through the closed-loop system model. In particular, we leverage the following MPC-inspired multi-term loss function, in which the primary reference tracking term is augmented with control smoothing and penalty terms (8) imposed on the control actions and output trajectories:

$$\begin{aligned} \mathcal{L}_{\mathrm{MSE}}(R, Y, \underline{Y}, \overline{Y}, U, \underline{U}, \overline{U} \mid \Theta) = \frac{1}{nN} \sum_{i=1}^{n} \sum_{k=1}^{N} \Big( & Q_r \|r_k^i - y_k^i\|_2^2 + Q_{du} \|u_k^i - u_{k-1}^i\|_2^2 \\ & + Q_y \|p(y_k^i, \underline{y}_k^i)\|_2^2 + Q_y \|p(y_k^i, \overline{y}_k^i)\|_2^2 \\ & + Q_u \|p(u_k^i, \underline{u}_k^i)\|_2^2 + Q_u \|p(u_k^i, \overline{u}_k^i)\|_2^2 \Big) \end{aligned} \tag{14}$$
where $k$ represents the time index, $N$ is the prediction horizon, $i$ is the batch index, and $n$ is the number of batches of sampled trajectories. $R$ represents the sampled reference trajectories to be tracked by the output trajectories $Y$ of the closed-loop system, where $Q_r$ is the reference tracking weight. The second term, weighted by $Q_{du}$, represents control action smoothing. Similar to $\underline{Y}$ and $\overline{Y}$, $\underline{U}$ and $\overline{U}$ are tensors of lower and upper bounds for the $N$-step ahead control action trajectories. We optimize the policy parameters $\Theta$ while keeping the parameters of the dynamics model (9) fixed. The penalty terms are weighted by $Q_y$ and $Q_u$ for output and input constraints, respectively.
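A minimal PyTorch sketch of the loss (14) follows, with the ReLU penalties (8) inlined and default weights taken from the values reported in Section IV-D; for brevity it penalizes only the interior control differences, whereas (14) also includes the difference from the previously applied action at the first step.

```python
import torch
import torch.nn.functional as F

def dpc_loss(R, Y, U, Ymin, Ymax, Umin, Umax,
             Qr=1.0, Qdu=0.1, Qy=2.0, Qu=10.0):
    """Sketch of the MPC-inspired loss (14); default weights follow Section IV-D."""
    du = U[1:] - U[:-1]                             # control rate u_k - u_{k-1}
    return (Qr * (R - Y).pow(2).mean()              # reference tracking
            + Qdu * du.pow(2).mean()                # control smoothing
            + Qy * F.relu(Ymin - Y).pow(2).mean()   # lower output bound (8a)
            + Qy * F.relu(Y - Ymax).pow(2).mean()   # upper output bound (8b)
            + Qu * F.relu(Umin - U).pow(2).mean()   # lower input bound
            + Qu * F.relu(U - Umax).pow(2).mean())  # upper input bound
```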
IV. EXPERIMENTAL CASE STUDY
A. System Description
The presented control approaches are implemented on a laboratory device called FlexyAir (www.ocl.sk/flexyair). The device is a single-input single-output system, where the actuator is a fan that drives air into a vertical tube with a floater inside. An infrared proximity sensor placed on top of the tube measures the floater's level. The manipulated variable in this laboratory process is the fan speed command, given to an internal fan speed controller, which sets the corresponding current for the fan itself. The process variable is the position of the floater in the vertical tube. The control objective is to stabilize the floater's vertical position at the desired reference level while satisfying the given constraints. A sketch of the laboratory device is shown in Fig. 5.
Fig. 5: Sketch of the laboratory device with the Raspberry-Pi
platform.
B. Dataset

a) System identification: Our experimental dataset is obtained by observing the real system dynamics with sampling time $T_s = 0.25$ seconds. The measured input-output time series in the form (2) has $m = 9 \cdot 10^3$ datapoints, which are used to create training, validation, and test sets with equal lengths of $1000$ samples. To take time horizons into account during training, we apply an $N$-step time shift to generate the past $Y_p^{\mathrm{true}}$ and future $Y^{\mathrm{true}}$ tensors for the output variables, respectively. The time series in each set are subsequently separated into $N$-step batches, generating tensors with dimensions $(N, n, n_y)$, where $n$ represents the number of batches and $n_y$ is the dimension of the variable $y$ (the same applies to $u$). The number of batches $n = \frac{m}{N}$ depends on the total number of datapoints $m$ and the length of the prediction horizon $N$.
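The described batching can be illustrated with the following NumPy sketch (the function name is ours): a time series of $m$ samples is split into $n = m/N$ consecutive $N$-step windows and stacked into an $(N, n, n_y)$ tensor.

```python
import numpy as np

def batch_trajectories(y, N):
    """Reshape an (m, ny) time series into N-step batches of shape (N, n, ny)."""
    m, ny = y.shape
    n = m // N                       # number of batches n = m / N
    return y[: n * N].reshape(n, N, ny).transpose(1, 0, 2)
```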
b) Closed-loop control: As mentioned in Section III-C, the control policy training is based on sampling the input sequences of the closed-loop dynamics model and does not require extra measurements of the real system. To demonstrate the data-efficiency, we generate each continuous time series with only $3 \cdot 10^3$ samples for the training, validation, and test sets, respectively. We apply the same $N$-step horizon batching as in the system identification task. The dataset is also normalized using min-max normalization.

The past observations of the output trajectories $Y_p$ are randomly sampled continuous trajectories, while the predicted future trajectories $Y$ are internally generated by the trained system dynamics model. To improve generalization across dynamic modes, we assume that the sampled trajectories $Y_p$ are dynamically generated sine waves with varying frequency, amplitude, and noise at each optimization epoch. The time-varying references and constraint bounds can be arbitrarily sampled from a user-defined distribution to generalize the control across a set of tasks.

In our case, we sample sine waves for the output reference $R$ with amplitude in the range $[0.2, 0.8]$, the lower bound $\underline{Y}$ in the range $[0.1, 0.4]$, and the upper bound $\overline{Y}$ in the range $[0.6, 0.9]$. The control action bounds can in principle be time-varying as well; however, in our case, due to the nature of the experimental setup, we assume static constraints $\underline{U} = 0.0$ and $\overline{U} = 1.0$.
Remark. Training of the control policy on dynamically sampled system output trajectories $Y_p$ with varying frequencies and amplitudes is inspired by the fact that a system response can be decomposed into a set of dynamic modes with fixed frequencies [66]. Alternatively, the trajectories $Y_p$ could be generated by perturbing the learned system dynamics or simply represented by observations of the real system.
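For illustration, the sampling described above might look as follows in NumPy; the frequency range and the interpretation of the amplitude ranges as the span of the generated waves are our assumptions, not specifications from the paper.

```python
import numpy as np

rng = np.random.default_rng()

def sample_wave(N, lo, hi):
    """Sample one N-step sine wave whose values span roughly [lo, hi]."""
    freq = rng.uniform(0.05, 0.5)              # assumed frequency range
    phase = rng.uniform(0.0, 2.0 * np.pi)
    mid, amp = (lo + hi) / 2.0, (hi - lo) / 2.0
    return mid + amp * np.sin(freq * np.arange(N) + phase)

R = sample_wave(32, 0.2, 0.8)                  # reference trajectory sample
Ymin = sample_wave(32, 0.1, 0.4)               # lower bound sample
Ymax = sample_wave(32, 0.6, 0.9)               # upper bound sample
```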
C. Metrics

We assess the trained DPC performance on two sets of metrics: the first for training and hyperparameter selection, the second for task-specific performance evaluation. For training, we evaluate the mean squared error (MSE) of the system identification loss (10) and the policy learning loss (14), respectively. The loss function MSE evaluated on the development sets is used for hyperparameter selection, while the MSE on the test sets is used for performance assessment of the training process. Second, instead of the training-oriented processed datasets, we define the task-specific metrics using the real system data with $T$ time steps. For the system identification, we evaluate the MSE of the open-loop response of the trained model compared to the response of the real system, given as $\frac{1}{T} \sum_{k=1}^{T} \|y_k - y_k^{\mathrm{true}}\|_2^2$. For the evaluation of the closed-loop control performance, we compute the reference tracking MSE as $\frac{1}{T} \sum_{k=1}^{T} \|y_k - r_k\|_2^2$, and the integral of the absolute error (IAE) as $\sum_{k=1}^{T} |y_k - r_k|$. For constraint satisfaction, we evaluate the mean absolute (MA) value of the output constraint violations: $\frac{1}{T} \sum_{k=1}^{T} \left( |p(y_k, \underline{y}_k)| + |p(y_k, \overline{y}_k)| \right)$.
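The task-specific metrics can be computed with a few lines of NumPy, as sketched below for arrays of length $T$; function names are ours.

```python
import numpy as np

def tracking_mse(y, r):
    """Reference tracking MSE over T steps."""
    return np.mean((y - r) ** 2)

def iae(y, r):
    """Integral of the absolute error."""
    return np.sum(np.abs(y - r))

def mean_constraint_violation(y, y_min, y_max):
    """Mean absolute output constraint violation using the penalties (8)."""
    return np.mean(np.maximum(0.0, y_min - y) + np.maximum(0.0, y - y_max))
```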
D. Optimization and Hyperparameter Selection

The presented method with structured neural network models was implemented using PyTorch [67]. We train our models with randomly initialized weights using the Adam optimizer [68] with a learning rate of $0.001$. All neural network blocks in our models are designed with GELU activation functions [69]. We use a grid search for finding the best performing hyperparameters, assessing the performance of the trained models on the development set and on the task performance metrics specified in Section IV-C. For both the dynamics model and the control policy, we select the prediction horizon $N = 32$ steps, which, with sampling time $T_s = 0.25$ seconds, corresponds to an $8$-second time window.
a) System identification: The system dynamics model (9) is trained on the system identification dataset for $1000$ epochs. The state transition block $f_x: \mathbb{R}^{30} \to \mathbb{R}^{30}$ and the input dynamics block $f_u: \mathbb{R}^{1} \to \mathbb{R}^{30}$ are represented by residual neural networks, while the output decoder $f_y: \mathbb{R}^{30} \to \mathbb{R}^{1}$ is a simple linear map. The state encoder map $f_o: \mathbb{R}^{1} \to \mathbb{R}^{30}$ is represented by a standard fully connected neural network. For simplicity, we assume only a one-step time lag for the state encoder. All individual neural block components are designed with $4$ hidden layers and $30$ hidden neurons. The resulting neural state-space model has $n_\theta = 24661$ trainable parameters. The weight factors of the system identification loss function (10) are given as follows: $Q_{dx} = 0.2$, $Q_y = 1.0$, $Q_u = 1.0$. The trained block neural state-space model (9) is subsequently used for the design of the constrained differentiable control policy, as described in Section III-C.
b) Closed-loop control: The constrained differentiable control policy (11) is trained on the synthetically sampled closed-loop system dataset for $5 \cdot 10^3$ epochs with early stopping based on the development set MSE². The policy map $\pi_\Theta: \mathbb{R}^{128} \to \mathbb{R}^{32}$ is represented by a fully connected neural network with $3$ layers, each with $20$ hidden neurons, resulting in $n_\Theta = 7272$ trainable parameters. The weight factors of the constrained control loss function (14) are $Q_r = 1.0$, $Q_{du} = 0.1$, $Q_y = 2.0$, $Q_u = 10.0$.

In the case of the explicit model predictive control (eMPC) strategy (3a), we chose to set the length of the prediction horizon to $N = 5$, while the tuning factors were set to $Q_r = 3$ and $Q_{du} = 4$. Because classical MPC cannot handle neural state-space models, the corresponding quadratic optimization problem (3) is constructed using a simplified linear model obtained from the System Identification Toolbox in MATLAB [16].
E. Real-time Closed-loop Control Performance
This section shows the results of the real-time implementation of the two control strategies. First, we present a step-change scenario to show the performance with respect to transient behavior and reference tracking. Second, we introduce an experimental case study where a harmonic reference alongside a set of harmonic constraints is considered. Here, we show how the proposed differentiable predictive control (DPC) handles constraint satisfaction. All control strategies for these real-time experiments are implemented on an embedded Raspberry-Pi 3 platform using Python 3.7.
²Using early stopping, the DPC policy training typically converged in fewer than 1000 epochs.
a) Step-change scenario: This scenario presents the tracking performance when a step change occurs in the process variable. Specifically, we consider step changes in the reference from $20 \to 15 \to 28$ cm for the floater position. Moreover, we set the top constraint to $\overline{y} = 30$ cm and the bottom constraint to $\underline{y} = 13$ cm. Here, we compare the performance of two controllers. First, we set the baseline with the explicit MPC strategy. Then we include the proposed DPC policy. The tuning factors of the individual controllers are given in Section IV-D.
The control performance of the respective policies is visualized in Fig. 6. Furthermore, a rigorous validation of the measured performance is reported in Table I. We can see from the reported results that both eMPC and DPC follow the trajectory and respect the bottom and top constraints. We can observe that eMPC and DPC react to the reference change in the same fashion (slope of the transient behavior). The difference is just the time instant when the change occurs, which is determined by the length of the prediction horizon of the individual policies. While for DPC the length of the horizon is not a bottleneck ($32$ samples in this case), the eMPC could not be constructed for horizons longer than $10$ due to the memory footprint; in this case, we use $N = 5$.
TABLE I: Quantitative evaluation of control performance, based on the metrics given in Section IV-C.

          reference tracking     constraints violation
          MSE       IAE          MSE       MAE
DPC       1.76      265          0         0
eMPC      8.97      526          0         0
Finally, we direct the reader to Table I, where a quantitative numerical evaluation is reported, following the metrics established in Section IV-C. In terms of the MSE criterion, DPC performs roughly five times better than the explicit MPC (1.76 vs. 8.97). We also include the IAE criterion, which is more common in the control community; in this case, DPC outperforms the eMPC by more than $50\%$. In terms of constraint violations, these criteria evaluate to $0$, since neither DPC nor eMPC crosses the limits.
b) Constraints Satisfaction: The second scenario demonstrates the systematic constraint handling of the DPC strategy. Here we utilize a harmonic reference and harmonic constraints. Concrete results are presented in Fig. 7. Note that the constraints are not violated even when the reference crosses out of the allowed space, empirically demonstrating remarkable robustness even in this challenging scenario with high measurement noise.
F. Idealized Simulation Case Studies

The purpose of this section is to demonstrate the control capabilities of the proposed constrained differentiable predictive control (DPC) policies in idealized simulations. We use the trained system dynamics model to represent the controlled system, omitting the influence of plant-model mismatch and additive real-time disturbances affecting the dynamics of the real system. We show that, in the case of a perfect system dynamics model, DPC can achieve offset-free reference tracking and robust constraint satisfaction, empirically demonstrating convergence to near-optimal control performance.

Fig. 6: Real-time measurement profiles of the DPC and eMPC control strategies with static constraints. (a) Position measurements: reference (red-dashed), explicit model predictive control (purple), constrained differentiable predictive control (blue), and constraints (black-dashed). (b) Profile of the manipulated variable: fan speeds of the respective controllers.
a) Reference Tracking: Fig. 8 shows the offset-free reference tracking capabilities of the trained DPC method assessed using four different dynamic signals. Besides tracking an arbitrary dynamic reference, the DPC policy demonstrates predictive control capabilities using a reference preview. In particular, notice the change in the trajectories several time steps before the previewed step changes in the reference signals. The ability to react safely and in advance to forecasted parameters such as references and constraints is a desired feature for many industrial control applications. The preview capability also represents additional value compared to explicit MPC, which cannot handle a preview of its parameters.

b) Constraints Satisfaction: We demonstrate the capability of DPC to balance conflicting objectives in terms of reference tracking and constraint handling. Fig. 9 plots the DPC performance with a dynamic reference crossing dynamic constraints. Even in this challenging scenario, the trained DPC policy satisfies the constraints while compromising on the tracking performance of the unattainable reference.
Fig. 7: Real-time measurement profile of the proposed DPC strategy under the influence of harmonic reference and constraints. (a) Position measurements: reference signal (red-dashed) and measured floater position (blue). (b) Profile of the manipulated variable: fan speed profile.
G. Scalability Analysis
In Table II we compare the scalability of the proposed
differentiable predictive control (DPC) against explicit model
predictive control (eMPC) in terms of on-line computational
requirements, memory footprint, policy complexity, and off-line
construction time with the increasing length of the prediction
horizon N.
We compare the mean and maximum online computational (CPU) time and the memory footprint required to evaluate and store the DPC and eMPC control policies. As shown in Table II, the online evaluation of both control policies is extremely fast in terms of CPU time. This property is crucial for controlling fast and highly dynamical systems with high-frequency sampling rates, such as UAVs or agile robotic systems. However, the memory requirements differ significantly for the linearly scalable DPC compared to the exponentially growing requirements of eMPC. In the case of eMPC, its enormous memory demands limit the applicability of this control strategy to small-scale systems with very short prediction horizons. Contrary to eMPC, DPC policies have an extremely low memory footprint regardless of the prediction horizon's length, opening the door to large-scale practical control applications.
Fig. 8: Simulated closed-loop control trajectories demonstrating reference tracking capabilities of DPC.

Fig. 9: Simulated closed-loop control trajectories demonstrating balancing of conflicting objectives with dynamic reference and dynamic constraints.
To better understand the reported memory requirements, we evaluate the complexity of the control policies in terms of the number of parameters for DPC and the number of critical regions for eMPC. Table II shows that the number of parameters of DPC scales linearly with the increasing prediction horizon $N$, while the number of critical regions of eMPC scales exponentially. The reason is that the complexity of the eMPC policy (6) is primarily given by the number of constraints, which scales exponentially with the length of the prediction horizon and the number of optimized variables. On the other hand, the complexity of the DPC policy depends mainly on the number of hidden nodes and the number of layers, which allows it to scale to large state-action spaces with long prediction horizons.

Table II also reports the construction time of the eMPC policy using multiparametric programming [55], [70], to give the reader a notion of the limitations of this solution method. Specifically, we direct the attention to the construction time
TABLE II: Scalability analysis of the proposed differentiable predictive control (DPC) policy against explicit MPC (eMPC).

N                               5        7        10       12       15
mean CPU time [ms]   DPC        0.369    0.355    0.371    0.380    0.502
                     eMPC       0.455    0.472    0.429    -        -
max CPU time [ms]    DPC        6.978    7.978    7.945    8.066    5.026
                     eMPC       1.325    1.927    4.684    -        -
memory footprint [kB] DPC       13       15       17       19       21
                     eMPC       611      9300     65200    -        -
policy parameters    DPC        1845     2247     2850     3252     3855
policy regions       eMPC       108      347      1420     2631     5333
construction time [h] DPC^a,b   0.1      0.1      0.1      0.2      0.2
                     eMPC^c     0.1      4.0      66.5     -        -

^a Trained for 1000 epochs without early stopping.
^b Computed on a Core i7 2.6 GHz CPU with 16 GB RAM.
^c Computed on a Core i7 4.0 GHz CPU with 32 GB RAM.
of the eMPC with $N = 10$, for which the generation of the associated controller took almost $3$ days. In contrast, the construction time of DPC scales linearly up to large prediction horizons, as reported in Table II. This demonstrates the tremendous potential of the proposed DPC policies to scale to large-scale control problems well beyond the reach of classical eMPC.
V. CONCLUSIONS
We have experimentally demonstrated that it is possible
to train constrained optimal control policies purely based on
the observations of the dynamics of the unknown nonlinear
system. The principle is based on optimizing control policies
with constraint penalty functions by differentiating trained neural state-space models representing an internal model of the observed nonlinear system. We denote this control approach constrained differentiable predictive control (DPC).
We compare the performance of trained DPC policy against
classical explicit model predictive control (eMPC). The control
algorithms are implemented on a laboratory device with the
Raspberry-Pi platform. In comparison with eMPC using a
linear model, DPC achieves better control performance due
to the nonlinear system dynamics model and the reference
and constraints preview capability. However, most importantly,
DPC has unprecedented scalability beyond the limitations of
eMPC. DPC scales well with increased problem complexity
defined by the length of the prediction horizon resulting in
an increased number of decision variables and constraints
of the underlying optimization problem. DPC demonstrates
linear scalability in terms of memory footprint, number of
policy parameters, and required construction time. Therefore,
we believe that the proposed DPC method has the potential
for wide adoption in large-scale control systems with limited
computational resources and fast sampling rates.
It is essential to mention that the proposed differentiable
control methodology is generic and not limited to a partic-
ular structure of the underlying system dynamics. Various
architectures might be more or less suitable depending on the
modeled dynamical system. Straightforward model architecture
extensions of the proposed methodology may include a whole
range of differentiable models, for instance sparse identifica-
tion of nonlinear dynamics (SINDy) [71], multistep neural
networks [72], or graph neural networks [73].
From a theoretical perspective, structural similarity of the
underlying optimization problem to well studied MPC problem
presents opportunities for further extensions of the proposed
DPC methodology. Some examples include adaptive control
via online updates of the nonlinear system model, robust and
stochastic control via constraints tightening, or closed-loop
stability guarantees based on stability of MPC.
VI. ACKNOWLEDGEMENTS
This work was funded by the Mathematics for Artificial Reasoning in Science (MARS) investment at the Pacific Northwest National Laboratory (PNNL). K. Kiš and M. Klaučo gratefully acknowledge the contribution of the Scientific Grant Agency of the Slovak Republic under the grants 1/0585/19 and 1/0545/20.
REFERENCES
[1]
X. Zhang, M. Bujarbaruah, and F. Borrelli, “Near-Optimal Rapid MPC
Using Neural Networks: A Primal-Dual Policy Learning Framework,
IEEE Transactions on Control Systems Technology, pp. 1–13, 2020.
[2]
S. Lucia and B. Karg, “A deep learning-based approach to
robust nonlinear model predictive control,IFAC-PapersOnLine,
vol. 51, no. 20, pp. 511 – 516, 2018, 6th IFAC Conference on
Nonlinear Model Predictive Control NMPC 2018. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S2405896318326958
[3]
S. Chen, K. Saulnier, N. Atanasov, D. D. Lee, V. Kumar, G. J.
Pappas, and M. Morari, “Approximating explicit model predictive control
using constrained neural networks,” in 2018 Annual American Control
Conference (ACC), 2018, pp. 1520–1527.
[4]
E. T. Maddalena, C. G. da S. Moraes, G. Waltrich, and C. N. Jones, “A
neural network architecture to learn explicit mpc controllers from data,
2019.
[5]
M. Hertneck, J. K
¨
ohler, S. Trimpe, and F. Allg
¨
ower, “Learning an
approximate model predictive controller with guarantees,IEEE Control
Systems Letters, vol. 2, no. 3, pp. 543–548, 2018.
[6]
S. Lucia, D. Navarro, B. Karg, H. Sarnago, and Luc
´
ıa, “Deep learning-
based model predictive control for resonant power converters,” IEEE
Transactions on Industrial Informatics, vol. 17, no. 1, pp. 409–420, 2021.
[7]
Y. Lohr, M. Klau
ˇ
co, M. Fikar, and M. M
¨
onnigmann, “Machine learning
assisted solutions of mixed integer mpc on embedded platforms,” in
Preprints of the 21st IFAC World Congress (Virtual), Berlin, Germany,
July 12-17, 2020, vol. 21, July 12-17, 2020 2020. [Online]. Available:
https://www.uiam.sk/assets/publication info.php?id pub=2192
[8]
J. Drgo
ˇ
na, D. Picard, M. Kvasnica, and L. Helsen, “Approximate
model predictive building control via machine learning,Applied
Energy, vol. 218, pp. 199 – 216, 2018. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0306261918302903
[9]
U. Rosolia, X. Zhang, and F. Borrelli, “Data-driven predictive control
for autonomous systems,” Annual Review of Control, Robotics, and
Autonomous Systems, vol. 1, no. 1, pp. 259–286, 2018. [Online].
Available: https://doi.org/10.1146/annurev-control- 060117-105215
[10]
Y. Lohr, M. Klau
ˇ
co, M. Kal
´
uz, and M. M
¨
onnigmann, “Mimicking
predictive control with neural networks in domestic heating systems,
in Proceedings of the 22nd International Conference on Process
Control, M. Fikar and M. Kvasnica, Eds., Slovak University of
Technology in Bratislava.
ˇ
Strbsk
´
e Pleso, Slovakia: Slovak Chemical
Library, June 11-14, 2019 2019, pp. 19–24. [Online]. Available:
https://www.uiam.sk/assets/publication info.php?id pub=2035
[11]
A. Bemporad, M. Morari, V. Dua, and E. N. Pistikopoulos,
“The explicit linear quadratic regulator for constrained systems,
Automatica, vol. 38, no. 1, pp. 3–20, 2002. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0005109801001741
[12]
D. Tavernini, M. Metzler, P. Gruber, and A. Sorniotti, “Explicit nonlinear
model predictive control for electric vehicle traction control,IEEE
Transactions on Control Systems Technology, vol. 27, no. 4, pp. 1438–
1451, 2019.
[13]
M. Kvasnica, B. Tak
´
acs, J. Holaza, and S. Di Cairano, “On region-free
explicit model predictive control,” in 54rd IEEE Conference on Decision
and Control, vol. 54, Osaka, Japan, December 15-18, 2015 2015, pp.
3669–3674.
[14]
M. Kvasnica and M. Fikar, “Clipping-based complexity reduction in
explicit mpc,” IEEE Transactions On Automatic Control, vol. 57, no. 7,
pp. 1878–1883, July 2012.
[15]
M. Kvasnica, P. Bakar
´
a
ˇ
c, and M. Klau
ˇ
co, “Complexity reduction
in explicit mpc: A reachability approach,” Systems & Control
Letters, vol. 124, pp. 19–26, 2019. [Online]. Available: https:
//www.uiam.sk/assets/publication info.php?id pub=1980
[16]
L. Ljung, System Identification: Theory for the User, 2nd edition.
Prentice-Hall, Upper Saddle River, NJ, 1999, p. 607.
[17]
K. Tohru, Subspace Methods for System Identification. Springer, 2005.
[18]
M. Kvasnica, “Implicit vs explicit mpc — similarities, differences, and
a path owards a unified method,” in 2016 European Control Conference
(ECC), 2016, pp. 603–603.
[19]
J. Drgona, A. Tuor, and D. Vrabie, “Constrained physics-informed deep
learning for stable system identification and control of unknown linear
systems,” vol. abs/2004.11184, 2020.
[20]
S. Br
¨
uggemann and C. Possieri, “On the use of difference of log-sum-exp
neural networks to solve data-driven model predictive control tracking
problems,” IEEE Control Systems Letters, vol. 5, no. 4, pp. 1267–1272,
2021.
[21]
B. Karg and S. Lucia, “Approximate moving horizon estimation and
robust nonlinear model predictive control via deep learning,Computers
Chemical Engineering, vol. 148, p. 107266, 2021. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0098135421000442
[22]
K. Ki
ˇ
s and M. Klau
ˇ
co, “Neural network based explicit mpc for chemical
reactor control,” Acta Chimica Slovaca, vol. 12, no. 2, pp. 218–223,
2019. [Online]. Available: https://www.uiam.sk/assets/publication info.
php?id pub=2115
[23]
I. Mordatch and E. Todorov, “Combining the benefits of function
approximation and trajectory optimization,” in In Robotics: Science
and Systems (RSS, 2014.
12
[24] T. Zhang, G. Kahn, S. Levine, and P. Abbeel, “Learning deep
control policies for autonomous aerial vehicles with MPC-guided
policy search,” CoRR, vol. abs/1509.06791, 2015. [Online]. Available:
http://arxiv.org/abs/1509.06791
[25] S. Chen, K. Saulnier, N. Atanasov, D. D. Lee, V. Kumar, G. J.
Pappas, and M. Morari, “Approximating explicit model predictive control
using constrained neural networks,” in 2018 Annual American Control
Conference (ACC), June 2018, pp. 1520–1527.
[26] P. L. Donti, M. Roderick, M. Fazlyab, and J. Z. Kolter, “Enforcing
robust control guarantees within neural network policies,” in The Ninth
International Conference on Learning Representations (ICLR), 2021.
[27] L. Hewing, K. P. Wabersich, M. Menner, and M. N. Zeilinger,
“Learning-based model predictive control: Toward safe learning in
control,” Annual Review of Control, Robotics, and Autonomous
Systems, vol. 3, no. 1, 2020. [Online]. Available:
https://doi.org/10.1146/annurev-control-090419-075625
[28] L. Hewing, J. Kabzan, and M. N. Zeilinger, “Cautious model predictive
control using Gaussian process regression,” IEEE Transactions on Control
Systems Technology, vol. 28, no. 6, pp. 2736–2743, 2020.
[29] I. Lenz, R. A. Knepper, and A. Saxena, “DeepMPC: Learning deep latent
features for model predictive control,” in Robotics: Science and Systems, 2015.
[30] K. Bieker, S. Peitz, S. L. Brunton, J. N. Kutz, and M. Dellnitz,
“Deep model predictive control with online learning for complex
physical systems,” CoRR, vol. abs/1905.10094, 2019. [Online]. Available:
http://arxiv.org/abs/1905.10094
[31] A. Broad, I. Abraham, T. D. Murphey, and B. D. Argall, “Structured
neural network dynamics for model-based control,” CoRR, vol.
abs/1808.01184, 2018. [Online]. Available: http://arxiv.org/abs/1808.01184
[32] Y. Chen, Y. Shi, and B. Zhang, “Optimal control via neural networks: A
convex approach,” 2018.
[33] Y. Li, J. Wu, J.-Y. Zhu, J. B. Tenenbaum, A. Torralba, and R. Tedrake,
“Propagation networks for model-based control under partial observation,”
in ICRA, 2019.
[34] Y.-C. Chang, N. Roohi, and S. Gao, “Neural Lyapunov control,” in
Advances in Neural Information Processing Systems 32, H. Wallach,
H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett,
Eds. Curran Associates, Inc., 2019, pp. 3245–3254. [Online]. Available:
http://papers.nips.cc/paper/8587-neural-lyapunov-control.pdf
[35] J. A. E. Andersson, J. Gillis, G. Horn, J. B. Rawlings, and
M. Diehl, “CasADi: a software framework for nonlinear optimization
and optimal control,” Mathematical Programming Computation,
vol. 11, no. 1, pp. 1–36, Mar 2019. [Online]. Available:
https://doi.org/10.1007/s12532-018-0139-4
[36] B. Amos, L. Xu, and J. Z. Kolter, “Input convex neural
networks,” CoRR, vol. abs/1609.07152, 2016. [Online]. Available:
http://arxiv.org/abs/1609.07152
[37] F. de Avila Belbute-Peres, K. Smith, K. Allen, J. Tenenbaum, and J. Z.
Kolter, “End-to-end differentiable physics for learning and control,” in
Advances in Neural Information Processing Systems 31, S. Bengio,
H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett,
Eds. Curran Associates, Inc., 2018, pp. 7178–7189.
[38] J. Degrave, M. Hermans, J. Dambre, and F. Wyffels, “A differentiable
physics engine for deep learning in robotics,” CoRR, vol. abs/1611.01652,
2016. [Online]. Available: http://arxiv.org/abs/1611.01652
[39] B. Amos, I. D. J. Rodriguez, J. Sacks, B. Boots, and J. Z. Kolter,
“Differentiable MPC for end-to-end planning and control,” CoRR, vol.
abs/1810.13400, 2018. [Online]. Available: http://arxiv.org/abs/1810.13400
[40] T. Yang, “Advancing non-convex and constrained learning: Challenges
and opportunities,” AI Matters, vol. 5, no. 3, pp. 29–39, Dec. 2019.
[Online]. Available: https://doi.org/10.1145/3362077.3362085
[41] D. Pathak, P. Krähenbühl, and T. Darrell, “Constrained convolutional
neural networks for weakly supervised segmentation,” CoRR, vol.
abs/1506.03648, 2015. [Online]. Available: http://arxiv.org/abs/1506.03648
[42] Z. Jia, X. Huang, E. I. Chang, and Y. Xu, “Constrained deep weak
supervision for histopathology image segmentation,” IEEE Transactions
on Medical Imaging, vol. 36, no. 11, pp. 2376–2388, 2017.
[43] C. K. Goh, Y. Liu, and A. W. K. Kong, “A constrained deep neural
network for ordinal regression,” in 2018 IEEE/CVF Conference on
Computer Vision and Pattern Recognition, 2018, pp. 831–839.
[44] P. Márquez-Neila, M. Salzmann, and P. Fua, “Imposing hard constraints
on deep networks: Promises and limitations,” CoRR, vol. abs/1706.02025,
2017. [Online]. Available: http://arxiv.org/abs/1706.02025
[45] H. Kervadec, J. Dolz, J. Yuan, C. Desrosiers, E. Granger, and I. B.
Ayed, “Log-barrier constrained CNNs,” CoRR, vol. abs/1904.04205, 2019.
[Online]. Available: http://arxiv.org/abs/1904.04205
[46] Y. Liu, C. Su, H. Li, and R. Lu, “Barrier function-based adaptive
control for uncertain strict-feedback systems within predefined neural
network approximation sets,” IEEE Transactions on Neural Networks
and Learning Systems, vol. 31, no. 8, pp. 2942–2954, 2020.
[47] K. Zhao and J. Chen, “Adaptive neural quantized control of MIMO
nonlinear systems under actuation faults and time-varying output constraints,”
IEEE Transactions on Neural Networks and Learning Systems, vol. 31,
no. 9, pp. 3471–3481, 2020.
[48] J. Hendriks, C. Jidling, A. Wills, and T. Schön, “Linearly constrained
neural networks,” Submitted to IEEE Transactions on Neural Networks
and Learning Systems, 2020.
[49] S. Greydanus, M. Dzamba, and J. Yosinski, “Hamiltonian neural
networks,” CoRR, vol. abs/1906.01563, 2019. [Online]. Available:
http://arxiv.org/abs/1906.01563
[50] M. Lutter, C. Ritter, and J. Peters, “Deep Lagrangian networks: Using
physics as model prior for deep learning,” CoRR, vol. abs/1907.04490,
2019. [Online]. Available: http://arxiv.org/abs/1907.04490
[51] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert,
“Constrained model predictive control: Stability and optimality,”
Automatica, vol. 36, no. 6, pp. 789–814, 2000. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0005109899002149
[52] G. Pannocchia, “Robust disturbance modeling for model predictive control
with application to multivariable ill-conditioned processes,” Journal of
Process Control, vol. 13, no. 8, pp. 693–701, 2003. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0959152402001348
[53] J. Löfberg, “YALMIP: A Toolbox for Modeling and Optimization in
MATLAB,” in Proc. of the CACSD Conference, Taipei, Taiwan, 2004,
available from http://users.isy.liu.se/johanl/yalmip/.
[54] F. Borrelli, A. Bemporad, and M. Morari, Predictive Control for Linear
and Hybrid Systems. Cambridge University Press, 2017. [Online].
Available: https://books.google.de/books?id=cdQoDwAAQBAJ
[55] M. Herceg, M. Kvasnica, C. Jones, and M. Morari, “Multi-parametric
toolbox 3.0,” in 2013 European Control Conference, Zurich, Switzerland,
2013, pp. 502–510.
[56] B. Takács, J. Števek, R. Valo, and M. Kvasnica, “Python code
generation for explicit MPC in MPT,” in European Control Conference
2016, Aalborg, Denmark, 2016, pp. 1328–1333. [Online]. Available:
https://www.uiam.sk/assets/publication_info.php?id_pub=1737
[57] E. Skomski, S. Vasisht, C. Wight, A. Tuor, J. Drgona, and D. Vrabie,
“Constrained block nonlinear neural dynamical models,” arXiv preprint
arXiv:2101.01864, 2021.
[58] R. G. Krishnan, U. Shalit, and D. Sontag, “Structured inference networks
for nonlinear state space models,” in AAAI’17: Proceedings of the Thirty-
First AAAI Conference on Artificial Intelligence, 2017.
[59] D. Hafner, T. P. Lillicrap, I. Fischer, R. Villegas, D. Ha, H. Lee,
and J. Davidson, “Learning latent dynamics for planning from
pixels,” CoRR, vol. abs/1811.04551, 2018. [Online]. Available:
http://arxiv.org/abs/1811.04551
[60] O. P. Ogunmolu, X. Gu, S. B. Jiang, and N. R. Gans, “Nonlinear
systems identification using deep dynamic neural networks,” CoRR,
vol. abs/1610.01439, 2016. [Online]. Available: http://arxiv.org/abs/1610.01439
[61] S. S. Rangapuram, M. W. Seeger, J. Gasthaus, L. Stella, Y. Wang, and
T. Januschowski, “Deep state space models for time series forecasting,”
in Advances in Neural Information Processing Systems 31, S. Bengio,
H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett,
Eds. Curran Associates, Inc., 2018, pp. 7785–7794.
[62] J.-S. Wang and Y.-C. Chen, “A Hammerstein–Wiener recurrent
neural network with universal approximation capability,” in 2008 IEEE
International Conference on Systems, Man and Cybernetics, Oct 2008,
pp. 1832–1837.
[63] D. Masti and A. Bemporad, “Learning nonlinear state-space models
using deep autoencoders,” in 2018 IEEE Conference on Decision and
Control (CDC), 2018, pp. 3862–3867.
[64] A. Tuor, J. Drgona, and D. Vrabie, “Constrained neural ordinary
differential equations with stability guarantees,” 2020.
[65] J. Schoukens and L. Ljung, “Nonlinear system identification: A
user-oriented roadmap,” CoRR, vol. abs/1902.00683, 2019. [Online].
Available: http://arxiv.org/abs/1902.00683
[66] P. Schmid, “Dynamic mode decomposition of numerical and experimental
data,” Journal of Fluid Mechanics, vol. 656, pp. 5–28, 2010.
[67] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan,
T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., “PyTorch: An
imperative style, high-performance deep learning library,” in Advances
in Neural Information Processing Systems, 2019, pp. 8024–8035.
[68] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,”
arXiv preprint arXiv:1412.6980, 2014.
[69] D. Hendrycks and K. Gimpel, “Bridging nonlinearities and stochastic
regularizers with Gaussian error linear units,” CoRR, vol. abs/1606.08415,
2016. [Online]. Available: http://arxiv.org/abs/1606.08415
[70] R. Oberdieck, N. Diangelakis, I. Nascu, M. Papathanasiou, M. Sun,
S. Avraamidou, and E. Pistikopoulos, “On multi-parametric programming
and its applications in process systems engineering,” Chemical
Engineering Research and Design, 2016.
[71] S. L. Brunton, J. L. Proctor, and J. N. Kutz, “Discovering
governing equations from data by sparse identification of nonlinear
dynamical systems,” Proceedings of the National Academy of
Sciences, vol. 113, no. 15, pp. 3932–3937, 2016. [Online]. Available:
https://www.pnas.org/content/113/15/3932
[72] M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Multistep neural
networks for data-driven discovery of nonlinear dynamical systems,”
arXiv preprint arXiv:1801.01236, 2018.
[73] A. Sanchez-Gonzalez, J. Godwin, T. Pfaff, R. Ying, J. Leskovec, and P. W.
Battaglia, “Learning to simulate complex physics with graph networks,”
in ICML, 2020.