Differentiable Predictive Control: Constrained Deep Learning Alternative to Explicit Model Predictive Control for Unknown Nonlinear Systems
Ján Drgoňa, Karol Kiš, Aaron Tuor, Draguna Vrabie, Martin Klaučo
Abstract—We present differentiable predictive control as a deep learning alternative to explicit model predictive control for unknown nonlinear systems. The structure of the proposed neural architecture is inspired by the structure of a model predictive control problem: i) it uses a prediction model capturing the controlled system dynamics, ii) it predicts receding horizon optimal control actions, and iii) it enforces inequality constraints via penalty methods. In the presented framework, a neural state-space model is learned from time-series measurements of the unknown system. The control policy is then optimized via gradient descent by differentiating the closed-loop system model fully parametrized by neural networks. The proposed architecture allows us to train an explicit control policy that tracks a distribution of reference signals and handles time-varying constraints imposed on states and control actions. We experimentally demonstrate that it is possible to train constrained explicit control policies purely based on observations of the dynamics of the unknown nonlinear system. The proposed method is applied to a laboratory device in an embedded implementation using a Raspberry-Pi platform. We compare reference tracking and constraint satisfaction against explicit model predictive control and report pivotal efficiency gains in online computational demands, memory requirements, policy complexity, and construction time. We show that the differentiable predictive control method scales linearly, compared to the exponential scalability of explicit predictive control solved via multiparametric programming, hence opening the door to applications in nonlinear systems with a large number of variables, longer prediction horizons, and faster sampling rates that are beyond the reach of classical explicit predictive control.
Index Terms—constrained deep learning, differentiable predictive control, explicit model predictive control, neural state space model, nonlinear system identification
This work was funded by the Mathematics for Artificial Reasoning in Science (MARS) investment at the Pacific Northwest National Laboratory (PNNL). K. Kiš and M. Klaučo gratefully acknowledge the contribution of the Scientific Grant Agency of the Slovak Republic under the grants 1/0585/19 and 1/0545/20.
J. Drgoňa is with Pacific Northwest National Laboratory, Richland, Washington, USA (e-mail: jan.drgona@pnnl.gov).
K. Kiš is with Slovak University of Technology, Bratislava, Slovakia (e-mail: karol.kis@stuba.sk).
A. Tuor is with Pacific Northwest National Laboratory, Richland, Washington, USA (e-mail: aaron.tuor@pnnl.gov).
D. Vrabie is with Pacific Northwest National Laboratory, Richland, Washington, USA (e-mail: draguna.vrabie@pnnl.gov).
M. Klaučo is with Slovak University of Technology, Bratislava, Slovakia (e-mail: martin.klauco@stuba.sk).
I. INTRODUCTION
Incorporation of machine learning methods in control ap-
plications is becoming one of the leading research avenues in
the field of control theory. The design of many novel control
methods based on machine learning (ML) approaches is heavily
inspired by the benefits of model predictive control (MPC),
such as constraints handling and robustness. The substitution
of MPC with ML-based controllers was studied by several
well-established researchers in the control domain [1]–[5].
Furthermore, the application of neural networks as substitutes of
MPC behavior has been considered in practical applications as
well [6]–[9]. All aforementioned works fall into the category of so-called approximate MPC based on imitation learning of the original MPC. As such, these works share one significant disadvantage: the reliance on data sets collected from closed-loop experiments, i.e., the ML-based controllers are trained on experiments involving a fully implemented MPC.
On the other hand, the main advantage of these machine learning controllers is the explicit form in which they are implemented. In fact, the reduction of the computational and memory burden achieved by controllers based on neural networks boils down to the evaluation of an explicit function that amounts to a couple of kilobytes of source code [10]. In the case study presented in [10], an online MPC with hybrid dynamics was replaced by an explicit ML-based controller. Naturally, assembling hybrid explicit MPC is nearly impossible even for simple systems. Compared to traditional explicit MPC [11], [12], this is a significant advantage, since even simple explicit MPC strategies can occupy several hundred megabytes [13]. Even if we manage to produce a reasonably sized explicit MPC via several complexity reduction techniques [14], [15], the need for the MPC design alongside linear system identification is unavoidable.
Linear system identification [16] combined with linear MPC is a standard practice in various industrial applications. However, despite its wide-ranging uses, its deployment in connection with MPC is not practical in many settings. In particular, low-resource embedded systems controlling highly nonlinear dynamics present serious obstacles to an MPC controller implementation. Despite its compelling theory [17], linearization of the system dynamics may not be a sufficient approximation for most highly nonlinear systems. In other words, implicit MPC may be too slow, and explicit MPC too memory intensive, for an embedded system that must operate at high frequency due to highly nonlinear dynamics [18].
Given these promises and difficulties, research connecting neural networks and MPC is gaining interest in the optimization, controls, and machine learning communities. Neural networks trained to mimic MPC policies can offer fast approximations for low-resource settings, but often at the cost of constraint satisfaction and performance guarantees over longer time horizons. Alternative approaches, which integrate constraint satisfaction into neural network predictive control policies, are associated with other disadvantages, such as costly optimization routines and the need for a volume of data that may be unrealistic to gather in an operational setting.
Addressing these challenges, we present a novel control method called differentiable predictive control (DPC) for learning both nonlinear system dynamics and control policies end-to-end, without supervision from an expert controller, with constraint satisfaction capabilities, and in a sample-efficient way based only on the observed time-series data of the system dynamics. This is in stark contrast with the recent implementation in [1], where a trained ML-based controller runs alongside an online MPC strategy, making the algorithm unusable in many applications running in low computational resource settings. The presented DPC method is based on a neural parametrization of the closed-loop dynamical system with two building blocks: i) a neural system model, and ii) a deep learning formulation of the model predictive control policy. In particular, we combine system identification based on a structured neural state-space model with policy optimization under embedded inequality constraints via backpropagation of the control loss through the closed-loop system dynamics model. The conceptual methodology of DPC is illustrated in Fig. 1. The presented DPC method supports nonlinear systems with both input and state constraints and arbitrary reference signals, and represents a methodological extension of prior work on linear systems introduced in [19]. A method similar in spirit has recently been proposed by [20], where the authors use Log-Sum-Exp neural networks to approximate the MPC-related cost function given measurements of the system dynamics. However, in contrast to the proposed DPC method, the implementation of the method in [20] still relies on online nonlinear optimization, does not handle state constraints, and training on arbitrary reference signals poses computational challenges.

We demonstrate the capabilities of the proposed DPC method in experimental results using a laboratory device called FlexyAir with nonlinear dynamics and noisy measurements. We demonstrate several key features of DPC resulting in data-efficiency, scalability, and systematic constraint handling:
1) Solution of constrained optimal control problems for unknown nonlinear systems, given only a time series of measurements of the observed system dynamics and sampled boundary conditions.
2) Constrained neural control policies and state-space dynamics models which promote physically plausible outcomes.
3) In contrast with explicit MPC, our method supports dynamical constraints and trajectory preview capabilities.
4) Linear scalability in terms of the number of decision variables and the length of the prediction horizon, compared to the exponential scalability of explicit MPC solutions based on multi-parametric programming.
5) Our approach requires less online computation time and memory than explicit MPC solutions.
A. Related Work
a) Approximate MPC: Recently, several works have been
devoted to implementing MPC and its variations based on
machine learning. One set of these works includes mimicking
the MPC behavior by replacing the control law with a neural
network [7], [21], [22]. These approaches are known in the
machine learning community as imitation learning [23], or
MPC-guided policy search [24]. The underlying principle is to train the neural networks in a supervised learning fashion on a labeled data set of the system dynamics controlled by MPC. Several procedures have been designed to approximate the MPC behavior with neural networks, which significantly reduces the implementation requirements of the MPC with minimal impact on the control performance [7], [22]. A significant disadvantage of these imitation learning approaches lies in the lack of guarantees on constraint satisfaction and performance.
Constraint satisfaction using neural network-based approxi-
mations of MPC control laws is a new and active research area.
To tackle this issue, some authors proposed including additional
layers in the policy network projecting the control inputs onto
the constrained region of the state and action spaces [25], [26].
Authors in [1] use an additional dual policy neural network
to estimate the sub-optimality of the learned control law with
probabilistic guarantees. Others employ learning bounds for
empirically validating constraints handling capabilities after the
network is trained [5]. However, all previous works rely on several factors which are prerequisites for their successful implementation. For example, they need to construct the MPC strategy first, which requires prior knowledge of the system dynamics model and a relatively costly optimization for solving the associated MPC, constraint satisfaction, or projection problem. This means that prior knowledge about the underlying system dynamics is necessary, which is often expert-dependent and time-consuming to obtain. For a more comprehensive overview of approximate MPC, or learning-based MPC (LBMPC) methods, we refer the reader to [27], [28].
b) Neural Models in MPC: To tackle the modeling prob-
lem, some researchers focus on training neural networks as
prediction models for MPC [29]. Authors in [30] use low-rank
features of the high-dimensional system to train a recurrent
neural network (RNN) to predict the control relevant quantities
for MPC. Other authors have used structured neural network
models inspired by classical linear time-varying state-space
models [31], whereas some have proposed using convex neural
architectures [32], graph neural networks [33], or stable neural
networks based on Lyapunov functions [34].
Fig. 1: Conceptual methodology of the proposed constrained nonlinear differentiable predictive control (DPC).

c) Automatic Differentiation in Control: The use of automatic differentiation (AD) for control and optimization is a well-established method. For instance, the CasADi toolbox [35] uses known system dynamics and constraints to construct a computational graph and computes the gradients for nonlinear optimization solvers. From the perspective of machine learning, the authors in [32] investigated the idea of solving the optimal control problem by backpropagation through a learned system model parametrized via convex neural networks [36]. Others have developed domain-specific differentiable physics models for robotics [37], [38]. Learning linear model predictive control (MPC) policies by differentiating the KKT conditions of the convex approximation at a fixed point was introduced in [39]. However, a generic method for the explicit solution of constrained nonlinear optimal control problems with unknown dynamics using AD is still lacking.
d) Constrained and structured deep learning: Incorporating constraints into deep learning presents multiple challenges, such as non-convexity, convergence, and stability of the learning process [40]. Penalty methods and loss function regularizations are the most straightforward way of imposing constraints on deep neural network outputs and parameters [41]–[43]. In those methods, the loss function is augmented with additional terms penalizing the violations of soft constraints via slack variables, which typically works well in practice, often outperforming hard constraint methods [44], [45]. Authors in [46], [47] use barrier methods combined with Lyapunov functions to enforce output constraints, stability, and boundedness of the neural network controller. An alternative to penalty and barrier methods are neural network architectures imposing hard constraints, such as linear operator constraints [48], or architectures with Hamiltonian [49] and Lagrangian [50] structural priors for enforcing energy conservation laws.
II. PRELIMINARIES
A. System Dynamics
We assume an unknown partially observable nonlinear dynamical system in discrete time:

$$x_{k+1} = f(x_k, u_k) \tag{1a}$$
$$y_k = g(x_k) \tag{1b}$$

where $x_k \in \mathbb{R}^{n_x}$ is the unknown system state, $y_k \in \mathbb{R}^{n_y}$ is the observed output, and $u_k \in \mathbb{R}^{n_u}$ is the control input at time $k$. We assume we have access to the data generated by the system in the form of input-output tuples:

$$\Xi = \{ (y^i_k, u^i_k), (y^i_{k+1}, u^i_{k+1}), \ldots, (y^i_{k+N}, u^i_{k+N}) \}, \quad i \in \mathbb{N}_1^n \tag{2}$$

where $n$ is the number of sampled trajectories, each with $N$ time steps.
B. Model Predictive Control
In this work, we consider a well-known model predictive controller (MPC) [51]. We follow the standard formulation of linear MPC as a quadratic optimization problem, specifically:

$$\min_{u_0, \ldots, u_{N-1}} \sum_{k=0}^{N-1} \|y_k - r\|_{Q_r}^2 + \|u_k - u_{k-1}\|_{Q_{du}}^2 \tag{3a}$$
$$\text{s.t.} \quad x_{k+1} = A x_k + B u_k, \tag{3b}$$
$$y_k = C x_k + D u_k, \tag{3c}$$
$$\underline{u}_k \le u_k \le \overline{u}_k, \tag{3d}$$
$$\underline{x}_k \le x_k \le \overline{x}_k, \tag{3e}$$
$$x_0 = x(t), \tag{3f}$$
$$u_{-1} = u(t - T_s). \tag{3g}$$
Here, the objective function is defined as the finite sum of two terms over a prediction horizon $N$. Both terms are weighted squared norms, i.e., $\|a\|_Q^2 = a^\top Q a$. Note that to ensure feasibility of problem (3), the weighting factors $Q_{du}$ and $Q_r$ must be chosen as positive definite and positive semi-definite matrices, respectively. The constraints (3b)-(3e) are enforced for $k \in \{0, 1, \ldots, N-1\}$. Moreover, to enforce reference tracking, the minimization objective (3a) includes a reference tracking error term, $\|y_k - r_k\|_{Q_r}^2$, as well as a control rate penalization term, $\|u_k - u_{k-1}\|_{Q_{du}}^2$, which is a standard way to enforce offset-free tracking [52]. The optimization problem is initialized with the current state measurement $x(t)$, as in (3f), with the value of the control action from the previous sampling instant $u(t - T_s)$, as described in (3g), and with the reference value $r$ in (3a). Note that the notation $u_{-1}$ applies for $k = 0$, which is necessary to compute the value of the objective function in the initial prediction step.
The control strategy is implemented in the receding horizon fashion, where we solve (3) to global optimality, yielding an optimal sequence of control actions $U^\star = [u_0^{\star\top}, \ldots, u_{N-1}^{\star\top}]^\top$, while only the first action is applied to the system. Such an implementation is visualized in Fig. 2.

Fig. 2: Realisation of the closed-loop control system with an unknown dynamical system.
We formulate the quadratic optimization problem (3) in MATLAB with the YALMIP toolbox [53]. The problem is then solved numerically with the GUROBI solver. Due to the fast dynamics of the controlled system, and the fact that the numerical solution of the MPC problem takes longer than the sampling period $T_s$, we consider a parametric solution to the optimization problem (3). The parametric solution allows us to evaluate the control law and obtain the optimal control action within the allotted time.
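For illustration, a minimal sketch of the QP (3) in Python with cvxpy is given below. This is not the authors' implementation (the paper builds the problem in MATLAB with YALMIP and solves it with GUROBI); the model matrices, bounds, and initial conditions are placeholder values, and the direct feedthrough term is taken as $D = 0$ for simplicity. The horizon and tuning weights mirror the eMPC settings reported in Section IV-D.

```python
import cvxpy as cp
import numpy as np

# Illustrative reformulation of problem (3); A, B, C and the bounds are
# placeholder values, not the identified model from the paper.
N, nx, nu, ny = 5, 2, 1, 1
A, B, C = np.eye(nx), np.ones((nx, nu)), np.ones((ny, nx))
Qr, Qdu = 3.0, 4.0                       # eMPC tuning from Section IV-D
x0, u_prev, r = np.zeros(nx), np.zeros(nu), np.ones(ny)

x = cp.Variable((N + 1, nx))
u = cp.Variable((N, nu))
cost, constraints = 0, [x[0] == x0]      # (3f) initial state
for k in range(N):
    du = u[k] - (u_prev if k == 0 else u[k - 1])   # (3g) for k = 0
    cost += Qr * cp.sum_squares(C @ x[k] - r)      # tracking term (3a)
    cost += Qdu * cp.sum_squares(du)               # control rate term (3a)
    constraints += [x[k + 1] == A @ x[k] + B @ u[k],   # dynamics (3b)
                    0.0 <= u[k], u[k] <= 1.0]          # input bounds (3d)

prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve()
print(u.value[0])   # receding horizon: apply only the first action
```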
C. Explicit Model Predictive Control
Parametric optimization theory allows us to create an explicit map between the initial conditions of the optimal control problem (3) and the optimal solution $U^\star$. Specifically, the vector of parameters, i.e., the initial conditions, is defined as

$$\hat{\xi} = \begin{bmatrix} x(t)^\top & r^\top & u(t - T_s)^\top \end{bmatrix}^\top. \tag{4}$$
After applying elementary matrix operations, presented in [54], the OCP from (3) can be reformulated as

$$\min_U \; U^\top H U + \hat{\xi}^\top F U \tag{5a}$$
$$\text{s.t.} \quad G U \le w + S \hat{\xi}, \tag{5b}$$

which constitutes a parametric quadratic optimization problem (PQP). The solution to this problem can be obtained via standard procedures of parametric programming. Namely, the result is represented by a piecewise affine (PWA) function, given as

$$U^\star(\hat{\xi}) = \begin{cases} \alpha_1 \hat{\xi} + \beta_1 & \text{if } \hat{\xi} \in \mathcal{R}_1 \\ \quad \vdots \\ \alpha_{n_R} \hat{\xi} + \beta_{n_R} & \text{if } \hat{\xi} \in \mathcal{R}_{n_R}. \end{cases} \tag{6}$$
Here, the variable $\hat{\xi}$ stands for the vector of parameters, $n_R$ denotes the total number of regions, while $\alpha_i$ and $\beta_i$ define the specific control law with respect to a region $\mathcal{R}_i$. The regions are defined as polyhedral sets, namely

$$\mathcal{R}_i = \{ \hat{\xi} \mid \Gamma_i \hat{\xi} \le \gamma_i \}, \quad i = 1, \ldots, n_R, \tag{7}$$

where the matrices $\Gamma_i$ and vectors $\gamma_i$ denote the half-space representation of the regions. Since the optimization problem (3) is a quadratic problem with linear constraints, all regions are defined by linear inequalities.
Note that the procedure of obtaining a numerical representation of the matrices $\Gamma_i$ and vectors $\gamma_i$ is done by means of the Multi-Parametric Toolbox [55] in MATLAB. To use the explicit MPC in connection with the laboratory device, we export the control law into Python source code. The export to Python source code contains two parts: the first part consists of the coefficients from (6) and (7), and the second part is the algorithm that evaluates the control law. The algorithm is based on the well-known sequential search method from [56].
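A minimal Python sketch of such a sequential search evaluation is given below; the function name and the data layout (lists of NumPy arrays holding $\Gamma_i$, $\gamma_i$, $\alpha_i$, $\beta_i$ exported from the toolbox) are our assumptions for illustration, not the exported code itself.

```python
import numpy as np

def empc_evaluate(xi, Gamma, gamma, alpha, beta):
    """Sequential search over regions (7): return U*(xi) from the PWA law (6)."""
    for G_i, g_i, a_i, b_i in zip(Gamma, gamma, alpha, beta):
        if np.all(G_i @ xi <= g_i):       # is xi inside region R_i?
            return a_i @ xi + b_i         # affine law of region R_i
    raise ValueError("parameter vector lies outside the feasible parametric space")
```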
III. METHOD
This section presents Differentiable Predictive Control (DPC), a constrained neural network-based method for learning nonlinear state-space models and optimal control policies for unknown dynamical systems represented by time-series data. Our system identification method is based on a recently proposed block-structured neural state-space model architecture [57], allowing us to impose constraints on the model structure or variables to enforce physically realistic predictions. These nonlinear system models are then combined with a neural control policy, forming a fully parametrized differentiable closed-loop system dynamics model. This generic architecture allows us to learn a wide range of constrained control policies using end-to-end auto-differentiation of MPC-like loss functions and gradient descent optimization. Similar to explicit MPC, DPC optimizes control policies offline, using $N$-step ahead predictions of the closed-loop system dynamics model generated in response to a distribution of synthetically generated control features $\xi$. After training, analogous to MPC, DPC is deployed in the receding horizon control (RHC) fashion: at each time step, a sequence of $N$ optimal control actions is predicted by the policy, but only the first is applied to the system.

In the proposed methodology, both equality and inequality constraints play a crucial role. The former can be modeled by the feedforward nature of the neural component blocks, as shown in the sections on the dynamics model and closed-loop system architectures. To tackle the latter in the context of neural networks, we leverage the well-known penalty method for constrained optimization.
A. Penalty Constraints
To impose inequality constraints on a variable $y$, one can use a penalty function $p(y)$ that evaluates the constraint violations via slack variables $s$. In principle, the penalty method can be used to model arbitrary nonlinear inequality constraints. In this work, we employ penalty functions for time-varying lower and upper bounds $\underline{y}_k$, $\overline{y}_k$, respectively, given as follows:

$$p(y_k, \underline{y}_k) = \max(0, -y_k + \underline{y}_k) \tag{8a}$$
$$p(y_k, \overline{y}_k) = \max(0, y_k - \overline{y}_k) \tag{8b}$$

Remark. Penalty functions in the form (8) can be straightforwardly implemented in modern deep-learning libraries such as PyTorch or TensorFlow using standard ReLU activation functions.
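To make the remark concrete, a minimal PyTorch sketch of the penalty functions (8) follows; the function names are ours.

```python
import torch
import torch.nn.functional as F

def penalty_lower(y, y_min):
    """Lower-bound penalty (8a): nonzero only where y < y_min."""
    return F.relu(-y + y_min)

def penalty_upper(y, y_max):
    """Upper-bound penalty (8b): nonzero only where y > y_max."""
    return F.relu(y - y_max)
```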
B. Constrained Neural Dynamics Models
We aim to learn a constrained neural representation of the
unknown system dynamics, given the input-output time-series
dataset (2) obtained from system observation.
a) Model architecture: We present a generic block neural state-space model (BN-SSM) to represent and learn the partially observable unknown nonlinear system dynamics (1), given the labeled dataset (2). The BN-SSM architecture is shown in Fig. 3, with the corresponding equations given as follows:

$$x_{k+1} = f_x(x_k) + f_u(u_k) \tag{9a}$$
$$y_{k+1} = f_y(x_{k+1}) \tag{9b}$$
$$x_0 = f_o(y_{1-N}, \ldots, y_0) \tag{9c}$$
$$k \in \mathbb{N}_0^N \tag{9d}$$
where $k$ defines the discrete time step and $N$ defines the prediction horizon, i.e., the number of rollout steps of the recurrent model. The individual block components $f_x$, $f_u$, $f_y$, and $f_o$ are represented by neural networks. The blocks $f_x$ and $f_u$ define the hidden state and input dynamics, replacing the $A$ and $B$ matrices in the classical linear state-space model, respectively. The block $f_y$ defines the output mapping from hidden states $x_k$ to observables $y_k$, replacing the $C$ matrix in the linear model. The observer block $f_o$ maps the past output trajectories $Y_p = \{y_{1-N}, \ldots, y_0\}$ onto the initial states $x_0$, which is necessary for handling partially observable systems. We now compactly represent the $N$-step ahead rollout of the model (9) as $\{Y, X\} = f_\theta^N(Y_p, U)$ with lumped parameters $\theta$.
Fig. 3: System identification with the block-structured neural state-space model (BN-SSM). Here $y$ in red and blue represent observed and predicted system outputs, respectively, $x$ are hidden states, and $u$ are observed control action trajectories.
Remark. The proposed BN-SSM architecture (9) represents a generalization of a family of neural state-space models [58]–[65]. Depending on the choice of the neural blocks $f_x$, $f_u$, $f_y$, and $f_o$, one can represent fully nonlinear, Hammerstein-Wiener, Wiener, Hammerstein, or simply linear dynamics models with or without internal feedback. Additionally, the block architecture allows us to impose local regularizations on the block structure or constraints on internal block-generated variables.
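A minimal PyTorch sketch of the BN-SSM rollout (9) is given below, assuming simple MLP blocks; the paper's implementation uses residual networks for $f_x$ and $f_u$ and a linear decoder $f_y$ (see Section IV-D), so this is an illustrative simplification rather than the exact model.

```python
import torch
import torch.nn as nn

class BlockNSSM(nn.Module):
    """Simplified block neural state-space model (9) with MLP blocks."""
    def __init__(self, nx=30, nu=1, ny=1, npast=1):
        super().__init__()
        self.fx = nn.Sequential(nn.Linear(nx, nx), nn.GELU(), nn.Linear(nx, nx))
        self.fu = nn.Sequential(nn.Linear(nu, nx), nn.GELU(), nn.Linear(nx, nx))
        self.fy = nn.Linear(nx, ny)            # linear output decoder
        self.fo = nn.Linear(npast * ny, nx)    # observer: past outputs -> x0

    def forward(self, Yp, U):
        # Yp: (batch, npast*ny) past outputs; U: (N, batch, nu) control inputs
        x = self.fo(Yp)                        # (9c) initial state estimate
        Y, X = [], []
        for u in U:                            # N-step recurrent rollout
            x = self.fx(x) + self.fu(u)        # (9a) state update
            Y.append(self.fy(x))               # (9b) output map
            X.append(x)
        return torch.stack(Y), torch.stack(X)
```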
b) System identification loss: We train the neural state-space dynamics (9) on sampled input-output trajectories (2) of the observed system dynamics. The multi-term system identification loss is given as follows:

$$\begin{aligned} \mathcal{L}_{\mathrm{MSE}}(Y^{\mathrm{true}}, Y, X, \underline{Y}, \overline{Y}, U \mid \theta) = \frac{1}{nN} \sum_{i=1}^{n} \sum_{k=1}^{N} \Big( & \|y_k^{\mathrm{true},i} - y_k^i\|_2^2 + Q_{dx} \|x_k^i - x_{k-1}^i\|_2^2 \\ & + Q_y \|p(y_k^i, \underline{y}_k^i)\|_2^2 + Q_y \|p(y_k^i, \overline{y}_k^i)\|_2^2 \\ & + Q_u \|p(f_u(u_k^i), \underline{f_u})\|_2^2 + Q_u \|p(f_u(u_k^i), \overline{f_u})\|_2^2 \Big) \end{aligned} \tag{10}$$
Here $k$ represents the time step of the prediction horizon $N$, and $i$ is the batch index of $n$ sampled trajectories. The first term represents the trajectory tracking loss, defined as the two-norm over a vector of residuals between the true $Y^{\mathrm{true}} = \{y_1^{\mathrm{true},i}, \ldots, y_N^{\mathrm{true},i}\}$ and predicted $Y = \{y_1^i, \ldots, y_N^i\}$ output trajectories over $N$ steps. The second term is a regularization for smoothing the trajectories by penalizing the one-time-step difference between successive hidden states $x$. The third and fourth terms impose box constraints on the output phase space. Output constraints in the system identification loss can help learn models with physically meaningful trajectories outside the training set's distribution. Furthermore, we can leverage the structure of the proposed block neural state-space model (9) and impose similar constraints on the influence of the control input dynamics component $f_u(u_k)$.
Remark. Input dynamics constraints can be leveraged during the system identification phase in case of prior knowledge or assumptions about the maximal $\overline{f_u}$ and minimal $\underline{f_u}$ temporal contribution of the control actions to the one-time-step state differences $\Delta x_k = x_k - x_{k-1}$. In practice, these bounds can be estimated from data by means of residuals between perturbed and non-perturbed system dynamic responses. In our experimental case study presented in Section IV, we used $\overline{f_u} = 0.5$, $\underline{f_u} = -0.5$.
C. Constrained Differentiable Predictive Control
The objective is to learn a constrained differentiable predictive control (DPC) policy to govern the unknown dynamical system (1), given the learned neural state-space model (9).

a) Neural control policy: The input to the neural control policy is a vector of selected control parameters $\xi = [Y_p^\top \; R^\top \; \underline{Y}^\top \; \overline{Y}^\top]^\top$. The neural control policy map is given as follows:

$$U = \pi_\Theta(\xi) \tag{11}$$

where $U = \{u_1, \ldots, u_N\}$ is an optimal control trajectory, $Y_p = \{y_{1-N}, \ldots, y_0\}$ represents observed output trajectories $N$ steps into the past, $R = \{r_1, \ldots, r_N\}$ is a tensor of reference trajectories, while $\underline{Y} = \{\underline{y}_1, \ldots, \underline{y}_N\}$ and $\overline{Y} = \{\overline{y}_1, \ldots, \overline{y}_N\}$ are tensors of imposed lower and upper bounds for future output trajectories, respectively.
In this paper, we assume $\pi_\Theta(\xi): \mathbb{R}^m \to \mathbb{R}^n$ to be a fully connected neural network architecture with $l \in \mathbb{N}_1^L$ layers, given as:

$$\pi_\Theta(\xi) = W_L h_L + b_L \tag{12a}$$
$$h_l = \sigma(W_{l-1} h_{l-1} + b_{l-1}) \tag{12b}$$
$$h_0 = \xi \tag{12c}$$

parametrized by $\Theta = \{W_l, b_l \mid \forall l \in \mathbb{N}_1^L\}$ with weights $W_l$ and biases $b_l$, and a nonlinear activation function $\sigma: \mathbb{R}^{n_h} \to \mathbb{R}^{n_h}$. In DPC, the neural control policy (11) replaces the PWA control law (6) of explicit MPC described in Section II-C; the feature vector $\xi$ of the neural policy thus represents an expanded parametric space compared to the lower-dimensional parametric space of explicit MPC given by (4).
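As an illustration, a sketch of the policy (12) with the dimensions reported in Section IV-D ($\xi \in \mathbb{R}^{128}$, $N = 32$ actions, three hidden layers of 20 neurons, GELU activations) could be written in PyTorch as follows; the exact layer bookkeeping of the paper's implementation may differ.

```python
import torch.nn as nn

# Hypothetical instantiation of the fully connected policy (12).
policy = nn.Sequential(
    nn.Linear(128, 20), nn.GELU(),   # h1
    nn.Linear(20, 20), nn.GELU(),    # h2
    nn.Linear(20, 20), nn.GELU(),    # h3
    nn.Linear(20, 32),               # N = 32 control actions over the horizon
)
```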
b) Differentiable closed-loop system architecture: To train the constrained control policy (11), we design a neural representation of the closed-loop dynamics using the learned neural state-space model $f_\theta^N$ (9):

$$U = \pi_\Theta(\xi) \tag{13a}$$
$$Y = f_\theta^N(Y_p, U) \tag{13b}$$

The system dynamics model $f_\theta^N$ (9) is used to predict the $N$-step ahead future output trajectories $Y$, given the control action trajectories $U$ generated by the neural control policy (11). The corresponding network architecture is shown in Fig. 4. Here, the closed-loop model is constructed by connecting the learned system dynamics model (9) with the control policy (11) through the control action trajectories $U$. Hence, the proposed control policy represents a predictive control strategy with a preview of future constraints and reference signals. The policy is optimized by differentiating the closed-loop neural dynamics model on sampled past output trajectories $Y_p$, given the forecast of lower and upper constraint trajectories $\underline{Y}$ and $\overline{Y}$. The parameters of the pre-trained neural state-space model (9) representing the open-loop system dynamics are fixed during the policy optimization. The distribution of past output trajectories $Y_p$ represents a sampling of initial conditions, while the distribution of time-varying constraints and reference signals represents a sampling of different operational scenarios and tasks.

Fig. 4: Differentiable predictive control (DPC) architecture. Here $y$ represents controlled outputs of the system, $\underline{y}$ and $\overline{y}$ represent lower and upper output constraints, $r$ are sampled reference trajectories, and $u$ are control actions generated by the neural policy optimized with the MPC-inspired loss function.

Remark. The proposed control policy optimization procedure does not require interaction with the real system or its emulator model. Instead, the policy is trained by sampling the closed-loop system using the trained dynamics model. Moreover, the training is extremely data-efficient, as all sampled trajectories can be generated synthetically.
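The following hedged sketch illustrates this optimization scheme: the pre-trained dynamics model is frozen and only the policy parameters receive gradients through the closed-loop rollout (13). Here `model` and `policy` follow the earlier sketches, while `loader` (yielding synthetically sampled features) and `dpc_loss` (implementing the loss (14), sketched in the next subsection) are assumed placeholders, not the paper's code.

```python
import torch

def train_policy(model, policy, loader, dpc_loss, epochs=1, lr=1e-3):
    """Sketch of DPC policy optimization over the frozen closed-loop model (13)."""
    for p in model.parameters():
        p.requires_grad_(False)                    # fix dynamics model parameters
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for Yp, xi, R, Ymin, Ymax, Umin, Umax in loader:
            U = policy(xi).transpose(0, 1).unsqueeze(-1)  # (N, batch, nu=1), (13a)
            Y, _ = model(Yp, U)                    # differentiable rollout (13b)
            loss = dpc_loss(R, Y, U, Ymin, Ymax, Umin, Umax)
            opt.zero_grad()
            loss.backward()                        # gradients flow through (13)
            opt.step()
```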
c) MPC-inspired loss function: The closed-loop system parametrization (13) now allows us to simulate the effect of varying control features $\xi$ on the system's output dynamics $Y$. This simulation capability, together with the differentiability of the closed-loop model (13), is a key feature of the proposed control method, which allows us to perform data-driven optimization of the neural control policy (11). We train the policy parameters by sampling the distribution of the control features $\xi$ and backpropagating the gradients of the loss function through the closed-loop system model. In particular, we leverage the following MPC-inspired multi-term loss function, in which the primary reference tracking term is augmented with control smoothing and penalty terms (8) imposed on the control actions and output trajectories:

$$\begin{aligned} \mathcal{L}_{\mathrm{MSE}}(R, Y, \underline{Y}, \overline{Y}, U, \underline{U}, \overline{U} \mid \Theta) = \frac{1}{nN} \sum_{i=1}^{n} \sum_{k=1}^{N} \Big( & Q_r \|r_k^i - y_k^i\|_2^2 + Q_{du} \|u_k^i - u_{k-1}^i\|_2^2 \\ & + Q_y \|p(y_k^i, \underline{y}_k^i)\|_2^2 + Q_y \|p(y_k^i, \overline{y}_k^i)\|_2^2 \\ & + Q_u \|p(u_k^i, \underline{u}_k^i)\|_2^2 + Q_u \|p(u_k^i, \overline{u}_k^i)\|_2^2 \Big) \end{aligned} \tag{14}$$
where $k$ represents the time index, $N$ is the prediction horizon, $i$ is the batch index, and $n$ is the number of batches of sampled trajectories. $R$ represents the sampled reference trajectories to be tracked by the output trajectories $Y$ of the closed-loop system, where $Q_r$ is the reference tracking weight. The second term, weighted by $Q_{du}$, represents control action smoothing. Similar to $\underline{Y}$ and $\overline{Y}$, $\underline{U}$ and $\overline{U}$ are tensors of lower and upper bounds for the $N$-step ahead control action trajectories. We optimize the policy parameters $\Theta$ while keeping the parameters of the dynamics model (9) fixed. The penalty terms are weighted by $Q_y$ and $Q_u$ for output and input constraints, respectively.
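A minimal PyTorch sketch of the loss (14) follows, with the ReLU penalties (8) inlined and default weights taken from the values reported in Section IV-D; for brevity it penalizes only the interior control differences, whereas (14) also includes the difference from the previously applied action at the first step.

```python
import torch
import torch.nn.functional as F

def dpc_loss(R, Y, U, Ymin, Ymax, Umin, Umax,
             Qr=1.0, Qdu=0.1, Qy=2.0, Qu=10.0):
    """Sketch of the MPC-inspired loss (14); default weights follow Section IV-D."""
    du = U[1:] - U[:-1]                             # control rate u_k - u_{k-1}
    return (Qr * (R - Y).pow(2).mean()              # reference tracking
            + Qdu * du.pow(2).mean()                # control smoothing
            + Qy * F.relu(Ymin - Y).pow(2).mean()   # lower output bound (8a)
            + Qy * F.relu(Y - Ymax).pow(2).mean()   # upper output bound (8b)
            + Qu * F.relu(Umin - U).pow(2).mean()   # lower input bound
            + Qu * F.relu(U - Umax).pow(2).mean())  # upper input bound
```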
IV. EXPERIMENTAL CASE STUDY
A. System Description
The presented control approaches are implemented on a laboratory device called FlexyAir (www.ocl.sk/flexyair). The device is a single-input single-output system, where the actuator is a fan that drives air into a vertical tube with a floater inside. An infrared proximity sensor placed on top of the tube measures the floater's level. The manipulated variable in this laboratory process is the fan speed command, given to an internal fan speed controller, which sets the corresponding current for the fan itself. The process variable is the position of the floater in the vertical tube. The control objective is to stabilize the floater's vertical position at the desired reference level while satisfying the given constraints. A sketch of the laboratory device is shown in Fig. 5.
Fig. 5: Sketch of the laboratory device with the Raspberry-Pi
platform.
B. Dataset

a) System identification: Our experimental dataset is obtained by observing the real system dynamics with sampling time $T_s = 0.25$ seconds. The measured input-output time series in the form (2) has $m = 9 \cdot 10^3$ datapoints, which are used to create training, validation, and test sets with equal lengths of $1000$ samples. To take time horizons into account during training, we apply an $N$-step time shift to generate the past $Y_p^{\mathrm{true}}$ and future $Y^{\mathrm{true}}$ tensors for the output variables, respectively. The time series in each set are subsequently separated into $N$-step batches, generating tensors with dimensions $(N, n, n_y)$, where $n$ represents the number of batches and $n_y$ is the dimension of the variable $y$ (the same applies to $u$). The number of batches $n = \frac{m}{N}$ depends on the total number of datapoints $m$ and the length of the prediction horizon $N$.
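The described batching can be illustrated with the following NumPy sketch (the function name is ours): a time series of $m$ samples is split into $n = m/N$ consecutive $N$-step windows and stacked into an $(N, n, n_y)$ tensor.

```python
import numpy as np

def batch_trajectories(y, N):
    """Reshape an (m, ny) time series into N-step batches of shape (N, n, ny)."""
    m, ny = y.shape
    n = m // N                       # number of batches n = m / N
    return y[: n * N].reshape(n, N, ny).transpose(1, 0, 2)
```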
b) Closed-loop control: As mentioned in Section III-C, the control policy training is based on sampling the input sequences of the closed-loop dynamics model and does not require extra measurements of the real system. To demonstrate the data-efficiency, we generate each continuous time series with only $3 \cdot 10^3$ samples for the training, validation, and test sets, respectively. We apply the same $N$-step horizon batching as in the system identification task. The dataset is also normalized using min-max normalization.

The past observations of the output trajectories $Y_p$ are randomly sampled continuous trajectories, while the predicted future trajectories $Y$ are internally generated by the trained system dynamics model. To improve generalization across dynamic modes, we assume that the sampled trajectories $Y_p$ are dynamically generated sine waves with varying frequency, amplitude, and noise at each optimization epoch. The time-varying references and constraint bounds can be arbitrarily sampled from a user-defined distribution to generalize the control across a set of tasks.

In our case, we sample sine waves for the output reference $R$ with amplitude in the range $[0.2, 0.8]$, the lower bound $\underline{Y}$ in the range $[0.1, 0.4]$, and the upper bound $\overline{Y}$ in the range $[0.6, 0.9]$. The control action bounds can in principle be time-varying as well; however, in our case, due to the nature of the experimental setup, we assume static constraints $\underline{U} = 0.0$ and $\overline{U} = 1.0$.
Remark. Training of the control policy on dynamically sampled system output trajectories $Y_p$ with varying frequencies and amplitudes is inspired by the fact that a system response can be decomposed into a set of dynamic modes with fixed frequencies [66]. Alternatively, the trajectories $Y_p$ could be generated by perturbing the learned system dynamics or simply represented by observations of the real system.
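For illustration, the sampling described above might look as follows in NumPy; the frequency range and the interpretation of the amplitude ranges as the span of the generated waves are our assumptions, not specifications from the paper.

```python
import numpy as np

rng = np.random.default_rng()

def sample_wave(N, lo, hi):
    """Sample one N-step sine wave whose values span roughly [lo, hi]."""
    freq = rng.uniform(0.05, 0.5)              # assumed frequency range
    phase = rng.uniform(0.0, 2.0 * np.pi)
    mid, amp = (lo + hi) / 2.0, (hi - lo) / 2.0
    return mid + amp * np.sin(freq * np.arange(N) + phase)

R = sample_wave(32, 0.2, 0.8)                  # reference trajectory sample
Ymin = sample_wave(32, 0.1, 0.4)               # lower bound sample
Ymax = sample_wave(32, 0.6, 0.9)               # upper bound sample
```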
C. Metrics

We assess the trained DPC performance on two sets of metrics: the first for training and hyperparameter selection, the second for task-specific performance evaluation. For training, we evaluate the mean squared error (MSE) of the system identification loss (10) and the policy learning loss (14), respectively. The loss function MSE evaluated on the development sets is used for hyperparameter selection, while the MSE on the test sets is used for performance assessment of the training process. Second, instead of the training-oriented processed datasets, we define the task-specific metrics using the real system data with $T$ time steps. For the system identification, we evaluate the MSE of the open-loop response of the trained model compared to the response of the real system, given as $\frac{1}{T} \sum_{k=1}^{T} \|y_k - y_k^{\mathrm{true}}\|_2^2$. For the evaluation of the closed-loop control performance, we compute the reference tracking MSE as $\frac{1}{T} \sum_{k=1}^{T} \|y_k - r_k\|_2^2$, and the integral of the absolute error (IAE) as $\sum_{k=1}^{T} |y_k - r_k|$. For constraint satisfaction, we evaluate the mean absolute (MA) value of the output constraint violations: $\frac{1}{T} \sum_{k=1}^{T} \left( |p(y_k, \underline{y}_k)| + |p(y_k, \overline{y}_k)| \right)$.
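The task-specific metrics can be computed with a few lines of NumPy, as sketched below for arrays of length $T$; function names are ours.

```python
import numpy as np

def tracking_mse(y, r):
    """Reference tracking MSE over T steps."""
    return np.mean((y - r) ** 2)

def iae(y, r):
    """Integral of the absolute error."""
    return np.sum(np.abs(y - r))

def mean_constraint_violation(y, y_min, y_max):
    """Mean absolute output constraint violation using the penalties (8)."""
    return np.mean(np.maximum(0.0, y_min - y) + np.maximum(0.0, y - y_max))
```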
D. Optimization and Hyperparameter Selection

The presented method with structured neural network models was implemented using PyTorch [67]. We train our models with randomly initialized weights using the Adam optimizer [68] with a learning rate of $0.001$. All neural network blocks in our models are designed with GELU activation functions [69]. We use a grid search for finding the best performing hyperparameters, assessing the performance of the trained models on the development set and on the task performance metrics specified in Section IV-C. For both the dynamics model and the control policy, we select the prediction horizon $N = 32$ steps, which, with sampling time $T_s = 0.25$ seconds, corresponds to an $8$-second time window.
a) System identification: The system dynamics model (9) is trained on the system identification dataset for $1000$ epochs. The state transition block $f_x: \mathbb{R}^{30} \to \mathbb{R}^{30}$ and the input dynamics block $f_u: \mathbb{R}^{1} \to \mathbb{R}^{30}$ are represented by residual neural networks, while the output decoder $f_y: \mathbb{R}^{30} \to \mathbb{R}^{1}$ is a simple linear map. The state encoder map $f_o: \mathbb{R}^{1} \to \mathbb{R}^{30}$ is represented by a standard fully connected neural network. For simplicity, we assume only a one-step time lag for the state encoder. All individual neural block components are designed with $4$ hidden layers and $30$ hidden neurons. The resulting neural state-space model has $n_\theta = 24661$ trainable parameters. The weight factors of the system identification loss function (10) are given as follows: $Q_{dx} = 0.2$, $Q_y = 1.0$, $Q_u = 1.0$. The trained block neural state-space model (9) is subsequently used for the design of the constrained differentiable control policy, as described in Section III-C.
b) Closed-loop control: The constrained differentiable control policy (11) is trained on the synthetically sampled closed-loop system dataset for $5 \cdot 10^3$ epochs with early stopping based on the development set MSE². The policy map $\pi_\Theta: \mathbb{R}^{128} \to \mathbb{R}^{32}$ is represented by a fully connected neural network with $3$ layers, each with $20$ hidden neurons, resulting in $n_\Theta = 7272$ trainable parameters. The weight factors of the constrained control loss function (14) are $Q_r = 1.0$, $Q_{du} = 0.1$, $Q_y = 2.0$, $Q_u = 10.0$.

In the case of the explicit model predictive control (eMPC) strategy (3a), we chose to set the length of the prediction horizon to $N = 5$, while the tuning factors were set to $Q_r = 3$ and $Q_{du} = 4$. Because classical MPC cannot handle neural state-space models, the corresponding quadratic optimization problem (3) is constructed using a simplified linear model obtained from the System Identification Toolbox in MATLAB [16].
E. Real-time Closed-loop Control Performance
This section shows the results of the real-time implementation of the two control strategies. First, we present a step-change scenario to show the performance with respect to transient behavior and reference tracking. Second, we introduce an experimental case study where a harmonic reference alongside a set of harmonic constraints is considered. Here, we show how the proposed differentiable predictive control (DPC) handles constraint satisfaction. All control strategies for these real-time experiments are implemented on an embedded Raspberry-Pi 3 platform using Python 3.7.
²Using early stopping, the DPC policy training typically converged in fewer than 1000 epochs.
a) Step-change scenario: This scenario presents the tracking performance when a step change occurs in the process variable. Specifically, we consider step changes in the reference from $20 \to 15 \to 28$ cm for the floater position. Moreover, we set the top constraint to $\overline{y} = 30$ cm and the bottom constraint to $\underline{y} = 13$ cm. Here, we compare the performance of two controllers. First, we set the baseline with the explicit MPC strategy. Then we include the proposed DPC policy. The tuning factors of the individual controllers are given in Section IV-D.
The control performance of the respective policies is visualized in Fig. 6. Furthermore, a rigorous validation of the measured performance is reported in Table I. We can see from the reported results that both eMPC and DPC follow the trajectory and respect the bottom and top constraints. We can observe that eMPC and DPC react to the reference change in the same fashion (slope of the transient behavior). The difference is just the time instant when the change occurs, which is determined by the length of the prediction horizon of the individual policies. While for DPC the length of the horizon is not a bottleneck ($32$ samples in this case), the eMPC could not be constructed for horizons longer than $10$ due to the memory footprint; in this case, we use $N = 5$.
TABLE I: Quantitative evaluation of control performance, based on the metrics given in Section IV-C.

          reference tracking     constraints violation
          MSE       IAE          MSE       MAE
DPC       1.76      265          0         0
eMPC      8.97      526          0         0
Finally, we direct the reader to Table I, where a quantitative numerical evaluation is reported, following the metrics established in Section IV-C. In terms of the MSE criterion, DPC performs roughly five times better than the explicit MPC (1.76 vs. 8.97). We also include the IAE criterion, which is more common in the control community; in this case, DPC outperforms the eMPC by more than $50\%$. In terms of constraint violations, these criteria evaluate to $0$, since neither DPC nor eMPC crosses the limits.
b) Constraints Satisfaction: The second scenario demonstrates the systematic constraint handling of the DPC strategy. Here we utilize a harmonic reference and harmonic constraints. Concrete results are presented in Fig. 7. Note that the constraints are not violated even when the reference crosses out of the allowed space, empirically demonstrating remarkable robustness even in this challenging scenario with high measurement noise.
F. Idealized Simulation Case Studies

The purpose of this section is to demonstrate the control capabilities of the proposed constrained differentiable predictive control (DPC) policies in idealized simulations. We use the trained system dynamics model to represent the controlled system, omitting the influence of plant-model mismatch and additive real-time disturbances affecting the dynamics of the real system. We show that, in the case of a perfect system dynamics model, DPC can achieve offset-free reference tracking and robust constraint satisfaction, empirically demonstrating convergence to near-optimal control performance.

Fig. 6: Real-time measurement profiles of the DPC and eMPC control strategies with static constraints. (a) Position measurements: reference (red-dashed), explicit model predictive control (purple), constrained differentiable predictive control (blue), and constraints (black-dashed). (b) Profile of the manipulated variable: fan speeds of the respective controllers.
a) Reference Tracking: Fig. 8 shows the offset-free reference tracking capabilities of the trained DPC method assessed using four different dynamic signals. Besides tracking an arbitrary dynamic reference, the DPC policy demonstrates predictive control capabilities using a reference preview. In particular, notice the change in the trajectories several time steps before the previewed step changes in the reference signals. The ability to react safely and in advance to forecasted parameters such as references and constraints is a desired feature for many industrial control applications. The preview capability also represents additional value compared to explicit MPC, which cannot handle a preview of its parameters.

b) Constraints Satisfaction: We demonstrate the capability of DPC to balance conflicting objectives in terms of reference tracking and constraint handling. Fig. 9 plots the DPC performance with a dynamic reference crossing dynamic constraints. Even in this challenging scenario, the trained DPC policy satisfies the constraints while compromising on the tracking performance of the unattainable reference.
Fig. 7: Real-time measurement profile of the proposed DPC strategy under the influence of harmonic reference and constraints. (a) Position measurements: reference signal (red-dashed) and measured floater position (blue). (b) Profile of the manipulated variable: fan speed profile.
G. Scalability Analysis
In Table II we compare the scalability of the proposed
differentiable predictive control (DPC) against explicit model
predictive control (eMPC) in terms of on-line computational
requirements, memory footprint, policy complexity, and off-line
construction time with the increasing length of the prediction
horizon N.
We compare the mean and maximum online computational (CPU) time and the memory footprint required to evaluate and store the DPC and eMPC control policies. As shown in Table II, the online evaluation of both control policies is extremely fast in terms of CPU time. This property is crucial for controlling fast and highly dynamical systems with high-frequency sampling rates, such as UAVs or agile robotic systems. However, the memory requirements differ significantly for the linearly scalable DPC compared to the exponentially growing requirements of eMPC. In the case of eMPC, its enormous memory demands limit the applicability of this control strategy to small-scale systems with very short prediction horizons. Contrary to eMPC, DPC policies have an extremely low memory footprint regardless of the prediction horizon's length, opening the door to large-scale practical control applications.
Fig. 8: Simulated closed-loop control trajectories demonstrating reference tracking capabilities of DPC.

Fig. 9: Simulated closed-loop control trajectories demonstrating balancing of conflicting objectives with dynamic reference and dynamic constraints.
To better understand the reported memory requirements, we evaluate the complexity of the control policies in terms of the number of parameters for DPC and the number of critical regions for eMPC. Table II shows that the number of parameters of DPC scales linearly with the increasing prediction horizon $N$, while the number of critical regions of eMPC scales exponentially. The reason is that the complexity of the eMPC policy (6) is primarily given by the number of constraints, which scales exponentially with the length of the prediction horizon and the number of optimized variables. On the other hand, the complexity of the DPC policy depends mainly on the number of hidden nodes and the number of layers, which allows it to scale to large state-action spaces with long prediction horizons.

Table II also reports the construction time of the eMPC policy using multiparametric programming [55], [70], to give the reader a notion of the limitations of this solution method. Specifically, we direct the attention to the construction time
TABLE II: Scalability analysis of the proposed differentiable predictive control (DPC) policy against explicit MPC (eMPC).

N                               5        7        10       12       15
mean CPU time [ms]   DPC        0.369    0.355    0.371    0.380    0.502
                     eMPC       0.455    0.472    0.429    -        -
max CPU time [ms]    DPC        6.978    7.978    7.945    8.066    5.026
                     eMPC       1.325    1.927    4.684    -        -
memory footprint [kB] DPC       13       15       17       19       21
                     eMPC       611      9300     65200    -        -
policy parameters    DPC        1845     2247     2850     3252     3855
policy regions       eMPC       108      347      1420     2631     5333
construction time [h] DPC^a,b   0.1      0.1      0.1      0.2      0.2
                     eMPC^c     0.1      4.0      66.5     -        -

^a Trained for 1000 epochs without early stopping.
^b Computed on a Core i7 2.6 GHz CPU with 16 GB RAM.
^c Computed on a Core i7 4.0 GHz CPU with 32 GB RAM.
of the eMPC with $N = 10$, for which the generation of the associated controller took almost $3$ days. In contrast, the construction time of DPC scales linearly up to large prediction horizons, as reported in Table II. This demonstrates the tremendous potential of the proposed DPC policies to scale to large-scale control problems well beyond the reach of classical eMPC.
V. CONCLUSIONS
We have experimentally demonstrated that it is possible
to train constrained optimal control policies purely based on
the observations of the dynamics of the unknown nonlinear
system. The principle is based on optimizing control policies
with constraint penalty functions by differentiating trained neural state-space models representing an internal model of the observed nonlinear system. We denote this control approach constrained differentiable predictive control (DPC).
We compare the performance of trained DPC policy against
classical explicit model predictive control (eMPC). The control
algorithms are implemented on a laboratory device with the
Raspberry-Pi platform. In comparison with eMPC using a
linear model, DPC achieves better control performance due
to the nonlinear system dynamics model and the reference
and constraints preview capability. However, most importantly,
DPC has unprecedented scalability beyond the limitations of
eMPC. DPC scales well with increased problem complexity
defined by the length of the prediction horizon resulting in
an increased number of decision variables and constraints
of the underlying optimization problem. DPC demonstrates
linear scalability in terms of memory footprint, number of
policy parameters, and required construction time. Therefore,
we believe that the proposed DPC method has the potential
for wide adoption in large-scale control systems with limited
computational resources and fast sampling rates.
It is essential to mention that the proposed differentiable
control methodology is generic and not limited to a partic-
ular structure of the underlying system dynamics. Various
architectures might be more or less suitable depending on the
modeled dynamical system. Straightforward model architecture
extensions of the proposed methodology may include a whole
range of differentiable models, for instance sparse identifica-
tion of nonlinear dynamics (SINDy) [71], multistep neural
networks [72], or graph neural networks [73].
From a theoretical perspective, structural similarity of the
underlying optimization problem to well studied MPC problem
presents opportunities for further extensions of the proposed
DPC methodology. Some examples include adaptive control
via online updates of the nonlinear system model, robust and
stochastic control via constraints tightening, or closed-loop
stability guarantees based on stability of MPC.
VI. ACKNOWLEDGEMENTS
This work was funded by the Mathematics for Artificial Reasoning in Science (MARS) investment at the Pacific Northwest National Laboratory (PNNL). K. Kiš and M. Klaučo gratefully acknowledge the contribution of the Scientific Grant Agency of the Slovak Republic under the grants 1/0585/19 and 1/0545/20.
REFERENCES
[1]
X. Zhang, M. Bujarbaruah, and F. Borrelli, “Near-Optimal Rapid MPC
Using Neural Networks: A Primal-Dual Policy Learning Framework,
IEEE Transactions on Control Systems Technology, pp. 1–13, 2020.
[2]
S. Lucia and B. Karg, “A deep learning-based approach to
robust nonlinear model predictive control,IFAC-PapersOnLine,
vol. 51, no. 20, pp. 511 – 516, 2018, 6th IFAC Conference on
Nonlinear Model Predictive Control NMPC 2018. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S2405896318326958
[3]
S. Chen, K. Saulnier, N. Atanasov, D. D. Lee, V. Kumar, G. J.
Pappas, and M. Morari, “Approximating explicit model predictive control
using constrained neural networks,” in 2018 Annual American Control
Conference (ACC), 2018, pp. 1520–1527.
[4]
E. T. Maddalena, C. G. da S. Moraes, G. Waltrich, and C. N. Jones, “A
neural network architecture to learn explicit mpc controllers from data,
2019.
[5]
M. Hertneck, J. K
¨
ohler, S. Trimpe, and F. Allg
¨
ower, “Learning an
approximate model predictive controller with guarantees,IEEE Control
Systems Letters, vol. 2, no. 3, pp. 543–548, 2018.
[6]
S. Lucia, D. Navarro, B. Karg, H. Sarnago, and Luc
´
ıa, “Deep learning-
based model predictive control for resonant power converters,” IEEE
Transactions on Industrial Informatics, vol. 17, no. 1, pp. 409–420, 2021.
[7]
Y. Lohr, M. Klau
ˇ
co, M. Fikar, and M. M
¨
onnigmann, “Machine learning
assisted solutions of mixed integer mpc on embedded platforms,” in
Preprints of the 21st IFAC World Congress (Virtual), Berlin, Germany,
July 12-17, 2020, vol. 21, July 12-17, 2020 2020. [Online]. Available:
https://www.uiam.sk/assets/publication info.php?id pub=2192
[8]
J. Drgo
ˇ
na, D. Picard, M. Kvasnica, and L. Helsen, “Approximate
model predictive building control via machine learning,Applied
Energy, vol. 218, pp. 199 – 216, 2018. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0306261918302903
[9]
U. Rosolia, X. Zhang, and F. Borrelli, “Data-driven predictive control
for autonomous systems,” Annual Review of Control, Robotics, and
Autonomous Systems, vol. 1, no. 1, pp. 259–286, 2018. [Online].
Available: https://doi.org/10.1146/annurev-control- 060117-105215
[10]
Y. Lohr, M. Klau
ˇ
co, M. Kal
´
uz, and M. M
¨
onnigmann, “Mimicking
predictive control with neural networks in domestic heating systems,
in Proceedings of the 22nd International Conference on Process
Control, M. Fikar and M. Kvasnica, Eds., Slovak University of
Technology in Bratislava.
ˇ
Strbsk
´
e Pleso, Slovakia: Slovak Chemical
Library, June 11-14, 2019 2019, pp. 19–24. [Online]. Available:
https://www.uiam.sk/assets/publication info.php?id pub=2035
[11]
A. Bemporad, M. Morari, V. Dua, and E. N. Pistikopoulos,
“The explicit linear quadratic regulator for constrained systems,
Automatica, vol. 38, no. 1, pp. 3–20, 2002. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0005109801001741
[12]
D. Tavernini, M. Metzler, P. Gruber, and A. Sorniotti, “Explicit nonlinear
model predictive control for electric vehicle traction control,IEEE
Transactions on Control Systems Technology, vol. 27, no. 4, pp. 1438–
1451, 2019.
[13]
M. Kvasnica, B. Tak
´
acs, J. Holaza, and S. Di Cairano, “On region-free
explicit model predictive control,” in 54rd IEEE Conference on Decision
and Control, vol. 54, Osaka, Japan, December 15-18, 2015 2015, pp.
3669–3674.
[14]
M. Kvasnica and M. Fikar, “Clipping-based complexity reduction in
explicit mpc,” IEEE Transactions On Automatic Control, vol. 57, no. 7,
pp. 1878–1883, July 2012.
[15]
M. Kvasnica, P. Bakar
´
a
ˇ
c, and M. Klau
ˇ
co, “Complexity reduction
in explicit mpc: A reachability approach,” Systems & Control
Letters, vol. 124, pp. 19–26, 2019. [Online]. Available: https:
//www.uiam.sk/assets/publication info.php?id pub=1980
[16]
L. Ljung, System Identification: Theory for the User, 2nd edition.
Prentice-Hall, Upper Saddle River, NJ, 1999, p. 607.
[17]
K. Tohru, Subspace Methods for System Identification. Springer, 2005.
[18]
M. Kvasnica, “Implicit vs explicit mpc — similarities, differences, and
a path owards a unified method,” in 2016 European Control Conference
(ECC), 2016, pp. 603–603.
[19]
J. Drgona, A. Tuor, and D. Vrabie, “Constrained physics-informed deep
learning for stable system identification and control of unknown linear
systems,” vol. abs/2004.11184, 2020.
[20]
S. Br
¨
uggemann and C. Possieri, “On the use of difference of log-sum-exp
neural networks to solve data-driven model predictive control tracking
problems,” IEEE Control Systems Letters, vol. 5, no. 4, pp. 1267–1272,
2021.
[21]
B. Karg and S. Lucia, “Approximate moving horizon estimation and
robust nonlinear model predictive control via deep learning,Computers
Chemical Engineering, vol. 148, p. 107266, 2021. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0098135421000442
[22]
K. Ki
ˇ
s and M. Klau
ˇ
co, “Neural network based explicit mpc for chemical
reactor control,” Acta Chimica Slovaca, vol. 12, no. 2, pp. 218–223,
2019. [Online]. Available: https://www.uiam.sk/assets/publication info.
php?id pub=2115
[23]
I. Mordatch and E. Todorov, “Combining the benefits of function
approximation and trajectory optimization,” in In Robotics: Science
and Systems (RSS, 2014.
12
[24] T. Zhang, G. Kahn, S. Levine, and P. Abbeel, “Learning deep
control policies for autonomous aerial vehicles with MPC-guided
policy search,” CoRR, vol. abs/1509.06791, 2015. [Online]. Available:
http://arxiv.org/abs/1509.06791
[25] S. Chen, K. Saulnier, N. Atanasov, D. D. Lee, V. Kumar, G. J.
Pappas, and M. Morari, “Approximating explicit model predictive control
using constrained neural networks,” in 2018 Annual American Control
Conference (ACC), June 2018, pp. 1520–1527.
[26] P. L. Donti, M. Roderick, M. Fazlyab, and J. Z. Kolter, “Enforcing
robust control guarantees within neural network policies,” in The Ninth
International Conference on Learning Representations (ICLR), 2021.
[27] L. Hewing, K. P. Wabersich, M. Menner, and M. N. Zeilinger,
“Learning-based model predictive control: Toward safe learning in
control,” Annual Review of Control, Robotics, and Autonomous
Systems, vol. 3, no. 1, 2020. [Online]. Available:
https://doi.org/10.1146/annurev-control-090419-075625
[28] L. Hewing, J. Kabzan, and M. N. Zeilinger, “Cautious model predictive
control using Gaussian process regression,” IEEE Transactions on Control
Systems Technology, vol. 28, no. 6, pp. 2736–2743, 2020.
[29] I. Lenz, R. A. Knepper, and A. Saxena, “DeepMPC: Learning deep latent
features for model predictive control,” in Robotics: Science and Systems, 2015.
[30] K. Bieker, S. Peitz, S. L. Brunton, J. N. Kutz, and M. Dellnitz,
“Deep model predictive control with online learning for complex
physical systems,” CoRR, vol. abs/1905.10094, 2019. [Online]. Available:
http://arxiv.org/abs/1905.10094
[31] A. Broad, I. Abraham, T. D. Murphey, and B. D. Argall, “Structured
neural network dynamics for model-based control,” CoRR, vol.
abs/1808.01184, 2018. [Online]. Available: http://arxiv.org/abs/1808.01184
[32] Y. Chen, Y. Shi, and B. Zhang, “Optimal control via neural networks: A
convex approach,” 2018.
[33] Y. Li, J. Wu, J.-Y. Zhu, J. B. Tenenbaum, A. Torralba, and R. Tedrake,
“Propagation networks for model-based control under partial observation,”
in ICRA, 2019.
[34] Y.-C. Chang, N. Roohi, and S. Gao, “Neural Lyapunov control,” in
Advances in Neural Information Processing Systems 32, H. Wallach,
H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett,
Eds. Curran Associates, Inc., 2019, pp. 3245–3254. [Online]. Available:
http://papers.nips.cc/paper/8587-neural-lyapunov-control.pdf
[35] J. A. E. Andersson, J. Gillis, G. Horn, J. B. Rawlings, and
M. Diehl, “CasADi: a software framework for nonlinear optimization
and optimal control,” Mathematical Programming Computation,
vol. 11, no. 1, pp. 1–36, Mar 2019. [Online]. Available:
https://doi.org/10.1007/s12532-018-0139-4
[36] B. Amos, L. Xu, and J. Z. Kolter, “Input convex neural
networks,” CoRR, vol. abs/1609.07152, 2016. [Online]. Available:
http://arxiv.org/abs/1609.07152
[37] F. de Avila Belbute-Peres, K. Smith, K. Allen, J. Tenenbaum, and J. Z.
Kolter, “End-to-end differentiable physics for learning and control,” in
Advances in Neural Information Processing Systems 31, S. Bengio,
H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett,
Eds. Curran Associates, Inc., 2018, pp. 7178–7189.
[38] J. Degrave, M. Hermans, J. Dambre, and F. Wyffels, “A differentiable
physics engine for deep learning in robotics,” CoRR, vol. abs/1611.01652,
2016. [Online]. Available: http://arxiv.org/abs/1611.01652
[39] B. Amos, I. D. J. Rodriguez, J. Sacks, B. Boots, and J. Z. Kolter,
“Differentiable MPC for end-to-end planning and control,” CoRR, vol.
abs/1810.13400, 2018. [Online]. Available: http://arxiv.org/abs/1810.13400
[40] T. Yang, “Advancing non-convex and constrained learning: Challenges
and opportunities,” AI Matters, vol. 5, no. 3, pp. 29–39, Dec. 2019.
[Online]. Available: https://doi.org/10.1145/3362077.3362085
[41] D. Pathak, P. Krähenbühl, and T. Darrell, “Constrained convolutional
neural networks for weakly supervised segmentation,” CoRR, vol.
abs/1506.03648, 2015. [Online]. Available: http://arxiv.org/abs/1506.03648
[42] Z. Jia, X. Huang, E. I. Chang, and Y. Xu, “Constrained deep weak
supervision for histopathology image segmentation,” IEEE Transactions
on Medical Imaging, vol. 36, no. 11, pp. 2376–2388, 2017.
[43] C. K. Goh, Y. Liu, and A. W. K. Kong, “A constrained deep neural
network for ordinal regression,” in 2018 IEEE/CVF Conference on
Computer Vision and Pattern Recognition, 2018, pp. 831–839.
[44] P. Márquez-Neila, M. Salzmann, and P. Fua, “Imposing hard constraints
on deep networks: Promises and limitations,” CoRR, vol. abs/1706.02025,
2017. [Online]. Available: http://arxiv.org/abs/1706.02025
[45] H. Kervadec, J. Dolz, J. Yuan, C. Desrosiers, E. Granger, and I. B.
Ayed, “Log-barrier constrained CNNs,” CoRR, vol. abs/1904.04205, 2019.
[Online]. Available: http://arxiv.org/abs/1904.04205
[46] Y. Liu, C. Su, H. Li, and R. Lu, “Barrier function-based adaptive
control for uncertain strict-feedback systems within predefined neural
network approximation sets,” IEEE Transactions on Neural Networks
and Learning Systems, vol. 31, no. 8, pp. 2942–2954, 2020.
[47] K. Zhao and J. Chen, “Adaptive neural quantized control of MIMO
nonlinear systems under actuation faults and time-varying output constraints,”
IEEE Transactions on Neural Networks and Learning Systems, vol. 31,
no. 9, pp. 3471–3481, 2020.
[48] J. Hendriks, C. Jidling, A. Wills, and T. Schön, “Linearly constrained
neural networks,” Submitted to IEEE Transactions on Neural Networks
and Learning Systems, 2020.
[49] S. Greydanus, M. Dzamba, and J. Yosinski, “Hamiltonian neural
networks,” CoRR, vol. abs/1906.01563, 2019. [Online]. Available:
http://arxiv.org/abs/1906.01563
[50] M. Lutter, C. Ritter, and J. Peters, “Deep Lagrangian networks: Using
physics as model prior for deep learning,” CoRR, vol. abs/1907.04490,
2019. [Online]. Available: http://arxiv.org/abs/1907.04490
[51] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert,
“Constrained model predictive control: Stability and optimality,”
Automatica, vol. 36, no. 6, pp. 789–814, 2000. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0005109899002149
[52] G. Pannocchia, “Robust disturbance modeling for model predictive control
with application to multivariable ill-conditioned processes,” Journal of
Process Control, vol. 13, no. 8, pp. 693–701, 2003. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0959152402001348
[53] J. Löfberg, “YALMIP: A Toolbox for Modeling and Optimization in
MATLAB,” in Proc. of the CACSD Conference, Taipei, Taiwan, 2004,
available from http://users.isy.liu.se/johanl/yalmip/.
[54] F. Borrelli, A. Bemporad, and M. Morari, Predictive Control for Linear
and Hybrid Systems. Cambridge University Press, 2017. [Online].
Available: https://books.google.de/books?id=cdQoDwAAQBAJ
[55] M. Herceg, M. Kvasnica, C. Jones, and M. Morari, “Multi-parametric
toolbox 3.0,” in 2013 European Control Conference, Zurich, Switzerland,
2013, pp. 502–510.
[56] B. Takács, J. Števek, R. Valo, and M. Kvasnica, “Python code
generation for explicit MPC in MPT,” in European Control Conference
2016, Aalborg, Denmark, 2016, pp. 1328–1333. [Online]. Available:
https://www.uiam.sk/assets/publication_info.php?id_pub=1737
[57] E. Skomski, S. Vasisht, C. Wight, A. Tuor, J. Drgona, and D. Vrabie,
“Constrained block nonlinear neural dynamical models,” arXiv preprint
arXiv:2101.01864, 2021.
[58] R. G. Krishnan, U. Shalit, and D. Sontag, “Structured inference networks
for nonlinear state space models,” in AAAI’17: Proceedings of the Thirty-
First AAAI Conference on Artificial Intelligence, 2017.
[59] D. Hafner, T. P. Lillicrap, I. Fischer, R. Villegas, D. Ha, H. Lee,
and J. Davidson, “Learning latent dynamics for planning from
pixels,” CoRR, vol. abs/1811.04551, 2018. [Online]. Available:
http://arxiv.org/abs/1811.04551
[60] O. P. Ogunmolu, X. Gu, S. B. Jiang, and N. R. Gans, “Nonlinear
systems identification using deep dynamic neural networks,” CoRR,
vol. abs/1610.01439, 2016. [Online]. Available: http://arxiv.org/abs/1610.01439
[61] S. S. Rangapuram, M. W. Seeger, J. Gasthaus, L. Stella, Y. Wang, and
T. Januschowski, “Deep state space models for time series forecasting,”
in Advances in Neural Information Processing Systems 31, S. Bengio,
H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett,
Eds. Curran Associates, Inc., 2018, pp. 7785–7794.
[62] J.-S. Wang and Y.-C. Chen, “A Hammerstein–Wiener recurrent
neural network with universal approximation capability,” in 2008 IEEE
International Conference on Systems, Man and Cybernetics, Oct 2008,
pp. 1832–1837.
[63] D. Masti and A. Bemporad, “Learning nonlinear state-space models
using deep autoencoders,” in 2018 IEEE Conference on Decision and
Control (CDC), 2018, pp. 3862–3867.
[64] A. Tuor, J. Drgona, and D. Vrabie, “Constrained neural ordinary
differential equations with stability guarantees,” 2020.
[65] J. Schoukens and L. Ljung, “Nonlinear system identification: A
user-oriented roadmap,” CoRR, vol. abs/1902.00683, 2019. [Online].
Available: http://arxiv.org/abs/1902.00683
[66] P. Schmid, “Dynamic mode decomposition of numerical and experimental
data,” Journal of Fluid Mechanics, vol. 656, pp. 5–28, 2010.
[67] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan,
T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., “PyTorch: An
imperative style, high-performance deep learning library,” in Advances
in Neural Information Processing Systems, 2019, pp. 8024–8035.
[68] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,”
arXiv preprint arXiv:1412.6980, 2014.
[69] D. Hendrycks and K. Gimpel, “Bridging nonlinearities and stochastic
regularizers with Gaussian error linear units,” CoRR, vol. abs/1606.08415,
2016. [Online]. Available: http://arxiv.org/abs/1606.08415
[70] R. Oberdieck, N. Diangelakis, I. Nascu, M. Papathanasiou, M. Sun,
S. Avraamidou, and E. Pistikopoulos, “On multi-parametric programming
and its applications in process systems engineering,” Chemical
Engineering Research and Design, 2016.
[71] S. L. Brunton, J. L. Proctor, and J. N. Kutz, “Discovering
governing equations from data by sparse identification of nonlinear
dynamical systems,” Proceedings of the National Academy of
Sciences, vol. 113, no. 15, pp. 3932–3937, 2016. [Online]. Available:
https://www.pnas.org/content/113/15/3932
[72] M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Multistep neural
networks for data-driven discovery of nonlinear dynamical systems,”
arXiv preprint arXiv:1801.01236, 2018.
[73] A. Sanchez-Gonzalez, J. Godwin, T. Pfaff, R. Ying, J. Leskovec, and P. W.
Battaglia, “Learning to simulate complex physics with graph networks,”
in ICML, 2020.