EMBEDDING EXPLICIT SMOOTHNESS CONSTRAINTS IN
DATA-DRIVEN TURBULENCE MODELS
H. Mandler¹ and B. Weigand¹
¹Institute of Aerospace Thermodynamics, University of Stuttgart
hannes.mandler@itlr.uni-stuttgart.de
Abstract
This paper is concerned with an innovative regularization strategy for data-driven turbulence models. By enforcing explicit constraints on the Lipschitz continuity of the neural networks (NNs) which represent variable coefficients of the closure, their sensitivity with respect to input perturbations can be significantly reduced. This can be implemented efficiently by bounding the spectral norm of the NNs' weight matrices. Furthermore, the influence of different levels of Lipschitz continuity on the mean flow field prediction is illustrated for a two-dimensional channel flow with periodic hills. It is demonstrated that the Reynolds stress tensor prediction becomes smoother as the Lipschitz constants of the NNs decrease, which facilitates the stability and accuracy of the flow field solution.
1 Introduction
Due to the model form uncertainty associated with the Boussinesq hypothesis and the assumption of constant closure coefficients (Xiao and Cinnella, 2019), which are the foundation of most turbulence models of practical relevance, data-driven models have recently attracted wide attention. The vast majority of these approaches is based on a nonlinear closure equation
2b = τ_t/k + (2/3) I = Σ_{n=1}^{N≤10} g_n(q) T_n,    (1)

which expresses the non-dimensional anisotropy tensor b as a linear combination of N base tensors T_n. In Pope's (1975) original formulation, these are combinations of the mean strain and rotation rate tensors, S̃ = τ S̄ and Ω̃ = τ Ω̄, respectively, which are non-dimensionalized by the turbulent time scale τ. The turbulent kinetic energy (TKE), which is related to the trace of the Reynolds stress tensor (RST) τ_t, is denoted by k.
Recently, methods from the emerging field of machine learning have been adopted in order to systematically infer the variability of the closure coefficients g_n, which may be functions of the local flow state. The straightforward procedure to obtain these functions can be broken down into two steps (Duraisamy, 2021): Firstly, the spatial distributions of the independent variables q(x) and the optimal closure coefficients g_n^op(x) are extracted from high-fidelity simulations of representative flows. Secondly, a set of regression problems can be posed opting for functions

h_n : q(x) ↦ g_n^op(x)  ∀ n,    (2)

which are commonly referred to as hypotheses and generalize the relationship between local flow quantities and optimal closure coefficients.
A considerable number of data-driven closures relies on NNs as representation for the hypotheses h_n, e.g. those proposed by Ling et al. (2016), Geneva and Zabaras (2019), Jiang et al. (2021) as well as Mandler and Weigand (2022b). Although NNs may approximate any non-linear relationship to arbitrary accuracy (Hornik, 1991) and are suitable for deep learning applications, they are also prone to overfitting and adversarial attacks (Rosca et al., 2020; Akhtar and Mian, 2018). Therefore, even slight input perturbations may yield oscillatory or even anomalous predictions, which would propagate into the momentum balance via the divergence of the non-dimensional anisotropy tensor
∇·b = Σ_{n=1}^{N} ( g_n ∇·T_n + T_n^T · ∇q · (∂g_n/∂q)^T )    (3)

and thus decrease the stability and accuracy of its solution. This effect may even be amplified by the Jacobians of the hypotheses J_n = ∂g_n/∂q.
Data-driven turbulence models accordingly face the bias-variance trade-off between exploiting the full potential for model enhancement and the stability of the solver, which is mainly controlled by the sensitivity of the hypotheses w.r.t. their inputs and can be adjusted by regularization techniques. The input sensitivity can be expressed in terms of the Lipschitz constant K_n, which is defined by

‖h_n(q₂) − h_n(q₁)‖₂ ≤ K_n ‖q₂ − q₁‖₂  ∀ q₁, q₂,    (4)

and provides an upper bound for the maximum magnitude of the hypothesis' Jacobian, i.e. ‖J_n‖₂ ≤ K_n.
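As a small illustration (not taken from the paper), the bound of Eq. (4) can be checked numerically for a toy two-layer ReLU network: sampled difference quotients never exceed the product of the weight matrices' spectral norms, which is the conservative Lipschitz estimate used later in Eq. (5). All sizes and names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hypothesis h(q) = W2 @ relu(W1 @ q); shapes are arbitrary choices.
W1 = rng.normal(size=(6, 5))
W2 = rng.normal(size=(1, 6))

def h(q):
    return W2 @ np.maximum(W1 @ q, 0.0)

# Conservative Lipschitz bound: product of the largest singular values,
# valid because the ReLU slope does not exceed unity.
K_bound = np.linalg.norm(W1, 2) * np.linalg.norm(W2, 2)

# Empirical check of Eq. (4): difference quotients stay below the bound.
ratios = []
for _ in range(1000):
    q1, q2 = rng.normal(size=5), rng.normal(size=5)
    ratios.append(np.linalg.norm(h(q2) - h(q1)) / np.linalg.norm(q2 - q1))

assert max(ratios) <= K_bound
```

The gap between max(ratios) and K_bound illustrates why this estimate is conservative; tighter estimators are cited in the text.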
The most common regularization method is called weight decay (WD). It penalizes large magnitudes of the weights by adding another term to the cost function. There are plenty of alternative strategies, e.g. gathering more data, pruning, early stopping, dropout and noise injection (Goodfellow et al., 2016).
Even though these methods indeed reduce the hypotheses' input sensitivity, it is uncertain to what extent, as the methods' hyperparameters are only vaguely linked to the hypotheses' Lipschitz constants. Hence, Yoshida and Miyato (2017) as well as Usama and Chang (2019) proposed to modify the cost function such that hypotheses with low Lipschitz constants are incentivized. Because this soft constraint still does not guarantee a certain value of the Lipschitz constant, Miyato et al. (2018) suggested spectral normalization of the weight matrices in order to enforce K = 1. Gouk et al. (2021) extended the latter concept to arbitrary Lipschitz constants by means of a projection method. The aforementioned concepts rely on accurate estimates of the Lipschitz constant for a given NN. While the present work is based on a straightforward and conservative estimate (Neyshabur, 2017), more accurate methods are described by Scaman and Virmaux (2018) and Pauli et al. (2022).
This is the first attempt to embed explicit smoothness constraints into a data-driven turbulence model. In the present work, it will be demonstrated that this indeed yields spatially smoother flow field predictions compared to models regularized by WD.
This paper is organized as follows. First, the Lipschitz continuity control (LCC) method is described. After briefly reviewing the neuralSST model as an example of a data-driven closure, the training and test cases are presented. Finally, a selection of a priori and a posteriori results is thoroughly discussed.
2 Lipschitz continuity control of NNs
The Lipschitz constant of a feed-forward NN using an activation function whose slope does not exceed unity is bounded by the product of the spectral norms of all L hidden as well as the output layer's weight matrices. Since the spectral norm of the l-th weight matrix is given by its largest singular value max σ(l), an upper bound for the Lipschitz constant of the entire NN reads (Gouk et al., 2021)

K ≤ Π_{l=1}^{L+1} max σ(l).    (5)

If all hidden layers are supposed to contribute equally to the variability of the hypothesis, rearranging Eq. (5) yields an upper bound for the maximum singular value of each weight matrix (Gouk et al., 2021):

max σ(l) ≤ K^{1/(L+1)}  ∀ l ≤ L+1.    (6)

To enforce the constraint given by Eq. (6), the updated weight matrices returned by the optimizer at the end of each epoch are normalized (Gouk et al., 2021):

W(l) ← W(l) / max( 1, max σ(l) / K^{1/(L+1)} )  ∀ l ≤ L+1.    (7)
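A minimal NumPy sketch of the projection step of Eqs. (5)-(7), assuming the weight matrices are available as plain arrays; the function names and the use of power iteration (rather than a full SVD) are implementation choices, not prescribed by the paper.

```python
import numpy as np

def spectral_norm(W, n_iter=50):
    """Estimate the largest singular value of W by power iteration."""
    v = np.random.default_rng(0).normal(size=W.shape[1])
    for _ in range(n_iter):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ W @ v)

def project_weights(weights, K):
    """Rescale each of the L+1 weight matrices so that Eq. (6) holds,
    i.e. max sigma(l) <= K**(1/(L+1)), following the update rule of Eq. (7)."""
    layer_bound = K ** (1.0 / len(weights))
    projected = []
    for W in weights:
        scale = max(1.0, spectral_norm(W) / layer_bound)
        projected.append(W / scale)
    return projected
```

Applied after each optimizer epoch, this keeps the product of spectral norms, and hence the Lipschitz bound of Eq. (5), at or below the prescribed K.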
3 Data-driven turbulence model
This section provides a concise review of the neu-
ralSST model of Mandler and Weigand (2022b).
Implementation into the RANS solver
All modifications of the closure equation w.r.t. the underlying SST model (Menter et al., 2003) are grouped in a single correction term. The non-dimensional anisotropy tensor, therefore, reads

b = (ν_t^SST/k) S̄ − R( −b_ML + (ν_t^SST/k) S̄ ),    (8)

where its data-driven prediction

2 b_ML = g₁ S̃ + g₂ (S̃Ω̃ − Ω̃S̃) + g₃ (S̃² − (1/3) tr{S̃²} I)    (9)

is subject to (s.t.) a barycentric realizability correction R. By evaluating the hypotheses and limiting their predictions, the closure coefficients

g_n = max( g_n^min, min[ h_n(q), g_n^max ] )    (10)

can be obtained. The corresponding limits are set such that g₁ ∈ [0.1, 0.4], g₂ ∈ [0.0, 0.3], g₃ ∈ [−0.4, 0.0] and g₄ ∈ [−1.0, 1.5].
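The limiter of Eq. (10) with the bounds stated above amounts to an elementwise clip; a minimal sketch (array layout and names are illustrative assumptions):

```python
import numpy as np

# Coefficient limits for g1..g4 as stated in the text.
G_MIN = np.array([0.1, 0.0, -0.4, -1.0])
G_MAX = np.array([0.4, 0.3, 0.0, 1.5])

def limit_coefficients(h_pred):
    """Apply Eq. (10): clip each hypothesis prediction h_n(q) to [g_min, g_max]."""
    return np.clip(h_pred, G_MIN, G_MAX)
```

For example, raw predictions [1.0, -0.5, 0.5, 2.0] are limited to [0.4, 0.0, 0.0, 1.5].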
In order to comply with the principle of turbulent scale consistency (Ling and Templeton, 2015), the scale equations

Dk/Dt = P*_k − β* k ω + ∇·[ (ν + σ_k ν_t^SST) ∇k ]    (11)

Dω/Dt = γ P*_k / ν_t^SST − β ω² + CD + ∇·[ (ν + σ_ω ν_t^SST) ∇ω ]    (12)

are augmented by another data-driven coefficient g₄ (Schmelzer et al., 2020; Mandler and Weigand, 2022a), which can be incorporated into the TKE production term

P*_k = min( τ_t : S̄, 10 β* k ω ) + g₄ k/τ.    (13)

The constants β, β*, σ_k, σ_ω, σ_ω2 and γ as well as the cross-diffusion term CD in Eqs. (11) and (12) are adopted from Menter et al. (2003). Likewise, ν_t^SST denotes the eddy viscosity predicted by the original SST model's closure. Finally, the turbulent time scale reads (Menter et al., 2012)

τ = (1/(β* ω)) max( 1.0, 6.0 √(β* ν ω / k) ).    (14)
The optimal coefficient distributions
The optimal spatial distributions of the closure coefficients g_n^op(x) can be efficiently obtained from the high-fidelity reference solution by a series of consecutive tensor projections:

g_n^op = ( 2b − Σ_{m=1}^{n−1} g_m^op T_m ) : T_n / ‖T_n‖²  ∀ n ≤ N.    (15)

Figure 1: Geometry of the domain and boundary conditions for the simulation of the flow through a two-dimensional channel with periodic hills (Mandler and Weigand, 2022b)

In order to determine g_4^op, Eqs. (11) and (12) are solved given the high-fidelity solution for the mean flow field, the RST and the TKE. This requires a precursor RANS simulation which solves only the scale equations but neither the continuity nor the momentum equations (Schmelzer et al., 2020).
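The consecutive projections of Eq. (15) can be sketched as follows, assuming the anisotropy tensor and base tensors are stored as 3x3 arrays and ":" denotes the Frobenius inner product; this is an illustrative reading of the formula, not the authors' implementation.

```python
import numpy as np

def optimal_coefficients(b, T):
    """Consecutive tensor projections per Eq. (15): at each step the residual
    of 2b, after removing the already-fitted terms, is projected onto T_n.
    b and each T[n] are 3x3 arrays; np.tensordot with default axes=2
    computes the Frobenius inner product ':'."""
    residual = 2.0 * b
    g = []
    for Tn in T:
        gn = np.tensordot(residual, Tn) / np.tensordot(Tn, Tn)
        g.append(float(gn))
        residual = residual - gn * Tn
    return g
```

For mutually orthogonal base tensors (as in Pope's basis for 2D mean flows) the subtraction of earlier terms has no effect, but the sequential form also handles non-orthogonal bases.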
Inference of the coefficient variability
The optimal coefficients' variability is inferred by solving four regression problems defined by Eq. (2), where the hypotheses h_n are represented by NNs. These consist of six hidden layers with six neurons each. A rectified linear unit (ReLU) activation function is applied to all hidden neurons, whereas the output neuron behaves linearly. By means of the ADAM algorithm (Kingma and Ba, 2014), the mean squared error cost function is minimized over 20,000 epochs at a learning rate of 0.002. In the present work, WD with a regularization constant of 0.1 is replaced by LCC.
The raw features q are listed in Table 1. They contain the wall-distance d and the blending argument arg2, which are both provided by the underlying SST model. In order to facilitate the training process, a z-score normalization is applied to the raw features q, which ensures that the final features q have zero mean and unit variance.
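The z-score normalization mentioned above is a standard preprocessing step; a minimal sketch, assuming the raw features are stacked row-wise in an array (in practice the statistics would be computed on the training set and reused at inference):

```python
import numpy as np

def z_score(q_raw):
    """Z-score normalization: zero mean and unit variance per feature column.
    Returns the normalized features together with the statistics so the same
    transform can be reapplied to unseen data."""
    mean = q_raw.mean(axis=0)
    std = q_raw.std(axis=0)
    return (q_raw - mean) / std, mean, std
```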
4 Training and test cases
As this paper is concerned with the influence of different regularization strategies on data-driven turbulence models and not with extending their range of application, a common training and test case is considered. The two-dimensional flow over periodic hills is characterized by a repeating pattern of flow separation and reattachment. Its geometry is depicted in Fig. 1.
The size of the recirculation zone is mainly governed by the Reynolds number and the average hill inclination. While the model is only informed with the DNS data of one particular combination of these parameters, the a priori test cases cover a variation in the average hill inclination θ and the a posteriori test case features a significantly higher bulk Reynolds number Re_b^Dh (based on the hydraulic diameter), see Tab. 2.

Figure 2: RMSE of the predictions of the four hypotheses h_n on the training and test sets as a function of their Lipschitz constants. The optimal a priori and a posteriori Lipschitz constants are highlighted by squares and circles, respectively.
5 Results and discussion
Selection of an optimal Lipschitz constant
The Lipschitz constants can be regarded as hyperparameters of the hypotheses. In supervised learning problems, hyperparameters are typically chosen such that the test error is minimized. Thus, Fig. 2 depicts the root mean squared error (RMSE) ε_n for all four hypotheses h_n evaluated on the training and test sets as a function of the Lipschitz constant. While the training error continuously increases as the Lipschitz constant decreases, the test error curves for all but the second hypothesis exhibit a global minimum at an intermediate Lipschitz constant. This is a typical observation for a regularization parameter variation. This global minimum then defines the optimal a priori Lipschitz constant for each hypothesis h_n,

K_n^{LCC,prior} = arg min_K ε_n^test,    (16)

which is highlighted by a square in Fig. 2. The ideal regularization strength obviously differs between the hypotheses. While the first hypothesis requires strong regularization, the results suggest practically dispensing with regularization for the second hypothesis.
This a priori selection, however, does not take into
account how the four hypotheses interact with each
other and with the solver. It may, therefore, not lead to
Table 1: Definition of the raw features q using the squashing functions N1(x) = tanh(x/2) (Geneva and Zabaras, 2019) and N2(x) = x/(|x| + 1) (Ling and Templeton, 2015)

Raw feature q_i                | Physical interpretation
N2(τ ‖S‖)                      | Ratio of turbulent and mean strain time scale
N2(τ ‖Ω‖)                      | Ratio of turbulent and mean rotation time scale
N2(τ k^{-0.5} |∇k|)            | Ratio of turbulent length scale and TKE gradient decay length
min(0.02 k^{0.5} d ν^{-1}, 2)  | Wall-distance based Reynolds number
N1(min[arg2, 10])              | Blending function
Table 2: Relevant non-dimensional parameters for the training and test data sets

Usage                | θ [°]        | Re_b^Dh [−] | Source
training             | 27.4         | 22,803      | Xiao et al. (2020)
a priori testing     | [23.4, 32.9] | 22,803      | Xiao et al. (2020)
a posteriori testing | 27.4         | 150,664     | Rapp and Manhart (2011)
the best possible agreement with the high-fidelity flow solution. The brute-force a posteriori selection procedure, i.e. testing all possible combinations of independently regularized hypotheses in the solver, would in general be far too expensive. Hence, starting from the optimal a priori selection, the Lipschitz constants of each hypothesis have been independently reduced until the flow solution deteriorated significantly. The optimal a posteriori Lipschitz constants K_n^{LCC,post} for each hypothesis are highlighted by circles in Fig. 2.
In order to prove the validity of the two variations of the neuralSST model, which are based on the optimal a priori and a posteriori Lipschitz constants, respectively, the corresponding mean flow field predictions for the training case are illustrated in Fig. 3. In fact, both models yield almost identical predictions, and the significantly stronger regularization does not sacrifice the model's accuracy, which is measured based on the agreement with the DNS solution. In addition, there is no visible difference in the predictions of the neuralSST models which were s.t. WD and LCC.
Smoothness of the a posteriori coefficient field
As detailed above, the LCC regularization approach bounds the norm of the hypotheses' Jacobians and, by virtue of the chain rule of differentiation, yields smoother spatial distributions of the coefficients. This is verified for the predictions of the first closure coefficient g1 by the two neuralSST model variations, which are K_n^{LCC,prior}- and K_n^{LCC,post}-Lipschitz, respectively, see Fig. 4. While the model based on the optimal a priori Lipschitz constants better captures the true, physical gradients, it also suffers from spatial oscillations in the prediction, in particular above the hill crest and in the free stream where the velocity gradients almost vanish. On the contrary, the model based on the optimal a posteriori Lipschitz constants yields a very smooth prediction at the cost of not being able to resolve the steepest physical gradients. This side-effect of the strong regularization is not critical, though, because the mean flow solution for the training case is the same.
The original neuralSST model, whose hypotheses were s.t. WD, also leads to a smooth field for g1, as shown in Fig. 4. By promoting smaller absolute values of the weights, WD also reduces the spectral norm of the weight matrices and in turn the NN's Lipschitz constant. Evaluating Eq. (5) in fact yields that the WD-regularized hypothesis h1 is at most 0.1-Lipschitz. The Lipschitz constants of the remaining hypotheses which were s.t. WD are of the same order of magnitude. This indicates that the regularization due to the WD applied by Mandler and Weigand (2022b) is stronger than enforcing the optimal a posteriori Lipschitz constants. This is in contrast to the intuition gained from visual inspection of the coefficient fields shown in Fig. 4.
Smoothness of the a posteriori RST field
A previous study (Mandler and Weigand, 2022b) revealed that the WD-based neuralSST model drastically overpredicts the main normal RST component close to the upper wall. As illustrated in Fig. 5, this is not the case if the NNs are s.t. LCC. This overshoot can consequently be attributed to this particular combination of hyperparameters, including but not limited to the network size and the regularization strength. Furthermore, the baseline model predicts strong oscillations on top of the hill crest, which further propagate in the streamwise direction, but enforcing the optimal a posteriori Lipschitz constants by means of LCC seems to prevent these issues. In light of the fact that the WD-based model has the smallest upper bounds for the Lipschitz constants, these observations are surprising. However, they are consistent with the spatial smoothness of the coefficient fields depicted in Fig. 4.
Even though these spatial RST oscillations in the
prediction of the WD-based model do not manifest
Figure 3: Mean axial velocity profiles in a two-dimensional channel with periodic hills at Re_b^Dh = 22,803 predicted by the SST and variations of the neuralSST model in comparison with DNS results
(a) neuralSST prediction s.t. WD regularization
(b) neuralSST prediction s.t. K_n^{LCC,prior} ∀n
(c) neuralSST prediction s.t. K_n^{LCC,post} ∀n
Figure 4: Spatial distributions of the closure coefficient g1 for the training case predicted by variations of the neuralSST model
themselves in a significant deterioration of the mean
axial velocity profiles, they hinder convergence. The
residuals of the momentum and pressure equations are consistently one order of magnitude higher than for the model which is s.t. LCC with the optimal a posteriori Lipschitz constants.
6 Conclusion
Data-driven turbulence models which rely on NNs to predict variable closure coefficients may suffer from spatially oscillating RST and flow field predictions. In order to prevent these phenomena, regularization techniques are applied during the training process of the NNs. In the present work, LCC was proposed as an alternative to the common WD. It allows explicit upper bounds for the Lipschitz constants of the NNs to be enforced. Besides providing a theoretical guarantee for a black-box model, the practical advantages of reasonably tightening these bounds were demonstrated. As the Lipschitz constants of the NNs decrease, the predicted coefficient fields become smoother, which facilitates both the stability and accuracy of the mean flow solution. These findings suggest that Lipschitz-continuity-based regularization techniques are very well suited for NNs serving as sub-models in partial differential equations. A particularly fruitful application, which will be investigated in the future, may be the combination of an augmented turbulence model with sophisticated scalar-flux models utilizing the entire RST rather than assuming a constant turbulent Prandtl number.
Acknowledgments
The investigations were conducted as part of the
joint research programme Roboflex (AG Turbo 2019)
in the frame of AG Turbo. The work was supported by the Bundesministerium für Wirtschaft und Energie (BMWE) under grant number 03EE5013C. The authors gratefully acknowledge AG Turbo and MTU Aero Engines AG for their support and the permission to publish this paper.
References
Akhtar, N. and Mian, A. (2018), Threat of adversarial at-
tacks on deep learning in computer vision: A survey, IEEE
Access, Vol. 6, pp. 14410-14430.
Duraisamy, K. (2021), Perspectives on machine learning-augmented Reynolds-averaged and large eddy simulation models of turbulence, Phys. Rev. Fluids, Vol. 6, pp. 050504.
Geneva, N. and Zabaras, N. (2019), Quantifying model form uncertainty in Reynolds-averaged turbulence models with Bayesian deep neural networks, J. Comp. Phys., Vol. 383, pp. 125-147.
Figure 5: Main RST component profiles in a two-dimensional channel with periodic hills at Re_b^Dh = 150,664 predicted by the SST and variations of the neuralSST model in comparison with measurements
Goodfellow, I., Bengio, Y., and Courville, A. (2016), Deep
Learning, MIT Press, Cambridge, MA, USA.
Gouk, H., Frank, E., Pfahringer, B., and Cree, M. J. (2021),
Regularisation of neural networks by enforcing Lipschitz
continuity, Mach. Learn., Vol. 110(2), pp. 393-416.
Hornik, K. (1991), Approximation capabilities of multilayer
feedforward networks, Neural Networks, Vol. 4(2), pp. 251-
257.
Jiang, C., Vinuesa, R., Chen, R., Mi, J., Laima, S., and Li,
H. (2021), An interpretable framework of data-driven turbu-
lence modeling using deep neural networks, Phys. Fluids,
Vol. 33(5), pp. 055133.
Kingma, D. P. and Ba, J. (2014), Adam: A method for
stochastic optimization, arXiv:1412.6980.
Ling, J., Kurzawski, A., and Templeton, J. (2016), Reynolds
averaged turbulence modelling using deep neural networks
with embedded invariance, J. Fluid Mech., Vol. 807, pp. 155-
166.
Ling, J. and Templeton, J. (2015), Evaluation of ma-
chine learning algorithms for prediction of regions of high
Reynolds averaged Navier Stokes uncertainty, Phys. Fluids,
Vol. 27(8), pp. 085103.
Mandler, H. and Weigand, B. (2022a), On frozen-RANS ap-
proaches in data-driven turbulence modeling: Practical rele-
vance of turbulent scale consistency during closure inference
and application, Int. J. Heat Fluid Flow, Vol. 97, pp. 109017.
Mandler, H. and Weigand, B. (2022b), A realizable and
scale-consistent data-driven non-linear eddy-viscosity mod-
eling framework for arbitrary regression algorithms, Int. J.
Heat Fluid Flow, Vol. 97, pp. 109018.
Menter, F. R., Garbaruk, A. V., and Egorov, Y. (2012), Ex-
plicit algebraic Reynolds stress models for anisotropic wall-
bounded flows, In EUCASS Proc. Ser. - Adv. Aerosp. Sci.,
Vol. 3, pp. 89-104.
Menter, F. R., Kuntz, M., and Langtry, R. (2003), Ten years
of industrial experience with the SST turbulence model, In
Hanjalic, K., Nagano, Y., and Tummers, M. J. (eds.), Turb
Heat Mass Transfer, Vol. 4, pp. 625-632.
Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y.
(2018), Spectral normalization for generative adversarial
networks, arXiv:1802.05957.
Neyshabur, B. (2017), Implicit Regularization in Deep
Learning, PhD thesis, Toyota Technical Institute, Chicago,
IL, USA.
Pauli, P., Koch, A., Berberich, J., Kohler, P., and Allgöwer, F. (2022), Training robust neural networks using Lipschitz bounds, IEEE Control Syst. Lett., Vol. 6, pp. 121-126.
Pope, S. B. (1975), A more general effective-viscosity hy-
pothesis, J. Fluid Mech., Vol. 72(2), pp. 331-340.
Rapp, C. and Manhart, M. (2011), Flow over periodic hills:
an experimental study, Exp. Fluids, Vol. 51(1), pp. 247-269.
Rosca, M., Weber, T., Gretton, A., and Mohamed, S. (2020),
A case for new neural network smoothness constraints, In
Zosa Forde, J., Ruiz, F., Pradier, M. F., and Schein, A. (eds.),
Proc. ”I Can’t Believe It’s Not Better!” NeurIPS Workshop,
Vol. 137 of Proc. Mach. Learn. Res., pp. 21-32.
Scaman, K. and Virmaux, A. (2018), Lipschitz regularity of
deep neural networks: Analysis and efficient estimation, In
Proc. 32nd Int. Conf. Neural Inf. Proc. Syst., pp. 3839-3848.
Schmelzer, M., Dwight, R. P., and Cinnella, P. (2020), Dis-
covery of algebraic Reynolds-stress models using sparse
symbolic regression, Flow Turbul. Combust., Vol. 104,
pp. 579-603.
Usama, M. and Chang, D. E. (2019), Towards robust neural
networks with Lipschitz continuity, In Yoo, C. D., Shi, Y.-Q.,
Kim, H. J., Piva, A., and Kim, G. (eds.), Digital Forensics
and Watermarking, Vol. 11378 of Lecture Notes in Computer
Science, pp. 373-389. Springer International Publishing.
Xiao, H. and Cinnella, P. (2019), Quantification of model
uncertainty in RANS simulations: A review, Prog. Aerosp.
Sci., Vol. 108, pp. 1-31.
Xiao, H., Wu, J.-L., Laizet, S., and Duan, L. (2020), Flows
over periodic hills of parameterized geometries: A dataset
for data-driven turbulence modeling from direct simulations,
Comp. Fluids, Vol. 200, pp. 104431.
Yoshida, Y. and Miyato, T. (2017), Spectral norm regular-
ization for improving the generalizability of deep learning,
arXiv:1705.10941.