Non-deterministic Model Validation Methodology for Simulation-based Safety Assessment of Automated Vehicles
Stefan Riedmaier^a,1,*, Jakob Schneider^a, Benedikt Danquah^a, Bernhard Schick^b, Frank Diermeyer^a
aTechnical University of Munich, Institute of Automotive Technology, Boltzmannstr. 15, 85748 Garching b. München, Germany
bKempten University of Applied Sciences, Bahnhofstr. 61, 87435 Kempten, Germany
Abstract
Safeguarding and type approval of automated vehicles is a key enabler for their market launch in our complex traffic environment. Scenario-based testing by means of computer simulation is becoming increasingly important to cope with the enormous complexity and effort. However, there is a huge gap when assessing the safety of the virtual vehicle while the real vehicle will drive on the road. Simulation must be accompanied by model validation to ensure its credibility since errors and uncertainties are inherent in every model. Unfortunately, this is rarely addressed in the current literature. In this paper, a modular process is presented covering both model validation and safeguarding. It is characterized by the fact that it quantifies a large number of errors and uncertainties, represents them in the form of an error model and ultimately integrates them into the safeguarding results. It is applied to a type approval regulation for the lane keeping behavior of a vehicle under various scenario conditions. The paper contains a thorough validation of the methodology itself by comparing its results with actual ground truth values. For this comparison, a binary classifier and confusion matrices are used that relate the binary type approval decisions. The classifier demonstrates that the methodology of this paper identifies a systematic error of the simulation model across several safeguarding scenarios. Finally, the paper provides recommendations for alternative configurations of the modular methodology depending on different requirements.
Keywords: Automated vehicles, Model validation, Predictive models, Safety assessment, Systems simulation,
Uncertainty aggregation
1. Introduction
In recent years, automated driving has raised great hopes of making future mobility more environmentally friendly, comfortable and safe. Today, vehicles with assistance systems and partial automation are available on the market, but automated vehicles (AVs) from Level 3 onward according to SAE [1] pose great challenges for industry and academia. It is not only challenging to develop AVs, but especially to prove their safety in our complex traffic environment with an infinite amount of different situations [2]. The United Nations Economic Commission for Europe (UNECE) has developed a proposal for the type approval of Automated Lane Keeping Systems (ALKS), which should pave the way for the market release of Level 3 vehicles in the next years [3].
Various safety assessment approaches have been developed by the entire AV research community, since classical real-world testing approaches reach their limits for AVs due to enormously high mileage [4, 5]. Above all, the scenario-based approach is to be mentioned here, as it is frequently used in the literature and as it was implemented in the two large research projects PEGASUS [6] and ENABLE-S3 [7]. It is a promising method that tests the AV in individual traffic situations, mainly by means of systems simulation [8] and by discarding most of the driving time without any actions and events. The scenario-based approach can be accompanied by further safety assessment approaches. System functions of the AV can be compared with requirement specifications and standards [9], models of the controller and trajectory planner can be verified by formal methods [10], traffic simulations can analyze the overall impact of AVs on traffic [11], etc. A comprehensive overview of safety assessment approaches can be found in our previous paper [12].

† This document is the result of the research project funded by TÜV SÜD Auto Service GmbH.
* Corresponding author. Email address: riedmaier@ftm.mw.tum.de (Stefan Riedmaier)
1 S. Riedmaier is the first author.
Preprint submitted to Simulation Modelling Practice and Theory, November 6, 2020
We focus in the following on the scenario-based approach, since it is currently the most widely used approach.
The majority of the safety assessment literature uses mathematical or computational models for their proof of concept, as the real testing effort is hardly feasible. However, the model-based techniques are very rarely accompanied by model validation activities, especially in quantitative terms [12]. This represents a very risky gap. Without validation of the simulation models, there can be no trustworthiness of the models and ultimately no credible decision making for such safety-critical systems as AVs.
In our previous paper [13], we gave a survey about publications across several engineering fields, which present Verification, Validation and Uncertainty Quantification (VV&UQ) approaches to assess the credibility of simulation models. The few papers from the automotive safety assessment field were facing great challenges. They often focus only on individual components such as vehicle dynamics or environmental sensors. They use simple qualitative or deterministic validation approaches without considering different types of errors and uncertainties inherent in any simulation model. They do not analyze the effect of those errors and uncertainties on the final safety assessment decisions. The papers from numerical engineering fields presented enhanced VV&UQ approaches. However, those could, to the best of our knowledge, never be applied to complex practical problems [14] such as cyber-physical systems [15]. Therefore, we introduced a novel, modular and unified VV&UQ framework in [13] and integrated several approaches so that the different engineering fields can benefit from each other.
After presenting the general framework, this paper focuses on its application to the specific use case of safeguarding AVs, on its validation and on the comparison of different framework manifestations. In this paper we are less concerned with the quality of a single simulation model compared to reality and more with the quality of our VV&UQ methodology itself. We therefore apply the Method of Manufactured Universes (MMU) [16], which compares a simulation model with a manufactured universe to analyze important characteristics. The application of the VV&UQ methodology to the comparison between simulation and reality follows in a subsequent paper.
The main contributions are:
1. a comprehensive overview about the validation of AV models with derivation of requirements,
2. our modular VV&UQ framework in specific configuration for model-based safety assessment of AVs,
3. the first application of a VV&UQ approach with aggregation of errors and uncertainties to AVs,
4. a detailed analysis and fair comparison of a deterministic and a non-deterministic manifestation of the framework based on MMU.
Section 2 summarizes the state of the art in validation of AV models and in VV&UQ in general. Section 3 concludes the state of the art with an extensive analysis and derivation of requirements for our methodology. Section 4 illustrates the use case of the type approval of an automated vehicle in this paper and the configuration of the manufactured universe. Section 5 describes our model-based methodology based on the use case. Section 6 provides a detailed evaluation of the results in order to validate and compare different manifestations of the methodology. Finally, the conclusion in Section 7 summarizes the most important research findings.
2. Related Work
This section begins by outlining fundamental sources of modeling errors and uncertainties to motivate why VV&UQ activities are so important. Then, it gives a comprehensive overview about validation of AV models. Since this field of application is strongly characterized by deterministic systems simulations (point predictions) and hardly any distinction is made between different sources of errors and uncertainties, a non-deterministic approach is presented in the last part. Further information can be found in our survey paper [13]. Here, we highlight individual aspects that are integral to this paper's central theme and understanding².

² Section 2.1 is a compact summary of [13, Sec. 2.2.2, 2.2.4, 2.2.5] and Section 2.3 of [13, Sec. 5.2.3, 6.2.2, 6.4.5, 7.1.1]. Section 2.2 is a restructured and revised version of [13, Sec. 7.2].
Table 1: Classification of references addressing AV model validation. The columns contain the affiliation of the reference to a component according to Sec. 2.2.1-2.2.4. The rows distinguish between deterministic and non-deterministic simulation. References of the same research group are combined in the same bracket.

                    Sensor                            Vehicle          Closed-loop               Traffic
Deterministic       [22], [23, 24], [25], [26], [27]  [28, 29], [30]   [31, 32, 33], [34], [35]  [36], [37], [38], [39]
Non-deterministic   -                                 [40, 41], [42]   [43, 10], [44]            [45]
2.1. Sources of Errors and Uncertainties
Errors are inherent in every simulation model, since a model is by definition a simplified abstraction of reality. The error, denoted e, indicates the deviation compared to the true value of nature y_true. However, it is often quite difficult to precisely quantify the errors of a simulation model. As soon as it is not possible, associated uncertainties arise. Two basic types of uncertainties can be distinguished. Aleatory uncertainties describe stochastic effects and natural variability and can be quantified by probability distributions. Epistemic uncertainties exist when knowledge is poor and can therefore either be reduced or at least be quantified, e.g. by intervals treating all values equally [17, Sec. 1.2].
One source of errors and uncertainties results from the calculation of computational models with computers of finite precision compared to the exact solution of the mathematical model. These numerical errors can vary greatly depending on the specific application. They include both basic code errors and errors in solving the equations, such as rounding errors or the discretization error e_h. During model verification, the first category is treated by code verification and the second category by solution verification [18, Sec. 2].
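A common way to estimate the discretization error e_h during solution verification is Richardson extrapolation, which is also used later in the PBA framework. The following is a minimal sketch under assumed conditions: the solver, the toy ODE dy/dt = -y and all numbers are illustrative choices of ours, not taken from this paper.

```python
# Hedged sketch: estimating the discretization error e_h via Richardson
# extrapolation from two solutions at different step sizes. The forward
# Euler integrator and the ODE dy/dt = -y are illustrative assumptions.
import math

def integrate_forward_euler(h, t_end=1.0, y0=1.0):
    """First-order Euler integration of dy/dt = -y (formal order p = 1)."""
    y = y0
    for _ in range(round(t_end / h)):
        y += h * (-y)
    return y

p = 1                                       # formal order of accuracy
f_h  = integrate_forward_euler(h=0.01)      # coarse-grid solution
f_h2 = integrate_forward_euler(h=0.005)     # halved step size

# Richardson extrapolation toward the exact mathematical solution
f_exact_est = f_h2 + (f_h2 - f_h) / (2**p - 1)
e_h = f_h2 - f_exact_est    # estimated discretization error, fine grid

print(f_exact_est, e_h)     # extrapolated value close to exp(-1)
```

The extrapolated value replaces the (unknown) exact solution of the mathematical model, so the difference to the fine-grid result serves as the e_h estimate entering the uncertainty budget.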
Another source of errors and uncertainties arises from model inputs (e_x) and parameters (e_θ). Whereas they are usually assumed fully characterized in deterministic simulations, this is not the case for partially characterized and uncharacterized experiments as well as unknown model prediction conditions. Input uncertainty quantification intends to rigorously quantify those uncertainties. In non-deterministic simulations, the input uncertainties are propagated through the simulation model and aggregated on the output side [19].
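The propagation and output-side aggregation just described can be sketched with plain Monte Carlo sampling. The toy model g_m, the normal input distribution and its parameters are assumptions for illustration only.

```python
# Hedged sketch: propagating an aleatory input uncertainty through a
# simulation model by Monte Carlo sampling and aggregating the output
# as an empirical CDF. The toy model g_m is an invented placeholder.
import bisect
import random

random.seed(0)

def g_m(x):
    """Placeholder simulation model: quadratic response to input x."""
    return 2.0 * x + 0.5 * x**2

# Aleatory input: normally distributed, e.g. a measured disturbance
samples = [random.gauss(5.0, 1.0) for _ in range(10_000)]

# Propagate each input sample through the model
outputs = sorted(g_m(x) for x in samples)

def ecdf(y_query):
    """Empirical CDF of the model output at y_query."""
    return bisect.bisect_right(outputs, y_query) / len(outputs)

print(ecdf(22.5))   # P(output <= g_m(5.0)), roughly 0.5 here
```

Each query of the ECDF answers "with what relative frequency does the output stay below this threshold", which is exactly the aggregated non-deterministic result used downstream.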
Furthermore, the model-form itself contains errors e_m and uncertainties due to assumptions or the selection of an inadequate equation. They can be quantified based on physical data from model validation experiments. The experiments are re-simulated and the results of the physical system g_s and the simulation model g_m are compared using validation metrics [20].

The involved errors can be summarized as follows [18, eq. (1-5-6)]:

    y_true = g_s(x) − e_y,obs                                (1)
           = g_m(x, θ, h) − (e_m + e_x + e_θ + e_h).         (2)
The various sources of errors and uncertainties are deeply interwoven and interfere with each other. Depending on the constellation, they can magnify or compensate each other, which results in a misleading trustworthiness of the simulation model [18, Sec. 1]. This effect is sometimes seen during model calibration, when an inadequate model-form is compensated by adjusting the parameters far beyond their physical significance until the results coincide under certain calibration conditions [21]. It is therefore helpful to quantify the different uncertainties separately as far as possible. Certainly, additional sources such as measurement errors e_y,obs or extrapolation errors caused by model predictions outside the scope of validity make this even more difficult.
2.2. Validation of Automated Vehicle Models
We sort this section on the validation of AV models based on different components that form the entire AV, since there exists some literature on component level, but just a few references on system level. The central modules of an AV include environmental sensors, several software units for environment perception, motion planning and control, as well as actuators and vehicle dynamics. An overview about most of the references is given in Table 1.
2.2.1. Sensor Model Validation
The development of models for camera, lidar and radar sensors is currently a key enabler for simulation-based safety assessment of AVs. They are essential for realistic simulations and were recently used to identify challenging test scenarios for AVs [46]. Holder et al. [47] and Rosenberger et al. [48] systematically derive requirements for sensor models to determine required fidelity levels. Schaermann et al. [24] distinguish parametric, non-parametric and ray-tracing approaches for sensor modeling, either on the level of raw sensor data or of processed object lists.

Regarding sensor model validation, it is also possible to distinguish between validation on raw data or object list level. Hanke et al. [23] and Schaermann et al. [24] compare a lidar sensor and its corresponding model with regards to raw point clouds and occupancy grids of traffic objects. For the comparison, they select three validation metrics that can handle the complex nature of the data. They are referred to as overall error, Barons and Pearson correlation coefficients. Abbas et al. [22] develop a metric to compare the visual complexity of real and synthetic images as camera raw data based on color and spatial information.

Nentwig et al. [26] apply a classifier to camera images to perform comparisons on the level of object lists. They compare both the object hypothesis and the bounding boxes from the classifier, the latter based on the box sizes and the former based on a confusion matrix. Gaidon et al. [25] also apply classifiers and present eight metrics to measure the real-to-virtual performance gap. Zec et al. [27] validate the sensor fusion of camera and radar models. On the one hand, they compare time signals of processed traffic objects with a log-likelihood and a Root Mean Square Error (RMSE). On the other hand, they compare statistical histogram data with a Jensen-Shannon divergence.
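Two of the metrics named above, RMSE on paired time signals and the Jensen-Shannon divergence on histogram data, can be sketched compactly. The example signals and histograms below are invented for illustration and are not data from the cited works.

```python
# Hedged sketch of two validation metrics mentioned in the text: RMSE on
# paired time signals and Jensen-Shannon divergence on normalized
# histograms. All example data is invented.
import math

def rmse(real, sim):
    """Root Mean Square Error between two equally sampled signals."""
    return math.sqrt(sum((r - s) ** 2 for r, s in zip(real, sim)) / len(real))

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

real_signal = [0.0, 0.4, 0.9, 1.3, 1.6]
sim_signal  = [0.1, 0.5, 0.8, 1.4, 1.5]
print(rmse(real_signal, sim_signal))       # approx. 0.1

hist_real = [0.1, 0.3, 0.4, 0.2]           # normalized histograms (sum to 1)
hist_sim  = [0.2, 0.3, 0.3, 0.2]
print(js_divergence(hist_real, hist_sim))  # small value; 0 iff identical
```

Unlike the RMSE, the Jensen-Shannon divergence is symmetric and bounded, which makes it convenient for comparing statistical histogram data.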
2.2.2. Vehicle Model Validation
For validation of vehicle dynamics models, standardized test maneuvers such as steady-state cornering [29] or sine with dwell [28] are used. They provoke a characteristic behavior of the vehicle, which can be evaluated on the basis of time signals and Key Performance Indicators (KPIs). Both standards were developed with regard to the simulation-based type approval of electronic stability control systems. The standard [29] assumes one deterministic simulation of the steady-state cornering maneuver as a baseline, which results in one time signal of the steering wheel angle, the sideslip angle and the roll angle, plotted against the lateral acceleration, respectively. It adds a pre-defined tolerance band around each baseline signal and checks whether multiple repetitions of physical experiments fall within these bands. Afterwards, the standard [28] performs a similar comparison based on tolerances for specific KPIs in the sine with dwell maneuver. If all tolerances are met, the generic validity of the vehicle dynamics model is concluded. Then, the actual type approval of different vehicle variants can take place in the simulation [49]. Kutluay and Winner [30] give a comprehensive overview of the literature in validation of vehicle dynamics models.
Kutluay [40] inverts the tolerance approach from those standards in his PhD thesis. Instead of adding the tolerances to the deterministic simulation, he calculates confidence intervals from the experimental repetitions based on Student's t-distribution and adds the tolerances to the confidence intervals around the experimental mean. He offers either absolute magnitudes or relative percentages for the tolerance values. Then, he can check whether the simulation falls within the area around the experimental mean. He distinguishes an averaged input case, where the experimental input data is averaged to perform one deterministic re-simulation, and an averaged output case, where each experiment is re-simulated and the average is taken on the result stage. According to Oberkampf and Roy [50, p. 492], the former will lead to significant deviations for strongly non-linear systems.

Viehof [41] extends the validation methodology from Kutluay [40] in his PhD thesis. He also re-simulates each experimental repetition, but then compares Probability Density Functions (PDFs), without averaging the outputs. He accepts the simulation model based on a statistical t-test, if the PDF from the model lies within the one from the experiment. In case of lower requirements, he switches back to the conventional tolerance values and checks whether the simulation PDF lies within the tolerance around the experimental mean. Similarly, Rhode [42] uses non-deterministic Monte Carlo simulations and confidence bands for vehicle dynamics model validation.
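The inverted tolerance check described above (a Student's-t confidence interval around the experimental mean, widened by a tolerance, then tested against one deterministic simulation result) can be sketched as follows. The measurement values, the tolerance and the hard-coded critical t-value are illustrative assumptions.

```python
# Hedged sketch of a tolerance-inverted validation check in the spirit of
# the approach described above. All numbers are invented; the critical
# t-value is hard-coded for 4 degrees of freedom at 95 % (two-sided)
# instead of being looked up via scipy.
import math
import statistics

def t_confidence_interval(repetitions, t_crit):
    """Confidence interval for the mean of repeated measurements."""
    mean = statistics.mean(repetitions)
    sem = statistics.stdev(repetitions) / math.sqrt(len(repetitions))
    return mean - t_crit * sem, mean + t_crit * sem

# Five repeated measurements of one KPI, e.g. a peak sideslip angle in deg
experiments = [2.11, 2.05, 2.19, 2.08, 2.14]
t_crit_95 = 2.776            # two-sided 95 %, n-1 = 4 degrees of freedom
lo, hi = t_confidence_interval(experiments, t_crit_95)

tolerance = 0.10             # absolute tolerance added on both sides
simulation_kpi = 2.25        # one deterministic simulation result
accepted = (lo - tolerance) <= simulation_kpi <= (hi + tolerance)
print(lo, hi, accepted)
```

The key inversion is that the interval is built from the experimental repetitions, not around the simulation baseline as in the standards.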
2.2.3. Closed-loop Model Validation
In our previous paper [35], we used an adaptive cruise control function to control a test vehicle and its corresponding vehicle model. We compared a purely virtual Model-in-the-Loop (MiL) simulation and a hybrid Vehicle-in-the-Loop (ViL) simulation on a chassis dynamometer with physical proving ground tests. As test scenarios, we selected vehicle following, emergency braking and cutting-in. We used qualitative graphical comparisons and quantitative validation metrics such as the correlation coefficient to validate the MiL and ViL simulation along the control pipeline.

Groh et al. [31], Notz et al. [32] and Wagner et al. [33] focus on the influence of the sensor data on the validation of the closed-loop AV, compared to independent ground truth measurements of the environment. They compare signals such as the velocity, trajectories or an overall risk measure along the control pipeline and evaluate the validation errors via box-plots. Aramrattana et al. [51] analyze the influence of modeling errors and uncertainties on control performance.
Schürmann et al. [10] use formal methods for safety assessment of AVs. They create non-deterministic models based on set theory and apply reachability analysis to determine which states the AV can reach from given initial states and possible inputs and parameters. If the reachable set of the AV does not intersect with predicted ones of other traffic participants, the AV is safe. In contrast to sampling techniques such as Monte Carlo, the reachability analysis can formally guarantee safety. They require non-deterministic models for the vehicle dynamics with controller and for the other traffic participants.

Hartung et al. [43] create those models by using a conformance testing technique. They define a formal notion of conformance, similar to validation metrics, and test whether the behavior of the non-deterministic model encloses the one from the physical system. Strictly speaking, the conformance testing is formulated as an inverse optimization problem, which optimizes the parameter sets so that the bound is as tight as possible. This can be seen as a model calibration approach rather than a model validation approach. Böde et al. [44] propose three notions of model validity and find an optimal split between calibration and validation data. Johnson et al. [34] apply formal methods based on correct-by-construction controller design and validate the closed-loop model by comparing the percentage of collision-free runs in a parking lot scenario.
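The enclosure idea behind such conformance tests can be illustrated with a deliberately simple one-dimensional sketch: a non-deterministic model yields an interval of possible outputs per time step, and conformance requires every measured sample to lie inside it. The interval model, the parameter range and the measurements are all invented; real conformance testing operates on reachable sets, not scalar intervals.

```python
# Hedged, strongly simplified sketch of an enclosure-style conformance
# check: a non-deterministic model predicts an output interval per time
# step and the measured trajectory must stay inside. Model and data are
# invented one-dimensional stand-ins for set-based reachability.

def model_interval(t, k_range=(0.8, 1.2)):
    """Interval of positions the model can reach at time t when an
    uncertain gain-like parameter k varies over k_range (monotone in k)."""
    k_lo, k_hi = k_range
    return k_lo * t, k_hi * t

# Measured (time, position) samples from the physical system
measured = [(1.0, 0.95), (2.0, 1.9), (3.0, 3.1)]

conformant = all(
    model_interval(t)[0] <= pos <= model_interval(t)[1]
    for t, pos in measured
)
print(conformant)   # every sample lies inside the predicted interval
```

Tightening k_range as much as possible while keeping `conformant` true mirrors the inverse optimization formulation mentioned above.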
2.2.4. Traffic Model Validation
In contrast to microscopic systems simulations, traffic simulations with multiple agents do not assess individual AVs, but determine the overall impact of AVs on traffic. Detering et al. [45] measure parametric uncertainties for Monte Carlo simulations. Zheng et al. [52] use extreme value theory to validate statistical traffic models. Rao and Owen [38] apply autoregressive integrated moving average models to analyze the errors of traffic models. An overview about the calibration of traffic models including metrics for comparison is given in [36, 37, 39].
2.3. Probability Bound Analysis
Oberkampf and Roy [50] use Probability Bound Analysis (PBA) as a VV&UQ framework to separately quantify input, numerical and model-form uncertainties. They describe aleatory uncertainties in the form of probability distributions, epistemic uncertainties as intervals and mixed uncertainties as probability boxes (p-boxes). The latter are imprecise probabilities that add an interval width to a Cumulative Distribution Function (CDF). By combining all sources of uncertainties they obtain a final p-box, which bounds the true value with a high probability.

They use the Richardson extrapolation technique to quantify the discretization error by comparing results of different step sizes as a replacement for the exact mathematical model. They quantify input uncertainties and apply nested Monte Carlo sampling to propagate them through the simulation model. In the outer loop they take samples from the epistemic parameters and in the inner loop from the aleatory parameters for each of the epistemic ones. While all aleatory samples form a single CDF – more precisely a stepwise Empirical CDF (ECDF) – the epistemic samples provide the width of the CDF and ultimately result in a p-box. During model validation, they quantify the area between the distribution from simulation and experiment to quantify the model-form uncertainty. They extrapolate the latter from the experimental validation conditions to the ones used for model prediction with a regression model. Finally, they add the epistemic intervals from the model-form and the numerical uncertainty to both sides of the input uncertainty distribution to obtain the final p-box.
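The nested sampling that produces a p-box can be sketched in a few lines: the outer loop samples the epistemic interval, each inner loop propagates aleatory samples into one ECDF, and the envelope of all ECDFs forms the p-box. The toy model, the parameter interval and the sample counts are assumptions for illustration.

```python
# Hedged sketch of nested Monte Carlo sampling producing a p-box: an
# outer loop over an epistemic parameter interval, an inner aleatory
# loop yielding one ECDF per outer sample, and the ECDF envelope as the
# p-box. The toy model g_m and all numbers are invented.
import bisect
import random

random.seed(1)

def g_m(x_aleatory, theta_epistemic):
    """Placeholder simulation model."""
    return theta_epistemic * x_aleatory + 1.0

theta_interval = (0.9, 1.1)    # epistemic: interval, no distribution
n_outer, n_inner = 20, 1000

ecdfs = []
for _ in range(n_outer):
    # outer loop: sample the epistemic interval (LHS/grids also possible)
    theta = random.uniform(*theta_interval)
    # inner loop: aleatory input, here a normally distributed disturbance
    outputs = sorted(g_m(random.gauss(5.0, 1.0), theta)
                     for _ in range(n_inner))
    ecdfs.append(outputs)

def pbox_bounds(y):
    """Lower/upper probability that the output is <= y (p-box at y)."""
    probs = [bisect.bisect_right(e, y) / n_inner for e in ecdfs]
    return min(probs), max(probs)

lo, hi = pbox_bounds(6.0)
print(lo, hi)   # interval of plausible CDF values at y = 6.0
```

Each aleatory loop yields a single stepwise ECDF; the epistemic spread between the ECDFs is exactly the interval width that turns the CDF into a p-box.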
3. Analysis of the Related Work
In this section, we analyze the related work presented so far to identify research gaps and directions for the remaining paper. Again, we address both the validation of AV models and the PBA approach.

3.1. Validation of Automated Vehicle Models
The related work on the validation of AV models contains references addressing specific component models as well as references addressing the entire closed-loop model:
1. The sensor model references focus mainly on the complex nature of the multi-dimensional sensor data and on specific sensor effects. Comparisons on the level of object lists contain the impact of the sensor data on the detection algorithms and allow the application of typical time series metrics.
2. The vehicle dynamics references are based on standardized maneuvers and tolerance values for the permissible deviations between simulation and experiment. These tolerances represent subjective expert estimates with regard to a classic vehicle dynamics evaluation and describe model validity as a simple binary result. Regarding AV safety, these tolerances are not applicable, since model validation by definition always refers to a certain use case.
3. The current closed-loop references are either embedded in a formal framework running online in the AV or analyze how modeling errors flow through the AV pipeline.
4. The traffic model references are interesting for safety assessment approaches that use traffic simulations to analyze the impact of AVs. Since we are focusing on the scenario-based approach, we require a re-simulation of traffic trajectories instead of traffic models.

All approaches lack an aggregation of errors and uncertainties for the safety assessment of the entire AV. It is crucial to analyze the impact of these errors on safety, because in the end we are interested in making a decision on the safety of the entire AV based on the simulation models.
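Whether model errors actually change the final decision can be made measurable by relating the binary pass/fail decisions from the simulation to ground-truth decisions per scenario via a confusion matrix, as in the evaluation of this paper. The decision lists below are invented examples.

```python
# Hedged sketch: relating binary pass/fail type-approval decisions from
# the simulation to ground-truth decisions via a confusion matrix. The
# per-scenario decision lists are invented.
from collections import Counter

ground_truth = [True, True, False, True, False, False, True, False]
simulation   = [True, False, False, True, False, True, True, False]

cm = Counter()
for gt, sim in zip(ground_truth, simulation):
    if gt and sim:
        cm["TP"] += 1    # both pass
    elif gt and not sim:
        cm["FN"] += 1    # simulation too conservative
    elif not gt and sim:
        cm["FP"] += 1    # unsafe scenario passed: the critical case
    else:
        cm["TN"] += 1    # both fail

accuracy = (cm["TP"] + cm["TN"]) / len(ground_truth)
print(dict(cm), accuracy)
```

The false-positive cell is the safety-critical one: scenarios the simulation approves although the ground truth fails them.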
The validation of component models is a powerful tool for the developer of the respective component. Furthermore, it also increases the trustworthiness of the entire model on a qualitative level. On a quantitative level, however, it is very difficult to relate the errors in component modeling to the final decisions about the system. Suppose a robust vs. a sensitive lane keeping function controls the same vehicle. The robust controller might compensate a poor vehicle dynamics model violating the permissible tolerances. The sensor, controller and vehicle dynamics interact with each other and should not only be validated independently.

This type of extrapolation in the system hierarchy from component to system level is generally one of the main current challenges in VV&UQ [13, Sec. 6.6] with only a few recent approaches such as [20, 17]. There are engineering fields such as spacecraft [53] or nuclear reactor safety [54] that make system-level tests impossible for cost or safety reasons and require such VV&UQ approaches. However, this is not the case with AVs. Therefore, we concentrate in this paper exclusively on the entire AV. Nevertheless, we make sure that the methodology is also applicable to component models.
3.2. Probability Bound Analysis
In our survey paper [13], we have also presented a variety of references that focus on advanced non-deterministic approaches. However, there were only a few general approaches which cover the entire model-based process, and those proved difficult to apply to complex systems. For example, Eek et al. [15] found that the number of components and parameters of an aircraft simulation model is too complex to be handled by PBA with reasonable effort. Nevertheless, we presented a modular framework in [13] and are firmly convinced that the advanced approaches can be used in selected blocks of the framework even for complex systems such as AVs.

In this paper, we have chosen to use PBA as part of our framework for safety assessment of AVs. PBA is the main approach of Frequentist VV&UQ. It objectively estimates relative frequencies of model inputs and parameters based on repeated measurements. Unlike Bayesian approaches, it does not incorporate personal beliefs in the form of prior probabilities, nor does it modify the original model with new data based on Bayes' theorem. Therefore, it meets the requirements of an independent type approval of an automated vehicle, which we are particularly interested in.
4. Use Case
In the first part of this section, we present the Lane Keeping Functional Test (LKFT) of UN regulation 79 for the type approval of lateral driving functions. In the second part, we describe the selection of our simulation models based on the Method of Manufactured Universes.

Figure 1: Lane Keeping Functional Test (LKFT). The sketch annotates the curve radius R, the vehicle speed v, the lateral acceleration a and jerk j, and the distance y to the lane marking.
4.1. Lane Keeping Functional Test
We choose UNECE regulation 79 in Revision 4 [55] as the use case of this paper and target the Automatically Commanded Steering Function (ACSF) of category B1. The latter "means a function which assists the driver in keeping the vehicle within the chosen lane, by influencing the lateral movement of the vehicle." In production vehicles, it is either a standalone assistance system (SAE Level 1) or combined with adaptive cruise control for Partial Automation (SAE Level 2). The R-79 lends itself as a use case, because a regulation generally has a public and independent character with clearly defined test cases and pass/fail criteria, and the R-79 is also binding for current production vehicles of Level 1 and 2.

The R-79 describes the Lane Keeping Functional Test (LKFT) [55, Sec. 3.2.1] to check whether the ACSF system can detect and keep its own lane without running over the lane markings. The principle is visualized in Fig. 1. In detail, the test scenario is specified as follows:

- "The vehicle shall be driven [...] with a constant speed on a curved track with lane markings at each side."
- "The necessary lateral acceleration to follow the curve shall be between 80 and 90 per cent of the maximum lateral acceleration specified by the vehicle manufacturer ay,smax."
- "The vehicle speed shall remain in the range from vsmin up to vsmax."
- "The vehicle manufacturer shall demonstrate [...] that the requirements for the whole lateral acceleration and speed range are fulfilled."
The pass criteria are fulfilled if:

- "The vehicle does not cross any lane marking;"
- "The moving average over half a second of the lateral jerk does not exceed 5 m/s³."

Thus, the LKFT focuses on the stationary conditions of a roughly constant lateral acceleration a and velocity v. The ranges depend on the manufacturer and are assumed here as typical values vsmin = 80 km/h, vsmax = 180 km/h and ay,smax = 2.5 m/s². The curve radius R = v²/a is a third dependent variable. We extend the LKFT in this paper by the scenario parameters wind speed vw, road gradient sr and tank load lt. We thus bring the example closer to reality and consider parameters from different categories without losing focus on v and a. The categories include driving states of the vehicle as well as the road and the environment layer of the 5-layer environmental model presented in [56]. Our LKFT example can be summarized in vector notation as

    x = [1  v  a  vw  sr  lt]^T ∈ R^(Nx+1)    (3)
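The second pass criterion can be checked mechanically from a recorded lateral acceleration trace. The sketch below assumes an illustrative sampling rate and a toy acceleration signal; neither is prescribed by the regulation text quoted above.

```python
# Hedged sketch of the jerk pass criterion: the moving average of the
# lateral jerk over half a second must not exceed 5 m/s^3. Sampling rate
# and the acceleration trace are invented for illustration.

dt = 0.01                       # sample time in s -> 50 samples per 0.5 s
window = int(0.5 / dt)

# lateral acceleration trace in m/s^2 (toy data: a gentle linear ramp)
a_y = [0.002 * i for i in range(300)]

# finite-difference lateral jerk in m/s^3
jerk = [(a_y[i + 1] - a_y[i]) / dt for i in range(len(a_y) - 1)]

# moving average of the jerk over the half-second window
moving_avg = [
    sum(jerk[i:i + window]) / window
    for i in range(len(jerk) - window + 1)
]

passed = max(abs(j) for j in moving_avg) <= 5.0
print(passed)   # the ramp yields a constant jerk of about 0.2 m/s^3
```

The half-second averaging deliberately tolerates short jerk spikes and only penalizes sustained uncomfortable steering corrections.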
Figure 2: Minimum distance to line across the scenario space for (a) the simulation model and (b) the manufactured universe. Both surface plots span AV velocities of 80-180 km/h and normalized lateral accelerations of 0.4-0.8; the color scale shows the minimum distance to line in m.
with Nx = 5 inputs. We concentrate on the Ny = 1 output of the distance to line y ∈ R, as it is much more descriptive than the jerk. Finally, we are aware that the regulation with two parameters is currently pursued with physical testing. Nevertheless, the extended use case serves as a blueprint for a model-based safeguarding process, as it is firmly planned for higher automation levels [3, 6]. Therefore, we will use the term AV as a placeholder for all automation levels.
4.2. Method of Manufactured Universes
Stripling et al. [16] introduce the Method of Manufactured Universes (MMU) as an approach to validate uncertainty quantification methods. Thus, it is not a method to validate the models, but to validate the validation methodology itself. The intention of MMU is to create a manufactured universe in which the true values of nature are known. In contrast to reality, where the true values can be estimated, if at all, only by extensive measurements, the manufactured universe offers an arbitrary number of simulations in a known environment. Of course, the user must manufacture the universe so that it is close to reality; otherwise, the transferability of the findings is not guaranteed. In MMU, the actual
simulation model is compared with the manufactured one by means of the desired validation methodology. Stripling
et al. [16] analyze UQ methods in a particle transport universe by comparing the actual low-fidelity model with a high-fidelity reference. Whiting et al. [57] create a CFD universe to compare four validation methodologies including PBA under validation and prediction conditions.
In accordance with the current state of science, we also create a universe for the validation of our methodology. We
adapt the MMU approach by injecting modeling errors into the manufactured universe so that we can verify that a
validation methodology is able to identify these errors. We select a pre-configured vehicle dynamics model of a sports
car, a simple PI controller for lane keeping and an ideal sensor model from a vehicle simulation tool. We use a vehicle mass of 1377 kg for the simulation model and a mass of 1577 kg for the manufactured universe (MU) model. This systematic error of 200 kg can be compared with the weight of two to three passengers. The mass has an influence
on the overall driving behavior of the vehicle. The resulting minimum distance to line across the scenario space is
already anticipated in the two surface plots in Figure 2. Higher lateral accelerations and higher velocities for constant radii lead to a smaller distance to line. This corresponds to the driving physics during cornering, in which the vehicle is pushed outwards by the centrifugal force. The heavier vehicle has smaller distances and, for high lateral accelerations, even a plateau of zero distance to line representing a line crossing. We deliberately generated this behavior in our universe because it is especially safety-critical if the simulation model passes all tests and gives the developer positive feedback although the real system would fail several times. It is now the challenge for the validation methodology to identify those fails.
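The error-injection setup can be sketched as follows. Note that the study itself uses a commercial vehicle dynamics tool; the `distance_to_line` function below is a purely illustrative surrogate we introduce here, not the paper's model, so that only the structure of the injection (a +200 kg mass offset in the manufactured universe) is shown:

```python
# Sketch of the MMU error injection with a hypothetical surrogate model.
# Only the mass offset between the two "worlds" reflects the paper's setup.

def distance_to_line(v_kmh, a_norm, mass_kg):
    """Illustrative surrogate: heavier vehicles and harder cornering
    reduce the minimum distance to the lane marking (clipped at 0)."""
    base = 0.45 - 0.4 * a_norm - 0.001 * (v_kmh - 80) / 100
    mass_penalty = 0.0005 * (mass_kg - 1377)
    return max(0.0, base - mass_penalty)

SIM_MASS = 1377   # kg, simulation model
MU_MASS = 1577    # kg, manufactured universe (injected +200 kg systematic error)

def simulation_model(v_kmh, a_norm):
    return distance_to_line(v_kmh, a_norm, SIM_MASS)

def manufactured_universe(v_kmh, a_norm):
    return distance_to_line(v_kmh, a_norm, MU_MASS)

# In the surrogate, the injected mass error appears as a constant offset:
err = simulation_model(120, 0.6) - manufactured_universe(120, 0.6)
```

The simulation model consistently predicts larger distances than the manufactured universe, which is exactly the safety-critical direction the validation methodology must detect.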
Figure 3: Model-based process for AV homologation, based on previous work in [13, Fig. 1]. (Block diagram across the validation domain (v) and the application domain (a): LKFT validation and approval scenarios, the L1/2 AV model and MU-model, LKFT assessments, the validation metric, linear regression fit and inference, uncertainty expansion, the tolerance approach and the final type approval decision.)
5. Methodology and Application Results
In this section we implement our generic framework from [
13
, Fig. 1] for the use case of safeguarding AVs. The
specific configuration of the framework can be seen in Fig. 3. It describes a model-based process of model validation
and model prediction in the application domain with several individual steps. We describe the methodology step-by-step
from left to right and from top to bottom, using exemplary results of the LKFT from R-79. The mathematical symbols
that describe the interface of each framework block are summarized in Fig. 3. We deal with both a deterministic and a non-deterministic manifestation of the framework during the description of the blocks and will compare them in the next section. The two terms refer to the type of simulation. In contrast to state-of-the-art deterministic validation, our deterministic manifestation also takes interpolation uncertainties into account; unlike the non-deterministic simulation, however, it considers no input uncertainties. The interested reader is referred to [13] for general information and more detailed theory.
5.1. Model Verification
Before proceeding with model validation and prediction, the numerical effects during model verification should be analyzed and the model parameters estimated. We skip the step of model calibration, since we assume already
parameterized models for an independent type approval. Therefore, we dedicate this subsection to a numerical
pre-analysis.
Since we do not have the exact mathematical model, we use the Richardson extrapolation technique to estimate the numerical discretization error. We perform three simulations with a fine step size h_1 = 2.5×10⁻⁴ s, a medium step size h_2 = 5×10⁻⁴ s and the actual coarse step size h_3 = 1×10⁻³ s. This corresponds to a small refinement factor
r = h_3/h_2 = h_2/h_1 = 2. It gives identical results of the distance to line in the relevant decimal places for all three step sizes. This shows high convergence and very small numerical errors e^n ≈ 0.

Table 2: Parameter characteristics. There are several scenarios in the global validation and application space and again several random samples in the local space around each scenario. A normal distribution N(µ, σ²) is specified by its mean µ and variance σ². The repetitive samples refer to the physical validation experiments, the nested samples to the remaining non-deterministic simulations.

Parameter   | Unit | Validation space (min / max / scenarios) | Application space (min / max / scenarios) | Uncertainty type | Size        | Samples (nested) | Samples (rep.)
Velocity    | km/h | 90 / 170 / 3                             | 80 / 180 / 6                              | Aleatory         | N(0, 0.5)   | 10               | 10
Lat. Accel. | -    | 0.4 / 0.8 / 3                            | 0.35 / 0.85 / 5                           | Aleatory         | N(0, 0.01)  | 10               | 10
Wind Speed  | km/h | -5 / 5 / 2                               | -5 / 5 / 2                                | Aleatory         | N(0, 2)     | 10               | 10
Tank Load   | kg   | -20 / 20 / 2                             | -20 / 20 / 2                              | Aleatory         | N(0, 0.5)   | 10               | 10
Road Slope  |      | -1 / 1 / 2                               | -1 / 1 / 2                                | Epistemic        | [-0.1, 0.1] | 3                | -
Total       |      | N_v = 72                                 | N_a = 240                                 |                  |             | 30               | N_v^r = 10
The negligible numerical effects coincide with the findings of Viehof and Winner [58] from vehicle dynamics simulation. However, we would like to point out that complex co-simulations with different solvers and step sizes are sometimes used to assess the safety of AVs, which no longer necessarily lead to similar results. Therefore, this should be investigated at least once per tool chain.
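The verification step above can be sketched in a few lines. Richardson extrapolation estimates the observed order of convergence p from three solutions at step sizes with a constant refinement factor r, and from it the discretization error of the fine solution; the KPI values below are synthetic stand-ins, not the paper's simulation outputs:

```python
import math

def richardson(f1, f2, f3, r):
    """Richardson extrapolation for solutions f1 (fine), f2 (medium),
    f3 (coarse) with constant refinement factor r = h3/h2 = h2/h1.
    Returns the observed order of convergence p, the estimated
    discretization error of f1 and the extrapolated solution."""
    p = math.log(abs(f3 - f2) / abs(f2 - f1)) / math.log(r)
    err_f1 = (f1 - f2) / (r**p - 1)   # error estimate of the fine solution
    f_exact = f1 + err_f1             # extrapolated, step-size-free value
    return p, err_f1, f_exact

# Synthetic example: a KPI converging with second order, f(h) = 0.25 + 3*h^2
f = lambda h: 0.25 + 3.0 * h**2
p, err, f_ex = richardson(f(2.5e-4), f(5e-4), f(1e-3), r=2)
```

For this synthetic KPI the recovered order is 2 and the extrapolated value matches the step-size-free limit 0.25, mirroring the paper's finding that the discretization error of the tool chain is negligible.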
5.2. Scenario Design
Since the LKFT requires an assessment over "the whole lateral acceleration and speed range", we concentrate directly on a good coverage of the scenario space, without restricting ourselves only to the critical band from 80 to 90 % of a_y,smax. The scenario space is specified in Table 2 by the min-max ranges of the five parameters from (3). The application space is slightly larger than the validation space so that it not only requires interpolation but also extrapolation capabilities. The values are chosen so that the manufactured universe reflects the real world properly.
We use a simple full factorial Design of Experiments (DoE) to generate the concrete scenarios. According to Table 2, 72 validation scenarios and 240 application scenarios are selected to legitimize the model-based process. In the future, it is possible to increase the amount of application scenarios and to select more sophisticated DoE techniques to gain efficiency. The crosses in Figure 4 show a two-dimensional cross-section along the velocity and the lateral acceleration dimension with a grid of nine validation and 30 application scenarios. We use this type of visualization to reflect the focus of the R-79 on the two parameters. All N LKFT scenarios will be aggregated in a data matrix

X = \begin{bmatrix} 1 & v_1 & a_1 & v_{w,1} & s_{r,1} & l_{t,1} \\ 1 & v_2 & a_2 & v_{w,2} & s_{r,2} & l_{t,2} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & v_N & a_N & v_{w,N} & s_{r,N} & l_{t,N} \end{bmatrix} \in \mathbb{R}^{N \times (N_x+1)} \tag{4}

for each domain: X^v ∈ R^(N_v×(N_x+1)) for validation and X^a ∈ R^(N_a×(N_x+1)) for the application.
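A full factorial design over the levels of Table 2 can be sketched with the standard library. The level counts (3, 3, 2, 2, 2 for validation) are taken from the table; the intermediate level values (e.g. 130 km/h) are illustrative assumptions, since the paper only specifies the min-max ranges:

```python
from itertools import product

def full_factorial(levels):
    """All combinations of the per-parameter level lists, each row
    prefixed with a 1 for the regression intercept as in Eq. (4)."""
    return [[1.0, *combo] for combo in product(*levels)]

# Validation-space levels per Table 2: 3 x 3 x 2 x 2 x 2 = 72 scenarios.
validation_levels = [
    [90, 130, 170],     # velocity (km/h); middle level assumed evenly spaced
    [0.4, 0.6, 0.8],    # normalized lateral acceleration (-)
    [-5, 5],            # wind speed (km/h)
    [-20, 20],          # tank load (kg)
    [-1, 1],            # road slope
]
Xv = full_factorial(validation_levels)   # data matrix X^v, 72 x 6
```

The application-space matrix X^a follows identically from its 6 x 5 x 2 x 2 x 2 levels, yielding 240 rows.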
5.3. Uncertainty Quantification and Propagation
A deterministic simulation performs a point prediction, since it assumes that all parameters are precisely known.
A non-deterministic simulation, however, considers uncertain parameters, quantifies them, propagates them through
the simulation model and aggregates the results. The uncertainties are summarized in Table 2 for the five scenario
parameters. The values were systematically derived depending on how accurately it is possible to measure them during
real validation experiments. We assume tight normal distributions for the velocity and lateral acceleration, since both
can be measured accurately with Inertial Measurement Units (IMUs), and for the tank load, since the vehicle can be
weighed in advance and the fuel level is available. We assume a coarser normal distribution for the wind – based on online measuring stations – and an epistemic uncertainty for the road gradient, since it has to be extracted from map data of highways. This lack of knowledge from publicly accessible maps could only be compensated by extensive measurements. We assume further scenario parameters and the internal vehicle parameters as deterministic for this simplified proof of concept, since they are often precisely known and constant. This assumption is reflected in the model-form uncertainty and can be substantiated by a sensitivity analysis in the future. In addition, we assume the uncertainties from Table 2 to be equal around each validation and application scenario.

Figure 4: Verification, validation and application scenarios. (Nominal verification, validation and application scenarios together with the validation uncertainty samples of model and system and the application uncertainty samples, projected onto the velocity and lateral-acceleration plane.)
We use N_v^r = 10 repetitions of each validation scenario for the manufactured universe to imitate physical experiments and to capture their natural variability. The value is based on the findings of Viehof [41, p. 74], who recommends 10 to 15 repetitions based on the t-distribution for a detailed analysis and later at least three repetitions for practical experiments. These tests are the baseline for both the deterministic and the non-deterministic re-simulation to enable a fair comparison later. In the deterministic case, we re-simulate the mean value of the scenario repetitions (see averaged input case in Section 2.2.2). In the non-deterministic case, we apply the nested two-loop sampling approach from Section 2.3 to propagate the uncertainties. We use a full factorial design with three steps of the epistemic parameter in the outer loop and Monte Carlo sampling with ten random samples for all aleatory parameters in the inner loop. This is equivalent to a resolution in 10 % steps and is sufficient for the proof of concept in this paper. In total, we perform 2160 runs with the simulation model in the validation domain and 7200 runs in the application domain. The uncertainties are also illustrated in Figure 4 by projecting all samples of all parameters onto the velocity and lateral acceleration dimension. They form local point clouds within the large scenario space.
5.4. LKFT Assessment
Each simulation run results in one distance to line signal. It shows a typical behavior as shown in Figure 5, where
the vehicle is pushed hardest to the edge at the entrance of the curve (cross at minimum distance to line), and then
compensates a bit – unless it crosses the line at high lateral accelerations. We extract the minimum distance to the line
as a KPI because it is the most safety critical. If not even the minimum distance falls below zero (crosses the line), the whole time signal will not fall below it either. The behavior over the whole scenario space was already shown in Figure 2. In the deterministic case, we average the repetition results so that we can use a scalar validation metric. In the non-deterministic case, we aggregate the repetitions to an ECDF and the nested uncertainty propagation to a p-box for the use of a non-deterministic metric.

Figure 5: Characteristic lane keeping behavior by means of the distance to line time signal and its minimum value as KPI at the exemplary application scenario v = 100 km/h and a = 0.35 · a_y,smax = 0.875 m/s².
5.5. Validation Metric
In the deterministic case, we use the absolute deviation between the minimum distance to line from the input-averaged re-simulation and the output-averaged minimum distance to line from the manufactured universe

e^v = g_m(\langle x^v \rangle, \theta_m) - \langle g_s(x^v, \theta_s) \rangle \tag{5}

as validation metric with \langle \cdot \rangle as mean operator. In the non-deterministic case, we quantify the difference between the ECDF of the manufactured universe F(y^v_s) and the p-box from the simulation model B(y^v_m). There are different metrics that give different weight to different characteristics of a distribution. We use an Area Validation Metric (AVM) based on Voyles and Roy [59] because it includes the entire shape of the distributions in the calculation. The principle is illustrated in Figure 6a for one validation scenario. Since the manufactured universe shows worse performance, there only exists a left-hand area

e^v_l = \int_{B(y^v_m) \geq F(y^v_s)} \left| B(y^v_m) - F(y^v_s) \right| \, \mathrm{d}y \tag{6}

where the p-box is larger than the ECDF. The area can be determined mathematically by integration of the step functions. Since the right-hand area is zero, we get a one-sided adaption of the area validation metric (without the safety factor from [59]) in the form of the interval

I(e^v) \coloneqq \{ e^v \mid \underline{e}^v \leq e^v \leq \overline{e}^v \} \coloneqq [\underline{e}^v, \overline{e}^v] = [-e^v_l, 0] . \tag{7}
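For two empirical CDFs (here plain sample sets rather than a full p-box, a simplifying assumption), the area metric can be computed by integrating the absolute difference of the two step functions over their merged support:

```python
def ecdf(samples, y):
    """Empirical CDF of `samples` evaluated at y (right-continuous)."""
    return sum(1 for x in samples if x <= y) / len(samples)

def area_validation_metric(model_samples, system_samples):
    """Area between two empirical CDFs, integrated piecewise between
    consecutive breakpoints, where both step functions are constant."""
    points = sorted(set(model_samples) | set(system_samples))
    area = 0.0
    for lo, hi in zip(points, points[1:]):
        area += abs(ecdf(model_samples, lo) - ecdf(system_samples, lo)) * (hi - lo)
    return area

# Shifting a distribution by a constant d yields an area metric of d:
model = [0.30, 0.35, 0.40, 0.45]
system = [x - 0.1 for x in model]     # universe performs worse by 0.1 m
avm = area_validation_metric(model, system)   # approximately 0.1
```

Extending this to a p-box amounts to evaluating the integrand against the nearer of the two bounding CDFs, which reproduces the one-sided left-hand area of Eq. (6) when the ECDF lies entirely on one side of the p-box.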
5.6. Error Learning and Inference
It is essential to take the modeling errors and uncertainties into account for the final decision making in the application domain. The validation metric quantifies the model-form uncertainty for each validation scenario. Ultimately,
however, we need to predict it for each application scenario. Therefore, we learn an error model with the metric results from the validation domain, so that we can use it afterwards to infer the errors in the application domain.

Figure 6: Validation errors. (a) Non-deterministic area metric (value 0.28) for the exemplary validation scenario v = 90 km/h and a = 0.4 · a_y,smax = 1 m/s². (b) Linear regression inference of the deterministic validation error ê^va with prediction intervals across the entire application scenario space.
For this first proof of concept, we apply a multiple linear regression technique based on [50, p. 657] to model the validation error

\hat{e}^v = w \cdot x, \quad w = \begin{bmatrix} w_0 & \cdots & w_{N_x} \end{bmatrix}^T \in \mathbb{R}^{N_x+1} \tag{8}

of the distance to line with the scenario parameters x as regressors. The regression weights w are determined so that the difference between the estimated and the actual validation errors is minimized

\arg\min_w \sum_{i=1}^{N_v} \left( e^v_i - \hat{e}^v_i \right)^2 \tag{9}

across the whole validation data set from Table 2 with N_v distinct scenarios. We denote estimates of the validation error in the validation domain with ê^vv and estimates of the validation error in the application domain with ê^va. To get to the latter, we fit one linear model to infer the deterministic validation error ê^va, whereas we fit two linear models to infer both the left and right (zero here) interval boundaries of the non-deterministic model-form uncertainty I(ê^va) = [e̲^va, e̅^va].

In addition, we calculate prediction intervals (PIs) based on [50, p. 657] to incorporate the uncertainty of the fitted model and the one associated with future observations. Since the prediction variable represents a random variable and we need an interval estimate for its mean value, we apply a non-simultaneous Bonferroni-type prediction interval via the function [60, p. 115]

g_p(x^a) = t_{\alpha/2,\, N_v-(N_x+1)} \cdot s \cdot \sqrt{1 + x^{a\,T} \left( X^{v\,T} X^v \right)^{-1} x^a} \tag{10}

with a confidence level of 95 % for the t-distribution and with the sample variance

s^2 = \frac{1}{N_v-(N_x+1)} \sum_{i=1}^{N_v} \left( e^{vv}_i - \hat{e}^{vv}_i \right)^2 . \tag{11}

In the deterministic case, the prediction interval ±g_p(x^a) shifts the signed estimate of the error in both directions:

I(\hat{e}^{va}) = [\underline{e}^{va}, \overline{e}^{va}] = [\hat{e}^{va} - g_p(x^a),\ \hat{e}^{va} + g_p(x^a)] . \tag{12}
Figure 7: Uncertainty aggregation and decision making at the exemplary application scenario v = 100 km/h and a = 0.35 · a_y,smax = 0.875 m/s², based on the (a) non-deterministic p-box prediction and (b) deterministic point prediction of the nominal simulation model.
In the non-deterministic case, we add the respective function value to the left and right (positive) areas

I(\hat{e}^{va}) = [\underline{e}^{va}, \overline{e}^{va}] = [\hat{e}^{va}_l - g_{p,l}(x^a),\ \hat{e}^{va}_r + g_{p,r}(x^a)] , \tag{13}

since the resulting outer bounds enclose the regression estimates and the inner bounds. Figure 6b illustrates the regression surface of the deterministic metric including prediction intervals. The light blue plane of the linear model follows the trend of the orange metric results used for training.
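A minimal sketch of the error learning and inference step follows, assuming a single regressor for readability instead of the paper's five, and a hardcoded t-quantile of 2.0 as a stand-in for t_{α/2, N_v−(N_x+1)} (the standard library provides no t-distribution). The metric values are illustrative:

```python
import math

def fit_error_model(x, e):
    """Least-squares fit of e ~ w0 + w1*x, a 1-regressor analogue of Eq. (8)/(9)."""
    n = len(x)
    mx, me = sum(x) / n, sum(e) / n
    w1 = (sum((xi - mx) * (ei - me) for xi, ei in zip(x, e))
          / sum((xi - mx) ** 2 for xi in x))
    w0 = me - w1 * mx
    return w0, w1

def prediction_interval(x, e, w0, w1, x_new, t_crit=2.0):
    """Non-simultaneous prediction interval analogous to Eq. (10)/(11);
    t_crit stands in for the t-quantile at the chosen confidence level."""
    n = len(x)
    s2 = sum((ei - (w0 + w1 * xi)) ** 2 for xi, ei in zip(x, e)) / (n - 2)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return t_crit * math.sqrt(s2) * math.sqrt(1 + 1 / n + (x_new - mx) ** 2 / sxx)

# Illustrative validation-domain metric results: error grows with velocity
v = [90, 110, 130, 150, 170]
e = [0.05, 0.09, 0.12, 0.17, 0.20]
w0, w1 = fit_error_model(v, e)
e_hat = w0 + w1 * 120                      # inferred error at an application scenario
g = prediction_interval(v, e, w0, w1, 120)
interval = (e_hat - g, e_hat + g)          # Eq. (12)
```

The 1 + 1/n + (x−x̄)²/S_xx term is the simple-regression equivalent of 1 + x^aT (X^vT X^v)^{-1} x^a in Eq. (10), where the intercept column of X^v accounts for the 1/n contribution.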
5.7. Uncertainty Expansion
After quantifying the individual errors and uncertainties, they must be aggregated in the application domain. In the non-deterministic case, we get for each application scenario one input uncertainty in the form of a p-box B(y^a_m) = [F̲(y^a_m), F̅(y^a_m)], one inferred model-form uncertainty in the form of an interval I(ê^va) = [e̲^va, e̅^va] and one numerical uncertainty (zero here) in the form of an interval I(ê^na) = [e̲^na, e̅^na]. The combination of the uncertainties can be seen in Figure 7a for one application scenario. The two lower interval boundaries shift the input uncertainty p-box to the left, the two upper interval boundaries to the right. The resulting p-box reflects an estimate of the actual system response

B(\hat{y}^a_s) = \left\{ F(\hat{y}^a_s) \;\middle|\; F\!\left(y^a_m + (\underline{e}^{va} + \underline{e}^{na})\right) \leq F(\hat{y}^a_s) \leq F\!\left(y^a_m + (\overline{e}^{va} + \overline{e}^{na})\right) \right\} . \tag{14}

In the example, the model-form uncertainty has the largest influence on the total uncertainty, since we injected a systematic error into the manufactured universe.
In the deterministic case, we use the inferred error bounds I(ê^va) = [e̲^va, e̅^va] to adapt the actual model prediction y^a_m to a non-deterministic interval estimate of the system response

I(\hat{y}^a_s) = \begin{cases} \left[ y^a_m - \overline{e}^{va},\; y^a_m \right] & \underline{e}^{va} \geq 0,\ \overline{e}^{va} \geq 0 \\ \left[ y^a_m - \overline{e}^{va},\; y^a_m - \underline{e}^{va} \right] & \underline{e}^{va} \leq 0,\ \overline{e}^{va} \geq 0 \\ \left[ y^a_m,\; y^a_m - \underline{e}^{va} \right] & \underline{e}^{va} \leq 0,\ \overline{e}^{va} \leq 0 \end{cases} \tag{15}

depending on different cases. If both error bounds were either greater or less than zero, the middle case would lead to a bias correction of the nominal model in one direction. This would be equivalent to putting more trust into the
data-driven error model than into the physical simulation model itself and would again be associated with uncertainties.
To avoid this, we add two more cases, which can be interpreted as one-sided uncertainty expansion, starting from the
simulation model as a stable baseline. The principle is also visualized in Fig. 7b. This mixed approach neither neglects
the errors as usual, nor does it perform a risky bias correction, but it adds tight bounds for a fair comparison.
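The case distinction of Eq. (15) can be written as a small helper, a direct transcription of the three cases with the error convention e = model − system:

```python
def expand_interval(y_m, e_lo, e_hi):
    """One-sided uncertainty expansion of a deterministic model prediction
    y_m given inferred error bounds [e_lo, e_hi], with error = model - system
    (Eq. 15). The nominal prediction always stays inside the returned
    interval, i.e. no bias correction is performed."""
    if e_lo >= 0 and e_hi >= 0:
        return (y_m - e_hi, y_m)          # model over-predicts the distance
    if e_lo <= 0 <= e_hi:
        return (y_m - e_hi, y_m - e_lo)   # error bounds straddle zero
    return (y_m, y_m - e_lo)              # model under-predicts the distance

# Example: an inferred error interval of [-0.3, 0.0] m widens a prediction
# of 0.25 m only towards larger distances, keeping 0.25 m as the baseline.
lo, hi = expand_interval(0.25, -0.3, 0.0)
```

Because the nominal prediction is always one of the interval endpoints, the expansion stays anchored to the physical simulation model, exactly the "stable baseline" argument made above.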
5.8. Decision Making
Fig. 7a and Fig. 7b also include the regulation threshold as a vertical line, which states that the distance to line must be greater than zero (the vehicle edge does not cross the line). It transforms the real values into boolean decisions because, in the end, the type approval requirements can only be passed or failed. Both in the deterministic and in the non-deterministic case, the entire green area exceeds the threshold and passes the requirements. This even corresponds to the most conservative non-deterministic estimate. A reduction of the confidence level of the CDF in the future would lower the requirements so that not all steps have to pass. Currently, checking all steps refers to a confidence of 90 % due to the 10 aleatory samples from Table 2. In total, the AV passes 123 and fails 117 of the 240 application scenarios in the deterministic case, whereas it passes 97 and fails 143 in the non-deterministic case. The values can be taken later from Table 3. The latter ones are, as expected, somewhat more conservative due to the consideration of the input uncertainties. Ultimately, the number of failed decisions would lead to a failure of the entire homologation.
6. Validation of the Application Results
The last section presented the VV&UQ methodology using an example where the type approval of a lane keeping
assistant was carried out based on simulation. It concluded that the AV fails the type approval. However, what does this
actually mean regarding the quality of the methodology itself? To answer this question, the Method of Manufactured
Universes comes into play, where the actual ground truth values are available in all domains. The first subsection
introduces the principle, how those values can be used to analyze the meaningfulness of the VV&UQ results. The
second sub-section evaluates the tolerance approach as a baseline from the current state of the art. The two following
subsections are dedicated to the validation and comparison of the deterministic and the non-deterministic manifestation
of the framework from the last section.
6.1. Evaluation Methodology
The analyses in this section build on the Method of Manufactured Universes. Its big advantage is that the actual
Ground Truth (GT) values are available for all validation scenarios and in particular for all application scenarios. This
makes it possible to compare the VV&UQ results with the true values and thus to evaluate the selected VV&UQ
methodology itself. We simulate the manufactured universe in the application domain under the same conditions as the nominal simulation model. In reality, even if additional experiments were conducted to validate the methodology, such an exact comparison would never be possible: there will always be measurement errors or poorly quantified input uncertainties that falsify the ground truth values. These factors can be considered in the future to analyze the robustness of the approach against unexpected uncertainties. In the application domain, however, we require two p-boxes for the model and the manufactured universe under the same desired application conditions for a fair comparison.
The comparison can take place at multiple stages along the processing pipeline of the framework:
1. Error inference stage
2. Assessment stage
3. Error integration stage
4. Type approval stage.
We focus on the final two stages because they depend on the previous ones and are ultimately decisive. Starting
backwards, the final binary decisions from the type approval can be compared. For this purpose a binary classifier is
used in pattern recognition and machine learning. On the one hand, it distinguishes positive from negative decisions
based on the absolute simulation results. On the other hand, it distinguishes true from false decisions based on the
relation of simulation and experimental results. In our example, a True Positive (TP) stands for a correctly failed type
approval, whereas a True Negative (TN) stands for a correctly passed type approval. On the contrary, a False Positive
15
(FP) stands for an incorrectly failed type approval and a False Negative (FN) for an incorrectly passed type approval.
FPs are Type I errors that convict the innocent AV. FNs are Type II errors that acquit an unsafe AV. In addition, the
precision P is based on the former and the recall R on the latter:

P = \frac{\sum TP}{\sum TP + \sum FP} \tag{16}

R = \frac{\sum TP}{\sum TP + \sum FN} . \tag{17}
Recall is a very important measure from the safety perspective to quantify how many of the actual approval fails are
detected by the simulation model. Precision is also important to identify the approval fails with as few Type I errors as
possible.
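Precision and recall over the binary approval decisions follow directly from the confusion-matrix counts; the numbers below are those of Table 3c, the deterministic VV&UQ manifestation:

```python
def precision_recall(tp, fp, fn):
    """Precision (Eq. 16) and recall (Eq. 17) from confusion-matrix counts.
    A 'positive' is a failed type approval, so recall measures how many of
    the actual approval fails are detected by the simulation."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Table 3c: deterministic VV&UQ method vs. ground truth
p, r = precision_recall(tp=90, fp=27, fn=0)   # p = 90/117, r = 1.0
```

With zero FNs the recall is exactly 1, reflecting that no unsafe AV is acquitted; the precision of 90/117 ≈ 77 % quantifies the price paid in Type I errors.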
In the error integration stage, it is additionally possible to check whether the actual GT values fall within the bounds
from the VV&UQ method. This looks as follows in the deterministic and the non-deterministic case:
y^a_s \in I(\hat{y}^a_s) \tag{18}

F(y^a_s) \in B(\hat{y}^a_s) \tag{19}
This gives additional insights and confidence in the error model, since, depending on the distance to the threshold, the binary decisions can be classified correctly even if the GT does not fall within the bounds.
6.2. Tolerance Approach
It is important to emphasize that the pure tolerance approach from Section 2.2.2, as it is currently used in the automotive sector, is not directly transferable to controlled AVs. First of all, there are absolute and relative tolerances. The absolute ones struggle in the sense that they do not relate the validation errors to the scenario results. An exemplary assumed model error of 30 cm has much stronger effects when the virtual vehicle drives 15 cm beside the line than when it drives in the middle of the road. The final binary results for the pure tolerance approach (without error pipeline) for the absolute tolerance of 30 cm can be seen in Figure 8. It reveals cases where the model is assumed to be valid according to the tolerances in the validation domain, but false classifications (FP and FN) between the nominal model and the GT occur at neighboring locations in the application domain. They are concerning from a safety perspective. There is also an inverse case with an invalid model and true classifications in the neighborhood that leaves potential. Classical relative tolerances suffer from similar problems. Suppose one allows a 10 % deviation of the jerk. Then larger jerk values benefit, although these are safety critical.
It would be possible to extend the tolerances to the safety perspective by dividing the validation errors by the
distance from the threshold. Regardless of this, we recommend using the tolerances in the validation domain as a tool
for the model developer, but not standalone. They should be combined with the vertical error pipeline in the framework in Figure 3 to consider model-form and extrapolation uncertainties. The use of tolerances alone separates the validation and application domain, neglects uncertainties and is risky.
6.3. Deterministic and Non-Deterministic Manifestation
This subsection continues with the analysis of the overall framework including the error and uncertainty pipeline
as presented in Section 5. Table 3 summarizes the results from the binary classifier. The first column refers to the
deterministic manifestation of the framework and the second column to the non-deterministic manifestation. In the first
row, the classifier is used to compare the nominal simulation model with the GT from the manufactured universe, and
in the second row, the results of the VV&UQ methodology are compared with the GT. The injected systematic error results in many safety-critical FNs and a low recall rate with a perfect precision of 100 % in Tables 3a and 3b.
Checking whether the GT falls within the bounds of the VV&UQ methodology according to Equation
(18)
and
(19)
yields 238 bounded cases for the deterministic manifestation and even 240 of 240 bounded cases for the non-
deterministic manifestation. This shows a working conservative error modeling using linear regression and prediction
intervals. In both Table 3c and 3d, the VV&UQ methodology successfully shifts all the FNs into the TP cell. This
heavily increases the recall from almost 0 to
100 %
. From a safety perspective this is optimal since no unsafe AV is
acquitted anymore. In return, a few TNs are moved into the FP cell, which convicts the innocent AV. Over the entire
scenario space, this results in a precision of 77 % and 86 %, respectively. However, since the vehicle will not pass the homologation anyway due to the many TPs, the additional FPs do not matter much in this case. Generally, it is impossible to guarantee model validity with only a restricted amount of validation experiments [41]. Nevertheless, further cases and alternative metamodeling techniques should be considered in the future to address the trade-off between Type I and II errors.

Figure 8: Decisions across the scenario space. The rectangles refer to general model validity based on the tolerance approach, the circles and crosses to the validity of the application decisions using the binary classifier. The decision of the nominal simulation model is coded in the color channel, the comparison with the ground truth values by the two symbols cross and circle. Thus, an orange circle symbolizes a TP, an orange cross an FP, a green cross an FN and a green circle a TN.
6.4. Comparison and Discussion
Comparing the deterministic and the non-deterministic manifestation of the framework regarding the binary classifications shows similar results. The non-deterministic approach has a slight advantage due to a higher precision at the same recall rate. In principle, the classifier results allow both approaches and do not show a clear selection. Thus, further factors may be included in the process to select the configuration of the framework for the specific use case of this paper. The non-deterministic approach requires a higher effort to quantify the uncertainties and to perform the simulations. In return, it considers different sources of uncertainty including the scenario parameters. This results in a higher confidence in the statement of each application scenario and a larger coverage of the scenario space due to the scatter around each nominal scenario point. The accuracy of the statement can even be adjusted by the number of aleatory samples. In addition, the illustrative example with the comparatively large systematic error makes the strengths of the non-deterministic approach in the quantification of input uncertainties less obvious. Therefore, if the resources are available for the non-deterministic approach, its selection is recommended. Otherwise, the deterministic approach represents a good compromise between effort and risk.
Generally, we recommend using all (here five) uncertain parameters also as scenario parameters and including them in the scenario design. It fits the nature of model validation to quantify the input uncertainties as accurately as possible around a nominal scenario condition. If an uncertain parameter is not included in the scenario design, its global uncertainty must be taken into account (the space columns instead of the uncertainty column in Table 2). This leads
Table 3: Binary classifier

(a) Deterministic results – Nominal model
                 Universe fails   Universe passes
Model fails      2 TPs            0 FPs             P = 100 %
Model passes     88 FNs           150 TNs
                 R ≈ 2 %

(b) Non-deterministic results – Nominal model
                 Universe fails   Universe passes
Model fails      5 TPs            0 FPs             P = 100 %
Model passes     92 FNs           143 TNs
                 R ≈ 5 %

(c) Deterministic results – VV&UQ method
                 Universe fails   Universe passes
VVUQ fails       90 TPs           27 FPs            P ≈ 77 %
VVUQ passes      0 FNs            123 TNs
                 R = 100 %

(d) Non-deterministic results – VV&UQ method
                 Universe fails   Universe passes
VVUQ fails       123 TPs          20 FPs            P ≈ 86 %
VVUQ passes      0 FNs            97 TNs
                 R = 100 %
to wider p-boxes in case of larger input uncertainties and thus to an under-approximation of the model error. This in turn is too little to perform a sufficient shift of the model responses during the uncertainty expansion and would finally lead to some false classifications.
7. Conclusion
For credible safeguarding of automated vehicles based on computer simulation, model validation activities are
essential to assess the errors and uncertainties inherent in every model. Therefore, we presented a modular framework
based on the type approval of a lane keeping function. Due to the modular design, the framework can be used in different manifestations depending on the requirements. On the one hand, we presented a non-deterministic manifestation considering several types of uncertainties. On the other hand, we presented a simplified deterministic manifestation
that still considers an overall model error. We analyzed both manifestations in regard to ground truth values using a
binary classifier.
Both approaches show excellent results in the identification of systematic errors. They corrected all safety-critical
cases in which the nominal simulation model suggests a suitable lane keeping behavior although the actual ground truth
illegally crosses the line. The non-deterministic approach shows slightly better results, but is also more complex. In the
future, we plan to analyze different scenario designs, validation metrics and error learning techniques to further improve
the results. Nevertheless, it is important to note that there is no absolute model validity, since it is impossible to
guarantee validity across the entire scenario space with only a restricted number of validation experiments. In addition,
having validated the methodology in this paper based on the method of manufactured universes, we can apply the framework
to a real vehicle in a next step.
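The correction of such safety-critical cases can be illustrated with a minimal, purely hypothetical sketch of the uncertainty expansion: a nominal model response is shifted conservatively by a validated error bound before the pass/fail decision is made. All numbers below are illustrative and not taken from the paper:

```python
# Hypothetical illustration of how an uncertainty expansion turns a
# false negative into a correct "fail" decision. All values are
# invented for illustration; they are not results from the paper.

LANE_LIMIT_M = 0.0  # distance to the lane line; negative means the line is crossed


def approve(distance_to_line_m):
    """Type-approval decision: pass only if the line is never crossed."""
    return distance_to_line_m >= LANE_LIMIT_M


nominal_sim = 0.05    # nominal model predicts a 5 cm margin -> passes
model_error = 0.08    # assumed validated model error bound in meters
ground_truth = -0.02  # the real vehicle crosses the line -> fails

# The nominal decision disagrees with the ground truth (a false negative).
assert approve(nominal_sim) and not approve(ground_truth)

# Shifting the model response conservatively by the error bound
# recovers the correct "fail" decision.
expanded = nominal_sim - model_error
assert not approve(expanded)
print("expanded prediction fails, matching the ground truth")
```

If the error bound is under-approximated (e.g. only 2 cm here), the shifted prediction would still pass and the false negative would remain, which mirrors the false classifications discussed in Section 6.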
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have
appeared to influence the work reported in this paper.
Acknowledgement
The authors want to thank TÜV SÜD Auto Service GmbH for the support and funding of this work. Additionally,
the authors want to thank Daniel Schneider for proofreading the article and for enhancing its content with his critical
remarks.
Contribution
Stefan Riedmaier initiated and wrote this paper. He was involved in all stages of development and primarily
developed the concept and content of this work. Jakob Schneider wrote his master's thesis on uncertainties in the
safeguarding of an ACC. Afterwards, he continued his research by supporting Stefan Riedmaier in implementing the
framework, in applying it to the LKAS type approval and in enhancing the content in frequent workshops. Benedikt
Danquah contributed to the structure of the paper and improved its content through close cooperation and many
valuable discussions on VV&UQ methods. Bernhard Schick and Frank Diermeyer contributed to the conception of the
research project and revised the paper critically for important intellectual content. Frank Diermeyer gave final approval
of the version to be published and agrees to all aspects of the work. As a guarantor, he accepts responsibility for the
overall integrity of the paper.
References
[1] SAE J3016, Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles, 2018.
[2] H. Beglerovic, M. Stolz, M. Horn, Testing of autonomous vehicles using surrogate models and stochastic optimization, in: 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), IEEE, 2017, pp. 1–6.
[3] United Nations Economic Commission for Europe (UNECE), Proposal for a new UN regulation on uniform provisions concerning the approval of vehicles with regard to automated lane keeping systems (ECE/TRANS/WP.29/2020/81), 2020.
[4] N. Kalra, S. M. Paddock, Driving to safety: How many miles of driving would it take to demonstrate autonomous vehicle reliability?, Transportation Research Part A: Policy and Practice 94 (2016) 182–193. URL: http://www.sciencedirect.com/science/article/pii/S0965856416302129.
[5] W. Wachenfeld, H. Winner, The Release of Autonomous Vehicles, in: M. Maurer, J. C. Gerdes, B. Lenz, H. Winner (Eds.), Autonomous Driving: Technical, Legal and Social Aspects, Springer Berlin Heidelberg, Berlin, Heidelberg, 2016, pp. 425–449. URL: https://doi.org/10.1007/978-3-662-48847-8_21.
[6] German Aerospace Center, PEGASUS project, 2019. URL: https://www.pegasusprojekt.de/en/home.
[7] A. Leitner, A. Akkermann, B. Å. Hjøllo, B. Wirtz, D. Nickovic, E. Möhlmann, H. Holzer, J. van der Voet, J. Niehaus, M. Sarrazin, M. Zofka, M. Rooker, M. Kubisch, M. Paulweber, M. Siegel, M. Rautila, N. Marko, P. Tummeltshammer, P. Rosenberger, R. Rott, S. Muckenhuber, S. Kalisvaart, T. d. Graaff, T. D'Hondt, T. Fleck, Z. Slavik, ENABLE-S3: Testing & validation of highly automated systems: Summary of results, 2019.
[8] E. F. Z. Santana, G. Covas, F. Duarte, P. Santi, C. Ratti, F. Kon, Transitioning to a driverless city: Evaluating a hybrid system for autonomous and non-autonomous vehicles, Simulation Modelling Practice and Theory 107 (2021) 102210.
[9] UL, UL 4600: Standard for Safety for the Evaluation of Autonomous Products, 2019.
[10] B. Schürmann, D. Heß, J. Eilbrecht, O. Stursberg, F. Koster, M. Althoff, Ensuring drivability of planned motions using formal methods, in: 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), IEEE, 2017, pp. 1–8.
[11] S. Kitajima, K. Shimono, J. Tajima, J. Antona-Makoshi, N. Uchida, Multi-agent traffic simulations to estimate the impact of automated technologies on safety, Traffic Injury Prevention 20 (2019) 58–64.
[12] S. Riedmaier, T. Ponn, D. Ludwig, B. Schick, F. Diermeyer, Survey on scenario-based safety assessment of automated vehicles, IEEE Access 8 (2020) 87456–87477.
[13] S. Riedmaier, B. Danquah, B. Schick, F. Diermeyer, Unified framework and survey for model verification, validation and uncertainty quantification, Archives of Computational Methods in Engineering (2020).
[14] M. G. Faes, M. A. Valdebenito, Fully decoupled reliability-based design optimization of structural systems subject to uncertain loads, Computer Methods in Applied Mechanics and Engineering 371 (2020) 1–17.
[15] M. Eek, H. Gavel, J. Ölvander, Definition and implementation of a method for uncertainty aggregation in component-based system simulation models, Journal of Verification, Validation and Uncertainty Quantification 2 (2017) 011001-1–011001-12.
[16] H. F. Stripling, M. L. Adams, R. G. McClarren, B. K. Mallick, The method of manufactured universes for validating uncertainty quantification methods, Reliability Engineering & System Safety 96 (2011) 1242–1256.
[17] R. G. Hills, Roll-up of validation results to a target application, 2013.
[18] American Society of Mechanical Engineers, Standard for verification and validation in computational fluid dynamics and heat transfer: An American national standard, volume 20-2009 of ASME V&V, reaffirmed 2016 ed., The American Society of Mechanical Engineers, New York, NY, 2009.
[19] J. Mullins, Y. Ling, S. Mahadevan, L. Sun, A. Strachan, Separation of aleatory and epistemic uncertainty in probabilistic model validation, Reliability Engineering & System Safety 147 (2016) 49–59.
[20] S. Sankararaman, S. Mahadevan, Integration of model verification, validation, and calibration for uncertainty quantification in engineering systems, Reliability Engineering & System Safety 138 (2015) 194–209.
[21] S. Atamturktur, F. M. Hemez, J. A. Laman, Uncertainty quantification in model verification and validation as applied to large scale historic masonry monuments, Engineering Structures 43 (2012) 221–234.
[22] H. Abbas, M. O'Kelly, A. Rodionova, R. Mangharam, Safe at any speed: A simulation-based test harness for autonomous vehicles, in: Seventh Workshop on Design, Modeling and Evaluation of Cyber Physical Systems (CyPhy'17), 2017, pp. 94–106.
[23] T. Hanke, A. Schaermann, M. Geiger, K. Weiler, N. Hirsenkorn, A. Rauch, S.-A. Schneider, E. Biebl, Generation and validation of virtual point cloud data for automated driving systems, in: 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), IEEE, 2017, pp. 1–6.
[24] A. Schaermann, A. Rauch, N. Hirsenkorn, T. Hanke, R. Rasshofer, E. Biebl, Validation of vehicle environment sensor models, in: 2017 IEEE Intelligent Vehicles Symposium (IV), IEEE, 2017, pp. 405–411.
[25] A. Gaidon, Q. Wang, Y. Cabon, E. Vig, VirtualWorlds as proxy for multi-object tracking analysis, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2016, pp. 4340–4349.
[26] M. Nentwig, M. Miegler, M. Stamminger, Concerning the applicability of computer graphics for the evaluation of image processing algorithms, in: 2012 IEEE International Conference on Vehicular Electronics and Safety (ICVES 2012), IEEE, 2012, pp. 205–210.
[27] E. L. Zec, N. Mohammadiha, A. Schliep, Statistical sensor modelling for autonomous driving using autoregressive input-output HMMs, in: 2018 IEEE 21st International Conference on Intelligent Transportation Systems (ITSC), IEEE, 2018, pp. 1331–1336.
[28] International Organization for Standardization, Passenger cars — Validation of vehicle dynamic simulation — Sine with dwell stability control testing, 2016.
[29] International Organization for Standardization, Passenger cars — Vehicle dynamic simulation and validation — Steady-state circular driving behaviour, 2016.
[30] E. Kutluay, H. Winner, Validation of vehicle dynamics simulation models – a review, Vehicle System Dynamics 52 (2014) 186–200.
[31] K. Groh, S. Wagner, T. Kuehbeck, A. Knoll, Simulation and its contribution to evaluate highly automated driving functions, in: WCX SAE World Congress Experience, SAE Technical Paper Series, SAE International, Warrendale, PA, United States, 2019, pp. 1–11.
[32] D. Notz, M. Sigl, T. Kühbeck, S. Wagner, K. Groh, C. Schütz, D. Watzenig, Methods for improving the accuracy of the virtual assessment of autonomous driving, in: 2019 IEEE International Conference on Connected Vehicles and Expo (ICCVE) Proceedings, 2019, pp. 1–6.
[33] S. Wagner, K. Groh, T. Kuhbeck, A. Knoll, Towards cross-verification and use of simulation in the assessment of automated driving, in: 2019 IEEE Intelligent Vehicles Symposium (IV), IEEE, 2019, pp. 1589–1596.
[34] B. Johnson, F. Havlak, H. Kress-Gazit, M. Campbell, Experimental evaluation and formal analysis of high-level tasks with dynamic obstacle anticipation on a full-sized autonomous vehicle, Journal of Field Robotics 34 (2017) 897–911.
[35] S. Riedmaier, J. Nesensohn, C. Gutenkunst, T. Düser, B. Schick, H. Abdellatif, Validation of X-in-the-loop approaches for virtual homologation of automated driving functions, in: 11th Graz Symposium Virtual Vehicle (GSVF), 2018, pp. 1–12.
[36] W. Daamen (Ed.), Traffic simulation and data: Validation methods and applications, Taylor and Francis and CRC Press, Hoboken and Boca Raton, FL, 2015.
[37] Y. Hollander, R. Liu, The principles of calibrating traffic microsimulation models, Transportation 35 (2008) 347–362.
[38] L. Rao, L. Owen, Validation of high-fidelity traffic simulation models, Transportation Research Record: Journal of the Transportation Research Board 1710 (2000) 69–78.
[39] T. Toledo, H. N. Koutsopoulos, Statistical validation of traffic simulation models, Transportation Research Record: Journal of the Transportation Research Board 1876 (2004) 142–150.
[40] E. Kutluay, Development and Demonstration of a Validation Methodology for Vehicle Lateral Dynamics Simulation Models, Ph.D. thesis, Technische Universität Darmstadt, Darmstadt, 2012.
[41] M. Viehof, Objektive Qualitätsbewertung von Fahrdynamiksimulationen durch statistische Validierung, Ph.D. thesis, Technische Universität Darmstadt, Darmstadt, 2018.
[42] S. Rhode, Non-stationary Gaussian process regression applied in validation of vehicle dynamics models, Engineering Applications of Artificial Intelligence 93 (2020) 103716.
[43] M. Hartung, D. Hess, R. Lattarulo, J. Oehlerking, J. Perez, A. Rausch, Report on conformance testing of application models, 2017.
[44] E. Böde, M. Büker, U. Eberle, M. Fränzle, S. Gerwinn, B. Kramer, Efficient splitting of test and simulation cases for the verification of highly automated driving functions, in: B. Gallina, A. Skavhaug, F. Bitsch (Eds.), Computer Safety, Reliability, and Security, Springer International Publishing, Cham, 2018, pp. 139–153.
[45] S. Detering, L. Schnieder, E. Schnieder, Two-level validation and data acquisition for microscopic traffic simulation models, International Journal on Advances in Systems and Measurements 3 (2010).
[46] T. Ponn, T. Kröger, F. Diermeyer, Performance analysis of camera-based object detection for automated vehicles, Sensors (Basel, Switzerland) 20 (2020).
[47] M. Holder, P. Rosenberger, H. Winner, V. P. Makkapati, M. Maier, H. Schreiber, Z. Magosi, T. D'Hondt, Z. Slavik, O. Bringmann, W. Rosenstiel, Measurements revealing challenges in radar sensor modeling for virtual validation of autonomous driving, in: 2018 IEEE 21st International Conference on Intelligent Transportation Systems (ITSC), IEEE, 2018, pp. 2616–2622.
[48] P. Rosenberger, M. Holder, M. Zirulnik, H. Winner, Analysis of real world sensor behavior for rising fidelity of physically based lidar sensor models, in: 2018 IEEE Intelligent Vehicles Symposium (IV), IEEE, 2018, pp. 611–616.
[49] United Nations Economic Commission for Europe (UNECE), Addendum 139: Regulation No. 140 — Uniform provisions concerning the approval of passenger cars with regard to electronic stability control (ESC) systems, 2017.
[50] W. L. Oberkampf, C. J. Roy, Verification and Validation in Scientific Computing, Cambridge University Press, Cambridge, 2010.
[51] M. Aramrattana, R. H. Patel, C. Englund, J. Härri, J. Jansson, C. Bonnet, Evaluating model mismatch impacting CACC controllers in mixed traffic using a driving simulator, in: 2018 IEEE Intelligent Vehicles Symposium (IV), IEEE, 2018, pp. 1867–1872.
[52] L. Zheng, T. Sayed, M. Essa, Y. Guo, Do simulated traffic conflicts predict crashes? An investigation using the extreme value approach, in: 2019 IEEE 22nd International Conference on Intelligent Transportation Systems (ITSC), IEEE, 2019, pp. 631–636.
[53] D. C. Kammer, P. A. Blelloch, J. Sills, Test-based uncertainty quantification and propagation using Hurty/Craig-Bampton substructure representations, in: Proceedings of the IMAC-XXXVII, 2019, pp. 107–129.
[54] M. N. Avramova, K. N. Ivanov, Verification, validation and uncertainty quantification in multi-physics modeling for nuclear reactor design and safety analysis, Progress in Nuclear Energy 52 (2010) 601–614.
[55] United Nations Economic Commission for Europe (UNECE), Addendum 78: UN Regulation No. 79 — Uniform provisions concerning the approval of vehicles with regard to steering equipment, 2018.
[56] G. Bagschik, T. Menzel, M. Maurer, Ontology based scene creation for the development of automated vehicles, in: 2018 IEEE Intelligent Vehicles Symposium (IV), 2018, pp. 1813–1820. doi:10.1109/IVS.2018.8500632.
[57] N. W. Whiting, C. J. Roy, E. P. Duque, S. Lawrence, Assessment of model validation and calibration approaches in the presence of uncertainty, in: AIAA Scitech 2019 Forum, American Institute of Aeronautics and Astronautics, 2019, pp. 1–16.
[58] M. Viehof, H. Winner, Research methodology for a new validation concept in vehicle dynamics, Automotive and Engine Technology (2018).
[59] I. T. Voyles, C. J. Roy, Evaluation of model validation techniques in the presence of aleatory and epistemic input uncertainties, in: 17th AIAA Non-Deterministic Approaches Conference, American Institute of Aeronautics and Astronautics, 2015, pp. 1–16.
[60] R. G. Miller, Simultaneous Statistical Inference, Springer New York, New York, NY, 1981.