Evaluation of Operational Safety Assessment (OSA) Metrics for Automated Vehicles in Simulation

Abstract

The operational safety of automated driving system (ADS)-equipped vehicles (AVs) must be quantified using well-defined metrics in order to gain an unambiguous understanding of the level of risk associated with AV deployment on public roads. In this research, efforts to evaluate the operational safety assessment (OSA) metrics introduced in prior work by the Institute of Automated Mobility (IAM) are described. An initial validation of the proposed set of OSA metrics involved using the open-source simulation software Car Learning to Act (CARLA) and Scenario Runner, which are used to place a subject vehicle in selected scenarios and obtain measurements for the various relevant OSA metrics. Car-following scenarios were selected from the list of 37 pre-crash scenarios identified by the National Highway Traffic Safety Administration (NHTSA) as the most common driving situations that lead to crash events involving two light-duty vehicles. The resulting data were used to evaluate different parameters and thresholds of the metrics developed in the prior IAM work. The simulation and analysis results were used to evaluate the relevant metrics in the context of proposed criteria as measurable and applicable to the operational safety of AVs and human-driven vehicles alike in a data-driven approach.
2021-01-0868 Published 06 Apr 2021
Maria Soledad Elli Intel Corp.
Jerey Wishart Exponent Inc.
Steven Como and Siddhaarthan Dhakshinamoorthy Arizona State University
Jack Weast Intel Corp.
Citation: Elli, M.S., Wishart, J., Como, S., Dhakshinamoorthy, S. et al., “Evaluation of Operational Safety Assessment (OSA) Metrics for
Automated Vehicles in Simulation,” SAE Technical Paper 2021-01-0868, 2021, doi:10.4271/2021-01-0868.
Introduction
As the development of automated driving system (ADS)-equipped vehicles (AVs) continues, the need for a process to evaluate the operational safety performance of the technology has become ever more apparent. The process must provide a consistent, unbiased, and technology-neutral evaluation that will provide public confidence as AVs are deployed.
A possible process for this operational safety performance evaluation is to use the concept of a formalized Safety Case Framework (SCF). A safety case is “a structured argument, supported by a body of evidence, that provides a compelling, comprehensible, and valid case that a product is safe for a given application in a given environment.” [1] An SCF, an example of which is the UL 4600 standard [2], will contain a variety of possible verification and validation (V&V) methods that are used in developing the required evidence to support the AV safety case. A subset of V&V methods is testing methods, which include simulation testing, closed course testing, public road testing, or some combination of the three, each of which involves placing the AV under test in a set of traffic scenarios and evaluating its operational safety performance. A comprehensive evaluation methodology, with validated metrics, must be developed to derive safety case evidence from the conduct of a given scenario test; to the authors’ knowledge, such a methodology does not exist in the literature.
The Institute of Automated Mobility (IAM) was formed by the Governor’s Executive Order in 2018 to help provide guidance on AVs in the state of Arizona. The IAM has been conducting research to develop an operational safety assessment (OSA) methodology, along with OSA metrics, to be used in an SCF as the evaluation methodology for traffic scenario testing. The intent of the OSA methodology is to evaluate the navigation of a given traffic scenario by a vehicle (primarily AVs, but it can also be used for human-driven vehicles) using the OSA metrics measurements and assigning a score for said test. The aggregate score over the set of traffic scenarios can then be used to build the safety case for the AV, which in turn can be used by AV developers and authorities having jurisdiction (AHJs) to evaluate the status of an AV throughout its development and allow for a determination of readiness for various stages of deployment. The safety case provides assurance of a level of safety achieved by an AV, which is imperative for gaining public trust.
Downloaded from SAE International by Jeffrey Wishart, Monday, April 05, 2021
In 2020, IAM researchers proposed a set of OSA metrics in Wishart et al. [3] that were compiled and adapted from a comprehensive literature review. The OSA metrics set is, to the authors’ knowledge, the only comprehensive set in the literature that can be used to measure the operational safety of AVs and human-driven vehicles.
In the current work, a subset of the OSA metrics proposed in [3] has been measured and evaluated for a selection of traffic scenarios so that the metrics can be refined where appropriate. This evaluation has been conducted using simulation software, including Car Learning to Act (CARLA) [4] and ScenarioRunner [5], to determine the OSA metric measurements for a subset of scenarios chosen from NHTSA’s list of 37 pre-crash scenarios in [6].
The proposed set of OSA metrics included sets of parameters and thresholds which were adapted from existing literature and other research studies. Many of these parameters and thresholds are considered subjective and require further research for refinement. The analysis conducted in this work includes an attempt to determine values for parameters and thresholds for some of the metrics that were left as future work in [3]. The simulation results presented in this work provide further insight into the consequences of differing thresholds and parameters for a variety of car-following scenarios, as well as the performance of the proposed OSA metrics in the context of assessing the operational safety of AVs.
The outline of the paper is as follows. First, the OSA metrics are described and summarized. The parameters and thresholds used in the experiments are then listed. The traffic scenarios, including the selection process, are then described. The simulation methodology is discussed, including the CARLA software and test vehicle model. Next, the simulation results are presented and discussed. Finally, overall conclusions and future work are described.
OSA Metrics
The proposed set of OSA metrics was introduced in Wishart et al. [3]. The objective was to develop a comprehensive set that would allow for an assessment of the operational safety of a vehicle (human-driven or automated driving) in a variety of scenarios as part of an SCF. A novel taxonomy is proposed here to organize the OSA metrics into three categories: (1) Black Box metrics, (2) Grey Box metrics, and (3) White Box metrics:
• A Black Box metric allows for measurement of data that can be obtained without requiring any access to ADS data. This could be from an on-board or off-board source (e.g., public road infrastructure, or CAN bus data). However, using ADS data may enhance the accuracy and precision of the measurement(s).
• A Grey Box metric allows for measurement of data that can only be obtained with limited access to ADS data.
• A White Box metric allows for measurement of data that can only be obtained with significant access to ADS data.
There are trade-offs for each metric type that make them advantageous (or disadvantageous) in particular use cases. For example, the Black Box metrics may be preferable where access to proprietary ADS data is not desired. Conversely, White Box metrics allow for specific sub-systems in the ADS to be assessed rather than just the AV system as a whole. The Grey Box metrics represent a balance between sensitivity to proprietary data and assessment granularity. It should be noted that while Black Box metrics are useful for both human-driven vehicles and AVs, Grey Box and White Box metrics are only applicable to AVs.
The proposed OSA metrics are shown in Table 1, along with the proposed taxonomy. This proposed set comprises only Black Box and Grey Box metrics since White Box metrics rely on shared data from AV developers and may be unavailable for evaluation purposes. The Black Box metrics can be further categorized as Minimum Safe Distance-Related, Universal, and Traffic Engineering-Related.
The Minimum Safe Distance-Related metrics are based on the Responsibility-Sensitive Safety (RSS) model [8]. Universal metrics are defined as those which apply to both human-driven vehicles and AVs, including events such as Collision Incidents and Traffic Law Violations. It should be noted that the latter metric was originally named “Rules-of-the-Road Violation” in [3] but has since been changed to “Traffic Law Violation” since the definition of Rules of the Road is ambiguous and can include customary practices for a particular region. Traffic engineering metrics consist of traditionally used surrogate safety metrics that have been heavily researched in the past [9]. Lastly, the Grey Box metrics include two metrics, ADS Active (ADSA) and Achieved Behavioral Competency (ABC), that indicate whether the ADS is completing the dynamic driving task (DDT) and whether the AV accomplishes the trajectory as planned, respectively.
e Human Trac Control Detection Error Rate (HTCDER)
TAB L E 1 Taxonomy for OSA metrics
Black Box Metrics
Grey Box MetricsMinimum Safe Distance-Related Universal Trac Engineering-Related
Minimum Safe Distance Violation
(MSDV)
Collision Incident (CI) Time-to-Collision Violation
(TTCV)
Human Trac Control Detection
Error Rate (HTCDER)
Proper Response Action (PRA) Trac Law Violation (TLV) Modified Time-to-Collision
Violation (MTTCV)
ADS Active (ADSA)
Minimum Safe Distance Factor
(MSDF)
Human Trac Control Violation
Rate (HTCVR)
Post-Encroachment Time
Violation (PETV)
Achieved Behavioral Competency
(ABC)
Aggressive Driving (AD) Minimum Safe Distance Calculation
Error (MSDCE)
© SAE International.
Downloaded from SAE International by Jeffrey Wishart, Monday, April 05, 2021
EVALUATION OF OPERATIONAL SAFETY ASSESSMENT (OSA) METRICS FOR AUTOMATED VEHICLES IN SIMULATION 3
and Minimum Safe Distance Calculation Error (MSDCE)
provide insight into the perception system performance
without requiring raw sensor data (which is particularly
sensitive, proprietary data) but rather much more limited
ADS data. While the MSDCE metric is also based on the
RSS model, it is classied as a “Grey Box” metric due to the
necessity of some ADS data required to calculate the metric
value.
It should benoted that the authors continue to monitor
the literature for additional metrics to beconsidered for the
OSA metrics, such as Time Headway (THW) from [10],
Collision Avoidance Capability (CAC) from [11], and Model
Predictive Instantenous Safety Metric (MPrISM) from [12].
Future work will consider these metrics to beincluded in the
OSA metrics set. e OSA metrics from Table 1 are also being
considered for the SAE J3237 Information Report currently
being developed by the V&V Task Force of the SAE On-Road
Automated Driving (ORAD) Committee.
Selection and Formulation of OSA Metrics
For this work, not all of the proposed OSA metrics are measurable when using the employed simulation. Within the CARLA simulator, ground truth data of the vehicles’ positions, speeds, and accelerations were used to calculate the aforementioned metrics; therefore, metrics related to quantification of localization and tracking errors, such as MSDCE and HTCDER, are not possible to obtain. Additionally, metrics related to the behavior of the subject vehicle under test (i.e., the AV), such as ABC, HTCVR, ADSA, and PRA, are inapplicable to this work as the algorithm controlling the vehicles’ behavior does not reflect a real AV driving policy.
Therefore, the OSA metrics evaluated in the presented work are:
• MSDV
• TTCV
• MTTCV
• PETV
In addition to the previously discussed metrics, the THW metric discussed in [10] and [13] was also included in the analysis. THW is a time-based metric similar to TTC; however, THW is based only on the distance between the following and lead vehicles in relation to the speed of the following vehicle, rather than on the difference in speed. As with other time-based metrics, the lower the THW value, the higher the risk of a collision; therefore, THW is often used in the literature with a pre-determined threshold. This implies that when the threshold is met, the situation has become unsafe and a proper response action is required. An example of the usage of this metric is the latest United Nations Economic Commission for Europe (UNECE) regulation on Automated Lane-Keeping Systems (ALKS) [10]. The THW formulation was modified to align with the other metrics, taking the form of a violation when the threshold is crossed, such that the Time Headway Violation (THWV) is introduced.
The details of the selected metrics are shown in Table 2 (note that the Distance to Stop Violation (DSV) is discussed below). The selected metrics are relevant to the analysis of operational safety and represent the first step in the evaluation of the proposed OSA metrics.
In order to evaluate the performance of the chosen OSA metrics, three evaluation criteria were developed to analyze the efficacy of a given metric in assessing the safety of a driving situation:
• Robustness to changing scenario configurations
• Relevance
• Comparison to a ground truth metric
Robustness to changing scenario congurations refers to
the metrics providing a timely warning that changes appro-
priately with variations for dierent scenario conditions such
as vehicle initial position and speed, relative headway of the
vehicles, changes in the environment, etc. For example, if the
scenario conguration changes but the metrics violation
timing does not, then the metric robustness is lower.
Relevance refers to the metric providing safety information throughout the scenario for all scenario permutations. For example, if there is one (or more) instance(s) of the metric exhibiting a nonsensical value (e.g., a denominator being 0), the metric relevance is lower.
For the purpose of this work, a ground truth metric, Distance to Stop Violation (DSV), has been established to enable comparisons of the effectiveness of metrics in identifying a potentially unsafe situation preemptively. This criterion thus evaluates the timeliness of the metric violation. The ground truth DSV metric is based on the distance that it would take the subject vehicle to come to a full stop at a determined deceleration (DSTOP in Equation 6 in Table 2). When the follower vehicle is driving at a distance that is less than or equal to the distance required for this deceleration to occur, a DSV has occurred. This metric indicates a potentially unsafe situation under ideal conditions, meaning that this metric does not consider the velocity or deceleration capabilities of the lead vehicle (or rather it assumes a stopped lead vehicle) nor the road conditions. It also assumes that the subject vehicle can reach the defined $a_{brake}$ instantly. Because of this, two different values for $a_{brake}$ were selected and evaluated with simulation in order to provide different “thresholds” for comparison with the ground truth. These values were taken from [10] and reflect the minimum deceleration needed to perform an emergency maneuver ($a_{brake}$ = 5 m/s²) and a reasonable (maximum) deceleration applied by automatic emergency braking (AEB) systems ($a_{brake}$ = 8.3 m/s²). The implementation of this evaluation criterion is described in the Metrics Observations and Discussion section.
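As a worked illustration of Equation 6, consider a subject vehicle traveling at 18 m/s (the subject target speed of scenario LVS_18 in Table 5; the choice of speed is only for illustration and is not a result reported here). The two deceleration values quoted above then give:

$$DSTOP_{5} = \frac{v_F^2}{2\,a_{brake}} = \frac{18^2}{2 \cdot 5} = 32.4\ \text{m}, \qquad DSTOP_{8.3} = \frac{18^2}{2 \cdot 8.3} \approx 19.5\ \text{m}$$

Under these assumptions, a DSV would be flagged once the longitudinal distance drops to 32.4 m or less for the 5 m/s² case, roughly 13 m earlier in the approach than for the 8.3 m/s² case.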
These three criteria will be used to evaluate the metrics based on the experimental results in the following sections.
OSA Metrics Parameterization
The OSA metrics evaluated in this work included subjective assumptions for thresholds determining when a violation of a metric occurred. Therefore, this work focused on evaluating the impact of different threshold and parameter values used to define the OSA metrics.
Time-Based Metrics e implementation of time-based
metrics such as TTCV, MTTCV, PETV, and THWV is highly
dependent on the threshold values assigned to them and there-
fore it is important to evaluate the metrics’ performance as a
function of diering thresholds. e set of values chosen for
TTCV, MTTCV, and THWV thresholds are based on values
suggested by previous literature reviewed in [3]. In the case
of PETV, the threshold values were chosen according to the
TAB L E 2 OSA metrics formulation
Minimum Safe Distance Violation
<∧ <
=
min min
1
0
lat lat long long
if d d d d
MSDV else
= ∧
=
1 '1
0
if MSDV Originated by AV
MSDV
else
(1)
Time to Collision Violation
LF
FL
XX
TTC vv
=
=
1
0
if TTC threshold
TTCV
else
(2)
Modified Time to Collision Violation −∆ ± + ∆
=
 
 
22V V AD
MTTC A
=
1
0
if MTTC threshold
MTTCV else
(3)
Post Encroachment Time Violation PET=t2t1
=
1
0
if PET threshold
PETV else (4)
Time Headway Violation
=
,
=
1
0
if THW threshold
THWV else
(5)
Distance to Stop Violation
2
2
F
brake
v
DSTOP a
=
=
1
0
long
if d DSTOP
DSV else
(6)
Where:
dlong: longitudinal distance between two vehicles
dlat: lateral distance between two vehicles
min :
long
d
minimum longitudinal distance between two vehicles ([8])*
min
lat
d
: minimum lateral distance between two vehicles ( [8])*
XL: Leading vehicle position
XF: Following vehicle position
vL: Leading vehicle speed
vF: Following vehicle speed
V
: Relative velocity
A
: Relative acceleration
D
: Relative space gap (equivalent to
dlong in car following situations)
t2: Arrival time of (any part of) Vehicle 2 at Conflict Point
t1: Arrival time of (any part of) Vehicle 1 at Conflict Point
abrake: Following vehicle deceleration
*Note: The formulae for
min
long
d
and
min
lat
d
are in Appendix A (equations 7 and 8).
© SAE International.
Downloaded from SAE International by Jeffrey Wishart, Monday, April 05, 2021
EVALUATION OF OPERATIONAL SAFETY ASSESSMENT (OSA) METRICS FOR AUTOMATED VEHICLES IN SIMULATION 5
distribution of PET values found in the simulated data, as the
spread of PET values was smaller than that of the rest of the
time-based metrics (see Figure 7). e threshold values for
the time-based metrics evaluated in this work are shown in
Tab le 3.
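To make the time-based formulations concrete, the following is a minimal Python sketch of per-sample TTC, MTTC, THW, and DSTOP calculations with a shared threshold check. It is not the authors' post-processing pipeline. The function names, the use of `None` for undefined values (for example a zero or negative closing speed), the follower-minus-leader sign convention for the relative velocity and acceleration, and the "less than or equal to" comparison are all assumptions made for illustration.

```python
import math
from typing import Optional


def ttc(x_lead: float, x_follow: float, v_lead: float, v_follow: float) -> Optional[float]:
    """Time to collision: gap divided by closing speed.

    Returns None when the follower is not closing on the leader, where the
    ratio is undefined or negative (a convention assumed for this sketch).
    """
    closing_speed = v_follow - v_lead
    if closing_speed <= 0.0:
        return None
    return (x_lead - x_follow) / closing_speed


def mttc(gap: float, dv: float, da: float) -> Optional[float]:
    """Modified time to collision: smallest positive root of
    0.5*da*t**2 + dv*t - gap = 0, with dv and da taken as follower minus
    leader (sign convention assumed here)."""
    if abs(da) < 1e-9:
        # No relative acceleration: reduces to the constant-speed case.
        return gap / dv if dv > 0.0 else None
    disc = dv * dv + 2.0 * da * gap
    if disc < 0.0:
        return None  # the vehicles never meet under these kinematics
    root = math.sqrt(disc)
    positive = [t for t in ((-dv + root) / da, (-dv - root) / da) if t > 0.0]
    return min(positive) if positive else None


def thw(gap: float, v_follow: float) -> Optional[float]:
    """Time headway: gap divided by the follower's own speed."""
    return gap / v_follow if v_follow > 0.0 else None


def dstop(v_follow: float, a_brake: float) -> float:
    """Distance needed to stop from v_follow at constant deceleration a_brake."""
    return v_follow ** 2 / (2.0 * a_brake)


def is_violation(value: Optional[float], threshold: float) -> bool:
    """Flag a violation when the metric is defined and at or below the threshold."""
    return value is not None and value <= threshold


# Illustrative sample: stopped leader 50 m ahead, follower at 18 m/s.
sample_ttc = ttc(x_lead=50.0, x_follow=0.0, v_lead=0.0, v_follow=18.0)
ttcv_flags = {thr: is_violation(sample_ttc, thr) for thr in (1, 2, 3, 4, 5)}
dsv_flag = 50.0 <= dstop(18.0, a_brake=5.0)
```

Returning `None` rather than a very large number keeps the "denominator is zero" case visible to the caller, which is the situation the relevance criterion is concerned with.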
Minimum Safe Distance-Related Metric
For the Minimum Safe Distance-Related metric, there are four parameters that were examined according to the definitions in [8]:
1. Reaction time of the ADS of the subject vehicle, $\rho$
2. Maximum acceleration of the subject vehicle during the response duration, $a^{long}_{max,accel}$
3. Minimum deceleration of the subject vehicle after the response duration in order to avoid a collision, $a^{long}_{min,decel}$
4. Maximum assumed deceleration capability of the other vehicle, $a^{long}_{max,decel}$
The values chosen for the above-mentioned parameters were extracted from previous research that determined such values based on naturalistic driving data and further simulation experiments. For the purpose of this work, the sets of values were separated into three different categories (shown in Table 4): Aggressive, Conservative, and NDS (Naturalistic Driving Study). The values for the Aggressive and Conservative categories were adopted from [14], in which a Falsification Search Engine using simulation was used to find RSS parameter values with an associated robustness value, with robustness being a measure of how close the vehicles under test were during the simulated scenarios. From the search, clusters of parameter sets were divided into Aggressive and Conservative categories as they resulted in more aggressive (i.e., shorter following distances) and more conservative (i.e., larger following distances) behaviors, respectively. The values under the NDS category were adopted from the China Intelligent Transportation Systems (C-ITS) Alliance standard #0116-2019 [15]. The values in this standard were defined after analyzing 3 years of naturalistic driving data collected from Shanghai highways in China.
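The effect of the three parameter categories on the minimum longitudinal safe distance can be sketched as follows. The expression used is the same-direction longitudinal safe distance as it is commonly stated for the RSS model [8]; the paper's own formula is given in Appendix A (Equation 7), which is not reproduced here, so the expression should be read as an assumption. The class and function names are hypothetical, and the parameter values are those listed in Table 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RssParams:
    rho: float          # reaction time of the subject vehicle [s]
    a_max_accel: float  # max acceleration during the response time [m/s^2]
    a_min_decel: float  # min braking of the subject vehicle afterwards [m/s^2]
    a_max_decel: float  # max assumed braking of the lead vehicle [m/s^2]


# Parameter categories as listed in Table 4.
PARAM_SETS = {
    "Aggressive": RssParams(0.5, 4.1, 4.6, 8.0),
    "Conservative": RssParams(1.9, 5.9, 4.1, 9.5),
    "NDS": RssParams(0.2, 1.8, 3.6, 6.1),
}


def min_safe_long_distance(v_follow: float, v_lead: float, p: RssParams) -> float:
    """Assumed RSS same-direction longitudinal safe distance, clamped at zero.

    The follower is taken to accelerate at a_max_accel for rho seconds and then
    brake at a_min_decel, while the leader brakes at a_max_decel.
    """
    v_after_response = v_follow + p.rho * p.a_max_accel
    distance = (
        v_follow * p.rho
        + 0.5 * p.a_max_accel * p.rho ** 2
        + v_after_response ** 2 / (2.0 * p.a_min_decel)
        - v_lead ** 2 / (2.0 * p.a_max_decel)
    )
    return max(distance, 0.0)


# Illustrative comparison: follower at 15 m/s behind a stopped leader.
distances = {name: min_safe_long_distance(15.0, 0.0, p) for name, p in PARAM_SETS.items()}
```

In a car-following scenario, the longitudinal part of an MSDV check would then compare the measured gap against this distance at each time step.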
Experimental Design
The experimental design in this work involved developing various scenarios and then implementing said scenarios in simulation, as described in the following sections.
Scenario Selection
One of the difficult challenges currently facing the AV industry is the selection of scenarios to be evaluated for the assessment of vehicle safety. Unique scenario generation is out of the scope of this project, and the authors relied upon the 37 pre-crash scenarios documented by the National Highway Traffic Safety Administration (NHTSA) [6]. From the list of pre-crash scenarios, a subset of car-following situations was selected for evaluation. The 37 pre-crash scenarios were filtered, and car-following scenarios that involved “Two-Vehicle”, “Light-Vehicle Crashes” were then selected based on frequency of occurrence. Scenarios involving signalized and unsignalized junctions are out of the scope of this work. Car-following scenarios were used in this work to simplify the simulation setup. Future work will expand the discussed methodology to consider more complex scenarios such as intersection-related environments. Details of the chosen scenarios extracted from [6] are summarized in Table 7 in Appendix B and include:
1. Lead vehicle stopped (LVS)
2. Lead vehicle decelerating (LVD)
3. Lead vehicle moving at lower constant speed (LVMLCS)
4. Lead vehicle accelerating (LVA)
Together, the four scenarios accounted for some 18.4% of all light-duty vehicle crashes from 2004-2008, according to Table 7 in [6].
Scenarios Realization
In this work, each selected scenario from NHTSA’s pre-crash typology is considered a scenario category in which all scenarios defined within the same category share the same behavior, but variations in the vehicles’ speeds and/or initial positions were defined to evaluate the sensitivity to changes in such conditions. Speed variations and positions for the vehicles were defined with the goal of creating relevant situations that may affect the resulting OSA metrics calculation. This allows for the inference of data trends when calculating the associated OSA metrics. In total, 13 different scenarios, shown in Table 5, were defined, simulated, and then analyzed. The scenarios defined in this study are not meant to be exhaustive of different driving situations, but rather an initial selection for studying the impact of different driving conditions of vehicles (e.g., $v_L > v_F$, $v_L = v_F$, etc.) on the OSA metrics.
In the simulation experiments, the selected scenarios involve only two vehicles, the subject vehicle and the lead vehicle. Both vehicles start from a rest position and accelerate to their target speeds unless explicitly stated otherwise. The two vehicles are positioned with the subject vehicle behind the lead vehicle in the same lane at a specified distance, on a straight road with no breaks (e.g., junctions, signals, turns, stop signs, etc.) which is long enough for both vehicles to achieve their designated target speeds. Each scenario progresses such that the subject vehicle approaches the lead vehicle and a collision occurs. The scenario is terminated at the collision time.

TABLE 3 Evaluated thresholds for time-based metrics
| Metric | Thresholds [s] |
|---|---|
| TTCV | {1, 2, 3, 4, 5} |
| MTTCV | {1, 2, 3, 4, 5} |
| PETV | {0.5, 1, 1.2, 1.5, 2} |
| THWV | {1, 2, 3, 4, 5} |

TABLE 4 Categories of RSS parameters for MSDV metric
| Category | $\rho$ [s] | $a^{long}_{max,accel}$ [m/s²] | $a^{long}_{min,decel}$ [m/s²] | $a^{long}_{max,decel}$ [m/s²] |
|---|---|---|---|---|
| Aggressive | 0.5 | 4.1 | 4.6 | 8 |
| Conservative | 1.9 | 5.9 | 4.1 | 9.5 |
| NDS | 0.2 | 1.8 | 3.6 | 6.1 |
The behavior of the vehicles in a scenario is described as follows:
Lead Vehicle Stopped (LVS)
The lead vehicle is positioned at an initial distance ahead of the subject vehicle to provide enough distance for the subject vehicle to reach its designated target speed. The subject vehicle maintains this speed until it eventually collides with the lead vehicle. Throughout the entire scenario, the lead vehicle stays at rest. Example vehicle dynamics for the LVS scenarios are depicted in Figure 1.
Lead Vehicle Decelerating (LVD)
The lead vehicle is positioned at an initial distance ahead of the subject vehicle and both vehicles start moving from a rest position. Both vehicles maintain a constant acceleration after reaching the target speed. After the vehicles achieve their target speeds, the lead vehicle starts decelerating until reaching a full stop. The subject vehicle eventually collides with the decelerating lead vehicle. Example vehicle dynamics for the LVD scenarios are depicted in Figure 2.
Lead Vehicle Moving at Lower Constant Speed (LVMLCS)
The lead vehicle is positioned at an initial distance ahead of the subject vehicle and both vehicles start moving from a rest position. The target speed of the lead vehicle is lower than that of the subject vehicle. The subject vehicle eventually collides with the slower-moving lead vehicle. Example vehicle dynamics for the LVMLCS scenarios are depicted in Figure 3.
Lead Vehicle Accelerating (LVA)
The lead vehicle is positioned at an initial distance ahead of the subject vehicle and both vehicles start moving from a rest position. The lead vehicle has an initial speed that is lower than that of the subject vehicle. As the subject vehicle approaches the lead vehicle, the lead vehicle starts accelerating to its final target speed. The subject vehicle eventually collides with the accelerating lead vehicle. Example vehicle dynamics for the LVA scenarios are depicted in Figure 4.
TABLE 5 Scenario categories and details
| Scenario Category | Scenario ID | Initial Headway Distance [m] | Subject Vehicle Target Speed [m/s] | Lead Vehicle Target Speed [m/s] |
|---|---|---|---|---|
| Lead Vehicle Stopped | LVS_10 | 200 | 10 | 0 |
| Lead Vehicle Stopped | LVS_15 | 200 | 15 | 0 |
| Lead Vehicle Stopped | LVS_18 | 200 | 18 | 0 |
| Lead Vehicle Decelerating | LVD_14 | 5* | 14.1 | 14 |
| Lead Vehicle Decelerating | LVD_15 | 5* | 15 | 15.1 |
| Lead Vehicle Decelerating | LVD_16 | 30 | 16 | 18 |
| Lead Vehicle Decelerating | LVD_18 | 30 | 18 | 18 |
| Lead Vehicle Decelerating | LVD_20 | 30 | 20 | 18 |
| Lead Vehicle Moving at Lower Constant Speed | LVMLCS_12 | 30 | 12 | 10 |
| Lead Vehicle Moving at Lower Constant Speed | LVMLCS_15 | 30 | 15 | 10 |
| Lead Vehicle Moving at Lower Constant Speed | LVMLCS_20 | 30 | 20 | 15 |
| Lead Vehicle Accelerating | LVA_15 | 30 | 15 | 20 |
| Lead Vehicle Accelerating | LVA_20 | 30 | 20 | 25 |
*Note: Scenarios LVD_14 and LVD_15 are special situations in which the subject vehicle is initially 5 m away from the leading vehicle, both driving at high speeds, purposefully creating unsafe situations for the sake of metrics evaluation.

FIGURE 1 Example: LVS_18 scenario dynamics
FIGURE 2 LVD_16 scenario dynamics
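For readers who want to reproduce the scenario set, the entries of Table 5 can be captured as plain data. The sketch below is one possible encoding, not the format used with ScenarioRunner in this work. The `Scenario` class, its field names, and the `speed_relation` helper are hypothetical and only mirror the table columns.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scenario:
    category: str          # scenario category acronym, e.g. "LVS"
    scenario_id: str       # scenario ID from Table 5
    headway_m: float       # initial headway distance [m]
    subject_speed: float   # subject vehicle target speed [m/s]
    lead_speed: float      # lead vehicle target speed [m/s]


# Values transcribed from Table 5.
SCENARIOS: List[Scenario] = [
    Scenario("LVS", "LVS_10", 200, 10, 0),
    Scenario("LVS", "LVS_15", 200, 15, 0),
    Scenario("LVS", "LVS_18", 200, 18, 0),
    Scenario("LVD", "LVD_14", 5, 14.1, 14),
    Scenario("LVD", "LVD_15", 5, 15, 15.1),
    Scenario("LVD", "LVD_16", 30, 16, 18),
    Scenario("LVD", "LVD_18", 30, 18, 18),
    Scenario("LVD", "LVD_20", 30, 20, 18),
    Scenario("LVMLCS", "LVMLCS_12", 30, 12, 10),
    Scenario("LVMLCS", "LVMLCS_15", 30, 15, 10),
    Scenario("LVMLCS", "LVMLCS_20", 30, 20, 15),
    Scenario("LVA", "LVA_15", 30, 15, 20),
    Scenario("LVA", "LVA_20", 30, 20, 25),
]


def speed_relation(s: Scenario) -> str:
    """Classify the target-speed relation between lead (vL) and subject (vF)."""
    if s.lead_speed > s.subject_speed:
        return "vL > vF"
    if s.lead_speed == s.subject_speed:
        return "vL = vF"
    return "vL < vF"


# Group scenario IDs by the relation of their target speeds.
by_relation = {}
for scenario in SCENARIOS:
    by_relation.setdefault(speed_relation(scenario), []).append(scenario.scenario_id)
```

Grouping by speed relation makes it easy to check which driving conditions each metric was exercised under.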
Simulation Setup
CARLA [4] is an open-source AV simulation software which can be used to simulate an AV and diverse sensor suites in different weather conditions and road topologies. ScenarioRunner for CARLA [5] is an open-source traffic scenario definition and execution engine that can be used for defining the interaction of an AV with other road users in different driving situations. Within ScenarioRunner, it is possible to define scenarios using the standard OpenScenario format [16], which allows for controlling the involved traffic participants’ initial conditions, driving inputs, and environmental conditions for evaluation purposes.
For the purpose of this work, a Tesla Model 3 vehicle from the CARLA vehicle library was selected as the test vehicle. Both the subject and leading vehicles within each scenario were defined using the same vehicle model to avoid possible confounding results associated with nuances of different vehicle models. The particular vehicle used for the simulation is not important so long as the vehicle characteristics are representative of realistic conditions, as the metrics are not sensitive to a specific vehicle type.
The subject vehicle used in the scenarios behaved according to a basic behavior defined by the “Roaming Agent” in CARLA. This agent controls the longitudinal and lateral motion of the vehicle using a PID controller (with parameters for the longitudinal control of Kp = 0.1, Kd = 0.1, and Ki = 1.0; the parameters for lateral control are not relevant to this work). Only longitudinal control is relevant in the car-following scenarios evaluated in this work. Additionally, the agent responds to traffic lights and other vehicles within a certain proximity threshold. In our experiments, this proximity threshold was set to 0 meters in order to prevent interference with the subject vehicle’s behavior, thereby purposefully creating unsafe situations and collisions for the sake of the metrics evaluation. As a result, the subject vehicle did not apply any response when approaching the leading vehicle.
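The longitudinal control described above can be illustrated with a generic discrete PID speed controller. This is a sketch under stated assumptions and is not CARLA's agent code: the class name, the 0.05 s time step, and the clipping of the output to [-1, 1] are hypothetical choices. Only the gains Kp = 0.1, Kd = 0.1, and Ki = 1.0 come from the text.

```python
class LongitudinalPID:
    """Generic discrete PID on speed error (illustrative, not CARLA's code)."""

    def __init__(self, kp: float = 0.1, kd: float = 0.1, ki: float = 1.0, dt: float = 0.05):
        self.kp = kp
        self.kd = kd
        self.ki = ki
        self.dt = dt              # control period [s], assumed value
        self._integral = 0.0      # accumulated speed error
        self._prev_error = None   # previous error, None before the first step

    def step(self, target_speed: float, current_speed: float) -> float:
        """Return a command in [-1, 1]; positive means accelerate."""
        error = target_speed - current_speed
        self._integral += error * self.dt
        if self._prev_error is None:
            derivative = 0.0      # no derivative information on the first call
        else:
            derivative = (error - self._prev_error) / self.dt
        self._prev_error = error
        command = self.kp * error + self.kd * derivative + self.ki * self._integral
        return max(-1.0, min(1.0, command))


# Illustrative use: a vehicle at rest asked to reach 15 m/s saturates the command.
controller = LongitudinalPID()
first_command = controller.step(target_speed=15.0, current_speed=0.0)
```

Because the controller only tracks the target speed and has no term for the gap to the lead vehicle, it never brakes for the vehicle ahead, which is consistent with the collisions being intentional in these experiments.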
Within each scenario run, the data needed to compute the metrics were collected, and a post-processing Python pipeline was developed to calculate the instantaneous measure of the OSA metrics throughout each scenario execution.
In order to calculate the Minimum Safe Distance-Related metrics from [3], the subject vehicle used the open-source implementation of the RSS model from [8] and [17] that is integrated within CARLA [18]. An “RSS Sensor” was attached to the subject vehicle, which analyzed the situation at each time step in order to calculate the longitudinal and lateral minimum safe distances with respect to the other road users during the simulation. Within the simulation setup, the RSS Sensor was only used to detect dangerous situations (according to RSS definitions), but the subject vehicle did not actuate based on this information (i.e., did not apply a Proper Response as defined by RSS), thus resulting in a collision for each scenario. By allowing collision events to occur, the capability of the OSA metrics to pre-emptively determine potential collisions could be assessed.
A quantitative analysis as well as a visual depiction of the metrics was generated to better understand the relationship between different metric values and the impact that thresholds or parameter values have on assessing the safety of the situation.
Experimental Results
The evaluation of the metrics’ relationships, redundancies, areas of inapplicability, and any other potential observations associated with the proposed metric thresholds and parameters is explained in the following sections.
Metric values and metric violation duration distributions
across all scenarios were evaluated to understand the relevance
of each metric for assessing the operational safety performance
of AVs. In particular, metric violation durations are important
for assessing the temporal occurrence of violations when
identifying unsafe situations. Short metric violation durations
could be indicative of failure to identify the safety of the
situation: the metric may have identified an unsafe situation
too late, or it may not give a continuous safety assessment. On
the other hand, long metric violation durations could be
indicative of increased conservativeness. Additionally, metric
violations for a given threshold/parameter set were analyzed
across the scenarios.
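The duration bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' post-processing pipeline: the function name `violation_durations`, the boolean per-time-step input, and the sample trace are assumptions, while the 0.05 s step matches the simulation cycle reported later in this paper.

```python
from typing import List, Sequence


def violation_durations(in_violation: Sequence[bool], dt: float = 0.05) -> List[float]:
    """Return the duration in seconds of each contiguous run of violation samples.

    in_violation holds one flag per simulation step (hypothetical input format);
    dt is the simulation time step in seconds.
    """
    durations = []
    run_length = 0
    for flag in in_violation:
        if flag:
            run_length += 1
        elif run_length:
            # A run of violation samples just ended.
            durations.append(run_length * dt)
            run_length = 0
    if run_length:
        # The trace ended while still in violation (e.g., at the collision).
        durations.append(run_length * dt)
    return durations


# Hypothetical trace: a short 2-step violation, then a 4-step one lasting to the end.
example_flags = [False, True, True, False, True, True, True, True]
example_durations = violation_durations(example_flags)
```

A metric that flickers in and out of violation yields many short entries in this list, which is the pattern the text associates with late or discontinuous safety assessment.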
Metric Parametrization Based
on Duration Distributions
For each of the evaluated metrics, specific thresholds and, in
some cases, assumed parameters must be assigned in order to
establish a violation occurrence. In previous work, Wishart
et al. [3] compiled thresholds and assumed parameters from
previous research as proposed values for each metric. This
analysis considered a range of values for each scenario to
further assess the impact of these values on generating
meaningful results. Comparing the results for each threshold
value demonstrated that, for high threshold values on the
time-based metrics, violations were generated well before the
vehicle would need to react to avoid an unsafe situation,
whereas for lower thresholds, an unsafe situation was not
detected until the vehicle no longer had enough time to safely
avoid the collision. This evaluation provided additional
context to justify threshold values within the metric
violation evaluations.
FIGURE 3 LVMCLS_15 scenario dynamics
FIGURE 4 LVA_20 scenario dynamics
Minimum Safe Distance
Violation (MSDV)
The Minimum Safe Distance Violation metric depends on the
current longitudinal and lateral distances between the subject
vehicle and the lead vehicle as well as the RSS-defined
longitudinal and lateral minimum safe distances calculated
using the defined parameters. In the time steps during which
the instantaneous longitudinal and lateral distances were less
than the calculated minimum safe distances, a violation of the
Minimum Safe Distance occurs (eq. (1)). To better understand
the impact of different parameter sets on the MSDV metric, the
distributions across the selected scenarios of the Minimum Safe
Distance calculated using different parameter sets are depicted
in Figure 5.
The results indicated that the Minimum Safe Distances
calculated with the Aggressive and NDS parameter sets lead to
similar distributions, with median longitudinal distances of
23.0 m and 16.2 m, respectively. In contrast, the Conservative
category had a median longitudinal distance of 103.8 m, more
than four times the median of the Aggressive category.
When analyzing the distribution of the duration of an MSD
Violation with the different parameter sets, the more
conservative set of parameters yielded longer violation
durations, with a median violation duration of 16.6 s (see
Figure 6). In the case of the Aggressive and NDS parameters,
the median durations of an MSD Violation were 8.2 s and 5.4 s,
respectively. The minimum violation durations were 2.0 s,
1.6 s, and 7.4 s for the Aggressive, NDS, and Conservative
categories, respectively.
While the Aggressive and NDS parameter sets generated similar
distributions of MSD values and violation durations, it is
worth noting the effect of choosing one set over the other. In
the case of the Aggressive category, the subject vehicle
assumes that it can accelerate up to a_max,accel = 4.1 m/s²
during the response time ρ = 0.5 s and is then expected to
brake with at least a_min,brake = 4.6 m/s². Additionally, the
subject vehicle assumes that the front vehicle can brake with
up to a_max,brake = 8.0 m/s². These values certainly reflect a
more aggressive and jerky behavior than the values from the NDS
category. The primary reason is that the values in the NDS
category were extracted from human drivers’ behavior, while
those in the Aggressive category came from simulation
experiments. Moreover, the values in the NDS category are in
line with previous research by NHTSA on the average maximum
deceleration values achieved by humans [19]. That work found
that the mean maximum deceleration applied by humans on dry and
wet surfaces was approximately 0.67 g (SD 0.25, max 1.15, min
0.004).
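To make the role of these parameters concrete, the sketch below evaluates the longitudinal minimum safe distance in the form given by the RSS model [8] for two vehicles travelling in the same direction. It is an illustration only and not the CARLA RSS Sensor implementation used in this work: the function name and the example speeds of 15 m/s are assumptions, and the lateral distance check is omitted. The parameter values are the Aggressive set quoted above.

```python
def rss_longitudinal_min_distance(
    v_rear: float,       # speed of the rear (subject) vehicle, m/s
    v_front: float,      # speed of the front (lead) vehicle, m/s
    rho: float,          # response time, s
    a_max_accel: float,  # max acceleration of the rear vehicle during rho, m/s^2
    a_min_brake: float,  # min braking of the rear vehicle after rho, m/s^2
    a_max_brake: float,  # max braking assumed for the front vehicle, m/s^2
) -> float:
    """Longitudinal minimum safe distance (m) following the RSS formulation in [8]."""
    # Worst case: the rear vehicle accelerates for rho seconds before braking.
    v_rear_after_response = v_rear + rho * a_max_accel
    distance = (
        v_rear * rho
        + 0.5 * a_max_accel * rho ** 2
        + v_rear_after_response ** 2 / (2.0 * a_min_brake)
        - v_front ** 2 / (2.0 * a_max_brake)
    )
    # The safe distance is clamped at zero when the front vehicle is much faster.
    return max(distance, 0.0)


# Aggressive parameter set from the text; the 15 m/s speeds are hypothetical.
AGGRESSIVE = dict(rho=0.5, a_max_accel=4.1, a_min_brake=4.6, a_max_brake=8.0)
d_min_example = rss_longitudinal_min_distance(15.0, 15.0, **AGGRESSIVE)

# Longitudinal part of an MSD check: a 20 m gap is below d_min_example.
longitudinal_unsafe = 20.0 < d_min_example
```

Because the response-time and braking terms grow with the square of the subject vehicle's speed, a larger ρ or a smaller a_min,brake lengthens the safe distance quickly, which is consistent with the much larger median reported for the Conservative category.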
Time-Based Metrics
In this section, the time-based metrics TTCV, MTTCV, PETV, and
THWV are analyzed, along with the impact of differing
thresholds on the initiation and duration of metric violations.
Figure 7 depicts the distribution of the time-based metric
values across all scenarios. The TTC, MTTC, and THW values were
capped at 10 s whenever the calculations resulted in larger
values, which skewed the distributions of TTC and MTTC values
towards 10 s, with median values of 10 s in both cases.
Conversely, THW had a median value of 1.9 s. This points to a
lack of relevance of the information provided by the TTCV and
MTTCV metrics overall. In the case of PET, a narrower
distribution of values was observed, with a median of 1.5 s and
a maximum of 3.7 s.
FIGURE 5 Minimum Safe Distance calculations boxplot with varying parameter sets
FIGURE 6 MSDV duration boxplot with varying parameter sets
FIGURE 7 Time-based metrics calculation boxplot with varying threshold values
Time-To-Collision Violation (TTCV)
The TTC metric is arguably the most popular surrogate safety
metric found in the literature. Therefore, it is important to
understand the meaning of a certain threshold value for TTC in
relation to the operational safety of vehicles. The TTCV
threshold was varied from 1 s to 5 s, and the resulting TTC
Violation durations are shown in Figure 8. For a threshold
value of 1 s, the TTCV duration has a maximum value of 1.3 s
and a median of 1 s. This implies that the metric likely fails
to identify unsafe situations on time (i.e., 1 s before
collision). For threshold values of 2 s and 3 s, the duration
of a TTCV has a median of 1.9 s and 2.9 s, respectively. The
minimum violation duration for the 2 s threshold is 0.8 s,
whereas for a threshold of 3 s it is 0.85 s. This implies that
a threshold of 3 s for TTCV might be too conservative for
real-world driving for the scenarios studied in this work,
since the short-duration violations that occurred with a
threshold of 2 s also occurred with a threshold of 3 s. Beyond
a threshold of 4 s, the increasing trend of the mean TTCV
duration flattens out. The maximum maintains its trend because
a higher threshold means that the TTC violation is activated
much earlier during a scenario. Beyond the 4 s threshold, the
metric becomes too conservative to pick up on the
shorter-duration violations.
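As a point of reference for the thresholds discussed above, the following sketch computes a capped TTC for a car-following pair and checks it against a threshold. The function names, the strict `<` comparison, and the choice to return the cap when the subject vehicle is not closing on the lead vehicle are assumptions made for illustration; the 10 s cap is the limit reported in this paper.

```python
def time_to_collision(gap_m: float, v_subject: float, v_lead: float,
                      cap_s: float = 10.0) -> float:
    """TTC in seconds for a following vehicle, capped at cap_s.

    TTC is undefined when the subject vehicle is not closing on the lead
    vehicle; returning the cap in that case is an assumption of this sketch.
    """
    closing_speed = v_subject - v_lead
    if closing_speed <= 0.0:
        return cap_s
    return min(gap_m / closing_speed, cap_s)


def ttc_violation(ttc_s: float, threshold_s: float = 2.0) -> bool:
    """A violation is flagged when TTC drops below the threshold (assumed strict)."""
    return ttc_s < threshold_s


# Stopped lead vehicle, 30 m gap, subject at 15 m/s: TTC = 30 / 15 = 2.0 s.
ttc_stopped_lead = time_to_collision(30.0, 15.0, 0.0)
# Lead vehicle faster than the subject: no closing speed, so the cap is returned.
ttc_faster_lead = time_to_collision(5.0, 10.0, 15.0)
```

The second call shows why TTC carries no information while the lead vehicle is faster: even at a 5 m gap the value sits at the cap, so no threshold between 1 s and 5 s is crossed.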
Modified Time-To-Collision Violation (MTTCV)
The MTTC metric was introduced as a more sophisticated TTC that
considers the relative distances, speeds, and accelerations of
the vehicles. However, unless linked to a threshold, MTTC by
itself cannot give a sense of the severity of a situation. This
is mainly because two vehicles under test might have the same
MTTC value for different combinations of relative speeds and
distances. In this work, MTTC Violations were analyzed with
thresholds varying from 1 s to 5 s, and the resulting MTTCV
durations at different thresholds are shown in Figure 9.
From Figure 9 we observe that the minimum MTTCV duration is
always close to 0 (0.05 s in all cases, as this is the time
cycle of the simulation), even for higher threshold values.
This highlights the sensitivity of the MTTC calculation to
changes in the relative speed, acceleration, and distance
between the vehicles. In all cases, very short violations were
found, suggesting that MTTCV could result in multiple
conservative violations (e.g., unwarranted warnings of an
unsafe situation) regardless of the threshold value.
Furthermore, the median MTTCV duration for thresholds of 3 s,
4 s, and 5 s is biased towards the lower values, implying
shorter violation durations even for the highest threshold. The
variability demonstrated by the MTTCV metric across different
threshold values illustrates some of the drawbacks of this
measurement for reliably and consistently evaluating the safety
of a vehicle across scenarios.
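The sensitivity to acceleration noted above follows from the structure of the calculation. The sketch below uses the constant-acceleration form commonly used for MTTC, in which the gap closes according to D - ΔV·t - ½·ΔA·t² and the smallest positive root is taken. This form, the function name, and the sign conventions (ΔV and ΔA defined as subject minus lead) are assumptions for illustration; the exact formulation used in this work is the one given in [3].

```python
import math
from typing import Optional


def modified_ttc(gap_m: float, dv: float, da: float) -> Optional[float]:
    """Smallest positive time (s) at which the gap closes, or None if it never does.

    dv = v_subject - v_lead (m/s) and da = a_subject - a_lead (m/s^2); both are
    assumed constant from the current time step onward.
    """
    if abs(da) < 1e-9:
        # No relative acceleration: MTTC reduces to the ordinary TTC.
        return gap_m / dv if dv > 0.0 else None
    discriminant = dv ** 2 + 2.0 * da * gap_m
    if discriminant < 0.0:
        # The subject decelerates enough relative to the lead that the gap never closes.
        return None
    root = math.sqrt(discriminant)
    candidates = [t for t in ((-dv - root) / da, (-dv + root) / da) if t > 0.0]
    return min(candidates) if candidates else None


# Equal speeds, but the lead vehicle brakes at 5 m/s^2 relative to the subject.
mttc_braking_lead = modified_ttc(gap_m=10.0, dv=0.0, da=5.0)
# Same gap with no relative acceleration: no collision is predicted.
mttc_steady = modified_ttc(gap_m=10.0, dv=0.0, da=0.0)
```

In this sketch a change in relative acceleration alone moves the result from "no collision predicted" to a finite value, so step-to-step fluctuations in acceleration can toggle a threshold check, which is in line with the 0.05 s minimum violation durations observed above.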
Post-Encroachment Time Violation (PETV)
PET is a traditional surrogate safety measure used frequently
in traffic engineering studies. By including PET in the
proposed set of metrics, the intent was to provide measures
comparable to existing datasets in addition to a metric that is
relatable to both human-driven and automated vehicles. In the
context of the tested scenarios, PET was measured as the
difference between the times at which the lead and subject
vehicles passed through a conflict point. In the car-following
scenarios analyzed, several conflict points were defined
throughout the scenarios, resulting in a PET curve over time
rather than a single PET value for each scenario; the latter
was the original intent of the PET [9]. In cases where the
subject vehicle’s path does not coincide with the path of the
lead vehicle, i.e., the rear bumper of the lead vehicle and the
front bumper of the subject vehicle do not both pass through
the conflict point, the PET value was non-existent. The
threshold for a PET violation was varied from 0.5 s to 2 s. As
depicted in Figure 10, the PET violation duration was minimal
for all thresholds due to scenarios in the LVS category, where
there is no post-encroachment. In all cases, the maximum
violation duration was larger than 15 s due to scenarios LVD_14
and LVD_15, in which the vehicles were initially 5 m away from
each other. For thresholds of 0.5 s, 1 s, 1.2 s, and 1.5 s,
there is an increasing trend in the median violation duration,
with values of 1 s, 2 s, 2.4 s, and 3.6 s, respectively. The
variability demonstrated by the PET violation metric
illustrates some of the pitfalls associated with this
measurement for reliably and consistently evaluating the safety
of a vehicle across scenarios.
FIGURE 8 TTCV duration boxplot with varying threshold values
FIGURE 9 MTTCV duration boxplot with varying threshold values
FIGURE 10 PETV metric boxplot with varying threshold values
Time Headway Violation (THWV)
The THW metric is similar to the TTC metric; however, it is
defined by the relative distance between the subject vehicle
and lead vehicle and only the velocity of the subject vehicle.
THW violations were evaluated for thresholds varying from 1 s
to 5 s across all scenarios, and the THWV duration
distributions are depicted in Figure 11.
The duration of the THWV for a threshold value of 1 s has a
median of 3 s and a maximum of 18.3 s, due to the conditions in
scenarios LVD_14 and LVD_15 (as with PETV). When the threshold
is increased, the violation duration distributions are skewed
towards the maximum value because the THW calculation considers
only the distance between the vehicles and the speed of the
subject vehicle, regardless of the speed of the lead vehicle.
Therefore, a THWV will occur when the ratio between the
distance between the vehicles and the subject vehicle’s speed
falls to the threshold value (Δd/speed = threshold). Since most
scenarios in this work start with Δd = 30 m and the vehicles
accelerate until reaching their target speed (often greater
than 15 m/s), a THWV will be triggered before the subject
vehicle reaches its target speed, independent of the behavior
of the lead vehicle. In these situations, a THWV triggers when
the subject vehicle is travelling at 15 m/s, 10 m/s, 7.5 m/s,
and 6 m/s for thresholds of 2 s, 3 s, 4 s, and 5 s,
respectively. This results in longer violation durations, as
seen in Figure 11.
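The trigger speeds quoted above follow directly from Δd/speed = threshold. The short sketch below reproduces them for the 30 m initial separation; the function names are hypothetical, and returning infinity for a stationary subject vehicle is an assumption.

```python
def time_headway(gap_m: float, v_subject: float) -> float:
    """THW in seconds: gap divided by the subject vehicle's speed only."""
    return gap_m / v_subject if v_subject > 0.0 else float("inf")


def thw_trigger_speed(gap_m: float, threshold_s: float) -> float:
    """Subject speed (m/s) at which THW equals the threshold for a fixed gap."""
    return gap_m / threshold_s


INITIAL_GAP_M = 30.0  # initial separation used by most scenarios in this work
trigger_speeds = {thr: thw_trigger_speed(INITIAL_GAP_M, thr) for thr in (2, 3, 4, 5)}
# trigger_speeds == {2: 15.0, 3: 10.0, 4: 7.5, 5: 6.0}
```

Note that the lead vehicle's speed does not appear anywhere in the calculation, which is why the violation onset is the same regardless of how the lead vehicle behaves.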
Metrics Parameterization Based on Temporal Occurrence
As mentioned in previous sections, one of the criteria for
evaluating the metrics is to compare the results against the
DSV, considered to be the ground truth metric. This was done by
examining the temporal occurrence of each metric for each
scenario with respect to the defined ground truth. From this,
three temporal regions were identified:
1. Metric violation prior to DSV with a_brake = 5 m/s²: This
   indicates that the subject vehicle may not require immediate
   action, since a forced braking event would not require
   excessive deceleration. However, if a metric violation
   occurred too early compared to DSV at 5 m/s², this may be an
   overly conservative violation.
2. Metric violation between DSV with a_brake = 5 m/s² and DSV
   with a_brake = 8.3 m/s²: This indicates that the subject
   vehicle is in a situation in which an avoidance maneuver may
   need to be taken, since a forced braking event might require
   a deceleration greater than 5 m/s², considered to be an
   emergency maneuver [10] that may not be a comfortable
   deceleration rate for passengers [19]. Metric violations
   happening in this temporal region may not indicate failure
   to identify the safety of the situation, as the DSV
   considers neither the reaction time of the subject vehicle
   nor the speed of the lead vehicle.
3. Metric violation after DSV with a_brake = 8.3 m/s²: This
   indicates that the subject vehicle is in a situation in
   which an avoidance maneuver is recommended, since a forced
   braking event might require a deceleration rate greater than
   8.3 m/s², which was established in [10] as an appropriate
   expected value for AEB systems; thus, the deceleration rate
   would exceed (or nearly exceed) the braking capability of
   the AV, and a collision may not be avoided. Metric
   violations happening in this temporal region may indicate
   failure to identify the safety of the situation at an
   appropriate time.
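The three regions above amount to comparing a violation's onset time with the two DSV onset times. The sketch below is one way to express that comparison together with the lead time to the relevant reference event (DSV at 5 m/s², DSV at 8.3 m/s², or the collision). The function name and the handling of onsets that coincide exactly with a DSV time are assumptions; the example times are those reported for LVS_10 in the Scenario Results section.

```python
from typing import Tuple


def temporal_region(t_violation: float, t_dsv_5: float, t_dsv_83: float,
                    t_collision: float) -> Tuple[int, float]:
    """Return (region, lead time in s) for a metric violation onset.

    Region 1: before DSV at 5 m/s^2; region 2: between the two DSV onsets;
    region 3: after DSV at 8.3 m/s^2. Boundary handling is an assumption.
    """
    if t_violation < t_dsv_5:
        return 1, t_dsv_5 - t_violation
    if t_violation <= t_dsv_83:
        return 2, t_dsv_83 - t_violation
    return 3, t_collision - t_violation


# LVS_10: DSV at 5 m/s^2 at 23.0 s, DSV at 8.3 m/s^2 at 23.4 s, collision at 24.0 s.
# TTCV, MTTCV, and THWV were reported to start 2 s before the collision (22.0 s).
region_lvs10, lead_time_lvs10 = temporal_region(22.0, 23.0, 23.4, 24.0)
# PETV was reported at the time of collision, which falls in region 3.
region_petv, lead_time_petv = temporal_region(24.0, 23.0, 23.4, 24.0)
```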
The results of metric violation temporal occurrences across all
scenarios at varying thresholds/parameters based on the three
regions are listed in Table 8. Additionally, the average time
difference between a metric violation and the DSV at 5 m/s²,
the DSV at 8.3 m/s², or a collision was calculated for each
threshold, depending on whether the metric violation occurred
in temporal region 1, 2, or 3, respectively. This allows for
analyzing the trade-offs between different thresholds and
parameter sets. The optimal threshold/parameter set should
maximize violation occurrences in region 1 at the minimum
average time prior to DSV at 5 m/s², maximize the time
difference between violation occurrence and DSV at 8.3 m/s² in
region 2, and minimize violation occurrences in region 3.
Otherwise, the metric at the given threshold could become
overly conservative or not relevant enough.
The TTCV metric with a threshold of 2 s has as many violations
in region 1 as TTCV with thresholds of 3 s and 4 s, but with
the lowest average time prior to DSV at 5 m/s². The MTTCV
metric with a threshold of 2 s, while it achieved a lower
number of violations in region 1 than did higher threshold
values, has the lowest average time prior to DSV at 5 m/s². The
PETV metric with a threshold of 1.5 s achieved a slightly
higher average time prior to DSV at 5 m/s² (only 0.13 s higher
than with a threshold of 1 s) but with a lower number of metric
violations within region 3. The THWV with a threshold of 2 s
achieves all occurrences in region 1, but the average time of a
THWV prior to DSV at 5 m/s² of 2.4 s indicates that the metric
with the given threshold is possibly overly conservative.
Finally, the MSDV metric occurs most frequently in the first
region with varying parameters, but the NDS parameter category
achieves the lowest average metric violation time prior to DSV
at 5 m/s².
Metrics Parametrization
Selection
Based on the metrics parametrization analysis from the previous
section, the results shown in Table 8 in Appendix C, and the
threshold/parameter set selection process discussed above, a
threshold value or a parameter set was chosen for each metric,
and the OSA metrics in the scenarios defined in this study were
evaluated. A value of 2 s was chosen for the TTCV, MTTCV, and
THWV metrics and 1.5 s for PETV, as these values demonstrated a
balance between conservativeness and usefulness when compared
to other threshold values for each given metric. For the MSDV
metric, the parameter set from the NDS category was chosen as
it demonstrated reasonable following distances (Figure 5) at
the minimum violation duration (median of 5.1 s) with a
reasonable compromise between conservativeness and usefulness
compared to the other parameter categories. It is worth noting
that parameter/threshold values may differ when evaluating the
safety performance of human-driven vehicles, compared to that
of AVs, as human drivers could have longer reaction times than
the ones proposed in this work.
FIGURE 11 THW metric boxplot with varying threshold values
Scenario Results
The times at which metric violations occurred throughout the
execution of the scenarios, relative to the collision point,
are shown in Figure 12 through Figure 15. The violation symbols
on the figures depict the start of a metric violation, and the
trace after it demarcates the violation duration. Thus, in
scenarios where more than one violation occurred for a given
metric¹, the cause was either a change in the dynamics of the
scenario or an initial, overly conservative violation.
Additionally, the temporal occurrence of the ground truth DSV
is shown in each figure for both the minimum deceleration
needed to perform an emergency maneuver (a_brake = 5 m/s², in
orange) and a reasonable (maximum) deceleration applied by
automatic emergency braking (AEB) systems (a_brake = 8.3 m/s²,
in red) for a visual comparison of the metrics’ performance in
preemptive warning versus emergency warning time needed to
respond to a conflict. For example, in the LVS_10 scenario of
Figure 13, the DSV at 5 m/s² occurred at 23.0 s and the DSV at
8.3 m/s² occurred at 23.4 s.
As depicted in each of the figures, a collision resulted from
each scenario, and the timing was dictated by the scenario
dynamics, including the speeds and initial separation of the
vehicles. For example, in LVS_10 shown in Figure 13, a
collision happened after 24.0 s of scenario execution (shown by
the Headway Distance curve reaching 0 m on the secondary y-axis
at right) due to the lower speed of the subject vehicle
compared to scenarios LVS_15 and LVS_18, in which the vehicles
collided earlier, at 18.1 s and 16.8 s, respectively. As for
the metric violations, in all scenarios within the LVS category
(Figure 13), the THWV, TTCV, and MTTCV occurred at the same
time (2 s before the collision) and lasted until the collision
occurred. PETV occurred at the time of collision, which is not
useful as a warning. The only metric whose results varied with
the different scenario configurations was the MSDV metric.
For TTCV, it is worth noting that in scenarios in which the
lead vehicle is driving faster than the subject vehicle (e.g.,
LVD_16), the TTCV metric fails to provide any information about
the safety of the situation while v_l > v_f, since this
condition results in a negative denominator in the TTC
calculation. Similarly, when the vehicles are driving at the
same speed (e.g., LVD_18), a TTCV will not occur unless one of
the vehicles changes speed, and even then the TTCV might occur
too late. Moreover, when the vehicles have slightly different
speeds, as in LVD_14 and LVD_15, the vehicles drive 5 m away
from each other without triggering a TTCV until the lead
vehicle starts decelerating. The results on TTCV show that this
metric is not always useful, leading to numerous delayed
violations in some cases. Additionally, TTCV is not robust to
variations in the relative vehicle speeds.
¹ This occurred for MTTCV in scenarios LVD_14 and LVD_15 and for
THWV in scenario LVD_16.
MTTCV showed similar behavior to that of TTCV, often triggering
a violation less than 1 s before TTCV. In some cases, due to
variations in the velocity and accelerations of both vehicles
throughout the scenarios, the metric experienced changing
status of violations with short durations, indicating that
MTTCV may not be a consistently reliable metric for different
scenarios (e.g., LVD_14). This observation would suggest that
in scenarios where vehicles may experience minimal changes in
velocity and accelerations (i.e., typical highway car-following
scenarios) other metrics may be more suitable.
The PETV metric did not provide relevant safety information in
cases where there was no post-encroachment (e.g., LVS_10,
LVS_15, and LVS_18), meaning that the subject vehicle did not
travel through the conflict point. These can be considered
analogous to “false negatives” in perception systems. This is a
major pitfall for the robustness and usefulness of the PETV
metric in the car-following scenarios evaluated in this work.
FIGURE 12 Lead Vehicle Decelerating metrics violations
THWV is a metric that was not considered in the initial
proposed set of safety metrics; however, it has been evaluated
here due to its relevance in a recently published standard for
ALKS features [10]. This metric depends on the distance between
the subject and lead vehicles and the velocity of the subject
vehicle, which may be confounding depending on the particular
driving situation. Consider, for example, scenarios LVD_16,
LVD_18, and LVD_20. A THWV always occurs at the same time even
when the lead vehicle is behaving differently (higher, same,
and lower target speed than the subject vehicle, respectively).
This metric produces a violation once the subject vehicle is
moving at a speed of 15 m/s or more. Specifically, in the case
of LVD_16, the first THWV disappears once the distance between
the vehicles increases as the lead vehicle moves faster, and a
second THWV appears approximately 2 s prior to collision once
the lead vehicle starts decelerating. This indicates a lack of
robustness to changing scenario configurations. One benefit of
the THWV metric over other metrics, such as TTCV and PETV, is
its relevance and its avoidance of discontinuities and negative
values in the data.
The MSDV metric seemed robust to changes in scenario dynamics,
providing a continuous safety assessment even when the lead
vehicle was moving faster than the subject vehicle (e.g.,
LVD_16). Variations in its subjective parameters yielded
differing results, but overall, it is a continuous metric that
considers many relevant aspects of driving, such as the
reaction time and braking capability of the vehicles. As a
result, overly conservative violations were not seen in the
scenarios for the parameter values chosen.
The criteria introduced earlier, which define the efficacy of a
metric according to robustness, relevance, and comparison to
the ground truth DSV metric, were evaluated for each metric
based on the experimental results. Table 6 summarizes the
efficacy of each metric in the evaluation criteria categories,
with “-” for low efficacy, “+” for medium efficacy, and “++”
for high efficacy.
FIGURE 13 Lead Vehicle Stopped scenarios and metrics violations
FIGURE 14 Lead Vehicle Moving at Lower Constant Speed metrics violations
FIGURE 15 Lead Vehicle Accelerating metrics violations
Conclusions and Future
Work
This work demonstrates the capability of calculating the
previously proposed OSA metrics through the use of simulation
and evaluates the sensitivity and interplay of these metrics
for variations in initial conditions and different scenarios.
The metric calculations were computed directly from the CARLA
outputs to produce graphical data that could be analyzed for
further evaluation.
The following highlights the conclusions drawn from the
evaluation of the scenarios presented in this paper and the OSA
metrics refinement proposed by this work:
• The TTCV metric is unreliable in cases where there is no
  relative velocity difference between the lead and following
  vehicles. If the lead and subject vehicles are in close
  proximity with the same velocity, the TTCV does not provide
  relevant safety information. The TTC metric has been utilized
  in many human driver studies, providing a common reference
  metric, but the experiments in this work demonstrated several
  pitfalls of the metric. As such, the TTCV metric may only be
  relevant within the context of an OSA methodology for
  comparison purposes to previous traffic engineering research
  and naturalistic studies.
• The MTTCV metric is sensitive to changes in the accelerations
  of the vehicles involved in the simulated scenarios. MTTC
  violations occurred frequently with short durations at high
  threshold values, indicating high sensitivity to any
  variation in the vehicle position, velocity, and
  accelerations. As such, the MTTCV metric does not provide
  reliable and robust notification of a collision and may not
  be an adequate metric for the scenarios evaluated in this
  paper. As a result, the MTTCV metric may not be relevant for
  a finalized OSA metrics set.
• The PETV metric may be more applicable to intersection
  scenarios; yet, overall it is an ex post facto metric that in
  many cases will not provide a continuous, real-time
  measurement of a given situation and is more useful in
  reactive assessments. Therefore, PETV may not be an adequate
  metric for the scenarios evaluated in this paper and may not
  be relevant in the finalized OSA metrics set.
• The THWV metric utilizes the relative distance of the
  involved vehicles and the speed of the following vehicle;
  however, it does not consider any information regarding the
  dynamics of the lead vehicle. As a result, the THWV metric
  can lead to confounding results in situations where the
  parameters of the lead vehicle play a major role in the
  context of the scenario (e.g., lead vehicle stopped).
  Therefore, THWV may not be an adequate metric for the
  scenarios evaluated in this paper and may not be relevant in
  the finalized OSA metrics set.
• The MSDV metric is subject to the parameters used in its
  formulation; however, research has been conducted to propose
  sets of parameters that follow naturalistic driving behavior
  from humans [15]. The use of naturalistic studies and
  generalized vehicle dynamic capabilities provides a more
  comprehensive metric that incorporates the complexity of the
  physical attributes of the vehicles. The MSDV formulation
  accounts for “what if” worst-case situations, making it more
  robust to changes in the dynamics of the situation without
  being an overly conservative measure, whereas the time-based
  metrics are based solely on relative measurements of vehicle
  motion, including relative position, velocity, and
  acceleration.
With respect to future work, there is a need for optimization
of the selection process of the thresholds/parameter sets of
the OSA metrics. A deeper understanding of violation duration
and temporal occurrence will allow for the selection process to
determine the optimal threshold or parameter set. This is
important for comparing the results for optimized metrics,
which will lead to refinement and finalization of the OSA
metrics set.
Additional scenarios will also be evaluated in future work.
This paper focused on car-following scenarios, but the use of
the script and database to calculate the metrics for any given
scenario will allow for others to be considered. Although the
current work examines scenarios composing approximately 18.4%
of all light-duty vehicle collisions in the U.S., further
simulation work should be conducted to consider additional
scenarios, such as intersections or lane changing scenarios.
The analysis of the additional scenarios will provide further
insight into the relative merits of the OSA metrics, with the
intent to further validate the set and remove any metrics that
are not necessary to the OSA methodology under development.
To facilitate the OSA metrics set finalization, there is a need
to automate the process of calculating all applicable metrics
proposed within [3] with inputs of the required parameters.
Although the CARLA script was modified to calculate and output
the safety metric results independently, a script and database
combination is planned to establish a repository and
accompanying methodology capable of calculating the metrics
from any simulation software that is capable of outputting the
necessary variables, rather than limiting the scope
specifically to CARLA.
One such soware that is planned to beused for future
work is Human, Vehicle, Environment (HVE) which is a
physics-based accident reconstruction soware traditionally
used in the evaluation of vehicle dynamics in collision
scenarios. Although CARLA provided a useful platform to
evaluate the discussed metrics, collision incidents were not
evaluated. CARLA is capable of providing the timing for colli-
sion incidents within the scenarios; however, one limitation
of CARLA is the inability of the soware to handle the colli-
sion and post-collision vehicle dynamics associated with an
TAB L E 6 Evaluation of OSA metrics based on
presented criteria
Criteria TTCV MTTCV PETV THWV MSDV
Robustness + + + + ++
Relevance - + - ++ ++
DSV Comparison - - - ++ +
© SAE International.
Downloaded from SAE International by Jeffrey Wishart, Monday, April 05, 2021
EVALUATION OF OPERATIONAL SAFETY ASSESSMENT (OSA) METRICS FOR AUTOMATED VEHICLES IN SIMULATION 14
impact. e simulation events were terminated at the initia-
tion of a collision event since the vehicles are not modeled to
a level of delity that is capable of accurately calculating the
physics of the vehicles including rotation, crush, and exit
velocities associated with the momentum and energy charac-
teristics of the collision. e proposed Collision Incident (CI)
metric in [3] utilizes the KABCO index to determine the
severity of the collision. e use of HVE could enhance the
severity quantication for any given scenario by providing
information such as delta-v, dissipated energy, and principle
direction of force (PDOF). is renement would provide
additional granularity to not only consider whether a collision
occurs but also understand the severity and dynamics of a
collision if one does occur.
When the OSA metrics have been finalized and validated, the OSA
methodology will be developed as part of an overall SCF. This
will allow for a score to be assigned to the navigation of any
given scenario by an AV (or human-driven vehicle) that will be
a part of the AV safety case.
References
1. Safetyengineering.wordpress.com, “The Safety Engineering Resource,” April 18, 2008, accessed Jan. 8, 2021.
2. Underwriters Laboratories (UL), ANSI/UL 4600 - Standard for Safety for the Evaluation of Autonomous Products, 2020.
3. Wishart, J., Como, S., Elli, M., Russo, B. et al., “Driving Safety Performance Assessment Metrics for ADS-Equipped Vehicles,” SAE Technical Paper 2020-01-1206, 2020. https://doi.org/10.4271/2020-01-1206.
4. Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., and Koltun, V., “CARLA: An Open Urban Driving Simulator,” in 1st Annual Conference on Robot Learning, 2017.
5. “ScenarioRunner,” https://github.com/carla-simulator/scenario_runner, accessed Nov. 1, 2020.
6. Wassim, N., Ranganathan, R., Srinivasan, G., Smith, J., Toma, S., Swanson, E., and Burgett, A., “Description of Light-Vehicle to-Vehicle Communications for Safety Applications Based on Vehicle-to-Vehicle Communications,” National Highway Traffic Safety Administration, Report No. DOT HS 811 731, 2013.
7. Najm, W.G., Smith, J.D., and Yanagisawa, M., “Pre-Crash Scenario Typology for Crash Avoidance Research,” National Highway Traffic Safety Administration, Report No. DOT-VNTSC-NHTSA-06-02, 2007.
8. Shalev-Shwartz, S., Shammah, S., and Shashua, A., “On a Formal Model of Safe and Scalable Self-Driving Cars,” arXiv:1708.06374, 2017.
9. Gettman, D. and Head, L., “Surrogate Safety Measures from Traffic Simulation Models,” Transportation Research Record 1840(1):104-115, 2003.
10. United Nations Economic Commission for Europe (UNECE), “Proposal for a New UN Regulation on Uniform Provisions Concerning the Approval of Vehicles with Regards to Automated Lane Keeping System,” 2020, https://undocs.org/ECE/TRANS/WP.29/2020/81.
11. Silberling, J., Wells, P., Acharya, A., Kelly, J., and Lenkeit, J., “Development and Application of a Collision Avoidance Capability Metric,” SAE Technical Paper 2020-01-1207, 2020. https://doi.org/10.4271/2020-01-1207.
12. Weng, B., Rao, S., Deosthale, E., Schnelle, S., and Barickman, F., “Model Predictive Instantaneous Safety Metric for Evaluation of Automated Driving Systems,” in IEEE Intelligent Vehicles Symposium (IV), 2020.
13. Javed, M.A. and Khan, J.Y., “Performance Analysis of a Time Headway Based Rate Control Algorithm for VANET Safety Applications,” in 7th International Conference on Signal Processing and Communication Systems (ICSPCS), Carrara, VIC, 2013.
14. Rodionova, A., Alvarez, I., Elli, M.S., Oboril, F., Quast, J., and Mangharam, R., “How Safe Is Safe Enough? Automatic Safety Constraints Boundary Estimation for Decision-Making in Automated Vehicles,” in IEEE Intelligent Vehicles Symposium (IV), 2020.
15. China Intelligent Transportation Systems (ITS) Alliance, “Safety Assurance Technical Requirements for Decision-Making on Autonomous Vehicles,” C-ITS Alliance Report, 2020.
16. ASAM, “OpenScenario,” https://www.asam.net/standards/detail/openscenario/, accessed Jan. 11, 2020.
17. Gassmann, B., Oboril, F., Buerkle, C., Liu, S. et al., “Towards Standardization of AV Safety: C++ Library for Responsibility Sensitive Safety,” in IEEE Intelligent Vehicles Symposium (IV), 2019.
18. Gassmann, B., Pasch, F., Oboril, F., and Scholl, K.-U., “Integration of Formal Safety Models on System Level Using the Example of Responsibility Sensitive Safety and CARLA Driving Simulator,” in International Conference on Computer Safety, Reliability, and Security, 2020.
19. Mazzae, E.N., Barickman, F.S., Forkenbrock, G., and Baldwin, G.H., “NHTSA Light Vehicle Antilock Brake System Research Program Task 5.2/5.3: Test Track Examination of Drivers’ Collision Avoidance Behavior Using Conventional and Antilock Brakes,” DOT HS809, 2003.
20. Balas, V.E. and Balas, M.M., Driver Assisting by Inverse Time to Collision (World Automation Congress (WAC): Budapest, 2006).
Acknowledgement
This work was made possible by the generous contributions and funding provided by the Institute of Automated Mobility (IAM). The authors would like to thank the IAM for its continued support in advancing research surrounding vehicle automation.
Definitions/Abbreviations
ABC - Achieved Behavioral Competency
AD - Aggressive Driving
ADS - Automated Driving System
ADSA - ADS Active
AEB - Automatic Emergency Braking
AHJ - Authority Having Jurisdiction
ALKS - Automated Lane Keeping System
AV - ADS-Equipped Vehicle
CAC - Collision Avoidance Capability
CARLA - Car Learning to Act
CI - Collision Incident
HTCDER - Human Traffic Control Detection Error Rate
HTCVR - Human Traffic Control Violation Rate
IAM - Institute of Automated Mobility
MPrISM - Model Predictive Instantaneous Safety Metric
MSD - Minimum Safe Distance
MSDCE - Minimum Safe Distance Calculation Error
MSDF - Minimum Safe Distance Factor
MSDV - Minimum Safe Distance Violation
MTTC - Modified Time-to-Collision
MTTCV - Modified Time-to-Collision Violation
NHTSA - National Highway Traffic Safety Administration
ORAD - On-Road Automated Driving
OSA - Operational Safety Assessment
PET - Post Encroachment Time
PETV - Post Encroachment Time Violation
PRA - Proper Response Action
RSS - Responsibility-Sensitive Safety
SCF - Safety Case Framework
THW - Time Headway
THWV - Time Headway Violation
TLW - Traffic Law Violation
TTC - Time-to-Collision
TTCV - Time-to-Collision Violation
V&V - Verification and Validation
Appendix A Minimum Longitudinal and Lateral
Distances of MSDV (from [])
$$
d^{long}_{min} = \left[ v^{long}_{1}\rho_{1} + \frac{1}{2} a^{long}_{1,max,accel}\,\rho_{1}^{2} + \frac{\left( v^{long}_{1} + \rho_{1}\, a^{long}_{1,max,accel} \right)^{2}}{2\, a^{long}_{1,min,decel}} - \frac{\left( v^{long}_{2} \right)^{2}}{2\, a^{long}_{2,max,decel}} \right]_{+} \tag{7}
$$
Where the subject vehicle (subscript 1) is following behind another entity (subscript 2) and both are moving in the same direction, and $[x]_{+} := \max\{x, 0\}$.
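As a numerical illustration of Equation (7), the following minimal sketch evaluates the minimum longitudinal distance for a subject vehicle following a lead entity. The function and argument names are hypothetical, and the parameter values in the example are arbitrary placeholders rather than the Aggressive, NDS, or Conservative parameter sets used in this study.

```python
def min_longitudinal_distance(v1, v2, rho1,
                              a1_max_accel, a1_min_decel, a2_max_decel):
    """Evaluate the minimum safe longitudinal distance of Equation (7).

    v1, v2        : longitudinal speeds of the subject (1) and lead (2) [m/s]
    rho1          : response time of the subject vehicle [s]
    a1_max_accel  : maximum acceleration of the subject during rho1 [m/s^2]
    a1_min_decel  : minimum braking deceleration of the subject [m/s^2]
    a2_max_decel  : maximum braking deceleration of the lead entity [m/s^2]

    All decelerations are given as positive magnitudes.
    """
    # Speed of the subject vehicle at the end of its response time,
    # assuming it accelerates at a1_max_accel throughout.
    v1_after_response = v1 + rho1 * a1_max_accel

    distance = (
        v1 * rho1                                      # travel at initial speed
        + 0.5 * a1_max_accel * rho1 ** 2               # extra travel from accelerating
        + v1_after_response ** 2 / (2.0 * a1_min_decel)  # subject braking distance
        - v2 ** 2 / (2.0 * a2_max_decel)               # lead braking distance
    )
    # The [x]_+ operator clamps negative values to zero.
    return max(distance, 0.0)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    d_min = min_longitudinal_distance(
        v1=20.0, v2=20.0, rho1=1.0,
        a1_max_accel=2.0, a1_min_decel=4.0, a2_max_decel=8.0,
    )
    print(d_min)  # 56.5
```

Under the reading that an MSDV occurs when the measured gap falls below this distance, a measured gap can be compared directly against the returned value at each time step.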
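$$
d^{lat}_{min} = \mu + \left[ \frac{2 v^{lat}_{1} + \rho_{1}\, a^{lat}_{1,max,accel}}{2}\,\rho_{1} + \frac{\left( v^{lat}_{1} + \rho_{1}\, a^{lat}_{1,max,accel} \right)^{2}}{2\, a^{lat}_{1,min,decel}} - \left( \frac{2 v^{lat}_{2} - \rho_{2}\, a^{lat}_{2,max,accel}}{2}\,\rho_{2} - \frac{\left( v^{lat}_{2} - \rho_{2}\, a^{lat}_{2,max,accel} \right)^{2}}{2\, a^{lat}_{2,min,decel}} \right) \right]_{+} \tag{8}
$$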
Where the subject vehicle (subscript 1) is to the left of the other entity (subscript 2), $d^{lat}_{min}$ is the distance between the right side of the ego vehicle and the left side of the other entity, $\mu$ is a lateral fluctuation margin [m], and $[x]_{+} := \max\{x, 0\}$.
Appendix B Pre-Crash Scenario Category Summary
TABLE 7 Pre-crash scenario categories and descriptions for two-vehicle crashes from [6]
| Scenario Category | Description | Scenario Category ID | Proportion of Collisions* |
|---|---|---|---|
| Lead Vehicle Stopped | Subject vehicle is traveling straight in an urban area, in daylight, under clear weather conditions, at an intersection-related location with a posted speed limit of 35 mph and approaches a stopped lead vehicle. | LVS | 10.2% |
| Lead Vehicle Decelerating | Subject vehicle is traveling straight and following a lead vehicle in a rural area, in daylight, under clear weather conditions, at a non-junction with a posted speed limit of 55 mph or more, and the lead vehicle suddenly decelerates. | LVD | 4.2% |
| Lead Vehicle Moving at Lower Constant Speed | Subject vehicle is traveling straight in an urban area, in daylight, under clear weather conditions, at a non-junction with a posted speed limit of 55 mph or more; and approaches a lead vehicle moving at lower constant speed. | LVMLCS | 3.7% |
| Lead Vehicle Accelerating | Subject vehicle is traveling straight in an urban area, in daylight, under clear weather conditions, at an intersection-related location with a posted speed limit of 45 mph and approaches an accelerating lead vehicle. | LVA | 0.3% |
| Total | | | 18.4% |
* The relative frequency is based on the collision statistics from Table 7 in [6].
© SAE International.
Appendix C Temporal Occurrence Results
TABLE 8 Temporal occurrence results of metrics violations at varying thresholds across all scenarios*
| Metric | Threshold | # Violations Prior to DSV at 5 m/s² | Avg. Time Difference of Violation and DSV at 5 m/s² [s] | # Violations Between DSV at 5 and 8.3 m/s² | Avg. Time Difference of Violation and DSV at 8.3 m/s² [s] | # Violations After DSV at 8.3 m/s² | Avg. Time Difference of Violation and Collision [s] |
|---|---|---|---|---|---|---|---|
| MSDV | Aggressive | 13 | 1.27 | 0 | - | 0 | - |
| MSDV | NDS | 9 | 0.58 | 4 | 2.39 | 0 | - |
| MSDV | Conservative | 13 | 7.65 | 0 | - | 0 | - |
| TTCV | 1 s | 0 | - | 3 | 0.20 | 10 | 0.91 |
| TTCV | 2 s | 4 | 0.48 | 1 | 0.20 | 8 | 1.73 |
| TTCV | 3 s | 4 | 1.36 | 3 | 0.43 | 6 | 2.20 |
| TTCV | 4 s | 4 | 2.21 | 3 | 1.28 | 6 | 2.67 |
| TTCV | 5 s | 8 | 2.03 | 2 | 0.93 | 5 | 2.76 |
| MTTCV | 1 s | 2 | 1.75 | 4 | 0.26 | 10 | 0.94 |
| MTTCV | 2 s | 6 | 0.97 | 1 | 0.85 | 8 | 1.84 |
| MTTCV | 3 s | 14 | 6.16 | 7 | 0.58 | 7 | 2.54 |
| MTTCV | 4 s | 23 | 4.72 | 3 | 1.17 | 7 | 4.61 |
| MTTCV | 5 s | 22 | 5.57 | 9 | 1.02 | 5 | 6.14 |
| THWV | 1 s | 2 | 0.75 | 6 | 0.51 | 5 | 2.68 |
| THWV | 2 s | 14 | 2.43 | 0 | - | 0 | - |
| THWV | 3 s | 13 | 4.63 | 0 | - | 0 | - |
| THWV | 4 s | 13 | 5.51 | 0 | - | 0 | - |
| THWV | 5 s | 13 | 6.09 | 0 | - | 0 | - |
| PETV | 0.5 s | 0 | - | 0 | - | 13 | 3.25 |
| PETV | 1 s | 2 | 0.45 | 1 | 0.95 | 10 | 1.33 |
| PETV | 1.2 s | 2 | 0.55 | 1 | 1.95 | 10 | 1.70 |
| PETV | 1.5 s | 3 | 0.58 | 5 | 0.74 | 5 | 0.13 |
| PETV | 2 s | 8 | 2.66 | 2 | 1.35 | 4 | 0.03 |
*Note that the threshold/parameter sets selected for the scenario analysis are in bold
© SAE International.
© 2021 SAE International. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of SAE International.
Positions and opinions advanced in this work are those of the author(s) and not necessarily those of SAE International. Responsibility for the content of the work lies solely with the author(s).
ISSN 0148-7191
... The position, velocity, and acceleration information obtained from tracking algorithms can be used to evaluate safety metrics between pairs of entities within a traffic intersection [25]. Elli et al. [26] utilized the CARLA simulator [27] to extract this information and validated the DA metrics proposed in [2], thereby demonstrating their robustness and relevance over conventional counterparts. Jammula et al. [4] collected camera videos mounted on the infrastructure to track vehicles from a BEV perspective and calculated real-world DA metrics. ...
... The parameter values for our metric calculations are derived from the analysis presented by Elli et al. in [26]. For both vehicle-to-vehicle and VRU-to-vehicle interactions, we assume Naturalistic Driving Study long l min deccel a m s It must be noted that these parameters were originally presented for the analysis of vehicleto-vehicle interactions. ...
... While both metrics proved effective in our analysis, we empirically found MDSE to be more versatile for practical applications due to its customizable parameters, which allow for fine-tuning the metric to meet traffic-specific requirements. For example, using conservative parameters [26] for MDSE calculations identified both situations shown in Figure 3 as unsafe, whereas setting an appropriate PET threshold to classify both situations as unsafe was challenging. ...
Conference Paper
Full-text available
div class="section abstract"> Ensuring the safety of vulnerable road users (VRUs) such as pedestrians, users of micro-mobility vehicles, and cyclists is imperative for the commercialization of automated vehicles (AVs) in urban traffic scenarios. City traffic intersections are of particular concern due to the precarious situations VRUs often encounter when navigating these locations, primarily because of the unpredictable nature of urban traffic. Earlier work from the Institute of Automated Vehicles (IAM) has developed and evaluated Driving Assessment (DA) metrics for analyzing car following scenarios. In this work, we extend those evaluations to an urban traffic intersection testbed located in downtown Tempe, Arizona. A multimodal infrastructure sensor setup, comprising a high-density, 128-channel LiDAR and a 720p RGB camera, was employed to collect data during the dusk period, with the objective of capturing data during the transition from daylight to night. In this study, we present and empirically assess the benefits of high-density LiDAR in low-light and dark conditions—a persistent challenge in VRU detection when compared to traditional RGB traffic cameras. Robust detection and tracking algorithms were utilized for analyzing VRU-to-vehicle and vehicle-to-vehicle interactions using the LiDAR data. The analysis explores the effectiveness of two DA metrics based on the i.e. Post Encroachment Time (PET) and Minimum Distance Safety Envelope (MDSE) formulations in identifying potentially unsafe scenarios for VRUs at the Tempe intersection. The codebase for the data pipeline, along with the high-density LiDAR dataset, has been open-sourced with the goal of benefiting the AV research community in the development of new methods for ensuring safety at urban traffic intersections. </div
... These metrics form the cornerstone for the evaluation process, along with the development of the OSA Methodology [3] by the IAM, slated for inclusion in future standards documentation. Building upon this foundation, in 2021 and 2022, IAM researchers measured and evaluated a subset of the OSA metrics in simulation [4] as well as using real-world data collected at an intersection [5,6]. The focus is on safety envelope-type metrics within selected car-following scenarios from the list of pre-crash scenarios published by the National Highway Traffic Safety Administration (NHTSA) [7]. ...
... Based on these prior works, in this paper, we developed a method to automatically detect leader-follower vehicle pairs using a computer program, which enables us to process more than 1.2 million data samples from 5,433 pairs of vehicles efficiently. This validation and analysis work is a follow-up to [4,5], but our analysis delves into the distribution of a few driving safety metrics across various real-world car-following scenarios under a wide range of different scenes rather than one or a few scenes with limited data points. Moreover, our metrics measurement data shed light on how different parameters and thresholds impact these metrics. ...
... To extract leader-follower pairs from vehicle trajectories, we developed a method to detect such pairs from the data and the map automatically. Different from our prior works [4,5] with one or a few limited scenes, we desire to develop a general method that can work for any scene. Our method consists of four steps, which are detailed as follows. ...
Conference Paper
Full-text available
Data-driven driving safety assessment is crucial in understanding the insights of traffic accidents caused by dangerous driving behaviors. Meanwhile, quantifying driving safety through well-defined metrics in real-world naturalistic driving data is also an important step for the operational safety assessment of automated vehicles (AV). However, the lack of flexible data acquisition methods and fine-grained datasets has hindered progress in this critical area. In response to this challenge, we propose a novel dataset for driving safety metrics analysis specifically tailored to car-following situations. Leveraging state-of-the-art Artificial Intelligence (AI) technology, we employ drones to capture high-resolution video data at 12 traffic scenes in the Phoenix metropolitan area. After that, we developed advanced computer vision algorithms and semantically annotated maps to extract precise vehicle trajectories and leader-follower relations among vehicles. These components, in conjunction with a set of defined metrics based on our prior work on Operational Safety Assessment (OSA) by the Institute of Automated Mobility (IAM), allow us to conduct a detailed analysis of driving safety. Our results reveal the distribution of these metrics under various real-world car-following scenarios and characterize the impact of different parameters and thresholds in the metrics. By enabling a data-driven approach to address driving safety in car-following scenarios, our work can empower traffic operators and policymakers to make informed decisions and contribute to a safer, more efficient future for road transportation systems.
... The constructive diversity analysis is expected to capture the formulation discrepancies and to classify metrics into a certain finite number of categories by the constructive nature. Some recent surveys [32], [33] distinguish metrics by the output units (e.g., distance-based metrics and timebased metrics). Such an intuitive classification criterion fails to capture the fundamental constructive property of a metric and is not compatible with the broader spectrum of metrics studied in this paper (e.g., CI (defined later in Appendix B) maps to an output with unit m 2 /s 3 ). ...
... Overall, the metrics considered in this paper are all leading measures except for AD, AM, FMRI, CR,εα-ASS, Jerk, GT, and THW, which are lagging measures, as defined in [35] and [47]. If classified by the output unit, as in [32] and [33], among the 33 base metrics, eight metrics are timebased, and two metrics are distance-based in their original proposal. While the aforementioned features capture some of the differences among metrics, it is difficult to justify the acceptability of a certain metric by the way they are classified and compared in the literature. ...
... The selected metrics in this paper cover a broad spectrum of various related works in the literature. One can refer to other summarized reports and surveys on this topic [26], [32], [33], [35], [36], [86], [87], [88], [89] for other metrics in the literature. Note that this paper focuses on applying metrics to ADAS or ADS specific applications. ...
Article
Full-text available
Vehicle performance metrics analyze data sets consisting of subject vehicle’s interactions with other road users in a nominal driving environment and provide certain performance measures as outputs. To the best of the authors’ knowledge, the vehicle safety performance metrics research dates back to at least 1967. To date, there still does not exist a community-wide accepted metric or a set of metrics for vehicle safety performance assessment and justification. This issue gets further amplified with the evolving interest in Advanced Driver Assistance Systems and Automated Driving Systems. In this paper, the authors seek to perform a unified study that facilitates an improved community-wide understanding of vehicle performance metrics using the lead-vehicle interaction operational design domain as a common means of performance comparison. In particular, the authors study the diversity (including constructive formulation discrepancies and empirical performance differences) among 33 base metrics with up to 51 metric variants (with different choices of hyper-parameters) in the existing literature, published between 1967 and 2022. Two data sets are adopted for the empirical performance diversity analysis, including vehicle trajectories from normal highway driving environment and relatively high-risk incidents with collisions and near-miss cases. The analysis further implies that (i) the conceptual acceptance of a safety metric proposal can be problematic if the assumptions, conditions, and types of outcome assurance are not justified properly, and (ii) the empirical performance justification of an acceptable metric can also be problematic as a dominant consensus is not observed among metrics empirically.
... The constructive diversity analysis is expected to capture the formulation discrepancies and to classify metrics into a certain finite number of categories by the constructive nature. Some recent surveys [32], [33] distinguish metrics by the output units (e.g., distance-based metrics and timebased metrics). Such an intuitive classification criterion fails to capture the fundamental constructive property of a metric and is not compatible with the broader spectrum of metrics studied in this paper (e.g., CI (defined later in Appendix B) maps to an output with unit m 2 /s 3 ). ...
... Overall, the metrics considered in this paper are all leading measures except for AD, AM, FMRI, CR,εα-ASS, Jerk, GT, and THW, which are lagging measures, as defined in [47], [35]. If classified by the output unit, as in [32], [33], among the 33 base metrics, eight metrics are time-based, and two metrics are distance-based in their original proposal. While the aforementioned features capture some of the differences among metrics, it is difficult to justify the acceptability of a certain metric by the way they are classified and compared in the literature. ...
... The selected metrics in this paper cover a broad spectrum of various related works in the literature. One can refer to other summarized reports and surveys on this topic [35], [26], [86], [87], [32], [33], [88], [36], [89] for other metrics in the literature. Note that this paper focuses on applying metrics to ADAS or ADS specific applications. ...
Preprint
Vehicle performance metrics analyze data sets consisting of subject vehicle's interactions with other road users in a nominal driving environment and provide certain performance measures as outputs. To the best of the authors' knowledge, the vehicle safety performance metrics research dates back to at least 1967. To date, there still does not exist a community-wide accepted metric or a set of metrics for vehicle safety performance assessment and justification. This issue gets further amplified with the evolving interest in Advanced Driver Assistance Systems and Automated Driving Systems. In this paper, the authors seek to perform a unified study that facilitates an improved community-wide understanding of vehicle performance metrics using the lead-vehicle interaction operational design domain as a common means of performance comparison. In particular, the authors study the diversity (including constructive formulation discrepancies and empirical performance differences) among 33 base metrics with up to 51 metric variants (with different choices of hyper-parameters) in the existing literature, published between 1967 and 2022. Two data sets are adopted for the empirical performance diversity analysis, including vehicle trajectories from normal highway driving environment and relatively high-risk incidents with collisions and near-miss cases. The analysis further implies that (i) the conceptual acceptance of a safety metric proposal can be problematic if the assumptions, conditions, and types of outcome assurance are not justified properly, and (ii) the empirical performance justification of an acceptable metric can also be problematic as a dominant consensus is not observed among metrics empirically.
... The Test Methodology is the main focus of this paper. Specifically, this paper proposes a test selection and scoring methodology (TSSM), derived, in part from work first presented in one author's (O'Malley) Master's thesis [19] and building upon previous work by the SFAz and IAM [20], which is an algorithmic structure that defines a universally applicable, iterative testing methodology for evaluating AV driving performance by providing the following: Technology refined and adopted: ADS-equipped vehicle commercially deployed (and may continue to be modified). ...
... One other paper details strategies for measuring the severity of metrics violations, and another tests the metrics for robustness to parameter and measurement uncertainty [22] [23]. A validation of the DA metrics using both simulated and real-world data are also provided by the group [20] [24]. ...
Conference Paper
Full-text available
div class="section abstract"> Effectively determining automated driving system (ADS)-equipped vehicle (AV) safety without relying on testing an infeasibly large number of driving scenarios is a challenge with wide recognition in industry and academia. The following paper builds on previous work by the Institute of Automated Mobility (IAM) and Science Foundation Arizona (SFAz), and proposes a test selection and scoring methodology (TSSM) as part of a safety case-based framework being developed by the SFAz to ensure the safety of AVs while addressing the scenario testing challenge. The TSSM is an AV verification and validation (V&V) process that relies, in part, on iterative, partially random generation of AV driving scenarios. These scenarios are generated using an operational design domain (ODD) and behavioral competency portfolio, which expresses the vehicle ODD and behavioral competencies in terms of quantifiable amounts or intensities of discrete components. Once generated, these scenarios are subjected to filters based on their relevance to the AV ODD and behavioral competency portfolio that preserves the robustness of the generated test set; after filtration, scenarios are assigned to a test method and executed. Further, these scenarios may be generated entirely by the TSSM or may be drawn from a preexisting scenario database and subjected to the same filtration process. After the scenarios assembled by the TSSM are executed, the methodology aggregates their driving assessment (DA) scores into a single numerical value. We outline the overall safety case-based framework, the TSSM, including its role in the framework as well as planned future work, and outline two proofs of concept: (1) a demonstration of the ability of the TSSM to pare down the space of scenarios in a scenario database; and (2) a specification form which may be used to solicit a description of the AV ODD and behavioral competency portfolio from the AV developer. </div
... As future vehicle systems will no longer only assist the human driver as an ADAS does but will also make sovereign decisions and perform actions as we expect from an ADS, the parallel development of the testing and validation methodology is a fundamental task. For this reason, the Operational Design Domain (ODD) determined during the design process strongly influences the circumstances of the investigated automated function and significantly affects the conditions of the tests [56], [57], [58]. ...
Article
Full-text available
Innovative testing and validation methods are prerequisites concerning Connected, Cooperative, and Automated Mobility (CCAM) as the high number of cooperating participants and concurrent processes critically increase the probability of adverse safety and security incidents. The proposed new approaches deal with this increasing complexity of not currently having generally accepted validation mechanisms. The paper introduces a novel, mathematical model based, scenario identification methodology, facilitating the selection of critical road vehicle traffic scenarios, taking into account different testing objectives, such as maximizing the safety risk of the analyzed system. The presented results verify that applying specific decision models and quantifiable indicators related to the system elements of highly automated mobility systems can significantly contribute to the systematic identification of unsafe corner cases in connected and cooperative autonomous systems.
... A low TTC or PET generally indicates unsafe driving behavior. Moreover, we can also "re-simulate" the motion of vehicles using our data, and then probe the safety envelope by changing the physical properties of the vehicle [61]. ...
Preprint
Full-text available
Road traffic scene reconstruction from videos has been desirable by road safety regulators, city planners, researchers, and autonomous driving technology developers. However, it is expensive and unnecessary to cover every mile of the road with cameras mounted on the road infrastructure. This paper presents a method that can process aerial videos to vehicle trajectory data so that a traffic scene can be automatically reconstructed and accurately re-simulated using computers. On average, the vehicle localization error is about 0.1 m to 0.3 m using a consumer-grade drone flying at 120 meters. This project also compiles a dataset of 50 reconstructed road traffic scenes from about 100 hours of aerial videos to enable various downstream traffic analysis applications and facilitate further road traffic related research. The dataset is available at https://github.com/duolu/CAROM.
Preprint
Full-text available
In order to ensure autonomous vehicles are safe for on-road deployment, simulation-based testing has become an integral complement to on-road testing. The rise in simulation testing and validation reflects a growing need to verify that AV behavior is consistent with desired outcomes even in edge case scenarios $-$ which may seldom or never appear in on-road testing data. This raises a critical question: to what extent are AV failures in simulation consistent with data collected from real-world testing? As a result of the gap between simulated and real sensor data (sim-to-real gap), failures in simulation can either be spurious (simulation- or simulator-specific issues) or relevant (safety-critical AV system issues). One possible method for validating if simulated time series failures are consistent with real world time series sensor data could involve retrieving instances of the failure scenario from a real-world time series dataset, in order to understand AV performance in these scenarios. Adopting this strategy, we propose a formal definition of what constitutes a match between a real-world labeled time series data item and a simulated scenario written from a fragment of the Scenic probabilistic programming language for simulation generation. With this definition of a match, we develop a querying algorithm that identifies the subset of a labeled time series dataset matching a given scenario. To allow this approach to be used to verify the safety of other cyber-physical systems (CPS), we present a definition and algorithm for matching scalable beyond the autonomous vehicles domain. Experiments demonstrate the precision and scalability of the algorithm for a set of challenging and uncommon time series scenarios identified from the nuScenes autonomous driving dataset. We include a full system implementation of the querying algorithm freely available for use across a wide range of CPS.
Preprint
Unfortunately, many people die in car accidents. To reduce these accidents, cars are equipped with driving safety systems. With autonomous vehicles, the driver's behavior becomes irrelevant as the car drives autonomously. All autonomous driving algorithms must undergo extensive testing and validation, especially for safety-critical scenarios. Therefore, the detection of safety-critical driving scenarios is essential for autonomous vehicles. This publication describes safety indicator metrics based on time series covering longitudinal driving data to detect safety-critical driving scenarios.
Conference Paper
Full-text available
The driving safety performance of automated driving system (ADS)-equipped vehicles (AVs) must be quantified using metrics in order to be able to assess the driving safety performance and compare it to that of human-driven vehicles. In this research, driving safety performance metrics and methods for the measurement and analysis of said metrics are defined and/or developed. A comprehensive literature review of metrics that have been proposed for measuring the driving safety performance of both human-driven vehicles and AVs was conducted. A list of proposed metrics, including novel contributions to the literature, that collectively, quantitatively describe the driving safety performance of an AV was then compiled, including proximal surrogate indicators, driving behaviors, and rules-of-the-road violations. These metrics, which include metrics from on- and off-board data sources, allow the driving safety performance of an AV to be measured in a variety of situations, including crashes, potential conflicts, and near misses. These measurements enable the evaluation of temporal flows and the quantification of key aspects of driving safety performance. The identification and exploration of metrics focusing explicitly on AVs as well as proposing a comprehensive set of metrics is a unique contribution to the literature. The objective is to develop a concise set of metrics that allow driving safety performance assessments to be effectively made and that align with the needs of both the ADS development and transportation engineering communities and accommodate differences in cultural/regional norms. Concurrent project work includes equipping an intersection with a sensor suite of cameras, LIDAR, and RADAR to collect data requiring off-board sources and employing test AVs to collect data requiring on-board sources.
Additional concurrent work includes development of artificial intelligence and computer vision-based algorithms to automatically calculate the metrics using the collected data. Future work includes using the collected data and algorithms to finalize the list of metrics and then develop a methodology that uses the metrics to provide an overall driving safety performance assessment score for an AV.
Conference Paper
The vehicular ad hoc network is considered an integral part of the future intelligent transportation system. As the number of applications supported by vehicular communication grows, efficient utilization of the control channel and congestion control become important issues. The periodic broadcast of basic safety messages (BSM) by the vehicles consumes most of the control channel interval, leaving less transmission capacity for other types of traffic. To accommodate the data packets from other safety and non-safety applications, the packet transmission rate of BSM must be controlled without compromising the safety of the vehicles. In this paper, we present a BSM generation rate control algorithm based on the measured time headway of the vehicles. The performance analysis shows that the proposed rate control algorithm reduces the channel utilization and improves the BSM reception ratio at different vehicle densities and vehicle speeds. Moreover, the proposed rate control algorithm effectively reduces the notification time of a multi-hop warning message.
Conference Paper
The paper proposes a specific indicator, the inverse time to collision (TTC⁻¹), for analyzing highway traffic. The advantage of TTC⁻¹ over TTC is its direct and continuous dependence on collision risk. TTC⁻¹ could be used as an input to car-following algorithms. Because automated driving is still at the research stage, a more feasible application for TTC⁻¹ would be assisting the driver of the following car in choosing the distance gap to the lead car.
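To illustrate why the inverse indicator depends continuously on risk, the following minimal sketch computes both quantities for a simple car-following pair. It assumes the common constant-speed definition (TTC equals the gap divided by the closing speed); the function and parameter names are hypothetical and are not taken from the cited paper.

```python
import math


def time_to_collision(gap_m, v_follow_mps, v_lead_mps):
    """TTC in seconds under a constant-speed assumption.

    Returns math.inf when the follower is not closing on the lead car,
    which is the discontinuity that motivates the inverse form.
    """
    closing_speed = v_follow_mps - v_lead_mps
    if closing_speed <= 0:
        return math.inf
    return gap_m / closing_speed


def inverse_ttc(gap_m, v_follow_mps, v_lead_mps):
    """Inverse TTC in 1/s: closing speed divided by the gap.

    Zero when speeds match, positive when closing, negative when the
    gap is opening, so it varies continuously through zero closing speed.
    """
    if gap_m <= 0:
        raise ValueError("gap must be positive")
    return (v_follow_mps - v_lead_mps) / gap_m


if __name__ == "__main__":
    # Illustrative values only: 20 m gap, follower at 25 m/s, lead at 20 m/s.
    print(time_to_collision(20.0, 25.0, 20.0))
    print(inverse_ttc(20.0, 25.0, 20.0))
```

When the two speeds are equal, TTC jumps to infinity while the inverse value is simply zero, which makes the inverse form easier to threshold or feed into a controller.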
Conference Paper
The need for safety in Automated Driving (AD) is becoming increasingly critical with the accelerating deployment of this technology. Beyond functional safety, industry must guarantee the operational safety of automated vehicles. Towards that end, Mobileye introduced the Responsibility Sensitive Safety (RSS), a model-based approach to safety [1]. In this paper we expand upon this work by introducing the C++ Library for Responsibility Sensitive Safety, an open source executable that implements a subset of RSS. We provide architectural details for integrating the C++ Library for Responsibility Sensitive Safety with AD software pipelines as a safety module overseeing the decision making of driving policies. We illustrate this application with an example integration with the Baidu Apollo AD stack and simulator, [2] and [3], that provides safety validation of the planning module. Furthermore, we show how the C++ Library for Responsibility Sensitive Safety can be used to explore the usefulness of the RSS model through parameter exploration and analysis of the minimum safe longitudinal distance (d_min) under different weather conditions. We also compare these results with the half-of-speed rule followed in some parts of the world. We expect that the C++ Library for Responsibility Sensitive Safety will become a critical component of future tools for formal verification, testing and validation of AD safety and that it will help bootstrap AD research efforts towards standardization of safety.
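As a rough illustration of the minimum safe longitudinal distance mentioned in this abstract, the sketch below evaluates the same-direction RSS formula from the original RSS formulation. It is not the API of the C++ library described above; the function name, parameter names, and numeric values are assumptions chosen only for the example.

```python
def rss_min_longitudinal_gap(v_rear, v_front, rho,
                             a_max_accel, a_min_brake, a_max_brake):
    """Minimum safe gap (m) between a rear and a front vehicle.

    Worst case assumed by RSS: the rear vehicle accelerates at
    a_max_accel during the response time rho, then brakes at no less
    than a_min_brake, while the front vehicle brakes at a_max_brake.
    Speeds are in m/s, accelerations in m/s^2 (all positive magnitudes).
    """
    v_rear_after_response = v_rear + rho * a_max_accel
    gap = (
        v_rear * rho
        + 0.5 * a_max_accel * rho ** 2
        + v_rear_after_response ** 2 / (2.0 * a_min_brake)
        - v_front ** 2 / (2.0 * a_max_brake)
    )
    # The formula is clamped at zero: a negative value means any gap is safe.
    return max(0.0, gap)


if __name__ == "__main__":
    # Hypothetical parameters: both cars at 20 m/s, 1 s response time.
    print(rss_min_longitudinal_gap(20.0, 20.0, 1.0, 2.0, 4.0, 8.0))
```

Lowering `a_min_brake`, for instance to represent reduced braking capability on a wet road, increases the required gap, which is the kind of parameter exploration the abstract refers to.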
Article
Safety is emerging as an area of increased attention and awareness within transportation engineering. Historically, the safety of new and innovative traffic treatments has been difficult to assess, primarily because of a lack of good predictive models of crash potential and a lack of consensus on what constitutes a safe or unsafe facility. An FHWA-sponsored research project investigated the potential to derive surrogate measures of safety from existing traffic simulation models. These surrogate measures could then be used to support evaluations of various traffic engineering alternatives, including facilities that have not yet been built and strategies that have not yet been used. Each surrogate measure is collected on the basis of the occurrence of a conflict event, which is an interaction between two vehicles in which one vehicle must take evasive action to avoid a collision. The surrogate measures that are proposed as the best are time to collision, postencroachment time, deceleration rate, maximum speed, and speed differential. Time to collision, postencroachment time, and deceleration rate can be used to measure the severity of the conflict. Maximum speed and the speed differential can be used to measure the severity of the potential collision (by use of additional information about the mass of the vehicles involved to assess momentum). After the simulation model is executed for a number of iterations, a postprocessing tool would be used to compute the statistics for the various measures and perform comparisons between design alternatives.
Safetyengineering.wordpress.com, "The Safety Engineering Resource," April 18, 2008, accessed Jan. 8, 2021.
Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., and Koltun, V., "CARLA: An Open Urban Driving Simulator," in 1st Annual Conference on Robot Learning, 2017.