A Proposed Safety Case Framework for Automated
Vehicle Safety Evaluation
Jeffrey Wishart, PhD
Science Foundation Arizona/
Arizona Commerce Authority
Phoenix, USA
jeffw@azcommerce.com
Junfeng Zhao, PhD
The Polytechnic School
Arizona State University
Mesa, USA
junfeng.zhao@asu.edu
Braeden Woodard
The Polytechnic School
Arizona State University
Mesa, USA
bmwoodar@asu.edu
Gavin O’Malley
The Polytechnic School
Arizona State University
Mesa, USA
gpomalle@asu.edu
Hencong Guo
The Polytechnic School
Arizona State University
Mesa, USA
hguo63@asu.edu
Shujauddin Rahimi
The Polytechnic School
Arizona State University
Mesa, USA
srahimi8@asu.edu
Sunder Swaminathan
The Polytechnic School
Arizona State University
Mesa, USA
sswami21@asu.edu
Abstract—Automated driving system (ADS)-equipped vehicles
(AVs) are currently being developed and deployed on public
roads. While the benefits to public safety (among other potential
benefits) are promising, the risk that AVs pose to public safety is
not yet understood. The Automated Vehicle Test and
Evaluation Process (AV-TEP) mission initiated by Science
Foundation Arizona in collaboration with Arizona State
University is intended to provide a framework for an AV
developer or third-party evaluator (such as a regulator) to
use in order to provide evidence that the AV is safe for its
intended implementation. The AV-TEP framework uses the safety case concept that is widely accepted in the AV industry and other safety-critical industries, and consists of three pillars: (1) Safety Management System (SMS), (2) Design Methods, and (3) Scenario-Based Testing. The three pillars are described in detail, and the validation methodology used is outlined. Finally, the various paths to industry acceptance of the AV-TEP framework, which builds upon existing work and research wherever appropriate, are described.
Keywords—Automated Driving System (ADS), Safety
Case Framework (SCF), Driving Safety Assessment (DSA),
scenario-based testing, metrics, safety management system,
digital twin, Verification and Validation (V&V)
I. INTRODUCTION
Automated driving system (ADS)-equipped vehicles (AVs)
are currently being developed by new and existing automotive
industry players. Some of these AVs are being deployed on
public roads already, and this is cause for concern. AVs have
great potential for reduction in collisions (both frequency and severity), among other benefits (namely, enhanced traffic throughput, increased driving efficiency, and enhanced mobility for mobility-challenged populations). However, this purported safety benefit should be proven and not assumed. There exists an
urgent need for a clear and consistent process for AV safety
evaluation. The Automated Vehicle Test and Evaluation Process
(AV-TEP) mission was initiated by Science Foundation Arizona
(SFAz) and Arizona State University (ASU) to provide a process
that can be used by AV developers and third-party evaluators
(including regulators). The AV-TEP mission builds on existing
work and leverages existing best practices, standards, and
regulations wherever possible. The objective of the AV-TEP
mission is to develop a process that is widely accepted in the
industry and that can be developed into standards and
regulations. As such, feedback has been solicited from an
Advisory Board as well as outside organizations to ensure that
there is sufficient stakeholder engagement and agreement on the
approach as the mission progresses.
A. Safety Case Definition
The AV-TEP framework uses a safety case-based approach.
This approach has been used in other safety-critical areas such
as the nuclear and aviation industries. There is a growing
consensus to adopt it within the AV industry. In the context of
AVs, a safety case is a reasoned argument, supported by
evidence, intended to justify that an AV is acceptably safe for
deployment either in a specific operating environment or for a
particular application. A safety case is expected to offer a
structured and systematic approach to identify and mitigate
potential hazards and risks associated with and presented
throughout the lifecycle of an AV. It should also include
essential elements, such as design and operational principles, as
well as demonstrate compliance with relevant regulations,
standards, and industry best practices.
The National Highway Traffic Safety Administration (NHTSA) adopted a safety case framework (SCF) through its Voluntary Safety Self-Assessment (VSSA) [1], which identified 12 "pillars," i.e., topics that NHTSA recommended an AV developer address in its VSSA, as well as through the Advance Notice of Proposed Rulemaking (ANPRM) [2], which proposed an SCF based on the UL 4600 [3], ISO 26262 [4], and ISO 21448 [5] standards. The AV-TEP mission employs and expands on this approach.
II. SAFETY CASE CONSTRUCTION
The subsequent sections detail the proposed AV-TEP
framework for constructing an AV safety case, which is
organized into three fundamental pillars, namely Safety
Management System (SMS), Design Methods, and Scenario-
Based Testing, as depicted in Fig. 1. The AV-TEP SCF leverages guidance from government organizations, best practices from safety-critical industries, and voluntary industry standards; it also incorporates insights from academic research and integrates lessons learned from the organization's own work. This safety case has adopted UL 4600
as a fundamental standard, considering its wide acceptance and
high regard. However, the standard leaves crucial elements open to interpretation by developers and implementers. The AV-TEP mission objective is to fill the gaps left by these standards (most significantly, a full methodology for scenario-based testing) and to meet the needs of developers by creating an actionable process that AV developers can implement to facilitate the construction of a safety case for their AVs. The
proposed AV-TEP safety case focuses on AV driving safety
performance with attention to safety engineering. This safety
case is not comprehensive and does not cover all aspects of AV
operational safety such as cybersecurity, vehicle maintenance,
passenger-initiated emergency stops, etc.
A. Safety Management System (SMS) Pillar
An SMS is an approach designed to systematically and
comprehensively support organizational safety by utilizing a
combination of safety principles, processes, and practices to
enhance organizational decisions based on safety risk. This
process involves identifying potential safety hazards,
evaluating and managing safety risks, and implementing
controls and mitigations to address those safety risks. Various
safety-sensitive industries, including but not limited to nuclear
energy, oil and gas, healthcare, chemical, defense, space, and
aviation, have implemented numerous variations of an SMS to
effectively manage safety risks. While many government
agencies have mandated the implementation of an SMS within
the aforementioned industries, there are currently no
established regulations or standards that define a minimum
acceptable level of risk for AVs. Consequently, regulators have
delegated the responsibility of identifying and managing safety
risks to the AV developers. The proposed SMS pillar of the AV-TEP SCF aims to aid developers in fulfilling this responsibility by constructing an efficient SMS framework that can be readily implemented.

Fig. 1. The AV-TEP safety case structure and "pillars"
The proposed AV-TEP SMS pillar is founded on four fundamental components that have been widely adopted throughout these various industries:
1) Safety Policy and Objectives (SPO): The objective of SPO
is to establish or enhance safety practices by implementing
a clear safety policy, identifying or creating safety
roles/teams with specific responsibilities (e.g., a safety
ambassador with direct access to the senior management
team and an investigation team responsible for safety
concerns concerning stakeholders), and establishing
organizational safety objectives [6]. This component also
includes developing an employee reporting and resolution
system, integrating existing processes and procedures into
a unified approach, and promoting cross-organizational
communication and cooperation towards the shared goal of
safety.
2) Safety Risk Management (SRM): The objective of SRM is
to manage risks through a process consisting of describing the system, identifying hazards, and assessing, analyzing, and controlling risks based on safety risk assessments [6].
3) Safety Assurance (SA): The objective of SA is to monitor,
analyze, and measure overall safety performance, including
the efficacy of safety risk controls, management, and
related processes [6]. This component also assesses the
sustained effectiveness of risk control strategies, facilitates
the identification of new hazards, and ensures adherence to
standards, policies, and best practices through audits and
evaluations.
4) Safety Promotion (SP): The objective of SP is to regularly
conduct initiatives that inform, educate, and promote safety
awareness among employees [6]. These include training,
communication, and other activities aimed at fostering a
positive safety culture throughout all levels and roles of the
organization.
The AV-TEP SMS pillar also includes a fifth component: a
post-deployment Change Risk Management (CRM) process
that is similar to SRM. The objective of CRM is to identify,
evaluate, and manage risks that arise from changes in the design
or operations of the AV, whether planned or unplanned. These
changes may result from the addition of ADS features,
functions, or new scenarios in the operating environment. CRM
differs from SRM as it focuses on managing changes. This is
achieved through an iterative process of analyzing CRM
triggers, identifying hazards, assessing new risks, and
evaluating safety measures related to a certain change [7].
B. Design Methods Pillar
The primary objective of the AV-TEP Design Methods
pillar is to assist developers in ensuring the safety, reliability,
and performance of their AV system, as well as compliance
with regulatory requirements and industry standards. Design
methods refer to the systematic approach to developing,
designing, and verifying AV systems, which includes
considerations for the human-machine interface (HMI),
cybersecurity, and functional safety aspects of the system. A
comprehensive design methods process ensures that the system
is designed not only to operate safely under normal
circumstances but also to take into account foreseeable
malfunctions. The AV-TEP Design Methods pillar focuses on
system design with a concentration on driving safety aspects,
including simulation and testing, risk assessment and
management, and verification and validation. The process involves evaluating the entire system: simulation and testing in various environments and scenarios while considering design assumptions; verification and validation to ensure compliance with industry standards and regulations; analysis of the system design to identify potential hazards and take the necessary precautions to prevent them; and continuous monitoring and improvement to enhance the system's safety and performance over time, while incorporating functional safety standards into the design process.
Regarding risk assessment and management, a recent study
[8] analyzed automation-related incidents and accidents in
various transport modes to identify learning opportunities for
designing AVs and their operating systems. The study
identified two key leverage points: improved integration of
human factors in automation design across all modes, and re-
evaluation of regulatory approaches to address emerging technologies and their associated risks [8]. For developers
constructing a safety case, a significant takeaway is that design
assumptions must consider the possibility of imperfect
performance or noncompliance by operators and road users.
Developers should take these findings into consideration during
the SRM component of the safety case. By incorporating human
factors, developers can optimize interactions between the AV
system and humans while identifying and addressing potential
hazards and risks associated with potential human factor design
flaws. This design assumption can also be integrated into the
simulation and testing segment of the safety case and included
as a safety argument.
In terms of verification and validation, emphasis should be
placed on the importance of data recording and analysis.
Additionally, transparency and collaboration among industry
stakeholders are encouraged. During post-deployment, ongoing
evaluation and monitoring of the AV's performance and safety
is paramount. This should include adopting new evaluation and
assessment methods proposed in the form of standards,
regulations, and best practices to ensure that AVs continue to
operate safely and efficiently in various environments. The AV-
TEP framework uses ISO 21448 (SOTIF) [5] for guidance on design, verification, and validation measures, in addition to other accepted standards such as ISO 26262 [4]. SOTIF examines whether safety functionality can be ensured in unknown conditions without failure; this requires assessing the performance limitations of vehicle components and unexpected changes in the road environment through simulation. The AV-
TEP SCF also requires developers to conduct simulation testing
to evaluate and assess vehicle responses to real-world
scenarios. A detailed approach to simulation and testing is
presented in the following Scenario-Based Testing Pillar.
C. Scenario-Based Testing Pillar
Scenario-based testing is a crucial aspect of ensuring the safety and reliability of automated vehicles, because it is imperative to understand how an AV would respond in the various real-world scenarios it might encounter. A wide range of scenarios is therefore created, encompassing different driving situations and environmental conditions, including complex urban settings, highway driving, pedestrian interactions, and adverse weather conditions. The components that are required
for the scenario-based testing pillar are:
1. Driving Safety Assessment (DSA) metrics
2. Evaluation Methodology
3. Test Methodology
4. Evaluation Criteria
All of these components are already being developed, either concurrently by other organizations or as part of the AV-TEP mission, except the Evaluation Criteria, which are to be determined based on thresholds and a minimum level of safety recommended by a regulator such as NHTSA. The first three components are discussed below.
1) DSA Metrics
The DSA metrics have been developed starting with work
done by the Institute of Automated Mobility (IAM) [9] and
have now been carried forward by the Verification and
Validation (V&V) Task Force under SAE International’s On-
Road Automated Driving (ORAD) Committee. The V&V Task
Force has published a Recommended Practice, SAE J3237 [10].
The set of DSA metrics is shown in Fig. 2. These metrics are categorized into black-box, grey-box, and white-box metrics depending on whether they require ADS data and, if ADS data are required, the level of access that is needed. The Safety
Envelope Metrics are intended to be neutral to the Safety
Envelope Formulation selected to calculate the spatio-temporal
boundaries. Examples of Safety Envelope Formulations include
the Minimum Distance Safety Envelope (MDSE) based on
previous work by Intel/Mobileye and the IAM [9] and Model
Predictive Instantaneous Safety Metric (MPrISM) based on
work by the Transportation Research Center (TRC) and
National Highway Traffic Safety Administration (NHTSA)
[11].
Fig. 2. The DSA metrics in the SAE J3237 Recommended Practice [10]
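To make the safety envelope concept concrete, the sketch below checks a longitudinal minimum safe following distance in the spirit of an MDSE/RSS-style formulation. This is a minimal sketch only: the parameter values, function names, and the specific worst-case stopping-distance argument are illustrative assumptions, not the SAE J3237 or MDSE definitions.

```python
# Minimal sketch of a longitudinal safety-envelope check in the spirit of an
# MDSE/RSS-style formulation. Parameter values and names are illustrative
# assumptions, not the SAE J3237 or MDSE definitions.

def min_safe_longitudinal_gap(v_rear: float, v_front: float,
                              rho: float = 0.5,      # assumed response time [s]
                              a_accel: float = 2.0,  # assumed max accel during rho [m/s^2]
                              b_rear: float = 4.0,   # assumed rear-vehicle braking [m/s^2]
                              b_front: float = 8.0   # assumed front-vehicle braking [m/s^2]
                              ) -> float:
    """Worst-case stopping-distance argument: the rear vehicle accelerates for
    rho seconds and then brakes gently, while the front vehicle brakes hard."""
    v_rho = v_rear + rho * a_accel
    gap = (v_rear * rho + 0.5 * a_accel * rho**2
           + v_rho**2 / (2.0 * b_rear)
           - v_front**2 / (2.0 * b_front))
    return max(gap, 0.0)

def safety_envelope_violated(gap: float, v_rear: float, v_front: float) -> bool:
    """Returns True ('bad') when the measured gap is inside the safety envelope."""
    return gap < min_safe_longitudinal_gap(v_rear, v_front)

# Example: a 30 m measured gap behind a slower lead vehicle at highway speeds.
print(safety_envelope_violated(gap=30.0, v_rear=30.0, v_front=25.0))  # True
```

Because the metric is intended to be formulation-neutral, any Safety Envelope Formulation (MDSE, MPrISM, or another) could be substituted behind the same violation check.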
2) Evaluation Methodology
With the metrics established, what must be measured for a single scenario navigation is known. However, as shown in Fig. 2, there can be a large number of metric measurements for that scenario navigation. The objective of the DSA Methodology [12] is to establish a method for interpreting and combining all of these measurements into a single "score" for the scenario navigation. The DSA Methodology is governed by an equation of the form [12]:
 󰇟  
󰇛 󰇛 󰇜
󰇜󰇠*100%
The DSA metrics are assigned scores of 0 or 1 (0 is "good," while 1 is "bad," signifying a violation of a threshold). The severity of each violation (if it occurs) is quantified, and the sum of all products is determined. The raw score is then weighted by the complexity of the scenario, the relevance of the scenario to the ODD, and the fidelity of the test method. These three weighting factors are significant research questions in and of themselves. The DSA score out of 100 is then assigned for the scenario navigation.
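A minimal sketch of this scoring computation follows, assuming the weighted form of the equation above; the exact formulation, thresholds, and weighting factors are those of the DSA Methodology [12], and the structure shown here is an illustrative assumption.

```python
# Minimal sketch of the DSA scoring computation described above. The weighted
# form of the score equation is an illustrative assumption; the authoritative
# formulation is given in the DSA Methodology [12].

def dsa_score(violations: list[tuple[int, float]],
              w_complexity: float, w_relevance: float, w_fidelity: float) -> float:
    """violations: (m_i, s_i) pairs, where m_i is 0 ('good') or 1 ('bad', a
    threshold violation) and s_i is the quantified severity of the violation."""
    raw = 1.0 - sum(m * s for m, s in violations)  # sum of all products
    return w_complexity * w_relevance * w_fidelity * max(raw, 0.0) * 100.0

# Example: one scenario navigation with two threshold violations of severity
# 0.10 and 0.25, under illustrative weighting factors.
print(dsa_score([(0, 0.0), (1, 0.10), (1, 0.25)], 0.9, 0.95, 1.0))  # ~55.6
```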
3) Test Methodology
The AV-TEP team has integrated the set of DSA metrics and
the DSA Methodology into a test selection and scoring
methodology (TSSM) that generates a testing regime of AV
driving scenarios tailored to each AV ODD and specifies a test
method for each scenario. When the scenarios are executed, an
analysis of the results is conducted that is structured for use in
evaluating AV driving safety and supporting the AV safety
case. The TSSM provides a systematic process for assuring ODD coverage, including edge cases.
The TSSM generates scenarios by first discretizing the
ODD of the vehicle under test (VUT) using an ODD "chart": a graph with a number of axes representing the entities,
phenomena, or conditions that may be encountered in any ODD
(e.g. trucks, cars, motorcycles, rain, snow, road surface
condition). The tick marks on the axes provide a quantization
of the entity or phenomenon represented by that axis (e.g.
number of safety-relevant trucks, cars, or motorcycles, inches
of rain or snow, numerical ranking of road surface condition),
with the origin of the axis representing zero for most axes and
the maximum of the axis representing the maximum of that
entity or phenomenon expected in any ODD. A point placed on
an axis represents the boundary of the ODD for the VUT, as
specified by the developer (e.g. the VUT may only be able to
handle, at maximum, three trucks or cars at once, or five inches
of rain and/or snow, or a road surface condition of 3/10). The
collection of points on the ODD chart axes creates a defining
volume for the given ODD that can be used to quantify the
number of scenarios needed for testing.
To generate scenarios for testing, the TSSM iterates through
all the axes of the ODD chart and randomly selects a point on
each axis within the boundaries of the ODD “shape”, thereby
generating a scenario. The generated scenario is then passed
through three “filters”. In all cases, if the scenario does not pass
through the filter, the scenario is regenerated until it does. The
first filter checks that the generated scenario is within the VUT
ODD. The second filter checks that the relevance of each
component of the scenario to the VUT ODD (e.g. number of
cars, number of trucks, etc.) is above a threshold. The third and
final filter checks that the relevance of the entire scenario, taken
as a whole, is above a threshold. These filters, combined with
the randomness of the scenario generation procedure, ensure
that AV developers cannot “develop to the test”.
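A minimal sketch of this generate-and-filter procedure, assuming a simple dictionary representation of the ODD chart and placeholder relevance models (the axes, boundary points, and thresholds shown are illustrative assumptions, not AV-TEP values):

```python
import random

# Minimal sketch of the TSSM scenario generation and filtering loop described
# above. The axes, ODD boundary points, and relevance models are illustrative
# assumptions; a real ODD chart covers many more entities and phenomena.

ODD_CHART = {
    # axis name: (quantized axis maximum over any ODD, VUT ODD boundary point)
    "trucks": (10, 3),
    "cars": (20, 3),
    "rain_inches": (12, 5),
    "road_surface_rank": (10, 3),
}

def component_relevance(axis: str, value: int) -> float:
    """Placeholder per-component relevance model (assumption)."""
    _, bound = ODD_CHART[axis]
    return value / bound if bound else 0.0

def scenario_relevance(scenario: dict) -> float:
    """Placeholder whole-scenario relevance model (assumption)."""
    return sum(component_relevance(a, v) for a, v in scenario.items()) / len(scenario)

def generate_scenario(component_threshold: float = 0.25,
                      scenario_threshold: float = 0.5) -> dict:
    while True:  # regenerate until the scenario passes all three filters
        # Randomly select a point on each axis of the ODD chart.
        scenario = {axis: random.randint(0, axis_max)
                    for axis, (axis_max, _) in ODD_CHART.items()}
        # Filter 1: the scenario must lie within the VUT ODD boundary.
        if any(scenario[a] > bound for a, (_, bound) in ODD_CHART.items()):
            continue
        # Filter 2: each component must be sufficiently relevant to the VUT ODD.
        if any(component_relevance(a, v) < component_threshold
               for a, v in scenario.items()):
            continue
        # Filter 3: the scenario as a whole must be sufficiently relevant.
        if scenario_relevance(scenario) >= scenario_threshold:
            return scenario

print(generate_scenario())  # one randomly generated, ODD-filtered scenario
```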
The scenarios created by this random generation process are
categorized into the random test suite (RTS), which contains all
scenarios that are generated randomly. An additional test suite,
the standard test suite (STS), is generated by applying the ODD filter to NHTSA's list of 37 pre-crash scenarios [13] and iteratively generating a set of scenarios using the filtered list, with all components save for actor speeds held constant. This generates a set of scenarios that is standardized for identical
ODDs. Lastly, the known test suite (KTS) contains any
scenarios that are known to the AV developer or test engineer
to cause issues for the VUT.
Once the RTS, STS, and KTS have been filled, the scenarios
they contain are assigned to a test method. For safety reasons,
all test methods save for simulation are initially “locked” until
satisfactory results for a given test method “unlock” the next
safest test method (e.g. good results in simulation “unlock”
closed-course testing). Once all test methods have been unlocked, the overall information required to test the VUT
ODD is calculated as a function of ODD “size”, and scenarios
are assigned to simulation or closed course testing until the total
required scenario information (a function of scenario relevance,
complexity, and test method fidelity) is achieved. This
assignment of scenarios satisfies a mix specified by the test
engineer that conforms to test method minimums specified by
regulators. Lastly, if public road testing is unlocked, a required
public road test mileage is calculated, again as a function of
ODD “size”. After that mileage has been driven by the AV, any
specified behavioral competencies are checked for in the
results. This allows the test engineer to gain useful information
from public road testing even without the ability to specify
scenarios beforehand, as can be done with simulation and
closed-course testing.
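The gating of test methods can be sketched as follows; the ordered method list and the pass threshold are illustrative assumptions.

```python
# Minimal sketch of the test-method "unlock" gating described above. The
# method ordering and pass threshold are illustrative assumptions.

TEST_METHODS = ["simulation", "closed_course", "public_road"]  # safest first
PASS_THRESHOLD = 90.0  # assumed minimum aggregate DSA score to unlock the next method

def unlocked_methods(aggregate_scores: dict[str, float]) -> list[str]:
    """Simulation is always unlocked; each subsequent method unlocks only after
    the previous one has produced satisfactory results."""
    unlocked = [TEST_METHODS[0]]
    for prev, nxt in zip(TEST_METHODS, TEST_METHODS[1:]):
        if aggregate_scores.get(prev, 0.0) >= PASS_THRESHOLD:
            unlocked.append(nxt)
        else:
            break
    return unlocked

print(unlocked_methods({"simulation": 94.0}))                          # + closed_course
print(unlocked_methods({"simulation": 94.0, "closed_course": 91.0}))   # all three
```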
The TSSM is iterative; results from one execution are used
to refine the scenarios generated in the next execution.
As a final step after each execution of the TSSM, an overall score is generated that aggregates the individual DSA scores for each scenario. Problem scenarios are
then added to the KTS, and a new TSSM iteration is started.
The iteration process stops when either a passing score is
achieved, a safety-critical error is discovered, or the test
engineer opts to stop the process to change the VUT’s software
or hardware and retest.
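The outer TSSM loop can be sketched as follows, with the suite execution and scoring stubbed out (the stubs, thresholds, and iteration cap are placeholder assumptions standing in for the DSA Methodology machinery):

```python
import random
from dataclasses import dataclass

# Minimal sketch of the iterative TSSM loop described above. Suite execution
# and scoring are stubbed with placeholders (assumptions); in practice these
# come from the DSA Methodology and the assigned test methods.

@dataclass
class Result:
    scenario: dict
    score: float            # per-scenario DSA score (0-100)
    safety_critical: bool   # whether a safety-critical error occurred

def run_suite(scenarios: list[dict]) -> list[Result]:
    """Stub: execute and score each scenario (placeholder assumption)."""
    return [Result(s, random.uniform(60.0, 100.0), False) for s in scenarios]

def run_tssm(rts, sts, kts, passing_score=90.0, max_iterations=10):
    overall = 0.0
    for _ in range(max_iterations):
        results = run_suite(rts + sts + kts)
        overall = sum(r.score for r in results) / len(results)  # aggregate DSA score
        if any(r.safety_critical for r in results):
            return "safety-critical error discovered", overall
        if overall >= passing_score:
            return "passed", overall
        # Feed problem scenarios back into the KTS for the next iteration.
        kts = kts + [r.scenario for r in results
                     if r.score < passing_score and r.scenario not in kts]
    return "stopped: change VUT software/hardware and retest", overall

print(run_tssm(rts=[{"id": i} for i in range(5)], sts=[{"id": "sts-1"}], kts=[]))
```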
4) Resources and Current Work
The AV-TEP mission aims to provide the framework and
the research on threshold determination and methods. For the
scenario-based testing pillar, the available resources are the open-source simulation software Car Learning to Act (CARLA) [14]; real-world data collection at the ASU Polytechnic Campus as well as at intersections in the SmartDrive testbed operated by the Maricopa County Department of Transportation in Anthem, AZ; and the validation platform currently in development (more on this platform in a later section). CARLA is an open-source simulator for autonomous
driving research that offers a realistic environment with the
flexibility to import custom vehicles and maps. Within
CARLA, an open-source plugin called ScenarioRunner is used
to execute specific driving scenarios that relate to the ODD.
Upon execution, ScenarioRunner generates logs that contain
direct simulation data, which is then processed using Python to
calculate DSA metrics and generate output graphs for further
analysis. Utilizing a custom Python script, many variations of a
given scenario can be generated and executed to produce a
robust dataset.
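As an illustration of this post-processing step, the sketch below connects to a running CARLA server, samples vehicle kinematics each tick, and computes a simple time-gap measurement of the kind that feeds the DSA metric calculations. The host/port, actor pairing, and metric shown are illustrative assumptions, not the project's actual scripts.

```python
import math
import carla  # CARLA Python API; assumes a server running on localhost:2000

# Illustrative sketch (not the project's actual scripts): sample vehicle
# kinematics from a running simulation and compute a simple time-gap
# measurement of the kind that feeds the DSA metric calculations.

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

samples = []
for _ in range(200):  # log 200 simulation ticks
    world.wait_for_tick()
    vehicles = list(world.get_actors().filter("vehicle.*"))
    if len(vehicles) < 2:
        continue
    ego, other = vehicles[0], vehicles[1]  # assumed ego/lead pairing
    gap = ego.get_transform().location.distance(other.get_transform().location)
    v = ego.get_velocity()
    speed = math.sqrt(v.x**2 + v.y**2 + v.z**2)
    # Time gap: distance to the other actor divided by the ego speed.
    samples.append(gap / speed if speed > 0.1 else float("inf"))

if samples:
    print(f"Minimum time gap over the run: {min(samples):.2f} s")
```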
The initial stage of this project involved creating and testing
scenarios based on the 37 pre-crash scenarios prescribed by
NHTSA. A small database of these scenarios was established
and simulated in CARLA to collect and analyze the DSA
metrics. The introduction of the AV-TEP mission along with
the involvement of the validation platform further expanded the
scope of this project by providing a method to verify and
validate the safety of AVs. For this mission, a digital twin of
the research connected and automated vehicle (CAV) was developed and imported into CARLA
along with a map of the Innovation Way route at ASU’s
Polytechnic Campus. Access to a larger database of about
250,000 scenarios hosted by Safety Pool [15] was obtained for the AV-TEP mission. These scenarios were categorized based on ODD and added to test suites. Simulation of these scenarios with a digital twin model
of the research CAV and the Innovation Way map was
performed in CARLA. Through the simulation of these
scenarios, the DSA metrics are verified and validated.
III. SCF VALIDATION
The SCF validation will be conducted on the AV-TEP
validation platform. The primary function of the AV-TEP
platform is to augment the natural driving environment with virtual traffic to generate dense safety-critical scenarios in which the validation can be conducted efficiently.

Fig. 3. AV-TEP validation platform
The validation of the SCF requires enormous amounts of safety-critical data from the natural driving environment; however, the proportion of safety-critical events in naturalistic driving data is extremely low. Recently, researchers from academia and industry have leveraged Digital Twin (DT) technology to tackle this problem, and significant progress has proven the feasibility and effectiveness of this method [16].
The AV-TEP platform is developed based on DT technology and Augmented Reality (AR) to accomplish the
validation. As mentioned previously, to generate testing
scenarios, this platform employs an open-source simulation
platform, CARLA. The scenarios can be used for co-simulation
with the automated driving software and for on-road vehicle
validation. The vehicle platform used for this mission is
instrumented as a level-4 AV with an open-source automated
driving system, Autoware [16]. The onboard sensor suite
includes 64-channel LIDAR units, RGB cameras, long-range
RADAR units, GNSS, IMU, and a Drive-by-Wire module.
The AV-TEP validation platform is depicted in Fig. 3. The
platform augments the real world with the designed safety-
critical scenarios. In the physical world, the AV-TEP vehicle
transfers its pose data and position data to the cloud server.
Combined with the digital map in CARLA, a virtual
representation of the vehicle is projected into the digital world,
where the designed safety-critical scenarios are created. Then,
the virtual AV-TEP vehicle detects its surroundings and returns the data to the real world. The virtual vehicle's sensor
configuration aligns with the physical one. Further, the
decision-making process is completed onboard based on the
data fusion result of real sensor and simulation data. The motion
decisions are transmitted to the motion planning subsystem and
executed through Autoware.
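A minimal sketch of the pose-mirroring step in this loop, assuming the physical vehicle's pose arrives over some transport (stubbed here) and is applied to a replica actor in CARLA; the transport, blueprint choice, and update rate are illustrative assumptions:

```python
import time
import carla  # assumes a CARLA server hosting the digital map on localhost:2000

# Illustrative sketch (assumptions throughout): mirror the physical AV-TEP
# vehicle's pose onto a replica actor in the CARLA digital world, where the
# designed safety-critical virtual traffic is created.

def receive_pose_from_vehicle():
    """Stub for the cloud-server transport delivering (x, y, z, yaw) from the
    physical vehicle's localization stack (placeholder assumption)."""
    return 10.0, 5.0, 0.2, 90.0

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Spawn a replica of the physical vehicle (blueprint choice is an assumption;
# the mission uses a custom digital twin model of the research CAV).
blueprint = world.get_blueprint_library().filter("vehicle.*")[0]
replica = world.spawn_actor(blueprint, world.get_map().get_spawn_points()[0])

try:
    for _ in range(100):                 # run the mirroring loop for 100 updates
        x, y, z, yaw = receive_pose_from_vehicle()
        replica.set_transform(carla.Transform(
            carla.Location(x=x, y=y, z=z),
            carla.Rotation(yaw=yaw)))    # project the physical pose into CARLA
        time.sleep(0.05)                 # assumed 20 Hz update rate
finally:
    replica.destroy()
```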
The construction of the AV-TEP validation platform
includes four milestones, shown in Fig. 4: (1) bench testing, (2)
vehicle implementation, (3) cloud communication, and (4)
system-level integration and validation. The first milestone is
nearing completion, and the second one is in progress. The
Autoware-CARLA co-simulation environment has been set up, as shown in Fig. 4a. The vehicle dynamics model and 3D CAD model have been developed, as shown in Fig. 4b. A test ground map has been created for the ASU Polytechnic Campus, as shown in Fig. 4c. The hardware and sensor integration has been completed, as shown in Fig. 4d.
Fig. 4. AV-TEP validation platform construction progress: (a) Autoware-CARLA co-simulation; (b) 3D CAD model; (c) test ground map; (d) AV-TEP vehicle
IV. CONCLUSIONS AND FUTURE WORK
The AV-TEP mission’s SCF has been described for
evaluating the safety of AVs. The framework is based on three
pillars: (1) SMS, (2) Design Methods, and (3) Scenario-Based Testing. For the SMS and Design Methods pillars, the best
practices and standards that form the basis of these pillars are
enumerated and described. For the Scenario-Based Testing
pillar, the DSA metrics and DSA Methodology that allow for
the determination of a “score” for a single scenario navigation,
along with the TSSM process that ensures ODD coverage,
including edge cases, are described. The current work includes
developing and accessing simulation-based scenarios that will
allow for validation of the various components of the Scenario-
Based Testing pillar, while also refining the flowchart of the
TSSM. A test vehicle platform that is currently in development for validation of the AV-TEP framework was described in detail. The
platform is being modeled in CARLA, and selected scenarios
will be run in real-world conditions to validate the simulation
results.
The objective of the AV-TEP mission is to develop a
process that can be used by either an AV developer or a third-party evaluator (e.g., a regulator) to build or evaluate a safety
case. The safety case will include evidence that demonstrates
sufficient competency for the specific AV deployment. The
intent is for the developed process to be readily implementable
and provide a path to more detailed VSSAs to be submitted to
NHTSA. An AV developer partner is collaborating on the mission, and the objective is to provide a safety case that can serve as a best-practice example for industry use. Subsequently, the
AV-TEP SCF and its components will be taken up by the V&V
Task Force under SAE’s ORAD Committee to become
standards documents once industry consensus has been
reached. The AV-TEP mission could help ensure public road safety as AVs are deployed in greater numbers by providing a systematic and consistent process for AV developers and regulators.
ACKNOWLEDGMENTS
This work was made possible by the generous contributions and
funding provided by the Science Foundation Arizona (SFAz):
sfaz.org/mobility/AV-TEP. The authors would like to thank the
SFAz for its ongoing support of the AV-TEP Mission, as well
as the Institute of Automated Mobility (IAM) for its support of
the Metrics Project research: https://www.azcommerce.com/
iam/projects/driving-safety-performance-metrics-for-ads-
equipped-vehicles/.
REFERENCES
[1] National Highway Traffic Safety Administration, "Voluntary Safety Self-Assessment." [Online]. Available: https://www.nhtsa.gov/automated-driving-systems/voluntary-safety-self-assessment.
[2] National Highway Traffic Safety Administration, "Advance Notice of Proposed Rulemaking (ANPRM): Framework for Automated Driving System Safety," 2020.
[3] American National Standards Institute/Underwriters Laboratories, "ANSI/UL 4600: Standard for Safety for the Evaluation of Autonomous Products," 2023.
[4] International Organization for Standardization, "ISO 26262 - Road Vehicles - Functional Safety," 2018.
[5] International Organization for Standardization, "ISO 21448 - Road Vehicles - Safety of the Intended Functionality," 2022.
[6] Automated Vehicle Safety Consortium, "AVSC Information Report for Adapting a Safety Management System (SMS) for Automated Driving System (ADS) SAE Level 4 and Level 5 Testing and Evaluation," SAE Industry Technologies Consortia, 2021.
[7] Automated Vehicle Safety Consortium, "AVSC Information Report for Change Risk Management," SAE Industry Technologies Consortia, 2023.
[8] G. Read, A. O'Brien, N. Stanton, and P. Salmon, "Learning lessons for automated vehicle design: Using systems thinking to analyse and compare automation-related accidents across transport domains," Safety Science, vol. 153, 2022.
[9] J. Wishart, S. Como, M. Elli, B. Russo, J. Weast, N. Altekar, and E. James, "Driving Safety Performance Assessment Metrics for ADS-Equipped Vehicles," SAE Technical Paper 2020-01-1206, 2020.
[10] SAE International, "J3237 - Operational Safety Assessment (OSA) Metrics for Verification and Validation (V&V) of Automated Driving Systems (ADS)," Recommended Practice, 2023.
[11] B. Weng, S. Rao, E. Deosthale, S. Schnelle, and F. Barickman, "Model Predictive Instantaneous Safety Metric for Evaluation of Automated Driving Systems," arXiv:2005.09999v1, 2020.
[12] S. Como and J. Wishart, "Evaluating Automated Vehicle Scenario Navigation Using the Operational Safety Assessment (OSA) Methodology," SAE Technical Paper 2023-01-0797, 2023.
[13] W. Najm, J. Smith, and M. Yanagisawa, "Pre-crash scenario typology for crash avoidance research," National Highway Traffic Safety Administration, 2007.
[14] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, "CARLA: An Open Urban Driving Simulator," in 1st Annual Conference on Robot Learning, 2017.
[15] Safety Pool, "Safety Pool - Powered by Deepen AI and University of Warwick." [Online]. Available: https://www.safetypool.ai/.
[16] S. Feng, H. Sun, X. Yan, H. Zhu, Z. Zou, S. Shen, and H. Liu, "Dense reinforcement learning for safety validation of autonomous vehicles," Nature, vol. 615, no. 7953, pp. 620-627, 2023.
[17] S. Kato, S. Tokunaga, Y. Maruyama, S. Maeda, M. Hirabayashi, Y. Kitsukawa, A. Monrroy, T. Ando, Y. Fujii, and T. Azumi, "Autoware on Board: Enabling Autonomous Vehicles with Embedded Systems," in ACM/IEEE 9th International Conference on Cyber-Physical Systems (ICCPS), 2018.
[18] SAE International, "J3016 - Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles," Recommended Practice, 2021.
[19] Automated Vehicle Safety Consortium, "AVSC Best Practice for Metrics and Methods for Assessing Safety Performance of Automated Driving Systems (ADS)," SAE Industry Technologies Consortia, 2021.