A Proposed Safety Case Framework for Automated
Vehicle Safety Evaluation
Jeffrey Wishart, PhD
Science Foundation Arizona/
Arizona Commerce Authority
Phoenix, USA
jeffw@azcommerce.com
Junfeng Zhao, PhD
The Polytechnic School
Arizona State University
Mesa, USA
junfeng.zhao@asu.edu
Braeden Woodard
The Polytechnic School
Arizona State University
Mesa, USA
bmwoodar@asu.edu
Gavin O’Malley
The Polytechnic School
Arizona State University
Mesa, USA
gpomalle@asu.edu
Hencong Guo
The Polytechnic School
Arizona State University
Mesa, USA
hguo63@asu.edu
Shujauddin Rahimi
The Polytechnic School
Arizona State University
Mesa, USA
srahimi8@asu.edu
Sunder Swaminathan
The Polytechnic School
Arizona State University
Mesa, USA
sswami21@asu.edu
Abstract—Automated driving system (ADS)-equipped vehicles
(AVs) are currently being developed and deployed on public
roads. While the benefits to public safety (among other potential
benefits) are promising, the risk that AVs pose to public safety is
not yet understood. The Automated Vehicle Test and
Evaluation Process (AV-TEP) mission initiated by Science
Foundation Arizona in collaboration with Arizona State
University is intended to provide a framework for an AV
developer or third-party evaluator (such as a regulator) to
use in order to provide evidence that the AV is safe for its
intended implementation. The AV-TEP framework uses the safety case concept that is widely accepted in the AV industry and other safety-critical industries, and consists of three pillars: (1) Safety Management System (SMS), (2) Design Methods, and (3) Scenario-Based Testing. The three pillars are described in detail, and the validation methodology used is outlined. Finally, the various paths to industry acceptance of the AV-TEP framework, which builds upon existing work and research wherever appropriate, are described.
Keywords—Automated Driving System (ADS), Safety
Case Framework (SCF), Driving Safety Assessment (DSA),
scenario-based testing, metrics, safety management system,
digital twin, Verification and Validation (V&V)
I. INTRODUCTION
Automated driving system (ADS)-equipped vehicles (AVs)
are currently being developed by new and existing automotive
industry players. Some of these AVs are being deployed on
public roads already, and this is cause for concern. AVs have
great potential for reduction in collisions (both frequency and severity), among other benefits (namely, enhanced traffic throughput, increased driving efficiency, and enhanced mobility for mobility-challenged populations). However, this purported safety benefit should be proven and not assumed. There exists an
urgent need for a clear and consistent process for AV safety
evaluation. The Automated Vehicle Test and Evaluation Process
(AV-TEP) mission was initiated by Science Foundation Arizona
(SFAz) and Arizona State University (ASU) to provide a process
that can be used by AV developers and third-party evaluators
(including regulators). The AV-TEP mission builds on existing
work and leverages existing best practices, standards, and
regulations wherever possible. The objective of the AV-TEP
mission is to develop a process that is widely accepted in the
industry and that can be developed into standards and
regulations. As such, feedback has been solicited from an
Advisory Board as well as outside organizations to ensure that
there is sufficient stakeholder engagement and agreement on the
approach as the mission progresses.
A. Safety Case Definition
The AV-TEP framework uses a safety case-based approach.
This approach has been used in other safety-critical areas such
as the nuclear and aviation industries. There is a growing
consensus to adopt it within the AV industry. In the context of
AVs, a safety case is a reasoned argument, supported by
evidence, intended to justify that an AV is acceptably safe for
deployment either in a specific operating environment or for a
particular application. A safety case is expected to offer a
structured and systematic approach to identify and mitigate
potential hazards and risks associated with and presented
throughout the lifecycle of an AV. It should also include
essential elements, such as design and operational principles, as
well as demonstrate compliance with relevant regulations,
standards, and industry best practices.
The National Highway Traffic Safety Administration (NHTSA) adopted a safety case framework (SCF) through its Voluntary Safety Self-Assessment (VSSA) [1], which identified 12 "pillars," i.e., topics that NHTSA recommended an AV developer address in its VSSA, as well as through the Advance Notice of Proposed Rulemaking (ANPRM) [2], which proposed an SCF based on the UL 4600 [3], ISO 26262 [4], and ISO 21448 [5] standards. The AV-TEP mission employs and expands on this approach.
II. SAFETY CASE CONSTRUCTION
The subsequent sections detail the proposed AV-TEP
framework for constructing an AV safety case, which is
organized into three fundamental pillars, namely Safety
Management System (SMS), Design Methods, and Scenario-
Based Testing, as depicted in Fig. 1. The AV-TEP SCF leverages guidance from government organizations, best practices from safety-critical industries, and voluntary industry standards; it also incorporates insights from academic research and integrates lessons learned from the organization's own work. This safety case has adopted UL 4600
as a fundamental standard, considering its wide acceptance and
high regard. However, the standard leaves crucial elements open to interpretation by developers and implementers. The AV-TEP mission objective is to fill the gaps left by these standards (most significantly, a full methodology for scenario-based testing) and to meet the needs of developers by creating an actionable process that AV developers can implement to facilitate the construction of a safety case for their AVs. The
proposed AV-TEP safety case focuses on AV driving safety
performance with attention to safety engineering. This safety
case is not comprehensive and does not cover all aspects of AV
operational safety such as cybersecurity, vehicle maintenance,
passenger-initiated emergency stops, etc.
A. Safety Management System (SMS) Pillar
An SMS is an approach designed to systematically and
comprehensively support organizational safety by utilizing a
combination of safety principles, processes, and practices to
enhance organizational decisions based on safety risk. This
process involves identifying potential safety hazards,
evaluating and managing safety risks, and implementing
controls and mitigations to address those safety risks. Various
safety-sensitive industries, including but not limited to nuclear
energy, oil and gas, healthcare, chemical, defense, space, and
aviation, have implemented numerous variations of an SMS to
effectively manage safety risks. While many government
agencies have mandated the implementation of an SMS within
the aforementioned industries, there are currently no
established regulations or standards that define a minimum
acceptable level of risk for AVs. Consequently, regulators have
delegated the responsibility of identifying and managing safety
risks to the AV developers. The proposed SMS pillar of the AV-TEP SCF aims to aid developers in fulfilling this responsibility by constructing an efficient SMS framework that can be readily implemented.

Fig. 1. The AV-TEP safety case structure and "pillars"
The proposed AV-TEP SMS pillar is founded on four fundamental components that have been widely adopted throughout these various industries:
1) Safety Policy and Objectives (SPO): The objective of SPO
is to establish or enhance safety practices by implementing
a clear safety policy, identifying or creating safety
roles/teams with specific responsibilities (e.g., a safety
ambassador with direct access to the senior management
team and an investigation team responsible for safety
concerns concerning stakeholders), and establishing
organizational safety objectives [6]. This component also
includes developing an employee reporting and resolution
system, integrating existing processes and procedures into
a unified approach, and promoting cross-organizational
communication and cooperation towards the shared goal of
safety.
2) Safety Risk Management (SRM): The objective of SRM is
to manage risks through a process consisting of describing the system, identifying hazards, and assessing, analyzing, and controlling risks based on safety risk assessments [6].
3) Safety Assurance (SA): The objective of SA is to monitor,
analyze, and measure overall safety performance, including
the efficacy of safety risk controls, management, and
related processes [6]. This component also assesses the
sustained effectiveness of risk control strategies, facilitates
the identification of new hazards, and ensures adherence to
standards, policies, and best practices through audits and
evaluations.
4) Safety Promotion (SP): The objective of SP is to regularly
conduct initiatives that inform, educate, and promote safety
awareness among employees [6]. These include training,
communication, and other activities aimed at fostering a
positive safety culture throughout all levels and roles of the
organization.
The AV-TEP SMS pillar also includes a fifth component: a
post-deployment Change Risk Management (CRM) process
that is similar to SRM. The objective of CRM is to identify,
evaluate, and manage risks that arise from changes in the design
or operations of the AV, whether planned or unplanned. These
changes may result from the addition of ADS features,
functions, or new scenarios in the operating environment. CRM
differs from SRM as it focuses on managing changes. This is
achieved through an iterative process of analyzing CRM
triggers, identifying hazards, assessing new risks, and
evaluating safety measures related to a certain change [7].
B. Design Methods Pillar
The primary objective of the AV-TEP Design Methods
pillar is to assist developers in ensuring the safety, reliability,
and performance of their AV system, as well as compliance
with regulatory requirements and industry standards. Design
methods refer to the systematic approach to developing,
designing, and verifying AV systems, which includes
considerations for the human-machine interface (HMI),
cybersecurity, and functional safety aspects of the system. A
comprehensive design methods process ensures that the system
is designed not only to operate safely under normal
circumstances but also to take into account foreseeable
malfunctions. The AV-TEP Design Methods pillar focuses on
system design with a concentration on driving safety aspects,
including simulation and testing, risk assessment and
management, and verification and validation. The process involves evaluating the entire system: simulation and testing in various environments and scenarios while considering design assumptions; verification and validation to ensure compliance with industry standards and regulations; analysis of the system design to identify potential hazards and take the necessary precautions to prevent them; and continuous monitoring and improvement to enhance the system's safety and performance over time, while incorporating functional safety standards into the design process.
Regarding risk assessment and management, a recent study
[8] analyzed automation-related incidents and accidents in
various transport modes to identify learning opportunities for
designing AVs and their operating systems. The study
identified two key leverage points: improved integration of
human factors in automation design across all modes, and re-
evaluation of regulatory approaches to address emerging technologies and their associated risks [8]. For developers
constructing a safety case, a significant takeaway is that design
assumptions must consider the possibility of imperfect
performance or noncompliance by operators and road users.
Developers should take these findings into consideration during
the SRM component of the safety case. By incorporating human
factors, developers can optimize interactions between the AV
system and humans while identifying and addressing potential
hazards and risks associated with potential human factor design
flaws. This design assumption can also be integrated into the
simulation and testing segment of the safety case and included
as a safety argument.
In terms of verification and validation, emphasis should be
placed on the importance of data recording and analysis.
Additionally, transparency and collaboration among industry
stakeholders are encouraged. During post-deployment, ongoing
evaluation and monitoring of the AV's performance and safety
is paramount. This should include adopting new evaluation and
assessment methods proposed in the form of standards,
regulations, and best practices to ensure that AVs continue to
operate safely and efficiently in various environments. The AV-
TEP framework uses ISO 21448 (SOTIF) [5] for guidance on design, verification, and validation measures, in addition to other accepted standards such as ISO 26262 [4]. SOTIF examines whether safety functionality can be ensured in unknown conditions without failure; this requires assessing the performance limitations of vehicle components and unexpected changes in the road environment through simulation. The AV-
TEP SCF also requires developers to conduct simulation testing
to evaluate and assess vehicle responses to real-world
scenarios. A detailed approach to simulation and testing is
presented in the following Scenario-Based Testing Pillar.
C. Scenario-Based Testing Pillar
Scenario-based testing is a crucial aspect of ensuring the safety and reliability of automated vehicles, because it is imperative to understand how an AV would respond in the various real-world scenarios it might encounter. A wide range of scenarios is therefore created, encompassing different driving situations and environmental conditions, including complex urban settings, highway driving, pedestrian interactions, and adverse weather conditions. The components that are required
for the scenario-based testing pillar are:
1. Driving Safety Assessment (DSA) metrics
2. Evaluation Methodology
3. Test Methodology
4. Evaluation Criteria
All of these components are already being developed, either concurrently by other organizations or as part of the AV-TEP mission, except the Evaluation Criteria, which are to be determined based on thresholds and a minimum level of safety recommended by a regulator such as NHTSA. The first three components are discussed below.
1) DSA Metrics
The DSA metrics have been developed starting with work
done by the Institute of Automated Mobility (IAM) [9] and
have now been carried forward by the Verification and
Validation (V&V) Task Force under SAE International’s On-
Road Automated Driving (ORAD) Committee. The V&V Task
Force has published a Recommended Practice, SAE J3237 [10].
The set of DSA metrics is shown in Fig. 2. These metrics are categorized into black-box, grey-box, and white-box metrics depending on whether they require ADS data and, if ADS data are required, the level of access that is needed. The Safety
Envelope Metrics are intended to be neutral to the Safety
Envelope Formulation selected to calculate the spatio-temporal
boundaries. Examples of Safety Envelope Formulations include
the Minimum Distance Safety Envelope (MDSE) based on
previous work by Intel/Mobileye and the IAM [9] and Model
Predictive Instantaneous Safety Metric (MPrISM) based on
work by the Transportation Research Center (TRC) and
National Highway Traffic Safety Administration (NHTSA)
[11].
Fig. 2. The DSA metrics in the SAE J3237 Recommended Practice [10]
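To make the safety envelope concept concrete, the sketch below checks a longitudinal minimum safe following distance in the spirit of an MDSE/RSS-style formulation. This is a minimal sketch only: the parameter values, function names, and the specific worst-case stopping-distance argument are illustrative assumptions, not the SAE J3237 or MDSE definitions.

```python
# Minimal sketch of a longitudinal safety-envelope check in the spirit of an
# MDSE/RSS-style formulation. Parameter values and names are illustrative
# assumptions, not the SAE J3237 or MDSE definitions.

def min_safe_longitudinal_gap(v_rear: float, v_front: float,
                              rho: float = 0.5,      # assumed response time [s]
                              a_accel: float = 2.0,  # assumed max accel during rho [m/s^2]
                              b_rear: float = 4.0,   # assumed rear-vehicle braking [m/s^2]
                              b_front: float = 8.0   # assumed front-vehicle braking [m/s^2]
                              ) -> float:
    """Worst-case stopping-distance argument: the rear vehicle accelerates for
    rho seconds and then brakes gently, while the front vehicle brakes hard."""
    v_rho = v_rear + rho * a_accel
    gap = (v_rear * rho + 0.5 * a_accel * rho**2
           + v_rho**2 / (2.0 * b_rear)
           - v_front**2 / (2.0 * b_front))
    return max(gap, 0.0)

def safety_envelope_violated(gap: float, v_rear: float, v_front: float) -> bool:
    """Returns True ('bad') when the measured gap is inside the safety envelope."""
    return gap < min_safe_longitudinal_gap(v_rear, v_front)

# Example: a 30 m measured gap behind a slower lead vehicle at highway speeds.
print(safety_envelope_violated(gap=30.0, v_rear=30.0, v_front=25.0))  # True
```

Because the metric is intended to be formulation-neutral, any Safety Envelope Formulation (MDSE, MPrISM, or another) could be substituted behind the same violation check.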
2) Evaluation Methodology
With the metrics established, what must be measured for a single scenario navigation is known. However, as shown in Fig. 2, there can be a large number of metric measurements for that scenario navigation. The objective of the DSA Methodology [12] is to establish a method for interpreting and combining all of these measurements into a single "score" for the scenario navigation. The DSA Methodology is governed by an equation of the form [12]:
 󰇟  
󰇛 󰇛 󰇜
󰇜󰇠*100%
The DSA metrics are assigned scores of 0 or 1 (0 is "good," while 1 is "bad," signifying a violation of a threshold). The severity of each violation (if it occurs) is quantified, and the sum of all products is determined. The raw score is then weighted by the complexity of the scenario, the relevance of the scenario to the ODD, and the fidelity of the test method. These three weighting factors are significant research questions in and of themselves. The DSA score out of 100 is then assigned for the scenario navigation.
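A minimal sketch of this scoring computation follows, assuming the weighted form of the equation above; the exact formulation, thresholds, and weighting factors are those of the DSA Methodology [12], and the structure shown here is an illustrative assumption.

```python
# Minimal sketch of the DSA scoring computation described above. The weighted
# form of the score equation is an illustrative assumption; the authoritative
# formulation is given in the DSA Methodology [12].

def dsa_score(violations: list[tuple[int, float]],
              w_complexity: float, w_relevance: float, w_fidelity: float) -> float:
    """violations: (m_i, s_i) pairs, where m_i is 0 ('good') or 1 ('bad', a
    threshold violation) and s_i is the quantified severity of the violation."""
    raw = 1.0 - sum(m * s for m, s in violations)  # sum of all products
    return w_complexity * w_relevance * w_fidelity * max(raw, 0.0) * 100.0

# Example: one scenario navigation with two threshold violations of severity
# 0.10 and 0.25, under illustrative weighting factors.
print(dsa_score([(0, 0.0), (1, 0.10), (1, 0.25)], 0.9, 0.95, 1.0))  # ~55.6
```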
3) Test Methodology
The AV-TEP team has integrated the set of DSA metrics and
the DSA Methodology into a test selection and scoring
methodology (TSSM) that generates a testing regime of AV
driving scenarios tailored to each AV ODD and specifies a test
method for each scenario. When the scenarios are executed, an
analysis of the results is conducted that is structured for use in
evaluating AV driving safety and supporting the AV safety
case. The TSSM provides a systematic process for assuring ODD coverage, including edge cases.
The TSSM generates scenarios by first discretizing the
ODD of the vehicle under test (VUT) using an ODD "chart": a graph with a number of axes representing the entities,
phenomena, or conditions that may be encountered in any ODD
(e.g. trucks, cars, motorcycles, rain, snow, road surface
condition). The tick marks on the axes provide a quantization
of the entity or phenomenon represented by that axis (e.g.
number of safety-relevant trucks, cars, or motorcycles, inches
of rain or snow, numerical ranking of road surface condition),
with the origin of the axis representing zero for most axes and
the maximum of the axis representing the maximum of that
entity or phenomenon expected in any ODD. A point placed on
an axis represents the boundary of the ODD for the VUT, as
specified by the developer (e.g. the VUT may only be able to
handle, at maximum, three trucks or cars at once, or five inches
of rain and/or snow, or a road surface condition of 3/10). The
collection of points on the ODD chart axes creates a defining
volume for the given ODD that can be used to quantify the
number of scenarios needed for testing.
To generate scenarios for testing, the TSSM iterates through
all the axes of the ODD chart and randomly selects a point on
each axis within the boundaries of the ODD “shape”, thereby
generating a scenario. The generated scenario is then passed
through three “filters”. In all cases, if the scenario does not pass
through the filter, the scenario is regenerated until it does. The
first filter checks that the generated scenario is within the VUT
ODD. The second filter checks that the relevance of each
component of the scenario to the VUT ODD (e.g. number of
cars, number of trucks, etc.) is above a threshold. The third and
final filter checks that the relevance of the entire scenario, taken
as a whole, is above a threshold. These filters, combined with
the randomness of the scenario generation procedure, ensure
that AV developers cannot “develop to the test”.
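A minimal sketch of this generate-and-filter procedure, assuming a simple dictionary representation of the ODD chart and placeholder relevance models (the axes, boundary points, and thresholds shown are illustrative assumptions, not AV-TEP values):

```python
import random

# Minimal sketch of the TSSM scenario generation and filtering loop described
# above. The axes, ODD boundary points, and relevance models are illustrative
# assumptions; a real ODD chart covers many more entities and phenomena.

ODD_CHART = {
    # axis name: (quantized axis maximum over any ODD, VUT ODD boundary point)
    "trucks": (10, 3),
    "cars": (20, 3),
    "rain_inches": (12, 5),
    "road_surface_rank": (10, 3),
}

def component_relevance(axis: str, value: int) -> float:
    """Placeholder per-component relevance model (assumption)."""
    _, bound = ODD_CHART[axis]
    return value / bound if bound else 0.0

def scenario_relevance(scenario: dict) -> float:
    """Placeholder whole-scenario relevance model (assumption)."""
    return sum(component_relevance(a, v) for a, v in scenario.items()) / len(scenario)

def generate_scenario(component_threshold: float = 0.25,
                      scenario_threshold: float = 0.5) -> dict:
    while True:  # regenerate until the scenario passes all three filters
        # Randomly select a point on each axis of the ODD chart.
        scenario = {axis: random.randint(0, axis_max)
                    for axis, (axis_max, _) in ODD_CHART.items()}
        # Filter 1: the scenario must lie within the VUT ODD boundary.
        if any(scenario[a] > bound for a, (_, bound) in ODD_CHART.items()):
            continue
        # Filter 2: each component must be sufficiently relevant to the VUT ODD.
        if any(component_relevance(a, v) < component_threshold
               for a, v in scenario.items()):
            continue
        # Filter 3: the scenario as a whole must be sufficiently relevant.
        if scenario_relevance(scenario) >= scenario_threshold:
            return scenario

print(generate_scenario())  # one randomly generated, ODD-filtered scenario
```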
The scenarios created by this random generation process are
categorized into the random test suite (RTS), which contains all
scenarios that are generated randomly. An additional test suite,
the standard test suite (STS), is generated by applying the ODD filter to NHTSA's list of 37 pre-crash scenarios [13] and iteratively generating a set of scenarios using the filtered list, with all components save for actor speeds held constant. This generates a set of scenarios that is standardized for identical
ODDs. Lastly, the known test suite (KTS) contains any
scenarios that are known to the AV developer or test engineer
to cause issues for the VUT.
Once the RTS, STS, and KTS have been filled, the scenarios
they contain are assigned to a test method. For safety reasons,
all test methods save for simulation are initially “locked” until
satisfactory results for a given test method “unlock” the next
safest test method (e.g. good results in simulation “unlock”
closed-course testing). Once all test methods have been unlocked, the overall information required to test the VUT
ODD is calculated as a function of ODD “size”, and scenarios
are assigned to simulation or closed course testing until the total
required scenario information (a function of scenario relevance,
complexity, and test method fidelity) is achieved. This
assignment of scenarios satisfies a mix specified by the test
engineer that conforms to test method minimums specified by
regulators. Lastly, if public road testing is unlocked, a required
public road test mileage is calculated, again as a function of
ODD “size”. After that mileage has been driven by the AV, any
specified behavioral competencies are checked for in the
results. This allows the test engineer to gain useful information
from public road testing even without the ability to specify
scenarios beforehand, as can be done with simulation and
closed-course testing.
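The gating of test methods can be sketched as follows; the ordered method list and the pass threshold are illustrative assumptions.

```python
# Minimal sketch of the test-method "unlock" gating described above. The
# method ordering and pass threshold are illustrative assumptions.

TEST_METHODS = ["simulation", "closed_course", "public_road"]  # safest first
PASS_THRESHOLD = 90.0  # assumed minimum aggregate DSA score to unlock the next method

def unlocked_methods(aggregate_scores: dict[str, float]) -> list[str]:
    """Simulation is always unlocked; each subsequent method unlocks only after
    the previous one has produced satisfactory results."""
    unlocked = [TEST_METHODS[0]]
    for prev, nxt in zip(TEST_METHODS, TEST_METHODS[1:]):
        if aggregate_scores.get(prev, 0.0) >= PASS_THRESHOLD:
            unlocked.append(nxt)
        else:
            break
    return unlocked

print(unlocked_methods({"simulation": 94.0}))                          # + closed_course
print(unlocked_methods({"simulation": 94.0, "closed_course": 91.0}))   # all three
```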
The TSSM is iterative; results from one execution are used
to refine the scenarios generated in the next execution.
As a final step after each execution of the TSSM, an overall score is generated that aggregates the individual DSA scores for each scenario. Problem scenarios are
then added to the KTS, and a new TSSM iteration is started.
The iteration process stops when either a passing score is
achieved, a safety-critical error is discovered, or the test
engineer opts to stop the process to change the VUT’s software
or hardware and retest.
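The outer TSSM loop can be sketched as follows, with the suite execution and scoring stubbed out (the stubs, thresholds, and iteration cap are placeholder assumptions standing in for the DSA Methodology machinery):

```python
import random
from dataclasses import dataclass

# Minimal sketch of the iterative TSSM loop described above. Suite execution
# and scoring are stubbed with placeholders (assumptions); in practice these
# come from the DSA Methodology and the assigned test methods.

@dataclass
class Result:
    scenario: dict
    score: float            # per-scenario DSA score (0-100)
    safety_critical: bool   # whether a safety-critical error occurred

def run_suite(scenarios: list[dict]) -> list[Result]:
    """Stub: execute and score each scenario (placeholder assumption)."""
    return [Result(s, random.uniform(60.0, 100.0), False) for s in scenarios]

def run_tssm(rts, sts, kts, passing_score=90.0, max_iterations=10):
    overall = 0.0
    for _ in range(max_iterations):
        results = run_suite(rts + sts + kts)
        overall = sum(r.score for r in results) / len(results)  # aggregate DSA score
        if any(r.safety_critical for r in results):
            return "safety-critical error discovered", overall
        if overall >= passing_score:
            return "passed", overall
        # Feed problem scenarios back into the KTS for the next iteration.
        kts = kts + [r.scenario for r in results
                     if r.score < passing_score and r.scenario not in kts]
    return "stopped: change VUT software/hardware and retest", overall

print(run_tssm(rts=[{"id": i} for i in range(5)], sts=[{"id": "sts-1"}], kts=[]))
```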
4) Resources and Current Work
The AV-TEP mission aims to provide the framework and
the research on threshold determination and methods. For the
scenario-based testing pillar, the available resources are the open-source simulation software Car Learning to Act (CARLA) [14]; real-world data collection at the ASU Polytechnic Campus as well as at intersections in the SmartDrive testbed operated by the Maricopa County Department of Transportation in Anthem, AZ; and the validation platform currently in development (more on this platform in a later section). CARLA is an open-source simulator for autonomous
driving research that offers a realistic environment with the
flexibility to import custom vehicles and maps. Within
CARLA, an open-source plugin called ScenarioRunner is used
to execute specific driving scenarios that relate to the ODD.
Upon execution, ScenarioRunner generates logs that contain
direct simulation data, which is then processed using Python to
calculate DSA metrics and generate output graphs for further
analysis. Utilizing a custom Python script, many variations of a
given scenario can be generated and executed to produce a
robust dataset.
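As an illustration of this post-processing step, the sketch below connects to a running CARLA server, samples vehicle kinematics each tick, and computes a simple time-gap measurement of the kind that feeds the DSA metric calculations. The host/port, actor pairing, and metric shown are illustrative assumptions, not the project's actual scripts.

```python
import math
import carla  # CARLA Python API; assumes a server running on localhost:2000

# Illustrative sketch (not the project's actual scripts): sample vehicle
# kinematics from a running simulation and compute a simple time-gap
# measurement of the kind that feeds the DSA metric calculations.

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

samples = []
for _ in range(200):  # log 200 simulation ticks
    world.wait_for_tick()
    vehicles = list(world.get_actors().filter("vehicle.*"))
    if len(vehicles) < 2:
        continue
    ego, other = vehicles[0], vehicles[1]  # assumed ego/lead pairing
    gap = ego.get_transform().location.distance(other.get_transform().location)
    v = ego.get_velocity()
    speed = math.sqrt(v.x**2 + v.y**2 + v.z**2)
    # Time gap: distance to the other actor divided by the ego speed.
    samples.append(gap / speed if speed > 0.1 else float("inf"))

if samples:
    print(f"Minimum time gap over the run: {min(samples):.2f} s")
```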
The initial stage of this project involved creating and testing
scenarios based on the 37 pre-crash scenarios prescribed by
NHTSA. A small database of these scenarios was established
and simulated in CARLA to collect and analyze the DSA
metrics. The introduction of the AV-TEP mission along with
the involvement of the validation platform further expanded the
scope of this project by providing a method to verify and
validate the safety of AVs. For this mission, a digital twin of
the research connected and automated vehicle (CAV) was developed and imported into CARLA
along with a map of the Innovation Way route at ASU’s
Polytechnic Campus. Access to a larger database of about
250,000 scenarios hosted by Safety Pool [15] was obtained for the AV-TEP mission. These scenarios were categorized based on ODD and added to test suites. Simulation of these scenarios with a digital twin model
of the research CAV and the Innovation Way map was
performed in CARLA. Through the simulation of these
scenarios, the DSA metrics are verified and validated.
III. SCF VALIDATION
The SCF validation will be conducted on the AV-TEP
validation platform. The primary function of the AV-TEP
platform is to augment the natural driving environment with virtual traffic to generate dense safety-critical scenarios in which the validation can be conducted efficiently.

Fig. 3. AV-TEP validation platform
The validation of the SCF requires enormous amounts of safety-critical data from the natural driving environment; however, the proportion of safety-critical events in naturalistic driving data is extremely low. Recently, researchers from academia and industry have leveraged Digital Twin (DT) technology to tackle this problem, and significant progress has proven the feasibility and effectiveness of this method [16].
The AV-TEP platform is developed based on DT technology and Augmented Reality (AR) to accomplish the
validation. As mentioned previously, to generate testing
scenarios, this platform employs an open-source simulation
platform, CARLA. The scenarios can be used for co-simulation
with the automated driving software and for on-road vehicle
validation. The vehicle platform used for this mission is
instrumented as a level-4 AV with an open-source automated
driving system, Autoware [16]. The onboard sensor suite
includes 64-channel LIDAR units, RGB cameras, long-range
RADAR units, GNSS, IMU, and a Drive-by-Wire module.
The AV-TEP validation platform is depicted in Fig. 3. The
platform augments the real world with the designed safety-
critical scenarios. In the physical world, the AV-TEP vehicle
transfers its pose data and position data to the cloud server.
Combined with the digital map in CARLA, a virtual
representation of the vehicle is projected into the digital world,
where the designed safety-critical scenarios are created. Then,
the virtual AV-TEP vehicle detects its surroundings and returns the data to the real world. The virtual vehicle's sensor
configuration aligns with the physical one. Further, the
decision-making process is completed onboard based on the
data fusion result of real sensor and simulation data. The motion
decisions are transmitted to the motion planning subsystem and
executed through Autoware.
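A minimal sketch of the pose-mirroring step in this loop, assuming the physical vehicle's pose arrives over some transport (stubbed here) and is applied to a replica actor in CARLA; the transport, blueprint choice, and update rate are illustrative assumptions:

```python
import time
import carla  # assumes a CARLA server hosting the digital map on localhost:2000

# Illustrative sketch (assumptions throughout): mirror the physical AV-TEP
# vehicle's pose onto a replica actor in the CARLA digital world, where the
# designed safety-critical virtual traffic is created.

def receive_pose_from_vehicle():
    """Stub for the cloud-server transport delivering (x, y, z, yaw) from the
    physical vehicle's localization stack (placeholder assumption)."""
    return 10.0, 5.0, 0.2, 90.0

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Spawn a replica of the physical vehicle (blueprint choice is an assumption;
# the mission uses a custom digital twin model of the research CAV).
blueprint = world.get_blueprint_library().filter("vehicle.*")[0]
replica = world.spawn_actor(blueprint, world.get_map().get_spawn_points()[0])

try:
    for _ in range(100):                 # run the mirroring loop for 100 updates
        x, y, z, yaw = receive_pose_from_vehicle()
        replica.set_transform(carla.Transform(
            carla.Location(x=x, y=y, z=z),
            carla.Rotation(yaw=yaw)))    # project the physical pose into CARLA
        time.sleep(0.05)                 # assumed 20 Hz update rate
finally:
    replica.destroy()
```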
The construction of the AV-TEP validation platform
includes four milestones, shown in Fig. 4: (1) bench testing, (2)
vehicle implementation, (3) cloud communication, and (4)
system-level integration and validation. The first milestone is
nearing completion, and the second one is in progress. The
Autoware-CARLA co-simulation environment has been set up, as shown in Fig. 4a. The vehicle dynamics model and 3D CAD model have been developed, as shown in Fig. 4b. A test ground map has been created for the ASU Polytechnic Campus, as shown in Fig. 4c. The hardware and sensor integration has been completed, as shown in Fig. 4d.
Fig. 4. AV-TEP validation platform construction progress: (a) Autoware-CARLA co-simulation; (b) 3D CAD model; (c) test ground map; (d) AV-TEP vehicle
IV. CONCLUSIONS AND FUTURE WORK
The AV-TEP mission’s SCF has been described for
evaluating the safety of AVs. The framework is based on three
pillars: (1) SMS, (2) Design Methods, and (3) Scenario-Based Testing. For the SMS and Design Methods pillars, the best
practices and standards that form the basis of these pillars are
enumerated and described. For the Scenario-Based Testing
pillar, the DSA metrics and DSA Methodology that allow for
the determination of a “score” for a single scenario navigation,
along with the TSSM process that ensures ODD coverage,
including edge cases, are described. The current work includes
developing and accessing simulation-based scenarios that will
allow for validation of the various components of the Scenario-
Based Testing pillar, while also refining the flowchart of the
TSSM. A test vehicle platform that is currently in development for validation of the AV-TEP framework was described in detail. The
platform is being modeled in CARLA, and selected scenarios
will be run in real-world conditions to validate the simulation
results.
The objective of the AV-TEP mission is to develop a
process that can be used by either an AV developer or a third-party evaluator (e.g., a regulator) to build or evaluate a safety
case. The safety case will include evidence that demonstrates
sufficient competency for the specific AV deployment. The
intent is for the developed process to be readily implementable
and provide a path to more detailed VSSAs to be submitted to
NHTSA. An AV developer partner is collaborating on the mission, and the objective is to provide a safety case that can serve as a best-practice example for industry use. Subsequently, the
AV-TEP SCF and its components will be taken up by the V&V
Task Force under SAE’s ORAD Committee to become
standards documents once industry consensus has been
reached. The AV-TEP mission could help ensure public road safety as AVs are deployed in greater numbers by providing a systematic and consistent process for AV developers and regulators.
ACKNOWLEDGMENTS
This work was made possible by the generous contributions and
funding provided by the Science Foundation Arizona (SFAz):
sfaz.org/mobility/AV-TEP. The authors would like to thank the
SFAz for its ongoing support of the AV-TEP Mission, as well
as the Institute of Automated Mobility (IAM) for its support of
the Metrics Project research: https://www.azcommerce.com/
iam/projects/driving-safety-performance-metrics-for-ads-
equipped-vehicles/.
REFERENCES
[1] National Highway Traffic Safety Administration, "Voluntary Safety Self-Assessment." [Online]. Available: https://www.nhtsa.gov/automated-driving-systems/voluntary-safety-self-assessment.
[2] National Highway Traffic Safety Administration, "Advance Notice of Proposed Rulemaking (ANPRM): Framework for Automated Driving System Safety," 2020.
[3] American National Standards Institute/Underwriters Laboratories, "ANSI/UL 4600: Standard for Safety for the Evaluation of Autonomous Products," 2023.
[4] International Organization for Standardization, "ISO 26262 - Road Vehicles - Functional Safety," 2018.
[5] International Organization for Standardization, "ISO 21448 - Road Vehicles - Safety of the Intended Functionality," 2022.
[6] Automated Vehicle Safety Consortium, "AVSC Information Report for Adapting a Safety Management System (SMS) for Automated Driving System (ADS) SAE Level 4 and Level 5 Testing and Evaluation," SAE Industry Technologies Consortia, 2021.
[7] Automated Vehicle Safety Consortium, "AVSC Information Report for Change Risk Management," SAE Industry Technologies Consortia, 2023.
[8] G. Read, A. O'Brien, N. Stanton, and P. Salmon, "Learning lessons for automated vehicle design: Using systems thinking to analyse and compare automation-related accidents across transport domains," Safety Science, vol. 153, 2022.
[9] J. Wishart, S. Como, M. Elli, B. Russo, J. Weast, N. Altekar, and E. James, "Driving Safety Performance Assessment Metrics for ADS-Equipped Vehicles," SAE Technical Paper 2020-01-1206, 2020.
[10] SAE International, "J3237 - Operational Safety Assessment (OSA) Metrics for Verification and Validation (V&V) of Automated Driving Systems (ADS)," Recommended Practice, 2023.
[11] B. Weng, S. Rao, E. Deosthale, S. Schnelle, and F. Barickman, "Model Predictive Instantaneous Safety Metric for Evaluation of Automated Driving Systems," arXiv:2005.09999v1, 2020.
[12] S. Como and J. Wishart, "Evaluating Automated Vehicle Scenario Navigation Using the Operational Safety Assessment (OSA) Methodology," SAE Technical Paper 2023-01-0797, 2023.
[13] W. Najm, J. Smith, and M. Yanagisawa, "Pre-crash scenario typology for crash avoidance research," National Highway Traffic Safety Administration, 2007.
[14] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, "CARLA: An Open Urban Driving Simulator," in 1st Annual Conference on Robot Learning, 2017.
[15] Safety Pool, "Safety Pool - Powered by Deepen AI and University of Warwick." [Online]. Available: https://www.safetypool.ai/.
[16] S. Feng, H. Sun, X. Yan, H. Zhu, Z. Zou, S. Shen, and H. Liu, "Dense reinforcement learning for safety validation of autonomous vehicles," Nature, vol. 615, no. 7953, pp. 620-627, 2023.
[17] S. Kato, S. Tokunaga, Y. Maruyama, S. Maeda, M. Hirabayashi, Y. Kitsukawa, A. Monrroy, T. Ando, Y. Fujii, and T. Azumi, "Autoware on Board: Enabling Autonomous Vehicles with Embedded Systems," in ACM/IEEE 9th International Conference on Cyber-Physical Systems (ICCPS), 2018.
[18] SAE International, "J3016 - Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles," Recommended Practice, 2021.
[19] Automated Vehicle Safety Consortium, "AVSC Best Practice for Metrics and Methods for Assessing Safety Performance of Automated Driving Systems (ADS)," SAE Industry Technologies Consortia, 2021.