Game-theoretic Modeling of Traffic in Unsignalized
Intersection Network for Autonomous Vehicle
Control Verification and Validation
Ran Tian, Nan Li, Ilya Kolmanovsky, Yildiray Yildiz, and Anouck Girard
Abstract—For the foreseeable future, autonomous vehicles (AVs)
will operate in traffic together with human-driven vehicles. The
AV planning and control systems need extensive testing, including
early-stage testing in simulations where the interactions among
autonomous/human-driven vehicles are represented. Motivated
by the need for such simulation tools, we propose a game-
theoretic approach to modeling vehicle interactions, in particular,
for urban traffic environments with unsignalized intersections.
We develop traffic models with heterogeneous (in terms of their
driving styles) and interactive vehicles based on our proposed
approach, and use them for virtual testing, evaluation, and
calibration of AV control systems. For illustration, we consider
two AV control approaches, analyze their characteristics and
performance based on the simulation results with our developed
traffic models, and optimize the parameters of one of them.
I. INTRODUCTION
Autonomous driving technologies have greatly advanced
in recent years with the promise of providing safer, more
efficient, environment-friendly, and easily accessible trans-
portation [1]–[3]. Fulfilling such a commitment requires developing advanced planning and control algorithms to navigate
autonomous vehicles, as well as comprehensive testing pro-
cedures to verify their safety and performance characteristics
[4]–[6]. It is estimated, based on the collision fatality rate, that
to confidently verify an autonomous vehicle control system,
hundreds of millions of miles need to be driven [4], which
can be highly time and resource consuming if these driving
tests are all conducted in the physical world. Therefore, an
alternative solution is to use simulation tools to conduct early-
stage testing and evaluation in a virtual world. The work
of this paper is motivated by the need for virtual testing of
autonomous vehicle control systems.
In the near to medium term, autonomous vehicles are
expected to operate in traffic together with human-driven
vehicles. Therefore, accounting for the interactions among
autonomous/human-driven vehicles is important to achieve
safe and efficient driving behavior of an autonomous vehicle.
Control strategies for autonomous vehicles that account for
vehicle interactions include the ones based on Markov decision
This research has been supported by the National Science Foundation award
CNS 1544844.
Ran Tian, Nan Li, Ilya Kolmanovsky, and Anouck Girard are with the Department of Aerospace Engineering, University of Michigan, Ann Arbor, MI 48109, USA {tianran, nanli, ilya, anouck}@umich.edu. Yildiray Yildiz is with the Department
of Mechanical Engineering, Bilkent University, Ankara 06800, Turkey
{yyildiz}@bilkent.edu.tr.
processes [7]–[10], model predictive control [11], [12], game-theoretic models [13]–[17], as well as data-driven
approaches [18], [19]. Evaluating the effectiveness of these algorithms requires simulation environments that can represent
the interactions among autonomous/human-driven vehicles.
In our previous work [20], we exploited a game-theoretic
approach to modeling vehicle interactions in highway traffic.
Compared to highway traffic, urban traffic environments with
intersections are considered to be more challenging for both
human drivers and autonomous vehicles, as they involve
more extensive and complex interactions among vehicles. For
instance, almost 40% of traffic accidents in the U.S. are
intersection-related [21].
In this paper, we extend the game-theoretic approach of [20]
to modeling vehicle interactions in urban traffic. In particular,
we consider urban traffic environments with unsignalized inter-
sections. Firstly, unsignalized intersections may be even more
challenging than signalized intersections because, due to the
lack of guidance from traffic signals, a driver/automation needs
to decide on its own whether, when, and how to enter and
drive through the intersection. According to the U.S. Federal
Highway Administration’s report, almost 70% of fatalities due
to intersection-related traffic accidents happened at unsignal-
ized intersections [22]. Thus, well-verified autonomous driving
systems for unsignalized intersections may deliver significant
safety benefits. Indeed, many research works on autonomous
vehicle control for intersections in the literature, including
[17], [23]–[26], deal with unsignalized intersections, although
they do not always explicitly point this out.
Our approach formulates the decision-making processes of
drivers/vehicles as a dynamic game, where each vehicle inter-
acts with other vehicles by observing their states, predicting
their future actions, and then planning its own actions. In
addition to the difference in traffic scenarios being considered
(i.e., urban traffic in this paper versus highway traffic in [20]),
this paper contains the following methodological contribution
compared to [20]: Due to the much larger state space for
urban traffic environments with intersections compared to
that for highway traffic, the reinforcement learning approach
used in [20] to solve for control policies is computationally
prohibitive. Therefore, we develop in this paper an alternative
approach that uniquely integrates a game-theoretic formalism,
receding-horizon optimization, and an imitation learning algo-
rithm to obtain control policies. This new approach is shown
to be computationally effective for the large state space of
urban traffic.
arXiv:1910.07141v1 [cs.RO] 16 Oct 2019
In [27], we modeled the interactions among vehicles at
unsignalized intersections, but using a different game-theoretic
approach from the one used in this paper: In [27], we
model vehicle interactions based on a formulation of a leader-
follower game; while in this paper, we consider the application
of level-k game theory [28], [29]. The control strategies of all
interacting vehicles modeled using the framework of [27] are
homogeneous; while the control strategies of different vehicles
modeled using the scheme of this paper are heterogeneous,
differentiated by their level-$k$ control policies with different $k = 0, 1, 2, \dots$ This heterogeneity can be used to represent the
different driving styles among different drivers, e.g., aggressive
driving versus cautious/conservative driving. In addition, [27]
models a single intersection with up to 10 interacting vehicles;
while in this paper, thanks to the effective application of
the aforementioned solution approach integrating game theory,
receding-horizon optimization, and imitation learning to obtain
control policies, the scheme of this paper can be used to model
much larger road systems involving many intersections and
many vehicles with manageable online computational effort.
This enables the investigation of driving characteristics that
are exhibited when a vehicle drives through multiple road
segments, such as overall travel time, fuel consumption, etc.
A road system with 15 intersections and 30 vehicles is shown
as an example in Section IV. Furthermore, application of
the developed traffic models to verification and validation
of autonomous vehicle control systems is comprehensively
discussed in this paper, but not in [27].
Preliminary results of this paper have been reported in the
conference papers [30] and [31]. The results modeling the
interactions between two vehicles at a four-way intersection
are reported in [30] and those for two vehicles at a roundabout
intersection are in [31]. This paper generalizes the methodol-
ogy to modeling the interactions among multiple (more than
two) vehicles and to an additional intersection type – T-shape
intersection. Constructing larger road systems based on the
models of these three intersections is reported for the first time
in this paper. This paper also demonstrates how the developed
traffic models can be used for virtual testing, evaluation, and
calibration of autonomous vehicle control systems, which is
not provided in [30] and [31].
In summary, the contributions of this paper are: 1) We de-
scribe an approach based on level-k game theory to modeling
the interactions among vehicles in urban traffic environments
with unsignalized intersections. 2) We propose an algorithm
based on imitation learning to obtain level-$k$ control policies so
that our approach to modeling vehicle interactions is scalable
– able to model traffic scenes with many intersections and
many vehicles. 3) We demonstrate the use of the developed
traffic models for virtual testing, evaluation, and calibration of
autonomous vehicle control systems. For illustration purposes,
we consider two autonomous vehicle control approaches,
analyze their characteristics and performance based on the
simulation results with our traffic models, and optimize the
parameters of one of them.
This paper is organized as follows: The models representing
vehicle dynamics and driver decision-making processes are in-
troduced in Section II. The game-theoretic model representing
vehicle interactions and obtaining its explicit approximation
via imitation learning are discussed in Section III. The proce-
dure to construct traffic models of larger road systems based on
the models of three basic intersection scenarios is described in
Section IV. We then propose two autonomous vehicle control
approaches in Section V, used as case studies to illustrate
the application of our developed traffic models to autonomous
vehicle control verification and validation. Simulation results
are reported in Section VI, and finally, the paper is concluded
in Section VII.
II. TRAFFIC DYNAMICS AND DRIVER DECISION-MAKING MODELING
In this section, we describe our models to represent the traf-
fic dynamics and the decision-making processes of interacting
drivers.
A. Traffic dynamics
Firstly, we describe the evolution of a traffic scenario using
a discrete-time model as follows:
$s_{t+1} = F(s_t, u_t)$,  (1)
where $s = (s^1, s^2, \dots, s^m)$ denotes the traffic state, composed of the states $s^i$, $i \in \mathcal{M} = \{1, 2, \dots, m\}$, of all interacting vehicles in the scenario, $u = (u^1, u^2, \dots, u^m)$ denotes the collection of all vehicles' actions $u^i$, and the subscript $t$ represents the discrete-time instant. In particular, the state of a vehicle is composed of two parts, $s^i = (s^{i,1}, s^{i,2})$. The first part $s^{i,1} = (x^i, y^i, v^i, \theta^i)$ represents the state of the vehicle dynamics, modeled using the "unicycle" model as follows:
$\begin{bmatrix} x^i_{t+1} \\ y^i_{t+1} \\ v^i_{t+1} \\ \theta^i_{t+1} \end{bmatrix} = f(s^i_t, u^i_t) = \begin{bmatrix} x^i_t + v^i_t \cos(\theta^i_t)\,\Delta t \\ y^i_t + v^i_t \sin(\theta^i_t)\,\Delta t \\ v^i_t + a^i_t\,\Delta t \\ \theta^i_t + \omega^i_t\,\Delta t \end{bmatrix}$,  (2)
where $(x^i, y^i)$, $v^i$, and $\theta^i$ represent, respectively, the vehicle's position in the ground-fixed frame, its speed, and its heading angle; the inputs $a^i$ and $\omega^i$ represent, respectively, the vehicle's acceleration and heading angle rate; and $\Delta t$ is the sampling interval for decision-making. The second part $s^{i,2} = (r^i, \xi^i)$ contains additional information related to the vehicle's decision-making objective, including $r^i = (r^i_x, r^i_y)$, representing a target/reference position, and $\xi^i$, a feature vector containing key information about the road layout and geometry, such as the road width and the angle of intersection [27]. When vehicle $i$ is driving toward, in the middle of, or exiting a specific intersection, $s^{i,2}$ stays constant, with $r^i$ being a point located in the center of the vehicle's target lane; $s^{i,2}$ gets updated after the vehicle has returned to a straight road and is driving toward the next intersection.
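The update law (2) is straightforward to state in code. The following is a minimal sketch of the unicycle step, assuming scalar state components and a sampling interval dt; the function name and signature are illustrative, not from the paper.

```python
import math

def unicycle_step(x, y, v, theta, a, omega, dt):
    """One discrete-time update of the 'unicycle' vehicle model in Eq. (2):
    position advances along the current heading, while speed and heading
    are integrated from the acceleration a and heading-angle rate omega."""
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            v + a * dt,
            theta + omega * dt)
```

For example, a vehicle at the origin heading along the x-axis at 10 m/s with zero inputs and dt = 0.25 s advances 2.5 m along x.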
B. Driver decision-making
An action $u^i$ is a pair of values of the inputs $(a^i, \omega^i)$, i.e., $u^i = (a^i, \omega^i)$. We assume that the drivers of the vehicles make sequential decisions based on receding-horizon optimization as
follows: At each discrete-time instant $t$, the driver of vehicle $i$ solves for

$(\mathbf{u}^i_t)^* = \big((u^i_{0|t})^*, (u^i_{1|t})^*, \dots, (u^i_{N-1|t})^*\big) \in \arg\max_{\mathbf{u}^i_t \in U^N} \sum_{\tau=0}^{N-1} \lambda^\tau R\big(s^i_{\tau|t}, \mathbf{s}^{-i}_{\tau|t}, u^i_{\tau|t}, \mathbf{u}^{-i}_{\tau|t}\big)$,  (3)
where $\mathbf{u}^i_t = \big(u^i_{0|t}, u^i_{1|t}, \dots, u^i_{N-1|t}\big)$ represents a sequence of predicted actions of vehicle $i$, with $u^i_{\tau|t}$ denoting the predicted action for time step $t+\tau$ and taking values in a finite action set $U$; the notations $s^i_{\tau|t}$, $\mathbf{s}^{-i}_{\tau|t}$, and $\mathbf{u}^{-i}_{\tau|t}$ represent, respectively, the predicted state of vehicle $i$, and the collections of predicted states and actions of the other vehicles $j \in \mathcal{M}$, $j \neq i$, i.e., $\mathbf{s}^{-i}_{\tau|t} = (s^j_{\tau|t})_{j \in \mathcal{M}, j \neq i}$ and $\mathbf{u}^{-i}_{\tau|t} = (u^j_{\tau|t})_{j \in \mathcal{M}, j \neq i}$; $R$ is a reward function depending on the states and actions of all interacting vehicles, which will be introduced in detail in the following section; and $\lambda \in (0, 1]$ is a factor discounting future reward.
Once an optimal action sequence $(\mathbf{u}^i_t)^*$ is determined, vehicle $i$ applies the first element $(u^i_{0|t})^*$ for one time step, i.e., $u^i_t = (u^i_{0|t})^*$. After the states of all vehicles have been updated, vehicle $i$ repeats this procedure at $t+1$.
The fact that $R$ depends not only on the ego vehicle's state and action but also on those of the other vehicles determines the interactive nature of the drivers' decision-making processes in a multi-vehicle traffic scenario. Note that, due to the unknowns $\mathbf{u}^{-i}_{\tau|t}$ and $\mathbf{s}^{-i}_{\tau|t}$ for $\tau = 0, 1, \dots, N-1$, the problem (3) is not yet well-defined and cannot be solved. To be able to solve for $(\mathbf{u}^i_t)^*$, we will exploit a game-theoretic approach in Section III to predict the values of $\mathbf{u}^{-i}_{\tau|t}$ and $\mathbf{s}^{-i}_{\tau|t}$.
C. Reward function
We use the reward function $R$ in (3) to represent vehicles' decision-making objectives in traffic. In this paper, we consider $R$ defined as follows:
$R\big(s^i_{\tau|t}, \mathbf{s}^{-i}_{\tau|t}, u^i_{\tau|t}, \mathbf{u}^{-i}_{\tau|t}\big) = w^\top \Phi\big(s^i_{\tau+1|t}, (s^j_{\tau+1|t})_{j \in \mathcal{M}, j \neq i}\big)$,  (4)

where $\Phi = [\phi_1, \phi_2, \dots, \phi_6]^\top$ is the feature vector and $w \in \mathbb{R}^6_+$ is the weight vector. Note that $s^j_{\tau+1|t} = f(s^j_{\tau|t}, u^j_{\tau|t})$ for all $j \in \mathcal{M}$ based on the dynamic model (2).
The features φ1, φ2, . . . , φ6are designed to encode common
considerations in driving, such as safety, comfort, travel time,
etc. They are defined as follows.
The feature $\phi_1$ characterizes the collision status of the vehicle. In particular, we bound the geometric contour of each vehicle by a rectangle, referred to as the collision-zone (c-zone). Then, $\phi_1 = -1$ if vehicle $i$'s c-zone at the predicted state $s^i_{\tau+1|t}$ overlaps with any of the other vehicles' c-zones at their predicted states $s^j_{\tau+1|t}$, and $\phi_1 = 0$ otherwise.
The feature $\phi_2$ characterizes the on-road status of the vehicle, taking the value $-1$ if vehicle $i$'s c-zone crosses any of the road boundaries, and $0$ otherwise. Similarly, $\phi_3$ characterizes the in-lane status of the vehicle: if vehicle $i$'s c-zone crosses a lane marking that separates the traffic of opposite directions, or enters a lane different from its target lane when exiting an intersection, then $\phi_3 = -1$; $\phi_3 = 0$ otherwise.
To characterize the status of maintaining a safe and comfortable separation between vehicles, we further define a separation-zone (s-zone) for each vehicle, which over-bounds the vehicle's c-zone with a safety margin. The feature $\phi_4$ takes the value $-1$ if vehicle $i$'s s-zone overlaps with any of the other vehicles' s-zones at their predicted states, and takes $0$ otherwise.
The features $\phi_5$ and $\phi_6$ characterize the vehicle's behavior in approaching its target lane and are defined as follows,

$\phi_5 = -\,|r^i_x - x^i| - |r^i_y - y^i|$,  (5)

$\phi_6 = v^i$,  (6)

so that the vehicle is encouraged to reach the reference point $r^i$ in its target lane as quickly as it can.
The above reward function design represents common driving objectives in traffic. The weight vector $w$ can be tuned to achieve reasonable driving behavior, or can be calibrated using traffic data and approaches such as inverse reinforcement learning [32], [33].
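Under the linear form (4), evaluating $R$ amounts to building the feature vector and taking an inner product with $w$. A schematic sketch follows; the rectangle-overlap tests behind $\phi_1$–$\phi_4$ are abstracted into boolean flags computed elsewhere, and the weight values in the usage example are hypothetical:

```python
def reward(phi, w):
    """Linear reward R = w^T Phi as in Eq. (4)."""
    return sum(wi * pi for wi, pi in zip(w, phi))

def features(x, y, v, rx, ry, collision, off_road, out_of_lane, s_zone_overlap):
    """Feature vector Phi = [phi1..phi6]: phi1-phi4 are the binary
    collision / on-road / in-lane / separation penalties (here passed in
    as flags); phi5 penalizes the L1 distance to the reference point r^i
    (Eq. (5)); phi6 rewards speed (Eq. (6))."""
    phi5 = -(abs(rx - x) + abs(ry - y))
    phi6 = v
    return [-1.0 if collision else 0.0,
            -1.0 if off_road else 0.0,
            -1.0 if out_of_lane else 0.0,
            -1.0 if s_zone_overlap else 0.0,
            phi5,
            phi6]
```

For instance, a conflict-free vehicle at (0, 0) with speed 5 and reference point (3, 4) yields $\Phi = [0, 0, 0, 0, -7, 5]$.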
III. GAME-THEORETIC DECISION-MAKING AND EXPLICIT REALIZATION VIA IMITATION LEARNING
Game theory is a useful tool for modeling intelligent agents’
strategic interactions. In this paper, we exploit level-k game theory [28], [29] to model vehicles' interactive decision-
making.
A. Level-k reasoning and decision-making
In level-k game theory, it is assumed that players make decisions based on finite depths of reasoning, called "levels," and different players may have different reasoning levels. In particular, a level-0 player makes non-strategic decisions, i.e., decisions without regard to the other players' decisions. Then, a level-$k$, $k \geq 1$, player makes strategic decisions by assuming that all of the other players are level-$(k-1)$, predicting their decisions based on such an assumption, and optimally responding to their predicted decisions. Experimental results from cognitive science verify that such a level-k reasoning process can model human interactions with higher accuracy than traditional analytic methods in many cases [29].
To incorporate level-k reasoning in our decision-making model (3), we start with defining a level-0 decision rule. According to the non-strategic assumption about level-0 players, we let a level-0 decision of a vehicle $i$, $i \in \mathcal{M}$, depend only on the traffic state $s_t$, including its own state $s^i_t$ and the other vehicles' states $\mathbf{s}^{-i}_t$, but not on the other vehicles' actions $\mathbf{u}^{-i}_t$. In this paper, a level-0 decision, $(\mathbf{u}^i_t)^0 = \big((u^i_{0|t})^0, (u^i_{1|t})^0, \dots, (u^i_{N-1|t})^0\big)$, is a sequence of predicted actions that maximizes the cumulative reward in (3) while treating all of the other vehicles as stationary obstacles over the planning horizon, i.e., $v^j_{\tau|t} = 0$, $\omega^j_{\tau|t} = 0$ for all $j \neq i$, $\tau = 0, 1, \dots, N$. This way, a level-0 vehicle represents an aggressive vehicle which assumes that all of the other vehicles will yield the right of way to it.
On the basis of the formulated level-0 decision rule, the level-k decisions of the vehicles are obtained based on

$(\mathbf{u}^i_t)^k = \big((u^i_{0|t})^k, (u^i_{1|t})^k, \dots, (u^i_{N-1|t})^k\big) \in \arg\max_{\mathbf{u}^i_t \in U^N} \sum_{\tau=0}^{N-1} \lambda^\tau R\big(s^i_{\tau|t}, \mathbf{s}^{-i}_{\tau|t}, u^i_{\tau|t}, (\mathbf{u}^{-i}_{\tau|t})^{k-1}\big)$,  (7)

for every $i \in \mathcal{M}$, and for every $k = 1, 2, \dots, k_{\max}$, through sequential, iterated computations, where $(\mathbf{u}^{-i}_{\tau|t})^{k-1}$ denotes the level-$(k-1)$ decisions of the other vehicles $j \neq i$, which have been determined either in the previous iteration or based on the level-0 decision rule (for $k = 1$), and $k_{\max}$ is the highest reasoning level for computation.
Given a finite action set $U$, the problem (7) for every $i \in \mathcal{M}$ and $k = 1, 2, \dots, k_{\max}$ can be solved with exhaustive search, e.g., based on a tree structure [34].
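To make the sequential computation in (7) concrete, the sketch below iterates best responses on a deliberately simplified one-dimensional toy (scalar positions, caller-supplied step and reward functions): level-0 treats the other vehicles as stationary, and level-k best-responds to the others' level-$(k-1)$ plans. It illustrates only the recursion structure, not the paper's full implementation.

```python
import itertools

def level_k_actions(states, k, U, N, lam, step, reward):
    """Each vehicle's level-k action sequence via the iterated
    best-response of Eq. (7)."""
    m = len(states)
    if k == 0:
        # level-0: the other vehicles are frozen at their current states
        others = {i: [[states[j]] * (N + 1) for j in range(m) if j != i]
                  for i in range(m)}
    else:
        # roll the others forward under their level-(k-1) plans
        prev = level_k_actions(states, k - 1, U, N, lam, step, reward)
        others = {}
        for i in range(m):
            trajs = []
            for j in range(m):
                if j == i:
                    continue
                s, traj = states[j], [states[j]]
                for u in prev[j]:
                    s = step(s, u)
                    traj.append(s)
                trajs.append(traj)
            others[i] = trajs
    plans = []
    for i in range(m):
        best, best_val = None, float("-inf")
        for seq in itertools.product(U, repeat=N):
            s, val = states[i], 0.0
            for tau, u in enumerate(seq):
                s = step(s, u)
                val += (lam ** tau) * reward(s, [tr[tau + 1] for tr in others[i]])
            if val > best_val:
                best_val, best = val, seq
        plans.append(best)
    return plans
```

With two vehicles at positions 0 and 2, actions {0, 1}, and a reward that pays for progress but heavily penalizes occupying another vehicle's predicted cell, the level-0 follower stops short of the frozen leader, while at level-1 it advances because it predicts the leader moving away.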
B. Explicit level-k decision-making via imitation learning
A level-k vehicle drives in traffic by applying $u^i_t = (u^i_{0|t})^k$ at every time step, where $(u^i_{0|t})^k$ is determined according to (7) with the current state as the initial condition, i.e., $s^i_{0|t} = s^i_t$ and $\mathbf{s}^{-i}_{0|t} = \mathbf{s}^{-i}_t$.
Solving the problem (7) involves numerical computations. In particular, the computational demand becomes increasingly heavier for larger $k$ and larger numbers of interacting vehicles, because computing the level-k decision of vehicle $i$ requires first determining the level-$(k-1)$ decisions of all other vehicles $j \neq i$, which in turn requires the determination of level-$(k-2)$ decisions for $k \geq 2$, and so on.
For the purpose of developing simulation environments to
conduct virtual tests for autonomous vehicle control systems,
fast simulations are desired so that a large number of scenarios
can be covered within a short period of time. Motivated by
this, we exploit machine learning techniques to move the
computations offline and achieve explicit level-k decision rules
for online use.
In particular, we define a policy as a map from a triple of the ego vehicle's state $s^i_t$, the other vehicles' states $\mathbf{s}^{-i}_t$, and the ego vehicle's reasoning level $k$ to the level-k action of the ego vehicle, i.e.,

$\pi_k : (s^i_t, \mathbf{s}^{-i}_t, k) \mapsto (u^i_t)^k$.  (8)

This map is algorithmically determined by the problem (7) and $(u^i_t)^k = (u^i_{0|t})^k$. We then pursue an explicit approximation of $\pi_k$, denoted by $\hat{\pi}_k$, using the approach called "imitation learning."
Imitation learning is an approach for an autonomous agent to learn a control policy from expert demonstrations in order to imitate the expert's behavior. The expert can be a human expert [35] or a well-behaved artificial intelligence [36]. In this paper, we treat the algorithmically determined map $\pi_k$ as the expert.
Imitation learning can be formulated as a standard super-
vised learning problem, in which case it is also commonly
referred to as “behavioral cloning,” where the learning objec-
tive is to obtain a policy from a pre-collected dataset of expert
demonstrations that best approximates the expert’s behavior at
the states contained in the dataset. Such a procedure can be
described as
$\hat{\pi}_k \in \arg\min_{\pi_\theta} \mathbb{E}_{\bar{s} \sim P(\bar{s}|\pi_k)} L\big(\pi_k(\bar{s}), \pi_\theta(\bar{s})\big)$,  (9)

where $\bar{s}$ denotes the triple $(s^i, \mathbf{s}^{-i}, k)$, $\pi_k$ denotes the expert policy (8), $\pi_\theta$ denotes a policy parameterized by $\theta$ (e.g., the weights of a neural network) that is being evaluated and optimized, $L$ is a loss function, and the notation $\mathbb{E}_{\bar{s} \sim P(\bar{s}|\pi_k)}(\cdot)$ is defined as

$\mathbb{E}_{\bar{s} \sim P(\bar{s}|\pi_k)}(\cdot) = \int (\cdot)\, \mathrm{d}P(\bar{s}|\pi_k)$.  (10)

We remark that a key feature of the procedure (9) is that the expectation is with respect to the probability distribution $P(\bar{s}|\pi_k)$ of the data $\bar{s}$ determined by the expert policy $\pi_k$, which is essentially the empirical distribution of $\bar{s}$ in the pre-collected dataset.
In our previous work [31], we explored the procedure (9) to obtain an explicit policy that imitates level-k decisions for an autonomous vehicle driving through a roundabout intersection.
A drawback of using (9) to train the policy $\hat{\pi}_k$ is that only the states that can be reached by executing $\pi_k$ are included in the dataset, and such a sampling bias may cause the error of $\hat{\pi}_k$ from $\pi_k$ to propagate in time: a small error may cause the vehicle to reach a state that is not exactly included in the dataset and, consequently, a large error may occur at the next time step.
Therefore, in this paper we use an alternative approach,
called the “Dataset Aggregation” (DAgger) algorithm, to train
the policy $\hat{\pi}_k$. DAgger is an iterative algorithm that optimizes the policy under its induced state distribution [37]. The learning objective of DAgger can be described as

$\hat{\pi}_k \in \arg\min_{\pi_\theta} \mathbb{E}_{\bar{s} \sim P(\bar{s}|\pi_\theta)} L\big(\pi_k(\bar{s}), \pi_\theta(\bar{s})\big)$,  (11)

$\mathbb{E}_{\bar{s} \sim P(\bar{s}|\pi_\theta)}(\cdot) = \int (\cdot)\, \mathrm{d}P(\bar{s}|\pi_\theta)$,  (12)

where the distinguishing feature from (9) is that the expectation is with respect to the probability distribution $P(\bar{s}|\pi_\theta)$ induced by the policy $\pi_\theta$ that is being evaluated and optimized.
DAgger can effectively resolve the aforementioned issue with regard to the propagation of error in time, since there will be data points $(\bar{s}, \pi_k(\bar{s}))$ for states $\bar{s}$ reached by executing $\hat{\pi}_k$.
The procedure to obtain explicit level-k decision-making policies based on an improved version of the DAgger algorithm [36] is presented as Algorithm 1. In Algorithm 1, $n_{\max}$ represents the maximum number of simulation episodes and $t_{\max}$ represents the length of a simulation episode. By "initialize the simulation environment," we mean constructing a traffic scene, including specifying the road layout and geometry as well as the number of vehicles. By "initialize vehicle $i$," we mean putting the vehicle in a lane entering the scene while satisfying a minimum separation distance from the other vehicles, and specifying a sequence of target lanes for the vehicle to traverse and finally leave the scene. By "vehicle $i$ fails," we mean the occurrence of 1) vehicle $i$'s c-zone overlapping with any of the other vehicles' c-zones, 2) crossing any of the road boundaries, or 3) crossing a lane marking that separates the traffic of opposite directions. And, by "vehicle $i$ succeeds," we mean vehicle $i$ gets to the last target lane in its sequence so that it can leave the scene without further interactions with the other vehicles.
Algorithm 1: Imitation learning algorithm to obtain explicit level-k decision-making policies

1: Initialize $\hat{\pi}^0_k$ to an arbitrary policy;
2: Initialize dataset $\mathcal{D} \leftarrow \emptyset$;
3: for $n = 1 : n_{\max}$ do
4:   Initialize the simulation environment;
5:   for $i \in \mathcal{M}$ do
6:     Initialize vehicle $i$;
7:   end for
8:   for $t = 0 : t_{\max} - 1$ do
9:     for $i \in \mathcal{M}$ do
10:      if vehicle $i$ fails or succeeds then
11:        Re-initialize vehicle $i$;
12:      end if
13:      for $k = 1 : k_{\max}$ do
14:        if $\hat{\pi}^{n-1}_k(s^i_t, \mathbf{s}^{-i}_t, k) \neq \pi_k(s^i_t, \mathbf{s}^{-i}_t, k)$ then
15:          $\mathcal{D} \leftarrow \mathcal{D} \cup \big((s^i_t, \mathbf{s}^{-i}_t, k), \pi_k(s^i_t, \mathbf{s}^{-i}_t, k)\big)$;
16:        end if
17:      end for
18:      Randomly generate $k_t \in \{1, \dots, k_{\max}\}$;
19:      $s^i_{t+1} = f\big(s^i_t, \hat{\pi}^{n-1}_k(s^i_t, \mathbf{s}^{-i}_t, k_t)\big)$;
20:    end for
21:  end for
22:  Train classifier $\hat{\pi}^n_k$ on $\mathcal{D}$;
23: end for
24: Output $\hat{\pi}_k = \hat{\pi}^{n_{\max}}_k$.
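In Python-like form, the core of Algorithm 1 (and of DAgger generally) is a loop that rolls out the current learned policy, queries the expert at the visited states, and retrains on the aggregated dataset. The sketch below is a stripped-down toy: integer states, a nearest-neighbor "classifier," and no reasoning levels; it also relabels every visited state, whereas Algorithm 1 stores only states where the learned policy disagrees with the expert. All helper names are ours, not the paper's.

```python
def dagger(expert, rollout, train, n_iters, policy0):
    """Minimal DAgger loop: roll out the CURRENT learned policy to
    generate states, query the expert for labels at those states,
    aggregate everything into one dataset D, and refit the classifier."""
    D, policy = [], policy0
    for _ in range(n_iters):
        for s in rollout(policy):     # states visited under the learned policy
            D.append((s, expert(s)))  # the expert relabels each visited state
        policy = train(D)             # fit a new classifier on the aggregate
    return policy
```

The key point, mirroring (11), is that `rollout` uses the learned policy, so the dataset covers the states the learner itself induces rather than only the expert's trajectory.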
IV. TRAFFIC IN UNSIGNALIZED INTERSECTION NETWORK
We model traffic in urban environments where the road
system is composed of straight roads and three of the most
common types of unsignalized intersections: four-way, T-
shape, and roundabout [38]. Such traffic models can be used
as simulation environments for virtual testing of autonomous
vehicle control systems, which will be introduced in Section V.
The three unsignalized intersections to be modeled are
shown in Fig. 1. A vehicle can come from any of the entrance
lanes (marked by green arrows) to enter an intersection and
go to any of the exit lanes (marked by red arrows) to leave it,
except that U-turns are not allowed for four-way and T-shape
intersections.
Fig. 1. Unsignalized intersections to be modeled: (a) four-way, (b) T-shape, and (c) roundabout.
When training the level-k policy $\hat{\pi}_k$ using Algorithm 1, we treat these three unsignalized intersections separately. Specifically, when initializing the simulation environment in step 4, we select one of these three unsignalized intersections as the traffic scene for the current simulation episode. In addition, since in this paper we only consider these three unsignalized intersections, their layout and geometry features can be characterized and distinguished using a label $\xi \in \{1, 2, 3\}$, i.e., the state $\xi^i$ of vehicle $i$ takes the value 1 when vehicle $i$ operates in the area of the four-way intersection, 2 for the T-shape intersection, and 3 for the roundabout. For more intersection types with various layout and geometry features, a higher-dimensional vector $\xi$ may be used (e.g., see the intersection model in [27]).
Once the policy $\hat{\pi}_k$ for each of these three unsignalized intersections has been obtained, we can model larger road systems by using these three unsignalized intersections as modules and assembling them in arbitrary ways. Fig. 2 shows an example of such an assembly. When a vehicle operates at/nearest to a specific intersection, it uses a local coordinate system, accounts for its interactions with only the vehicles in its immediate vicinity, and applies the $\hat{\pi}_k$ corresponding to this intersection.
To model the heterogeneity in driving styles of different drivers, we let different vehicles be of different reasoning levels. Specifically, a level-k vehicle is controlled by the policy:

$\hat{\pi}_k = \hat{\pi}_k(\cdot, \cdot, k) : (s^i_t, \mathbf{s}^{-i}_t) \mapsto (u^i_t)^k$.  (13)

For instance, in Fig. 2, the 15 yellow cars are level-1 and the 15 red cars are level-2.
Fig. 2. An urban traffic environment with 15 level-1 cars (yellow) and 15 level-2 cars (red).
V. AUTONOMOUS VEHICLE CONTROL APPROACHES
In this section, we describe two autonomous vehicle control
approaches for urban traffic environments with unsignalized
intersections. These approaches will be tested and calibrated
using our traffic model, thereby demonstrating its utility for
verification and validation.
A. Adaptive control based on level-k models
In this approach, the autonomous ego vehicle treats the other drivers as level-k drivers. As different drivers may behave according to different reasoning levels, the ego vehicle estimates their levels and adapts its own control strategy based on the estimation results.
The control strategy of the autonomous ego vehicle, $i$, can be described as follows: At each discrete-time instant $t$, vehicle $i$ solves for

$(\mathbf{u}^i_t)^a = \big((u^i_{0|t})^a, (u^i_{1|t})^a, \dots, (u^i_{N-1|t})^a\big) \in \arg\max_{\mathbf{u}^i_t \in U^N} \sum_{\tau=0}^{N-1} \lambda^\tau R\big(s^i_{\tau|t}, \mathbf{s}^{-i}_{\tau|t}, u^i_{\tau|t}, (\mathbf{u}^{-i}_{\tau|t})^{\tilde{k}}\big)$,  (14)
where $(\mathbf{u}^{-i}_{\tau|t})^{\tilde{k}} = \big((u^j_{\tau|t})^{\tilde{k}^j_t}\big)_{j \in \mathcal{M}, j \neq i}$ denotes the collection of predicted actions of the other vehicles. In particular, the actions of vehicle $j$, $u^j_{\tau|t}$, $\tau = 0, 1, \dots, N-1$, are predicted by modeling vehicle $j$ as level-$\tilde{k}^j_t$ and solved based on (7), where $\tilde{k}^j_t$ is determined based on the following maximum likelihood principle:

$\tilde{k}^j_t \in \arg\max_{k \in \mathcal{K}} P^i(k^j = k \,|\, t)$,  (15)

in which $P^i(k^j = k \,|\, t)$ represents vehicle $i$'s belief at time $t$ that vehicle $j$ can be modeled as level-k, with $k$ taking values in a model set $\mathcal{K}$. The beliefs $P^i(k^j = k \,|\, t)$ get updated after each time step based on the following algorithm: If there exist $k, k' \in \mathcal{K}$ such that $\pi_k(s^j_t, \mathbf{s}^{-j}_t, k) \neq \pi_k(s^j_t, \mathbf{s}^{-j}_t, k')$, then
$P^i(k^j = k \,|\, t+1) = \dfrac{p^i(k^j = k \,|\, t+1)}{\sum_{k' \in \mathcal{K}} p^i(k^j = k' \,|\, t+1)}$,  (16)

$p^i(k^j = k \,|\, t+1) = \begin{cases} (1-\beta)\, P^i(k^j = k \,|\, t) + \beta & \text{if } k = \hat{k}^j_t, \\ P^i(k^j = k \,|\, t) & \text{otherwise,} \end{cases}$

where $\beta \in [0, 1]$ represents an update step size,

$\hat{k}^j_t \in \arg\min_{k \in \mathcal{K}} \mathrm{dist}\big(u^j_t, (u^j_t)^k\big) = \arg\min_{k \in \mathcal{K}} \sqrt{\big(a^j_t - (a^j_t)^k\big)^2 + \big(\omega^j_t - (\omega^j_t)^k\big)^2}$;  (17)

and if $\pi_k(s^j_t, \mathbf{s}^{-j}_t, k) = \pi_k(s^j_t, \mathbf{s}^{-j}_t, k')$ for all $k, k' \in \mathcal{K}$, then $P^i(k^j = k \,|\, t+1) = P^i(k^j = k \,|\, t)$ for all $k \in \mathcal{K}$.
The level estimation algorithm (15)-(17) has the following three features: 1) If the actions predicted by all of the models in $\mathcal{K}$ are the same, then the autonomous ego vehicle has no information to distinguish their relative accuracy and thus maintains its previous beliefs. 2) Otherwise, the ego vehicle identifies the model(s) in $\mathcal{K}$ whose prediction $(u^j_t)^k$ matches vehicle $j$'s actually applied action $u^j_t$ for time $t$ with the highest accuracy. 3) The ego vehicle improves its belief in those model(s) relative to its previous beliefs; thus, it takes into account both its previous estimates and the current, latest estimate.
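The belief update (15)-(17) can be sketched compactly. Below, `beliefs` maps each candidate level $k$ to $P^i(k^j = k \,|\, t)$, `predictions` maps $k$ to the action $(a, \omega)$ the level-k model would have taken, and `observed_u` is vehicle $j$'s actually applied action; the function name and data layout are illustrative, not from the paper.

```python
import math

def update_beliefs(beliefs, observed_u, predictions, beta):
    """One step of the level-estimation rule (15)-(17): if the candidate
    models' predicted actions differ, find the level k_hat whose predicted
    (a, omega) is closest in Euclidean distance (Eq. (17)) to the observed
    action, add belief mass beta to it, and renormalize (Eq. (16));
    otherwise leave the beliefs unchanged."""
    if len(set(predictions.values())) == 1:
        return dict(beliefs)  # models indistinguishable at this state
    k_hat = min(predictions, key=lambda k: math.dist(observed_u, predictions[k]))
    p = {k: ((1 - beta) * b + beta) if k == k_hat else b
         for k, b in beliefs.items()}
    z = sum(p.values())
    return {k: v / z for k, v in p.items()}
```

Applying (15) afterwards is then just an argmax over the returned dictionary.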
Similar to (8) defined by (7), we can define a policy to represent the control determined by (14) as follows:

$\pi_a : (s^i_t, \mathbf{s}^{-i}_t, \tilde{k}^{-i}_t) \mapsto (u^i_t)^a$,  (18)

where $\tilde{k}^{-i}_t = (\tilde{k}^j_t)_{j \in \mathcal{M}, j \neq i}$ denotes the collection of level estimates of the other vehicles and $(u^i_t)^a = (u^i_{0|t})^a$ is determined by (14). Furthermore, similar to the procedure to train the explicit approximation $\hat{\pi}_k$ to $\pi_k$ using imitation learning, we can train an explicit approximation $\hat{\pi}_a$ to $\pi_a$. This way, together with replacing $\pi_k$ with $\hat{\pi}_k$ in the level estimation algorithm (15)-(17), we can move the major computations involved in (14)-(17) offline, thus reducing the online computational load and promoting real-time implementation.
The algorithm to train $\hat{\pi}_a$ using $\pi_a$ as the expert policy and the DAgger algorithm is similar to Algorithm 1 and is omitted.
B. Rule-based control
The second autonomous vehicle control approach we con-
sider is a rule-based solution. Compared to many other ap-
proaches, rule-based control has the advantage of interpretabil-
ity and can often be calibrated by tuning a small number of
parameters.
The autonomous ego vehicle drives by following a pre-
planned reference path and accounts for its interactions with
other vehicles by adjusting its speed along the path corre-
spondingly. Examples of reference paths for the autonomous
ego vehicle to drive through intersections are illustrated by the
green dotted curves in Fig. 3.
Fig. 3. Reference paths for the autonomous ego vehicle to drive through (a) four-way, (b) T-shape, and (c) roundabout intersections.
The basic control rules can be explained as follows. The autonomous ego vehicle pursues a higher speed along the reference path if there is no other vehicle in conflict with it. If there are other vehicles in conflict with it, then the autonomous ego vehicle yields to them by maximizing its distances from them. Specifically, at each discrete-time instant $t$, the autonomous ego vehicle, $i$, selects and applies for one time step an acceleration value from a finite set of accelerations, $\mathcal{A}$, according to Algorithm 2.
Algorithm 2: Rule-based autonomous vehicle control algorithm
1: Initialize $\mathcal{M}_c \leftarrow \emptyset$;
2: for $j \in \mathcal{M},\, j \neq i$ do
3:   if the estimated future path of $j$ intersects with $i$'s future path and $\mathrm{dist}\big((x^i_t, y^i_t), (x^j_t, y^j_t)\big) \leq R_c$ then
4:     $\mathcal{M}_c \leftarrow \mathcal{M}_c \cup \{j\}$;
5:   end if
6: end for
7: if $\mathcal{M}_c \neq \emptyset$ then
8:   $(a^i_t)^r = \arg\max_{a \in \mathcal{A}} \min_{j \in \mathcal{M}_c} \mathrm{dist}\big((x^i_{1|t}, y^i_{1|t}), (x^j_{1|t}, y^j_{1|t})\big)$;
9: else
10:  $(a^i_t)^r = \max\{a \in \mathcal{A}\}$;
11: end if
12: Output $(a^i_t)^r$.
In Algorithm 2, $\mathcal{M}_c$ represents the set of vehicles that are in conflict with the ego vehicle. In particular, the ego vehicle estimates each of the other vehicles' future paths based on their current positions and their target lanes, using the same path planning algorithm that is used by the ego vehicle to create its own path. If the estimated future path of a vehicle $j$ intersects with the ego vehicle's own future path and the current distance between these two vehicles is smaller than a threshold value $R_c$, then vehicle $j$ is identified as a vehicle in conflict, i.e., $j \in \mathcal{M}_c$, where the distance function $\mathrm{dist}(\cdot,\cdot)$ is defined as
$$\mathrm{dist}\big((x_1, y_1), (x_2, y_2)\big) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}. \qquad (19)$$
If there are vehicles in conflict, $\mathcal{M}_c \neq \emptyset$, then the ego vehicle maximizes the minimum among the predicted distances from these vehicles to improve safety. In step 8, $(x^i_{1|t}, y^i_{1|t})$ represents the predicted position of the ego vehicle $i$ after applying the acceleration $a$ and driving with the resulting speed along its reference path for one step, and $(x^j_{1|t}, y^j_{1|t})$ represents the predicted position of vehicle $j$ after driving for one step with its current speed along its current heading direction. If there is no vehicle in conflict, $\mathcal{M}_c = \emptyset$, then the ego vehicle maximizes its speed.
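A minimal sketch of the decision rule in Algorithm 2 follows. The one-step prediction models and the path-intersection test are passed in as simplified helpers; their names and signatures are assumptions for illustration, not the paper's implementation.

```python
import math

def dist(p1, p2):
    """Euclidean distance (19) between two (x, y) points."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

def rule_based_action(ego, others, paths_intersect, predict_ego, predict_other,
                      A=(-5.0, -2.5, 0.0, 2.5), Rc=14.0):
    """Select an acceleration from A following the decision rule of Algorithm 2.

    ego/others: dicts with at least a 'pos' (x, y) entry;
    paths_intersect(ego, j): assumed path-conflict test;
    predict_ego(ego, a), predict_other(j): assumed one-step position predictors.
    """
    # Steps 1-6: collect the vehicles in conflict with the ego vehicle.
    conflicts = [j for j in others
                 if paths_intersect(ego, j) and dist(ego["pos"], j["pos"]) <= Rc]
    if conflicts:
        # Step 8: acceleration maximizing the minimum predicted separation.
        return max(A, key=lambda a: min(dist(predict_ego(ego, a),
                                             predict_other(j))
                                        for j in conflicts))
    # Step 10: no conflict, so apply the largest acceleration (maximize speed).
    return max(A)
```

Note how the single parameter `Rc` gates both branches, which is why its calibration in Section VI-B matters so much.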
Note that the key parameter for this rule-based control approach is the threshold value $R_c$, which influences both whether a vehicle is identified as in conflict with the ego vehicle and the separation distance the ego vehicle tries to keep from other vehicles. We will utilize our traffic model to calibrate this parameter in Section VI-B.
VI. RESULTS
In this section, we illustrate simulations of urban traffic with
vehicle interactions modeled by our level-k game-theoretic
approach, and the application to verification, validation and
calibration of autonomous vehicle control systems.
A. Traffic modeling with level-k vehicles
We consider a sampling interval $\Delta t = 0.25$ [s] and an action set $U$ consisting of 6 actions representing common driving maneuvers in urban traffic, listed in Table I. The weight vector, the planning horizon, and the discount factor for the reward function (4) are $w = [1000, 500, 50, 100, 5, 1]^{\top}$, $N = 4$, and $\lambda = 0.8$. When evaluating the features $\phi_1$ and $\phi_4$, we consider the c-zone of a vehicle as a 5 [m] $\times$ 2 [m] rectangle centered at the vehicle's position $(x, y)$ and stretched along its heading direction $\theta$, and the s-zone of a vehicle as a rectangle concentric with its c-zone and 8 [m] $\times$ 2.4 [m] in size. Furthermore, we consider a speed range $[v_{\min}, v_{\max}] = [0, 5]$ [m/s], representing common speeds for vehicles driving through intersections; i.e., when the speed calculated based on the model (2) gets outside of $[v_{\min}, v_{\max}]$, it is saturated to this range.
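As a sketch, one simulation step with the speed saturation described above might look as follows. The unicycle-style position update is an assumption standing in for the paper's kinematic model (2); only the sampling interval and the speed range come from the text.

```python
import math

def step(x, y, v, theta, a, omega, dt=0.25, vmin=0.0, vmax=5.0):
    """Advance one vehicle by one sampling interval under action (a, omega)."""
    # Update and saturate the speed to [vmin, vmax] as described in the text.
    v_next = min(max(v + a * dt, vmin), vmax)
    # Assumed unicycle-style position/heading update (stand-in for model (2)).
    x_next = x + v_next * math.cos(theta) * dt
    y_next = y + v_next * math.sin(theta) * dt
    theta_next = theta + omega * dt
    return x_next, y_next, v_next, theta_next
```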
Experimental studies [29], [39] suggest that humans are most commonly level-1 and level-2 reasoners in their interactions. Thus, we model vehicles in traffic using level-1 and level-2 policies in this paper. In particular, on the basis of our level-0 decision rule (see Section III-A), a level-1 vehicle represents a cautious/conservative vehicle and a level-2 vehicle represents an aggressive vehicle. Indeed, since both level-0 and level-2 vehicles represent aggressive vehicles, they behave similarly in many situations.
TABLE I
ACTION SET U.

action u          a [m/s²]    ω [rad/s]
maintain (u1)        0            0
accelerate (u2)      2.5          0
decelerate (u3)     -2.5          0
hard brake (u4)     -5            0
turn left (u5)       0           π/4
turn right (u6)      0          -π/4
We use a neural network with the architecture shown in Fig. 4 to represent a policy $\pi_{\theta}$ and train its weights $\theta$ using Algorithm 1 to obtain the explicit approximation $\hat{\pi}^k$ to the level-k policy $\pi^k$, which is algorithmically determined by (7). The accuracy of the obtained $\hat{\pi}^k$ in terms of matching $\pi^k$ on the training dataset is 98.3%. Then, we generate 30% more data points of $\big((s^i_t, \mathbf{s}^{-i}_t, k), \pi^k(s^i_t, \mathbf{s}^{-i}_t, k)\big)$ for testing. The accuracy of $\hat{\pi}^k$ in matching $\pi^k$ on the test dataset is 97.8%.
Fig. 4. Architecture of the neural network.
To show the advantage of using the DAgger algorithm (11) over a standard supervised learning procedure (9) to obtain the policy $\hat{\pi}^k$, we show a case observed in our simulations where the policy trained using standard supervised learning fails but the one trained using DAgger succeeds. In Fig. 5(a-3), the blue vehicle controlled by $\hat{\pi}^k$ trained using standard supervised learning fails to make an adequate right turn to get around the central island. This is due to a significant error of $\hat{\pi}^k$ relative to $\pi^k$ at certain states encountered by the blue vehicle when entering the roundabout; the encounter with such states results from the issue of error propagation in time discussed in Section III-B. In contrast, the blue vehicle in Fig. 5(b-3), controlled by $\hat{\pi}^k$ trained using DAgger, succeeds in making a proper right turn, illustrating that DAgger can effectively resolve this issue.
In what follows we show the interactions between level-k vehicles at the four-way, T-shape, and roundabout intersections. In particular, we let three vehicles be controlled by different level-k policies and show how the traffic scenarios evolve differently depending on the different combinations of level-k policies.
It can be observed from Figs. 6-8 that, in general, when level-1 and level-2 vehicles interact with each other, the conflicts between them can be resolved. This is expected since level-1 vehicles, representing cautious/conservative vehicles, will yield the right of way and level-2 vehicles, representing aggressive vehicles, will proceed ahead. In contrast, when level-1 vehicles interact with level-1 vehicles, deadlocks may
Fig. 5. (a-1)-(a-3) show three subsequent steps in a simulation where the blue vehicle controlled by $\hat{\pi}^k$ trained using standard supervised learning fails in making an adequate right turn to get around the central island of a roundabout; (b-1)-(b-3) show steps in a similar simulation where the blue vehicle controlled by $\hat{\pi}^k$ trained using DAgger succeeds in making a proper right turn.
occur, such as the one observed in the T-shape intersection in Fig. 7(a), because everyone yields to the others. When level-2 vehicles interact with level-2 vehicles, collisions may occur, such as the ones observed in panel (b) of Figs. 6-8, because everyone assumes the others yield.
Fig. 6. Interactions of level-k vehicles at the four-way intersection. (a-1)-(a-3) show three subsequent steps in a simulation where three level-1 vehicles interact with each other; (b-1)-(b-3) show steps of three level-2 vehicles interacting with each other; (c-1)-(c-3) show steps of a level-2 vehicle (blue) interacting with two level-1 vehicles (yellow and red); v1, v2, and v3 are the speeds of, respectively, the blue, yellow, and red vehicles.
We remark that deadlocks (collisions) do not always occur
Fig. 7. Interactions of level-k vehicles at the T-shape intersection. (a-1)-(a-3) show three subsequent steps in a simulation where three level-1 vehicles interact with each other; (b-1)-(b-3) show steps of three level-2 vehicles interacting with each other; (c-1)-(c-3) show steps of a level-2 vehicle (blue) interacting with two level-1 vehicles (yellow and red); v1, v2, and v3 are the speeds of, respectively, the blue, yellow, and red vehicles.
in level-1 (level-2) interactions. The initial conditions of Figs. 6-8 are chosen to show such situations. For randomized initial conditions, the rates of success, defined as the proportion of 2000 simulation episodes where neither deadlocks nor collisions occur to the ego vehicle, for different numbers of interacting vehicles and different combinations of level-k policies at the three intersections are shown in Fig. 9. In Fig. 9, "L-k car in L-k′ Env." means the rate of success of a level-k ego vehicle when interacting with other vehicles that are all of level-k′; "L-k car in Mix Env." means the rate of success of a level-k ego vehicle when interacting with other vehicles whose control policies are randomly chosen between level-1 and level-2 with equal probability.
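The rate of success just defined is a plain Monte Carlo proportion over independent episodes; the sketch below illustrates the computation, with a Bernoulli draw standing in for an actual simulated episode (the success probability 0.9 is an arbitrary stand-in, not a result from the paper).

```python
import random

def rate_of_success(run_episode, n_episodes=2000, seed=0):
    """Proportion of episodes in which run_episode reports success
    (no deadlock and no collision for the ego vehicle)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    successes = sum(1 for _ in range(n_episodes) if run_episode(rng))
    return successes / n_episodes

# Stand-in episode: success with probability 0.9.
estimate = rate_of_success(lambda rng: rng.random() < 0.9)
```

With 2000 episodes the standard error of such an estimate is below about 0.011 for any success probability, which is adequate for the comparisons reported in Fig. 9.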
The following can be observed: 1) As the number of interacting vehicles increases, the rate of success decreases for all the cases. This is reasonable since a larger number of interacting vehicles represents a more complex traffic scenario. 2) The rates of success of a level-2 ego vehicle when interacting with other vehicles that are also of level-2 are the lowest among the results of all combinations of level-k policies. This is also reasonable since when all the vehicles are aggressive and assume the others yield, traffic accidents are more likely to occur. 3) Among the results of the three intersection types, the rates of success for the roundabout intersection are the highest. This illustrates the effective functionality of roundabouts in reducing traffic conflicts.
We further remark that although the high rates of failure of "level-2 versus level-2" are not desired in real-world traffic, it is important for a simulation environment for autonomous vehicle control testing to include such cases that represent rational interactions between aggressive vehicles. Note that
Fig. 8. Interactions of level-k vehicles at the roundabout intersection. (a-1)-(a-3) show three subsequent steps in a simulation where three level-1 vehicles interact with each other; (b-1)-(b-3) show steps of three level-2 vehicles interacting with each other; (c-1)-(c-3) show steps of a level-2 vehicle (blue) interacting with two level-1 vehicles (yellow and red); v1, v2, and v3 are the speeds of, respectively, the blue, yellow, and red vehicles.
a level-2 vehicle is a rational decision maker that behaves aggressively, which is fundamentally different from a vehicle model that acts aggressively but in an irrational way, e.g., taking actions randomly. The cases of level-2 vehicle interactions provide challenging test scenarios for an autonomous vehicle control system, which can be more realistic than those provided by some worst-case (i.e., not necessarily rational) models [40].
B. Evaluation and calibration of autonomous vehicle control
approaches
We test the two autonomous vehicle control approaches
described in Section V using our traffic model.
For the first approach of adaptive control based on level-k models, we use the same sampling interval $\Delta t$, action set $U$, reward function including the weight vector $w$, planning horizon $N$, and discount factor $\lambda$ as those used for the level-k vehicle models. In the level estimation algorithm (15)-(17), we consider the model set $\mathcal{K} = \{1, 2\}$ and the update step size $\beta = 0.6$.
When training the explicit approximation $\hat{\pi}^a$ to the policy $\pi^a$ that is algorithmically determined by (14), we use the same neural network architecture shown in Fig. 4. The accuracy of the obtained $\hat{\pi}^a$ in terms of matching $\pi^a$ is 98.8% on the training dataset and 98.6% on a test dataset of 30% additional data points that are not used for training.
Firstly, we simulate similar scenarios as those shown in
Figs. 6-8, but let the autonomous ego vehicle (blue) be
Fig. 9. The rates of success of level-k policies. (a-1)-(a-3) show the rates of success of a level-1 ego vehicle operating in various traffic environments (varying in the numbers and policies of interacting vehicles) at the four-way, T-shape, and roundabout intersections; (b-1)-(b-3) show those of a level-2 ego vehicle; the bars in dark color represent the rates of success.
controlled by the adaptive control approach instead of level-k policies. Figs. 10-12 show snapshots of the simulations. It can be observed that the autonomous ego vehicle can resolve the conflicts with the other two vehicles and safely drive through the intersections although the other two vehicles are controlled by varying policies. The bottom panels show the level estimation histories of the simulations. It can be observed that the autonomous ego vehicle can resolve the conflicts because it successfully identifies the level-k models of the other two vehicles. Recall that vehicle $j$ is identified as level-1 (level-2) when $P(k^j = 2) < 0.5$ ($P(k^j = 2) \geq 0.5$).
The success of the adaptive control approach in situations where level-k control policies with fixed k fail suggests the significance, for autonomous vehicle control, of intention recognition and action prediction for the other vehicles. Note that these two steps are achieved in our adaptive control approach through the level estimates and the level-k models of the other vehicles.
We then statistically evaluate and compare the two autonomous vehicle control approaches. For the second approach of rule-based control, we consider an acceleration set $\mathcal{A} = \{-5, -2.5, 0, 2.5\}$ [m/s²] and an initial design of the threshold value $R_c = 14$ [m].
To cover a rich set of scenarios, we construct a larger traffic
scene shown in Fig. 13, which models the road system of
Fig. 10. Interactions of the autonomous ego vehicle (blue) controlled by the adaptive control approach with level-k vehicles at the four-way intersection. (a-1)-(a-3) show three subsequent steps in a simulation where the autonomous ego vehicle interacts with two level-1 vehicles, and (a-4) shows the time histories of the two vehicles' level estimates, where $P(2) = P(k = 2)$ denotes the ego vehicle's belief in the level-2 model; (b-1)-(b-4) show those of the autonomous ego vehicle interacting with two level-2 vehicles; (c-1)-(c-4) show those of the autonomous ego vehicle interacting with a level-1 vehicle (red) and a level-2 vehicle (yellow); v1, v2, and v3 are the speeds of, respectively, the blue, yellow, and red vehicles.
an urban area in Los Angeles and consists of one four-way intersection, one roundabout, and two T-shape intersections. We let an autonomous ego vehicle controlled by the adaptive control approach or the rule-based control approach drive through this traffic scene. Apart from the autonomous ego vehicle, we also put multiple other vehicles controlled by level-k policies in the scene and let them drive through the scene repeatedly. Their initial positions, lanes entering the scene, and sequences of target lanes to traverse the scene are all randomly chosen.
We evaluate the two control approaches based on two
statistical metrics: the rate of collision (CR) and the rate
of deadlock (DR). The rate of collision is defined as the
proportion of 2000 simulation episodes where the autonomous
ego vehicle collides with another vehicle or with the road
boundaries. The rate of deadlock is defined as the proportion
of 2000 simulation episodes where no collision occurs to
the autonomous ego vehicle but it fails to drive through the
scene in 300[s] of simulation time. We consider three traffic
models: 1) all of the other vehicles are level-1, called a “level-
Fig. 11. Interactions of the autonomous ego vehicle (blue) controlled by the adaptive control approach with level-k vehicles at the T-shape intersection. (a-1)-(a-3) show three subsequent steps in a simulation where the autonomous ego vehicle interacts with two level-1 vehicles, and (a-4) shows the time histories of the two vehicles' level estimates, where $P(2) = P(k = 2)$ denotes the ego vehicle's belief in the level-2 model; (b-1)-(b-4) show those of the autonomous ego vehicle interacting with two level-2 vehicles; (c-1)-(c-4) show those of the autonomous ego vehicle interacting with a level-1 vehicle (red) and a level-2 vehicle (yellow); v1, v2, and v3 are the speeds of, respectively, the blue, yellow, and red vehicles.
1 environment," 2) all of the other vehicles are level-2, called a "level-2 environment," and 3) the control policy of each of the other vehicles is randomly chosen between level-1 and level-2 with equal probability, called a "mixed environment."
The CR and DR results of the adaptive control approach and the rule-based control approach for different numbers of other vehicles in the scene are shown in Figs. 14 and 15. The number of other vehicles, $n_v$, represents the traffic density, roughly $2.87\, n_v$ [vehicles/mile] (the total length of the roads is about 560 [m]).
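The stated conversion from vehicle count to density follows directly from the road length; a worked check (using 1 mile = 1609.344 m):

```python
METERS_PER_MILE = 1609.344

def density_per_mile(n_v, road_length_m=560.0):
    """Vehicles per mile for n_v vehicles on road_length_m meters of road."""
    return n_v / (road_length_m / METERS_PER_MILE)
```

For example, 560 m is about 0.348 mi, so each vehicle contributes roughly 2.87 vehicles/mile to the density.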
From Fig. 14 it can be observed that, for the adaptive control approach, the CR and DR increase as the traffic density increases, which is reasonable. In particular, the increase in CR slows down as the number of other vehicles goes beyond 20. Among the results for different traffic models, the CR and DR for the level-1 environment are the lowest and those for the level-2 environment are the highest. This is also reasonable since the level-1 environment, composed of level-1 vehicles, represents a cautious/conservative traffic model, the level-2 environment represents an aggressive traffic model and is thus most challenging for the autonomous ego vehicle, while the mixed environment lies in between. Furthermore, the results for the adaptive control approach are less sensitive to changes in traffic models than those for level-k policies with fixed
Fig. 12. Interactions of the autonomous ego vehicle (blue) controlled by the adaptive control approach with level-k vehicles at the roundabout intersection. (a-1)-(a-3) show three subsequent steps in a simulation where the autonomous ego vehicle interacts with two level-1 vehicles, and (a-4) shows the time histories of the two vehicles' level estimates, where $P(2) = P(k = 2)$ denotes the ego vehicle's belief in the level-2 model; (b-1)-(b-4) show those of the autonomous ego vehicle interacting with two level-2 vehicles; (c-1)-(c-4) show those of the autonomous ego vehicle interacting with a level-1 vehicle (red) and a level-2 vehicle (yellow); v1, v2, and v3 are the speeds of, respectively, the blue, yellow, and red vehicles.
$k$ shown in Fig. 9. This again shows the significance of adapting the autonomous vehicle control strategy to other vehicles' intentions and actions. Note that the rate of success for a single intersection of the adaptive control approach, if computed as $1 - \frac{\mathrm{CR} + \mathrm{DR}}{4}$, is close to that of "L-1 car in L-2 Env." and that of "L-2 car in L-1 Env.," which represent the best performance of level-k policies.
For the rule-based control approach, it can be observed from Fig. 15 that as the traffic density increases, the CR first increases and then decreases, while the DR keeps increasing. The decrease in CR when the traffic becomes very dense is due to the autonomous ego vehicle constantly yielding to other vehicles, which also causes the dramatic increase in DR.
Comparing the results of the two approaches, the adaptive control approach performs better than the rule-based control approach in the above experiments. This is attributed to the more sophisticated algorithm behind the adaptive control approach. However, the rule-based control is more interpretable (e.g., the reason for the decrease in CR is easily understood) and is easier to calibrate.
We show in Fig. 16 two informative cases observed in our
Fig. 13. Traffic scene for evaluating autonomous vehicle control approaches.
(a) shows an urban area in Los Angeles (provided by Google Maps) and (b)
shows the model of the road system in (a).
Fig. 14. Evaluation results of the adaptive control approach: (a) the rate of collision (CR) and (b) the rate of deadlock (DR) versus the number of environmental vehicles, for different traffic models.
simulations. In the first case, in Fig. 16(a), the autonomous ego vehicle (blue) controlled by the adaptive control approach and the level-1 vehicle (yellow) on its left both yield to the other and cause a deadlock. Note that a level-1 vehicle represents a vehicle with a cautious/conservative driver and, accordingly, yields to the autonomous ego vehicle. Although the autonomous ego vehicle eventually decides to proceed ahead and successfully drives through the roundabout, it takes too long for this conflict to be resolved, and thus the scenario falls into our DR category. To avoid such deadlock scenarios, the autonomous ego vehicle may need to identify the driving style of the opponent vehicle faster, which may be achieved through a larger update step size $\beta$. In the second case, in Fig. 16(b), the autonomous ego vehicle controlled by the rule-based control approach stops in the roundabout to yield to the yellow vehicle on its right, which is within the critical distance $R_c$ (marked by the red dashed circle). However, because the gap between the autonomous ego vehicle and the yellow vehicle is still quite large, the red vehicle on the left of the autonomous ego vehicle expects it to proceed and thus does not slow down, which causes a collision. This scenario shows that a larger critical distance $R_c$ may not always correspond to safer driving behavior. Such corner cases identified by our simulations can inform test trajectory design for autonomous vehicles.
We now optimize the threshold value $R_c$ in the rule-based control approach to achieve better performance defined by a
Fig. 15. Evaluation results of the rule-based control approach with $R_c = 14$ [m]: (a) the rate of collision (CR) and (b) the rate of deadlock (DR) versus the number of environmental vehicles, for different traffic models.
Fig. 16. Failure cases. (a) shows a scenario where the autonomous ego vehicle (blue) controlled by the adaptive control approach gets stuck at the entrance of the roundabout due to the level-1 vehicle (yellow) on its left. (b) shows a scenario where the autonomous ego vehicle (blue) controlled by the rule-based control approach gets hit by the level-2 vehicle (red) on its left.
performance index as follows:
$$J = \frac{1}{n_{\max}} \sum_{n=1}^{n_{\max}} \left( w_c\, \phi_c(S_n) + w_d\, \phi_d(S_n) + \frac{w_v\, \phi_s(S_n)}{\bar{v}(S_n) + \epsilon} \right), \qquad (20)$$
where $S_n$ denotes the $n$th simulation episode; $\phi_c(S_n)$, $\phi_d(S_n)$, and $\phi_s(S_n)$ are indicator functions, taking the value 1 if, respectively, a collision occurs to the autonomous ego vehicle, no collision but a deadlock occurs to the autonomous ego vehicle, or neither a collision nor a deadlock occurs and the autonomous ego vehicle successfully drives through the scene in 300 [s] of simulation time in the $n$th simulation episode, and taking 0 otherwise; $\bar{v}(S_n)$ is the average speed of the autonomous ego vehicle in the $n$th simulation episode; $w_c, w_d, w_v \geq 0$ are weighting factors, and $\epsilon > 0$ is a constant to adjust the shape of the function with respect to the average speed $\bar{v}(S_n)$ and to avoid the denominator being 0.
The performance index function (20) imposes penalties for collisions and deadlocks through the first two terms, and rewards higher average speeds through the last term. Note that the last term is designed in such a way that the penalty increases fast for decreases in speed values that are already very low, and decreases slowly for increases in speed values that are already very high. In obtaining the following results, we run simulations in the same scene shown in Fig. 13 with 15 other vehicles, and we use $w_c = 10$, $w_d = 5$, $w_v = 1$, and $\epsilon = 0.1$.
We plot the values of (20) for different values of $R_c$ in Fig. 17. Specifically, for each value of $R_c$, we run $n_{\max} = 2000$ simulation episodes and calculate the value of (20) based on the simulation results. Lower values of (20) represent better performance in terms of fewer collisions, fewer deadlocks, and higher average travel speeds.
Fig. 17. Performance index $J$ as a function of $R_c$ for the rule-based control approach with different traffic models.
In Fig. 17, the blue curve represents the result when the autonomous ego vehicle operates in the level-1 environment. It can be observed that the performance is good when $R_c$ takes very small values, i.e., in the range of [6, 7.5] [m]. This is because small $R_c$ corresponds to aggressive behavior while the level-1 environment represents a conservative traffic model; thus, the other vehicles almost always yield to the autonomous ego vehicle when there is a conflict. Since the autonomous ego vehicle proceeds ahead while the other vehicles yield, collisions and deadlocks are avoided. However, when operating in the level-2 or mixed environment, small $R_c$ leads to poor performance. This is because both the autonomous ego vehicle and the other vehicles behave aggressively and cause many collisions. When $R_c$ takes values in the range of [7.5, 11] [m], the performance is the worst for all three traffic models. This is because such $R_c$ values correspond to behaviors in between aggressive and conservative, which cause collisions with both aggressive and conservative interacting vehicles. The range [11.5, 13] [m] is suitable for choosing the value of $R_c$: there the performance is good and insensitive to changes in the traffic models. For larger $R_c$ values, the autonomous ego vehicle becomes overly conservative and almost always yields to the other vehicles, which makes it difficult for it to enter the intersections and leads to many deadlocks.
VII. CONCLUSION
In this paper, we described a framework based on level-k game theory for modeling traffic consisting of heterogeneous (in terms of their driving styles) and interactive vehicles in urban environments with unsignalized intersections. An algorithm integrating the level-k decision-making formalism, receding-horizon optimization, and imitation learning was proposed and used to solve for level-k control policies.
The developed traffic models are useful as simulation environments for verification and validation of autonomous vehicle control systems. In particular, we considered two autonomous vehicle control approaches as case studies: an adaptive control approach based on level-k vehicle models and a rule-based control approach. We analyzed their characteristics and evaluated their performance based on their testing results with our traffic models, and then optimized the parameters of the rule-based approach based on a performance index.
We envision that traffic models developed using the frame-
work proposed in this paper can also be integrated with urban
traffic/driving simulators with higher-fidelity car dynamics and
environmental representations, such as CARLA [41], using an
approach similar to that of [42], to create more realistic urban
traffic simulations and support autonomous driving system
development.
REFERENCES
[1] D. J. Fagnant and K. Kockelman, “Preparing a nation
for autonomous vehicles: opportunities, barriers and policy
recommendations,” Transportation Research Part A: Policy and
Practice, vol. 77, pp. 167 – 181, 2015. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0965856415000804
[2] U.S. Department of Transportation's National Highway Traffic Safety Administration, “Automated Vehicles for Safety,” Tech. Rep., Available: https://www.nhtsa.gov/technology-innovation/automated-vehicles-safety [June 18, 2019].
[3] J. Meyer, H. Becker, P. M. Bösch, and K. W. Axhausen, “Autonomous
vehicles: The next jump in accessibilities?” Research in Transportation
Economics, vol. 62, pp. 80 – 91, 2017. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0739885917300021
[4] N. Kalra and S. M. Paddock, “Driving to safety: How many miles of
driving would it take to demonstrate autonomous vehicle reliability?”
Transportation Research Part A: Policy and Practice, vol. 94, pp. 182
– 193, 2016.
[5] J. Zhou and L. del Re, “Reduced complexity safety testing for ADAS & ADF,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 5985–5990, 2017.
[6] H. Waschl, I. Kolmanovsky, and F. Willems, Control Strategies for Ad-
vanced Driver Assistance Systems and Autonomous Driving Functions.
Springer, 2019.
[7] C. Hubmann, J. Schulz, M. Becker, D. Althoff, and C. Stiller, “Au-
tomated driving in uncertain environments: Planning with interaction
and uncertain maneuver prediction,” IEEE Transactions on Intelligent
Vehicles, vol. 3, no. 1, pp. 5–17, March 2018.
[8] T. Bandyopadhyay, K. S. Won, E. Frazzoli, D. Hsu, W. S. Lee, and
D. Rus, “Intention-aware motion planning,” in Algorithmic Foundations
of Robotics X, E. Frazzoli, T. Lozano-Perez, N. Roy, and D. Rus, Eds.
Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 475–491.
[9] C. Hubmann, J. Schulz, G. Xu, D. Althoff, and C. Stiller, “A belief state
planner for interactive merge maneuvers in congested traffic,” in 2018
21st International Conference on Intelligent Transportation Systems
(ITSC), Nov 2018, pp. 1617–1624.
[10] N. Li, A. Girard, and I. Kolmanovsky, “Stochastic predictive control for
partially observable Markov decision processes with time-joint chance
constraints and application to autonomous vehicle control,” Journal of
Dynamic Systems, Measurement, and Control, vol. 141, no. 7, p. 071007,
2019.
[11] W. Schwarting, J. Alonso-Mora, L. Paull, S. Karaman, and D. Rus, “Safe
nonlinear trajectory generation for parallel autonomy with a dynamic ve-
hicle model,” IEEE Transactions on Intelligent Transportation Systems,
vol. 19, no. 9, pp. 2994–3008, Sep. 2018.
[12] G. Cesari, G. Schildbach, A. Carvalho, and F. Borrelli, “Scenario model
predictive control for lane change assistance and autonomous driving on
highways,” IEEE Intelligent Transportation Systems Magazine, vol. 9,
no. 3, pp. 23–35, Fall 2017.
[13] M. Bahram, A. Lawitzky, J. Friedrichs, M. Aeberhard, and D. Woll-
herr, “A game-theoretic approach to replanning-aware interactive scene
prediction and planning,” IEEE Transactions on Vehicular Technology,
vol. 65, no. 6, pp. 3981–3992, June 2016.
[14] D. Sadigh, S. Sastry, S. A. Seshia, and A. D. Dragan, “Planning for
autonomous cars that leverage effects on human actions.” in Robotics:
Science and Systems, vol. 2. Ann Arbor, MI, USA, 2016.
[15] J. F. Fisac, E. Bronstein, E. Stefansson, D. Sadigh, S. S. Sastry, and
A. D. Dragan, “Hierarchical game-theoretic planning for autonomous
vehicles,” in 2019 International Conference on Robotics and Automation
(ICRA). IEEE, 2019, pp. 9590–9596.
[16] H. Yu, H. E. Tseng, and R. Langari, “A human-like game theory-based
controller for automatic lane changing,” Transportation Research Part
C: Emerging Technologies, vol. 88, pp. 140–158, 2018.
[17] A. Dreves and M. Gerdts, “A generalized Nash equilibrium approach
for optimal control problems of autonomous cars,” Optimal Control
Applications and Methods, vol. 39, no. 1, pp. 326–342, 2018. [Online].
Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/oca.2348
[18] C. Vallon, Z. Ercan, A. Carvalho, and F. Borrelli, “A machine learning
approach for personalized autonomous lane change initiation and con-
trol,” in 2017 IEEE Intelligent Vehicles Symposium (IV), June 2017, pp.
1590–1595.
[19] H. Xu, Y. Gao, F. Yu, and T. Darrell, “End-to-end learning of driving
models from large-scale video datasets,” in Proceedings of the IEEE
conference on computer vision and pattern recognition, 2017, pp. 2174–
2182.
[20] N. Li, D. W. Oyler, M. Zhang, Y. Yildiz, I. Kolmanovsky, and A. R. Girard, “Game theoretic modeling of driver and vehicle interactions for verification and validation of autonomous vehicle control systems,” IEEE Transactions on Control Systems Technology, vol. 26, no. 5, pp. 1782–1797, Sep. 2018.
[21] Federal Highway Administration, “Intersection safety needs identification report,” Tech. Rep., Available: https://safety.fhwa.dot.gov/intersection/other_topics/needsidrpt/needsidrpt.pdf [Jun. 20, 2019].
[22] ——, “Unsignalized intersections,” Tech. Rep., Available: https://safety.fhwa.dot.gov/intersection/conventional/unsignalized/ [Jun. 20, 2019].
[23] M. Bouton, A. Cosgun, and M. J. Kochenderfer, “Belief state planning
for autonomously navigating urban intersections,” in 2017 IEEE Intelli-
gent Vehicles Symposium (IV), June 2017, pp. 825–830.
[24] D. Isele, R. Rahimi, A. Cosgun, K. Subramanian, and K. Fujimura,
“Navigating occluded intersections with autonomous vehicles using deep
reinforcement learning,” in 2018 IEEE International Conference on
Robotics and Automation (ICRA), May 2018, pp. 2034–2039.
[25] S. Pruekprasert, J. Dubut, X. Zhang, C. Huang, and M. Kishida, “A
game theoretic approach to decision making for multiple vehicles at
roundabout,” arXiv preprint arXiv:1904.06224, 2019.
[26] S. Pruekprasert, X. Zhang, J. Dubut, C. Huang, and M. Kishida,
“Decision making for autonomous vehicles at unsignalized intersection
in presence of malicious vehicles,” arXiv preprint arXiv:1904.10158,
2019.
[27] N. Li, Y. Yao, I. V. Kolmanovsky, E. M. Atkins, and A. Girard,
“Game-theoretic modeling of multi-vehicle interactions at uncontrolled
intersections,” CoRR, vol. abs/1904.05423, 2019. [Online]. Available:
http://arxiv.org/abs/1904.05423
[28] D. O. Stahl and P. W. Wilson, “On players’ models of other players: Theory and experimental evidence,” Games and Economic Behavior, vol. 10, no. 1, pp. 218–254, 1995.
[29] M. A. Costa-Gomes and V. P. Crawford, “Cognition and behavior in two-person guessing games: An experimental study,” American Economic Review, vol. 96, no. 5, pp. 1737–1768, Dec. 2006.
[30] N. Li, I. Kolmanovsky, A. Girard, and Y. Yildiz, “Game theoretic
modeling of vehicle interactions at unsignalized intersections and appli-
cation to autonomous vehicle control,” in 2018 Annual American Control
Conference (ACC), June 2018, pp. 3215–3220.
[31] R. Tian, S. Li, N. Li, I. Kolmanovsky, A. Girard, and Y. Yildiz, “Adap-
tive game-theoretic decision making for autonomous vehicle control
at roundabouts,” in 2018 IEEE Conference on Decision and Control
(CDC), Dec 2018, pp. 321–326.
[32] A. Y. Ng, S. J. Russell, et al., “Algorithms for inverse reinforcement
learning,” in International Conference on Machine Learning, 2000.
[33] B. D. Ziebart, A. Maas, J. A. Bagnell, and A. K. Dey, “Maximum en-
tropy inverse reinforcement learning,” in AAAI Conference on Artificial
Intelligence, 2008.
[34] L. Claussmann, A. Carvalho, and G. Schildbach, “A path planner for
autonomous driving on highways using a human mimicry approach with
binary decision diagrams,” in 2015 European Control Conference (ECC).
IEEE, 2015, pp. 2976–2982.
[35] F. Codevilla, M. Müller, A. López, V. Koltun, and A. Dosovitskiy,
“End-to-end driving via conditional imitation learning,” in 2018 IEEE
International Conference on Robotics and Automation (ICRA), May
2018, pp. 1–9.
[36] L. Sun, C. Peng, W. Zhan, and M. Tomizuka, “A fast integrated
planning and control framework for autonomous driving via imitation
learning,” in ASME 2018 Dynamic Systems and Control Conference.
American Society of Mechanical Engineers, 2018, pp. V003T37A012–
V003T37A012.
[37] S. Ross, G. Gordon, and D. Bagnell, “A reduction of imitation learning
and structured prediction to no-regret online learning,” in Proceedings
of the fourteenth international conference on artificial intelligence and
statistics, 2011, pp. 627–635.
[38] K. Fitzpatrick, M. D. Wooldridge, and J. D. Blaschke, Urban Intersection Design Guide: Volume 1 - Guidelines, Tech. Report, Feb. 2005.
[39] M. A. Costa-Gomes, N. Iriberri, and V. P. Crawford, “Comparing models
of strategic thinking in Van Huyck, Battalio, and Beil’s coordination
games,” Journal of the European Economic Association, vol. 7, no. 2/3,
pp. 365–376, 2009.
[40] G. Chou, Y. E. Sahin, L. Yang, K. J. Rutledge, P. Nilsson, and N. Ozay, “Using control synthesis to generate corner cases: A case study on autonomous driving,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 11, pp. 2906–2917, 2018.
[41] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun,
“CARLA: An open urban driving simulator,” in Proceedings of the 1st
Annual Conference on Robot Learning, 2017, pp. 1–16.
[42] G. Su, N. Li, Y. Yildiz, A. Girard, and I. Kolmanovsky, “A traffic simulation model with interactive drivers and high-fidelity car dynamics,” IFAC-PapersOnLine, vol. 51, no. 34, pp. 384–389, 2019.
Ran Tian received his B.S. degree in aerospace
engineering from the University of Michigan, Ann
Arbor, MI, USA, in 2016, and his B.S. degree in
mechanical engineering from the Shanghai Jiao Tong
University, Shanghai, China, in 2017. He received
his M.S. degree in Robotics from the University
of Michigan, Ann Arbor, MI, USA, in 2019. His
current research interests include decision-making
under uncertainty and human-robot interaction.
Nan Li received the B.S. degree in automotive engi-
neering from Tongji University, Shanghai, China, in
2014, and the M.S. degree in mechanical engineering
from the University of Michigan, Ann Arbor, MI,
USA, in 2016, where he is pursuing the Ph.D. degree
in aerospace engineering. His current research inter-
ests are stochastic control and multi-agent systems.
Ilya Kolmanovsky is a professor in the depart-
ment of aerospace engineering at the University of
Michigan, with research interests in control theory
for systems with state and control constraints, and
in control applications to aerospace and automotive
systems. He received his Ph.D. degree from the
University of Michigan in 1995.
Yildiray Yildiz is an assistant professor at Bilkent
University, Ankara. He received his B.S. degree
(valedictorian) in mechanical engineering from Mid-
dle East Technical University, Ankara in 2002; M.S.
degree in mechatronics from Sabanci University,
Istanbul, in 2004; and Ph.D. degree in mechanical
engineering with a mathematics minor from MIT
in 2009. He held postdoctoral associate and asso-
ciate scientist positions with NASA Ames Research
Center, California, employed by the University of
California, Santa Cruz, through its University Affil-
iated Research Center, from 2009 to 2010 and 2010 to 2014, respectively.
He is a recipient of the NASA Honor Award, Young Scientist Awards from the Science Academy of Turkey and the Turkish Academy of Sciences, the Research Incentive Award from the Prof. Mustafa Parlar Education and Research Foundation, and a best student conference paper award from ASME. He is an IEEE Senior Member and currently serves as an associate
editor for IEEE Control Systems Magazine and European Journal of Control.
His research interests include control, machine learning, game theory, and
applications of these fields for modeling and control of automotive and
aerospace systems.
Anouck R. Girard received the Ph.D. degree in
ocean engineering from the University of California,
Berkeley, CA, USA, in 2002. She has been with the
University of Michigan, Ann Arbor, MI, USA, since
2006, where she is currently an Associate Professor
of Aerospace Engineering. She has co-authored the
book Fundamentals of Aerospace Navigation and
Guidance (Cambridge University Press, 2014). Her
current research interests include flight dynamics
and control systems. Dr. Girard was a recipient of
the Silver Shaft Teaching Award from the University
of Michigan and a Best Student Paper Award from the American Society of
Mechanical Engineers.