Game-theoretic Modeling of Traffic in Unsignalized
Intersection Network for Autonomous Vehicle
Control Verification and Validation
Ran Tian, Nan Li, Ilya Kolmanovsky, Yildiray Yildiz, and Anouck Girard
Abstract—For the foreseeable future, autonomous vehicles (AVs)
will operate in traffic together with human-driven vehicles. The
AV planning and control systems need extensive testing, including
early-stage testing in simulations where the interactions among
autonomous/human-driven vehicles are represented. Motivated
by the need for such simulation tools, we propose a game-
theoretic approach to modeling vehicle interactions, in particular,
for urban traffic environments with unsignalized intersections.
We develop traffic models with heterogeneous (in terms of their
driving styles) and interactive vehicles based on our proposed
approach, and use them for virtual testing, evaluation, and
calibration of AV control systems. For illustration, we consider
two AV control approaches, analyze their characteristics and
performance based on the simulation results with our developed
traffic models, and optimize the parameters of one of them.
I. INTRODUCTION
Autonomous driving technologies have greatly advanced
in recent years with the promise of providing safer, more
efficient, environment-friendly, and easily accessible trans-
portation [1]–[3]. Fulfilling such a commitment requires developing advanced planning and control algorithms to navigate
autonomous vehicles, as well as comprehensive testing pro-
cedures to verify their safety and performance characteristics
[4]–[6]. It is estimated, based on the collision fatality rate, that
to confidently verify an autonomous vehicle control system,
hundreds of millions of miles need to be driven [4], which
can be highly time and resource consuming if these driving
tests are all conducted in the physical world. Therefore, an
alternative solution is to use simulation tools to conduct early-
stage testing and evaluation in a virtual world. The work
of this paper is motivated by the need for virtual testing of
autonomous vehicle control systems.
In the near to medium term, autonomous vehicles are
expected to operate in traffic together with human-driven
vehicles. Therefore, accounting for the interactions among
autonomous/human-driven vehicles is important to achieve
safe and efficient driving behavior of an autonomous vehicle.
Control strategies for autonomous vehicles that account for
vehicle interactions include the ones based on Markov decision
This research has been supported by the National Science Foundation award
CNS 1544844.
Ran Tian, Nan Li, Ilya Kolmanovsky, and Anouck Girard are with the Department of Aerospace Engineering, University of Michigan, Ann Arbor, MI 48109, USA {tianran, nanli, ilya, anouck}@umich.edu. Yildiray Yildiz is with the Department
of Mechanical Engineering, Bilkent University, Ankara 06800, Turkey
{yyildiz}@bilkent.edu.tr.
processes [7]–[10], model predictive control [11], [12], game-theoretic models [13]–[17], as well as data-driven
approaches [18], [19]. Evaluating the effectiveness of these algorithms requires simulation environments that can represent
the interactions among autonomous/human-driven vehicles.
In our previous work [20], we exploited a game-theoretic
approach to modeling vehicle interactions in highway traffic.
Compared to highway traffic, urban traffic environments with
intersections are considered to be more challenging for both
human drivers and autonomous vehicles, as they involve
more extensive and complex interactions among vehicles. For
instance, almost 40% of traffic accidents in the U.S. are
intersection-related [21].
In this paper, we extend the game-theoretic approach of [20]
to modeling vehicle interactions in urban traffic. In particular,
we consider urban traffic environments with unsignalized inter-
sections. Firstly, unsignalized intersections may be even more
challenging than signalized intersections because, due to the
lack of guidance from traffic signals, a driver/automation needs
to decide on its own whether, when, and how to enter and
drive through the intersection. According to the U.S. Federal
Highway Administration’s report, almost 70% of fatalities due
to intersection-related traffic accidents happened at unsignal-
ized intersections [22]. Thus, well-verified autonomous driving
systems for unsignalized intersections may deliver significant
safety benefits. Indeed, many research works on autonomous
vehicle control for intersections in the literature, including
[17], [23]–[26], deal with unsignalized intersections, although
they do not always explicitly point this out.
Our approach formulates the decision-making processes of
drivers/vehicles as a dynamic game, where each vehicle inter-
acts with other vehicles by observing their states, predicting
their future actions, and then planning its own actions. In
addition to the difference in traffic scenarios being considered
(i.e., urban traffic in this paper versus highway traffic in [20]),
this paper contains the following methodological contribution
compared to [20]: Due to the much larger state space for
urban traffic environments with intersections compared to
that for highway traffic, the reinforcement learning approach
used in [20] to solve for control policies is computationally
prohibitive. Therefore, we develop in this paper an alternative
approach that uniquely integrates a game-theoretic formalism,
receding-horizon optimization, and an imitation learning algo-
rithm to obtain control policies. This new approach is shown
to be computationally effective for the large state space of
urban traffic.
arXiv:1910.07141v1 [cs.RO] 16 Oct 2019
In [27], we modeled the interactions among vehicles at
unsignalized intersections, but using a different game-theoretic
approach from the one used in this paper: In [27], we
model vehicle interactions based on a formulation of a leader-
follower game; while in this paper, we consider the application
of level-k game theory [28], [29]. The control strategies of all
interacting vehicles modeled using the framework of [27] are
homogeneous; while the control strategies of different vehicles
modeled using the scheme of this paper are heterogeneous,
differentiated by their level-$k$ control policies with different $k = 0, 1, 2, \dots$ This heterogeneity can be used to represent the
different driving styles among different drivers, e.g., aggressive
driving versus cautious/conservative driving. In addition, [27]
models a single intersection with up to 10 interacting vehicles;
while in this paper, thanks to the effective application of
the aforementioned solution approach integrating game theory,
receding-horizon optimization, and imitation learning to obtain
control policies, the scheme of this paper can be used to model
much larger road systems involving many intersections and
many vehicles with manageable online computational effort.
This enables the investigation of driving characteristics that
are exhibited when a vehicle drives through multiple road
segments, such as overall travel time, fuel consumption, etc.
A road system with 15 intersections and 30 vehicles is shown
as an example in Section IV. Furthermore, application of
the developed traffic models to verification and validation
of autonomous vehicle control systems is comprehensively
discussed in this paper, but not in [27].
Preliminary results of this paper have been reported in the
conference papers [30] and [31]. The results modeling the
interactions between two vehicles at a four-way intersection
are reported in [30] and those for two vehicles at a roundabout
intersection are in [31]. This paper generalizes the methodol-
ogy to modeling the interactions among multiple (more than
two) vehicles and to an additional intersection type – T-shape
intersection. Constructing larger road systems based on the
models of these three intersections is reported for the first time
in this paper. This paper also demonstrates how the developed
traffic models can be used for virtual testing, evaluation, and
calibration of autonomous vehicle control systems, which is
not provided in [30] and [31].
In summary, the contributions of this paper are: 1) We de-
scribe an approach based on level-k game theory to modeling
the interactions among vehicles in urban traffic environments
with unsignalized intersections. 2) We propose an algorithm
based on imitation learning to obtain level-$k$ control policies so
that our approach to modeling vehicle interactions is scalable
– able to model traffic scenes with many intersections and
many vehicles. 3) We demonstrate the use of the developed
traffic models for virtual testing, evaluation, and calibration of
autonomous vehicle control systems. For illustration purposes,
we consider two autonomous vehicle control approaches,
analyze their characteristics and performance based on the
simulation results with our traffic models, and optimize the
parameters of one of them.
This paper is organized as follows: The models representing
vehicle dynamics and driver decision-making processes are in-
troduced in Section II. The game-theoretic model representing
vehicle interactions and obtaining its explicit approximation
via imitation learning are discussed in Section III. The proce-
dure to construct traffic models of larger road systems based on
the models of three basic intersection scenarios is described in
Section IV. We then propose two autonomous vehicle control
approaches in Section V, used as case studies to illustrate
the application of our developed traffic models to autonomous
vehicle control verification and validation. Simulation results
are reported in Section VI, and finally, the paper is concluded
in Section VII.
II. TRAFFIC DYNAMICS AND DRIVER DECISION-MAKING MODELING
In this section, we describe our models to represent the traf-
fic dynamics and the decision-making processes of interacting
drivers.
A. Traffic dynamics
Firstly, we describe the evolution of a traffic scenario using
a discrete-time model as follows:
$s_{t+1} = F(s_t, u_t)$,  (1)
where $s = (s^1, s^2, \dots, s^m)$ denotes the traffic state, composed of the states $s^i$, $i \in \mathcal{M} = \{1, 2, \dots, m\}$, of all interacting vehicles in the scenario, $u = (u^1, u^2, \dots, u^m)$ denotes the collection of all vehicles' actions $u^i$, and the subscript $t$ represents the discrete-time instant. In particular, the state of a vehicle is composed of two parts, $s^i = (s^{i,1}, s^{i,2})$. The first part $s^{i,1} = (x^i, y^i, v^i, \theta^i)$ represents the state of the vehicle dynamics, modeled using the "unicycle" model as follows:
$\begin{bmatrix} x^i_{t+1} \\ y^i_{t+1} \\ v^i_{t+1} \\ \theta^i_{t+1} \end{bmatrix} = f(s^i_t, u^i_t) = \begin{bmatrix} x^i_t + v^i_t \cos(\theta^i_t)\,\Delta t \\ y^i_t + v^i_t \sin(\theta^i_t)\,\Delta t \\ v^i_t + a^i_t\,\Delta t \\ \theta^i_t + \omega^i_t\,\Delta t \end{bmatrix}$,  (2)
where $(x^i, y^i)$, $v^i$, and $\theta^i$ represent, respectively, the vehicle's position in the ground-fixed frame, its speed, and its heading angle; the inputs $a^i$ and $\omega^i$ represent, respectively, the vehicle's acceleration and heading angle rate; and $\Delta t$ is the sampling interval for decision-making. The second part $s^{i,2} = (r^i, \xi^i)$ contains additional information related to the vehicle's decision-making objective, including $r^i = (r^i_x, r^i_y)$, representing a target/reference position, and $\xi^i$, a feature vector containing key information about the road layout and geometry, such as the road width and the angle of intersection [27]. When vehicle $i$ is driving toward, in the middle of, or exiting a specific intersection, $s^{i,2}$ stays constant, with $r^i$ being a point located in the center of the vehicle's target lane; $s^{i,2}$ gets updated after the vehicle has returned to a straight road and is driving toward the next intersection.
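The update law (2) is straightforward to state in code. The following is a minimal sketch of the unicycle step, assuming scalar state components and a sampling interval dt; the function name and signature are illustrative, not from the paper.

```python
import math

def unicycle_step(x, y, v, theta, a, omega, dt):
    """One discrete-time update of the 'unicycle' vehicle model in Eq. (2):
    position advances along the current heading, while speed and heading
    are integrated from the acceleration a and heading-angle rate omega."""
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            v + a * dt,
            theta + omega * dt)
```

For example, a vehicle at the origin heading along the x-axis at 10 m/s with zero inputs and dt = 0.25 s advances 2.5 m along x.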
B. Driver decision-making
An action $u^i$ is a pair of values of the inputs $(a^i, \omega^i)$, i.e., $u^i = (a^i, \omega^i)$. We assume that the drivers of the vehicles make sequential decisions based on receding-horizon optimization as
follows: At each discrete-time instant $t$, the driver of vehicle $i$ solves for

$(\mathbf{u}^i_t)^* = \big((u^i_{0|t})^*, (u^i_{1|t})^*, \dots, (u^i_{N-1|t})^*\big) \in \arg\max_{\mathbf{u}^i_t \in U^N} \sum_{\tau=0}^{N-1} \lambda^\tau R\big(s^i_{\tau|t}, \mathbf{s}^{-i}_{\tau|t}, u^i_{\tau|t}, \mathbf{u}^{-i}_{\tau|t}\big)$,  (3)
where $\mathbf{u}^i_t = \big(u^i_{0|t}, u^i_{1|t}, \dots, u^i_{N-1|t}\big)$ represents a sequence of predicted actions of vehicle $i$, with $u^i_{\tau|t}$ denoting the predicted action for time step $t+\tau$ and taking values in a finite action set $U$; the notations $s^i_{\tau|t}$, $\mathbf{s}^{-i}_{\tau|t}$, and $\mathbf{u}^{-i}_{\tau|t}$ represent, respectively, the predicted state of vehicle $i$, and the collections of predicted states and actions of the other vehicles $j \in \mathcal{M}$, $j \neq i$, i.e., $\mathbf{s}^{-i}_{\tau|t} = (s^j_{\tau|t})_{j \in \mathcal{M}, j \neq i}$ and $\mathbf{u}^{-i}_{\tau|t} = (u^j_{\tau|t})_{j \in \mathcal{M}, j \neq i}$; $R$ is a reward function depending on the states and actions of all interacting vehicles, which will be introduced in detail in the following section; and $\lambda \in (0, 1]$ is a factor discounting future reward.
Once an optimal action sequence $(\mathbf{u}^i_t)^*$ is determined, vehicle $i$ applies the first element $(u^i_{0|t})^*$ for one time step, i.e., $u^i_t = (u^i_{0|t})^*$. After the states of all vehicles have been updated, vehicle $i$ repeats this procedure at $t+1$.
The fact that $R$ depends not only on the ego vehicle's state and action but also on those of the other vehicles determines the interactive nature of the drivers' decision-making processes in a multi-vehicle traffic scenario. Note that, due to the unknowns $\mathbf{u}^{-i}_{\tau|t}$ and $\mathbf{s}^{-i}_{\tau|t}$ for $\tau = 0, 1, \dots, N-1$, the problem (3) is not yet well-defined and cannot be solved. To be able to solve for $(\mathbf{u}^i_t)^*$, we will exploit a game-theoretic approach in Section III to predict the values of $\mathbf{u}^{-i}_{\tau|t}$ and $\mathbf{s}^{-i}_{\tau|t}$.
C. Reward function
We use the reward function $R$ in (3) to represent vehicles' decision-making objectives in traffic. In this paper, we consider $R$ defined as follows:
$R\big(s^i_{\tau|t}, \mathbf{s}^{-i}_{\tau|t}, u^i_{\tau|t}, \mathbf{u}^{-i}_{\tau|t}\big) = w^\top \Phi\big(s^i_{\tau+1|t}, (s^j_{\tau+1|t})_{j \in \mathcal{M}, j \neq i}\big)$,  (4)

where $\Phi = [\phi_1, \phi_2, \dots, \phi_6]^\top$ is the feature vector and $w \in \mathbb{R}^6_+$ is the weight vector. Note that $s^j_{\tau+1|t} = f(s^j_{\tau|t}, u^j_{\tau|t})$ for all $j \in \mathcal{M}$ based on the dynamic model (2).
The features φ1, φ2, . . . , φ6are designed to encode common
considerations in driving, such as safety, comfort, travel time,
etc. They are defined as follows.
The feature $\phi_1$ characterizes the collision status of the vehicle. In particular, we bound the geometric contour of each vehicle by a rectangle, referred to as the collision-zone (c-zone). Then, $\phi_1 = -1$ if vehicle $i$'s c-zone at the predicted state $s^i_{\tau+1|t}$ overlaps with any of the other vehicles' c-zones at their predicted states $s^j_{\tau+1|t}$, and $\phi_1 = 0$ otherwise.
The feature $\phi_2$ characterizes the on-road status of the vehicle, taking the value $-1$ if vehicle $i$'s c-zone crosses any of the road boundaries, and $0$ otherwise. Similarly, $\phi_3$ characterizes the in-lane status of the vehicle: if vehicle $i$'s c-zone crosses a lane marking that separates the traffic of opposite directions, or enters a lane different from its target lane when exiting an intersection, then $\phi_3 = -1$; $\phi_3 = 0$ otherwise.
To characterize the status of maintaining a safe and comfortable separation between vehicles, we further define a separation-zone (s-zone) for each vehicle, which over-bounds the vehicle's c-zone with a safety margin. The feature $\phi_4$ takes the value $-1$ if vehicle $i$'s s-zone overlaps with any of the other vehicles' s-zones at their predicted states, and takes $0$ otherwise.
The features $\phi_5$ and $\phi_6$ characterize the vehicle's behavior in approaching its target lane and are defined as follows,

$\phi_5 = -\,|r^i_x - x^i| - |r^i_y - y^i|$,  (5)

$\phi_6 = v^i$,  (6)

so that the vehicle is encouraged to reach the reference point $r^i$ in its target lane as quickly as it can.
The above reward function design represents common driving objectives in traffic. The weight vector $w$ can be tuned to achieve reasonable driving behavior, or can be calibrated using traffic data and approaches such as inverse reinforcement learning [32], [33].
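Under the linear form (4), evaluating $R$ amounts to building the feature vector and taking an inner product with $w$. A schematic sketch follows; the rectangle-overlap tests behind $\phi_1$–$\phi_4$ are abstracted into boolean flags computed elsewhere, and the weight values in the usage example are hypothetical:

```python
def reward(phi, w):
    """Linear reward R = w^T Phi as in Eq. (4)."""
    return sum(wi * pi for wi, pi in zip(w, phi))

def features(x, y, v, rx, ry, collision, off_road, out_of_lane, s_zone_overlap):
    """Feature vector Phi = [phi1..phi6]: phi1-phi4 are the binary
    collision / on-road / in-lane / separation penalties (here passed in
    as flags); phi5 penalizes the L1 distance to the reference point r^i
    (Eq. (5)); phi6 rewards speed (Eq. (6))."""
    phi5 = -(abs(rx - x) + abs(ry - y))
    phi6 = v
    return [-1.0 if collision else 0.0,
            -1.0 if off_road else 0.0,
            -1.0 if out_of_lane else 0.0,
            -1.0 if s_zone_overlap else 0.0,
            phi5,
            phi6]
```

For instance, a conflict-free vehicle at (0, 0) with speed 5 and reference point (3, 4) yields $\Phi = [0, 0, 0, 0, -7, 5]$.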
III. GAME-THEORETIC DECISION-MAKING AND EXPLICIT REALIZATION VIA IMITATION LEARNING
Game theory is a useful tool for modeling intelligent agents’
strategic interactions. In this paper, we exploit level-k game theory [28], [29] to model vehicles' interactive decision-
making.
A. Level-k reasoning and decision-making
In level-k game theory, it is assumed that players make decisions based on finite depths of reasoning, called "levels," and different players may have different reasoning levels. In particular, a level-0 player makes non-strategic decisions, i.e., decisions without regard to the other players' decisions. Then, a level-$k$, $k \geq 1$, player makes strategic decisions by assuming that all of the other players are level-$(k-1)$, predicting their decisions based on such an assumption, and optimally responding to their predicted decisions. Experimental results from cognitive science verify that such a level-k reasoning process can model human interactions with higher accuracy than traditional analytic methods in many cases [29].
To incorporate level-k reasoning in our decision-making model (3), we start with defining a level-0 decision rule. According to the non-strategic assumption about level-0 players, we let a level-0 decision of a vehicle $i$, $i \in \mathcal{M}$, depend only on the traffic state $s_t$, including its own state $s^i_t$ and the other vehicles' states $\mathbf{s}^{-i}_t$, but not on the other vehicles' actions $\mathbf{u}^{-i}_t$. In this paper, a level-0 decision, $(\mathbf{u}^i_t)^0 = \big((u^i_{0|t})^0, (u^i_{1|t})^0, \dots, (u^i_{N-1|t})^0\big)$, is a sequence of predicted actions that maximizes the cumulative reward in (3) while treating all of the other vehicles as stationary obstacles over the planning horizon, i.e., $v^j_{\tau|t} = 0$, $\omega^j_{\tau|t} = 0$ for all $j \neq i$, $\tau = 0, 1, \dots, N$. This way, a level-0 vehicle represents an aggressive vehicle which assumes that all of the other vehicles will yield the right of way to it.
On the basis of the formulated level-0 decision rule, the level-k decisions of the vehicles are obtained based on

$(\mathbf{u}^i_t)^k = \big((u^i_{0|t})^k, (u^i_{1|t})^k, \dots, (u^i_{N-1|t})^k\big) \in \arg\max_{\mathbf{u}^i_t \in U^N} \sum_{\tau=0}^{N-1} \lambda^\tau R\big(s^i_{\tau|t}, \mathbf{s}^{-i}_{\tau|t}, u^i_{\tau|t}, (\mathbf{u}^{-i}_{\tau|t})^{k-1}\big)$,  (7)

for every $i \in \mathcal{M}$, and for every $k = 1, 2, \dots, k_{\max}$, through sequential, iterated computations, where $(\mathbf{u}^{-i}_{\tau|t})^{k-1}$ denotes the level-$(k-1)$ decisions of the other vehicles $j \neq i$, which have been determined either in the previous iteration or based on the level-0 decision rule (for $k = 1$), and $k_{\max}$ is the highest reasoning level for computation.
Given a finite action set $U$, the problem (7) for every $i \in \mathcal{M}$ and $k = 1, 2, \dots, k_{\max}$ can be solved with exhaustive search, e.g., based on a tree structure [34].
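To make the sequential computation in (7) concrete, the sketch below iterates best responses on a deliberately simplified one-dimensional toy (scalar positions, caller-supplied step and reward functions): level-0 treats the other vehicles as stationary, and level-k best-responds to the others' level-$(k-1)$ plans. It illustrates only the recursion structure, not the paper's full implementation.

```python
import itertools

def level_k_actions(states, k, U, N, lam, step, reward):
    """Each vehicle's level-k action sequence via the iterated
    best-response of Eq. (7)."""
    m = len(states)
    if k == 0:
        # level-0: the other vehicles are frozen at their current states
        others = {i: [[states[j]] * (N + 1) for j in range(m) if j != i]
                  for i in range(m)}
    else:
        # roll the others forward under their level-(k-1) plans
        prev = level_k_actions(states, k - 1, U, N, lam, step, reward)
        others = {}
        for i in range(m):
            trajs = []
            for j in range(m):
                if j == i:
                    continue
                s, traj = states[j], [states[j]]
                for u in prev[j]:
                    s = step(s, u)
                    traj.append(s)
                trajs.append(traj)
            others[i] = trajs
    plans = []
    for i in range(m):
        best, best_val = None, float("-inf")
        for seq in itertools.product(U, repeat=N):
            s, val = states[i], 0.0
            for tau, u in enumerate(seq):
                s = step(s, u)
                val += (lam ** tau) * reward(s, [tr[tau + 1] for tr in others[i]])
            if val > best_val:
                best_val, best = val, seq
        plans.append(best)
    return plans
```

With two vehicles at positions 0 and 2, actions {0, 1}, and a reward that pays for progress but heavily penalizes occupying another vehicle's predicted cell, the level-0 follower stops short of the frozen leader, while at level-1 it advances because it predicts the leader moving away.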
B. Explicit level-k decision-making via imitation learning
A level-k vehicle drives in traffic by applying $u^i_t = (u^i_{0|t})^k$ at every time step, where $(u^i_{0|t})^k$ is determined according to (7) with the current state as the initial condition, i.e., $s^i_{0|t} = s^i_t$ and $\mathbf{s}^{-i}_{0|t} = \mathbf{s}^{-i}_t$.
Solving the problem (7) involves numerical computations. In particular, the computational demand becomes increasingly heavier for larger $k$ and larger numbers of interacting vehicles, because computing the level-k decision of vehicle $i$ requires first determining the level-$(k-1)$ decisions of all other vehicles $j \neq i$, which in turn requires the determination of level-$(k-2)$ decisions for $k \geq 2$, and so on.
For the purpose of developing simulation environments to
conduct virtual tests for autonomous vehicle control systems,
fast simulations are desired so that a large number of scenarios
can be covered within a short period of time. Motivated by
this, we exploit machine learning techniques to move the
computations offline and achieve explicit level-k decision rules
for online use.
In particular, we define a policy as a map from a triple of the ego vehicle's state $s^i_t$, the other vehicles' states $\mathbf{s}^{-i}_t$, and the ego vehicle's reasoning level $k$ to the level-k action of the ego vehicle, i.e.,

$\pi_k : (s^i_t, \mathbf{s}^{-i}_t, k) \mapsto (u^i_t)^k$.  (8)

This map is algorithmically determined by the problem (7) and $(u^i_t)^k = (u^i_{0|t})^k$. We then pursue an explicit approximation of $\pi_k$, denoted by $\hat{\pi}_k$, using the approach called "imitation learning."
Imitation learning is an approach for an autonomous agent to learn a control policy from expert demonstrations in order to imitate the expert's behavior. The expert can be a human expert [35] or a well-behaved artificial intelligence [36]. In this paper, we treat the algorithmically determined map $\pi_k$ as the expert.
Imitation learning can be formulated as a standard super-
vised learning problem, in which case it is also commonly
referred to as “behavioral cloning,” where the learning objec-
tive is to obtain a policy from a pre-collected dataset of expert
demonstrations that best approximates the expert’s behavior at
the states contained in the dataset. Such a procedure can be
described as
$\hat{\pi}_k \in \arg\min_{\pi_\theta} \mathbb{E}_{\bar{s} \sim P(\bar{s}|\pi_k)} L\big(\pi_k(\bar{s}), \pi_\theta(\bar{s})\big)$,  (9)

where $\bar{s}$ denotes the triple $(s^i, \mathbf{s}^{-i}, k)$, $\pi_k$ denotes the expert policy (8), $\pi_\theta$ denotes a policy parameterized by $\theta$ (e.g., the weights of a neural network) that is being evaluated and optimized, $L$ is a loss function, and the notation $\mathbb{E}_{\bar{s} \sim P(\bar{s}|\pi_k)}(\cdot)$ is defined as

$\mathbb{E}_{\bar{s} \sim P(\bar{s}|\pi_k)}(\cdot) = \int (\cdot)\, \mathrm{d}P(\bar{s}|\pi_k)$.  (10)

We remark that a key feature of the procedure (9) is that the expectation is with respect to the probability distribution $P(\bar{s}|\pi_k)$ of the data $\bar{s}$ determined by the expert policy $\pi_k$, which is essentially the empirical distribution of $\bar{s}$ in the pre-collected dataset.
In our previous work [31], we explored the procedure (9) to obtain an explicit policy that imitates level-k decisions for an autonomous vehicle driving through a roundabout intersection.
A drawback of using (9) to train the policy $\hat{\pi}_k$ is that only the states that can be reached by executing $\pi_k$ are included in the dataset, and such a sampling bias may cause the error of $\hat{\pi}_k$ from $\pi_k$ to propagate in time: a small error may cause the vehicle to reach a state that is not exactly included in the dataset and, consequently, a large error may occur at the next time step.
Therefore, in this paper we use an alternative approach,
called the “Dataset Aggregation” (DAgger) algorithm, to train
the policy $\hat{\pi}_k$. DAgger is an iterative algorithm that optimizes the policy under its induced state distribution [37]. The learning objective of DAgger can be described as

$\hat{\pi}_k \in \arg\min_{\pi_\theta} \mathbb{E}_{\bar{s} \sim P(\bar{s}|\pi_\theta)} L\big(\pi_k(\bar{s}), \pi_\theta(\bar{s})\big)$,  (11)

$\mathbb{E}_{\bar{s} \sim P(\bar{s}|\pi_\theta)}(\cdot) = \int (\cdot)\, \mathrm{d}P(\bar{s}|\pi_\theta)$,  (12)

where the distinguishing feature from (9) is that the expectation is with respect to the probability distribution $P(\bar{s}|\pi_\theta)$ induced by the policy $\pi_\theta$ that is being evaluated and optimized.
DAgger can effectively resolve the aforementioned issue with regard to the propagation of error in time, since there will be data points $(\bar{s}, \pi_k(\bar{s}))$ for states $\bar{s}$ reached by executing $\hat{\pi}_k$.
The procedure to obtain explicit level-k decision-making policies based on an improved version of the DAgger algorithm [36] is presented as Algorithm 1. In Algorithm 1, $n_{\max}$ represents the maximum number of simulation episodes and $t_{\max}$ represents the length of a simulation episode. By "initialize the simulation environment," we mean constructing a traffic scene, including specifying the road layout and geometry as well as the number of vehicles. By "initialize vehicle $i$," we mean putting the vehicle in a lane entering the scene while satisfying a minimum separation distance from the other vehicles, and specifying a sequence of target lanes for the vehicle to traverse and finally leave the scene. By "vehicle $i$ fails," we mean the occurrence of 1) vehicle $i$'s c-zone overlapping with any of the other vehicles' c-zones, 2) crossing any of the road boundaries, or 3) crossing a lane marking that separates the traffic of opposite directions. And, by "vehicle $i$ succeeds," we mean vehicle $i$ gets to the last target lane in its sequence so that it can leave the scene without further interactions with the other vehicles.
Algorithm 1: Imitation learning algorithm to obtain explicit level-k decision-making policies

1: Initialize $\hat{\pi}^0_k$ to an arbitrary policy;
2: Initialize dataset $\mathcal{D} \leftarrow \emptyset$;
3: for $n = 1 : n_{\max}$ do
4:   Initialize the simulation environment;
5:   for $i \in \mathcal{M}$ do
6:     Initialize vehicle $i$;
7:   end for
8:   for $t = 0 : t_{\max} - 1$ do
9:     for $i \in \mathcal{M}$ do
10:      if vehicle $i$ fails or succeeds then
11:        Re-initialize vehicle $i$;
12:      end if
13:      for $k = 1 : k_{\max}$ do
14:        if $\hat{\pi}^{n-1}_k(s^i_t, \mathbf{s}^{-i}_t, k) \neq \pi_k(s^i_t, \mathbf{s}^{-i}_t, k)$ then
15:          $\mathcal{D} \leftarrow \mathcal{D} \cup \big((s^i_t, \mathbf{s}^{-i}_t, k), \pi_k(s^i_t, \mathbf{s}^{-i}_t, k)\big)$;
16:        end if
17:      end for
18:      Randomly generate $k_t \in \{1, \dots, k_{\max}\}$;
19:      $s^i_{t+1} = f\big(s^i_t, \hat{\pi}^{n-1}_k(s^i_t, \mathbf{s}^{-i}_t, k_t)\big)$;
20:    end for
21:  end for
22:  Train classifier $\hat{\pi}^n_k$ on $\mathcal{D}$;
23: end for
24: Output $\hat{\pi}_k = \hat{\pi}^{n_{\max}}_k$.
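In Python-like form, the core of Algorithm 1 (and of DAgger generally) is a loop that rolls out the current learned policy, queries the expert at the visited states, and retrains on the aggregated dataset. The sketch below is a stripped-down toy: integer states, a nearest-neighbor "classifier," and no reasoning levels; it also relabels every visited state, whereas Algorithm 1 stores only states where the learned policy disagrees with the expert. All helper names are ours, not the paper's.

```python
def dagger(expert, rollout, train, n_iters, policy0):
    """Minimal DAgger loop: roll out the CURRENT learned policy to
    generate states, query the expert for labels at those states,
    aggregate everything into one dataset D, and refit the classifier."""
    D, policy = [], policy0
    for _ in range(n_iters):
        for s in rollout(policy):     # states visited under the learned policy
            D.append((s, expert(s)))  # the expert relabels each visited state
        policy = train(D)             # fit a new classifier on the aggregate
    return policy
```

The key point, mirroring (11), is that `rollout` uses the learned policy, so the dataset covers the states the learner itself induces rather than only the expert's trajectory.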
IV. TRAFFIC IN UNSIGNALIZED INTERSECTION NETWORK
We model traffic in urban environments where the road
system is composed of straight roads and three of the most
common types of unsignalized intersections: four-way, T-
shape, and roundabout [38]. Such traffic models can be used
as simulation environments for virtual testing of autonomous
vehicle control systems, which will be introduced in Section V.
The three unsignalized intersections to be modeled are
shown in Fig. 1. A vehicle can come from any of the entrance
lanes (marked by green arrows) to enter an intersection and
go to any of the exit lanes (marked by red arrows) to leave it,
except that U-turns are not allowed for four-way and T-shape
intersections.
Fig. 1. Unsignalized intersections to be modeled: (a) four-way, (b) T-shape, and (c) roundabout.
When training the level-k policy $\hat{\pi}_k$ using Algorithm 1, we treat these three unsignalized intersections separately. Specifically, when initializing the simulation environment in step 4, we select one of these three unsignalized intersections as the traffic scene for the current simulation episode. In addition, since in this paper we only consider these three unsignalized intersections, their layout and geometry features can be characterized and distinguished using a label $\xi \in \{1, 2, 3\}$, i.e., the state $\xi^i$ of vehicle $i$ takes the value 1 when vehicle $i$ operates in the area of the four-way intersection, 2 for the T-shape intersection, and 3 for the roundabout. For more intersection types with various layout and geometry features, a higher-dimensional vector $\xi$ may be used (e.g., see the intersection model in [27]).
Once the policy $\hat{\pi}_k$ for each of these three unsignalized intersections has been obtained, we can model larger road systems by using these three unsignalized intersections as modules and assembling them in arbitrary ways. Fig. 2 shows an example of such an assembly. When a vehicle operates at/nearest to a specific intersection, it uses a local coordinate system, accounts for its interactions with only the vehicles in its immediate vicinity, and applies the $\hat{\pi}_k$ corresponding to this intersection.
To model the heterogeneity in driving styles of different drivers, we let different vehicles be of different reasoning levels. Specifically, a level-k vehicle is controlled by the policy:

$\hat{\pi}_k = \hat{\pi}_k(\cdot, \cdot, k) : (s^i_t, \mathbf{s}^{-i}_t) \mapsto (u^i_t)^k$.  (13)

For instance, in Fig. 2, the 15 yellow cars are level-1 and the 15 red cars are level-2.
Fig. 2. An urban traffic environment with 15 level-1 cars (yellow) and 15 level-2 cars (red).
V. AUTONOMOUS VEHICLE CONTROL APPROACHES
In this section, we describe two autonomous vehicle control
approaches for urban traffic environments with unsignalized
intersections. These approaches will be tested and calibrated
using our traffic model, thereby demonstrating its utility for
verification and validation.
A. Adaptive control based on level-k models
In this approach, the autonomous ego vehicle treats the other drivers as level-k drivers. As different drivers may behave according to different reasoning levels, the ego vehicle estimates their levels and adapts its own control strategy based on the estimation results.
The control strategy of the autonomous ego vehicle, $i$, can be described as follows: At each discrete-time instant $t$, vehicle $i$ solves for

$(\mathbf{u}^i_t)^a = \big((u^i_{0|t})^a, (u^i_{1|t})^a, \dots, (u^i_{N-1|t})^a\big) \in \arg\max_{\mathbf{u}^i_t \in U^N} \sum_{\tau=0}^{N-1} \lambda^\tau R\big(s^i_{\tau|t}, \mathbf{s}^{-i}_{\tau|t}, u^i_{\tau|t}, (\mathbf{u}^{-i}_{\tau|t})^{\tilde{k}}\big)$,  (14)
where $(\mathbf{u}^{-i}_{\tau|t})^{\tilde{k}} = \big((u^j_{\tau|t})^{\tilde{k}^j_t}\big)_{j \in \mathcal{M}, j \neq i}$ denotes the collection of predicted actions of the other vehicles. In particular, the actions of vehicle $j$, $u^j_{\tau|t}$, $\tau = 0, 1, \dots, N-1$, are predicted by modeling vehicle $j$ as level-$\tilde{k}^j_t$ and solved based on (7), where $\tilde{k}^j_t$ is determined based on the following maximum likelihood principle:

$\tilde{k}^j_t \in \arg\max_{k \in \mathcal{K}} P^i(k^j = k \,|\, t)$,  (15)

in which $P^i(k^j = k \,|\, t)$ represents vehicle $i$'s belief at time $t$ that vehicle $j$ can be modeled as level-k, with $k$ taking values in a model set $\mathcal{K}$. The beliefs $P^i(k^j = k \,|\, t)$ get updated after each time step based on the following algorithm: If there exist $k, k' \in \mathcal{K}$ such that $\pi_k(s^j_t, \mathbf{s}^{-j}_t, k) \neq \pi_k(s^j_t, \mathbf{s}^{-j}_t, k')$, then
$P^i(k^j = k \,|\, t+1) = \dfrac{p^i(k^j = k \,|\, t+1)}{\sum_{k' \in \mathcal{K}} p^i(k^j = k' \,|\, t+1)}$,  (16)

$p^i(k^j = k \,|\, t+1) = \begin{cases} (1-\beta)\, P^i(k^j = k \,|\, t) + \beta & \text{if } k = \hat{k}^j_t, \\ P^i(k^j = k \,|\, t) & \text{otherwise,} \end{cases}$

where $\beta \in [0, 1]$ represents an update step size,

$\hat{k}^j_t \in \arg\min_{k \in \mathcal{K}} \mathrm{dist}\big(u^j_t, (u^j_t)^k\big) = \arg\min_{k \in \mathcal{K}} \sqrt{\big(a^j_t - (a^j_t)^k\big)^2 + \big(\omega^j_t - (\omega^j_t)^k\big)^2}$;  (17)

and if $\pi_k(s^j_t, \mathbf{s}^{-j}_t, k) = \pi_k(s^j_t, \mathbf{s}^{-j}_t, k')$ for all $k, k' \in \mathcal{K}$, then $P^i(k^j = k \,|\, t+1) = P^i(k^j = k \,|\, t)$ for all $k \in \mathcal{K}$.
The level estimation algorithm (15)-(17) has the following three features: 1) If the actions predicted by all of the models in $\mathcal{K}$ are the same, then the autonomous ego vehicle has no information to distinguish their relative accuracy and thus maintains its previous beliefs. 2) Otherwise, the ego vehicle identifies the model(s) in $\mathcal{K}$ whose prediction $(u^j_t)^k$ matches vehicle $j$'s actually applied action $u^j_t$ for time $t$ with the highest accuracy. 3) The ego vehicle improves its belief in those model(s) relative to its previous beliefs; thus, it takes into account both its previous estimates and the current, latest estimate.
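The belief update (15)-(17) can be sketched compactly. Below, `beliefs` maps each candidate level $k$ to $P^i(k^j = k \,|\, t)$, `predictions` maps $k$ to the action $(a, \omega)$ the level-k model would have taken, and `observed_u` is vehicle $j$'s actually applied action; the function name and data layout are illustrative, not from the paper.

```python
import math

def update_beliefs(beliefs, observed_u, predictions, beta):
    """One step of the level-estimation rule (15)-(17): if the candidate
    models' predicted actions differ, find the level k_hat whose predicted
    (a, omega) is closest in Euclidean distance (Eq. (17)) to the observed
    action, add belief mass beta to it, and renormalize (Eq. (16));
    otherwise leave the beliefs unchanged."""
    if len(set(predictions.values())) == 1:
        return dict(beliefs)  # models indistinguishable at this state
    k_hat = min(predictions, key=lambda k: math.dist(observed_u, predictions[k]))
    p = {k: ((1 - beta) * b + beta) if k == k_hat else b
         for k, b in beliefs.items()}
    z = sum(p.values())
    return {k: v / z for k, v in p.items()}
```

Applying (15) afterwards is then just an argmax over the returned dictionary.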
Similar to (8) defined by (7), we can define a policy to represent the control determined by (14) as follows:

$\pi_a : (s^i_t, \mathbf{s}^{-i}_t, \tilde{k}^{-i}_t) \mapsto (u^i_t)^a$,  (18)

where $\tilde{k}^{-i}_t = (\tilde{k}^j_t)_{j \in \mathcal{M}, j \neq i}$ denotes the collection of level estimates of the other vehicles and $(u^i_t)^a = (u^i_{0|t})^a$ is determined by (14). Furthermore, similar to the procedure to train the explicit approximation $\hat{\pi}_k$ to $\pi_k$ using imitation learning, we can train an explicit approximation $\hat{\pi}_a$ to $\pi_a$. This way, together with replacing $\pi_k$ with $\hat{\pi}_k$ in the level estimation algorithm (15)-(17), we can move the major computations involved in (14)-(17) offline, thus reducing the online computational load and promoting real-time implementation.
The algorithm to train $\hat{\pi}_a$ using $\pi_a$ as the expert policy and the DAgger algorithm is similar to Algorithm 1 and is omitted.
B. Rule-based control
The second autonomous vehicle control approach we con-
sider is a rule-based solution. Compared to many other ap-
proaches, rule-based control has the advantage of interpretabil-
ity and can often be calibrated by tuning a small number of
parameters.
The autonomous ego vehicle drives by following a pre-
planned reference path and accounts for its interactions with
other vehicles by adjusting its speed along the path corre-
spondingly. Examples of reference paths for the autonomous
ego vehicle to drive through intersections are illustrated by the
green dotted curves in Fig. 3.
Fig. 3. Reference paths for the autonomous ego vehicle to drive through (a) four-way, (b) T-shape, and (c) roundabout intersections.
The basic control rules can be explained as follows. The autonomous ego vehicle pursues a higher speed along the reference path if there is no other vehicle in conflict with it. If there are other vehicles in conflict with it, then the autonomous ego vehicle yields to them by maximizing its distances from them. Specifically, at each discrete-time instant $t$, the autonomous ego vehicle, $i$, selects and applies for one time step an acceleration value from a finite set of accelerations, $\mathcal{A}$, according to Algorithm 2.
Algorithm 2: Rule-based autonomous vehicle control algorithm
1: Initialize $\mathcal{M}_c \leftarrow \emptyset$;
2: for $j \in \mathcal{M},\, j \neq i$ do
3:   if the estimated future path of $j$ intersects with $i$'s future path and $\mathrm{dist}\big((x^i_t, y^i_t), (x^j_t, y^j_t)\big) \leq R_c$ then
4:     $\mathcal{M}_c \leftarrow \mathcal{M}_c \cup \{j\}$;
5:   end if
6: end for
7: if $\mathcal{M}_c \neq \emptyset$ then
8:   $(a^i_t)^r = \arg\max_{a \in \mathcal{A}} \min_{j \in \mathcal{M}_c} \mathrm{dist}\big((x^i_{1|t}, y^i_{1|t}), (x^j_{1|t}, y^j_{1|t})\big)$;
9: else
10:  $(a^i_t)^r = \max\{a \in \mathcal{A}\}$;
11: end if
12: Output $(a^i_t)^r$.
In Algorithm 2, $\mathcal{M}_c$ represents the set of vehicles that are in conflict with the ego vehicle. In particular, the ego vehicle estimates each of the other vehicles' future paths based on their current positions and their target lanes, using the same path planning algorithm that is used by the ego vehicle to create its own path. If the estimated future path of a vehicle $j$ intersects with the ego vehicle's own future path and the current distance between these two vehicles is smaller than a threshold value $R_c$, then vehicle $j$ is identified as a vehicle in conflict, i.e., $j \in \mathcal{M}_c$, where the distance function $\mathrm{dist}(\cdot,\cdot)$ is defined as
$$\mathrm{dist}\big((x_1, y_1), (x_2, y_2)\big) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}. \qquad (19)$$
If there are vehicles in conflict, $\mathcal{M}_c \neq \emptyset$, then the ego vehicle maximizes the minimum among the predicted distances from these vehicles to improve safety. In step 8, $(x^i_{1|t}, y^i_{1|t})$ represents the predicted position of the ego vehicle $i$ after applying the acceleration $a$ and driving with the resulting speed along its reference path for one step, and $(x^j_{1|t}, y^j_{1|t})$ represents the predicted position of vehicle $j$ after driving for one step with its current speed along its current heading direction. If there is no vehicle in conflict, $\mathcal{M}_c = \emptyset$, then the ego vehicle maximizes its speed.
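A minimal sketch of the decision rule in Algorithm 2 follows. The one-step prediction models and the path-intersection test are passed in as simplified helpers; their names and signatures are assumptions for illustration, not the paper's implementation.

```python
import math

def dist(p1, p2):
    """Euclidean distance (19) between two (x, y) points."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

def rule_based_action(ego, others, paths_intersect, predict_ego, predict_other,
                      A=(-5.0, -2.5, 0.0, 2.5), Rc=14.0):
    """Select an acceleration from A following the decision rule of Algorithm 2.

    ego/others: dicts with at least a 'pos' (x, y) entry;
    paths_intersect(ego, j): assumed path-conflict test;
    predict_ego(ego, a), predict_other(j): assumed one-step position predictors.
    """
    # Steps 1-6: collect the vehicles in conflict with the ego vehicle.
    conflicts = [j for j in others
                 if paths_intersect(ego, j) and dist(ego["pos"], j["pos"]) <= Rc]
    if conflicts:
        # Step 8: acceleration maximizing the minimum predicted separation.
        return max(A, key=lambda a: min(dist(predict_ego(ego, a),
                                             predict_other(j))
                                        for j in conflicts))
    # Step 10: no conflict, so apply the largest acceleration (maximize speed).
    return max(A)
```

Note how the single parameter `Rc` gates both branches, which is why its calibration in Section VI-B matters so much.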
Note that the key parameter for this rule-based control approach is the threshold value $R_c$, which influences both whether a vehicle is identified as in conflict with the ego vehicle and the separation distance the ego vehicle tries to keep from other vehicles. We will utilize our traffic model to calibrate this parameter in Section VI-B.
VI. RESULTS
In this section, we illustrate simulations of urban traffic with
vehicle interactions modeled by our level-k game-theoretic
approach, and the application to verification, validation and
calibration of autonomous vehicle control systems.
A. Traffic modeling with level-k vehicles
We consider a sampling interval $\Delta t = 0.25$ [s] and an action set $U$ consisting of 6 actions representing common driving maneuvers in urban traffic, listed in Table I. The weight vector, the planning horizon, and the discount factor for the reward function (4) are $w = [1000, 500, 50, 100, 5, 1]^{\top}$, $N = 4$, and $\lambda = 0.8$. When evaluating the features $\phi_1$ and $\phi_4$, we consider the c-zone of a vehicle as a 5 [m] $\times$ 2 [m] rectangle centered at the vehicle's position $(x, y)$ and stretched along its heading direction $\theta$, and the s-zone of a vehicle as a rectangle concentric with its c-zone and 8 [m] $\times$ 2.4 [m] in size. Furthermore, we consider a speed range $[v_{\min}, v_{\max}] = [0, 5]$ [m/s], representing common speeds for vehicles driving through intersections; i.e., when the speed calculated based on the model (2) gets outside of $[v_{\min}, v_{\max}]$, it is saturated to this range.
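As a sketch, one simulation step with the speed saturation described above might look as follows. The unicycle-style position update is an assumption standing in for the paper's kinematic model (2); only the sampling interval and the speed range come from the text.

```python
import math

def step(x, y, v, theta, a, omega, dt=0.25, vmin=0.0, vmax=5.0):
    """Advance one vehicle by one sampling interval under action (a, omega)."""
    # Update and saturate the speed to [vmin, vmax] as described in the text.
    v_next = min(max(v + a * dt, vmin), vmax)
    # Assumed unicycle-style position/heading update (stand-in for model (2)).
    x_next = x + v_next * math.cos(theta) * dt
    y_next = y + v_next * math.sin(theta) * dt
    theta_next = theta + omega * dt
    return x_next, y_next, v_next, theta_next
```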
Experimental studies [29], [39] suggest that humans are most commonly level-1 and level-2 reasoners in their interactions. Thus, we model vehicles in traffic using level-1 and level-2 policies in this paper. In particular, on the basis of our level-0 decision rule (see Section III-A), a level-1 vehicle represents a cautious/conservative vehicle and a level-2 vehicle represents an aggressive vehicle. Indeed, since both level-0 and level-2 vehicles represent aggressive vehicles, they behave similarly in many situations.
TABLE I
ACTION SET U.

action u          a [m/s²]    ω [rad/s]
maintain (u1)        0            0
accelerate (u2)      2.5          0
decelerate (u3)     -2.5          0
hard brake (u4)     -5            0
turn left (u5)       0           π/4
turn right (u6)      0          -π/4
We use a neural network with the architecture shown in Fig. 4 to represent a policy $\pi_{\theta}$ and train its weights $\theta$ using Algorithm 1 to obtain the explicit approximation $\hat{\pi}^k$ to the level-k policy $\pi^k$, which is algorithmically determined by (7). The accuracy of the obtained $\hat{\pi}^k$ in terms of matching $\pi^k$ on the training dataset is 98.3%. Then, we generate 30% more data points of $\big((s^i_t, \mathbf{s}^{-i}_t, k), \pi^k(s^i_t, \mathbf{s}^{-i}_t, k)\big)$ for testing. The accuracy of $\hat{\pi}^k$ in matching $\pi^k$ on the test dataset is 97.8%.
Fig. 4. Architecture of the neural network.
To show the advantage of using the DAgger algorithm (11) over a standard supervised learning procedure (9) to obtain the policy $\hat{\pi}^k$, we show a case observed in our simulations where the policy trained using standard supervised learning fails but the one trained using DAgger succeeds. In Fig. 5(a-3), the blue vehicle controlled by $\hat{\pi}^k$ trained using standard supervised learning fails to make an adequate right turn to get around the central island. This is due to a significant error of $\hat{\pi}^k$ relative to $\pi^k$ at certain states encountered by the blue vehicle when entering the roundabout; the encounter with such states results from the issue of error propagation in time discussed in Section III-B. In contrast, the blue vehicle in Fig. 5(b-3), controlled by $\hat{\pi}^k$ trained using DAgger, succeeds in making a proper right turn, illustrating that DAgger can effectively resolve this issue.
In what follows we show the interactions between level-k vehicles at the four-way, T-shape, and roundabout intersections. In particular, we let three vehicles be controlled by different level-k policies and show how the traffic scenarios evolve differently depending on the different combinations of level-k policies.
It can be observed from Figs. 6-8 that, in general, when level-1 and level-2 vehicles interact with each other, the conflicts between them can be resolved. This is expected since level-1 vehicles, representing cautious/conservative vehicles, will yield the right of way and level-2 vehicles, representing aggressive vehicles, will proceed ahead. In contrast, when level-1 vehicles interact with level-1 vehicles, deadlocks may
Fig. 5. (a-1)-(a-3) show three subsequent steps in a simulation where the blue vehicle controlled by $\hat{\pi}^k$ trained using standard supervised learning fails in making an adequate right turn to get around the central island of a roundabout; (b-1)-(b-3) show steps in a similar simulation where the blue vehicle controlled by $\hat{\pi}^k$ trained using DAgger succeeds in making a proper right turn.
occur, such as the one observed in the T-shape intersection in Fig. 7(a), because everyone yields to the others. When level-2 vehicles interact with level-2 vehicles, collisions may occur, such as the ones observed in panel (b) of Figs. 6-8, because everyone assumes the others yield.
Fig. 6. Interactions of level-k vehicles at the four-way intersection. (a-1)-(a-3) show three subsequent steps in a simulation where three level-1 vehicles interact with each other; (b-1)-(b-3) show steps of three level-2 vehicles interacting with each other; (c-1)-(c-3) show steps of a level-2 vehicle (blue) interacting with two level-1 vehicles (yellow and red); v1, v2, and v3 are the speeds of, respectively, the blue, yellow, and red vehicles.
We remark that deadlocks (collisions) do not always occur
Fig. 7. Interactions of level-k vehicles at the T-shape intersection. (a-1)-(a-3) show three subsequent steps in a simulation where three level-1 vehicles interact with each other; (b-1)-(b-3) show steps of three level-2 vehicles interacting with each other; (c-1)-(c-3) show steps of a level-2 vehicle (blue) interacting with two level-1 vehicles (yellow and red); v1, v2, and v3 are the speeds of, respectively, the blue, yellow, and red vehicles.
in level-1 (level-2) interactions. The initial conditions of Figs. 6-8 are chosen to show such situations. For randomized initial conditions, the rates of success, defined as the proportion of 2000 simulation episodes where neither deadlocks nor collisions occur to the ego vehicle, for different numbers of interacting vehicles and different combinations of level-k policies at the three intersections are shown in Fig. 9. In Fig. 9, "L-k car in L-k′ Env." means the rate of success of a level-k ego vehicle when interacting with other vehicles that are all of level-k′; "L-k car in Mix Env." means the rate of success of a level-k ego vehicle when interacting with other vehicles whose control policies are randomly chosen between level-1 and level-2 with equal probability.
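The rate of success just defined is a plain Monte Carlo proportion over independent episodes; the sketch below illustrates the computation, with a Bernoulli draw standing in for an actual simulated episode (the success probability 0.9 is an arbitrary stand-in, not a result from the paper).

```python
import random

def rate_of_success(run_episode, n_episodes=2000, seed=0):
    """Proportion of episodes in which run_episode reports success
    (no deadlock and no collision for the ego vehicle)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    successes = sum(1 for _ in range(n_episodes) if run_episode(rng))
    return successes / n_episodes

# Stand-in episode: success with probability 0.9.
estimate = rate_of_success(lambda rng: rng.random() < 0.9)
```

With 2000 episodes the standard error of such an estimate is below about 0.011 for any success probability, which is adequate for the comparisons reported in Fig. 9.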
The following can be observed: 1) As the number of interacting vehicles increases, the rate of success decreases for all the cases. This is reasonable since a larger number of interacting vehicles represents a more complex traffic scenario. 2) The rates of success of a level-2 ego vehicle when interacting with other vehicles that are also of level-2 are the lowest among the results of all combinations of level-k policies. This is also reasonable since when all the vehicles are aggressive and assume the others yield, traffic accidents are more likely to occur. 3) Among the results of the three intersection types, the rates of success for the roundabout intersection are the highest. This illustrates the effective functionality of roundabouts in reducing traffic conflicts.
We further remark that although the high rates of failure of "level-2 versus level-2" are not desired in real-world traffic, it is important for a simulation environment for autonomous vehicle control testing to include such cases that represent rational interactions between aggressive vehicles. Note that
Fig. 8. Interactions of level-k vehicles at the roundabout intersection. (a-1)-(a-3) show three subsequent steps in a simulation where three level-1 vehicles interact with each other; (b-1)-(b-3) show steps of three level-2 vehicles interacting with each other; (c-1)-(c-3) show steps of a level-2 vehicle (blue) interacting with two level-1 vehicles (yellow and red); v1, v2, and v3 are the speeds of, respectively, the blue, yellow, and red vehicles.
a level-2 vehicle is a rational decision maker that behaves aggressively, which is fundamentally different from a vehicle model that acts aggressively but in an irrational way, e.g., taking actions randomly. The cases of level-2 vehicle interactions provide challenging test scenarios for an autonomous vehicle control system, which can be more realistic than those provided by some worst-case (i.e., not necessarily rational) models [40].
B. Evaluation and calibration of autonomous vehicle control
approaches
We test the two autonomous vehicle control approaches
described in Section V using our traffic model.
For the first approach of adaptive control based on level-k models, we use the same sampling interval $\Delta t$, action set $U$, reward function including the weight vector $w$, planning horizon $N$, and discount factor $\lambda$ as those used for the level-k vehicle models. In the level estimation algorithm (15)-(17), we consider the model set $\mathcal{K} = \{1, 2\}$ and the update step size $\beta = 0.6$.
When training the explicit approximation $\hat{\pi}^a$ to the policy $\pi^a$ that is algorithmically determined by (14), we use the same neural network architecture shown in Fig. 4. The accuracy of the obtained $\hat{\pi}^a$ in terms of matching $\pi^a$ is 98.8% on the training dataset and 98.6% on a test dataset of 30% additional data points that are not used for training.
Firstly, we simulate similar scenarios as those shown in
Figs. 6-8, but let the autonomous ego vehicle (blue) be
Fig. 9. The rates of success of level-k policies. (a-1)-(a-3) show the rates of success of a level-1 ego vehicle operating in various traffic environments (varying in the numbers and policies of interacting vehicles) at the four-way, T-shape, and roundabout intersections; (b-1)-(b-3) show those of a level-2 ego vehicle; the bars in dark color represent the rates of success.
controlled by the adaptive control approach instead of level-k policies. Figs. 10-12 show snapshots of the simulations. It can be observed that the autonomous ego vehicle can resolve the conflicts with the other two vehicles and safely drive through the intersections although the other two vehicles are controlled by varying policies. The bottom panels show the level estimation histories of the simulations. It can be observed that the autonomous ego vehicle can resolve the conflicts because it successfully identifies the level-k models of the other two vehicles. Recall that vehicle $j$ is identified as level-1 (level-2) when $P(k^j = 2) < 0.5$ ($P(k^j = 2) \geq 0.5$).
The success of the adaptive control approach in situations where level-k control policies with fixed k fail suggests the significance, for autonomous vehicle control, of intention recognition and action prediction for the other vehicles. Note that these two steps are achieved in our adaptive control approach through the level estimates and the level-k models of the other vehicles.
We then statistically evaluate and compare the two autonomous vehicle control approaches. For the second approach of rule-based control, we consider an acceleration set $\mathcal{A} = \{-5, -2.5, 0, 2.5\}$ [m/s²] and an initial design of the threshold value $R_c = 14$ [m].
To cover a rich set of scenarios, we construct a larger traffic
scene shown in Fig. 13, which models the road system of
Fig. 10. Interactions of the autonomous ego vehicle (blue) controlled by the adaptive control approach with level-k vehicles at the four-way intersection. (a-1)-(a-3) show three subsequent steps in a simulation where the autonomous ego vehicle interacts with two level-1 vehicles, and (a-4) shows the time histories of the two vehicles' level estimates, where $P(2) = P(k = 2)$ denotes the ego vehicle's belief in the level-2 model; (b-1)-(b-4) show those of the autonomous ego vehicle interacting with two level-2 vehicles; (c-1)-(c-4) show those of the autonomous ego vehicle interacting with a level-1 vehicle (red) and a level-2 vehicle (yellow); v1, v2, and v3 are the speeds of, respectively, the blue, yellow, and red vehicles.
an urban area in Los Angeles and consists of one four-way intersection, one roundabout, and two T-shape intersections. We let an autonomous ego vehicle controlled by the adaptive control approach or the rule-based control approach drive through this traffic scene. Apart from the autonomous ego vehicle, we also put multiple other vehicles controlled by level-k policies in the scene and let them drive through the scene repeatedly. Their initial positions, lanes entering the scene, and sequences of target lanes to traverse the scene are all randomly chosen.
We evaluate the two control approaches based on two
statistical metrics: the rate of collision (CR) and the rate
of deadlock (DR). The rate of collision is defined as the
proportion of 2000 simulation episodes where the autonomous
ego vehicle collides with another vehicle or with the road
boundaries. The rate of deadlock is defined as the proportion
of 2000 simulation episodes where no collision occurs to
the autonomous ego vehicle but it fails to drive through the
scene in 300[s] of simulation time. We consider three traffic
models: 1) all of the other vehicles are level-1, called a “level-
Fig. 11. Interactions of the autonomous ego vehicle (blue) controlled by the adaptive control approach with level-k vehicles at the T-shape intersection. (a-1)-(a-3) show three subsequent steps in a simulation where the autonomous ego vehicle interacts with two level-1 vehicles, and (a-4) shows the time histories of the two vehicles' level estimates, where $P(2) = P(k = 2)$ denotes the ego vehicle's belief in the level-2 model; (b-1)-(b-4) show those of the autonomous ego vehicle interacting with two level-2 vehicles; (c-1)-(c-4) show those of the autonomous ego vehicle interacting with a level-1 vehicle (red) and a level-2 vehicle (yellow); v1, v2, and v3 are the speeds of, respectively, the blue, yellow, and red vehicles.
1 environment," 2) all of the other vehicles are level-2, called a "level-2 environment," and 3) the control policy of each of the other vehicles is randomly chosen between level-1 and level-2 with equal probability, called a "mixed environment."
The CR and DR results of the adaptive control approach and the rule-based control approach for different numbers of other vehicles in the scene are shown in Figs. 14 and 15. The number of other vehicles, $n_v$, represents the traffic density, roughly $2.87\, n_v$ [vehicles/mile] (the total length of the roads is about 560 [m]).
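The stated conversion from vehicle count to density follows directly from the road length; a worked check (using 1 mile = 1609.344 m):

```python
METERS_PER_MILE = 1609.344

def density_per_mile(n_v, road_length_m=560.0):
    """Vehicles per mile for n_v vehicles on road_length_m meters of road."""
    return n_v / (road_length_m / METERS_PER_MILE)
```

For example, 560 m is about 0.348 mi, so each vehicle contributes roughly 2.87 vehicles/mile to the density.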
From Fig. 14 it can be observed that, for the adaptive control approach, the CR and DR increase as the traffic density increases, which is reasonable. In particular, the increase in CR slows down as the number of other vehicles goes beyond 20. Among the results for different traffic models, the CR and DR for the level-1 environment are the lowest and those for the level-2 environment are the highest. This is also reasonable since the level-1 environment, composed of level-1 vehicles, represents a cautious/conservative traffic model, the level-2 environment represents an aggressive traffic model and is thus most challenging for the autonomous ego vehicle, while the mixed environment lies in between. Furthermore, the results for the adaptive control approach are less sensitive to changes in traffic models than those for level-k policies with fixed
Fig. 12. Interactions of the autonomous ego vehicle (blue) controlled by the adaptive control approach with level-k vehicles at the roundabout intersection. (a-1)-(a-3) show three subsequent steps in a simulation where the autonomous ego vehicle interacts with two level-1 vehicles, and (a-4) shows the time histories of the two vehicles' level estimates, where $P(2) = P(k = 2)$ denotes the ego vehicle's belief in the level-2 model; (b-1)-(b-4) show those of the autonomous ego vehicle interacting with two level-2 vehicles; (c-1)-(c-4) show those of the autonomous ego vehicle interacting with a level-1 vehicle (red) and a level-2 vehicle (yellow); v1, v2, and v3 are the speeds of, respectively, the blue, yellow, and red vehicles.
$k$ shown in Fig. 9. This again shows the significance of adapting the autonomous vehicle control strategy to other vehicles' intentions and actions. Note that the rate of success for a single intersection of the adaptive control approach, if computed as $1 - \frac{\mathrm{CR} + \mathrm{DR}}{4}$, is close to that of "L-1 car in L-2 Env." and that of "L-2 car in L-1 Env.," which represent the best performance of level-k policies.
For the rule-based control approach, it can be observed from Fig. 15 that as the traffic density increases, the CR first increases and then decreases, while the DR keeps increasing. The decrease in CR when the traffic becomes very dense is due to the autonomous ego vehicle constantly yielding to other vehicles, which also causes the dramatic increase in DR.
Comparing the results of the two approaches, the adaptive control approach performs better than the rule-based control approach in the above experiments. This is attributed to the more sophisticated algorithm behind the adaptive control approach. However, the rule-based control is more interpretable (e.g., the reason for the decrease in CR is easily understood) and is easier to calibrate.
We show in Fig. 16 two informative cases observed in our
Fig. 13. Traffic scene for evaluating autonomous vehicle control approaches.
(a) shows an urban area in Los Angeles (provided by Google Maps) and (b)
shows the model of the road system in (a).
Fig. 14. Evaluation results of the adaptive control approach: (a) the rate of collision (CR) and (b) the rate of deadlock (DR) versus the number of environmental vehicles, for different traffic models.
simulations. In the first case, in Fig. 16(a), the autonomous ego vehicle (blue) controlled by the adaptive control approach and the level-1 vehicle (yellow) on its left both yield to the other and cause a deadlock. Note that a level-1 vehicle represents a vehicle with a cautious/conservative driver and, accordingly, yields to the autonomous ego vehicle. Although the autonomous ego vehicle eventually decides to proceed ahead and successfully drives through the roundabout, it takes too long for this conflict to be resolved, and thus the scenario falls into our DR category. To avoid such deadlock scenarios, the autonomous ego vehicle may need to identify the driving style of the opponent vehicle faster, which may be achieved through a larger update step size $\beta$. In the second case, in Fig. 16(b), the autonomous ego vehicle controlled by the rule-based control approach stops in the roundabout to yield to the yellow vehicle on its right, which is within the critical distance $R_c$ (marked by the red dashed circle). However, because the gap between the autonomous ego vehicle and the yellow vehicle is still quite large, the red vehicle on the left of the autonomous ego vehicle expects it to proceed and thus does not slow down, which causes a collision. This scenario shows that a larger critical distance $R_c$ may not always correspond to safer driving behavior. Such corner cases identified by our simulations can inform test trajectory design for autonomous vehicles.
We now optimize the threshold value $R_c$ in the rule-based control approach to achieve better performance defined by a
Fig. 15. Evaluation results of the rule-based control approach with $R_c = 14$ [m]: (a) the rate of collision (CR) and (b) the rate of deadlock (DR) versus the number of environmental vehicles, for different traffic models.
Fig. 16. Failure cases. (a) shows a scenario where the autonomous ego vehicle (blue) controlled by the adaptive control approach gets stuck at the entrance of the roundabout due to the level-1 vehicle (yellow) on its left. (b) shows a scenario where the autonomous ego vehicle (blue) controlled by the rule-based control approach gets hit by the level-2 vehicle (red) on its left.
performance index as follows:
$$J = \frac{1}{n_{\max}} \sum_{n=1}^{n_{\max}} \left( w_c\, \phi_c(S_n) + w_d\, \phi_d(S_n) + \frac{w_v\, \phi_s(S_n)}{\bar{v}(S_n) + \epsilon} \right), \qquad (20)$$
where $S_n$ denotes the $n$th simulation episode; $\phi_c(S_n)$, $\phi_d(S_n)$, and $\phi_s(S_n)$ are indicator functions, taking the value 1 if, respectively, a collision occurs to the autonomous ego vehicle, no collision but a deadlock occurs to the autonomous ego vehicle, or neither a collision nor a deadlock occurs and the autonomous ego vehicle successfully drives through the scene in 300 [s] of simulation time in the $n$th simulation episode, and taking 0 otherwise; $\bar{v}(S_n)$ is the average speed of the autonomous ego vehicle in the $n$th simulation episode; $w_c, w_d, w_v \geq 0$ are weighting factors, and $\epsilon > 0$ is a constant to adjust the shape of the function with respect to the average speed $\bar{v}(S_n)$ and to avoid the denominator being 0.
The performance index function (20) imposes penalties for collisions and deadlocks through the first two terms, and rewards higher average speeds through the last term. Note that the last term is designed in such a way that the penalty increases fast for decreases in speed values that are already very low, and decreases slowly for increases in speed values that are already very high. In obtaining the following results, we run simulations in the same scene shown in Fig. 13 with 15 other vehicles, and we use $w_c = 10$, $w_d = 5$, $w_v = 1$, and $\epsilon = 0.1$.
We plot the values of (20) for different values of $R_c$ in Fig. 17. Specifically, for each value of $R_c$, we run $n_{\max} = 2000$ simulation episodes and calculate the value of (20) based on the simulation results. Lower values of (20) represent better performance in terms of fewer collisions, fewer deadlocks, and higher average travel speeds.
Fig. 17. Performance index $J$ as a function of $R_c$ for the rule-based control approach with different traffic models.
In Fig. 17, the blue curve represents the result when the autonomous ego vehicle operates in the level-1 environment. It can be observed that the performance is good when $R_c$ takes very small values, i.e., in the range of [6, 7.5] [m]. This is because small $R_c$ corresponds to aggressive behavior while the level-1 environment represents a conservative traffic model; thus, the other vehicles almost always yield to the autonomous ego vehicle when there is a conflict. Since the autonomous ego vehicle proceeds ahead while the other vehicles yield, collisions and deadlocks are avoided. However, when operating in the level-2 or mixed environment, small $R_c$ leads to poor performance. This is because both the autonomous ego vehicle and the other vehicles behave aggressively and cause many collisions. When $R_c$ takes values in the range of [7.5, 11] [m], the performance is the worst for all three traffic models. This is because such $R_c$ values correspond to behaviors in between aggressive and conservative, which cause collisions with both aggressive and conservative interacting vehicles. The range [11.5, 13] [m] is suitable for choosing the value of $R_c$: there the performance is good and insensitive to changes in the traffic models. For larger $R_c$ values, the autonomous ego vehicle becomes overly conservative and almost always yields to the other vehicles, which makes it difficult for it to enter the intersections and leads to many deadlocks.
VII. CONCLUSION
In this paper, we described a framework based on level-k game theory for modeling traffic consisting of heterogeneous (in terms of their driving styles) and interactive vehicles in urban environments with unsignalized intersections. An algorithm integrating the level-k decision-making formalism, receding-horizon optimization, and imitation learning was proposed and used to solve for level-k control policies.
The developed traffic models are useful as simulation environments for verification and validation of autonomous vehicle control systems. In particular, we considered two autonomous vehicle control approaches as case studies: an adaptive control approach based on level-k vehicle models and a rule-based control approach. We analyzed their characteristics and evaluated their performance based on their testing results with our traffic models, and then optimized the parameters of the rule-based approach based on a performance index.
We envision that traffic models developed using the frame-
work proposed in this paper can also be integrated with urban
traffic/driving simulators with higher-fidelity car dynamics and
environmental representations, such as CARLA [41], using an
approach similar to that of [42], to create more realistic urban
traffic simulations and support autonomous driving system
development.
REFERENCES
[1] D. J. Fagnant and K. Kockelman, “Preparing a nation
for autonomous vehicles: opportunities, barriers and policy
recommendations,” Transportation Research Part A: Policy and
Practice, vol. 77, pp. 167 – 181, 2015. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0965856415000804
[2] U.S. Department of Transportation's National Highway Traffic Safety Administration, “Automated Vehicles for Safety,” Tech. Rep., Available: https://www.nhtsa.gov/technology-innovation/automated-vehicles-safety [June 18, 2019].
[3] J. Meyer, H. Becker, P. M. Bösch, and K. W. Axhausen, “Autonomous
vehicles: The next jump in accessibilities?” Research in Transportation
Economics, vol. 62, pp. 80 – 91, 2017. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0739885917300021
[4] N. Kalra and S. M. Paddock, “Driving to safety: How many miles of
driving would it take to demonstrate autonomous vehicle reliability?”
Transportation Research Part A: Policy and Practice, vol. 94, pp. 182
– 193, 2016.
[5] J. Zhou and L. del Re, “Reduced complexity safety testing for ADAS & ADF,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 5985–5990, 2017.
[6] H. Waschl, I. Kolmanovsky, and F. Willems, Control Strategies for Ad-
vanced Driver Assistance Systems and Autonomous Driving Functions.
Springer, 2019.
[7] C. Hubmann, J. Schulz, M. Becker, D. Althoff, and C. Stiller, “Au-
tomated driving in uncertain environments: Planning with interaction
and uncertain maneuver prediction,” IEEE Transactions on Intelligent
Vehicles, vol. 3, no. 1, pp. 5–17, March 2018.
[8] T. Bandyopadhyay, K. S. Won, E. Frazzoli, D. Hsu, W. S. Lee, and
D. Rus, “Intention-aware motion planning,” in Algorithmic Foundations
of Robotics X, E. Frazzoli, T. Lozano-Perez, N. Roy, and D. Rus, Eds.
Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 475–491.
[9] C. Hubmann, J. Schulz, G. Xu, D. Althoff, and C. Stiller, “A belief state
planner for interactive merge maneuvers in congested traffic,” in 2018
21st International Conference on Intelligent Transportation Systems
(ITSC), Nov 2018, pp. 1617–1624.
[10] N. Li, A. Girard, and I. Kolmanovsky, “Stochastic predictive control for
partially observable Markov decision processes with time-joint chance
constraints and application to autonomous vehicle control,” Journal of
Dynamic Systems, Measurement, and Control, vol. 141, no. 7, p. 071007,
2019.
[11] W. Schwarting, J. Alonso-Mora, L. Paull, S. Karaman, and D. Rus, “Safe
nonlinear trajectory generation for parallel autonomy with a dynamic ve-
hicle model,” IEEE Transactions on Intelligent Transportation Systems,
vol. 19, no. 9, pp. 2994–3008, Sep. 2018.
[12] G. Cesari, G. Schildbach, A. Carvalho, and F. Borrelli, “Scenario model
predictive control for lane change assistance and autonomous driving on
highways,” IEEE Intelligent Transportation Systems Magazine, vol. 9,
no. 3, pp. 23–35, Fall 2017.
[13] M. Bahram, A. Lawitzky, J. Friedrichs, M. Aeberhard, and D. Woll-
herr, “A game-theoretic approach to replanning-aware interactive scene
prediction and planning,” IEEE Transactions on Vehicular Technology,
vol. 65, no. 6, pp. 3981–3992, June 2016.
[14] D. Sadigh, S. Sastry, S. A. Seshia, and A. D. Dragan, “Planning for
autonomous cars that leverage effects on human actions.” in Robotics:
Science and Systems, vol. 2. Ann Arbor, MI, USA, 2016.
[15] J. F. Fisac, E. Bronstein, E. Stefansson, D. Sadigh, S. S. Sastry, and
A. D. Dragan, “Hierarchical game-theoretic planning for autonomous
vehicles,” in 2019 International Conference on Robotics and Automation
(ICRA). IEEE, 2019, pp. 9590–9596.
[16] H. Yu, H. E. Tseng, and R. Langari, “A human-like game theory-based
controller for automatic lane changing,” Transportation Research Part
C: Emerging Technologies, vol. 88, pp. 140–158, 2018.
[17] A. Dreves and M. Gerdts, “A generalized Nash equilibrium approach
for optimal control problems of autonomous cars,” Optimal Control
Applications and Methods, vol. 39, no. 1, pp. 326–342, 2018. [Online].
Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/oca.2348
[18] C. Vallon, Z. Ercan, A. Carvalho, and F. Borrelli, “A machine learning
approach for personalized autonomous lane change initiation and con-
trol,” in 2017 IEEE Intelligent Vehicles Symposium (IV), June 2017, pp.
1590–1595.
[19] H. Xu, Y. Gao, F. Yu, and T. Darrell, “End-to-end learning of driving
models from large-scale video datasets,” in Proceedings of the IEEE
conference on computer vision and pattern recognition, 2017, pp. 2174–
2182.
[20] N. Li, D. W. Oyler, M. Zhang, Y. Yildiz, I. Kolmanovsky, and A. R. Girard, “Game theoretic modeling of driver and vehicle interactions for verification and validation of autonomous vehicle control systems,” IEEE Transactions on Control Systems Technology, vol. 26, no. 5, pp. 1782–1797, Sep. 2018.
[21] Federal Highway Administration, “Intersection safety needs identification report,” Tech. Rep., Available: https://safety.fhwa.dot.gov/intersection/other_topics/needsidrpt/needsidrpt.pdf [Jun. 20, 2019].
[22] ——, “Unsignalized intersections,” Tech. Rep., Available: https://safety.fhwa.dot.gov/intersection/conventional/unsignalized/ [Jun. 20, 2019].
[23] M. Bouton, A. Cosgun, and M. J. Kochenderfer, “Belief state planning
for autonomously navigating urban intersections,” in 2017 IEEE Intelli-
gent Vehicles Symposium (IV), June 2017, pp. 825–830.
[24] D. Isele, R. Rahimi, A. Cosgun, K. Subramanian, and K. Fujimura,
“Navigating occluded intersections with autonomous vehicles using deep
reinforcement learning,” in 2018 IEEE International Conference on
Robotics and Automation (ICRA), May 2018, pp. 2034–2039.
[25] S. Pruekprasert, J. Dubut, X. Zhang, C. Huang, and M. Kishida, “A
game theoretic approach to decision making for multiple vehicles at
roundabout,” arXiv preprint arXiv:1904.06224, 2019.
[26] S. Pruekprasert, X. Zhang, J. Dubut, C. Huang, and M. Kishida,
“Decision making for autonomous vehicles at unsignalized intersection
in presence of malicious vehicles,” arXiv preprint arXiv:1904.10158,
2019.
[27] N. Li, Y. Yao, I. V. Kolmanovsky, E. M. Atkins, and A. Girard,
“Game-theoretic modeling of multi-vehicle interactions at uncontrolled
intersections,” CoRR, vol. abs/1904.05423, 2019. [Online]. Available:
http://arxiv.org/abs/1904.05423
[28] D. O. Stahl and P. W. Wilson, “On players’ models of other players: Theory and experimental evidence,” Games and Economic Behavior, vol. 10, no. 1, pp. 218–254, 1995.
[29] M. A. Costa-Gomes and V. P. Crawford, “Cognition and behavior in two-person guessing games: An experimental study,” American Economic Review, vol. 96, no. 5, pp. 1737–1768, Dec. 2006.
[30] N. Li, I. Kolmanovsky, A. Girard, and Y. Yildiz, “Game theoretic
modeling of vehicle interactions at unsignalized intersections and appli-
cation to autonomous vehicle control,” in 2018 Annual American Control
Conference (ACC), June 2018, pp. 3215–3220.
[31] R. Tian, S. Li, N. Li, I. Kolmanovsky, A. Girard, and Y. Yildiz, “Adap-
tive game-theoretic decision making for autonomous vehicle control
at roundabouts,” in 2018 IEEE Conference on Decision and Control
(CDC), Dec 2018, pp. 321–326.
[32] A. Y. Ng, S. J. Russell, et al., “Algorithms for inverse reinforcement
learning,” in International Conference on Machine Learning, 2000.
[33] B. D. Ziebart, A. Maas, J. A. Bagnell, and A. K. Dey, “Maximum en-
tropy inverse reinforcement learning,” in AAAI Conference on Artificial
Intelligence, 2008.
[34] L. Claussmann, A. Carvalho, and G. Schildbach, “A path planner for
autonomous driving on highways using a human mimicry approach with
binary decision diagrams,” in 2015 European Control Conference (ECC).
IEEE, 2015, pp. 2976–2982.
[35] F. Codevilla, M. Müller, A. López, V. Koltun, and A. Dosovitskiy,
“End-to-end driving via conditional imitation learning,” in 2018 IEEE
International Conference on Robotics and Automation (ICRA), May
2018, pp. 1–9.
[36] L. Sun, C. Peng, W. Zhan, and M. Tomizuka, “A fast integrated
planning and control framework for autonomous driving via imitation
learning,” in ASME 2018 Dynamic Systems and Control Conference.
American Society of Mechanical Engineers, 2018, pp. V003T37A012–
V003T37A012.
[37] S. Ross, G. Gordon, and D. Bagnell, “A reduction of imitation learning
and structured prediction to no-regret online learning,” in Proceedings
of the fourteenth international conference on artificial intelligence and
statistics, 2011, pp. 627–635.
[38] K. Fitzpatrick, M. D. Wooldridge, and J. D. Blaschke, Urban Intersection Design Guide: Volume 1 - Guidelines, Tech. Report, Feb. 2005.
[39] M. A. Costa-Gomes, N. Iriberri, and V. P. Crawford, “Comparing models
of strategic thinking in Van Huyck, Battalio, and Beil’s coordination
games,” Journal of the European Economic Association, vol. 7, no. 2/3,
pp. 365–376, 2009.
[40] G. Chou, Y. E. Sahin, L. Yang, K. J. Rutledge, P. Nilsson, and N. Ozay, “Using control synthesis to generate corner cases: A case study on autonomous driving,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 11, pp. 2906–2917, 2018.
[41] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun,
“CARLA: An open urban driving simulator,” in Proceedings of the 1st
Annual Conference on Robot Learning, 2017, pp. 1–16.
[42] G. Su, N. Li, Y. Yildiz, A. Girard, and I. Kolmanovsky, “A traffic simulation model with interactive drivers and high-fidelity car dynamics,” IFAC-PapersOnLine, vol. 51, no. 34, pp. 384–389, 2019.
Ran Tian received his B.S. degree in aerospace
engineering from the University of Michigan, Ann
Arbor, MI, USA, in 2016, and his B.S. degree in
mechanical engineering from the Shanghai Jiao Tong
University, Shanghai, China, in 2017. He received
his M.S. degree in Robotics from the University
of Michigan, Ann Arbor, MI, USA, in 2019. His
current research interests include decision-making
under uncertainty and human-robot interaction.
Nan Li received the B.S. degree in automotive engi-
neering from Tongji University, Shanghai, China, in
2014, and the M.S. degree in mechanical engineering
from the University of Michigan, Ann Arbor, MI,
USA, in 2016, where he is pursuing the Ph.D. degree
in aerospace engineering. His current research inter-
ests are stochastic control and multi-agent systems.
Ilya Kolmanovsky is a professor in the depart-
ment of aerospace engineering at the University of
Michigan, with research interests in control theory
for systems with state and control constraints, and
in control applications to aerospace and automotive
systems. He received his Ph.D. degree from the
University of Michigan in 1995.
Yildiray Yildiz is an assistant professor at Bilkent
University, Ankara. He received his B.S. degree
(valedictorian) in mechanical engineering from Mid-
dle East Technical University, Ankara in 2002; M.S.
degree in mechatronics from Sabanci University,
Istanbul, in 2004; and Ph.D. degree in mechanical
engineering with a mathematics minor from MIT
in 2009. He held postdoctoral associate and asso-
ciate scientist positions with NASA Ames Research
Center, California, employed by the University of
California, Santa Cruz, through its University Affil-
iated Research Center, from 2009 to 2010 and 2010 to 2014, respectively.
He is a recipient of the NASA Honor Award, Young Scientist Awards from the Science Academy of Turkey and the Turkish Academy of Sciences, the Research Incentive Award from the Prof. Mustafa Parlar Education and Research Foundation, and a best student conference paper award from ASME. He is an IEEE Senior Member and currently serves as an associate
editor for IEEE Control Systems Magazine and European Journal of Control.
His research interests include control, machine learning, game theory, and
applications of these fields for modeling and control of automotive and
aerospace systems.
Anouck R. Girard received the Ph.D. degree in
ocean engineering from the University of California,
Berkeley, CA, USA, in 2002. She has been with the
University of Michigan, Ann Arbor, MI, USA, since
2006, where she is currently an Associate Professor
of Aerospace Engineering. She has co-authored the
book Fundamentals of Aerospace Navigation and
Guidance (Cambridge University Press, 2014). Her
current research interests include flight dynamics
and control systems. Dr. Girard was a recipient of
the Silver Shaft Teaching Award from the University
of Michigan and a Best Student Paper Award from the American Society of
Mechanical Engineers.