
Courteous Autonomous Cars

Authors:
Liting Sun1, Wei Zhan1, Masayoshi Tomizuka1, and Anca D. Dragan2
Abstract—Typically, autonomous cars optimize for a
combination of safety, efficiency, and driving quality. But
as we get better at this optimization, we start seeing
behavior go from too conservative to too aggressive. The
car’s behavior exposes the incentives we provide in its
cost function. In this work, we argue for cars that are not
optimizing a purely selfish cost, but also try to be courteous
to other interactive drivers. We formalize courtesy as a
term in the objective that measures the increase in another
driver’s cost induced by the autonomous car’s behavior.
Such a courtesy term enables the robot car to be aware
of the possible irrationality of human behavior, and to plan
accordingly. We analyze the effect of courtesy in a variety
of scenarios. We find, for example, that courteous robot
cars leave more space when merging in front of a human
driver. Moreover, we find that such a courtesy term can help
explain real human driver behavior on the NGSIM dataset.
I. Introduction
Autonomous cars are getting better at generating
their motion not only in isolation, but also around
people. We now have many strategies for dealing with
interactions with people on the road, each modeling
people in substantially different ways.
Most techniques first anticipate what people plan on
doing, and generate the car’s motion to be efficient, but
also to safely stay out of their way. This prediction can
be as simple as assuming the person will maintain their
current velocity within the planning horizon [1]–[3], or
as complicated as learning a human driver policy or
cost function [4]–[7].
Other techniques account for the interactive nature of
coordinating on the road, and model people as chang-
ing their plans depending on what the car does. Some
do it via coupled planning, assuming that the person
and the robot are on the same team, optimizing the
same joint cost function [8]–[10], while others capture
interaction as a game in which the human and robot
have different utilities, but they influence each other’s
actions [11]–[13].
All of these works focus on how to optimize the
robot’s cost when the robot needs to interact with
people. In this paper, we focus on what the robot should
optimize in such situations, particularly if we consider
the fact that humans are not perfectly rational.
1Liting Sun, Wei Zhan and Masayoshi Tomizuka are with
the Department of Mechanical Engineering, University of Cal-
ifornia, Berkeley, CA, USA, 94720. {litingsun, wzhan,
tomizuka}@berkeley.edu
2Anca D. Dragan is with the Department of Electrical Engineering
and Computer Sciences, University of California, Berkeley, CA, USA,
94720. anca@berkeley.edu
Typically, when designing the robot’s cost function,
we focus on safety and driving quality of the ego
vehicle. Arguably, that is rather selfish.
Selfishness has not been a problem with approaches
that predict human plans and react to them, because
that led to conservative robots that always try to stay
out of the way and let people do what they want.
But, as we are switching to more recent approaches
that draw on the game-theoretic aspects of interaction,
our cars are starting to become more aggressive. They
cut people off, or inch forward at intersections to go
first [11], [14]. While this behavior is good sometimes,
we would not want to see it all the time.
Our observation is that as we get better at solving the
optimization problem for driving, via better models of
the world and of the people in it, there is an increased
burden on the cost function we optimize to capture
what we want. We propose that purely selfish robots
that care about their safety and driving quality are not
good enough. They should also be courteous to other
drivers. This is of crucial importance since humans
are not perfectly rational, and their behavior will be
influenced by the aggressiveness of the robot cars.
We advocate that a robot should balance its own objective
against the inconvenience it brings to another driver, and that
we can formalize this inconvenience as the increase in the other
driver's cost due to the robot's behavior, capturing one aspect
of human irrationality.
We make the following contributions:
A formalism for courtesy incorporating irrational
human behavior. We formalize courteous planning as
trading off between the robot’s selfish objective and a
courtesy term, and introduce a mathematical definition
of this term that accounts for irrational human behavior: we
measure the increase in the human driver's best cost under
the robot's planned behavior, compared to their best cost
under an alternative "best case scenario", and define this
cost increase as the courtesy term.
An analysis of the effects of courteous planning.
We show the difference between courteous and selfish
robots under different traffic scenarios. The courteous
robot leaves the person more space when it merges,
and might even block another agent (not a person) to
ensure that the human can safely proceed.
Showing that courtesy helps explain human driving.
We do an Inverse Reinforcement Learning (IRL)-based
analysis [7], [16]–[18] to study whether our courtesy
term helps in better predicting how humans drive. On
the NGSIM dataset [19] of real human driver trajecto-
ries, we find that courtesy produces trajectories that are
significantly closer to the ground truth.
We think that the autonomous car of the future
should be safe, efficient, and courteous to others, per-
haps even more so than represented in our current
human-only driving society. Our paper gives autonomous
car designers the tools to make that happen.
II. Problem Statement
In this paper, we consider an interactive robot-human
system with two agents: an autonomous car $R$ and a
human driver $H$¹. Our task is to enable a courteous
robot car which cares about the potential inconvenience
it brings to the human driver's utilities, and generates
trajectories that are socially predictable and acceptable.
Throughout the paper, we denote all robot-related
terms by the subscript $(\cdot)_R$ and all human-related terms
by $(\cdot)_H$.
Let $x_R$ and $u_R$ denote, respectively, the robot's state
and control input, and $x_H$ and $u_H$ the human's.
$x = (x_R^T, x_H^T)^T$ represents the state of the interaction
system. For each agent, we have
$$x_R^{t+1} = f_R\left(x_R^t, u_R^t\right), \quad (1)$$
$$x_H^{t+1} = f_H\left(x_H^t, u_H^t\right), \quad (2)$$
and the overall system dynamics are
$$x^{t+1} = f\left(x^t, u_R^t, u_H^t\right). \quad (3)$$
We assume that both the human driver and the
autonomous car are optimal planners, and they use
Model Predictive Control (MPC) with a horizon of
length $N$. Let $C_R$ and $C_H$ be, respectively, the cost
functions of the robot car and the human driver over
the horizon:
$$C_i\left(x^t, u_R, u_H; \theta_i\right) = \sum_{k=0}^{N-1} c_i\left(x^{t,k}, u_R^k, u_H^k; \theta_i\right), \quad i \in \{R, H\} \quad (4)$$
where $u_i = \left(u_i^0, u_i^1, \cdots, u_i^{N-1}\right)^T$ are sequences of con-
trol actions of the robot car ($i = R$) and the human
driver ($i = H$), and $x^{t,k}$ with $k = 0, 1, \cdots, N-1$ are the
corresponding sequence of system states. $\theta_i$ represent,
respectively, the preferences of the robot car ($i = R$) and
the human driver ($i = H$). At every time step $t$, the
robot car and the human driver generate their optimal
sequences of actions $u_R^*$ and $u_H^*$ by minimizing $C_R$ and
$C_H$, respectively, execute the first steps $u_R^0$ and $u_H^0$ (i.e.,
set $u_i^t = u_i^0$ in (3)), and replan for step $t+1$.
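To make the receding-horizon scheme concrete, the following is a minimal sketch of the closed-loop interaction, assuming placeholder single-integrator dynamics and stub planners; any trajectory optimizer that returns a length-$N$ control sequence could be substituted for the stubs.

```python
import numpy as np

N = 10  # MPC horizon length

def f_R(x_R, u_R):
    # placeholder robot dynamics (single integrator); stands in for Eq. (1)
    return x_R + 0.1 * u_R

def f_H(x_H, u_H):
    # placeholder human dynamics (single integrator); stands in for Eq. (2)
    return x_H + 0.1 * u_H

def plan_robot(x, N):
    # stub for solving the robot's optimization, Eq. (7)
    return np.zeros((N, 2))

def plan_human(x, u_R_seq, N):
    # stub for the human's best response, Eq. (5)
    return np.zeros((N, 2))

x_R, x_H = np.zeros(2), np.zeros(2)
for _ in range(100):                     # closed-loop simulation
    x = np.concatenate([x_R, x_H])       # joint state x^t
    u_R_seq = plan_robot(x, N)           # robot's optimal sequence
    u_H_seq = plan_human(x, u_R_seq, N)  # human's best response
    # execute only the first step of each sequence, then replan
    x_R = f_R(x_R, u_R_seq[0])
    x_H = f_H(x_H, u_H_seq[0])
```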
Such an optimization-based state feedback strategy
formulates the closed-loop dynamics of the robot-
human interaction system as a game. To simplify the
game, we assume that the robot car has access to $C_H$,
and that the human only computes a best response to
the robot's actions rather than trying to influence them¹,
as in [11]. This means that the robot car can compute,
for any control sequence it considers, how the human
would respond and what cost the human will incur:
$$u_H^* = \arg\min_{u_H} C_H\left(x^t, u_R, u_H; \theta_H\right) \triangleq g\left(x^t, u_R; \theta_H\right) \quad (5)$$
$$C_H^*(u_R) = C_H\left(x^t, u_R, g\left(x^t, u_R; \theta_H\right); \theta_H\right). \quad (6)$$
¹If there are multiple robot cars that we control, we treat them all
as a single $R$. If there are multiple human drivers, we reason about
how each of them affects the robot's utility separately.
Here $g(x^t, u_R; \theta_H)$ represents the response curve of the
human driver to the autonomous car.
Armed with this model, the robot can now compute
what it should do, such that when the human responds,
the combination is good for the robot's cost:
$$u_R^* = \arg\min_{u_R} C_R\left(x^t, u_R, g\left(x^t, u_R; \theta_H\right); \theta_R\right). \quad (7)$$
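As an illustration of this nested structure, here is a sketch in which the human's best response $g$ is computed by an inner optimization for every candidate robot plan, and the robot minimizes its cost over the resulting joint behavior. The quadratic costs are illustrative placeholders, not the paper's actual cost functions.

```python
import numpy as np
from scipy.optimize import minimize

N, m = 10, 1  # horizon and control dimension

def C_H(x0, u_R, u_H):
    # placeholder human cost: control effort plus an interaction term
    return np.sum(u_H ** 2) + np.sum((u_H - u_R) ** 2)

def C_R_selfish(x0, u_R, u_H):
    # placeholder selfish robot cost: track a desired input of 1.0
    return np.sum(u_R ** 2) + np.sum((u_R - 1.0) ** 2)

def human_response(x0, u_R):
    """g(x^t, u_R; theta_H): the human's best response, Eq. (5)."""
    res = minimize(lambda u: C_H(x0, u_R, u.reshape(N, m)),
                   np.zeros(N * m))
    return res.x.reshape(N, m)

def robot_objective(u_R_flat, x0):
    u_R = u_R_flat.reshape(N, m)
    u_H = human_response(x0, u_R)     # anticipate the response
    return C_R_selfish(x0, u_R, u_H)  # evaluate the robot's cost, Eq. (7)

x0 = np.zeros(4)
u_R_star = minimize(robot_objective, np.zeros(N * m),
                    args=(x0,)).x.reshape(N, m)
```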
Our goal is to generate robot behavior that is courteous
to the human, i.e., behavior that takes into consideration the
inconvenience it brings to the human driver. We will
do so by changing the cost function of the robot to
reflect this inconvenience.
III. Courteous Planning
We propose a courteous planning strategy based on
one key observation: human is not perfectly rational,
and one of the irrationality is that they weight losses
higher than gains when evaluating their actions [15].
Hence, a courteous robot car should balance the mini-
mization of its own cost function and the inconvenience
(loss) it brings to the human driver.
Therefore, we construct CRin (7) as
CRxt,uR,uH;θR,θH,λc=Csel f
Rxt,uR,uH;θR
+λcCcourt
Rxt,uR,uH;θH,(8)
where $C_R^{self}$ is the cost function for a regular (selfish)
robot car which cares only about its own utilities
(safety, efficiency, etc.), and $C_R^{court}$ models the courtesy
term of the robot car to the human driver. It is a
function of the robot car's behavior, the human's be-
havior, the human's cost parameters ($\theta_H$), and some
alternative costs (see Section III-A). $\lambda_c \in [0, \infty)$ captures
the trade-off. If we want the robot car to be just as
courteous as a human driver, we could learn $\lambda_c$ from
human driver demonstrations, as we do in Section V.
As robot designers, we might set this parameter higher
than regular human driving to enable more courteous
autonomous cars, particularly when they do not have
passengers on board.
A. Alternative Costs
With any robot plan $u_R$, the robot car changes the
human driver's environment and therefore induces a
best cost for the human, $C_H^*(u_R)$. Our courtesy term
compares this cost with the alternative, $C_H^{alt,*}$: the best-
case scenario for the person. It is not immediately clear
how to define this best-case scenario, since it may vary
depending on the driving scenario. We explore
three alternatives.
What the human could have done, had the robot car
not been there. We first consider a world in which the
robot car is simply not there to interfere with the person.
In such a world, the person gets to optimize their cost
without the robot car:
$$C_H^{alt,*}\left(x^t, \theta_H\right) = \min_{u_H} C_H\left(x^t, u_H; \theta_H\right) \quad (9)$$
This induces a very generous definition of courtesy: the
alternative is for the robot car to not have been on the
road at all. In reality though, the robot car is there,
which leads to our second alternative.
What the human could have done, had the robot
car only been there to help the human. Our second
alternative is to assume that the robot car already on
the road could be completely altruistic. The robot car
could actually optimize the human driver's cost, being
a perfect collaborator:
$$C_H^{alt,*}\left(x^t, \theta_H\right) = \min_{u_H, u_R} C_H\left(x^t, u_R, u_H; \theta_H\right) \quad (10)$$
For this alternative, the robot car and the human would
perform a joint optimization of the human's cost. For
example, the robot car can brake to make sure that the
human could change lanes in front of it, or even block
another traffic participant to make sure the human has
space.
What the human could have done, had the robot car
just kept doing what it was previously doing. A fully
collaborative robot car is still perhaps not the fairest
one to compute inconvenience against. After all, the
autonomous car does have a passenger sometimes, and
it is fair to take their needs into account too. Our third
alternative computes how well the human driver could
have done, had the robot car kept acting the same way
as it was previously doing:
$$C_H^{alt,*}\left(x^t, \theta_H\right) = \min_{u_H} C_H\left(x^t, \bar{u}_R^{t-1}, u_H; \theta_H\right) \quad (11)$$
This means that the person is now responding to a con-
stant robot trajectory $\bar{u}_R^{t-1} = \left(u_R^{t-1}, \ldots, u_R^{t-1}\right)$, for instance,
the robot maintaining its current velocity.
Our experiments below explore these three different
alternative options for the courtesy term.
B. Courtesy Term
We define the courtesy term based on the difference
between what cost the human has, and what cost they
would have had in the alternative:
Definition 1 (Courtesy of the Robot Car)
$$C_R^{court}\left(x^t, u_R, u_H; \theta_H\right) = \max\left\{0, \; C_H\left(x^t, u_R, u_H; \theta_H\right) - C_H^{alt,*}\left(x^t; \theta_H\right)\right\} \quad (12)$$
Note that we could have also set the courtesy term
to simply be the human's cost, and have the robot trade
off between its cost and the human's. However, that
would have penalized the robot for any cost the human
incurs, even if the robot does not bring any inconve-
nience to the human. That might cause too conservative
behavior. In fact, if we treat the alternative cost as
the reference point in Prospect Theory, a model of
human irrationality [15], then the theory suggests that
humans weigh losses more than gains. This means that
our courteous robot car should care more about avoid-
ing additional inconvenience than about providing
more convenience, i.e., reducing the human's cost
below the alternative one. Mathematically, this
concept is formulated via Definition 1: the robot does
not get any bonus for bringing the human's cost lower
than $C_H^{alt,*}$ (possible with some definitions of $C_H^{alt,*}$); it
only gets a penalty for making it higher.
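The sketch below puts Definition 1 together with the three alternative costs of Section III-A. The placeholder cost $C_H$ is again illustrative; its interaction term simply vanishes when the robot is absent, which stands in for Alternative I's robot-free world.

```python
import numpy as np
from scipy.optimize import minimize

N, m = 10, 1  # horizon and control dimension

def C_H(x0, u_R, u_H):
    # placeholder human cost; the interaction term vanishes when the
    # robot is absent (Alternative I)
    cost = float(np.sum(u_H ** 2))
    if u_R is not None:
        cost += float(np.sum((u_H - u_R) ** 2))
    return cost

def best_human_cost(x0, u_R):
    """C*_H(u_R): the human's cost under their best response, Eq. (6)."""
    return minimize(lambda u: C_H(x0, u_R, u.reshape(N, m)),
                    np.zeros(N * m)).fun

def alt_cost(x0, u_R_prev, which):
    if which == "not_there":      # Eq. (9): robot absent
        return minimize(lambda u: C_H(x0, None, u.reshape(N, m)),
                        np.zeros(N * m)).fun
    if which == "collaborative":  # Eq. (10): joint minimization of C_H
        return minimize(lambda z: C_H(x0, z[:N * m].reshape(N, m),
                                      z[N * m:].reshape(N, m)),
                        np.zeros(2 * N * m)).fun
    if which == "keep_doing":     # Eq. (11): robot repeats u_R^{t-1}
        return best_human_cost(x0, np.tile(u_R_prev, (N, 1)))

def courtesy_term(x0, u_R, u_H, u_R_prev, which="collaborative"):
    """Eq. (12): only the increase over the alternative is penalized."""
    return max(0.0, C_H(x0, u_R, u_H) - alt_cost(x0, u_R_prev, which))
```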
C. Solution
Thus far, we have constructed a compound cost func-
tion $C_R(x^t, u_R, u_H; \theta_R, \theta_H, \lambda_c)$ to enable a courteous
robot car, considering three alternative costs. At every
step, the robot needs to solve the optimization problem
in (7) to find the best actions to take. We approximate
the solution by alternately fixing one of $u_R$ or $u_H$
and solving for the other, as sketched below.
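A minimal sketch of this alternating scheme, reusing `C_H`, `C_R_selfish`, and `courtesy_term` from the sketches above; `lam_c` plays the role of $\lambda_c$, and the fixed iteration count is an assumed stopping rule.

```python
import numpy as np
from scipy.optimize import minimize

def solve_interaction(x0, u_R_prev, lam_c=1.0, iters=10,
                      which="collaborative"):
    u_R, u_H = np.zeros((N, m)), np.zeros((N, m))
    for _ in range(iters):
        # fix u_H and minimize the compound robot cost of Eq. (8)
        u_R = minimize(
            lambda u: C_R_selfish(x0, u.reshape(N, m), u_H)
            + lam_c * courtesy_term(x0, u.reshape(N, m), u_H,
                                    u_R_prev, which),
            u_R.ravel()).x.reshape(N, m)
        # fix u_R and re-solve the human's best response, Eq. (5)
        u_H = minimize(lambda u: C_H(x0, u_R, u.reshape(N, m)),
                       u_H.ravel()).x.reshape(N, m)
    return u_R, u_H
```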
IV. Analysis of Courteous Planning
In this section, we analyze the effect of courteous
planning on the robot’s behavior in different simu-
lated driving scenarios. In Section V, we study how
courteous planning can help better explain real human
driving data, enabling robots to be more human-like
and predictable, as well as better at anticipating
human driver actions on the road.
Simulation Environment: We implement the simulation
environment using Julia [20] on a 2.5 GHz Intel Core i7
processor with 16 GB RAM. We set the horizon length
to $N = 10$, and the sampling time to 0.1 s. Our simulated
environment is a 1/10 scale of the real world: 1/10 road
width, car sizes, maximum acceleration (0.5 m/s²) and
deceleration (-1.0 m/s²), and a low speed limit (1.0 m/s).
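For reference, the stated simulation parameters collected in one place (a sketch; the paper's Julia implementation itself is not shown):

```python
SIM_CONFIG = {
    "horizon_N": 10,      # MPC horizon length
    "dt": 0.1,            # sampling time [s]
    "scale": 0.1,         # 1/10 of real-world dimensions
    "max_accel": 0.5,     # maximum acceleration [m/s^2]
    "max_decel": -1.0,    # maximum deceleration [m/s^2]
    "speed_limit": 1.0,   # low speed limit [m/s]
}
```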
Regarding the cost functions $C_H$ and $C_R$ in (6)-(8),
in addition to the courtesy term formulated above,
we penalize safety, car speed, comfort level and goal
distances in both $C_H$ and $C_R^{self}$. Details about this can
be found in Section V.
For all results, we denote a selfish (baseline) au-
tonomous car with a gray rectangle, a courteous one
with orange, and the human driver with dark blue.
A. The Effect of Courtesy
1) Lane Changing: We first consider a lane changing
driving scenario, as shown in Fig. 1. The autonomous
car wants to merge into the human driver’s lane from
an adjacent lane. We assume that the goal of the human
driver is to maintain speed. Then all three different al-
ternatives lead to the same alternative optimal behavior
and cost for the human: the human would proceed in their
lane undisturbed by the robot. Hence, with a constant
[Fig. 1 panels, left to right: (a) selfish robot car, human driver's inconvenience = 0.2063; (b) intermediate courteous robot car, inconvenience = 0.0173; (c) most courteous robot car, inconvenience = 0. The robot holds 1.0 m/s throughout; the human's speed drops as low as 0.7 m/s in (a) but stays at 0.85 m/s in (c).]
Fig. 1: A lane changing scenario: both the human car and the robot car initially travel at 0.85 m/s; (a) a selfish robot car merges in front of the
human with a small gap, so the human brakes to yield; (b) an intermediate courteous robot car merges with a larger gap, which spares
the human driver from hard braking; (c) a most courteous robot car merges with a gap large enough that the human can maintain speed.
[Fig. 2 panels: (a) selfish robot car, human driver's inconvenience = 0.385; (b) courteous robot car, inconvenience = 0. In (a) the human slows from 0.9 m/s to 0.7 m/s; in (b) the robot slows to 0.85 m/s while the human holds 0.9 m/s.]
Fig. 2: Another lane changing scenario: both the human car and the robot
car initially travel at 0.9 m/s; (a) a selfish robot car accelerates and
merges in front of the human driver with a small gap, scaring the
human driver into braking; (b) a courteous robot car decelerates and
merges behind the human driver so that the human can maintain speed.
$C_H^{alt,*}$, we focus on the influence of the trade-off factor
$\lambda_c$ in the results.
We present two sets of simulation results in Fig. 1
and Fig. 2, where the human driver's initial speeds are
0.85 m/s and 0.9 m/s, respectively. The results show
that as $\lambda_c$ increases, i.e., as the robot becomes more
courteous, the autonomous car tends to leave a larger
gap when it merges in front of the human, and the
human brakes less (Fig. 1, left to right). When the human
driver's initial speed is high enough, a courteous
autonomous car decides to merge behind the human
instead of cutting in, as shown in Fig. 2.
Figure 3 summarizes the relationship between the
human driver's inconvenience (the magnitude of the
courtesy term) and $\lambda_c$ for the simulation conditions
in Fig. 1. One can note that as the courtesy of the
autonomous car increases, the human driver's incon-
venience decreases.
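A sweep of this kind can be sketched by reusing `solve_interaction` and `courtesy_term` from the sketches in Section III; the initial joint state `x0` and the robot's previous action `u_R_prev` are assumed to be set up for the scenario of Fig. 1.

```python
import numpy as np

def inconvenience_sweep(x0, u_R_prev):
    """Sweep the courtesy weight and record the human's inconvenience."""
    lambdas = np.logspace(-3, 5, 9)  # range matching the axis of Fig. 3
    inconvenience = []
    for lam in lambdas:
        u_R, u_H = solve_interaction(x0, u_R_prev, lam_c=lam)
        inconvenience.append(courtesy_term(x0, u_R, u_H, u_R_prev))
    return lambdas, inconvenience
```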
Fig. 3: Inconvenience to the human decreases as $\lambda_c$ increases.
2) Turning Left: In this scenario, an autonomous car
wants to take a left turn at an intersection with a
straight-driving human. In this case as well, the al-
ternative behaviors that we consider when evaluating
inconvenience are the same across the three different
alternatives: the human driver crosses the intersection
maintaining speed.
Simulation results with courteous and selfish au-
tonomous cars are shown in Fig. 4: a selfish robot
car takes the left turn immediately and forces the human
driver to brake (Fig. 4(a)), while a courteous robot car
waits in the middle of the intersection and takes the
left turn after the human driver passes the intersection,
so that the human can maintain its speed (Fig. 4(b)).
B. Influence of Different Alternative Costs for Evaluating
Inconvenience
In the previous examples, the human would have
arrived at the same trajectory regardless of which alter-
Fig. 4: Interaction between a straight-driving human and a left-turning autonomous car: (a) a selfish (baseline) robot car takes a left turn
immediately and forces the human driver to brake (red frames); (b) a courteous robot car waits in the middle of the intersection and takes
the left turn after the human passes so that the human can maintain speed.
native world we are considering to evaluate how much
inconvenience the autonomous car is causing. Here, we
consider a scenario in which that is no longer the case,
to highlight the differences that the alternative
formulations of courtesy generate in the robot car's behavior.
We consider a scenario where the human is turning
right, with a straight-driving robot car coming from
their left. In this scenario, the three alternative costs
are different, which leads to different courtesy terms:
Alternative I–Robot car not being there: the op-
timal human behavior would be to take the right
turn directly;
Alternative II–Robot car being collaborative: the
robot would take the necessary yielding maneuver
to let the human driver take the right turn first,
leading to the same alternative optimal human
behavior of performing the right turn directly;
Alternative III–Robot car maintaining behavior: the
robot car would maintain its speed, and the opti-
mal human behavior would be to slow down.
Figure 5 summarizes the results of using these differ-
ent courtesy terms. Under Alternative III, a courteous robot
car goes first, as shown in Fig. 5(a). Intuitively, this
is because $C_H^{alt,*}$ is initially high, and by maintaining
its speed (or even accelerating, depending on $C_R^{self}$), the
robot car brings no further inconvenience to the
human, i.e., $C_R^{court}$ remains zero. Hence, the robot car
goes first (had the robot braked, it would only have increased
$C_R^{self}$ without changing $C_R^{court} = 0$, and therefore $C_R$
would have increased). The other two alternatives (I and II) are
much more generous to the human. Results in Fig. 5(b)
show that a courteous robot car finds it too expensive
to force the human to go second, and slows down to let
the human go first. The red frames in Fig. 5(b) indicate
the time instants when the autonomous car brakes.
C. Extension to Environments with Multiple Agents
We study a scenario on a two-way road. The robot car
and the human are driving in opposite directions,
but the robot car's lane is blocked and it has to temporarily
merge into the human driver's lane to get through, as in
Fig. 6. We use the collaborative robot as our alternative
formulation of the courtesy term in this scenario.
Fig. 5: Interaction between a right-turning human driver and a
courteous autonomous car with different courtesy terms: (a) the robot
car goes first when it evaluates the courtesy term using going forward
as the alternative world; (b) the robot car yields and lets the human
go first when it evaluates the courtesy term based on a collaborative
or not-being-there alternative world.
When there are only two agents in the environment,
i.e., the autonomous car and the human driver, the
results for a selfish and a courteous autonomous car
are shown in Fig. 6(a)-(b): a selfish autonomous car
directly merges into the human's lane and forces the
human driver to brake, while a courteous autonomous
car decides to wait until the human driver passes by,
since going first would make the courtesy term too
expensive.
Such courtesy-aware planning becomes much more
interesting when there is a third agent in the envi-
ronment, as shown in Fig. 6(c). We assume that the
third agent is responsive to the autonomous car, and
that the autonomous car is courteous only to the
human driver (and not to both). In this case, for $C_H^{alt,*}$,
the human would ideally want to pass undisturbed
by either the robot or the other agent: the courtesy
term captures the difference in cost to the human
between the robot's behavior and the alternative of a
collaborative robot, and this cost to the human depends
on how much progress the human is able to make
and how fast. As a result, a very courteous robot has
an incentive to produce behavior that is as close as
possible to making that happen.
An interesting behavior then emerges: the au-
tonomous car first backs up to block the third agent
(the following car) from interrupting the human driver
until the human driver safely passes them, and then
the robot car finishes its task. This displays truly
collaborative behavior, and only happens with a high
enough weight on the courtesy term. It may not be
practical for real on-road driving, but it enables the
design of highly courteous robots in particular
scenarios where humans have higher priority over all
other autonomous agents.
Fig. 6: A blocking-area overtaking scenario: (a) with a selfish cost
function, the robot car overtakes first and forces the human driver to
brake; (b)(c) a courtesy-aware robot car yields to the human driver
and even helps to block other cars, depending on its formulation of
the human driver's alternative world.
V. Courtesy Helps Explain Human Driving
Thus far, we have shown that courtesy is useful for
enabling cars to generate actions that do not cause
inconvenience to other drivers. We have also seen that
the larger the weight we put on the courtesy term, the
more social the car's behavior becomes. A natural next
question is: are humans courteous?
Our hypothesis is that our courtesy term can help
explain human driving behavior. If that is the case, it
has two important implications: it means that courtesy can
enable robots to better predict human actions by giving
them a more accurate model of how people drive, and
it also means that robots can use courtesy to produce
more human-like driving.
We put our hypothesis to the test by learning a cost
function from human driver data, with and without
a courtesy feature. We find that using the courtesy
feature leads to a more accurate cost function that
is better at reproducing human driver data, lending
support to our hypothesis.
A. Learning Cost Functions from Human Demonstrations
1) Human Data Collection: The human data is col-
lected from the Next Generation SIMulation (NGSIM)
dataset [19], which captures highway driving be-
haviors/trajectories via digital video cameras mounted
on top of surrounding buildings. We selected 153 left-
lane-changing driving trajectories on Interstate 80 (near
Emeryville, California), and separated them into two
sets: a training set of size 100 (denoted by $U_D$, i.e., the
human demonstrations), and the remaining 53 trajectories
as the test set.
2) Learning Algorithm: We use Inverse Reinforcement
Learning (IRL) [7], [16]–[18] to learn an appropriate cost
function from human data.
We assume that the cost function is parameterized as a
linear combination of features:
$$c\left(x^t, u_R^t, u_H^t; \theta\right) = \theta^T \phi\left(x^t, u_R^t, u_H^t\right). \quad (13)$$
Then over the trajectory length $L$, the cumulative cost
function becomes
$$C\left(x^0, u_R, u_H; \theta\right) = \theta^T \sum_{t=0}^{L-1} \phi\left(x^t, u_R^t, u_H^t\right) = \theta^T \Phi\left(x^0, u_R, u_H\right) \quad (14)$$
where $u_R$ and $u_H$ are, respectively, the actions of the
robot car and the human over the trajectory. Our goal
is to find the weights $\theta$ which maximize the likelihood
of the demonstrations:
$$\theta^* = \arg\max_{\theta} P\left(U_D \mid \theta\right) \quad (15)$$
Building on the principle of maximum entropy, we
assume that trajectories are exponentially more likely
when they have lower cost:
$$P\left(u_H \mid \theta\right) \propto \exp\left(-C\left(x^0, u_R, u_H; \theta\right)\right). \quad (16)$$
Thus the probability (likelihood) of the demonstration
set becomes
$$P\left(U_D \mid \theta\right) = \prod_{i=1}^{n} P\left(u_{H,i}^D \mid \theta\right) = \prod_{i=1}^{n} \frac{P\left(u_{H,i}^D, \theta\right)}{\int P\left(\tilde{u}_H, \theta\right) d\tilde{u}_H} \quad (17)$$
where $n$ is the number of trajectories in $U_D$.
To tackle the partition term $\int P\left(\tilde{u}_H, \theta\right) d\tilde{u}_H$ in (17),
we approximate $C\left(x^0, u_R, \tilde{u}_H; \theta\right)$ with its Laplace ap-
proximation, as proposed in [7]:
$$C\left(x^0, u_R, \tilde{u}_H; \theta\right) \approx C\left(x^0, u_R, u_{H,i}^D; \theta\right) + \left(\tilde{u}_H - u_{H,i}^D\right)^T \frac{\partial C}{\partial u_H} + \frac{1}{2} \left(\tilde{u}_H - u_{H,i}^D\right)^T \frac{\partial^2 C}{\partial u_H^2} \left(\tilde{u}_H - u_{H,i}^D\right). \quad (18)$$
With the assumption of locally optimal demonstra-
tions, we have $\frac{\partial C}{\partial u_H}\big|_{u_{H,i}^D} = 0$ in (18). This reduces the
partition term $\int P\left(\tilde{u}_H, \theta\right) d\tilde{u}_H$ to a Gaussian integral
with a closed-form solution (see [7] for details).
Substituting (17) and (18) into (15) yields the optimal
parameter $\theta^*$ as the maximizer.
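The sketch below illustrates this Laplace-approximated likelihood in the style of [7], with finite-difference gradients and Hessians; `features` is an assumed placeholder for the feature sums $\Phi$ of Section V-B, and each demonstration is a tuple $(x^0, u_R, u_H^D)$.

```python
import numpy as np
from scipy.optimize import minimize

def features(x0, u_R, u_H):
    # placeholder feature sums Phi(x^0, u_R, u_H); stands in for the
    # actual features of Section V-B
    return np.array([np.sum(u_H ** 2), np.sum((u_H - u_R) ** 2)])

def cost(theta, x0, u_R, u_H):
    return float(theta @ features(x0, u_R, u_H))  # C = theta^T Phi, Eq. (14)

def grad_hess(f, u, eps=1e-4):
    """Finite-difference gradient and Hessian of f at the flat vector u."""
    k = u.size
    g, H = np.zeros(k), np.zeros((k, k))
    for i in range(k):
        e_i = eps * np.eye(k)[i]
        g[i] = (f(u + e_i) - f(u - e_i)) / (2 * eps)
        for j in range(k):
            e_j = eps * np.eye(k)[j]
            H[i, j] = (f(u + e_i + e_j) - f(u + e_i - e_j)
                       - f(u - e_i + e_j) + f(u - e_i - e_j)) / (4 * eps ** 2)
    return g, H

def neg_log_likelihood(theta, demos):
    """-log P(U_D | theta) under the Laplace approximation of Eq. (18):
    log P ~ -g^T H^{-1} g / 2 + log|H| / 2 - (k/2) log(2 pi)."""
    nll = 0.0
    for (x0, u_R, u_H) in demos:
        f = lambda u: cost(theta, x0, u_R, u.reshape(u_H.shape))
        g, H = grad_hess(f, u_H.ravel())
        H = H + 1e-6 * np.eye(H.shape[0])  # keep the Hessian invertible
        _, logdet = np.linalg.slogdet(H)
        k = u_H.size
        nll -= (-0.5 * g @ np.linalg.solve(H, g) + 0.5 * logdet
                - 0.5 * k * np.log(2 * np.pi))
    return nll

# usage sketch: theta_star = minimize(neg_log_likelihood, theta0,
#                                     args=(demos,)).x
```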
B. Experiment Design
Hypothesis. Within interactions, human drivers
show courtesy to others, i.e., they optimize a compound
cost function of the form $C = C^{self} + \lambda_c C^{court}$ as in (8),
instead of a selfish one, $C^{self}$.
Independent Variable. To test our hypothesis, we run
two sets of IRL on the same set of human data, but
with one differing feature. For the selfish cost function
$C^{self}$, four features are selected as follows:
speed feature $f_d$: deviation of the autonomous car's
speed from the speed limit:
$$f_d = (v - v_d)^2 \quad (19)$$
comfort features $f_{acc}$ and $f_{steer}$: jerk and steering
rate of the autonomous car;
goal feature $f_g$: distance to the target lane:
$$f_g = e^{d_g / w_l}, \quad (20)$$
where $d_g$ is the Euclidean distance to the target lane
and $w_l$ is the lane width;
safety feature $f_s$: relative positions with respect to
surrounding cars:
$$f_s = \sum_{i=1}^{n_s} e^{-d_i}, \quad (21)$$
where $n_s$ is the number of surrounding cars and
$d_i$, $i = 1, \cdots, n_s$, is the distance to each of them.
For the courtesy-aware cost function $C = C^{self} +
\lambda_c C^{court}$, we use the same four features as above, plus
one additional feature equal to the courtesy term.
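A sketch of how the selfish feature vector of Eqs. (19)-(21) could be evaluated at a single time step; squaring the jerk and steering rate for the comfort features, and the signs of the exponents, are assumptions of the reconstruction above.

```python
import numpy as np

def selfish_features(v, v_d, jerk, steer_rate, d_g, w_l, d_surround):
    f_d = (v - v_d) ** 2        # speed feature, Eq. (19)
    f_acc = jerk ** 2           # comfort: jerk (squaring assumed)
    f_steer = steer_rate ** 2   # comfort: steering rate (squaring assumed)
    f_g = np.exp(d_g / w_l)     # goal feature, Eq. (20)
    f_s = float(np.sum(np.exp(-np.asarray(d_surround))))  # safety, Eq. (21)
    return np.array([f_g, f_d, f_acc, f_steer, f_s])
```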
Dependent Measures. We measure the similarity be-
tween trajectories planned with the learned cost func-
tions and human driving trajectories on the test set (the
53 left-lane-changing scenarios from the NGSIM dataset
that are held out from the training set).
C. Analysis
Training performance. The training results are shown
in Fig. 7 and Table I. One can see that with the
additional courtesy term, better learning performance
(in terms of training loss) is achieved. This is a
sanity check: having access to one extra degree of freedom
can lead to better training loss regardless, but if it did
not, that would invalidate our hypothesis.
            θ_g    θ_d        θ_acc      θ_steer    θ_s    λ_c
C^self      1.0    2.08e+04   5.80e+02   3.91e+02   4.37   --
C           1.0    1.96e+02   6.7e+04    2.36e+02   6.53   9.89e+04
TABLE I: The parameters in C learned via IRL
Trajectory similarity. Figure 8 shows one demonstra-
tive example of the trajectories for a selfish car (grey)
and a courteous car (orange), with four surrounding
vehicles. The dark blue rectangle is the human driver
in our two-agent robot-human interaction system, and
all other vehicles (cyan) are treated as moving obstacles.
Fig. 7: Training curves for cost functions with and without the
courtesy term
Figure 8 shows that a simulated car with $C$, which includes
courtesy, manages to reduce its influence on the human
driver by choosing a much smoother and less aggres-
sive merging curve, while a car driven by $C^{self}$ merges
in much more aggressively.
Fig. 8: An example pair of simulated trajectories with courteous (top)
and selfish (bottom) cost functions
Results for all 53 left-lane-changing test trajectories
are given in Fig. 9 (left). To describe the similarity
among trajectories, we adopt the Mean Euclidean
Distance (MED) [21]. As shown in Fig. 9 (right), the
courtesy-aware trajectories are much more similar to the
ground-truth trajectories, i.e., a courteous robot car
behaves in a more human-like way. We have also calculated
the space headways of the following human driver on
the robot car's target lane for all 53 test scenarios,
and the statistical results are given in Fig. 9 (middle).
Compared to a selfish robot car, a courteous robot car
achieves safer left-lane-changing behavior in terms
of the following gaps left for the human driver behind.
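A minimal sketch of the MED metric [21] as used here, assuming the two trajectories are (T, 2) arrays of positions sampled at the same time steps:

```python
import numpy as np

def mean_euclidean_distance(traj_a, traj_b):
    # mean of the pointwise Euclidean distances between two trajectories
    return float(np.mean(np.linalg.norm(traj_a - traj_b, axis=1)))
```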
VI. Conclusion
Summary. We introduced courteous planning based
on the observation that humans irrationally care more about
the additional inconvenience brought to them by others.
Courteous planning enables an autonomous car to take
such inconvenience into consideration when evaluating
its possible plans. We saw that this not only leads to
more courteous robot behavior, but also helps explain
real human driving data, because humans, too, are likely
trying to be courteous.
Fig. 9: The courtesy term helps fit the test-set human driver data significantly better: we can see this in the actual trajectories (left), the
following gaps (middle), and the Mean Euclidean Distances from the ground-truth human data (right).
Limitations and Future Work. Despite the fact that
courtesy is not absolute, but relative to how well off
the human driver could be, the trade-off between cour-
tesy and selfishness remains a meta-parameter that is
difficult to set. In general, defining the right trade-off
parameters in the objective function for autonomous
cars and robots more broadly remains a challenge. With
autonomous cars, this is made worse by the fact that
it is not necessarily a good idea to rely on Inverse
Reinforcement Learning: it might give us models of
human drivers, as it did in our last experiment, but that
might not be what we want the car to optimize for.
Further, we studied courtesy with a single human
driver to be courteous toward (we had other agents,
but the robot did not attempt courtesy toward them). In
real life, there will be many people on the road, and it
becomes difficult to be courteous to all. To some extent,
this is alleviated by our definition of courtesy: it is
not maximizing everyone’s utility, but it is minimizing
the inconvenience we cause. But further work needs to
push courtesy to the limits of interacting with multiple
people in cases where it is difficult to be courteous to
all.
Acknowledgement
This work was partially supported by the Mines Paris-
Tech Foundation, "Automated Vehicles–Drive for All"
Chair, and NSF CAREER. We thank Jaime F. Fisac for
helpful discussion and feedback.
References
[1] Y. Kuwata, J. Teo, G. Fiore, S. Karaman, E. Frazzoli, and J. P.
How, “Real-time motion planning with applications to au-
tonomous urban driving,” IEEE Transactions on Control Systems
Technology, vol. 17, no. 5, pp. 1105–1118, 2009.
[2] Z. Liang, G. Zheng, and J. Li, “Automatic parking path op-
timization based on bezier curve fitting,” in Automation and
Logistics (ICAL), 2012 IEEE International Conference on. IEEE,
2012, pp. 583–587.
[3] W. Zhan, J. Chen, C. Y. Chan, C. Liu, and M. Tomizuka,
“Spatially-partitioned environmental representation and plan-
ning architecture for on-road autonomous driving,” in 2017
IEEE Intelligent Vehicles Symposium (IV), June 2017, pp. 632–639.
[4] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei,
and S. Savarese, “Social LSTM: Human trajectory prediction
in crowded spaces,” in Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, 2016, pp. 961–971.
[5] W. Zhan, C. Liu, C. Y. Chan, and M. Tomizuka, “A non-
conservatively defensive strategy for urban autonomous driv-
ing,” in 2016 IEEE 19th International Conference on Intelligent
Transportation Systems (ITSC), Nov. 2016, pp. 459–464.
[6] M. Shimosaka, K. Nishi, J. Sato, and H. Kataoka, “Predicting
driving behavior using inverse reinforcement learning with
multiple reward functions towards environmental diversity,” in
Intelligent Vehicles Symposium (IV), 2015 IEEE. IEEE, 2015, pp.
567–572.
[7] S. Levine and V. Koltun, "Continuous inverse optimal control
with locally optimal examples," in Proceedings of the 29th Interna-
tional Conference on Machine Learning (ICML-12), 2012.
[8] G. R. de Campos, P. Falcone, and J. Sjoberg, “Autonomous co-
operative driving: a velocity-based negotiation approach for in-
tersection crossing,” in Intelligent Transportation Systems-(ITSC),
2013 16th International IEEE Conference on. IEEE, 2013, pp. 1456–
1461.
[9] M. Hafner, D. Cunningham, L. Caminiti, and D. Del Vecchio,
“Automated vehicle-to-vehicle collision avoidance at intersec-
tions,” in Proceedings of world congress on intelligent transport
systems, 2011.
[10] H. Kretzschmar, M. Spies, C. Sprunk, and W. Burgard, “Socially
compliant mobile robot navigation via inverse reinforcement
learning,” The International Journal of Robotics Research, vol. 35,
no. 11, pp. 1289–1307, 2016.
[11] D. Sadigh, S. Sastry, S. A. Seshia, and A. D. Dragan, “Planning
for autonomous cars that leverage effects on human actions.” in
Robotics: Science and Systems, 2016.
[12] M. Bahram, A. Lawitzky, J. Friedrichs, M. Aeberhard, and
D. Wollherr, “A Game-Theoretic Approach to Replanning-
Aware Interactive Scene Prediction and Planning,” IEEE Trans-
actions on Vehicular Technology, vol. 65, no. 6, pp. 3981–3992, June
2016.
[13] N. Li, D. W. Oyler, M. Zhang, Y. Yildiz, I. Kolmanovsky, and
A. R. Girard, “Game Theoretic Modeling of Driver and Vehi-
cle Interactions for Verification and Validation of Autonomous
Vehicle Control Systems,” IEEE Transactions on Control Systems
Technology, vol. PP, no. 99, pp. 1–16, 2017.
[14] D. Sadigh, S. S. Sastry, S. A. Seshia, and A. D. Dragan, "Informa-
tion gathering actions over human internal state," in Proceedings of the
IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS), October 2016, pp. 66–73.
[15] A. Tversky and D. Kahneman, “Advances in prospect theory:
Cumulative representation of uncertainty,” Journal of Risk and
uncertainty, vol. 5, no. 4, pp. 297–323, 1992.
[16] P. Abbeel and A. Y. Ng, “Apprenticeship learning via inverse
reinforcement learning,” in Proceedings of the twenty-first interna-
tional conference on Machine learning. ACM, 2004, p. 1.
[17] B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey, “Maxi-
mum entropy inverse reinforcement learning.” in AAAI, vol. 8.
Chicago, IL, USA, 2008, pp. 1433–1438.
[18] P. Abbeel and A. Y. Ng, “Inverse reinforcement learning,” in
Encyclopedia of machine learning. Springer, 2011, pp. 554–558.
[19] V. Alexiadis, J. Colyar, J. Halkias, R. Hranac, and G. McHale,
"The Next Generation Simulation Program," Institute of Trans-
portation Engineers. ITE Journal; Washington, vol. 74, no. 8, pp.
22–26, Aug. 2004.
[20] The Julia programming language, https://julialang.org.
[21] J. Quehl, H. Hu, O. S. Tas, E. Rehder, and M. Lauer, “How good
is my prediction? finding a similarity measure for trajectory
prediction evaluation.” in 2017 IEEE 18th International Conference
on Intelligent Transportation Systems (ITSC), 2017, pp. 120–125.
Mobile robots are increasingly populating our human environments. To interact with humans in a socially compliant way, these robots need to understand and comply with mutually accepted rules. In this paper, we present a novel approach to model the cooperative navigation behavior of humans. We model their behavior in terms of a mixture distribution that captures both the discrete navigation decisions, such as going left or going right, as well as the natural variance of human trajectories. Our approach learns the model parameters of this distribution that match, in expectation, the observed behavior in terms of user-defined features. To compute the feature expectations over the resulting high-dimensional continuous distributions, we use Hamiltonian Markov chain Monte Carlo sampling. Furthermore, we rely on a Voronoi graph of the environment to efficiently explore the space of trajectories from the robot’s current position to its target position. Using the proposed model, our method is able to imitate the behavior of pedestrians or, alternatively, to replicate a specific behavior that was taught by tele-operation in the target environment of the robot. We implemented our approach on a real mobile robot and demonstrated that it is able to successfully navigate in an office environment in the presence of humans. An extensive set of experiments suggests that our technique outperforms state-of-the-art methods to model the behavior of pedestrians, which also makes it applicable to fields such as behavioral science or computer graphics.