Courteous Autonomous Cars
Liting Sun1, Wei Zhan1, Masayoshi Tomizuka1, and Anca D. Dragan2
Abstract— Typically, autonomous cars optimize for a
combination of safety, efficiency, and driving quality. But
as we get better at this optimization, we start seeing
behavior go from too conservative to too aggressive. The
car’s behavior exposes the incentives we provide in its
cost function. In this work, we argue for cars that are not
optimizing a purely selfish cost, but also try to be courteous
to other interactive drivers. We formalize courtesy as a
term in the objective that measures the increase in another
driver’s cost induced by the autonomous car’s behavior.
Such a courtesy term enables the robot car to be aware
of possible irrationality of the human behavior, and plan
accordingly. We analyze the effect of courtesy in a variety
of scenarios. We find, for example, that courteous robot
cars leave more space when merging in front of a human
driver. Moreover, we find that such a courtesy term can help
explain real human driver behavior on the NGSIM dataset.
I. Introduction
Autonomous cars are getting better at generating
their motion not only in isolation, but also around
people. We now have many strategies for dealing with
interactions with people on the road, each modeling
people in substantially different ways.
Most techniques first anticipate what people plan on
doing, and generate the car’s motion to be efficient, but
also to safely stay out of their way. This prediction can
be as simple as assuming the person will maintain their
current velocity within the planning horizon [1]–[3], or
as complicated as learning a human driver policy or
cost function [4]–[7].
Other techniques account for the interactive nature of
coordinating on the road, and model people as chang-
ing their plans depending on what the car does. Some
do it via coupled planning, assuming that the person
and the robot are on the same team, optimizing the
same joint cost function [8]–[10], while others capture
interaction as a game in which the human and robot
have different utilities, but they influence each other’s
actions [11]–[13].
All of these works focus on how to optimize the
robot’s cost when the robot needs to interact with
people. In this paper, we focus on what the robot should
optimize in such situations, particularly if we consider
the fact that humans are not perfectly rational.
1 Liting Sun, Wei Zhan and Masayoshi Tomizuka are with the Department of Mechanical Engineering, University of California, Berkeley, CA, USA, 94720. {litingsun, wzhan, tomizuka}@berkeley.edu
2 Anca D. Dragan is with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA, 94720. anca@berkeley.edu
Typically, when designing the robot’s cost function,
we focus on safety and driving quality of the ego
vehicle. Arguably, that is rather selfish.
Selfishness has not been a problem with approaches
that predict human plans and react to them, because
that led to conservative robots that always try to stay
out of the way and let people do what they want.
But, as we are switching to more recent approaches
that draw on the game-theoretic aspects of interaction,
our cars are starting to become more aggressive. They
cut people off, or inch forward at intersections to go
first [11] [14]. While this behavior is good sometimes,
we would not want to see it all the time.
Our observation is that as we get better at solving the
optimization problem for driving by better models of
the world and of the people in it, there is an increased
burden on the cost function we optimize to capture
what we want. We propose that purely selfish robots
that care about their safety and driving quality are not
good enough. They should also be courteous to other
drivers. This is of crucial importance since humans
are not perfectly rational, and their behavior will be
influenced by the aggressiveness of the robot cars.
We advocate that a robot should balance its own objective against
the inconvenience it brings to another driver, and that we can
formalize inconvenience as the increase in the other driver’s
cost due to the robot’s behavior, which captures one aspect of
human irrationality.
We make the following contributions:
A formalism for courtesy incorporating irrational
human behavior. We formalize courteous planning as
trading off between the robot’s selfish objective and a
courtesy term, and introduce a mathematical defini-
tion for this term for irrational human behavior – we
measure the increase of the vehicle’s best cost under
the robot’s planned behavior, compared to the vehicle’s
best cost under an alternative "best case scenario", and
define the cost increase as the courtesy term.
An analysis of the effects of courteous planning.
We show the difference between courteous and selfish
robots under different traffic scenarios. The courteous
robot leaves the person more space when it merges,
and might even block another agent (not a person) to
ensure that the human can safely proceed.
Showing that courtesy helps explain human driving.
We do an Inverse Reinforcement Learning (IRL)-based
analysis [7], [16]–[18] to study whether our courtesy
term helps in better predicting how humans drive. On
the NGSIM dataset [19] of real human driver trajectories,
we find that courtesy produces trajectories that are
significantly closer to the ground truth.
arXiv:1808.02633v2 [cs.RO] 16 Aug 2018
We think that the autonomous car of the future
should be safe, efficient, and courteous to others, per-
haps even more so than represented in our current
human-only driving society. Our paper enables au-
tonomous car designers to decide to make that happen.
II. Problem Statement
In this paper, we consider an interactive robot-human
system with two agents: an autonomous car R and a
human driver H (see footnote 1). Our task is to enable a courteous
robot car which cares about the potential inconvenience
it brings to the human driver’s utilities, and generates
trajectories that are socially predictable and acceptable.
Throughout the paper, we denote all robot-related
terms by subscript $(\cdot)_R$ and all human-related terms
by $(\cdot)_H$.
Let $x_R$ and $u_R$ denote, respectively, the robot’s state
and control input, and $x_H$ and $u_H$ the human’s.
$x = (x_R^T, x_H^T)^T$ represents the state of the interaction
system. For each agent, we have
$$x_R^{t+1} = f_R(x_R^t, u_R^t), \tag{1}$$
$$x_H^{t+1} = f_H(x_H^t, u_H^t), \tag{2}$$
and the overall system dynamics are
$$x^{t+1} = f(x^t, u_R^t, u_H^t). \tag{3}$$
We assume that both the human driver and the
autonomous car are optimal planners, and they use
Model Predictive Control (MPC) with a horizon of
length $N$. Let $C_R$ and $C_H$ be, respectively, the cost
functions of the robot car and the human driver over
the horizon:
$$C_i(x^t, \mathbf{u}_R, \mathbf{u}_H; \theta_i) = \sum_{k=0}^{N-1} c_i(x^{t,k}, u_R^k, u_H^k; \theta_i), \quad i \in \{R, H\} \tag{4}$$
where $\mathbf{u}_i = (u_i^0, u_i^1, \cdots, u_i^{N-1})^T$ are sequences of control
actions of the robot car ($i = R$) and the human
driver ($i = H$), and $x^{t,k}$ with $k = 0, 1, \cdots, N-1$ is the
corresponding sequence of system states. $\theta_i$ represent,
respectively, the preferences of the robot car ($i = R$) and
the human driver ($i = H$). At every time step $t$, the
robot car and the human driver generate their optimal
sequences of actions $\mathbf{u}_R^*$ and $\mathbf{u}_H^*$ by minimizing $C_R$ and
$C_H$, respectively, execute the first steps $u_R^{*0}$ and $u_H^{*0}$ (i.e.,
set $u_i^t = u_i^{*0}$ in (3)), and replan for step $t+1$.
Such an optimization-based state feedback strategy
formulates the closed-loop dynamics of the robot-human
interaction system as a game. To simplify the
game, we assume that the robot car has access to $C_H$,
and that the human only computes a best response to
the robot’s actions rather than trying to influence them,
as in [11]. This means that the robot car can compute,
for any control sequence it considers, how the human
would respond and what cost the human will incur:
$$\mathbf{u}_H^* = \arg\min_{\mathbf{u}_H} C_H(x^t, \mathbf{u}_R, \mathbf{u}_H; \theta_H) \triangleq g(x^t, \mathbf{u}_R; \theta_H) \tag{5}$$
$$C_H^*(\mathbf{u}_R) = C_H(x^t, \mathbf{u}_R, g(x^t, \mathbf{u}_R; \theta_H); \theta_H). \tag{6}$$
Here $g(x^t, \mathbf{u}_R; \theta_H)$ represents the response curve of the
human driver towards the autonomous car.
1 If there are multiple robot cars that we control, we treat them all
as a single R. If there are multiple human drivers, we reason about
how each of them affects the robot’s utility separately.
Armed with this model, the robot can now compute
what it should do, such that when the human responds,
the combination is good for the robot’s cost:
$$\mathbf{u}_R^* = \arg\min_{\mathbf{u}_R} C_R(x^t, \mathbf{u}_R, g(x^t, \mathbf{u}_R; \theta_H); \theta_R). \tag{7}$$
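The nested structure of (5) and (7) can be sketched in a few lines of Python. This is only an illustration under strong assumptions, not the paper's implementation: the two-step horizon, the go/wait controls, the toy dynamics, and the cost functions `c_h`, `c_r`, and `collide` are all hypothetical, and exhaustive search over a small candidate set stands in for a real optimizer.

```python
from itertools import product

def rollout_cost(stage_cost, x0, u_r, u_h):
    """Sum a stage cost along the trajectory driven by two control sequences."""
    x, total = x0, 0
    for ur, uh in zip(u_r, u_h):
        total += stage_cost(x, ur, uh)
        x = (x[0] + ur, x[1] + uh)  # toy dynamics: each agent's progress accumulates
    return total

def best_response(c_h, x0, u_r, candidates):
    """g(x, u_R) in (5): the human sequence minimizing C_H for a fixed u_R."""
    return min(candidates, key=lambda u_h: rollout_cost(c_h, x0, u_r, u_h))

def plan_robot(c_r, c_h, x0, candidates):
    """u_R* in (7): minimize C_R evaluated at the human's best response."""
    def cost(u_r):
        u_h = best_response(c_h, x0, u_r, candidates)
        return rollout_cost(c_r, x0, u_r, u_h)
    return min(candidates, key=cost)

# Hypothetical toy game: 1 = go, 0 = wait. Entering the shared zone together collides.
def collide(x, ur, uh):
    return x == (0, 0) and ur == 1 and uh == 1

def c_h(x, ur, uh):
    return (1 - uh) + 10 * collide(x, ur, uh)  # human: 1 per waiting step, 10 per collision

def c_r(x, ur, uh):
    return (1 - ur) + 10 * collide(x, ur, uh)  # selfish robot: same structure

candidates = list(product((0, 1), repeat=2))  # all control sequences for horizon N = 2
u_r_star = plan_robot(c_r, c_h, (0, 0), candidates)
u_h_star = best_response(c_h, (0, 0), u_r_star, candidates)
```

In this toy game the selfish robot plans to go immediately, and the human's best response is to wait one step, which mirrors the aggressive behavior that motivates the courtesy term.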
Our goal is to generate robot behavior that is courteous
to the human, i.e., that takes into consideration the
inconvenience it brings to the human driver. We will
do so by changing the cost function of the robot to
reflect this inconvenience.
III. Courteous Planning
We propose a courteous planning strategy based on
one key observation: humans are not perfectly rational,
and one form of this irrationality is that they weigh losses
more heavily than gains when evaluating their actions [15].
Hence, a courteous robot car should balance the minimization
of its own cost function and the inconvenience
(loss) it brings to the human driver.
Therefore, we construct $C_R$ in (7) as
$$C_R(x^t, \mathbf{u}_R, \mathbf{u}_H; \theta_R, \theta_H, \lambda_c) = C_R^{\mathrm{self}}(x^t, \mathbf{u}_R, \mathbf{u}_H; \theta_R) + \lambda_c C_R^{\mathrm{court}}(x^t, \mathbf{u}_R, \mathbf{u}_H; \theta_H), \tag{8}$$
where $C_R^{\mathrm{self}}$ is the cost function for a regular (selfish)
robot car which cares about only its own utilities
(safety, efficiency, etc.), and $C_R^{\mathrm{court}}$ models the courtesy
term of the robot car to the human driver. It is a
function of the robot car’s behavior, the human’s behavior,
the human’s cost parameters ($\theta_H$) and some
alternative costs (see Section III.A). $\lambda_c \in [0, \infty)$ captures
the trade-off. If we want the robot car to be just as
courteous as a human driver, we could learn $\lambda_c$ from
human driver demonstrations, as we do in Section V.
As robot designers, we might set this parameter higher
than regular human driving to enable more courteous
autonomous cars, particularly when they do not have
passengers on board.
A. Alternative Costs
With any robot plan $\mathbf{u}_R$, the robot car changes the
human driver’s environment and therefore induces a
best cost for the human, $C_H^*(\mathbf{u}_R)$. Our courtesy term
compares this cost with the alternative, $C_H^{\mathrm{alt},*}$: the best
case scenario for the person. It is not immediately clear
how to define this best case scenario since it may vary
across driving scenarios. We explore
three alternatives.
What the human could have done, had the robot car
not been there. We first consider a world in which the
robot car does not exist at all and so cannot interfere with the person.
In such a world, the person gets to optimize their cost
without the robot car:
$$C_H^{\mathrm{alt},*}(x^t, \theta_H) = \min_{\mathbf{u}_H} C_H(x^t, \mathbf{u}_H; \theta_H) \tag{9}$$
This induces a very generous definition of courtesy: the
alternative is for the robot car to not have been on the
road at all. In reality though, the robot car is there,
which leads to our second alternative.
What the human could have done, had the robot
car only been there to help the human. Our second
alternative is to assume that the robot car already on
the road could be completely altruistic. The robot car
could actually optimize the human driver’s cost, being
a perfect collaborator:
$$C_H^{\mathrm{alt},*}(x^t, \theta_H) = \min_{\mathbf{u}_H, \mathbf{u}_R} C_H(x^t, \mathbf{u}_R, \mathbf{u}_H; \theta_H) \tag{10}$$
For this alternative, the robot car and the human would
perform a joint optimization for the human’s cost. For
example, the robot car can brake to make sure that the
human could change lanes in front of it, or even block
another traffic participant to make sure the human has
space.
What the human could have done, had the robot car
just kept doing what it was previously doing. A fully
collaborative robot car is still perhaps not the fairest
one to compute inconvenience against. After all, the
autonomous car does have a passenger sometimes, and
it is fair to take their needs into account too. Our third
alternative computes how well the human driver could
have done, had the robot car kept acting the same way
as it was previously doing:
$$C_H^{\mathrm{alt},*}(x^t, \theta_H) = \min_{\mathbf{u}_H} C_H(x^t, \mathbf{u}_R^{t-1}, \mathbf{u}_H; \theta_H) \tag{11}$$
This means that the person is now responding to a constant
robot trajectory $\mathbf{u}_R^{t-1} = (u_R^{t-1}, \ldots, u_R^{t-1})$, for instance,
maintaining its current velocity.
Our experiments below explore these three different
alternative options for the courtesy term.
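To make the three alternatives in (9)-(11) concrete, the following sketch evaluates them on a toy go/wait problem. Everything here is a hypothetical stand-in: `human_cost` is an invented $C_H$ (one unit per waiting step, ten units if both agents first move in the same step), the horizon is two steps, and brute-force enumeration replaces the optimizer.

```python
from itertools import product

N = 2                                         # toy horizon
CANDIDATES = list(product((0, 1), repeat=N))  # 1 = go, 0 = wait

def first_go(u):
    """Index of the first 'go' in a control sequence, or None if the agent never goes."""
    return u.index(1) if 1 in u else None

def human_cost(u_h, u_r=None):
    """Toy C_H. Passing u_r=None models a world without the robot car."""
    wait = sum(1 - u for u in u_h)
    go_h = first_go(u_h)
    clash = u_r is not None and go_h is not None and go_h == first_go(u_r)
    return wait + 10 * clash

def alt_not_there():
    """Eq. (9): the human optimizes alone."""
    return min(human_cost(u_h) for u_h in CANDIDATES)

def alt_collaborative():
    """Eq. (10): human and robot jointly minimize the human's cost."""
    return min(human_cost(u_h, u_r) for u_h in CANDIDATES for u_r in CANDIDATES)

def alt_keep_going(u_prev):
    """Eq. (11): the robot repeats its previous control over the whole horizon."""
    u_r = (u_prev,) * N
    return min(human_cost(u_h, u_r) for u_h in CANDIDATES)
```

With these toy costs, `alt_not_there()` and `alt_collaborative()` both evaluate to 0, while `alt_keep_going(1)` evaluates to 1: a robot that keeps going already costs the human one waiting step, so that baseline is less generous to the human.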
B. Courtesy Term
We define the courtesy term based on the difference
between what cost the human has, and what cost they
would have had in the alternative:
Definition 1 (Courtesy of the Robot Car)
$$C_R^{\mathrm{court}}(x^t, \mathbf{u}_R, \mathbf{u}_H; \theta_H) = \max\{0,\; C_H(x^t, \mathbf{u}_R, \mathbf{u}_H; \theta_H) - C_H^{\mathrm{alt},*}(x^t; \theta_H)\} \tag{12}$$
Note that we could have also set the courtesy term
to simply be the human cost, and have the robot trade
off between its cost and the human’s. However, that
would have penalized the robot for any cost the human
incurs, even if the robot does not bring any inconvenience
to the human. That might cause overly conservative
behavior. In fact, if we treat the alternative cost as
the reference point in Prospect Theory, a model of human
irrationality [15], then the theory suggests that
humans weigh losses more than gains. This means that
our courteous robot car should care more about avoiding
additional inconvenience than about providing
more convenience, i.e., helping to reduce the human
cost below the alternative one. Mathematically, this
concept is formulated via Definition 1: the robot does
not get any bonus for bringing the human cost lower
than $C_H^{\mathrm{alt},*}$ (possible with some definitions of $C_H^{\mathrm{alt},*}$); it
only gets a penalty for making it higher.
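A minimal sketch of how Definition 1 enters the compound cost in (8) is given below. The two candidate plans and all of their cost values are made-up numbers chosen only to show the hinge and the effect of $\lambda_c$; they are not results from the paper.

```python
def courtesy_term(human_cost, alt_cost):
    """Eq. (12): penalize only the cost increase over the alternative, never reward a decrease."""
    return max(0.0, human_cost - alt_cost)

def robot_cost(selfish_cost, human_cost, alt_cost, lam):
    """Eq. (8): selfish cost plus lambda_c times the courtesy term."""
    return selfish_cost + lam * courtesy_term(human_cost, alt_cost)

# Hypothetical plans: (robot's selfish cost, resulting human cost).
PLANS = {"cut_in": (1.0, 3.0), "yield": (2.0, 1.0)}
ALT_COST = 1.0  # assumed value of the human's alternative best cost

def best_plan(lam):
    """Pick the plan with the lowest compound cost for a given courtesy weight."""
    return min(PLANS, key=lambda name: robot_cost(*PLANS[name], ALT_COST, lam))
```

With these numbers, `best_plan(0.0)` returns `"cut_in"` and `best_plan(1.0)` returns `"yield"`; the switch happens once $\lambda_c$ exceeds 0.5, the point at which the weighted inconvenience outweighs the robot's selfish gain.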
C. Solution
Thus far, we have constructed a compound cost function
$C_R(x^t, \mathbf{u}_R, \mathbf{u}_H; \theta_R, \theta_H, \lambda_c)$ to enable a courteous
robot car, considering three alternative costs. At every
step, the robot needs to solve the optimization problem
in (7) to find the best actions to take. We approximate
the solution by alternately fixing one of $\mathbf{u}_R$ or $\mathbf{u}_H$,
and solving for the other.
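The alternating scheme can be sketched as follows. This is an assumption-laden illustration rather than the authors' solver: controls are scalars, and `solve_h` and `solve_r` are hypothetical closed-form minimizers of two invented quadratic costs, $C_H = (u_H - 0.5\,u_R)^2$ and $C_R = (u_R - 1 - 0.5\,u_H)^2$.

```python
def alternate(solve_r, solve_h, u_r, u_h, tol=1e-9, max_iters=100):
    """Alternately fix one agent's control and solve for the other until both stop changing."""
    for _ in range(max_iters):
        new_h = solve_h(u_r)    # fix u_R, solve for u_H
        new_r = solve_r(new_h)  # fix u_H, solve for u_R
        done = abs(new_r - u_r) < tol and abs(new_h - u_h) < tol
        u_r, u_h = new_r, new_h
        if done:
            break
    return u_r, u_h

# Hypothetical closed-form minimizers of the toy quadratic costs above.
def solve_h(u_r):
    return 0.5 * u_r

def solve_r(u_h):
    return 1.0 + 0.5 * u_h

u_r_fix, u_h_fix = alternate(solve_r, solve_h, 0.0, 0.0)
```

For these toy costs the iteration contracts toward the fixed point $u_R = 4/3$, $u_H = 2/3$. In general such a fixed point is only an approximation to the solution of (7), and convergence is not guaranteed for arbitrary costs.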
IV. Analysis of Courteous Planning
In this section, we analyze the effect of courteous
planning on the robot’s behavior in different simu-
lated driving scenarios. In Section V, we study how
courteous planning can help better explain real human
driving data, enabling robots to be more human-like
and predictable, and better at anticipating
human driver actions on the road.
Simulation Environment: We implement the simulation
environment using Julia [20] on a 2.5 GHz Intel Core i7
processor with 16 GB RAM. We set the horizon length
to $N = 10$, and the sampling time to 0.1 s. Our simulated
environment is 1/10 scale of the real world: 1/10 road
width, car sizes, maximum acceleration (0.5 m/s^2) and
deceleration (-1.0 m/s^2), and low speed limit (1.0 m/s).
Regarding the cost functions $C_H$ and $C_R$ in (6)-(8),
apart from the courtesy term formulated above,
we penalize safety, car speed, comfort level and goal
distances in both $C_H$ and $C_R^{\mathrm{self}}$. Details can
be found in Section V.
For all results, we denote a selfish (baseline) autonomous
car with a gray rectangle, a courteous one with an
orange rectangle, and the human driver with a dark blue rectangle.
A. The Effect of Courtesy
1) Lane Changing: We first consider a lane changing
driving scenario, as shown in Fig. 1. The autonomous
car wants to merge into the human driver’s lane from
an adjacent lane. We assume that the goal of the human
driver is to maintain speed. Then all three different al-
ternatives lead to the same alternative optimal behavior
and cost of the human: the human would go in their
lane undisturbed by the robot. Hence, with constant
$C_H^{\mathrm{alt},*}$, we focus on the influence of the trade-off factor
$\lambda_c$ in the results.
[Fig. 1 panels, each annotated with the human and robot speeds in m/s: (a) selfish robot car, human driver’s inconvenience = 0.2063; (b) intermediate courteous robot car, human driver’s inconvenience = 0.0173; (c) most courteous robot car, human driver’s inconvenience = 0.]
Fig. 1: A lane changing scenario: both the human car and robot car travel at 0.85 m/s initially; (a) a selfish robot car merges in front of the
human with a small gap so that the human brakes to yield; (b) an intermediate courteous robot car merges with a larger gap, which spares
the human driver from hard braking; (c) a most courteous robot car merges with a gap large enough so that the human can maintain speed.
[Fig. 2 panels, each annotated with the human and robot speeds in m/s: (a) selfish robot car, human driver’s inconvenience = 0.385; (b) courteous robot car, human driver’s inconvenience = 0.]
Fig. 2: Another lane changing scenario: both the human car and robot
car travel at 0.9 m/s initially; (a) a selfish robot car accelerates and
merges in front of the human driver with a small gap, scaring the
human driver into braking; (b) a courteous robot car decelerates and
merges after the human driver so that the human can maintain speed.
We present two sets of simulation results in Fig. 1
and Fig. 2, where the initial human driver’s speeds are
0.85 m/s and 0.9 m/s respectively. The results show
that as $\lambda_c$ increases, i.e., as the autonomous car becomes
more courteous, it tends to leave a larger gap when it
merges in front of the human, and the human brakes
less (Fig. 1 from left to right). When the human driver’s
initial speed is high enough, a courteous autonomous
car decides to merge afterwards instead of cutting in,
as shown in Fig. 2.
Figure 3 summarizes the relationship between the
human driver’s inconvenience (the magnitude of the
courtesy term) and $\lambda_c$ for the simulation conditions
in Fig. 1. As the courtesy weight of the
autonomous car increases, the human driver’s inconvenience
decreases.
[Fig. 3 plot: human’s inconvenience (vertical axis, 0.00 to 0.20) versus courtesy weight $\lambda_c$ (horizontal axis, logarithmic, $10^{-3}$ to $10^{5}$).]
Fig. 3: Inconvenience to the human decreases as $\lambda_c$ increases.
2) Turning Left: In this scenario, an autonomous car
wants to take a left turn at an intersection with a
straight-driving human. In this case as well, the al-
ternative behaviors that we consider when evaluating
inconvenience are the same among three different al-
ternatives: the human driver crosses the intersection
maintaining speed.
Simulation results with a courteous and selfish au-
tonomous car are shown in Fig. 4, where a selfish robot
car takes a left turn immediately and forces the human
driver to brake (Fig. 4(a)); while a courteous robot car
waits in the middle of the intersection and takes the
left turn after the human driver passes the intersection
so that the human can maintain its speed (Fig. 4(b)).
B. Influence of Different Alternative Costs for Evaluating
Inconvenience
In the previous examples, the human would have
arrived at the same trajectory regardless of which alternative
world we are considering to evaluate how much
inconvenience the autonomous car is causing. Here, we
consider a scenario in which that is no longer the case
to highlight the differences generated by the alternative
formulations of courtesy in the robot car’s behavior.
[Fig. 4 panels: (a) a selfish robot car takes the left turn first and forces the human driver to brake; (b) a courteous robot car waits until the human driver passes the intersection.]
Fig. 4: Interaction between a straight-driving human and a left-turning autonomous car: (a) a selfish (baseline) robot car takes a left turn
immediately and forces the human driver to brake (red frames); (b) a courteous robot car waits in the middle of the intersection and takes
the left turn after the human passes so that the human can maintain speed.
We consider a scenario where the human is turning
right, with a straight-driving robot car coming from
their left. In this scenario, the three alternative costs
are different, which leads to different courtesy terms:
• Alternative I (robot car not being there): the optimal
human behavior would be to take the right
turn directly;
• Alternative II (robot car being collaborative): the
robot would take the necessary yielding maneuver
to let the human driver take the right turn first,
leading to the same alternative optimal human
behavior of performing the right turn directly;
• Alternative III (robot car maintaining behavior): the
robot car would maintain its speed, and the optimal
human behavior would be to slow down.
Figure 5 summarizes the results of using these different
courtesy terms. In Alternative III, a courteous robot
car goes first, as shown in Fig. 5(a). Intuitively, this
is because $C_H^{\mathrm{alt},*}$ is initially high, and by maintaining
its speed (or even accelerating, depending on $C_R^{\mathrm{self}}$), the
robot car brings no further inconvenience to the human,
i.e., $C_R^{\mathrm{court}}$ remains zero. Hence, the robot car
goes first (had the robot tried to brake, it would only have increased
$C_R^{\mathrm{self}}$ without changing $C_R^{\mathrm{court}} = 0$, and therefore $C_R$
would have increased). The other two alternatives (I and II) are
much more generous to the human. Results in Fig. 5(b)
show that a courteous robot car finds it too expensive
to force the human to go second, and slows down to let
the human go first. The red frames in Fig. 5(b) indicate
the time instants when the autonomous car brakes.
C. Extension to environments with multiple agents
We study a scenario on a two-way road. The robot car
and the human are driving in opposite directions,
but the robot car is blocked and it has to temporarily
merge into the human driver’s lane to get through, as in
Fig. 6. We use the collaborative robot as our alternative
formulation of the courtesy term in this scenario.
[Fig. 5 panels: (a) a courteous robot car goes first; (b) a courteous robot car yields and lets the human go first.]
Fig. 5: Interaction between a right-turning human driver and a
courteous autonomous car with different courtesy terms: (a) the robot
car goes first when it evaluates the courtesy term using going forward
as an alternative world; (b) the robot car yields and lets the human
go first when it evaluates the courtesy term based on a collaborative
or not-being-there alternative world.
When there are only two agents in the environment,
i.e., the autonomous car and the human driver, the
results for a selfish and a courteous autonomous car
are shown in Fig. 6(a)-(b): A selfish autonomous car
directly merges into the human’s lane and forces the
human driver to brake; while a courteous autonomous
car decides to wait until the human driver passes by
since the courtesy term becomes too expensive to go
first.
Such courtesy-aware planning becomes much more
interesting when there is a third agent in the envi-
ronment, as shown in Fig. 6(c). We assume that the
third agent is a responsive agent to the autonomous
car and the autonomous car is courteous only to the
human driver (and not to both). In this case, for $C_H^{\mathrm{alt},*}$,
the human would ideally want to pass undisturbed
by either the robot or the other agent: the courtesy
term captures the difference in cost to the human
between the robot’s behavior and the alternative of a
collaborative robot, and this cost to the human depends
on how much progress the human is able to make
and how fast. As a result, a very courteous robot has
an incentive to produce behavior that is as close as
possible to making that happen.
Then an interesting behavior emerges: the au-
tonomous car first backs up to block the third agent
(the following car) from interrupting the human driver
until the human driver safely passes them, and then
the robot car finishes its task. This displays truly
collaborative behavior, and only happens with high
enough weight on the courtesy term. This may not be
practical for real on-road driving, but it enables the
design of highly courteous robots in some particular
scenarios where humans have higher priority over all
other autonomous agents.
[Fig. 6 panels: (a) a selfish robot car forces the human to brake; (b) a courteous robot car yields; (c) a courteous robot car helps to block the other car. Legend: selfish, courteous, human, other car, blocked area.]
Fig. 6: A blocking-area overtaking scenario: (a) with a selfish cost
function, the robot car overtakes first and forces the human driver to
brake; (b)(c) a courtesy-aware robot car yields to the human driver
and even helps to block other cars depending on its formulation of
the human driver’s alternative world
V. Courtesy Helps Explain Human Driving
Thus far, we have shown that courtesy is useful for
enabling cars to generate actions that do not cause
inconvenience to other drivers. We have also seen that
the larger the weight we put on the courtesy term, the
more social the car’s behavior becomes. A natural next
question is: are humans courteous?
Our hypothesis is that our courtesy term can help
explain human driving behavior. If that is the case, this
has two important implications: it means that it can
enable robots to better predict human actions by giving
them a more accurate model of how people drive, and
it also means that robots can use courtesy to produce
more human-like driving.
We put our hypothesis to the test by learning a cost
function from human driver data, with and without
a courtesy feature. We find that using the courtesy
feature leads to a more accurate cost function that
is better at reproducing human driver data, lending
support to our hypothesis.
A. Learning Cost Functions from Human Demonstrations
1) Human Data Collection: The human data is col-
lected from the Next Generation SIMulation (NGSIM)
dataset [19], which captures the highway driving be-
haviors/trajectories by digital video cameras mounted
on top of surrounding buildings. We selected 153 left-
lane-changing driving trajectories on Interstate 80 (near
Emeryville, California), and separated them into two
sets: a training set of size 100 (denoted by $U_D$, i.e., the
human demonstrations), and the other 53 trajectories
as the test set.
2) Learning Algorithm: We use Inverse Reinforcement
Learning (IRL) [7], [16]–[18] to learn an appropriate cost
function from human data.
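We assume that the cost function is parameterized as a
linear combination of features:
$$c(x^t, u_R^t, u_H^t; \theta) = \theta^T \phi(x^t, u_R^t, u_H^t). \tag{13}$$
Then over the trajectory length $L$, the cumulative cost
function becomes
$$C(x^0, \mathbf{u}_R, \mathbf{u}_H; \theta) = \theta^T \sum_{t=0}^{L-1} \phi(x^t, u_R^t, u_H^t) = \theta^T \Phi(x^0, \mathbf{u}_R, \mathbf{u}_H) \tag{14}$$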
where $\mathbf{u}_R$ and $\mathbf{u}_H$ are, respectively, the actions of the
robot car and the human over the trajectory. Our goal
is to find the weights $\theta$ which maximize the likelihood
of the demonstrations:
$$\theta^* = \arg\max_{\theta} P(U_D \mid \theta) \tag{15}$$
Building on the principle of maximum entropy, we
assume that trajectories are exponentially more likely
when they have lower cost:
$$P(\mathbf{u}_H, \theta) \propto \exp\left(-C(x^0, \mathbf{u}_R, \mathbf{u}_H; \theta)\right). \tag{16}$$
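Thus the probability (likelihood) of the demonstration
set becomes
$$P(U_D \mid \theta) = \prod_{i=1}^{n} \frac{P(\mathbf{u}^D_{H,i}, \theta)}{P(\theta)} = \prod_{i=1}^{n} \frac{P(\mathbf{u}^D_{H,i}, \theta)}{\int P(\tilde{\mathbf{u}}_H, \theta)\, d\tilde{\mathbf{u}}_H} \tag{17}$$
where $n$ is the number of trajectories in $U_D$.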
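To tackle the partition term $\int P(\tilde{\mathbf{u}}_H, \theta)\, d\tilde{\mathbf{u}}_H$ in (17),
we approximate $C(x^0, \mathbf{u}_R, \tilde{\mathbf{u}}_H; \theta)$ with its Laplace
approximation as proposed in [7]:
$$C(x^0, \mathbf{u}_R, \tilde{\mathbf{u}}_H; \theta) \approx C(x^0, \mathbf{u}_R, \mathbf{u}^D_{H,i}; \theta) + \left(\tilde{\mathbf{u}}_H - \mathbf{u}^D_{H,i}\right)^T \frac{\partial C}{\partial \mathbf{u}_H} + \frac{1}{2} \left(\tilde{\mathbf{u}}_H - \mathbf{u}^D_{H,i}\right)^T \frac{\partial^2 C}{\partial \mathbf{u}_H^2} \left(\tilde{\mathbf{u}}_H - \mathbf{u}^D_{H,i}\right). \tag{18}$$
With the assumption of locally optimal demonstrations,
we have $\frac{\partial C}{\partial \mathbf{u}_H}\big|_{\mathbf{u}^D_{H,i}} = 0$ in (18). This simplifies the
partition term $\int P(\tilde{\mathbf{u}}_H, \theta)\, d\tilde{\mathbf{u}}_H$ to a Gaussian integral
for which a closed-form solution exists (see [7] for details).
Substituting (17) and (18) into (15) yields the optimal
parameter $\theta^*$ as the maximizer.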
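For reference, the Gaussian integral can be written out explicitly. The following is a sketch under the assumptions already stated in the text (zero gradient at the demonstration) plus one more: the Hessian is positive definite. The symbols $H_i$ (the Hessian of $C$ at $\mathbf{u}^D_{H,i}$), $\delta$ (the deviation $\tilde{\mathbf{u}}_H - \mathbf{u}^D_{H,i}$), and $m$ (the dimension of $\mathbf{u}_H$) are introduced here for illustration and are not the paper's notation.

```latex
\begin{align*}
\int \exp\!\big(-C(x^0,\mathbf{u}_R,\tilde{\mathbf{u}}_H;\theta)\big)\, d\tilde{\mathbf{u}}_H
  &\approx e^{-C(x^0,\mathbf{u}_R,\mathbf{u}^D_{H,i};\theta)}
     \int \exp\!\big(-\tfrac{1}{2}\,\delta^T H_i\, \delta\big)\, d\delta \\
  &= e^{-C(x^0,\mathbf{u}_R,\mathbf{u}^D_{H,i};\theta)}\,
     (2\pi)^{m/2}\, \det(H_i)^{-1/2}, \\[4pt]
\log \frac{P(\mathbf{u}^D_{H,i},\theta)}{\int P(\tilde{\mathbf{u}}_H,\theta)\, d\tilde{\mathbf{u}}_H}
  &\approx \tfrac{1}{2}\log\det H_i \;-\; \tfrac{m}{2}\log 2\pi .
\end{align*}
```

Under this approximation the cost at the demonstration cancels between numerator and denominator, so each demonstration contributes to the log-likelihood only through the curvature $H_i$, which depends on $\theta$ through (14).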
B. Experiment Design
Hypothesis. Within human interactions, human drivers
show courtesy to others, i.e., they optimize a compound
cost function of the form $C = C^{\mathrm{self}} + \lambda_c C^{\mathrm{court}}$ as in (8),
instead of a selfish one, $C^{\mathrm{self}}$.
Independent Variable. To test our hypothesis, we run
two sets of IRL on the same set of human data, but
with one different feature. For the selfish cost function
$C^{\mathrm{self}}$, four features are selected as follows:
• speed feature $f_d$: deviation of the autonomous car’s
speed from the speed limit:
$$f_d = (v - v_d)^2 \tag{19}$$
• comfort features $f_{acc}$ and $f_{steer}$: jerk and steering
rate of the autonomous car;
• goal feature $f_g$: distance to the target lane:
$$f_g = e^{d_g / w_l}, \tag{20}$$
where $d_g$ is the Euclidean distance and $w_l$ is the
lane width;
• safety feature $f_s$: relative positions with respect to
surrounding cars:
$$f_s = \sum_{i=1}^{n_s} e^{-d_i}, \tag{21}$$
where $n_s$ is the number of surrounding cars and
$d_i$, $i = 1, \cdots, n_s$, is the distance to each of them.
For the courtesy-aware cost function $C = C^{\mathrm{self}} +
\lambda_c C^{\mathrm{court}}$, we use the same four features as above, plus
one additional feature that is equal to the courtesy term.
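The features above and the linear cost in (13) can be sketched as plain functions. This is an illustrative reading, not the authors' code: it assumes (20) is $f_g = \exp(d_g / w_l)$, it omits the comfort features because their exact form is not given here, and the function names are hypothetical.

```python
import math

def speed_feature(v, v_d):
    """Eq. (19): squared deviation of the speed v from the speed limit v_d."""
    return (v - v_d) ** 2

def goal_feature(d_g, w_l):
    """Eq. (20), assumed form: exponential of the distance to the target lane over lane width."""
    return math.exp(d_g / w_l)

def safety_feature(distances):
    """Eq. (21): sum of exp(-d_i) over the distances to the surrounding cars."""
    return sum(math.exp(-d) for d in distances)

def stage_cost(theta, features):
    """Eq. (13): linear combination theta^T phi of the feature values."""
    if len(theta) != len(features):
        raise ValueError("theta and features must have the same length")
    return sum(t * f for t, f in zip(theta, features))
```

A courtesy-aware cost would append the courtesy term as one more entry of `features`, with $\lambda_c$ as the matching entry of `theta`.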
Dependent Measures. We measured the similarity be-
tween trajectories planned with the learned cost func-
tions and human driving trajectories on the test set (an-
other 53 left-lane changing scenarios that are different
from the training set from the NGSIM dataset).
C. Analysis
Training performance. The training results are shown
in Fig. 7 and Table I. One can see that with the
additional courtesy term, better learning performance
(in terms of training loss) has been achieved. This is a
sanity check: having access to one extra DOF can lead
to better training loss regardless, but if it did not that
would invalidate our hypothesis.
            θ_g    θ_d        θ_acc      θ_steer    θ_s    λ_c
C^self      1.0    2.08e+04   5.80e+02   3.91e+02   4.37   –
C           1.0    1.96e+02   6.7e+04    2.36e+02   6.53   9.89e+04
TABLE I: The parameters in C learned via IRL
Trajectory similarity. Figure 8 shows one demonstrative
example of the trajectories for a selfish car (grey)
and a courteous car (orange), with four surrounding
vehicles. The dark blue rectangle is the human driver
in our two-agent robot-human interaction system and
all other vehicles (cyan) are treated as moving obstacles.
It shows that a simulated car with $C$ that includes
courtesy manages to reduce its influence on the human
driver by choosing a much smoother and less aggressive
merging curve, while a car driven by $C^{\mathrm{self}}$ merges
in much more aggressively.
Fig. 7: Training curves for cost functions with and without the
courtesy term
[Fig. 8 panels: simulated human trajectory with a courteous cost function (top); simulated human trajectory with a selfish cost function (bottom). Legend: selfish, courteous, surrounding cars, human driver.]
Fig. 8: An example pair of simulated trajectories with courteous (top)
and selfish (bottom) cost functions
Results for all 53 left-lane changing test trajectories
are given in Fig. 9 (left). To describe the similarities
among trajectories, we adopted the Mean Euclidean
Distance (MED) [21]. As shown in Fig. 9 (right), the
courtesy-aware trajectories are much more similar to the
ground truth trajectories, i.e., a courteous robot car
behaves more human-like. We have also calculated the
space headways of the following human driver on
the robot car’s target lane for all 53 test scenarios,
and the statistical results are given in Fig. 9 (middle).
Compared to a selfish robot car, a courteous robot car
can achieve safer left-lane changing behaviours in terms
of following gaps for the human driver behind.
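As a rough illustration of the similarity measure, the sketch below computes a Mean Euclidean Distance between two trajectories. It assumes both trajectories are sampled at the same time instants and that MED is the average pointwise distance; the precise definition used in the paper is the one in [21], and the function name is hypothetical.

```python
import math

def mean_euclidean_distance(traj_a, traj_b):
    """Average pointwise Euclidean distance between two equally sampled trajectories."""
    if len(traj_a) != len(traj_b) or not traj_a:
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, q) for p, q in zip(traj_a, traj_b)) / len(traj_a)

# Two short 2-D trajectories given as (longitudinal, lateral) points.
planned = [(0.0, 0.0), (1.0, 0.0)]
observed = [(0.0, 3.0), (1.0, 4.0)]
med = mean_euclidean_distance(planned, observed)
```

Here the pointwise distances are 3 and 4, so `med` is 3.5; a lower value means the planned trajectory stays closer to the observed one.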
VI. Conclusion
Summary. We introduced courteous planning based
on the observation that humans irrationally care more about
the additional inconvenience that others bring them.
Courteous planning enables an autonomous car to take
such inconvenience into consideration when evaluating
its possible plans. We saw that not only does this lead to
more courteous robot behavior, but it also helps explain
real human driving data, because humans too are likely
trying to be courteous.
[Fig. 9 panels: trajectories in the longitudinal and lateral directions for Selfish, Courteous, and Human data (left); Space Headways for Selfish, Courteous, and Human data (middle); Trajectory Similarities, Mean Euclidean Distance (MED) for Selfish and Courteous (right).]
Fig. 9: The courtesy term helps fit test set human driver data significantly better: we can see this from the actual trajectories (left), the
following gaps (middle), and the mean Euclidean distances from the ground truth human data (right).
Limitations and Future Work. Despite the fact that
courtesy is not absolute, but relative to how well off
the human driver could be, the trade-off between
courtesy and selfishness remains a meta-parameter that is
difficult to set. In general, defining the right trade-off
parameters in the objective function for autonomous
cars, and for robots more broadly, remains a challenge. With
autonomous cars, this is made worse by the fact that
it is not necessarily a good idea to rely on Inverse
Reinforcement Learning: this might give us models of
human drivers, as it did in our last experiment, but that
might not be what we want the car to optimize for.
Further, we studied courtesy toward a single human
driver (other agents were present, but the robot did
not attempt courtesy toward them). In
real life, there will be many people on the road, and it
becomes difficult to be courteous to all. To some extent,
this is alleviated by our definition of courtesy: it does
not maximize everyone’s utility, but minimizes
the inconvenience we cause. Further work is still needed to
push courtesy to the limits of interacting with multiple
people in cases where it is difficult to be courteous to
all.
Acknowledgement
This work was partially supported by Mines ParisTech
Foundation, “Automated Vehicles–Drive for All”
Chair, and NSF CAREER. We thank Jaime F. Fisac for
helpful discussion and feedback.
References
[1] Y. Kuwata, J. Teo, G. Fiore, S. Karaman, E. Frazzoli, and J. P.
How, “Real-time motion planning with applications to au-
tonomous urban driving,” IEEE Transactions on Control Systems
Technology, vol. 17, no. 5, pp. 1105–1118, 2009.
[2] Z. Liang, G. Zheng, and J. Li, “Automatic parking path op-
timization based on bezier curve fitting,” in Automation and
Logistics (ICAL), 2012 IEEE International Conference on. IEEE,
2012, pp. 583–587.
[3] W. Zhan, J. Chen, C. Y. Chan, C. Liu, and M. Tomizuka,
“Spatially-partitioned environmental representation and plan-
ning architecture for on-road autonomous driving,” in 2017
IEEE Intelligent Vehicles Symposium (IV), June 2017, pp. 632–639.
[4] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei,
and S. Savarese, “Social LSTM: Human trajectory prediction
in crowded spaces,” in Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, 2016, pp. 961–971.
[5] W. Zhan, C. Liu, C. Y. Chan, and M. Tomizuka, “A non-
conservatively defensive strategy for urban autonomous driv-
ing,” in 2016 IEEE 19th International Conference on Intelligent
Transportation Systems (ITSC), Nov. 2016, pp. 459–464.
[6] M. Shimosaka, K. Nishi, J. Sato, and H. Kataoka, “Predicting
driving behavior using inverse reinforcement learning with
multiple reward functions towards environmental diversity,” in
Intelligent Vehicles Symposium (IV), 2015 IEEE. IEEE, 2015, pp.
567–572.
[7] S. Levine and V. Koltun, “Continuous inverse optimal control
with locally optimal examples,” in the 29th International
Conference on Machine Learning (ICML-12), 2012.
[8] G. R. de Campos, P. Falcone, and J. Sjoberg, “Autonomous co-
operative driving: a velocity-based negotiation approach for in-
tersection crossing,” in Intelligent Transportation Systems-(ITSC),
2013 16th International IEEE Conference on. IEEE, 2013, pp. 1456–
1461.
[9] M. Hafner, D. Cunningham, L. Caminiti, and D. Del Vecchio,
“Automated vehicle-to-vehicle collision avoidance at intersec-
tions,” in Proceedings of world congress on intelligent transport
systems, 2011.
[10] H. Kretzschmar, M. Spies, C. Sprunk, and W. Burgard, “Socially
compliant mobile robot navigation via inverse reinforcement
learning,” The International Journal of Robotics Research, vol. 35,
no. 11, pp. 1289–1307, 2016.
[11] D. Sadigh, S. Sastry, S. A. Seshia, and A. D. Dragan, “Planning
for autonomous cars that leverage effects on human actions.” in
Robotics: Science and Systems, 2016.
[12] M. Bahram, A. Lawitzky, J. Friedrichs, M. Aeberhard, and
D. Wollherr, “A Game-Theoretic Approach to Replanning-
Aware Interactive Scene Prediction and Planning,” IEEE Trans-
actions on Vehicular Technology, vol. 65, no. 6, pp. 3981–3992, June
2016.
[13] N. Li, D. W. Oyler, M. Zhang, Y. Yildiz, I. Kolmanovsky, and
A. R. Girard, “Game Theoretic Modeling of Driver and Vehi-
cle Interactions for Verification and Validation of Autonomous
Vehicle Control Systems,” IEEE Transactions on Control Systems
Technology, vol. PP, no. 99, pp. 1–16, 2017.
[14] D. Sadigh, S. S. Sastry, S. A. Seshia, and A. D. Dragan,
“Information gathering actions over human internal state,” in
Proceedings of the IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS), October 2016, pp. 66–73.
[15] A. Tversky and D. Kahneman, “Advances in prospect theory:
Cumulative representation of uncertainty,” Journal of Risk and
Uncertainty, vol. 5, no. 4, pp. 297–323, 1992.
[16] P. Abbeel and A. Y. Ng, “Apprenticeship learning via inverse
reinforcement learning,” in Proceedings of the twenty-first interna-
tional conference on Machine learning. ACM, 2004, p. 1.
[17] B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey, “Maxi-
mum entropy inverse reinforcement learning.” in AAAI, vol. 8.
Chicago, IL, USA, 2008, pp. 1433–1438.
[18] P. Abbeel and A. Y. Ng, “Inverse reinforcement learning,” in
Encyclopedia of machine learning. Springer, 2011, pp. 554–558.
[19] V. Alexiadis, J. Colyar, J. Halkias, R. Hranac, and G. McHale,
“The Next Generation Simulation Program,” Institute of Trans-
portation Engineers. ITE Journal; Washington, vol. 74, no. 8, pp.
22–26, Aug. 2004.
[20] “https://julialang.org.”
[21] J. Quehl, H. Hu, O. S. Tas, E. Rehder, and M. Lauer, “How good
is my prediction? Finding a similarity measure for trajectory
prediction evaluation,” in 2017 IEEE 18th International Conference
on Intelligent Transportation Systems (ITSC), 2017, pp. 120–125.