
Courteous Autonomous Cars

Authors:
Liting Sun1, Wei Zhan1, Masayoshi Tomizuka1, and Anca D. Dragan2
Abstract—Typically, autonomous cars optimize for a
combination of safety, efficiency, and driving quality. But
as we get better at this optimization, we start seeing
behavior go from too conservative to too aggressive. The
car’s behavior exposes the incentives we provide in its
cost function. In this work, we argue for cars that are not
optimizing a purely selfish cost, but also try to be courteous
to other interactive drivers. We formalize courtesy as a
term in the objective that measures the increase in another
driver’s cost induced by the autonomous car’s behavior.
Such a courtesy term enables the robot car to be aware
of the possible irrationality of human behavior, and to plan
accordingly. We analyze the effect of courtesy in a variety
of scenarios. We find, for example, that courteous robot
cars leave more space when merging in front of a human
driver. Moreover, we find that such a courtesy term can help
explain real human driver behavior on the NGSIM dataset.
I. Introduction
Autonomous cars are getting better at generating
their motion not only in isolation, but also around
people. We now have many strategies for dealing with
interactions with people on the road, each modeling
people in substantially different ways.
Most techniques first anticipate what people plan on
doing, and generate the car’s motion to be efficient, but
also to safely stay out of their way. This prediction can
be as simple as assuming the person will maintain their
current velocity within the planning horizon [1]–[3], or
as complicated as learning a human driver policy or
cost function [4]–[7].
Other techniques account for the interactive nature of
coordinating on the road, and model people as chang-
ing their plans depending on what the car does. Some
do it via coupled planning, assuming that the person
and the robot are on the same team, optimizing the
same joint cost function [8]–[10], while others capture
interaction as a game in which the human and robot
have different utilities, but they influence each other’s
actions [11]–[13].
All of these works focus on how to optimize the
robot’s cost when the robot needs to interact with
people. In this paper, we focus on what the robot should
optimize in such situations, particularly if we consider
the fact that humans are not perfectly rational.
1Liting Sun, Wei Zhan and Masayoshi Tomizuka are with
the Department of Mechanical Engineering, University of Cal-
ifornia, Berkeley, CA, USA, 94720. {litingsun, wzhan,
tomizuka}@berkeley.edu
2Anca D. Dragan is with the Department of Electrical Engineering
and Computer Sciences, University of California, Berkeley, CA, USA,
94720. anca@berkeley.edu
Typically, when designing the robot’s cost function,
we focus on safety and driving quality of the ego
vehicle. Arguably, that is rather selfish.
Selfishness has not been a problem with approaches
that predict human plans and react to them, because
that led to conservative robots that always try to stay
out of the way and let people do what they want.
But, as we are switching to more recent approaches
that draw on the game-theoretic aspects of interaction,
our cars are starting to become more aggressive. They
cut people off, or inch forward at intersections to go
first [11], [14]. While this behavior is good sometimes,
we would not want to see it all the time.
Our observation is that as we get better at solving the
optimization problem for driving, via better models of
the world and of the people in it, there is an increased
burden on the cost function we optimize to capture
what we want. We propose that purely selfish robots
that care about their safety and driving quality are not
good enough. They should also be courteous to other
drivers. This is of crucial importance since humans
are not perfectly rational, and their behavior will be
influenced by the aggressiveness of the robot cars.
We advocate that a robot should balance its own objective
against the inconvenience it brings to another driver, and that
we can formalize this inconvenience as the increase in the other
driver's cost due to the robot's behavior, capturing one aspect
of human irrationality.
We make the following contributions:
A formalism for courtesy incorporating irrational
human behavior. We formalize courteous planning as
trading off between the robot’s selfish objective and a
courtesy term, and introduce a mathematical definition
of this term that accounts for irrational human behavior: we
measure the increase in the human driver's best cost under
the robot's planned behavior, compared to their best cost
under an alternative "best case scenario", and define this
cost increase as the courtesy term.
An analysis of the effects of courteous planning.
We show the difference between courteous and selfish
robots under different traffic scenarios. The courteous
robot leaves the person more space when it merges,
and might even block another agent (not a person) to
ensure that the human can safely proceed.
Showing that courtesy helps explain human driving.
We do an Inverse Reinforcement Learning (IRL)-based
analysis [7], [16]–[18] to study whether our courtesy
term helps in better predicting how humans drive. On
the NGSIM dataset [19] of real human driver trajecto-
ries, we find that courtesy produces trajectories that are
significantly closer to the ground truth.
We think that the autonomous car of the future
should be safe, efficient, and courteous to others, per-
haps even more so than represented in our current
human-only driving society. Our paper gives autonomous
car designers the tools to make that happen.
II. Problem Statement
In this paper, we consider an interactive robot-human
system with two agents: an autonomous car $R$ and a
human driver $H$¹. Our task is to enable a courteous
robot car which cares about the potential inconvenience
it brings to the human driver's utilities, and generates
trajectories that are socially predictable and acceptable.
Throughout the paper, we denote all robot-related
terms by the subscript $(\cdot)_R$ and all human-related terms
by $(\cdot)_H$.
Let $x_R$ and $u_R$ denote, respectively, the robot's state
and control input, and $x_H$ and $u_H$ the human's.
$x = (x_R^T, x_H^T)^T$ represents the state of the interaction
system. For each agent, we have
$$x_R^{t+1} = f_R\left(x_R^t, u_R^t\right), \quad (1)$$
$$x_H^{t+1} = f_H\left(x_H^t, u_H^t\right), \quad (2)$$
and the overall system dynamics are
$$x^{t+1} = f\left(x^t, u_R^t, u_H^t\right). \quad (3)$$
We assume that both the human driver and the
autonomous car are optimal planners, and they use
Model Predictive Control (MPC) with a horizon of
length $N$. Let $C_R$ and $C_H$ be, respectively, the cost
functions of the robot car and the human driver over
the horizon:
$$C_i\left(x^t, u_R, u_H; \theta_i\right) = \sum_{k=0}^{N-1} c_i\left(x^{t,k}, u_R^k, u_H^k; \theta_i\right), \quad i \in \{R, H\} \quad (4)$$
where $u_i = \left(u_i^0, u_i^1, \cdots, u_i^{N-1}\right)^T$ are sequences of con-
trol actions of the robot car ($i = R$) and the human
driver ($i = H$), and $x^{t,k}$ with $k = 0, 1, \cdots, N-1$ are the
corresponding sequence of system states. $\theta_i$ represent,
respectively, the preferences of the robot car ($i = R$) and
the human driver ($i = H$). At every time step $t$, the
robot car and the human driver generate their optimal
sequences of actions $u_R^*$ and $u_H^*$ by minimizing $C_R$ and
$C_H$, respectively, execute the first steps $u_R^0$ and $u_H^0$ (i.e.,
set $u_i^t = u_i^0$ in (3)), and replan for step $t+1$.
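To make the receding-horizon scheme concrete, the following is a minimal sketch of the closed-loop interaction, assuming placeholder single-integrator dynamics and stub planners; any trajectory optimizer that returns a length-$N$ control sequence could be substituted for the stubs.

```python
import numpy as np

N = 10  # MPC horizon length

def f_R(x_R, u_R):
    # placeholder robot dynamics (single integrator); stands in for Eq. (1)
    return x_R + 0.1 * u_R

def f_H(x_H, u_H):
    # placeholder human dynamics (single integrator); stands in for Eq. (2)
    return x_H + 0.1 * u_H

def plan_robot(x, N):
    # stub for solving the robot's optimization, Eq. (7)
    return np.zeros((N, 2))

def plan_human(x, u_R_seq, N):
    # stub for the human's best response, Eq. (5)
    return np.zeros((N, 2))

x_R, x_H = np.zeros(2), np.zeros(2)
for _ in range(100):                     # closed-loop simulation
    x = np.concatenate([x_R, x_H])       # joint state x^t
    u_R_seq = plan_robot(x, N)           # robot's optimal sequence
    u_H_seq = plan_human(x, u_R_seq, N)  # human's best response
    # execute only the first step of each sequence, then replan
    x_R = f_R(x_R, u_R_seq[0])
    x_H = f_H(x_H, u_H_seq[0])
```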
Such an optimization-based state feedback strategy
formulates the closed-loop dynamics of the robot-
human interaction system as a game. To simplify the
game, we assume that the robot car has access to $C_H$,
and that the human only computes a best response to
the robot's actions rather than trying to influence them¹,
as in [11]. This means that the robot car can compute,
for any control sequence it considers, how the human
would respond and what cost the human will incur:
$$u_H^* = \arg\min_{u_H} C_H\left(x^t, u_R, u_H; \theta_H\right) \triangleq g\left(x^t, u_R; \theta_H\right) \quad (5)$$
$$C_H^*(u_R) = C_H\left(x^t, u_R, g\left(x^t, u_R; \theta_H\right); \theta_H\right). \quad (6)$$
¹If there are multiple robot cars that we control, we treat them all
as a single $R$. If there are multiple human drivers, we reason about
how each of them affects the robot's utility separately.
Here $g(x^t, u_R; \theta_H)$ represents the response curve of the
human driver to the autonomous car.
Armed with this model, the robot can now compute
what it should do, such that when the human responds,
the combination is good for the robot's cost:
$$u_R^* = \arg\min_{u_R} C_R\left(x^t, u_R, g\left(x^t, u_R; \theta_H\right); \theta_R\right). \quad (7)$$
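As an illustration of this nested structure, here is a sketch in which the human's best response $g$ is computed by an inner optimization for every candidate robot plan, and the robot minimizes its cost over the resulting joint behavior. The quadratic costs are illustrative placeholders, not the paper's actual cost functions.

```python
import numpy as np
from scipy.optimize import minimize

N, m = 10, 1  # horizon and control dimension

def C_H(x0, u_R, u_H):
    # placeholder human cost: control effort plus an interaction term
    return np.sum(u_H ** 2) + np.sum((u_H - u_R) ** 2)

def C_R_selfish(x0, u_R, u_H):
    # placeholder selfish robot cost: track a desired input of 1.0
    return np.sum(u_R ** 2) + np.sum((u_R - 1.0) ** 2)

def human_response(x0, u_R):
    """g(x^t, u_R; theta_H): the human's best response, Eq. (5)."""
    res = minimize(lambda u: C_H(x0, u_R, u.reshape(N, m)),
                   np.zeros(N * m))
    return res.x.reshape(N, m)

def robot_objective(u_R_flat, x0):
    u_R = u_R_flat.reshape(N, m)
    u_H = human_response(x0, u_R)     # anticipate the response
    return C_R_selfish(x0, u_R, u_H)  # evaluate the robot's cost, Eq. (7)

x0 = np.zeros(4)
u_R_star = minimize(robot_objective, np.zeros(N * m),
                    args=(x0,)).x.reshape(N, m)
```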
Our goal is to generate robot behavior that is courteous
to the human, i.e., behavior that takes into consideration the
inconvenience it brings to the human driver. We will
do so by changing the cost function of the robot to
reflect this inconvenience.
III. Courteous Planning
We propose a courteous planning strategy based on
one key observation: human is not perfectly rational,
and one of the irrationality is that they weight losses
higher than gains when evaluating their actions [15].
Hence, a courteous robot car should balance the mini-
mization of its own cost function and the inconvenience
(loss) it brings to the human driver.
Therefore, we construct CRin (7) as
CRxt,uR,uH;θR,θH,λc=Csel f
Rxt,uR,uH;θR
+λcCcourt
Rxt,uR,uH;θH,(8)
where $C_R^{self}$ is the cost function for a regular (selfish)
robot car which cares only about its own utilities
(safety, efficiency, etc.), and $C_R^{court}$ models the courtesy
term of the robot car to the human driver. It is a
function of the robot car's behavior, the human's be-
havior, the human's cost parameters ($\theta_H$), and some
alternative costs (see Section III-A). $\lambda_c \in [0, \infty)$ captures
the trade-off. If we want the robot car to be just as
courteous as a human driver, we could learn $\lambda_c$ from
human driver demonstrations, as we do in Section V.
As robot designers, we might set this parameter higher
than regular human driving to enable more courteous
autonomous cars, particularly when they do not have
passengers on board.
A. Alternative Costs
With any robot plan $u_R$, the robot car changes the
human driver's environment and therefore induces a
best cost for the human, $C_H^*(u_R)$. Our courtesy term
compares this cost with the alternative, $C_H^{alt,*}$: the best-
case scenario for the person. It is not immediately clear
how to define this best-case scenario, since it may vary
depending on the driving scenario. We explore
three alternatives.
What the human could have done, had the robot car
not been there. We first consider a world in which the
robot car is simply not there to interfere with the person.
In such a world, the person gets to optimize their cost
without the robot car:
$$C_H^{alt,*}\left(x^t, \theta_H\right) = \min_{u_H} C_H\left(x^t, u_H; \theta_H\right) \quad (9)$$
This induces a very generous definition of courtesy: the
alternative is for the robot car to not have been on the
road at all. In reality though, the robot car is there,
which leads to our second alternative.
What the human could have done, had the robot
car only been there to help the human. Our second
alternative is to assume that the robot car already on
the road could be completely altruistic. The robot car
could actually optimize the human driver's cost, being
a perfect collaborator:
$$C_H^{alt,*}\left(x^t, \theta_H\right) = \min_{u_H, u_R} C_H\left(x^t, u_R, u_H; \theta_H\right) \quad (10)$$
For this alternative, the robot car and the human would
perform a joint optimization of the human's cost. For
example, the robot car can brake to make sure that the
human could change lanes in front of it, or even block
another traffic participant to make sure the human has
space.
What the human could have done, had the robot car
just kept doing what it was previously doing. A fully
collaborative robot car is still perhaps not the fairest
one to compute inconvenience against. After all, the
autonomous car does have a passenger sometimes, and
it is fair to take their needs into account too. Our third
alternative computes how well the human driver could
have done, had the robot car kept acting the same way
as it was previously doing:
$$C_H^{alt,*}\left(x^t, \theta_H\right) = \min_{u_H} C_H\left(x^t, \bar{u}_R^{t-1}, u_H; \theta_H\right) \quad (11)$$
This means that the person is now responding to a con-
stant robot trajectory $\bar{u}_R^{t-1} = \left(u_R^{t-1}, \ldots, u_R^{t-1}\right)$, for instance,
the robot maintaining its current velocity.
Our experiments below explore these three different
alternative options for the courtesy term.
B. Courtesy Term
We define the courtesy term based on the difference
between what cost the human has, and what cost they
would have had in the alternative:
Definition 1 (Courtesy of the Robot Car)
$$C_R^{court}\left(x^t, u_R, u_H; \theta_H\right) = \max\left\{0, \; C_H\left(x^t, u_R, u_H; \theta_H\right) - C_H^{alt,*}\left(x^t; \theta_H\right)\right\} \quad (12)$$
Note that we could have also set the courtesy term
to simply be the human's cost, and have the robot trade
off between its cost and the human's. However, that
would have penalized the robot for any cost the human
incurs, even if the robot does not bring any inconve-
nience to the human. That might cause too conservative
behavior. In fact, if we treat the alternative cost as
the reference point in Prospect Theory, a model of
human irrationality [15], then the theory suggests that
humans weigh losses more than gains. This means that
our courteous robot car should care more about avoid-
ing additional inconvenience than about providing
more convenience, i.e., reducing the human's cost
below the alternative one. Mathematically, this
concept is formulated via Definition 1: the robot does
not get any bonus for bringing the human's cost lower
than $C_H^{alt,*}$ (possible with some definitions of $C_H^{alt,*}$); it
only gets a penalty for making it higher.
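The sketch below puts Definition 1 together with the three alternative costs of Section III-A. The placeholder cost $C_H$ is again illustrative; its interaction term simply vanishes when the robot is absent, which stands in for Alternative I's robot-free world.

```python
import numpy as np
from scipy.optimize import minimize

N, m = 10, 1  # horizon and control dimension

def C_H(x0, u_R, u_H):
    # placeholder human cost; the interaction term vanishes when the
    # robot is absent (Alternative I)
    cost = float(np.sum(u_H ** 2))
    if u_R is not None:
        cost += float(np.sum((u_H - u_R) ** 2))
    return cost

def best_human_cost(x0, u_R):
    """C*_H(u_R): the human's cost under their best response, Eq. (6)."""
    return minimize(lambda u: C_H(x0, u_R, u.reshape(N, m)),
                    np.zeros(N * m)).fun

def alt_cost(x0, u_R_prev, which):
    if which == "not_there":      # Eq. (9): robot absent
        return minimize(lambda u: C_H(x0, None, u.reshape(N, m)),
                        np.zeros(N * m)).fun
    if which == "collaborative":  # Eq. (10): joint minimization of C_H
        return minimize(lambda z: C_H(x0, z[:N * m].reshape(N, m),
                                      z[N * m:].reshape(N, m)),
                        np.zeros(2 * N * m)).fun
    if which == "keep_doing":     # Eq. (11): robot repeats u_R^{t-1}
        return best_human_cost(x0, np.tile(u_R_prev, (N, 1)))

def courtesy_term(x0, u_R, u_H, u_R_prev, which="collaborative"):
    """Eq. (12): only the increase over the alternative is penalized."""
    return max(0.0, C_H(x0, u_R, u_H) - alt_cost(x0, u_R_prev, which))
```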
C. Solution
Thus far, we have constructed a compound cost func-
tion $C_R(x^t, u_R, u_H; \theta_R, \theta_H, \lambda_c)$ to enable a courteous
robot car, considering three alternative costs. At every
step, the robot needs to solve the optimization problem
in (7) to find the best actions to take. We approximate
the solution by alternately fixing one of $u_R$ or $u_H$
and solving for the other, as sketched below.
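A minimal sketch of this alternating scheme, reusing `C_H`, `C_R_selfish`, and `courtesy_term` from the sketches above; `lam_c` plays the role of $\lambda_c$, and the fixed iteration count is an assumed stopping rule.

```python
import numpy as np
from scipy.optimize import minimize

def solve_interaction(x0, u_R_prev, lam_c=1.0, iters=10,
                      which="collaborative"):
    u_R, u_H = np.zeros((N, m)), np.zeros((N, m))
    for _ in range(iters):
        # fix u_H and minimize the compound robot cost of Eq. (8)
        u_R = minimize(
            lambda u: C_R_selfish(x0, u.reshape(N, m), u_H)
            + lam_c * courtesy_term(x0, u.reshape(N, m), u_H,
                                    u_R_prev, which),
            u_R.ravel()).x.reshape(N, m)
        # fix u_R and re-solve the human's best response, Eq. (5)
        u_H = minimize(lambda u: C_H(x0, u_R, u.reshape(N, m)),
                       u_H.ravel()).x.reshape(N, m)
    return u_R, u_H
```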
IV. Analysis of Courteous Planning
In this section, we analyze the effect of courteous
planning on the robot’s behavior in different simu-
lated driving scenarios. In Section V, we study how
courteous planning can help better explain real human
driving data, enabling robots to be more human-like
and predictable, as well as better at anticipating
human driver actions on the road.
Simulation Environment: We implement the simulation
environment using Julia [20] on a 2.5 GHz Intel Core i7
processor with 16 GB RAM. We set the horizon length
to $N = 10$, and the sampling time to 0.1 s. Our simulated
environment is a 1/10 scale of the real world: 1/10 road
width, car sizes, maximum acceleration (0.5 m/s²) and
deceleration (-1.0 m/s²), and a low speed limit (1.0 m/s).
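For reference, the stated simulation parameters collected in one place (a sketch; the paper's Julia implementation itself is not shown):

```python
SIM_CONFIG = {
    "horizon_N": 10,      # MPC horizon length
    "dt": 0.1,            # sampling time [s]
    "scale": 0.1,         # 1/10 of real-world dimensions
    "max_accel": 0.5,     # maximum acceleration [m/s^2]
    "max_decel": -1.0,    # maximum deceleration [m/s^2]
    "speed_limit": 1.0,   # low speed limit [m/s]
}
```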
Regarding the cost functions $C_H$ and $C_R$ in (6)-(8),
in addition to the courtesy term formulated above,
we penalize safety, car speed, comfort level and goal
distances in both $C_H$ and $C_R^{self}$. Details about this can
be found in Section V.
For all results, we denote a selfish (baseline) au-
tonomous car with a gray rectangle, a courteous one
with orange, and the human driver with dark blue.
A. The Effect of Courtesy
1) Lane Changing: We first consider a lane changing
driving scenario, as shown in Fig. 1. The autonomous
car wants to merge into the human driver’s lane from
an adjacent lane. We assume that the goal of the human
driver is to maintain speed. Then all three different al-
ternatives lead to the same alternative optimal behavior
and cost for the human: the human would proceed in their
lane undisturbed by the robot. Hence, with a constant
[Fig. 1 panels, left to right: (a) selfish robot car, human driver's inconvenience = 0.2063; (b) intermediate courteous robot car, inconvenience = 0.0173; (c) most courteous robot car, inconvenience = 0. The robot holds 1.0 m/s throughout; the human's speed drops as low as 0.7 m/s in (a) but stays at 0.85 m/s in (c).]
Fig. 1: A lane changing scenario: both the human car and the robot car initially travel at 0.85 m/s; (a) a selfish robot car merges in front of the
human with a small gap, so the human brakes to yield; (b) an intermediate courteous robot car merges with a larger gap, which spares
the human driver from hard braking; (c) a most courteous robot car merges with a gap large enough that the human can maintain speed.
[Fig. 2 panels: (a) selfish robot car, human driver's inconvenience = 0.385; (b) courteous robot car, inconvenience = 0. In (a) the human slows from 0.9 m/s to 0.7 m/s; in (b) the robot slows to 0.85 m/s while the human holds 0.9 m/s.]
Fig. 2: Another lane changing scenario: both the human car and the robot
car initially travel at 0.9 m/s; (a) a selfish robot car accelerates and
merges in front of the human driver with a small gap, scaring the
human driver into braking; (b) a courteous robot car decelerates and
merges behind the human driver so that the human can maintain speed.
$C_H^{alt,*}$, we focus on the influence of the trade-off factor
$\lambda_c$ in the results.
We present two sets of simulation results in Fig. 1
and Fig. 2, where the human driver's initial speeds are
0.85 m/s and 0.9 m/s, respectively. The results show
that as $\lambda_c$ increases, i.e., as the robot becomes more
courteous, the autonomous car tends to leave a larger
gap when it merges in front of the human, and the
human brakes less (Fig. 1, left to right). When the human
driver's initial speed is high enough, a courteous
autonomous car decides to merge behind the human
instead of cutting in, as shown in Fig. 2.
Figure 3 summarizes the relationship between the
human driver's inconvenience (the magnitude of the
courtesy term) and $\lambda_c$ for the simulation conditions
in Fig. 1. One can note that as the courtesy of the
autonomous car increases, the human driver's incon-
venience decreases.
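A sweep of this kind can be sketched by reusing `solve_interaction` and `courtesy_term` from the sketches in Section III; the initial joint state `x0` and the robot's previous action `u_R_prev` are assumed to be set up for the scenario of Fig. 1.

```python
import numpy as np

def inconvenience_sweep(x0, u_R_prev):
    """Sweep the courtesy weight and record the human's inconvenience."""
    lambdas = np.logspace(-3, 5, 9)  # range matching the axis of Fig. 3
    inconvenience = []
    for lam in lambdas:
        u_R, u_H = solve_interaction(x0, u_R_prev, lam_c=lam)
        inconvenience.append(courtesy_term(x0, u_R, u_H, u_R_prev))
    return lambdas, inconvenience
```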
Fig. 3: Inconvenience to the human decreases as $\lambda_c$ increases.
2) Turning Left: In this scenario, an autonomous car
wants to take a left turn at an intersection with a
straight-driving human. In this case as well, the al-
ternative behaviors that we consider when evaluating
inconvenience are the same across the three different
alternatives: the human driver crosses the intersection
maintaining speed.
Simulation results with courteous and selfish au-
tonomous cars are shown in Fig. 4: a selfish robot
car takes the left turn immediately and forces the human
driver to brake (Fig. 4(a)), while a courteous robot car
waits in the middle of the intersection and takes the
left turn after the human driver passes the intersection,
so that the human can maintain its speed (Fig. 4(b)).
B. Influence of Different Alternative Costs for Evaluating
Inconvenience
In the previous examples, the human would have
arrived at the same trajectory regardless of which alter-
Fig. 4: Interaction between a straight-driving human and a left-turning autonomous car: (a) a selfish (baseline) robot car takes a left turn
immediately and forces the human driver to brake (red frames); (b) a courteous robot car waits in the middle of the intersection and takes
the left turn after the human passes so that the human can maintain speed.
native world we are considering to evaluate how much
inconvenience the autonomous car is causing. Here, we
consider a scenario in which that is no longer the case,
to highlight the differences that the alternative
formulations of courtesy generate in the robot car's behavior.
We consider a scenario where the human is turning
right, with a straight-driving robot car coming from
their left. In this scenario, the three alternative costs
are different, which leads to different courtesy terms:
Alternative I–Robot car not being there: the op-
timal human behavior would be to take the right
turn directly;
Alternative II–Robot car being collaborative: the
robot would take the necessary yielding maneuver
to let the human driver take the right turn first,
leading to the same alternative optimal human
behavior of performing the right turn directly;
Alternative III–Robot car maintaining behavior: the
robot car would maintain its speed, and the opti-
mal human behavior would be to slow down.
Figure 5 summarizes the results of using these differ-
ent courtesy terms. Under Alternative III, a courteous robot
car goes first, as shown in Fig. 5(a). Intuitively, this
is because $C_H^{alt,*}$ is initially high, and by maintaining
its speed (or even accelerating, depending on $C_R^{self}$), the
robot car brings no further inconvenience to the
human, i.e., $C_R^{court}$ remains zero. Hence, the robot car
goes first (had the robot braked, it would only have increased
$C_R^{self}$ without changing $C_R^{court} = 0$, and therefore $C_R$
would have increased). The other two alternatives (I and II) are
much more generous to the human. Results in Fig. 5(b)
show that a courteous robot car finds it too expensive
to force the human to go second, and slows down to let
the human go first. The red frames in Fig. 5(b) indicate
the time instants when the autonomous car brakes.
C. Extension to Environments with Multiple Agents
We study a scenario on a two-way road. The robot car
and the human are driving in opposite directions,
but the robot car's lane is blocked and it has to temporarily
merge into the human driver's lane to get through, as in
Fig. 6. We use the collaborative robot as our alternative
formulation of the courtesy term in this scenario.
Fig. 5: Interaction between a right-turning human driver and a
courteous autonomous car with different courtesy terms: (a) the robot
car goes first when it evaluates the courtesy term using going forward
as the alternative world; (b) the robot car yields and lets the human
go first when it evaluates the courtesy term based on a collaborative
or not-being-there alternative world.
When there are only two agents in the environment,
i.e., the autonomous car and the human driver, the
results for a selfish and a courteous autonomous car
are shown in Fig. 6(a)-(b): a selfish autonomous car
directly merges into the human's lane and forces the
human driver to brake, while a courteous autonomous
car decides to wait until the human driver passes by,
since going first would make the courtesy term too
expensive.
Such courtesy-aware planning becomes much more
interesting when there is a third agent in the envi-
ronment, as shown in Fig. 6(c). We assume that the
third agent is responsive to the autonomous car, and
that the autonomous car is courteous only to the
human driver (and not to both). In this case, for $C_H^{alt,*}$,
the human would ideally want to pass undisturbed
by either the robot or the other agent: the courtesy
term captures the difference in cost to the human
between the robot's behavior and the alternative of a
collaborative robot, and this cost to the human depends
on how much progress the human is able to make
and how fast. As a result, a very courteous robot has
an incentive to produce behavior that is as close as
possible to making that happen.
An interesting behavior then emerges: the au-
tonomous car first backs up to block the third agent
(the following car) from interrupting the human driver
until the human driver safely passes them, and then
the robot car finishes its task. This displays truly
collaborative behavior, and only happens with a high
enough weight on the courtesy term. It may not be
practical for real on-road driving, but it enables the
design of highly courteous robots in particular
scenarios where humans have higher priority over all
other autonomous agents.
Fig. 6: A blocking-area overtaking scenario: (a) with a selfish cost
function, the robot car overtakes first and forces the human driver to
brake; (b)(c) a courtesy-aware robot car yields to the human driver
and even helps to block other cars, depending on its formulation of
the human driver's alternative world.
V. Courtesy Helps Explain Human Driving
Thus far, we have shown that courtesy is useful for
enabling cars to generate actions that do not cause
inconvenience to other drivers. We have also seen that
the larger the weight we put on the courtesy term, the
more social the car's behavior becomes. A natural next
question is: are humans courteous?
Our hypothesis is that our courtesy term can help
explain human driving behavior. If that is the case, it
has two important implications: it means that courtesy can
enable robots to better predict human actions by giving
them a more accurate model of how people drive, and
it also means that robots can use courtesy to produce
more human-like driving.
We put our hypothesis to the test by learning a cost
function from human driver data, with and without
a courtesy feature. We find that using the courtesy
feature leads to a more accurate cost function that
is better at reproducing human driver data, lending
support to our hypothesis.
A. Learning Cost Functions from Human Demonstrations
1) Human Data Collection: The human data is col-
lected from the Next Generation SIMulation (NGSIM)
dataset [19], which captures highway driving be-
haviors/trajectories via digital video cameras mounted
on top of surrounding buildings. We selected 153 left-
lane-changing driving trajectories on Interstate 80 (near
Emeryville, California), and separated them into two
sets: a training set of size 100 (denoted by $U_D$, i.e., the
human demonstrations), and the remaining 53 trajectories
as the test set.
2) Learning Algorithm: We use Inverse Reinforcement
Learning (IRL) [7], [16]–[18] to learn an appropriate cost
function from human data.
We assume that the cost function is parameterized as a
linear combination of features:
$$c\left(x^t, u_R^t, u_H^t; \theta\right) = \theta^T \phi\left(x^t, u_R^t, u_H^t\right). \quad (13)$$
Then over the trajectory length $L$, the cumulative cost
function becomes
$$C\left(x^0, u_R, u_H; \theta\right) = \theta^T \sum_{t=0}^{L-1} \phi\left(x^t, u_R^t, u_H^t\right) = \theta^T \Phi\left(x^0, u_R, u_H\right) \quad (14)$$
where $u_R$ and $u_H$ are, respectively, the actions of the
robot car and the human over the trajectory. Our goal
is to find the weights $\theta$ which maximize the likelihood
of the demonstrations:
$$\theta^* = \arg\max_{\theta} P\left(U_D \mid \theta\right) \quad (15)$$
Building on the principle of maximum entropy, we
assume that trajectories are exponentially more likely
when they have lower cost:
$$P\left(u_H \mid \theta\right) \propto \exp\left(-C\left(x^0, u_R, u_H; \theta\right)\right). \quad (16)$$
Thus the probability (likelihood) of the demonstration
set becomes
$$P\left(U_D \mid \theta\right) = \prod_{i=1}^{n} P\left(u_{H,i}^D \mid \theta\right) = \prod_{i=1}^{n} \frac{P\left(u_{H,i}^D, \theta\right)}{\int P\left(\tilde{u}_H, \theta\right) d\tilde{u}_H} \quad (17)$$
where $n$ is the number of trajectories in $U_D$.
To tackle the partition term $\int P\left(\tilde{u}_H, \theta\right) d\tilde{u}_H$ in (17),
we approximate $C\left(x^0, u_R, \tilde{u}_H; \theta\right)$ with its Laplace ap-
proximation, as proposed in [7]:
$$C\left(x^0, u_R, \tilde{u}_H; \theta\right) \approx C\left(x^0, u_R, u_{H,i}^D; \theta\right) + \left(\tilde{u}_H - u_{H,i}^D\right)^T \frac{\partial C}{\partial u_H} + \frac{1}{2} \left(\tilde{u}_H - u_{H,i}^D\right)^T \frac{\partial^2 C}{\partial u_H^2} \left(\tilde{u}_H - u_{H,i}^D\right). \quad (18)$$
With the assumption of locally optimal demonstra-
tions, we have $\frac{\partial C}{\partial u_H}\big|_{u_{H,i}^D} = 0$ in (18). This reduces the
partition term $\int P\left(\tilde{u}_H, \theta\right) d\tilde{u}_H$ to a Gaussian integral
with a closed-form solution (see [7] for details).
Substituting (17) and (18) into (15) yields the optimal
parameter $\theta^*$ as the maximizer.
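The sketch below illustrates this Laplace-approximated likelihood in the style of [7], with finite-difference gradients and Hessians; `features` is an assumed placeholder for the feature sums $\Phi$ of Section V-B, and each demonstration is a tuple $(x^0, u_R, u_H^D)$.

```python
import numpy as np
from scipy.optimize import minimize

def features(x0, u_R, u_H):
    # placeholder feature sums Phi(x^0, u_R, u_H); stands in for the
    # actual features of Section V-B
    return np.array([np.sum(u_H ** 2), np.sum((u_H - u_R) ** 2)])

def cost(theta, x0, u_R, u_H):
    return float(theta @ features(x0, u_R, u_H))  # C = theta^T Phi, Eq. (14)

def grad_hess(f, u, eps=1e-4):
    """Finite-difference gradient and Hessian of f at the flat vector u."""
    k = u.size
    g, H = np.zeros(k), np.zeros((k, k))
    for i in range(k):
        e_i = eps * np.eye(k)[i]
        g[i] = (f(u + e_i) - f(u - e_i)) / (2 * eps)
        for j in range(k):
            e_j = eps * np.eye(k)[j]
            H[i, j] = (f(u + e_i + e_j) - f(u + e_i - e_j)
                       - f(u - e_i + e_j) + f(u - e_i - e_j)) / (4 * eps ** 2)
    return g, H

def neg_log_likelihood(theta, demos):
    """-log P(U_D | theta) under the Laplace approximation of Eq. (18):
    log P ~ -g^T H^{-1} g / 2 + log|H| / 2 - (k/2) log(2 pi)."""
    nll = 0.0
    for (x0, u_R, u_H) in demos:
        f = lambda u: cost(theta, x0, u_R, u.reshape(u_H.shape))
        g, H = grad_hess(f, u_H.ravel())
        H = H + 1e-6 * np.eye(H.shape[0])  # keep the Hessian invertible
        _, logdet = np.linalg.slogdet(H)
        k = u_H.size
        nll -= (-0.5 * g @ np.linalg.solve(H, g) + 0.5 * logdet
                - 0.5 * k * np.log(2 * np.pi))
    return nll

# usage sketch: theta_star = minimize(neg_log_likelihood, theta0,
#                                     args=(demos,)).x
```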
B. Experiment Design
Hypothesis. Within interactions, human drivers
show courtesy to others, i.e., they optimize a compound
cost function of the form $C = C^{self} + \lambda_c C^{court}$ as in (8),
instead of a selfish one, $C^{self}$.
Independent Variable. To test our hypothesis, we run
two sets of IRL on the same set of human data, but
with one differing feature. For the selfish cost function
$C^{self}$, four features are selected as follows:
speed feature $f_d$: deviation of the autonomous car's
speed from the speed limit:
$$f_d = (v - v_d)^2 \quad (19)$$
comfort features $f_{acc}$ and $f_{steer}$: jerk and steering
rate of the autonomous car;
goal feature $f_g$: distance to the target lane:
$$f_g = e^{d_g / w_l}, \quad (20)$$
where $d_g$ is the Euclidean distance to the target lane
and $w_l$ is the lane width;
safety feature $f_s$: relative positions with respect to
surrounding cars:
$$f_s = \sum_{i=1}^{n_s} e^{-d_i}, \quad (21)$$
where $n_s$ is the number of surrounding cars and
$d_i$, $i = 1, \cdots, n_s$, is the distance to each of them.
For the courtesy-aware cost function $C = C^{self} +
\lambda_c C^{court}$, we use the same four features as above, plus
one additional feature equal to the courtesy term.
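A sketch of how the selfish feature vector of Eqs. (19)-(21) could be evaluated at a single time step; squaring the jerk and steering rate for the comfort features, and the signs of the exponents, are assumptions of the reconstruction above.

```python
import numpy as np

def selfish_features(v, v_d, jerk, steer_rate, d_g, w_l, d_surround):
    f_d = (v - v_d) ** 2        # speed feature, Eq. (19)
    f_acc = jerk ** 2           # comfort: jerk (squaring assumed)
    f_steer = steer_rate ** 2   # comfort: steering rate (squaring assumed)
    f_g = np.exp(d_g / w_l)     # goal feature, Eq. (20)
    f_s = float(np.sum(np.exp(-np.asarray(d_surround))))  # safety, Eq. (21)
    return np.array([f_g, f_d, f_acc, f_steer, f_s])
```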
Dependent Measures. We measure the similarity be-
tween trajectories planned with the learned cost func-
tions and human driving trajectories on the test set (the
53 left-lane-changing scenarios from the NGSIM dataset
that are held out from the training set).
C. Analysis
Training performance. The training results are shown
in Fig. 7 and Table I. One can see that with the
additional courtesy term, better learning performance
(in terms of training loss) is achieved. This is a
sanity check: having access to one extra degree of freedom
can lead to better training loss regardless, but if it did
not, that would invalidate our hypothesis.
            θ_g    θ_d        θ_acc      θ_steer    θ_s    λ_c
C^self      1.0    2.08e+04   5.80e+02   3.91e+02   4.37   --
C           1.0    1.96e+02   6.7e+04    2.36e+02   6.53   9.89e+04
TABLE I: The parameters in C learned via IRL
Trajectory similarity. Figure 8 shows one demonstra-
tive example of the trajectories for a selfish car (grey)
and a courteous car (orange), with four surrounding
vehicles. The dark blue rectangle is the human driver
in our two-agent robot-human interaction system, and
all other vehicles (cyan) are treated as moving obstacles.
Fig. 7: Training curves for cost functions with and without the
courtesy term
Figure 8 shows that a simulated car with $C$, which includes
courtesy, manages to reduce its influence on the human
driver by choosing a much smoother and less aggres-
sive merging curve, while a car driven by $C^{self}$ merges
in much more aggressively.
Fig. 8: An example pair of simulated trajectories with courteous (top)
and selfish (bottom) cost functions
Results for all 53 left-lane-changing test trajectories
are given in Fig. 9 (left). To describe the similarity
among trajectories, we adopt the Mean Euclidean
Distance (MED) [21]. As shown in Fig. 9 (right), the
courtesy-aware trajectories are much more similar to the
ground-truth trajectories, i.e., a courteous robot car
behaves in a more human-like way. We have also calculated
the space headways of the following human driver on
the robot car's target lane for all 53 test scenarios,
and the statistical results are given in Fig. 9 (middle).
Compared to a selfish robot car, a courteous robot car
achieves safer left-lane-changing behavior in terms
of the following gaps left for the human driver behind.
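A minimal sketch of the MED metric [21] as used here, assuming the two trajectories are (T, 2) arrays of positions sampled at the same time steps:

```python
import numpy as np

def mean_euclidean_distance(traj_a, traj_b):
    # mean of the pointwise Euclidean distances between two trajectories
    return float(np.mean(np.linalg.norm(traj_a - traj_b, axis=1)))
```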
VI. Conclusion
Summary. We introduced courteous planning based
on the observation that humans irrationally care more about
the additional inconvenience brought to them by others.
Courteous planning enables an autonomous car to take
such inconvenience into consideration when evaluating
its possible plans. We saw that this not only leads to
more courteous robot behavior, but also helps explain
real human driving data, because humans, too, are likely
trying to be courteous.
Fig. 9: The courtesy term helps fit the test-set human driver data significantly better: we can see this in the actual trajectories (left), the
following gaps (middle), and the Mean Euclidean Distances from the ground-truth human data (right).
Limitations and Future Work. Despite the fact that
courtesy is not absolute, but relative to how well off
the human driver could be, the trade-off between cour-
tesy and selfishness remains a meta-parameter that is
difficult to set. In general, defining the right trade-off
parameters in the objective function for autonomous
cars and robots more broadly remains a challenge. With
autonomous cars, this is made worse by the fact that
it is not necessarily a good idea to rely on Inverse
Reinforcement Learning: it might give us models of
human drivers, as it did in our last experiment, but that
might not be what we want the car to optimize for.
Further, we studied courtesy with a single human
driver to be courteous toward (we had other agents,
but the robot did not attempt courtesy toward them). In
real life, there will be many people on the road, and it
becomes difficult to be courteous to all. To some extent,
this is alleviated by our definition of courtesy: it is
not maximizing everyone’s utility, but it is minimizing
the inconvenience we cause. But further work needs to
push courtesy to the limits of interacting with multiple
people in cases where it is difficult to be courteous to
all.
Acknowledgement
This work was partially supported by the Mines Paris-
Tech Foundation, "Automated Vehicles–Drive for All"
Chair, and NSF CAREER. We thank Jaime F. Fisac for
helpful discussion and feedback.
References
[1] Y. Kuwata, J. Teo, G. Fiore, S. Karaman, E. Frazzoli, and J. P.
How, “Real-time motion planning with applications to au-
tonomous urban driving,” IEEE Transactions on Control Systems
Technology, vol. 17, no. 5, pp. 1105–1118, 2009.
[2] Z. Liang, G. Zheng, and J. Li, “Automatic parking path op-
timization based on bezier curve fitting,” in Automation and
Logistics (ICAL), 2012 IEEE International Conference on. IEEE,
2012, pp. 583–587.
[3] W. Zhan, J. Chen, C. Y. Chan, C. Liu, and M. Tomizuka,
“Spatially-partitioned environmental representation and plan-
ning architecture for on-road autonomous driving,” in 2017
IEEE Intelligent Vehicles Symposium (IV), June 2017, pp. 632–639.
[4] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei,
and S. Savarese, “Social LSTM: Human trajectory prediction
in crowded spaces,” in Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, 2016, pp. 961–971.
[5] W. Zhan, C. Liu, C. Y. Chan, and M. Tomizuka, “A non-
conservatively defensive strategy for urban autonomous driv-
ing,” in 2016 IEEE 19th International Conference on Intelligent
Transportation Systems (ITSC), Nov. 2016, pp. 459–464.
[6] M. Shimosaka, K. Nishi, J. Sato, and H. Kataoka, “Predicting
driving behavior using inverse reinforcement learning with
multiple reward functions towards environmental diversity,” in
Intelligent Vehicles Symposium (IV), 2015 IEEE. IEEE, 2015, pp.
567–572.
[7] S. Levine and V. Koltun, "Continuous inverse optimal control
with locally optimal examples," in Proceedings of the 29th Interna-
tional Conference on Machine Learning (ICML-12), 2012.
[8] G. R. de Campos, P. Falcone, and J. Sjoberg, “Autonomous co-
operative driving: a velocity-based negotiation approach for in-
tersection crossing,” in Intelligent Transportation Systems-(ITSC),
2013 16th International IEEE Conference on. IEEE, 2013, pp. 1456–
1461.
[9] M. Hafner, D. Cunningham, L. Caminiti, and D. Del Vecchio,
“Automated vehicle-to-vehicle collision avoidance at intersec-
tions,” in Proceedings of world congress on intelligent transport
systems, 2011.
[10] H. Kretzschmar, M. Spies, C. Sprunk, and W. Burgard, “Socially
compliant mobile robot navigation via inverse reinforcement
learning,” The International Journal of Robotics Research, vol. 35,
no. 11, pp. 1289–1307, 2016.
[11] D. Sadigh, S. Sastry, S. A. Seshia, and A. D. Dragan, “Planning
for autonomous cars that leverage effects on human actions.” in
Robotics: Science and Systems, 2016.
[12] M. Bahram, A. Lawitzky, J. Friedrichs, M. Aeberhard, and
D. Wollherr, “A Game-Theoretic Approach to Replanning-
Aware Interactive Scene Prediction and Planning,” IEEE Trans-
actions on Vehicular Technology, vol. 65, no. 6, pp. 3981–3992, June
2016.
[13] N. Li, D. W. Oyler, M. Zhang, Y. Yildiz, I. Kolmanovsky, and
A. R. Girard, “Game Theoretic Modeling of Driver and Vehi-
cle Interactions for Verification and Validation of Autonomous
Vehicle Control Systems,” IEEE Transactions on Control Systems
Technology, vol. PP, no. 99, pp. 1–16, 2017.
[14] D. Sadigh, S. S. Sastry, S. A. Seshia, and A. D. Dragan, "Informa-
tion gathering actions over human internal state," in Proceedings of the
IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS), October 2016, pp. 66–73.
[15] A. Tversky and D. Kahneman, “Advances in prospect theory:
Cumulative representation of uncertainty,” Journal of Risk and
uncertainty, vol. 5, no. 4, pp. 297–323, 1992.
[16] P. Abbeel and A. Y. Ng, “Apprenticeship learning via inverse
reinforcement learning,” in Proceedings of the twenty-first interna-
tional conference on Machine learning. ACM, 2004, p. 1.
[17] B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey, “Maxi-
mum entropy inverse reinforcement learning.” in AAAI, vol. 8.
Chicago, IL, USA, 2008, pp. 1433–1438.
[18] P. Abbeel and A. Y. Ng, “Inverse reinforcement learning,” in
Encyclopedia of machine learning. Springer, 2011, pp. 554–558.
[19] V. Alexiadis, J. Colyar, J. Halkias, R. Hranac, and G. McHale,
"The Next Generation Simulation Program," Institute of Trans-
portation Engineers. ITE Journal; Washington, vol. 74, no. 8, pp.
22–26, Aug. 2004.
[20] The Julia programming language, https://julialang.org.
[21] J. Quehl, H. Hu, O. S. Tas, E. Rehder, and M. Lauer, “How good
is my prediction? finding a similarity measure for trajectory
prediction evaluation.” in 2017 IEEE 18th International Conference
on Intelligent Transportation Systems (ITSC), 2017, pp. 120–125.
Mobile robots are increasingly populating our human environments. To interact with humans in a socially compliant way, these robots need to understand and comply with mutually accepted rules. In this paper, we present a novel approach to model the cooperative navigation behavior of humans. We model their behavior in terms of a mixture distribution that captures both the discrete navigation decisions, such as going left or going right, as well as the natural variance of human trajectories. Our approach learns the model parameters of this distribution that match, in expectation, the observed behavior in terms of user-defined features. To compute the feature expectations over the resulting high-dimensional continuous distributions, we use Hamiltonian Markov chain Monte Carlo sampling. Furthermore, we rely on a Voronoi graph of the environment to efficiently explore the space of trajectories from the robot’s current position to its target position. Using the proposed model, our method is able to imitate the behavior of pedestrians or, alternatively, to replicate a specific behavior that was taught by tele-operation in the target environment of the robot. We implemented our approach on a real mobile robot and demonstrated that it is able to successfully navigate in an office environment in the presence of humans. An extensive set of experiments suggests that our technique outperforms state-of-the-art methods to model the behavior of pedestrians, which also makes it applicable to fields such as behavioral science or computer graphics.