A Game-Theoretical Approach to Driving Decision Making in Highway
Scenarios
Zhihai Yan, Jun Wang, Yihuan Zhang
Abstract— With the development of self-driving technology, fundamental behaviors such as car-following and lane changing have been validated and tested in various kinds of scenarios. Currently, one of the most challenging domains for self-driving is decision making in dynamic environments. For self-driving cars, it is essential to understand and estimate other vehicles' behavior while interacting with them like a human driver. In this paper, a game-theoretical approach is proposed to model the interaction of vehicles while considering the surrounding traffic situation. A novel aspect is that a neural network is applied to establish the payoff function of the game, which describes the interaction more precisely. A calibration method is then applied to estimate the parameters using the Next Generation SIMulation (NGSIM) dataset. The experiments demonstrate the accuracy of the proposed method and its ability to make cooperative decisions in highway scenarios.
I. INTRODUCTION
Recently, self-driving technology has been widely applied in transportation and the military. Highways are an important component of the transportation system. However, long distances, high speeds, and traffic congestion make highway driving stressful and dangerous. Thus, decision making in highway scenarios is one of the most challenging problems for self-driving cars.
The most common decision making methods rely on manually defined rules corresponding to specific situations. A variety of solutions, including finite state machines [1] and hierarchical state machines [2], are used to make decisions for self-driving cars. However, manually defined rules are tailored to specific, simplified traffic scenarios and do not consider the uncertainty of drivers.
The Partially Observable Markov Decision Process (POMDP) provides a mathematical framework for the decision making problem in dynamic, uncertain scenarios such as highway driving. Ulbrich [3] applied a POMDP to solve the decision making problem of self-driving cars. However, the method was time-consuming and could hardly be applied in real time. To address this problem, Cunningham [4] proposed an improved POMDP approach named multi-policy decision making, which used a set of possible high-level policies to replace the continuous action space. However, these methods do not put the subject vehicle and the other vehicles on the same level when considering dynamic interactions. They focus
Zhihai Yan, Jun Wang and Yihuan Zhang are with the Department of
Control Science and Engineering, Tongji University, Shanghai 201804, P.
R. China.
Corresponding author: junwang@tongji.edu.cn
Fig. 1. Highway driving scenarios.
on the decision of the subject vehicle while ignoring the
decisions of other vehicles.
Other probabilistic methods were proposed to deal with interactive merging scenarios [5], [6]. In addition, a social behavior generator was proposed to generate lane change trajectories under interactions with the surrounding vehicles [7]. To model the interaction between different drivers more accurately, game theory is applied in this paper. Game theory constructs mathematical models to deal with the conflict and cooperation between decision makers. The drivers' decisions converge to the equilibrium of a set of strategies that jointly maximize their rewards in an independent game. Kita [8] first proposed a game theoretical model that describes on-ramp merging behavior using a discrete choice model. Talebpour [9] defined a two-type game of lane change behaviors according to the safety and speed gain for the drivers. Recently, Kang and Rakha [10] defined a model of merging maneuvers at freeway on-ramps, with three strategies (change, wait, overtake) for the subject vehicle and two strategies (yield, block) for the lag vehicle. Based on game theory, these methods considered the interactions among drivers and make a cooperative decision for each driver in the game. However, the most difficult part of game theoretical methods is formulating the payoff function of the strategies for each player, because it is hard to determine the effects of different factors on each player.
2018 IEEE Intelligent Vehicles Symposium (IV)
Changshu, Suzhou, China, June 26-30, 2018
978-1-5386-4452-2/18/$31.00 ©2018 IEEE 1221
Many studies have demonstrated the effectiveness of artificial intelligence methods in car-following scenarios [11] and lane change scenarios [12]. Inspired by these works, a game-theory-based framework is proposed in this paper to model the interaction between human drivers. A two-person, non-zero-sum, non-cooperative game under complete information is presented to model decision making behavior in highway driving. Moreover, an improved Gaussian Particle Swarm Optimization (GPSO) method is applied to calibrate the model. The main contribution of this paper is a neural-network-based payoff model that describes the rewards of each driver and is calibrated using real traffic data.
The remainder of this paper is organized as follows. The framework of the proposed decision making method in highway driving is detailed in Section II. Traffic scenario extraction and model calibration are described in Section III. Experiments on model validation are carried out in Section IV. Conclusions and future work are presented in Section V.
II. PROPOSED METHOD
The interaction between the subject vehicle and the lag
vehicle can be seen as a game, as shown in Fig. 1. The
drivers of the subject vehicle and the lag vehicle choose their
best strategy through the game. Generally, a game has three important factors: the players, the strategies available to each player, and a payoff model representing the reward of each player for every strategy combination. A two-player, non-zero-sum, non-cooperative game under complete information is applied in this paper.
A. Players of Game
As shown in Fig. 1, there are two players in this game: the driver of the subject vehicle (V3, red) and the driver of the lag vehicle (V1, blue), which is the closest following vehicle in the target lane. Non-zero-sum means that the payoffs of the two players do not sum to zero, because their payoffs are independent. Non-cooperative means that the players choose their strategies independently and no communication is established. It is assumed that the two players know each other's possible strategies and the states of the surrounding vehicles; in other words, this is a complete information game.
B. Strategies for Each Player
The subject vehicle has three pure strategies: (a) pass the leading vehicle (V2, black) in the target lane; (b) yield to the lag vehicle; (c) merge into the target lane. The lag vehicle has two corresponding strategies: cooperate or compete. Table I shows the structure of the game, where P and Q denote the payoffs of the subject vehicle and the lag vehicle, respectively.
Each player chooses one of the pure strategies to achieve
the goal of the game. The selection of optimum strategy
set has been a topic of interest since the introduction of
game theory. In order to find the optimum strategies for
the drivers of the subject vehicle and the lag vehicle, the
TABLE I
STRUCTURE OF LANE CHANGE GAME

                        Lag Vehicle
  Subject Vehicle       b1 (cooperate)   b2 (compete)
  a1 (pass)             (P11, Q11)       (P12, Q12)
  a2 (yield)            (P21, Q21)       (P22, Q22)
  a3 (lane change)      (P31, Q31)       (P32, Q32)
Nash equilibrium is considered. The Nash equilibrium is a
solution concept of a non-cooperative game involving two or
more players in which each player is assumed to know the
equilibrium strategies of others, and no player has anything
to gain by changing only his or her own strategy. The Nash
equilibrium has two types: the pure strategies and the mixed
equilibrium has two types: pure strategies and mixed strategies. The players select pure strategies if a pure-strategy Nash equilibrium exists, which implies that each player maximizes his or her own reward given that the opposing player also maximizes his or her reward. In this game, the driver of the subject vehicle has three possible strategies and the driver of the lag vehicle has two: S1 = {a1, a2, a3} and S2 = {b1, b2}. This means that the game has six possible strategy sets. A pure-strategy Nash equilibrium is defined as follows:
P(a*, b*) ≥ P(a, b*)   ∀a ∈ {a1, a2, a3}
Q(a*, b*) ≥ Q(a*, b)   ∀b ∈ {b1, b2}        (1)

where P(ai, bj) and Q(ai, bj) are equivalent to Pij and Qij in Table I, and (a*, b*) represents the Nash equilibrium strategy set for the drivers of the subject vehicle and the lag vehicle.
However, a pure-strategy Nash equilibrium does not always exist, and a mixed-strategy set can be used instead. Each player uses a set of probabilities over his or her strategies to maximize his or her own payoff against the probabilities selected by the opposing player. Thus, a pure-strategy Nash equilibrium is a special mixed strategy in which one strategy has probability 1 for each player. To calculate the Nash equilibrium of mixed strategies, the MATLAB function "npg" developed by Chatterjee [13] is used in this paper to solve this two-player non-cooperative game.
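Given a payoff bimatrix of the form in Table I, a pure-strategy Nash equilibrium satisfying Eq. (1) can be found by simple enumeration. The following is a minimal sketch with illustrative, uncalibrated payoff values; mixed-strategy equilibria (computed in the paper with npg) are outside its scope.

```python
# Hypothetical payoff bimatrix for the 3x2 game of Table I:
# rows = subject-vehicle strategies (pass, yield, lane change),
# columns = lag-vehicle strategies (cooperate, compete).
# These numbers are illustrative placeholders, not calibrated values.
P = [[3.0, 3.0],   # subject-vehicle payoffs
     [1.0, 1.0],
     [4.0, -2.0]]
Q = [[2.0, 2.0],   # lag-vehicle payoffs
     [3.0, 3.0],
     [2.0, 1.0]]

def pure_nash(P, Q):
    """Enumerate pure-strategy Nash equilibria (i, j) such that
    P[i][j] >= P[a][j] for all a and Q[i][j] >= Q[i][b] for all b."""
    m, n = len(P), len(P[0])
    eqs = []
    for i in range(m):
        for j in range(n):
            best_row = all(P[i][j] >= P[a][j] for a in range(m))
            best_col = all(Q[i][j] >= Q[i][b] for b in range(n))
            if best_row and best_col:
                eqs.append((i, j))
    return eqs

print(pure_nash(P, Q))  # [(0, 1), (2, 0)]
```

With these placeholder payoffs the game has two pure equilibria, (pass, compete) and (lane change, cooperate), which illustrates why the mixed-strategy solver is still needed in general.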
C. Payoff Model
To the best of our knowledge, past game theoretical approaches to driving decision making have used only fixed-form payoff functions to formulate the reward of each strategy set. One drawback of this assumption is that fixed-form functions are unable to describe the effects of each factor precisely. Neural networks are universal function approximators and have been demonstrated to be effective in various domains of research [14]. A neural network generally consists of three parts: an input layer, multiple hidden layers, and an output layer. Vector-valued inputs are fed into the input layer and are manipulated
by a set of linear transformations and nonlinear activations as
they traverse the hidden layers to the output layer. In this
paper, the neural network is used to formulate the payoff
function. The interaction between the subject vehicle and
the lag vehicle is influenced not only by their own states, but also by the states of the front vehicles in the host lane and
the target lane. Therefore, the inputs of the neural network
are the states of these four vehicles and the outputs are the
payoff values of different strategy sets.
Fig. 2. Network structure of the payoff model: an input layer (Δv, Δy, Δx for each surrounding vehicle and the subject speed), two 8-unit feed-forward hidden layers with tanh activations, and an output layer of payoff values P and Q.
The structure of the network used in this paper is shown
in Fig. 2. It consists of two hidden feed-forward layers with
8 units. The hyperbolic tangent function (tanh) is used as the
activation function of the hidden layers. The input layer x_in is denoted by:

x_in = [Δv_{3,i}, Δy_{3,i}, Δx_{3,i}, v3],   i = 1, 2, 4        (2)

where Δv_{3,i}, Δy_{3,i}, and Δx_{3,i} are respectively the relative speed, longitudinal gap, and lateral gap between the subject vehicle and surrounding vehicle i, and v3 is the speed of the subject vehicle.
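A rough sketch of the forward pass described above: 10 inputs (Δv, Δy, Δx for the three surrounding vehicles plus v3), two 8-unit tanh hidden layers, and 8 payoff outputs. The linear output layer and the random placeholder weights are assumptions; in the paper the parameters are calibrated with DE-GPSO.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes following the paper: 10 inputs, two 8-unit tanh hidden
# layers, 8 payoff outputs. Weights are random placeholders, not the
# calibrated parameters.
sizes = [10, 8, 8, 8]
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def payoff_network(x_in):
    """Forward pass: tanh on the hidden layers, linear output (assumed)."""
    h = np.asarray(x_in, dtype=float)
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i < len(weights) - 1:  # apply tanh on hidden layers only
            h = np.tanh(h)
    return h  # 8 payoff values

# Illustrative input: (dv, dy, dx) to each of V1, V2, V4, then v3.
x_in = np.concatenate([[-1.2, 15.0, 3.5],
                       [0.8, 20.0, 0.0],
                       [-0.5, 25.0, 3.5],
                       [28.0]])
print(payoff_network(x_in).shape)  # (8,)
```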
In the lane keeping (pass or yield) scenarios, the lag
vehicle’s strategy is assumed not to affect the subject vehicle.
The payoffs of the subject vehicle or the lag vehicle remain
the same whatever strategy the lag vehicle selects in these
scenarios, i.e.
P11 = P12,  Q11 = Q12,  P21 = P22,  Q21 = Q22        (3)

The output layer x_out is denoted by:

x_out = [P11, Q11, P21, Q21, P31, Q31, P32, Q32]        (4)
After the neural-network-based payoff function is built, real traffic data are used to estimate the parameters of the model, which is validated in the next section.
III. MODEL CALIBRATION
In this paper, the data used to calibrate and evaluate the
proposed method is obtained from real traffic data. The
detailed description of the dataset and the calibration method
are presented in this section.
A. Scenario Extraction
This paper uses the public dataset of individual vehicle trajectories from NGSIM [15], a program funded by the US Federal Highway Administration. These trajectory data are thus far unique in the history of traffic research and provide a valuable basis for research into driving behavior on structured roads. All experiments are performed on the I-80 dataset, shown in Fig. 3. It consists of three 15-minute periods: 4:00 p.m. to 4:15 p.m. (I-80-1), 5:00 p.m. to 5:15 p.m. (I-80-2), and 5:15 p.m. to 5:30 p.m. (I-80-3). These periods represent the buildup of congestion, the transition between uncongested and congested conditions, and full congestion during the peak period.
Fig. 3. I-80 scenario [15]
The segmented scenarios have the following properties:
- In each scenario, the subject vehicle and the lag vehicle remain the same.
- For any surrounding vehicle that does not exist, the relative longitudinal distance is set to 100 m, the relative lateral distance to 10 m, and the relative speed to 0.
- A scenario ends when the subject vehicle crosses the lane marker, passes V2, or yields to V1.
- A new scenario starts immediately after the end of the last one, so there are no gaps between driving scenarios.
- Each segmented scenario lasts at least two seconds, to ensure a relatively complete lane change or lane keeping behavior.
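The placeholder rule for absent surrounding vehicles can be sketched as a small helper. The vehicle record fields (`v`, `y`, `x`) are illustrative names, not the NGSIM column names.

```python
# Placeholder state from the extraction rules: when a surrounding
# vehicle is absent, relative speed = 0, relative longitudinal
# distance = 100 m, relative lateral distance = 10 m.
MISSING = {"dv": 0.0, "dy": 100.0, "dx": 10.0}

def relative_state(subject, other):
    """Return (dv, dy, dx) of `other` relative to `subject`,
    or the placeholder state when `other` is missing (None)."""
    if other is None:
        return (MISSING["dv"], MISSING["dy"], MISSING["dx"])
    return (other["v"] - subject["v"],
            other["y"] - subject["y"],
            other["x"] - subject["x"])
```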
A summary of the segmented sequences in the I-80 dataset is shown in Table II. The average duration of a scenario segment is about five seconds. The highly imbalanced data, i.e., the much higher proportion of pass or yield scenarios than lane change scenarios, pose another significant challenge to behavior recognition; however, this imbalance is consistent with daily driving.
There are two main difficulties in scenario extraction. The first is defining the exact time at which a driver makes a decision (i.e., the time at
TABLE II
SCENARIO SEGMENTATIONS

  Dataset   (a1, ·)   (a2, ·)   (a3, b1)   (a3, b2)
  I-80-1    1759      2897      105        122
  I-80-2    1873      3743      99         91
  I-80-3    1964      3944      124        92
  Total     5596      10584     328        305
which the driver turns on the turn signal light in lane change
scenario). In this work, it is assumed that the driver makes the decision 3 seconds before the end of the scenario. If a scenario's duration is less than 3 seconds, the start time of the scenario is regarded as the decision time.
Fig. 4. Two indicators in different scenarios: (a) longitudinal distance and (b) acceleration in a cooperative scenario; (c) longitudinal distance and (d) acceleration in a competitive scenario.
The other difficulty is classifying the strategy chosen by the lag vehicle in lane change scenarios. The general method uses average acceleration as a threshold. However, the acceleration changes frequently and has large measurement disturbances. Kang [10] used the longitudinal distance as the classification standard for the strategy of the lag vehicle, and this work adopts the same indicator. Examples of cooperative and competitive scenarios are given in Fig. 4. In the cooperative scenario, shown in Fig. 4(a) and Fig. 4(b), the longitudinal distance increases noticeably when the subject vehicle changes lane. In the competitive scenario, shown in Fig. 4(c) and Fig. 4(d), the longitudinal distance decreases when the subject vehicle changes lane. By contrast, it is hard to find a consistent pattern in the acceleration across scenarios.
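The longitudinal-distance indicator above amounts to a one-line classifier. The tie-breaking behavior for a roughly constant gap is an assumption, as the paper does not specify a threshold.

```python
def classify_lag_strategy(gap_start, gap_end):
    """Classify the lag vehicle's strategy in a lane change scenario
    from the longitudinal gap to the subject vehicle: an increasing
    gap indicates cooperation, a decreasing gap indicates competition.
    Treating an unchanged gap as cooperation is an assumption."""
    return "cooperate" if gap_end >= gap_start else "compete"
```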
B. Calibration Approach
The Gaussian particle swarm optimization (GPSO) method presented by Krohling [16] is used to estimate the parameters of the payoff model. GPSO has a strong ability to find the global optimum, but its local search ability is weak [17]. To compensate, differential evolution (DE) is also used in this work. The calibration method is detailed in Algorithm 1.
Algorithm 1: Model calibration with DE-GPSO [17]
Input: the structure of the payoff model, labeled data of different scenarios
Output: the parameters of the payoff model
1:  for each particle i = 1, 2, ..., S do
2:      initialize the particle's position x_i
3:      initialize the particle's best known position to its initial position: l_i ← x_i
4:  initialize the swarm's best known position g
5:  k = 0
6:  repeat
7:      for each particle i = 1, 2, ..., S do
8:          update the particle's position: x_i ← x_i + |α1|(l_i − x_i) + |α2|(g − x_i), where α1 and α2 follow the Gaussian distribution
9:          for each training datum n = 1, 2, ..., N do
10:             calculate the payoff values through the network
11:             calculate the Nash equilibrium: (p_{S,n}(1), p_{S,n}(2), p_{S,n}(3), p_{L,n}(1), p_{L,n}(2))
12:         calculate the cost function J(x_i)
13:         if J(x_i) < J(l_i) then
14:             update the particle's best known position: l_i ← x_i
15:         if J(x_i) < J(g) then
16:             update the swarm's best known position: g ← x_i
17:     for each parameter of the network j = 1, 2, ..., P do
18:         randomly select two particles x_m and x_n
19:         x_tmp(j) = g(j) + (x_m(j) − x_n(j))
20:     if J(x_tmp) < J(g) then
21:         update the swarm's best known position: g ← x_tmp
22:     k = k + 1
23: until J < 0.1 or k > 1000
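The two update rules at the core of Algorithm 1 (lines 8 and 19) can be sketched as follows. Standard-normal draws for α1 and α2 are an assumption consistent with Krohling's Gaussian swarm; the toy cost function is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(42)

def gpso_step(x, l, g):
    """One Gaussian PSO position update (line 8 of Algorithm 1):
    x <- x + |a1| (l - x) + |a2| (g - x), with a1, a2 ~ N(0, 1)."""
    a1, a2 = rng.standard_normal(2)
    return x + abs(a1) * (l - x) + abs(a2) * (g - x)

def de_perturb(g, xm, xn):
    """DE-style trial point around the global best (line 19):
    x_tmp = g + (xm - xn), accepted only if it lowers the cost."""
    return g + (xm - xn)

# Toy cost: squared distance to an (unknown) optimum at 1.
cost = lambda x: float(np.sum((x - 1.0) ** 2))

x = rng.normal(size=4)   # a particle's current position
l = x.copy()             # its best known position
g = rng.normal(size=4)   # swarm's best known position
x = gpso_step(x, l, g)
trial = de_perturb(g, rng.normal(size=4), rng.normal(size=4))
if cost(trial) < cost(g):
    g = trial            # local refinement of the global best
```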
One-hot encoding is used to define the label of each strategy set (e.g., the label of the subject vehicle in lane change scenarios is [0, 0, 1]). The error between the label probabilities and the output of the model is used as the optimization objective, denoted by:

J = (1/N) Σ_{n=1}^{N} ( |t_{S,n} − p_{S,n}| + α |t_{L,n} − p_{L,n}| )        (5)

α = { 1, if the type of scenario is lane change
      0, if the type of scenario is lane keeping        (6)
where n is the index of a scenario, N is the number of scenarios in the dataset, t_{S,n} and p_{S,n} are the label and model probability of the subject vehicle, and t_{L,n} and p_{L,n} are the label and model probability of the lag vehicle.
IV. MODEL VALIDATION
To balance the number of different scenarios, 300 segments of each scenario type are selected. 80% of the data (240 segments of each type) are used to calibrate the payoff function and the remaining 20% (60 segments of each type) are used to validate the model. The experimental results of the proposed method are listed in Table III. The predictions for the pass and yield scenarios are accurate, while there are several false alarms and missed detections in cut-in scenarios.
TABLE III
EXPERIMENT RESULTS

                          Label
  Prediction   (a1, ·)   (a2, ·)   (a3, b1)   (a3, b2)
  (a1, ·)      55        2         3          8
  (a2, ·)      0         53        7          3
  (a3, b1)     2         0         39         9
  (a3, b2)     3         5         11         40
To evaluate the proposed model accurately, four quantitative metrics are used:
- Accuracy (ACC) is the fraction of correctly classified events out of all testing events:
  ACC = (TP + TN) / (TP + TN + FP + FN)
  where TP is true positives, TN is true negatives, FP is false positives (false alarms), and FN is false negatives (missed detections).
- Precision (PRE) is the fraction of events classified correctly out of all events predicted to be positive:
  PRE = TP / (TP + FP)
- True Positive Rate (TPR), also named recall, is the fraction of events classified correctly out of all true events:
  TPR = TP / (TP + FN)
- F1 score is the harmonic mean of precision and recall:
  F1 = 2 × PRE × TPR / (PRE + TPR)
The overall ACC of the proposed model is 0.7792. To evaluate the performance in each type of scenario, the prediction can be regarded as a binary classification by treating the other three interaction types as one type. The resulting metrics for each type of interaction are listed in Table IV.
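The one-vs-rest reduction can be reproduced directly from the confusion matrix of Table III (rows are predictions, columns are ground-truth labels); the sketch below recovers the overall accuracy of 0.7792 and the per-class entries of Table IV up to rounding.

```python
# Confusion matrix from Table III: rows = predictions, cols = labels,
# class order (a1, .), (a2, .), (a3, b1), (a3, b2).
CM = [[55, 2, 3, 8],
      [0, 53, 7, 3],
      [2, 0, 39, 9],
      [3, 5, 11, 40]]

def binary_metrics(cm, k):
    """ACC, PRE, TPR, F1 for class k, treating the other classes as one."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fp = sum(cm[k]) - tp                  # predicted k, label is another class
    fn = sum(row[k] for row in cm) - tp   # label k, predicted another class
    tn = total - tp - fp - fn
    acc = (tp + tn) / total
    pre = tp / (tp + fp)
    tpr = tp / (tp + fn)
    f1 = 2 * pre * tpr / (pre + tpr)
    return acc, pre, tpr, f1

overall_acc = sum(CM[k][k] for k in range(4)) / sum(sum(r) for r in CM)
print(round(overall_acc, 4))  # 0.7792, matching the paper
```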
TABLE IV
MODEL PERFORMANCE

         (a1, ·)   (a2, ·)   (a3, b1)   (a3, b2)
  ACC    0.9250    0.9292    0.8667     0.8375
  PRE    0.8088    0.8413    0.7800     0.6780
  TPR    0.9167    0.8833    0.6500     0.6667
  F1     0.8594    0.8618    0.7091     0.6723
The proposed method performs better in lane keeping scenarios than in lane change scenarios. In the lane keeping scenarios, it distinguishes yield and pass scenarios well. In the lane change scenarios, the prediction of the subject vehicle's strategy is better than that of the lag vehicle. Compared with the subject vehicle, individual differences have more influence on the lag vehicle, so its strategy is more difficult to predict.
Different game theoretical decision making methods define different vehicle strategies. Therefore, this paper compares the performance of our method with that of the method proposed by Talebpour [9] under the same subject-vehicle strategies (lane keeping and lane change), evaluated with the four metrics defined above and listed in Table V.
TABLE V
RESULTS COMPARISON

                 ACC      PRE      TPR      F1 Score
  Our method     0.8708   0.9083   0.8250   0.8995
  Talebpour [9]  0.5789   0.6614   0.5793   0.6176
The comparison shows that the proposed method has a better decision making ability in highway scenarios. An application example of the proposed method on a continuous vehicle trajectory (subject vehicle ID: 820) is given in Fig. 5. The start frames of the segmented scenarios (which are also the end frames of the preceding scenarios) are shown in Fig. 5(a); Fig. 5(b) and Fig. 5(c) give the strategies of the subject vehicle and the lag vehicle, respectively. This example shows that the proposed method makes a correct decision in most of the highway scenarios.
V. CONCLUSION AND FUTURE WORK
This paper proposes a driving decision making method for highway scenarios based on game theory and a neural network. A two-player, non-zero-sum, non-cooperative game under complete information is used to describe the interaction between two vehicles, and a neural network is used to build the payoff model. Compared with a fixed-rule payoff model, the neural-network-based payoff model can describe the effects of each factor precisely and improve decision making ability. The model is calibrated by the DE-GPSO method on the NGSIM dataset. Compared with another method [9], the performance of our method has been validated with multiple quantitative metrics.
Fig. 5. An example of the proposed lane-changing decision making method: (a) the start frames of each scenario; (b) strategies of the subject vehicle (None / Pass / Yield / Lane Change, ground truth vs. proposed model); (c) strategies of the lag vehicle (None / Cooperate / Compete, ground truth vs. proposed model).
In future work, we plan to extend the number of players and the strategy sets to consider both sides of the subject vehicle and more complicated scenarios. To evaluate the performance of the proposed method under different traffic conditions, we also need to validate it on other datasets. Considering the development of self-driving technology, the decision making process should eventually be regarded as a cooperative game, because communication between vehicles will be allowed.
ACKNOWLEDGMENT
This work was supported in part by the National Natural
Science Foundation of China under Grant No. 61473209 and
No. 61773291.
REFERENCES
[1] S. Kammel, J. Ziegler, B. Pitzer, M. Werling et al., "Team AnnieWAY's autonomous system for the 2007 DARPA Urban Challenge," Journal of Field Robotics, vol. 25, no. 9, pp. 615–639, 2008.
[2] C. R. Baker and J. M. Dolan, “Traffic interaction in the urban chal-
lenge: Putting boss on its best behavior,” in IEEE/RSJ International
Conference on Intelligent Robots and Systems. IEEE, 2008, pp. 1752–
1758.
[3] S. Ulbrich and M. Maurer, “Probabilistic online pomdp decision
making for lane changes in fully automated driving,” in The 16th
International IEEE Conference on Intelligent Transportation Systems.
IEEE, 2013, pp. 2063–2067.
[4] A. G. Cunningham, E. Galceran, R. M. Eustice, and E. Olson, “Mpdm:
Multipolicy decision-making in dynamic, uncertain environments for
autonomous driving,” in IEEE International Conference on Robotics
and Automation. IEEE, 2015, pp. 1670–1677.
[5] C. Dong, J. M. Dolan, and B. Litkouhi, “Interactive ramp merging
planning in autonomous driving: Multi-merging leading pgm,” in IEEE
International Conference on Intelligent Transportation, October 2017,
pp. 2186–2191.
[6] ——, “Intention estimation for ramp merging control in autonomous
driving,” in 2017 IEEE Intelligent Vehicles Symposium, June 2017.
[7] C. Dong, Y. Zhang, and J. M. Dolan, “Lane-change social behavior
generator for autonomous driving car by non-parametric regression in
reproducing kernel hilbert space,” in IEEE International Conference
on Intelligent Transportation, September 2017, pp. 4489–4494.
[8] H. Kita, “A merging–giveway interaction model of cars in a merging
section: a game theoretic analysis,” Transportation Research Part A:
Policy and Practice, vol. 33, no. 3, pp. 305–312, 1999.
[9] A. Talebpour, H. S. Mahmassani, and S. H. Hamdar, “Modeling
lane-changing behavior in a connected environment: A game theory
approach,” Transportation Research Procedia, vol. 7, pp. 420–440,
2015.
[10] K. Kang and H. A. Rakha, "Game theoretical approach to model decision making for merging maneuvers at freeway on-ramps," Transportation Research Record: Journal of the Transportation Research Board, no. 2623, pp. 19–28, 2017.
[11] Y. Zhang, Q. Lin, J. Wang, and S. Verwer, “Car-following behavior
model learning using timed automata,” IFAC-PapersOnLine, vol. 50,
no. 1, pp. 2353–2358, 2017.
[12] C. Vallon, Z. Ercan, A. Carvalho, and F. Borrelli, “A machine learning
approach for personalized autonomous lane change initiation and
control,” in Intelligent Vehicles Symposium. IEEE, 2017, pp. 1590–
1595.
[13] B. Chatterjee, “An optimization formulation to compute nash equilib-
rium in finite games,” in Proceeding of International Conference on
Methods and Models in Computer Science. IEEE, 2009, pp. 1–5.
[14] J. Morton, T. A. Wheeler, and M. J. Kochenderfer, “Analysis of recur-
rent neural networks for probabilistic modeling of driver behavior,”
IEEE Transactions on Intelligent Transportation Systems, vol. 18,
no. 5, pp. 1289–1298, 2017.
[15] NGSIM, “U.S. Department of Transportation, NGSIM - Next genera-
tion simulation,” http://www.ngsim.fhwa.dot.gov, 2007.
[16] R. A. Krohling, “Gaussian swarm: a novel particle swarm optimiza-
tion algorithm,” in IEEE Conference on Cybernetics and Intelligent
Systems, vol. 1. IEEE, 2004, pp. 372–376.
[17] C. Wan, J. Wang, G. Yang, H. Gu, and X. Zhang, "Wind farm micro-siting by Gaussian particle swarm optimization with local search strategy," Renewable Energy, vol. 48, no. 6, pp. 276–286, 2012.
... Several solutions for these problems are introduced in the literature. Among these, Naive Bayes models [20], gaussian mixture models [21], markov chain models [22], markov state space models [23], optimal control based models [24], inverse reinforcement learning based models [25], [26], Gaussian Process based models [27], [28], [29], [30], game theoretical approaches [31], [32], [33], [34], Gated Recurrent Unit based models [35], joint use of recurrent and convolutional neural networks [36], joint use of recurrent and generative networks [37], and joint use of generative networks and imitation learning [38] can be counted. ...
Preprint
This paper proposes a method for modeling human driver interactions that relies on multi-output gaussian processes. The proposed method is developed as a refinement of the game theoretical hierarchical reasoning approach called "level-k reasoning" which conventionally assigns discrete levels of behaviors to agents. Although it is shown to be an effective modeling tool, the level-k reasoning approach may pose undesired constraints for predicting human decision making due to a limited number (usually 2 or 3) of driver policies it extracts. The proposed approach is put forward to fill this gap in the literature by introducing a continuous domain framework that enables an infinite policy space. By using the approach presented in this paper, more accurate driver models can be obtained, which can then be employed for creating high fidelity simulation platforms for the validation of autonomous vehicle control algorithms. The proposed method is validated on a real traffic dataset and compared with the conventional level-k approach to demonstrate its contributions and implications.
... Section 2.2 futher described their study. Their application demontrate that, while pedestrians and cyclists will eventually have a higher control of the road over AVs, one can also expect an increase in the number of accidents due to the safety feeling induced by AVs on other road users Other traffic situations can also be analysed through GT. Yan, Wang, and Zhang (2018) propose a gametheoretical approach to model the interaction of vehicles while considering different surrounding traffic situations (Figure 4). They focus on lane changing scenarios in highways, as those are particularly challenging given vehicles' high speed, travels' long distance, and traffic jams. ...
Conference Paper
Full-text available
The increasing use of autonomous systems (AS) aims to improve efficiency, costs, and safety of numerous operations. Yet, they also pose several safety challenges. Most of AS will operate in a dynamic environment, interacting with non-autonomous and/or other autonomous systems. The anticipation of both the AS and non-AS possible decisions during these interactions is crucial to identify and analyze potential hazards and risks, and to guarantee a safe operation. Game Theory (GT) has been increasingly used for modeling the interactions between AS and other agents in conflicting or cooperating situations. Recent applications of GT for AS also include the use of game-theoretical approaches for algorithm-testing and development, as well as for cyber-physical security assessment. Yet, the application of GT for analysis of AS operations under a risk perspective can still be considered in an early stage. This paper provides an overview of how GT is being applied to AS in the context of risk assessment. A review of the recent literature on GT applied to AS was carried out on the Scopus database using a combination of relevant keywords. It resulted in 100 articles within the period of 2015-2021. The articles were analyzed with regard to the technical domain of application and the scope of use of GT.
... It would be better to also consider the impact in the other direction. A game-theoretical approach would be a solution to learn the interacting behaviors (Yan et al., 2018). In addition, as the lane change is modeled by stochastic input, it would be possible to conduct probabilistic model checking on the safety property of the controller. ...
Thesis
Full-text available
Automatic control is a technique about designing control devices for controlling ma- chinery processes without human intervention. However, devising controllers using conventional control theory requires first principle design on the basis of the full under- standing of the environment and the plant, which is infeasible for complex control tasks such as driving in highly uncertain traffic environment. Intelligent control offers new op- portunities about deriving the control policy of human beings by mimicking our control behaviors from demonstrations. In this thesis, we focus on intelligent control techniques from two aspects: (1) how to learn control policy from supervisors with the available demonstration data; (2) how to verify the controller learned from data will safely control the process.
Article
Full-text available
Regulating traffic flow on highways is challenging because the movement interactions between vehicles can give a traveling advantage to some while imposing unfavorable effects on many others, leading to an imbalance between individual benefits and the mutual interest of the highway network. To address this problem, an intelligent-system solution is proposed to provide communication and guidance among vehicles, aiming to reduce arbitrariness in travel. For this purpose, a game-theoretic modeling approach is used in which the problem becomes a symmetric game: drivers are the players, and each driver's strategy depends on the others' actions. This game produces a Nash equilibrium, the course of action all players can follow without needing any further changes, thus establishing a predictable and stable traffic network and reducing average travel time to a minimum. The experimental calculation was conducted using OMOPSO, a multi-objective particle swarm optimization algorithm, to compute the game payoff, i.e., the evaluation of an optimal speed strategy. The result contributes to the development of such an intelligent system for tackling traffic problems on big cities' highways.
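The abstract above reduces the speed-coordination problem to a symmetric game whose stable point is a Nash equilibrium. The following is a minimal sketch of finding pure-strategy Nash equilibria of a two-player game by enumeration; the payoff matrices below are invented for illustration and are not the OMOPSO-computed payoffs from the article.

```python
import itertools

def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all pure-strategy Nash equilibria of a two-player game.

    payoff_a[i][j] / payoff_b[i][j]: payoffs for the row / column player
    when the row player picks strategy i and the column player picks j.
    """
    n_rows, n_cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i, j in itertools.product(range(n_rows), range(n_cols)):
        # (i, j) is an equilibrium iff i is a best response to j and vice versa
        row_best = payoff_a[i][j] >= max(payoff_a[k][j] for k in range(n_rows))
        col_best = payoff_b[i][j] >= max(payoff_b[i][k] for k in range(n_cols))
        if row_best and col_best:
            equilibria.append((i, j))
    return equilibria

# Hypothetical "speed choice" game: strategy 0 = keep speed, 1 = slow down;
# the payoff values are illustrative travel-time utilities.
A = [[2, 0], [3, 1]]   # row player's payoffs
B = [[2, 3], [0, 1]]   # column player's payoffs
print(pure_nash_equilibria(A, B))  # → [(1, 1)]
```

With these toy payoffs, mutual slowing down is the only equilibrium: once there, neither driver gains by unilaterally changing strategy, which is the stability property the article exploits.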
Article
Full-text available
Learning driving behavior is fundamental for autonomous vehicles to “understand” traffic situations. This paper proposes a novel method for learning a behavioral model of car-following using automata learning algorithms. The model is interpretable for car-following behavior analysis. Frequent common state sequences are extracted from the model and clustered as driving patterns. The Next Generation SIMulation dataset on the I-80 highway is used for learning and evaluation. The experimental results demonstrate the high accuracy of the car-following model fit.
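The pattern-extraction step described above — mining frequent state sequences from a learned automaton's traces before clustering them — can be sketched as an n-gram count over discretized state traces. The state symbols and thresholds below are hypothetical stand-ins, not the states learned in the paper.

```python
from collections import Counter

def frequent_state_sequences(traces, length=3, min_count=2):
    """Count fixed-length state subsequences across discretized traces.

    traces: sequences of state symbols (e.g. quantized gap/speed states
    visited by a learned car-following automaton).
    Returns only subsequences occurring at least `min_count` times.
    """
    counts = Counter()
    for trace in traces:
        for i in range(len(trace) - length + 1):
            counts[tuple(trace[i:i + length])] += 1
    return {seq: c for seq, c in counts.items() if c >= min_count}

# Hypothetical symbols: A = "closing gap", B = "steady", C = "opening gap"
traces = ["AABBC", "ABBCA", "AABBA"]
print(frequent_state_sequences(traces, length=2))
```

The surviving frequent subsequences would then serve as features for clustering traces into driving patterns.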
Conference Paper
Full-text available
Nowadays, self-driving cars are being applied to more complex urban scenarios including intersections, merging ramps, and lane changes. It is, therefore, important for self-driving cars to behave socially with human-driven cars. In this paper, we focus on generating the lane change behavior for self-driving cars: performing a safe and effective lane change once a lane-change command is received. Our method bridges the gap between higher-level behavior commands and the trajectory planner. There are two challenges in the task: 1) analyzing the surrounding vehicles' mutual effects from their trajectories; 2) estimating the proper lane change start point and end point according to the analysis of surrounding vehicles. We propose a learning-based approach to understand surrounding traffic and make decisions for a safe lane change. The contributions and advantages of our approach are: 1) it considers the behavior generator as a continuous function in a Reproducing Kernel Hilbert Space (RKHS), which contains a family of behavior generators; 2) it constructs the behavior generator function in the RKHS by nonparametric regression on training data; 3) it takes the past trajectories of all related surrounding cars as input to capture mutual interactions, and outputs continuous values to represent behaviors. Experimental results show that the proposed approach is able to generate feasible and human-like lane-change behavior (represented by start and end points) in multi-car environments. The experiments also verified that our suggested kernel outperforms the ones used in a previous method.
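The RKHS construction above boils down to fitting a function by kernel-weighted nonparametric regression on demonstrations. A minimal one-dimensional sketch with an RBF kernel is shown below; the paper's actual inputs are full past trajectories of surrounding cars and its kernel is custom, so the scalar features and bandwidth here are purely illustrative.

```python
import math

def kernel_regression(train_x, train_y, x, bandwidth=1.0):
    """Nadaraya-Watson style nonparametric regression with an RBF kernel.

    Predicts y at query point x as a kernel-weighted average of the
    training outputs — the simplest instance of constructing a function
    from data with a kernel, as in RKHS-based behavior generation.
    """
    weights = [math.exp(-((x - xi) ** 2) / (2 * bandwidth ** 2)) for xi in train_x]
    total = sum(weights)
    return sum(w * yi for w, yi in zip(weights, train_y)) / total

# Hypothetical data: map a traffic feature to a lane-change start offset
print(kernel_regression([0.0, 1.0, 2.0], [0.0, 1.0, 2.0], 1.0))  # → 1.0
```

In the paper's setting, the regression output would be the continuous values (start and end points) describing the lane-change behavior rather than a scalar.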
Article
Drivers of merging vehicles decide when to merge by considering surrounding vehicles in adjacent lanes in their deliberation process. Conflicts between drivers of the subject vehicles (i.e., merging vehicles) in an auxiliary lane and lag vehicles in the adjacent lane are typical near freeway on-ramps. This paper models the decision-making process for merging maneuvers using a game theoretical approach. The proposed model is based on the noncooperative decision making of two players, namely the drivers of the subject and lag vehicles, without consideration of advanced communication technologies. In the decision-making process, the drivers of the subject vehicles elect to accept gaps, and drivers of lag vehicles either yield to or block the action of the subject vehicle. Corresponding payoff functions for the two players were formulated to describe their respective maneuvers. To estimate model parameters, a bi-level optimization approach was used. The next generation simulation data set was used for model calibration and validation. The data set was used to define the moment the game starts, and the interaction was modeled as a continuous sequence of games until a decision is made. The merging decision-making model was then validated with an independent data set. The validation results reveal that the proposed model provides considerable prediction accuracy, with correct predictions 84% of the time.
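The two-player structure above — subject chooses merge/wait, lag chooses yield/block, replayed as a sequence of stage games — can be sketched as follows. The payoff forms and the parameter vector `theta` are illustrative placeholders, not the calibrated payoff functions from the paper.

```python
def stage_game(gap, theta=(1.0, 1.0, 2.0)):
    """2x2 payoffs: subject rows (merge, wait), lag columns (yield, block).

    Illustrative forms: merging utility grows with the gap, with a fixed
    merge cost b and an extra penalty c if the lag vehicle blocks.
    """
    a, b, c = theta
    subject = [[a * gap - b, a * gap - b - c],  # merge vs. (yield, block)
               [0.0, 0.0]]                      # wait  vs. (yield, block)
    lag = [[-0.3, -1.0],   # subject merges: blocking risks a conflict
           [-0.3, 0.0]]    # subject waits: yielding only costs delay
    return subject, lag

def merge_step(gaps):
    """Model merging as a sequence of stage games: merge at the first time
    step where (merge, yield) is a pure-strategy Nash equilibrium."""
    for t, gap in enumerate(gaps):
        s, l = stage_game(gap)
        merge_best = s[0][0] > s[1][0]   # merging beats waiting if the lag yields
        yield_best = l[0][0] >= l[0][1]  # yielding beats blocking if the subject merges
        if merge_best and yield_best:
            return t
    return None

print(merge_step([0.5, 1.2, 2.5]))  # growing gap sequence → merges at step 1
```

In the paper, the parameters of the payoff functions are estimated from NGSIM data by bi-level optimization; here they are fixed by hand purely to show the decision mechanics.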
Article
The validity of any traffic simulation model depends on its ability to generate representative driver acceleration profiles. This paper studies the effectiveness of recurrent neural networks in predicting the acceleration distributions for car following on highways. Long short-term memory recurrent networks are trained and used to propagate the simulated vehicle trajectories over 10-s horizons. On the basis of several performance metrics, the recurrent networks are shown to generally match or outperform baseline methods in replicating driver behavior, including the smoothness and oscillatory characteristics present in real trajectories. This paper reveals that the strong performance is due to the ability of the recurrent network to identify recent trends in the ego-vehicle's state, and recurrent networks are shown to perform as well as feedforward networks with longer histories as inputs.
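The propagation scheme above — feed the model's predicted acceleration back into the state and roll the trajectory forward over the horizon — can be sketched generically. A trained LSTM would play the role of `model` in the paper; here a hand-written callable stands in, and the window contents, gain, and target speed are assumptions made for illustration.

```python
def propagate(model, history, dt=0.1, horizon=10.0):
    """Roll out a trajectory by feeding predicted accelerations back in.

    model:   callable mapping a recent window of speeds to an acceleration
             (the LSTM's role in the paper; any callable works here).
    history: initial window of recent speeds (m/s), most recent last.
    Returns the simulated (position, speed) sequence over the horizon.
    """
    pos, speed = 0.0, history[-1]
    window = list(history)
    traj = []
    for _ in range(int(horizon / dt)):
        accel = model(window)
        speed = max(0.0, speed + accel * dt)  # integrate; no reversing
        pos += speed * dt
        window = window[1:] + [speed]         # slide the input window
        traj.append((pos, speed))
    return traj

# Stand-in for the recurrent net: relax toward a 30 m/s free-flow speed
dummy_model = lambda window: 0.5 * (30.0 - window[-1])
traj = propagate(dummy_model, history=[20.0, 20.5, 21.0])
```

With a real LSTM, `model` would consume the full feature window (speed, gap, relative speed) and the same rollout loop would generate the 10-s trajectories the paper evaluates.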
Article
Real-world autonomous driving in city traffic must cope with dynamic environments including other agents with uncertain intentions. This poses a challenging decision-making problem, e.g., deciding when to perform a passing maneuver or how to safely merge into traffic. Previous work in the literature has typically approached the problem using ad-hoc solutions that do not consider the possible future states of other agents, and thus have difficulty scaling to complex traffic scenarios where the actions of participating agents are tightly conditioned on one another. In this paper we present multipolicy decision-making (MPDM), a decision-making algorithm that exploits knowledge from the autonomous driving domain to make decisions online for an autonomous vehicle navigating in traffic. By assuming the controlled vehicle and other traffic participants execute a policy from a set of plausible closed-loop policies at every timestep, the algorithm selects the best available policy for the controlled vehicle to execute. We perform policy election using forward simulation of both the controlled vehicle and other agents, efficiently sampling from the high-likelihood outcomes of their interactions. We then score the resulting outcomes using a user-defined cost function to accommodate different driving preferences, and select the policy with the highest score. We demonstrate the algorithm on a real-world autonomous vehicle performing passing maneuvers and in a simulated merging scenario.
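The MPDM election loop above — forward-simulate each candidate ego policy against the plausible closed-loop policies of other agents, score outcomes with a user-defined cost, and pick the best — can be sketched as follows. The `simulate` and `cost` callables and the toy policies are placeholders, not the paper's actual policy set or cost function.

```python
def elect_policy(ego_policies, other_policies, simulate, cost):
    """Simplified MPDM-style policy election.

    For each candidate ego policy, forward-simulate the interaction
    against each plausible policy of the other agents, score the sampled
    outcomes with a user-defined cost, and return the ego policy with
    the lowest average cost.
    """
    best, best_score = None, float("inf")
    for ego in ego_policies:
        score = sum(cost(simulate(ego, other))
                    for other in other_policies) / len(other_policies)
        if score < best_score:
            best, best_score = ego, score
    return best

# Toy stand-ins: policies are lane choices; an outcome is (progress, conflict)
simulate = lambda ego, other: (1.0 if ego != other else 0.5, ego == other)
cost = lambda outcome: -outcome[0] + (10.0 if outcome[1] else 0.0)
print(elect_policy(["keep", "change"], ["keep"], simulate, cost))  # → change
```

The paper's version samples high-likelihood outcomes from closed-loop simulation of all agents at every timestep; the averaging here compresses that into a single expectation for brevity.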
Conference Paper
The Stadtpilot project aims at fully automated driving on Braunschweig's inner city ring road. The TU Braunschweig's research vehicle “Leonie” is one of the first vehicles with the ability of fully automated driving in real urban traffic scenarios. This paper shows our decision making approach for performing lane changes while driving fully automated in urban environments. We apply an online Partially Observable Markov Decision Process (POMDP) to accommodate the inevitable sensor noise faced in urban traffic scenarios. In this paper we propose a two-step algorithm to keep the complexity of the POMDP low enough for real-time decision making while driving. The presented approach has been integrated in our vehicle and was evaluated in real urban traffic.
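The POMDP machinery referenced above rests on maintaining a belief over hidden states under noisy observations. A minimal discrete belief update (predict, then correct) is sketched below; the two-state gap model, transition probabilities, and observation likelihoods are invented for illustration and are not the Stadtpilot models.

```python
def belief_update(belief, obs, transition, likelihood):
    """One step of a discrete POMDP belief update.

    belief:     dict state -> probability.
    transition: dict s1 -> {s2: P(s2 | s1)}  (prediction model).
    likelihood: dict s -> {obs: P(obs | s)}  (noisy sensor model).
    """
    states = list(belief)
    # predict: propagate the belief through the transition model
    predicted = {s2: sum(transition[s1][s2] * belief[s1] for s1 in states)
                 for s2 in states}
    # correct: weight by the observation likelihood and normalise
    unnorm = {s: likelihood[s][obs] * predicted[s] for s in states}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

# Hypothetical hidden state of a target gap: "free" vs. "occupied"
transition = {"free": {"free": 0.9, "occupied": 0.1},
              "occupied": {"free": 0.2, "occupied": 0.8}}
likelihood = {"free": {"clear": 0.8, "blocked": 0.2},
              "occupied": {"clear": 0.3, "blocked": 0.7}}
belief = belief_update({"free": 0.5, "occupied": 0.5}, "clear",
                       transition, likelihood)
```

An online POMDP solver would then choose the lane-change action maximizing expected value under this belief, rather than acting on the raw noisy measurement.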