A Game-Theoretical Approach to Driving Decision Making in Highway
Scenarios
Zhihai Yan, Jun Wang, Yihuan Zhang
Abstract— With the development of self-driving technology, fundamental behaviors such as car-following and lane changing have been validated and tested in various kinds of scenarios. Currently, one of the most challenging domains for self-driving is decision making in dynamic environments. For self-driving cars, it is essential to understand and estimate other vehicles' behavior while interacting with them like a human driver. In this paper, a game-theoretical approach is proposed to model the interaction of vehicles while considering the surrounding traffic situation. A novel aspect is that a neural network is applied to establish the payoff function of the game, which describes the interaction more precisely. A calibration method is then applied to estimate the parameters using the Next Generation SIMulation (NGSIM) dataset. The experiments demonstrate the accuracy of the proposed method and its ability to make cooperative decisions in highway scenarios.
I. INTRODUCTION
Recently, self-driving technology has been widely applied in transportation and the military. Highways are an important component of the transportation system. However, long distances, high speeds, and traffic congestion make highway driving stressful and dangerous. Thus, decision making in highway scenarios is one of the most challenging problems for self-driving cars.
The most common decision making methods rely on manually defined rules corresponding to specific situations. A variety of solutions, including finite state machines [1] and hierarchical state machines [2], are used to make decisions for self-driving cars. However, manually defined rules are tailored to specific, simplified traffic scenarios and do not consider the uncertainty of drivers.
The Partially Observable Markov Decision Process (POMDP) provides a mathematical framework for the decision making problem in dynamic, uncertain scenarios such as highway driving. Ulbrich [3] applied a POMDP to solve the decision making problem of self-driving cars. However, the method was time-consuming and could hardly be applied in real time. To address this problem, Cunningham [4] proposed an improved POMDP approach named multi-policy decision making, which used a set of possible high-level policies to replace the continuous action space. However, these methods do not put the subject vehicle and the other vehicles on the same level when considering dynamic interactions. They focus
Zhihai Yan, Jun Wang and Yihuan Zhang are with the Department of
Control Science and Engineering, Tongji University, Shanghai 201804, P.
R. China.
Corresponding author: junwang@tongji.edu.cn
Fig. 1. Highway driving scenarios.
on the decision of the subject vehicle while ignoring the
decisions of other vehicles.
Other probabilistic methods were proposed to deal with interactive merging scenarios [5], [6]. In addition, a social behavior generator was proposed to generate lane change trajectories under interactions with the surrounding vehicles [7]. To model the interaction between different drivers more accurately, game theory is applied in this paper. Game theory constructs mathematical models to deal with the conflict and cooperation between decision makers. The drivers' decisions converge to the equilibrium of a set of strategies that jointly maximize their rewards in an independent game. Kita [8] first proposed a game theoretical model that describes on-ramp merging behavior using a discrete choice model. Talebpour [9] defined a two-type game of lane change behaviors according to the safety and speed gain for the drivers. Recently, Kang and Rakha [10] defined a model of merging maneuvers at freeway on-ramps, with three strategies (change, wait, overtake) for the subject vehicle and two strategies (yield, block) for the lag vehicle. Based on game theory, these methods considered the interactions among drivers and make a cooperative decision for each driver in the game. However, the most difficult part of game theoretical methods is formulating the payoff function of the strategies for each player, because it is hard to determine the effects of different factors on each player.
2018 IEEE Intelligent Vehicles Symposium (IV)
Changshu, Suzhou, China, June 26-30, 2018
978-1-5386-4452-2/18/$31.00 ©2018 IEEE 1221
Many studies have demonstrated the effectiveness of artificial intelligence methods in car-following scenarios [11] and lane change scenarios [12]. Inspired by these works, a game-theory-based framework is proposed in this paper to model the interaction between human drivers. A two-person, non-zero-sum, non-cooperative game under complete information is presented to model decision making behavior in highway driving. Moreover, an improved Gaussian Particle Swarm Optimization (GPSO) method is applied to calibrate the model. The main contribution of this paper is a neural-network-based payoff model that describes the rewards of each driver and is calibrated using real traffic data.
The remainder of this paper is organized as follows. The framework of the proposed decision making method in highway driving is detailed in Section II. Traffic scenario extraction and model calibration are described in Section III. Experiments on model validation are carried out in Section IV. Conclusions and future work are presented in Section V.
II. PROPOSED METHOD
The interaction between the subject vehicle and the lag
vehicle can be seen as a game, as shown in Fig. 1. The
drivers of the subject vehicle and the lag vehicle choose their
best strategy through the game. Generally, a game has three important factors: the players, the strategies available to each player, and a payoff model representing the reward of each player for every strategy combination. A two-player, non-zero-sum, non-cooperative game under complete information is applied in this paper.
A. Players of Game
As shown in Fig. 1, there are two players in this game: the driver of the subject vehicle (V3, red) and the driver of the lag vehicle (V1, blue), which is the closest following vehicle in the target lane. Non-zero-sum means that the payoffs of the two players do not sum to zero, because their payoffs are independent. Non-cooperative means that the players choose their strategies independently and no communication is established. It is assumed that the two players know each other's possible strategies and the states of the surrounding vehicles; in other words, this is a complete information game.
B. Strategies for Each Player
The subject vehicle has three pure strategies: (a) pass the leading vehicle (V2, black) in the target lane; (b) yield to the lag vehicle; (c) merge into the target lane. The lag vehicle has two corresponding strategies: cooperate or compete. Table I shows the structure of the game, where P and Q denote the payoffs of the subject vehicle and the lag vehicle, respectively.
Each player chooses one of the pure strategies to achieve
the goal of the game. The selection of optimum strategy
set has been a topic of interest since the introduction of
game theory. In order to find the optimum strategies for
the drivers of the subject vehicle and the lag vehicle, the
TABLE I
STRUCTURE OF LANE CHANGE GAME

                        Lag Vehicle
  Subject Vehicle       b1 (cooperate)   b2 (compete)
  a1 (pass)             (P11, Q11)       (P12, Q12)
  a2 (yield)            (P21, Q21)       (P22, Q22)
  a3 (lane change)      (P31, Q31)       (P32, Q32)
Nash equilibrium is considered. The Nash equilibrium is a
solution concept of a non-cooperative game involving two or
more players in which each player is assumed to know the
equilibrium strategies of others, and no player has anything
to gain by changing only his or her own strategy. The Nash
equilibrium has two types: the pure strategies and the mixed
equilibrium has two types: pure strategies and mixed strategies. The players select pure strategies if a pure-strategy Nash equilibrium exists, which implies that each player maximizes his or her own reward given that the opposing player also maximizes his or her reward. In this game, the driver of the subject vehicle has three possible strategies and the driver of the lag vehicle has two: S1 = {a1, a2, a3} and S2 = {b1, b2}. This means that the game has six possible strategy sets. A pure-strategy Nash equilibrium is defined as follows:
P(a*, b*) ≥ P(a, b*)   ∀a ∈ {a1, a2, a3}
Q(a*, b*) ≥ Q(a*, b)   ∀b ∈ {b1, b2}        (1)

where P(ai, bj) and Q(ai, bj) are equivalent to Pij and Qij in Table I, and (a*, b*) represents the Nash equilibrium strategy set for the drivers of the subject vehicle and the lag vehicle.
However, a pure-strategy Nash equilibrium does not always exist, and a mixed-strategy set can be used instead. Each player uses a set of probabilities over his or her strategies to maximize his or her own payoff against the probabilities selected by the opposing player. Thus, a pure-strategy Nash equilibrium is a special mixed strategy in which one strategy has probability 1 for each player. To calculate the Nash equilibrium of mixed strategies, the MATLAB function "npg" developed by Chatterjee [13] is used in this paper to solve this two-player non-cooperative game.
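Given a payoff bimatrix of the form in Table I, a pure-strategy Nash equilibrium satisfying Eq. (1) can be found by simple enumeration. The following is a minimal sketch with illustrative, uncalibrated payoff values; mixed-strategy equilibria (computed in the paper with npg) are outside its scope.

```python
# Hypothetical payoff bimatrix for the 3x2 game of Table I:
# rows = subject-vehicle strategies (pass, yield, lane change),
# columns = lag-vehicle strategies (cooperate, compete).
# These numbers are illustrative placeholders, not calibrated values.
P = [[3.0, 3.0],   # subject-vehicle payoffs
     [1.0, 1.0],
     [4.0, -2.0]]
Q = [[2.0, 2.0],   # lag-vehicle payoffs
     [3.0, 3.0],
     [2.0, 1.0]]

def pure_nash(P, Q):
    """Enumerate pure-strategy Nash equilibria (i, j) such that
    P[i][j] >= P[a][j] for all a and Q[i][j] >= Q[i][b] for all b."""
    m, n = len(P), len(P[0])
    eqs = []
    for i in range(m):
        for j in range(n):
            best_row = all(P[i][j] >= P[a][j] for a in range(m))
            best_col = all(Q[i][j] >= Q[i][b] for b in range(n))
            if best_row and best_col:
                eqs.append((i, j))
    return eqs

print(pure_nash(P, Q))  # [(0, 1), (2, 0)]
```

With these placeholder payoffs the game has two pure equilibria, (pass, compete) and (lane change, cooperate), which illustrates why the mixed-strategy solver is still needed in general.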
C. Payoff Model
To the best of our knowledge, past game theoretical approaches to driving decision making have used only fixed-form payoff functions to formulate the reward of each strategy set. One drawback of this assumption is that fixed-form functions are unable to describe the effects of each factor precisely. Neural networks are universal function approximators and have been demonstrated to be effective in various domains of research [14]. A neural network generally consists of three parts: an input layer, multiple hidden layers, and an output layer. Vector-valued inputs are fed into the input layer and are manipulated
by a set of linear transformations and nonlinear activations as
they traverse the hidden layers to the output layer. In this
paper, the neural network is used to formulate the payoff
function. The interaction between the subject vehicle and
the lag vehicle is influenced not only by their own states, but also by the states of the front vehicles in the host lane and
the target lane. Therefore, the inputs of the neural network
are the states of these four vehicles and the outputs are the
payoff values of different strategy sets.
Fig. 2. Network structure of the payoff model: an input layer (Δv, Δy, Δx for each surrounding vehicle and the subject speed), two 8-unit feed-forward hidden layers with tanh activations, and an output layer of payoff values P and Q.
The structure of the network used in this paper is shown
in Fig. 2. It consists of two hidden feed-forward layers with
8 units. The hyperbolic tangent function (tanh) is used as the
activation function of the hidden layers. The input layer x_in is denoted by:

x_in = [Δv_{3,i}, Δy_{3,i}, Δx_{3,i}, v3],   i = 1, 2, 4        (2)

where Δv_{3,i}, Δy_{3,i}, and Δx_{3,i} are respectively the relative speed, longitudinal gap, and lateral gap between the subject vehicle and surrounding vehicle i, and v3 is the speed of the subject vehicle.
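A rough sketch of the forward pass described above: 10 inputs (Δv, Δy, Δx for the three surrounding vehicles plus v3), two 8-unit tanh hidden layers, and 8 payoff outputs. The linear output layer and the random placeholder weights are assumptions; in the paper the parameters are calibrated with DE-GPSO.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes following the paper: 10 inputs, two 8-unit tanh hidden
# layers, 8 payoff outputs. Weights are random placeholders, not the
# calibrated parameters.
sizes = [10, 8, 8, 8]
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def payoff_network(x_in):
    """Forward pass: tanh on the hidden layers, linear output (assumed)."""
    h = np.asarray(x_in, dtype=float)
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i < len(weights) - 1:  # apply tanh on hidden layers only
            h = np.tanh(h)
    return h  # 8 payoff values

# Illustrative input: (dv, dy, dx) to each of V1, V2, V4, then v3.
x_in = np.concatenate([[-1.2, 15.0, 3.5],
                       [0.8, 20.0, 0.0],
                       [-0.5, 25.0, 3.5],
                       [28.0]])
print(payoff_network(x_in).shape)  # (8,)
```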
In the lane keeping (pass or yield) scenarios, the lag
vehicle’s strategy is assumed not to affect the subject vehicle.
The payoffs of the subject vehicle or the lag vehicle remain
the same whatever strategy the lag vehicle selects in these
scenarios, i.e.
P11 = P12,  Q11 = Q12,  P21 = P22,  Q21 = Q22        (3)

The output layer x_out is denoted by:

x_out = [P11, Q11, P21, Q21, P31, Q31, P32, Q32]        (4)
After the neural-network-based payoff function is built, real traffic data are used to estimate the parameters of the model, which is validated in the next section.
III. MODEL CALIBRATION
In this paper, the data used to calibrate and evaluate the
proposed method is obtained from real traffic data. The
detailed description of the dataset and the calibration method
are presented in this section.
A. Scenario Extraction
This paper uses the public dataset of individual vehicle trajectories from NGSIM [15], a program funded by the US Federal Highway Administration. These trajectory data are thus far unique in the history of traffic research and provide a valuable basis for research into driving behavior on structured roads. All experiments are performed on the I-80 dataset, shown in Fig. 3. It consists of three 15-minute periods: 4:00 p.m. to 4:15 p.m. (I-80-1), 5:00 p.m. to 5:15 p.m. (I-80-2), and 5:15 p.m. to 5:30 p.m. (I-80-3). These periods represent the buildup of congestion, the transition between uncongested and congested conditions, and full congestion during the peak period.
Fig. 3. I-80 scenario [15]
The segmented scenarios have the following properties:
- In each scenario, the subject vehicle and the lag vehicle remain the same.
- For any surrounding vehicle that does not exist, the relative longitudinal distance is set to 100 m, the relative lateral distance to 10 m, and the relative speed to 0.
- A scenario ends when the subject vehicle crosses the lane marker, passes V2, or yields to V1.
- A new scenario starts immediately after the end of the last one, so there are no gaps between driving scenarios.
- Each segmented scenario lasts at least two seconds, to ensure a relatively complete lane change or lane keeping behavior.
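The placeholder rule for absent surrounding vehicles can be sketched as a small helper. The vehicle record fields (`v`, `y`, `x`) are illustrative names, not the NGSIM column names.

```python
# Placeholder state from the extraction rules: when a surrounding
# vehicle is absent, relative speed = 0, relative longitudinal
# distance = 100 m, relative lateral distance = 10 m.
MISSING = {"dv": 0.0, "dy": 100.0, "dx": 10.0}

def relative_state(subject, other):
    """Return (dv, dy, dx) of `other` relative to `subject`,
    or the placeholder state when `other` is missing (None)."""
    if other is None:
        return (MISSING["dv"], MISSING["dy"], MISSING["dx"])
    return (other["v"] - subject["v"],
            other["y"] - subject["y"],
            other["x"] - subject["x"])
```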
A summary of the segmented sequences in the I-80 dataset is shown in Table II. The average duration of a scenario segment is about five seconds. The highly imbalanced data, i.e., the much higher proportion of pass or yield scenarios than lane change scenarios, pose another significant challenge to behavior recognition; however, this imbalance is consistent with daily driving.
There are two main difficulties in scenario extraction. The first is defining the exact time at which a driver makes a decision (i.e., the time at
TABLE II
SCENARIO SEGMENTATIONS

  Dataset   (a1, ·)   (a2, ·)   (a3, b1)   (a3, b2)
  I-80-1    1759      2897      105        122
  I-80-2    1873      3743      99         91
  I-80-3    1964      3944      124        92
  Total     5596      10584     328        305
which the driver turns on the turn signal light in lane change
scenario). In this work, it is assumed that the driver makes the decision 3 seconds before the end of the scenario. If a scenario's duration is less than 3 seconds, the start time of the scenario is regarded as the decision time.
Fig. 4. Two indicators in different scenarios: (a) longitudinal distance and (b) acceleration in a cooperative scenario; (c) longitudinal distance and (d) acceleration in a competitive scenario.
The other difficulty is classifying the strategy chosen by the lag vehicle in lane change scenarios. The general method uses average acceleration as a threshold. However, the acceleration changes frequently and has large measurement disturbances. Kang [10] used the longitudinal distance as the classification standard for the strategy of the lag vehicle, and this work adopts the same indicator. Examples of cooperative and competitive scenarios are given in Fig. 4. In the cooperative scenario, shown in Fig. 4(a) and Fig. 4(b), the longitudinal distance increases noticeably when the subject vehicle changes lane. In the competitive scenario, shown in Fig. 4(c) and Fig. 4(d), the longitudinal distance decreases when the subject vehicle changes lane. By contrast, it is hard to find a consistent pattern in the acceleration across scenarios.
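The longitudinal-distance indicator above amounts to a one-line classifier. The tie-breaking behavior for a roughly constant gap is an assumption, as the paper does not specify a threshold.

```python
def classify_lag_strategy(gap_start, gap_end):
    """Classify the lag vehicle's strategy in a lane change scenario
    from the longitudinal gap to the subject vehicle: an increasing
    gap indicates cooperation, a decreasing gap indicates competition.
    Treating an unchanged gap as cooperation is an assumption."""
    return "cooperate" if gap_end >= gap_start else "compete"
```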
B. Calibration Approach
The Gaussian particle swarm optimization (GPSO) method presented by Krohling [16] is used to estimate the parameters of the payoff model. GPSO has a strong ability to find the global optimum, but its local search ability is weak [17]. To compensate, differential evolution (DE) is also used in this work. The calibration method is detailed in Algorithm 1.
Algorithm 1: Model calibration with DE-GPSO [17]
Input: the structure of the payoff model, labeled data of different scenarios
Output: the parameters of the payoff model
1:  for each particle i = 1, 2, ..., S do
2:      initialize the particle's position x_i
3:      initialize the particle's best known position to its initial position: l_i ← x_i
4:  initialize the swarm's best known position g
5:  k = 0
6:  repeat
7:      for each particle i = 1, 2, ..., S do
8:          update the particle's position: x_i ← x_i + |α1|(l_i − x_i) + |α2|(g − x_i), where α1 and α2 follow the Gaussian distribution
9:          for each training datum n = 1, 2, ..., N do
10:             calculate the payoff values through the network
11:             calculate the Nash equilibrium: (p_{S,n}(1), p_{S,n}(2), p_{S,n}(3), p_{L,n}(1), p_{L,n}(2))
12:         calculate the cost function J(x_i)
13:         if J(x_i) < J(l_i) then
14:             update the particle's best known position: l_i ← x_i
15:         if J(x_i) < J(g) then
16:             update the swarm's best known position: g ← x_i
17:     for each parameter of the network j = 1, 2, ..., P do
18:         randomly select two particles x_m and x_n
19:         x_tmp(j) = g(j) + (x_m(j) − x_n(j))
20:     if J(x_tmp) < J(g) then
21:         update the swarm's best known position: g ← x_tmp
22:     k = k + 1
23: until J < 0.1 or k > 1000
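The two update rules at the core of Algorithm 1 (lines 8 and 19) can be sketched as follows. Standard-normal draws for α1 and α2 are an assumption consistent with Krohling's Gaussian swarm; the toy cost function is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(42)

def gpso_step(x, l, g):
    """One Gaussian PSO position update (line 8 of Algorithm 1):
    x <- x + |a1| (l - x) + |a2| (g - x), with a1, a2 ~ N(0, 1)."""
    a1, a2 = rng.standard_normal(2)
    return x + abs(a1) * (l - x) + abs(a2) * (g - x)

def de_perturb(g, xm, xn):
    """DE-style trial point around the global best (line 19):
    x_tmp = g + (xm - xn), accepted only if it lowers the cost."""
    return g + (xm - xn)

# Toy cost: squared distance to an (unknown) optimum at 1.
cost = lambda x: float(np.sum((x - 1.0) ** 2))

x = rng.normal(size=4)   # a particle's current position
l = x.copy()             # its best known position
g = rng.normal(size=4)   # swarm's best known position
x = gpso_step(x, l, g)
trial = de_perturb(g, rng.normal(size=4), rng.normal(size=4))
if cost(trial) < cost(g):
    g = trial            # local refinement of the global best
```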
One-hot encoding is used to define the label of each strategy set (e.g., the label of the subject vehicle in lane change scenarios is [0, 0, 1]). The error between the label probabilities and the output of the model is used as the optimization objective, denoted by:

J = (1/N) Σ_{n=1}^{N} ( |t_{S,n} − p_{S,n}| + α |t_{L,n} − p_{L,n}| )        (5)

α = { 1, if the type of scenario is lane change
      0, if the type of scenario is lane keeping        (6)
where n is the index of a scenario, N is the number of scenarios in the dataset, t_{S,n} and p_{S,n} are the label and model probability of the subject vehicle, and t_{L,n} and p_{L,n} are the label and model probability of the lag vehicle.
IV. MODEL VALIDATION
To balance the number of different scenarios, 300 segments of each scenario type are selected. 80% of the data (240 segments of each type) are used to calibrate the payoff function and the remaining 20% (60 segments of each type) are used to validate the model. The experimental results of the proposed method are listed in Table III. The predictions for the pass and yield scenarios are accurate, while there are several false alarms and missed detections in cut-in scenarios.
TABLE III
EXPERIMENT RESULTS

                          Label
  Prediction   (a1, ·)   (a2, ·)   (a3, b1)   (a3, b2)
  (a1, ·)      55        2         3          8
  (a2, ·)      0         53        7          3
  (a3, b1)     2         0         39         9
  (a3, b2)     3         5         11         40
To evaluate the proposed model accurately, four quantitative metrics are used:
- Accuracy (ACC) is the fraction of correctly classified events out of all testing events:
  ACC = (TP + TN) / (TP + TN + FP + FN)
  where TP is true positives, TN is true negatives, FP is false positives (false alarms), and FN is false negatives (missed detections).
- Precision (PRE) is the fraction of events classified correctly out of all events predicted to be positive:
  PRE = TP / (TP + FP)
- True Positive Rate (TPR), also named recall, is the fraction of events classified correctly out of all true events:
  TPR = TP / (TP + FN)
- F1 score is the harmonic mean of precision and recall:
  F1 = 2 × PRE × TPR / (PRE + TPR)
The overall ACC of the proposed model is 0.7792. To evaluate the performance in each type of scenario, the prediction can be regarded as a binary classification by treating the other three interaction types as one type. The resulting metrics for each type of interaction are listed in Table IV.
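The one-vs-rest reduction can be reproduced directly from the confusion matrix of Table III (rows are predictions, columns are ground-truth labels); the sketch below recovers the overall accuracy of 0.7792 and the per-class entries of Table IV up to rounding.

```python
# Confusion matrix from Table III: rows = predictions, cols = labels,
# class order (a1, .), (a2, .), (a3, b1), (a3, b2).
CM = [[55, 2, 3, 8],
      [0, 53, 7, 3],
      [2, 0, 39, 9],
      [3, 5, 11, 40]]

def binary_metrics(cm, k):
    """ACC, PRE, TPR, F1 for class k, treating the other classes as one."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fp = sum(cm[k]) - tp                  # predicted k, label is another class
    fn = sum(row[k] for row in cm) - tp   # label k, predicted another class
    tn = total - tp - fp - fn
    acc = (tp + tn) / total
    pre = tp / (tp + fp)
    tpr = tp / (tp + fn)
    f1 = 2 * pre * tpr / (pre + tpr)
    return acc, pre, tpr, f1

overall_acc = sum(CM[k][k] for k in range(4)) / sum(sum(r) for r in CM)
print(round(overall_acc, 4))  # 0.7792, matching the paper
```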
TABLE IV
MODEL PERFORMANCE

         (a1, ·)   (a2, ·)   (a3, b1)   (a3, b2)
  ACC    0.9250    0.9292    0.8667     0.8375
  PRE    0.8088    0.8413    0.7800     0.6780
  TPR    0.9167    0.8833    0.6500     0.6667
  F1     0.8594    0.8618    0.7091     0.6723
The proposed method performs better in lane keeping scenarios than in lane change scenarios. In the lane keeping scenarios, it distinguishes yield and pass scenarios well. In the lane change scenarios, the prediction of the subject vehicle's strategy is better than that of the lag vehicle. Compared with the subject vehicle, individual differences have more influence on the lag vehicle, so its strategy is more difficult to predict.
Different game theoretical decision making methods define different vehicle strategies. Therefore, this paper compares the performance of our method with that of the method proposed by Talebpour [9] under the same subject-vehicle strategies (lane keeping and lane change), evaluated with the four metrics defined above and listed in Table V.
TABLE V
RESULTS COMPARISON

                 ACC      PRE      TPR      F1 Score
  Our method     0.8708   0.9083   0.8250   0.8995
  Talebpour [9]  0.5789   0.6614   0.5793   0.6176
The comparison shows that the proposed method has a better decision making ability in highway scenarios. An application example of the proposed method on a continuous vehicle trajectory (subject vehicle ID: 820) is given in Fig. 5. The start frames of the segmented scenarios (which are also the end frames of the preceding scenarios) are shown in Fig. 5(a); Fig. 5(b) and Fig. 5(c) give the strategies of the subject vehicle and the lag vehicle, respectively. This example shows that the proposed method makes a correct decision in most of the highway scenarios.
V. CONCLUSION AND FUTURE WORK
This paper proposes a driving decision making method for highway scenarios based on game theory and a neural network. A two-player, non-zero-sum, non-cooperative game under complete information is used to describe the interaction between two vehicles, and a neural network is used to build the payoff model. Compared with a fixed-rule payoff model, the neural-network-based payoff model can describe the effects of each factor precisely and improve decision making ability. The model is calibrated by the DE-GPSO method on the NGSIM dataset. Compared with another method [9], the performance of our method has been validated with multiple quantitative metrics.
Fig. 5. An example of the proposed lane-changing decision making method: (a) the start frames of each scenario; (b) strategies of the subject vehicle (None / Pass / Yield / Lane Change, ground truth vs. proposed model); (c) strategies of the lag vehicle (None / Cooperate / Compete, ground truth vs. proposed model).
In future work, we plan to extend the number of players and the strategy sets to consider both sides of the subject vehicle and more complicated scenarios. To evaluate the performance of the proposed method under different traffic conditions, we also need to validate it on other datasets. Considering the development of self-driving technology, the decision making process should eventually be regarded as a cooperative game, because communication between vehicles will be allowed.
ACKNOWLEDGMENT
This work was supported in part by the National Natural
Science Foundation of China under Grant No. 61473209 and
No. 61773291.
REFERENCES
[1] S. Kammel, J. Ziegler, B. Pitzer, M. Werling et al., "Team AnnieWAY's autonomous system for the 2007 DARPA Urban Challenge," Journal of Field Robotics, vol. 25, no. 9, pp. 615–639, 2008.
[2] C. R. Baker and J. M. Dolan, “Traffic interaction in the urban chal-
lenge: Putting boss on its best behavior,” in IEEE/RSJ International
Conference on Intelligent Robots and Systems. IEEE, 2008, pp. 1752–
1758.
[3] S. Ulbrich and M. Maurer, “Probabilistic online pomdp decision
making for lane changes in fully automated driving,” in The 16th
International IEEE Conference on Intelligent Transportation Systems.
IEEE, 2013, pp. 2063–2067.
[4] A. G. Cunningham, E. Galceran, R. M. Eustice, and E. Olson, “Mpdm:
Multipolicy decision-making in dynamic, uncertain environments for
autonomous driving,” in IEEE International Conference on Robotics
and Automation. IEEE, 2015, pp. 1670–1677.
[5] C. Dong, J. M. Dolan, and B. Litkouhi, “Interactive ramp merging
planning in autonomous driving: Multi-merging leading pgm,” in IEEE
International Conference on Intelligent Transportation, October 2017,
pp. 2186–2191.
[6] ——, “Intention estimation for ramp merging control in autonomous
driving,” in 2017 IEEE Intelligent Vehicles Symposium, June 2017.
[7] C. Dong, Y. Zhang, and J. M. Dolan, “Lane-change social behavior
generator for autonomous driving car by non-parametric regression in
reproducing kernel hilbert space,” in IEEE International Conference
on Intelligent Transportation, September 2017, pp. 4489–4494.
[8] H. Kita, “A merging–giveway interaction model of cars in a merging
section: a game theoretic analysis,” Transportation Research Part A:
Policy and Practice, vol. 33, no. 3, pp. 305–312, 1999.
[9] A. Talebpour, H. S. Mahmassani, and S. H. Hamdar, “Modeling
lane-changing behavior in a connected environment: A game theory
approach,” Transportation Research Procedia, vol. 7, pp. 420–440,
2015.
[10] K. Kang and H. A. Rakha, "Game theoretical approach to model decision making for merging maneuvers at freeway on-ramps," Transportation Research Record: Journal of the Transportation Research Board, no. 2623, pp. 19–28, 2017.
[11] Y. Zhang, Q. Lin, J. Wang, and S. Verwer, “Car-following behavior
model learning using timed automata,” IFAC-PapersOnLine, vol. 50,
no. 1, pp. 2353–2358, 2017.
[12] C. Vallon, Z. Ercan, A. Carvalho, and F. Borrelli, “A machine learning
approach for personalized autonomous lane change initiation and
control,” in Intelligent Vehicles Symposium. IEEE, 2017, pp. 1590–
1595.
[13] B. Chatterjee, “An optimization formulation to compute nash equilib-
rium in finite games,” in Proceeding of International Conference on
Methods and Models in Computer Science. IEEE, 2009, pp. 1–5.
[14] J. Morton, T. A. Wheeler, and M. J. Kochenderfer, “Analysis of recur-
rent neural networks for probabilistic modeling of driver behavior,”
IEEE Transactions on Intelligent Transportation Systems, vol. 18,
no. 5, pp. 1289–1298, 2017.
[15] NGSIM, “U.S. Department of Transportation, NGSIM - Next genera-
tion simulation,” http://www.ngsim.fhwa.dot.gov, 2007.
[16] R. A. Krohling, “Gaussian swarm: a novel particle swarm optimiza-
tion algorithm,” in IEEE Conference on Cybernetics and Intelligent
Systems, vol. 1. IEEE, 2004, pp. 372–376.
[17] C. Wan, J. Wang, G. Yang, H. Gu, and X. Zhang, "Wind farm micro-siting by Gaussian particle swarm optimization with local search strategy," Renewable Energy, vol. 48, no. 6, pp. 276–286, 2012.
... Several solutions for these problems are introduced in the literature. Among these, Naive Bayes models [20], gaussian mixture models [21], markov chain models [22], markov state space models [23], optimal control based models [24], inverse reinforcement learning based models [25], [26], Gaussian Process based models [27], [28], [29], [30], game theoretical approaches [31], [32], [33], [34], Gated Recurrent Unit based models [35], joint use of recurrent and convolutional neural networks [36], joint use of recurrent and generative networks [37], and joint use of generative networks and imitation learning [38] can be counted. ...
Preprint
This paper proposes a method for modeling human driver interactions that relies on multi-output gaussian processes. The proposed method is developed as a refinement of the game theoretical hierarchical reasoning approach called "level-k reasoning" which conventionally assigns discrete levels of behaviors to agents. Although it is shown to be an effective modeling tool, the level-k reasoning approach may pose undesired constraints for predicting human decision making due to a limited number (usually 2 or 3) of driver policies it extracts. The proposed approach is put forward to fill this gap in the literature by introducing a continuous domain framework that enables an infinite policy space. By using the approach presented in this paper, more accurate driver models can be obtained, which can then be employed for creating high fidelity simulation platforms for the validation of autonomous vehicle control algorithms. The proposed method is validated on a real traffic dataset and compared with the conventional level-k approach to demonstrate its contributions and implications.
... Section 2.2 futher described their study. Their application demontrate that, while pedestrians and cyclists will eventually have a higher control of the road over AVs, one can also expect an increase in the number of accidents due to the safety feeling induced by AVs on other road users Other traffic situations can also be analysed through GT. Yan, Wang, and Zhang (2018) propose a gametheoretical approach to model the interaction of vehicles while considering different surrounding traffic situations (Figure 4). They focus on lane changing scenarios in highways, as those are particularly challenging given vehicles' high speed, travels' long distance, and traffic jams. ...
Conference Paper
Full-text available
The increasing use of autonomous systems (AS) aims to improve efficiency, costs, and safety of numerous operations. Yet, they also pose several safety challenges. Most of AS will operate in a dynamic environment, interacting with non-autonomous and/or other autonomous systems. The anticipation of both the AS and non-AS possible decisions during these interactions is crucial to identify and analyze potential hazards and risks, and to guarantee a safe operation. Game Theory (GT) has been increasingly used for modeling the interactions between AS and other agents in conflicting or cooperating situations. Recent applications of GT for AS also include the use of game-theoretical approaches for algorithm-testing and development, as well as for cyber-physical security assessment. Yet, the application of GT for analysis of AS operations under a risk perspective can still be considered in an early stage. This paper provides an overview of how GT is being applied to AS in the context of risk assessment. A review of the recent literature on GT applied to AS was carried out on the Scopus database using a combination of relevant keywords. It resulted in 100 articles within the period of 2015-2021. The articles were analyzed with regard to the technical domain of application and the scope of use of GT.
... It would be better to also consider the impact in the other direction. A game-theoretical approach would be a solution to learn the interacting behaviors (Yan et al., 2018). In addition, as the lane change is modeled by stochastic input, it would be possible to conduct probabilistic model checking on the safety property of the controller. ...
Thesis
Full-text available
Automatic control is a technique about designing control devices for controlling ma- chinery processes without human intervention. However, devising controllers using conventional control theory requires first principle design on the basis of the full under- standing of the environment and the plant, which is infeasible for complex control tasks such as driving in highly uncertain traffic environment. Intelligent control offers new op- portunities about deriving the control policy of human beings by mimicking our control behaviors from demonstrations. In this thesis, we focus on intelligent control techniques from two aspects: (1) how to learn control policy from supervisors with the available demonstration data; (2) how to verify the controller learned from data will safely control the process.
Article
Full-text available
Regulating traffic flow on highways is challenging because the movement interactions between vehicles can give a traveling advantage to some while imposing unfavorable effects on many others, leading to an imbalance between individual benefits and the mutual interest of the highway network. To address this problem, an intelligent-system solution is proposed to provide communication and guidance among vehicles, aiming to reduce arbitrariness in travel. For this purpose, a game-theoretic modeling approach is used in which the problem becomes a symmetric game: drivers are the players, and each driver's strategy depends on the others' actions. This game produces a Nash equilibrium, the course of action all players can follow without needing any further changes, thus establishing a predictable and stable traffic network and reducing average travel time to a minimum. The experimental calculation was conducted using OMOPSO, a multi-objective particle swarm optimization algorithm, to compute the game payoff, i.e., the evaluation of an optimal speed strategy. The result contributes to the development of such an intelligent system for tackling traffic problems on big cities' highways.
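The abstract above reduces the speed-coordination problem to a symmetric game whose stable point is a Nash equilibrium. The following is a minimal sketch of finding pure-strategy Nash equilibria of a two-player game by enumeration; the payoff matrices below are invented for illustration and are not the OMOPSO-computed payoffs from the article.

```python
import itertools

def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all pure-strategy Nash equilibria of a two-player game.

    payoff_a[i][j] / payoff_b[i][j]: payoffs for the row / column player
    when the row player picks strategy i and the column player picks j.
    """
    n_rows, n_cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i, j in itertools.product(range(n_rows), range(n_cols)):
        # (i, j) is an equilibrium iff i is a best response to j and vice versa
        row_best = payoff_a[i][j] >= max(payoff_a[k][j] for k in range(n_rows))
        col_best = payoff_b[i][j] >= max(payoff_b[i][k] for k in range(n_cols))
        if row_best and col_best:
            equilibria.append((i, j))
    return equilibria

# Hypothetical "speed choice" game: strategy 0 = keep speed, 1 = slow down;
# the payoff values are illustrative travel-time utilities.
A = [[2, 0], [3, 1]]   # row player's payoffs
B = [[2, 3], [0, 1]]   # column player's payoffs
print(pure_nash_equilibria(A, B))  # → [(1, 1)]
```

With these toy payoffs, mutual slowing down is the only equilibrium: once there, neither driver gains by unilaterally changing strategy, which is the stability property the article exploits.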
Article
Full-text available
Learning driving behavior is fundamental for autonomous vehicles to “understand” traffic situations. This paper proposes a novel method for learning a behavioral model of car-following using automata learning algorithms. The model is interpretable for car-following behavior analysis. Frequent common state sequences are extracted from the model and clustered as driving patterns. The Next Generation SIMulation dataset on the I-80 highway is used for learning and evaluation. The experimental results demonstrate the high accuracy of the car-following model fit.
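The pattern-extraction step described above — mining frequent state sequences from a learned automaton's traces before clustering them — can be sketched as an n-gram count over discretized state traces. The state symbols and thresholds below are hypothetical stand-ins, not the states learned in the paper.

```python
from collections import Counter

def frequent_state_sequences(traces, length=3, min_count=2):
    """Count fixed-length state subsequences across discretized traces.

    traces: sequences of state symbols (e.g. quantized gap/speed states
    visited by a learned car-following automaton).
    Returns only subsequences occurring at least `min_count` times.
    """
    counts = Counter()
    for trace in traces:
        for i in range(len(trace) - length + 1):
            counts[tuple(trace[i:i + length])] += 1
    return {seq: c for seq, c in counts.items() if c >= min_count}

# Hypothetical symbols: A = "closing gap", B = "steady", C = "opening gap"
traces = ["AABBC", "ABBCA", "AABBA"]
print(frequent_state_sequences(traces, length=2))
```

The surviving frequent subsequences would then serve as features for clustering traces into driving patterns.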
Conference Paper
Full-text available
Nowadays, self-driving cars are being applied to more complex urban scenarios including intersections, merging ramps, and lane changes. It is, therefore, important for self-driving cars to behave socially with human-driven cars. In this paper, we focus on generating the lane change behavior for self-driving cars: performing a safe and effective lane change once a lane-change command is received. Our method bridges the gap between higher-level behavior commands and the trajectory planner. There are two challenges in the task: 1) analyzing the surrounding vehicles' mutual effects from their trajectories; 2) estimating the proper lane change start point and end point according to the analysis of surrounding vehicles. We propose a learning-based approach to understand surrounding traffic and make decisions for a safe lane change. The contributions and advantages of our approach are: 1) it considers the behavior generator as a continuous function in a Reproducing Kernel Hilbert Space (RKHS), which contains a family of behavior generators; 2) it constructs the behavior generator function in the RKHS by nonparametric regression on training data; 3) it takes the past trajectories of all related surrounding cars as input to capture mutual interactions, and outputs continuous values to represent behaviors. Experimental results show that the proposed approach is able to generate feasible and human-like lane-change behavior (represented by start and end points) in multi-car environments. The experiments also verified that our suggested kernel outperforms the ones used in a previous method.
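The RKHS construction above boils down to fitting a function by kernel-weighted nonparametric regression on demonstrations. A minimal one-dimensional sketch with an RBF kernel is shown below; the paper's actual inputs are full past trajectories of surrounding cars and its kernel is custom, so the scalar features and bandwidth here are purely illustrative.

```python
import math

def kernel_regression(train_x, train_y, x, bandwidth=1.0):
    """Nadaraya-Watson style nonparametric regression with an RBF kernel.

    Predicts y at query point x as a kernel-weighted average of the
    training outputs — the simplest instance of constructing a function
    from data with a kernel, as in RKHS-based behavior generation.
    """
    weights = [math.exp(-((x - xi) ** 2) / (2 * bandwidth ** 2)) for xi in train_x]
    total = sum(weights)
    return sum(w * yi for w, yi in zip(weights, train_y)) / total

# Hypothetical data: map a traffic feature to a lane-change start offset
print(kernel_regression([0.0, 1.0, 2.0], [0.0, 1.0, 2.0], 1.0))  # → 1.0
```

In the paper's setting, the regression output would be the continuous values (start and end points) describing the lane-change behavior rather than a scalar.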
Article
Drivers of merging vehicles decide when to merge by considering surrounding vehicles in adjacent lanes in their deliberation process. Conflicts between drivers of the subject vehicles (i.e., merging vehicles) in an auxiliary lane and lag vehicles in the adjacent lane are typical near freeway on-ramps. This paper models the decision-making process for merging maneuvers using a game theoretical approach. The proposed model is based on the noncooperative decision making of two players, namely the drivers of the subject and lag vehicles, without consideration of advanced communication technologies. In the decision-making process, the drivers of the subject vehicles elect to accept gaps, and drivers of lag vehicles either yield to or block the action of the subject vehicle. Corresponding payoff functions for the two players were formulated to describe their respective maneuvers. To estimate model parameters, a bi-level optimization approach was used. The next generation simulation data set was used for model calibration and validation. The data set was used to define the moment the game starts, and the interaction was modeled as a continuous sequence of games until a decision is made. The merging decision-making model was then validated with an independent data set. The validation results reveal that the proposed model provides considerable prediction accuracy, with correct predictions 84% of the time.
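The two-player structure above — subject chooses merge/wait, lag chooses yield/block, replayed as a sequence of stage games — can be sketched as follows. The payoff forms and the parameter vector `theta` are illustrative placeholders, not the calibrated payoff functions from the paper.

```python
def stage_game(gap, theta=(1.0, 1.0, 2.0)):
    """2x2 payoffs: subject rows (merge, wait), lag columns (yield, block).

    Illustrative forms: merging utility grows with the gap, with a fixed
    merge cost b and an extra penalty c if the lag vehicle blocks.
    """
    a, b, c = theta
    subject = [[a * gap - b, a * gap - b - c],  # merge vs. (yield, block)
               [0.0, 0.0]]                      # wait  vs. (yield, block)
    lag = [[-0.3, -1.0],   # subject merges: blocking risks a conflict
           [-0.3, 0.0]]    # subject waits: yielding only costs delay
    return subject, lag

def merge_step(gaps):
    """Model merging as a sequence of stage games: merge at the first time
    step where (merge, yield) is a pure-strategy Nash equilibrium."""
    for t, gap in enumerate(gaps):
        s, l = stage_game(gap)
        merge_best = s[0][0] > s[1][0]   # merging beats waiting if the lag yields
        yield_best = l[0][0] >= l[0][1]  # yielding beats blocking if the subject merges
        if merge_best and yield_best:
            return t
    return None

print(merge_step([0.5, 1.2, 2.5]))  # growing gap sequence → merges at step 1
```

In the paper, the parameters of the payoff functions are estimated from NGSIM data by bi-level optimization; here they are fixed by hand purely to show the decision mechanics.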
Article
The validity of any traffic simulation model depends on its ability to generate representative driver acceleration profiles. This paper studies the effectiveness of recurrent neural networks in predicting the acceleration distributions for car following on highways. Long short-term memory recurrent networks are trained and used to propagate the simulated vehicle trajectories over 10-s horizons. On the basis of several performance metrics, the recurrent networks are shown to generally match or outperform baseline methods in replicating driver behavior, including the smoothness and oscillatory characteristics present in real trajectories. This paper reveals that the strong performance is due to the ability of the recurrent network to identify recent trends in the ego-vehicle's state, and recurrent networks are shown to perform as well as feedforward networks with longer histories as inputs.
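The propagation scheme above — feed the model's predicted acceleration back into the state and roll the trajectory forward over the horizon — can be sketched generically. A trained LSTM would play the role of `model` in the paper; here a hand-written callable stands in, and the window contents, gain, and target speed are assumptions made for illustration.

```python
def propagate(model, history, dt=0.1, horizon=10.0):
    """Roll out a trajectory by feeding predicted accelerations back in.

    model:   callable mapping a recent window of speeds to an acceleration
             (the LSTM's role in the paper; any callable works here).
    history: initial window of recent speeds (m/s), most recent last.
    Returns the simulated (position, speed) sequence over the horizon.
    """
    pos, speed = 0.0, history[-1]
    window = list(history)
    traj = []
    for _ in range(int(horizon / dt)):
        accel = model(window)
        speed = max(0.0, speed + accel * dt)  # integrate; no reversing
        pos += speed * dt
        window = window[1:] + [speed]         # slide the input window
        traj.append((pos, speed))
    return traj

# Stand-in for the recurrent net: relax toward a 30 m/s free-flow speed
dummy_model = lambda window: 0.5 * (30.0 - window[-1])
traj = propagate(dummy_model, history=[20.0, 20.5, 21.0])
```

With a real LSTM, `model` would consume the full feature window (speed, gap, relative speed) and the same rollout loop would generate the 10-s trajectories the paper evaluates.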
Article
Real-world autonomous driving in city traffic must cope with dynamic environments including other agents with uncertain intentions. This poses a challenging decision-making problem, e.g., deciding when to perform a passing maneuver or how to safely merge into traffic. Previous work in the literature has typically approached the problem using ad-hoc solutions that do not consider the possible future states of other agents, and thus have difficulty scaling to complex traffic scenarios where the actions of participating agents are tightly conditioned on one another. In this paper we present multipolicy decision-making (MPDM), a decision-making algorithm that exploits knowledge from the autonomous driving domain to make decisions online for an autonomous vehicle navigating in traffic. By assuming the controlled vehicle and other traffic participants execute a policy from a set of plausible closed-loop policies at every timestep, the algorithm selects the best available policy for the controlled vehicle to execute. We perform policy election using forward simulation of both the controlled vehicle and other agents, efficiently sampling from the high-likelihood outcomes of their interactions. We then score the resulting outcomes using a user-defined cost function to accommodate different driving preferences, and select the policy with the highest score. We demonstrate the algorithm on a real-world autonomous vehicle performing passing maneuvers and in a simulated merging scenario.
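The MPDM election loop above — forward-simulate each candidate ego policy against the plausible closed-loop policies of other agents, score outcomes with a user-defined cost, and pick the best — can be sketched as follows. The `simulate` and `cost` callables and the toy policies are placeholders, not the paper's actual policy set or cost function.

```python
def elect_policy(ego_policies, other_policies, simulate, cost):
    """Simplified MPDM-style policy election.

    For each candidate ego policy, forward-simulate the interaction
    against each plausible policy of the other agents, score the sampled
    outcomes with a user-defined cost, and return the ego policy with
    the lowest average cost.
    """
    best, best_score = None, float("inf")
    for ego in ego_policies:
        score = sum(cost(simulate(ego, other))
                    for other in other_policies) / len(other_policies)
        if score < best_score:
            best, best_score = ego, score
    return best

# Toy stand-ins: policies are lane choices; an outcome is (progress, conflict)
simulate = lambda ego, other: (1.0 if ego != other else 0.5, ego == other)
cost = lambda outcome: -outcome[0] + (10.0 if outcome[1] else 0.0)
print(elect_policy(["keep", "change"], ["keep"], simulate, cost))  # → change
```

The paper's version samples high-likelihood outcomes from closed-loop simulation of all agents at every timestep; the averaging here compresses that into a single expectation for brevity.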
Conference Paper
The Stadtpilot project aims at fully automated driving on Braunschweig's inner city ring road. The TU Braunschweig's research vehicle “Leonie” is one of the first vehicles with the ability of fully automated driving in real urban traffic scenarios. This paper shows our decision making approach for performing lane changes while driving fully automated in urban environments. We apply an online Partially Observable Markov Decision Process (POMDP) to accommodate the inevitable sensor noise faced in urban traffic scenarios. In this paper we propose a two-step algorithm to keep the complexity of the POMDP low enough for real-time decision making while driving. The presented approach has been integrated in our vehicle and was evaluated in real urban traffic.
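The POMDP machinery referenced above rests on maintaining a belief over hidden states under noisy observations. A minimal discrete belief update (predict, then correct) is sketched below; the two-state gap model, transition probabilities, and observation likelihoods are invented for illustration and are not the Stadtpilot models.

```python
def belief_update(belief, obs, transition, likelihood):
    """One step of a discrete POMDP belief update.

    belief:     dict state -> probability.
    transition: dict s1 -> {s2: P(s2 | s1)}  (prediction model).
    likelihood: dict s -> {obs: P(obs | s)}  (noisy sensor model).
    """
    states = list(belief)
    # predict: propagate the belief through the transition model
    predicted = {s2: sum(transition[s1][s2] * belief[s1] for s1 in states)
                 for s2 in states}
    # correct: weight by the observation likelihood and normalise
    unnorm = {s: likelihood[s][obs] * predicted[s] for s in states}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

# Hypothetical hidden state of a target gap: "free" vs. "occupied"
transition = {"free": {"free": 0.9, "occupied": 0.1},
              "occupied": {"free": 0.2, "occupied": 0.8}}
likelihood = {"free": {"clear": 0.8, "blocked": 0.2},
              "occupied": {"clear": 0.3, "blocked": 0.7}}
belief = belief_update({"free": 0.5, "occupied": 0.5}, "clear",
                       transition, likelihood)
```

An online POMDP solver would then choose the lane-change action maximizing expected value under this belief, rather than acting on the raw noisy measurement.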