Original Article
Proc IMechE Part D: J Automobile Engineering
1–12
© IMechE 2023
Article reuse guidelines:
sagepub.com/journals-permissions
DOI: 10.1177/09544070231167906
journals.sagepub.com/home/pid
Probabilistic multi-modal expected trajectory prediction based on LSTM for autonomous driving
Zhenhai Gao, Mingxi Bao, Fei Gao and Minghong Tang
Abstract
Autonomous vehicles (AVs) need to adequately predict the trajectory space of surrounding vehicles (SVs) in order to make reasonable decisions and improve driving safety. In this paper, we use deep learning to build a driving behavior intention recognition module and a traffic vehicle expected trajectory prediction module. The driving behavior intention recognition module identifies the probabilities that the predicted vehicle will perform lane keeping, a left lane change, a right lane change, a left acceleration lane change, or a right acceleration lane change. The expected trajectory prediction module adopts an encoder-decoder architecture: the encoder encodes the historical environment information of the surrounding agents as a context vector, and the decoder and a mixture density network (MDN) combine the context vector with the identified driving behavior intention to predict the probability distribution of future trajectories. Our model thus produces the multiple behaviors and trajectories that the predicted vehicle (PV) may exhibit over the next 6 s. The proposed model is trained, validated and tested on the HighD dataset. The experimental results show that, with full consideration of interactive information, the intention recognition module of the proposed probabilistic multi-modal expected trajectory prediction model achieves high accuracy. At the same time, the multi-modal probability distribution generated by the expected trajectory prediction module is more consistent with the real trajectories, which significantly improves trajectory prediction accuracy compared with other approaches and gives clear advantages in long-horizon trajectory prediction.
Keywords
Trajectory prediction, behavioral intent recognition, LSTM, interactive behavior
Date received: 15 December 2022; accepted: 20 March 2023
Introduction
To navigate complex traffic scenarios safely and efficiently, a vehicle needs to be able to predict the intentions and future trajectories of surrounding agents. Accurate trajectory prediction not only allows decisions to be made in advance but also improves the safety and efficiency of agents [1–4].
In recent years, many researchers have studied vehicle trajectory prediction for AVs. Trajectory prediction methods can be roughly divided into models based on physical constraints [5–9] and data-driven models [10,11]. Models based on physical constraints mainly consider the vehicle's motion state, road environment factors, and vehicle characteristics, and use kinematic models to predict the future motion of the agent. However, such models rely heavily on an accurate estimate of the vehicle's current state and on complete model inputs, and state estimation of the host vehicle remains a significant challenge for autonomous driving (AD) because of dynamic model uncertainties, sensor noise, and bias [5–8]. Models based on physical constraints also cannot capture the strong nonlinearity [9] of vehicle trajectories. As a result, they cannot accurately predict trajectories over long time horizons. To overcome the low accuracy of long-horizon prediction in dynamic scenes, deep learning has been widely applied to trajectory prediction. Kim et al. [12] used an LSTM to predict the positions of vehicles over the next 2 s. Khakzar et al. [13] built a dual learning model (DLM) based on LSTM, but increasing the dimension of the input feature space makes the model harder to train and makes it difficult to meet the real-time requirements of intelligent agents. Xie et al. [14] constructed a data-driven lane change model based solely on LSTM, without considering the influence of other driving behaviors such as lane keeping. Lin et al. [15] analyzed the influence of historical trajectories and adjacent vehicles on the PV with a spatial-temporal attention LSTM, but the model lacked an interpretation of driving intentions. Xiao et al. [16] used a behavioral intention module and a trajectory prediction module to predict single-modal future trajectories of vehicles in a highway scene; their model effectively identifies the vehicle's future behavioral intention. However, its output deviated considerably from the real trajectories, so the trajectories had to be further fitted through optimization.
State Key Laboratory of Automotive Simulation and Control, School of Vehicle Engineering, Jilin University, Changchun, China
Corresponding author:
Fei Gao, State Key Laboratory of Automotive Simulation and Control, School of Vehicle Engineering, Jilin University, 5988 Renmin Road, Changchun 130022, China.
Email: gaofei123284123@jlu.edu.cn
However, the trajectory prediction models above predict single-modal future trajectories from historical information, which neither comprehensively represents the future prediction space of the PV nor analyzes the influence of driving behavior intentions on the model. Therefore, in this paper, we investigate multi-modal trajectory prediction in terms of the diversity of the future prediction space and the influence of driving behavior intentions on the model. Different self-driving cars behave differently in the same scenario; that is, because predicting the future is inherently uncertain, there are multiple possible future outcomes. For example, the blue vehicle in Figure 1 may continue straight or turn left depending on the current environment, forming different modes in the trajectory space. This uncertainty gives motion forecasting its multi-modal character and makes trajectory prediction a challenging problem.
To model this uncertainty, many researchers have learned latent variables [17] to represent the multi-modal properties of trajectories, for example with VAEs [18,19] and GANs [20]. Tang and Salakhutdinov [17] constructed a model architecture that captures multi-modal attributes by introducing latent variables and parallel neural networks. A large body of work has also used raster images to model interactions with the environment, applying convolutional neural networks (CNN) [17,21–24] and recurrent neural networks (RNN) to extract scene information. Deo et al. [25–27] proposed an LSTM model with convolutional social pooling. It predicted the distribution of future traffic vehicle trajectories but ignored the effect of interactions between agents. Cui et al. [21] encoded each participant's surroundings as a raster image and used it as the input to a deep CNN. However, image-based approaches raise two problems: (1) convolution over sparse images wastes computational resources, and (2) the resulting models are difficult to interpret.
To address these problems, namely to adequately represent the vehicle's behavior prediction space, reduce model complexity, handle the inherent uncertainty of prediction, and reduce the safety issues of motion planning, we propose a Probabilistic Multi-modal Expected Trajectory Prediction (PMETP) model. The contributions of this paper are two-fold.
1. The motion state of the PV and the interaction information of the surrounding environment are extracted in depth as the model input. The interaction information among the SVs of the target agent is also considered.
2. A framework is proposed that classifies specific behavior intentions with a neural network and predicts the probability distribution of the PV's future trajectories with an MDN.
Problem formulation
In this paper, expected trajectory prediction is formulated as predicting the probability distribution of a vehicle's future position at each time step from the historical characteristic information of the PV and its surrounding traffic vehicles. PMETP aims to generate multiple possible and safe trajectories for traffic agents in complex and highly dynamic scenarios so as to adequately represent the future prediction space. It involves two main tasks: (1) representing the multi-modal nature of the prediction results, since different targets may follow different future trajectories from the same historical trajectories; and (2) modeling the interactions between targets, since the behavior of a target is influenced not only by its own intention but also by the targets around it.
Frame of reference
The traffic scenario uses a fixed coordinate system whose origin is fixed to the predicted vehicle at time t, as shown in Figure 2. The x-axis points in the driving direction of the vehicle, and the y-axis is perpendicular to it. PMETP does not rely on high-precision maps; it requires only lane parameters and vehicle state information to perform expected trajectory prediction.
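As a minimal sketch of this frame convention, the function below shifts global positions into a frame centred on the PV at time t. It assumes straight lanes aligned with the global x-axis (as on the HighD highway sections), so only a translation and, for vehicles driving towards negative x, a 180-degree rotation are needed; the function name and arguments are hypothetical and not taken from the authors' code.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_pv_frame(points: List[Point], pv_origin: Point,
                drives_positive_x: bool = True) -> List[Point]:
    """Express global (x, y) positions in the frame fixed to the PV at time t.

    Assumes straight lanes aligned with the global x-axis, so the transform is a
    translation, plus a 180-degree rotation when the PV drives towards negative x
    (the rotation keeps the x-axis pointing in the driving direction).
    """
    ox, oy = pv_origin
    sign = 1.0 if drives_positive_x else -1.0
    return [(sign * (x - ox), sign * (y - oy)) for x, y in points]


# A leader 10 m ahead with a 3 m lateral offset, and the PV itself at the origin.
print(to_pv_frame([(110.0, 5.0), (100.0, 2.0)], (100.0, 2.0)))  # [(10.0, 3.0), (0.0, 0.0)]
```

A rotation (rather than mirroring only x) is used so that the left/right sides of the PV are preserved when the frame is flipped.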
Environment characteristic information
Figure 1. Multiple possible future trajectories.
In a complex, dynamic traffic environment, trajectory prediction for AVs should consider not only the motion state of the PV but also the environmental information of the target vehicle, that is, the characteristic information of the surrounding traffic vehicles and of the interactions between the predicted vehicle and the surrounding traffic vehicles. So that the motion prediction model can understand the interactive behavior between vehicles, its input includes the historical and environmental characteristics of the predicted agent, as shown in formula (1).
$$ M^t = \left(P^t_{\mathrm{ego}},\, E^t\right), \quad t \in (0, T) \tag{1} $$
where $M^t$ is the input of the motion prediction model at time t, and $P^t_{\mathrm{ego}} = (x^t, y^t, v_x^t, v_y^t, a_x^t, a_y^t, d_{hw}, t_{hw}, t_{ttc})$ denotes the characteristic information of the predicted vehicle at time t. T is the length of the vehicle's historical trajectory. $x^t$ and $y^t$ are the longitudinal and lateral coordinates of the target vehicle; $v_x^t$ and $v_y^t$ are its longitudinal and lateral velocities; and $a_x^t$ and $a_y^t$ are the longitudinal and lateral accelerations of the predicted vehicle at time t. $d_{hw}$, $t_{hw}$ and $t_{ttc}$ are the distance headway, time headway, and time to collision between the predicted vehicle and the vehicle ahead. $E^t$ is the environmental information of the target agent at time t.
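The headway terms in $P^t_{\mathrm{ego}}$ can be computed from the longitudinal positions and speeds of the PV and the vehicle ahead. The sketch below assumes the common definitions (time headway as the gap divided by the PV's own speed, time to collision as the gap divided by the closing speed, infinite when the gap is not shrinking) and measures the gap between the given reference points without subtracting vehicle length; the paper does not spell these details out, and the helper name is hypothetical.

```python
import math
from typing import Tuple


def headway_features(x_pv: float, vx_pv: float,
                     x_front: float, vx_front: float) -> Tuple[float, float, float]:
    """Return (d_hw, t_hw, t_ttc) between the PV and the vehicle ahead.

    Positions are longitudinal coordinates in metres, speeds in m/s.
    """
    d_hw = x_front - x_pv  # distance headway: gap to the leader (no length correction)
    t_hw = d_hw / vx_pv if vx_pv > 0 else math.inf  # time to cover the gap at own speed
    closing_speed = vx_pv - vx_front
    # time to collision is finite only while the PV is closing in on the leader
    t_ttc = d_hw / closing_speed if closing_speed > 0 else math.inf
    return d_hw, t_hw, t_ttc


print(headway_features(0.0, 30.0, 45.0, 25.0))  # (45.0, 1.5, 9.0)
```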
The environmental information is represented by the vehicles in the eight directions around the predicted vehicle, as shown in Figure 3, and is characterized as $E^t = (S^t_{LF}, S^t_{LA}, S^t_{LB}, S^t_{MF}, S^t_{MB}, S^t_{RF}, S^t_{RA}, S^t_{RB}, I^t)$ with $S^t_p = \{x^t_p, y^t_p, v^t_{px}, v^t_{py}, a^t_{px}, a^t_{py}\}$, where $p \in \{LF, LA, LB, MF, MB, RF, RA, RB\}$ is the location label of a vehicle around the predicted agent. $I^t = \{\Delta S^t_i, \Delta S^t_I\}$, where $\Delta S^t_i = (\Delta x^t_i, \Delta y^t_i, \Delta acc^t_{x,i}, \Delta acc^t_{y,i}, \Delta v^t_{x,i}, \Delta v^t_{y,i})$, $i \in \{LF\_TV, LA\_TV, LB\_TV, MF\_TV, MB\_TV, RF\_TV, RA\_TV, RB\_TV\}$, describes the interaction between the predicted vehicle and each surrounding traffic vehicle, and $\Delta S^t_I = (\Delta l^t_{LF\text{-}LA}, \Delta l^t_{LA\text{-}LB}, \Delta l^t_{LF\text{-}LB}, \Delta l^t_{RF\text{-}RA}, \Delta l^t_{RA\text{-}RB}, \Delta l^t_{RF\text{-}RB})$ describes the interaction among the surrounding traffic vehicles, namely the absolute distances between the vehicles in the left and right lanes of the predicted agent:
$$ \Delta l^t_{mn} = \sqrt{(x^t_m - x^t_n)^2 + (y^t_m - y^t_n)^2}, \quad m, n \in \{LF, LA, LB, RF, RA, RB\} $$
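A minimal sketch of the $\Delta S^t_I$ feature: it evaluates $\Delta l^t_{mn}$ for the six left-lane and right-lane pairs listed above, given a dictionary of positions keyed by location label. Filling a missing slot with 0.0 is an assumption made here for illustration; the paper does not say how empty positions are encoded, and the function name is hypothetical.

```python
import math
from typing import Dict, Tuple

# Vehicle pairs in the left and right lanes that make up Delta S_I.
PAIRS = [("LF", "LA"), ("LA", "LB"), ("LF", "LB"),
         ("RF", "RA"), ("RA", "RB"), ("RF", "RB")]


def surrounding_gaps(pos: Dict[str, Tuple[float, float]],
                     missing: float = 0.0) -> Tuple[float, ...]:
    """Euclidean distance Delta l_mn for each pair; `missing` fills empty slots."""
    gaps = []
    for m, n in PAIRS:
        if m in pos and n in pos:
            (xm, ym), (xn, yn) = pos[m], pos[n]
            gaps.append(math.hypot(xm - xn, ym - yn))
        else:
            gaps.append(missing)  # placeholder value: an assumption, not from the paper
    return tuple(gaps)


positions = {"LF": (30.0, 3.5), "LA": (0.0, 3.5), "LB": (-40.0, 3.5),
             "RF": (25.0, -3.5), "RA": (5.0, -3.5)}  # no vehicle in the RB slot
print(surrounding_gaps(positions))  # (30.0, 40.0, 70.0, 20.0, 0.0, 0.0)
```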
Probabilistic motion forecasting
The flowchart of the trajectory forecasting model is shown in Figure 4. Data preprocessing is described in Section 4.2, the PMETP model in Section 3, and the model training results in Section 4.
The PMETP proposed in this paper predicts the probability distribution $P(O \mid I)$ of future locations from historical environmental information and driving behavior intentions:
$$ P(O \mid I) = \sum_i P_{\mu,\sigma,\rho,\theta}(o_i \mid c_i, I)\, P(c_i \mid I) \tag{2} $$
where O is the model output and $P_{\mu,\sigma,\rho,\theta}(o_i \mid c_i, I)$ is the probability distribution of the predicted trajectory under driving intention $c_i$. $\mu$, $\sigma$, $\rho$ and $\theta$ are the parameters of the bivariate Gaussian distribution at each future time step, namely the mean, variance, probability, and correlation coefficient, respectively.
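To make Eq. (2) concrete, the sketch below evaluates the mixture at a single future time step, assuming each intention contributes one bivariate Gaussian given by its mean, standard deviations and correlation coefficient, with the intention probabilities supplied by the recognition module. It uses the standard bivariate normal density as an illustration; the tuple layout and function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# (mu_x, mu_y, sigma_x, sigma_y, rho) of one intention-conditioned Gaussian
Gaussian = Tuple[float, float, float, float, float]


def bivariate_pdf(x: float, y: float, g: Gaussian) -> float:
    """Density of a bivariate normal with correlation rho (|rho| < 1)."""
    mu_x, mu_y, s_x, s_y, rho = g
    zx, zy = (x - mu_x) / s_x, (y - mu_y) / s_y
    one_minus_r2 = 1.0 - rho * rho
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / one_minus_r2
    norm = 2.0 * math.pi * s_x * s_y * math.sqrt(one_minus_r2)
    return math.exp(-0.5 * quad) / norm


def position_likelihood(x: float, y: float, gaussians: Sequence[Gaussian],
                        intention_probs: Sequence[float]) -> float:
    """Eq. (2) at one time step: sum_i P(o | c_i, I) * P(c_i | I)."""
    return sum(p * bivariate_pdf(x, y, g) for g, p in zip(gaussians, intention_probs))


# Two intentions: stay in the lane (mean y = 0) or move to the adjacent lane (mean y = 3.5 m).
modes = [(30.0, 0.0, 1.0, 0.5, 0.0), (29.0, 3.5, 1.5, 0.8, 0.2)]
print(position_likelihood(30.0, 0.0, modes, [0.8, 0.2]))
```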
Driving behavior intention classification
When an AV drives in a dynamic traffic environment, it changes lanes either to obtain more driving space or to avoid a collision risk. Therefore, in this paper, left and right lane changes are further classified as acceleration lane changes and general lane changes. As a result, driving behavior intentions are divided into five categories: left lane change, lane keeping, right lane change, left accelerated lane change, and right accelerated lane change, as shown in Figure 5. If the average speed over the prediction horizon exceeds the average speed over the historical horizon by more than 7.2 km/h, the lane change is defined as an accelerated lane change; otherwise, the vehicle is considered to change lanes under normal driving conditions.
Figure 2. Diagram of the coordinate system of the traffic scene.
Figure 3. Schematic diagram of the target vehicle orientation.
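The five-way labeling rule above can be sketched as follows. The sketch reads the 7.2 km/h criterion as the difference between the mean speed over the prediction window and the mean speed over the history window, takes the lane-change direction as already known from the lane-line crossing, and uses the label order 0 to 4 given in the data preprocessing section; all names are hypothetical.

```python
from statistics import mean
from typing import Sequence

# Label ids in the order used for the HighD labels (0-4).
LANE_KEEP, LEFT, RIGHT, LEFT_ACC, RIGHT_ACC = range(5)
ACCEL_THRESHOLD_KMH = 7.2


def intention_label(direction: str, hist_speeds_ms: Sequence[float],
                    future_speeds_ms: Sequence[float]) -> int:
    """Map a detected manoeuvre ('keep', 'left' or 'right') to one of five labels.

    Speeds are in m/s; the speed gain is converted to km/h before thresholding.
    """
    if direction == "keep":
        return LANE_KEEP
    gain_kmh = (mean(future_speeds_ms) - mean(hist_speeds_ms)) * 3.6
    accelerated = gain_kmh > ACCEL_THRESHOLD_KMH
    if direction == "left":
        return LEFT_ACC if accelerated else LEFT
    if direction == "right":
        return RIGHT_ACC if accelerated else RIGHT
    raise ValueError("unknown direction: " + direction)


# 3 s of history and 6 s of prediction at 8 Hz; +3 m/s is +10.8 km/h.
print(intention_label("left", [25.0] * 24, [28.0] * 48))  # 3 (left acceleration lane change)
```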
PMETP model
Model framework
The PMETP proposed in this paper consists of a driving behavior intention recognition module and a traffic vehicle expected trajectory prediction module, as shown in Figure 6. In machine learning terms, driving behavior intention recognition is a classification problem and expected trajectory prediction is a regression problem. Based on the encoded historical state information of the vehicle, the driving behavior intention recognition module outputs the probabilities of five driving behaviors at the current moment: lane keeping, left and right lane changes, and left and right accelerated lane changes. Based on the encoded historical information and the driving behavior intention probabilities, the expected trajectory prediction module produces a probability distribution over the vehicle's future trajectories within the prediction horizon.
The behavior intention recognition module, built from a long short-term memory (LSTM) network and a multilayer perceptron (MLP), computes the probability of each driving behavior intention with the Softmax function. Let the historical state feature information M of the AV's overall environment at the current moment be the input vector of the vehicle motion forecasting model, and let $C = (c_1, c_2, c_3, c_4, c_5)$ be the intention category vector, where $c_1, \dots, c_5$ represent lane keeping, left and right lane changes, and left and right acceleration lane changes, respectively. The driving intention category probability vector is
$$ \Omega = (\omega_1, \omega_2, \omega_3, \omega_4, \omega_5) \tag{3} $$
where $\omega_i = P(c_i \mid M)$, $i = 1, 2, 3, 4, 5$, is the probability of driving intention $c_i$.
The traffic vehicle expected trajectory prediction module consists of a fully connected layer, an encoder, a decoder, an MLP, and a mixture density network (MDN). First, the fully connected layer extracts features from the historical state of the traffic vehicle as the input of the encoder. Second, the encoder uses LSTMs to encode these features into a context vector. To strengthen the correlation between the current state and the states before and after it, the encoder combines a bidirectional LSTM with a unidirectional LSTM; this enhances the extraction of environmental features along the forward direction, so that the encoder pays more attention to the forward process. Finally, the MLP and MDN take the output vector of the decoder as input, enabling the model to predict the probability distribution of future trajectories based on the recognized intention.
Figure 4. Trajectory forecasting flowchart.
Figure 5. Diagram of driving behavior intention.
Figure 6. Architecture of the PMETP.
LSTM
LSTM [28] is a gated recurrent neural network with a forget gate, an input gate, an output gate, and memory cells that have the same shape as the hidden state, as shown in Figure 7. The inputs of the LSTM at the current time step are $X_t$, the hidden state $H_{t-1}$ of the previous time step, and the memory cell $C_{t-1}$. The activation functions $\sigma$ and tanh of the fully connected layers are:
$$ \sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{4} $$
The forget gate determines the proportion of the previous cell state that is forgotten, the input gate determines how much the current input contributes to the cell state, and the output gate controls how much of the memory cell is passed to the hidden state. The input gate $I_t$, forget gate $F_t$, and output gate $O_t$ are:
$$ \begin{aligned} I_t &= \sigma(X_t W_{xi} + H_{t-1} W_{hi} + b_i) \\ F_t &= \sigma(X_t W_{xf} + H_{t-1} W_{hf} + b_f) \\ O_t &= \sigma(X_t W_{xo} + H_{t-1} W_{ho} + b_o) \end{aligned} \tag{5} $$
where $W_{xi}, W_{xf}, W_{xo} \in \mathbb{R}^{d \times h}$ and $W_{hi}, W_{hf}, W_{ho} \in \mathbb{R}^{h \times h}$ are weight parameters, and $b_i, b_f, b_o \in \mathbb{R}^{1 \times h}$ are bias parameters.
The candidate memory cell selects new information, and the memory cell combines the memory cell of the previous time step with the candidate memory cell of the current time step. The flow of information into the memory cell is controlled by the input and forget gates:
$$ \begin{aligned} \tilde{C}_t &= \tanh(X_t W_{xc} + H_{t-1} W_{hc} + b_c) \\ C_t &= F_t \odot C_{t-1} + I_t \odot \tilde{C}_t \end{aligned} \tag{6} $$
where $W_{xc} \in \mathbb{R}^{d \times h}$ and $W_{hc} \in \mathbb{R}^{h \times h}$ are weight parameters, and $b_c \in \mathbb{R}^{1 \times h}$ is the bias parameter. The hidden state at the current time step is controlled by the output gate:
$$ H_t = O_t \odot \tanh(C_t) \tag{7} $$
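A minimal, dependency-free sketch of one LSTM step implementing Eqs. (4) to (7) for a single row vector $X_t$. The parameter names mirror the symbols above, but the dictionary layout and the `zero_params` helper are assumptions for illustration; a real model would use a framework layer such as PyTorch's LSTM (the paper reports using PyTorch).

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # indexed as [row][column]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))  # Eq. (4)


def affine(x: Vector, w_x: Matrix, h: Vector, w_h: Matrix, b: Vector) -> Vector:
    """Row-vector product x W_x + h W_h + b."""
    return [sum(x[i] * w_x[i][j] for i in range(len(x)))
            + sum(h[k] * w_h[k][j] for k in range(len(h))) + b[j]
            for j in range(len(b))]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              p: Dict[str, list]) -> Tuple[Vector, Vector]:
    """One LSTM time step; returns (H_t, C_t)."""
    i = [sigmoid(z) for z in affine(x, p["W_xi"], h_prev, p["W_hi"], p["b_i"])]  # input gate
    f = [sigmoid(z) for z in affine(x, p["W_xf"], h_prev, p["W_hf"], p["b_f"])]  # forget gate
    o = [sigmoid(z) for z in affine(x, p["W_xo"], h_prev, p["W_ho"], p["b_o"])]  # output gate
    c_tilde = [math.tanh(z) for z in affine(x, p["W_xc"], h_prev, p["W_hc"], p["b_c"])]
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, c_tilde)]  # Eq. (6)
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]  # Eq. (7)
    return h, c


def zero_params(d: int, hidden: int) -> Dict[str, list]:
    """All-zero weights and biases for input size d and hidden size `hidden`."""
    p: Dict[str, list] = {}
    for g in "ifoc":
        p["W_x" + g] = [[0.0] * hidden for _ in range(d)]
        p["W_h" + g] = [[0.0] * hidden for _ in range(hidden)]
        p["b_" + g] = [0.0] * hidden
    return p


h, c = lstm_step([1.0, -1.0], [0.2, 0.1, 0.0], [1.0, 0.0, -2.0], zero_params(2, 3))
print(c)  # [0.5, 0.0, -1.0]
```

With all weights and biases at zero, every gate equals σ(0) = 0.5 and the candidate cell is tanh(0) = 0, so the step simply halves the previous memory cell, which makes the gating arithmetic easy to check by hand.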
Behavior intention recognition module
The traffic vehicle behavior intention recognition module learns how the PV and its surrounding traffic vehicles move, from their motion states and interaction information, and can accurately identify the vehicle's driving intention in the current state. The module is built from an MLP and an LSTM, as shown in Figure 6. The LSTM unit reads the input feature information M of the current time step and the hidden state $H_{t-1}$ of the previous time step and updates the hidden state of the current time step, that is, $H_t = f(H_{t-1}, M)$. Finally, the MLP and Softmax produce the probability vector over the five driving intention categories: lane keeping, left lane change, right lane change, left acceleration lane change, and right acceleration lane change. Softmax is defined as:
$$ \mathrm{softmax}(z_n) = \frac{e^{z_n}}{\sum_{k=1}^{m} e^{z_k}} \tag{8} $$
where $z_n$ is the output value for the n-th driving behavior intention and m is the number of driving behavior intention categories; the Softmax outputs lie in [0, 1] and sum to 1.
The intention recognition module uses multi-class cross entropy as the loss function, with the Adam optimizer and a learning rate of 0.0002. The loss function is:
$$ L_c = -\sum_{n=1}^{m} y_n \log(p_n) \tag{9} $$
where $L_c$ is the loss of the behavioral intention recognition module, $y_n$ is the true label for the n-th intention category (1 for the true category and 0 otherwise), and $p_n$ is the predicted probability of the n-th category.
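Eqs. (8) and (9) can be sketched in a few lines. Subtracting the largest logit before exponentiating is a standard numerical-stability trick that leaves the Softmax output unchanged; the small `eps` floor inside the logarithm is an added safeguard, not part of Eq. (9). The intention order in the comment follows the listing above.

```python
import math
from typing import List, Sequence

# Output order: lane keeping, left, right, left acceleration, right acceleration.


def softmax(z: Sequence[float]) -> List[float]:
    """Eq. (8), shifted by max(z) so large logits cannot overflow exp()."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cross_entropy(y_onehot: Sequence[float], p: Sequence[float], eps: float = 1e-12) -> float:
    """Eq. (9): L_c = -sum_n y_n log(p_n)."""
    return -sum(y * math.log(max(pn, eps)) for y, pn in zip(y_onehot, p))


probs = softmax([2.0, 0.5, 0.1, -1.0, -1.0])
print(max(range(len(probs)), key=probs.__getitem__))  # 0 (lane keeping)
print(cross_entropy([1, 0, 0, 0, 0], probs))
```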
Encoder-decoder
The expected trajectory prediction module is composed of a fully connected layer, an encoder, a decoder, an MLP, and an MDN. The encoder consists of a deep bidirectional LSTM and a deep unidirectional LSTM. The input to the decoder contains the output of the encoder and a probability vector X, obtained by passing the output of the behavioral intention recognition module through the fully connected layer for feature extraction. The state of the agent at the current time step is related not only to the states at previous moments but also to later states. The bidirectional LSTM therefore obtains the features of the predicted vehicle at the current moment from both earlier and later time steps, composing contextual information that determines the agent's current state characteristics. The structure of a bidirectional RNN is shown in Figure 8.
Figure 7. LSTM structure.
The fundamental idea of a bidirectional LSTM is that each training sequence is processed by two RNNs, one running forward and one running backward, both connected to the same output layer. This structure provides the output layer with complete past and future context for every point in the input sequence. Six distinct weights are reused at every time step: the input-to-hidden weights of the forward and backward layers ($\omega_1$, $\omega_3$), the hidden-to-hidden weights ($\omega_2$, $\omega_5$), and the weights from the forward and backward hidden layers to the output layer ($\omega_4$, $\omega_6$). There is no information exchange between the forward and backward hidden layers.
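The weight sharing described above can be illustrated with a scalar bidirectional RNN that uses exactly six weights. The sketch assumes tanh hidden units, zero initial states, and a linear output layer; assigning $\omega_2$ and $\omega_4$ to the forward direction and $\omega_5$ and $\omega_6$ to the backward direction is an assumption here, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def birnn(xs: Sequence[float], w: Sequence[float]) -> List[float]:
    """Scalar bidirectional RNN with weights (w1, ..., w6) reused at every step."""
    w1, w2, w3, w4, w5, w6 = w
    steps = len(xs)
    h_f = [0.0] * steps
    prev = 0.0
    for t in range(steps):  # forward pass: past -> future
        prev = math.tanh(w1 * xs[t] + w2 * prev)
        h_f[t] = prev
    h_b = [0.0] * steps
    nxt = 0.0
    for t in reversed(range(steps)):  # backward pass: future -> past
        nxt = math.tanh(w3 * xs[t] + w5 * nxt)
        h_b[t] = nxt
    # the two directions never feed each other; they meet only at the output layer
    return [w4 * hf + w6 * hb for hf, hb in zip(h_f, h_b)]


print(birnn([1.0, 2.0, 3.0], (0.5, 0.3, 0.7, 1.0, 0.2, 1.0)))
```

Setting w6 = 0 makes each output depend only on current and past inputs, and setting w4 = 0 makes it depend only on current and future inputs, which is an easy way to see the two context streams separately.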
Mixture density network
MDN was proposed by Christopher Bishop [29] in 1994 to tackle multi-valued mapping problems by combining Gaussian mixture models with neural networks. To better reflect the diversity of driving behaviors and the uncertainty of future trajectories, the MDN forecasts the probability distribution of future trajectories and thereby generates multiple possible future behaviors and trajectories of the PV. In this paper, a mixture of six Gaussian functions is used as the kernel function of the MDN. The probability of the trajectory distribution is
$$ p(o \mid x) = \sum_{i=1}^{n} \alpha_i(x)\, \theta_i(o \mid x), \qquad \theta_i(o \mid x) = \frac{1}{(2\pi)^{c/2}\, \sigma_i(x)^{c}} \exp\left\{ -\frac{\lVert o - \mu_i(x) \rVert^2}{2 \sigma_i^2(x)} \right\} \tag{10} $$
where x is the input feature vector, o is the location of the agent at a given time, c is the dimension of o, n is the number of mixture kernel functions, $\alpha_i(x)$ is the mixing weight of the i-th kernel, $\sigma_i(x)$ is its width (standard deviation) parameter, and $\mu_i(x)$ is its center.
The mixing weights must be positive and sum to 1, which is ensured by a softmax over the network outputs $z^{\alpha}_i$, while the exponential ensures that $\sigma_i$ is positive:
$$ \sum_{i=1}^{n} \alpha_i(x) = 1, \qquad \alpha_i(x) = \frac{\exp(z^{\alpha}_i)}{\sum_{j=1}^{n} \exp(z^{\alpha}_j)}, \qquad \sigma_i = \exp\{z^{\sigma}_i\} \tag{11} $$
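A minimal sketch of Eqs. (10) and (11): it turns raw network outputs into mixing weights (softmax), kernel widths (exponential) and isotropic Gaussian kernels, then evaluates the mixture density at a location o. The example uses six kernels to match the configuration described above; the argument names and kernel centres are hypothetical.

```python
import math
from typing import Sequence


def mdn_density(o: Sequence[float], z_alpha: Sequence[float], z_sigma: Sequence[float],
                mus: Sequence[Sequence[float]]) -> float:
    """Mixture density p(o | x) from raw MDN outputs (Eqs. (10) and (11))."""
    c = len(o)  # dimension of the target location
    m = max(z_alpha)
    e = [math.exp(z - m) for z in z_alpha]
    total = sum(e)
    alphas = [v / total for v in e]          # positive and summing to 1
    sigmas = [math.exp(z) for z in z_sigma]  # strictly positive widths
    density = 0.0
    for a, s, mu in zip(alphas, sigmas, mus):
        sq_dist = sum((oi - mi) ** 2 for oi, mi in zip(o, mu))
        kernel = math.exp(-sq_dist / (2.0 * s * s)) / ((2.0 * math.pi) ** (c / 2) * s ** c)
        density += a * kernel
    return density


mus = [[0.0, 0.0], [0.0, 3.5], [0.0, -3.5], [5.0, 0.0], [5.0, 3.5], [5.0, -3.5]]
print(mdn_density([0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0] * 6, mus))
```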
The optimization objective is maximum likelihood, implemented by minimizing the negative log-likelihood. The loss function L is
$$ L = -\log\left( \sum_k P_{\theta}(G \mid C_k, X_{obs})\, P(C_k \mid X_{obs}) \right) \tag{12} $$
where $X_{obs}$ denotes the historical trajectory sequence of the PV, $C_k$ is the driving behavior predicted by the driving behavior prediction module, and G is the Gaussian distribution of the future trajectories.
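Eq. (12) is usually evaluated in log space to avoid underflow when the per-intention likelihoods are tiny. The sketch assumes the log-likelihood of the ground-truth future trajectory under each intention-conditioned distribution is already available (for example, summed over the future time steps) and applies the log-sum-exp trick; the function name is hypothetical.

```python
import math
from typing import Sequence


def mixture_nll(log_lik: Sequence[float], intention_probs: Sequence[float]) -> float:
    """-log sum_k P(G | C_k, X_obs) P(C_k | X_obs), computed with log-sum-exp.

    log_lik[k] is log P(G | C_k, X_obs); intentions with zero probability are skipped.
    """
    terms = [ll + math.log(p) for ll, p in zip(log_lik, intention_probs) if p > 0]
    if not terms:
        return math.inf
    m = max(terms)
    return -(m + math.log(sum(math.exp(t - m) for t in terms)))


# These likelihoods would underflow to 0.0 if exponentiated directly.
print(mixture_nll([-1000.0, -1001.0], [0.5, 0.5]))
```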
Experimental evaluation
Dataset
We use the HighD dataset [30], published by the Institute of Automotive Engineering at RWTH Aachen University, Germany, to train, validate and test the proposed motion forecasting model. The dataset provides an extensive set of test data for AD, including 16.5 h of measurement data covering a total distance of 45,000 km and 5600 complete lane changes. The trajectory data in the original dataset are sampled at 25 Hz. To suit the experimental scenario and reduce computational cost, the data are resampled to 8 Hz. The scene of the dataset is shown in Figure 9.
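Because 25 Hz is not an integer multiple of 8 Hz, downsampling cannot simply keep every k-th frame. The paper does not state its resampling method, so the sketch below assumes linear interpolation at the new timestamps. At 8 Hz, the 3 s history and 6 s prediction windows described in the data preprocessing section span 24 and 48 sample intervals, respectively.

```python
from typing import List, Sequence


def resample(values: Sequence[float], src_hz: float = 25.0,
             dst_hz: float = 8.0) -> List[float]:
    """Linearly interpolate a uniformly sampled signal onto a dst_hz time grid."""
    if len(values) < 2:
        raise ValueError("need at least two samples")
    duration = (len(values) - 1) / src_hz
    n_out = int(duration * dst_hz) + 1
    out = []
    for k in range(n_out):
        pos = (k / dst_hz) * src_hz         # fractional index into the source samples
        i = min(int(pos), len(values) - 2)  # left neighbour, clamped at the end
        frac = pos - i
        out.append(values[i] * (1.0 - frac) + values[i + 1] * frac)
    return out


# 4 s of a 25 Hz signal whose value equals its frame index.
print(resample([float(i) for i in range(101)])[:3])  # [0.0, 3.125, 6.25]
```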
Figure 8. Bidirectional circulatory neural network.
Figure 9. The scene diagram of the HighD dataset.
Implementation details
The proposed learning framework was implemented in Python 3.8 with the PyTorch 1.12.1 library, and the model was trained on Nvidia GeForce GTX 1650 Ti GPU cards. In the behavioral intention recognition module, the historical state information M first passes through a fully connected (FC) layer with 128 neurons and Leaky ReLU activation (α = 0.1); the encoded vectors are then passed to the deep RNN, a two-layer LSTM with 256 hidden features and a dropout ratio of 0.5. In the expected trajectory prediction module, the historical state information M passes through two fully connected layers with 256-dimensional states and Tanh activation, and then into the bidirectional LSTM and the unidirectional LSTM, each with 512-dimensional hidden states and a dropout ratio of 0.5. Finally, the MDN and MLP produce the trajectory of the PV over the next 6 s.
Data preprocessing
The behavioral intention recognition module requires extracting lane-keeping, left lane-changing, right lane-changing, left acceleration lane-changing, and right acceleration lane-changing trajectories from the HighD dataset and assigning them the labels 0, 1, 2, 3 and 4, respectively. Each history sequence spans 3 s and each prediction sequence spans 6 s. The steps for classifying the lane-changing trajectory of a vehicle, illustrated in Figure 10, are:
1. Extract the intersection point of the trajectory with the lane line and record the time.
2. Calculate the yaw angle θ:
$$ \theta = \frac{x_{t+1} - x_t}{y_{t+1} - y_t} \tag{13} $$
where $x_t$ and $y_t$ are the x and y coordinates of the vehicle at time t, and $x_{t+1}$ and $y_{t+1}$ are its coordinates at time t+1.
3. If $|\theta| < \theta_b$, where $\theta_b$ is the heading angle threshold at the start of a lane change, the segment is defined as lane keeping; otherwise, it is defined as lane changing.
4. Determine the start point and end point of the lane change.
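The steps above can be sketched as follows. This sketch measures the heading with the common atan2(Δy, Δx) form relative to the driving (x) direction, which is an assumption rather than a transcription of Eq. (13); it also assumes a single lane-line crossing and grows the lane-change window outward from that crossing while $|\theta| \ge \theta_b$. The threshold value of 0.05 rad in the example is hypothetical, since the paper does not report $\theta_b$.

```python
import math
from typing import List, Optional, Sequence, Tuple


def headings(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Heading of each segment t -> t+1, measured from the driving (x) direction."""
    return [math.atan2(ys[t + 1] - ys[t], xs[t + 1] - xs[t]) for t in range(len(xs) - 1)]


def lane_change_window(xs: Sequence[float], ys: Sequence[float], marking_y: float,
                       theta_b: float) -> Optional[Tuple[int, int, int]]:
    """Return (start, crossing, end) sample indices, or None for lane keeping."""
    # first segment whose end points lie on opposite sides of the lane line
    crossing = next((t for t in range(len(ys) - 1)
                     if (ys[t] - marking_y) * (ys[t + 1] - marking_y) < 0), None)
    if crossing is None:
        return None
    th = headings(xs, ys)
    # extend the manoeuvre while |theta| stays at or above the threshold
    start = crossing
    while start > 0 and abs(th[start - 1]) >= theta_b:
        start -= 1
    end = crossing
    while end + 1 < len(th) and abs(th[end + 1]) >= theta_b:
        end += 1
    return start, crossing, end + 1  # end converted from segment to sample index


xs = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
ys = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 3.0, 3.0]
print(lane_change_window(xs, ys, marking_y=1.5, theta_b=0.05))  # (2, 3, 5)
```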
Because driving conditions are unevenly distributed, the extracted sequences contain far more straight-line driving than lane changing. The whole dataset was randomly split into a training set (80%), a validation set (10%), and a test set (10%). Finally, all extracted data were standardized to facilitate training of the proposed model.
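A minimal sketch of the split and standardization step. It assumes a seeded random shuffle and z-score standardization with statistics fitted on the training split only (a common choice that avoids leaking test information; the paper does not say which statistics were used); the function names are hypothetical.

```python
import random
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def split_80_10_10(samples: Sequence, seed: int = 0) -> Tuple[list, list, list]:
    """Randomly split samples into 80 % train, 10 % validation and 10 % test."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_train = int(0.8 * len(idx))
    n_val = int(0.1 * len(idx))

    def pick(ids: List[int]) -> list:
        return [samples[i] for i in ids]

    return (pick(idx[:n_train]), pick(idx[n_train:n_train + n_val]),
            pick(idx[n_train + n_val:]))


def fit_standardizer(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and standard deviation of the training rows."""
    cols = list(zip(*rows))
    mus = [mean(col) for col in cols]
    sds = [pstdev(col) or 1.0 for col in cols]  # constant feature: avoid division by zero
    return mus, sds


def standardize(rows: Sequence[Sequence[float]], mus: Sequence[float],
                sds: Sequence[float]) -> List[List[float]]:
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


data = [[float(i), 2.0 * i] for i in range(100)]
train, val, test = split_80_10_10(data)
mus, sds = fit_standardizer(train)
train_z = standardize(train, mus, sds)
print(len(train), len(val), len(test))  # 80 10 10
```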
Performance analysis of behavior intention recognition module
The accuracy of the behavioral intention recognition module plays a crucial role in the predicted trajectories of the agent. We adopt the negative log-likelihood (NLL) loss as the loss function of this module; the loss curves of the behavioral intention recognition model are shown in Figure 11. The training loss stabilizes at 0.0584 and the validation loss at 0.0796, indicating that the module converges well. The confusion matrix, a common visualization tool for supervised learning, is given for behavioral intention recognition in Table 1. We use accuracy, recall, F1-score, and precision as the evaluation metrics of the classifier. Taking binary classification as an example, the precision p, recall r, F1-score $F_1$, and accuracy a are:
$$ p = \frac{TP}{TP + FP}, \qquad r = \frac{TP}{TP + FN}, \qquad \frac{2}{F_1} = \frac{1}{p} + \frac{1}{r}, \qquad a = \frac{TP + TN}{TP + FN + FP + TN} \tag{14} $$
where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively.
Figure 10. Diagram of lane changing trajectory.
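For the five-class case, Eq. (14) is applied one class at a time (one-vs-rest), and the overall accuracy is the trace of the confusion matrix divided by the total count. The sketch below follows the layout of Table 1 (rows are predicted intentions, columns are real intentions); with the Table 1 counts it reproduces, for example, p = 0.989 and r = 0.992 for lane keeping and an overall accuracy of 0.983. The function names are hypothetical.

```python
from typing import Dict, List, Sequence

# Table 1: rows = predicted intention, columns = real intention
# (lane keeping, left, right, left acceleration, right acceleration).
TABLE_1 = [
    [201598, 683, 497, 462, 596],
    [473, 28560, 37, 274, 7],
    [692, 21, 35169, 17, 239],
    [171, 89, 1, 3346, 18],
    [205, 8, 116, 3, 3855],
]


def per_class_metrics(cm: Sequence[Sequence[int]]) -> List[Dict[str, float]]:
    """One-vs-rest precision, recall and F1 for each class of a confusion matrix."""
    k = len(cm)
    results = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[c]) - tp                          # predicted c, real class differs
        fn = sum(cm[row][c] for row in range(k)) - tp  # real class c, predicted otherwise
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0    # harmonic mean, Eq. (14)
        results.append({"precision": p, "recall": r, "f1": f1})
    return results


def overall_accuracy(cm: Sequence[Sequence[int]]) -> float:
    """Correct predictions (the diagonal) divided by all samples."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)


lane_keeping = per_class_metrics(TABLE_1)[0]
print(round(lane_keeping["precision"], 3), round(lane_keeping["recall"], 3))  # 0.989 0.992
print(round(overall_accuracy(TABLE_1), 3))  # 0.983
```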
As Table 2 shows, all performance indicators for behavioral intention recognition are good: p exceeds 90% for every class; r exceeds 97% for lane keeping and left and right lane changes, and exceeds 81% for left and right acceleration lane changes. The F1-score, the harmonic mean of p and r, is 97% or more for lane keeping and left and right lane changes, and 85% or more for left and right acceleration lane changes. Accuracy reflects the overall quality of the model, and the model's accuracy exceeds 98%. Figure 12 shows the accuracy of the behavioral intention recognition model. Note that because irregular straight-line trajectories are removed during data preprocessing, the precision, recall, and F1-score of straight-line driving are higher, and the indicators of left and right lane changes are close to each other. Because the extracted left and right acceleration lane changes have small sample sizes, each of their performance indicators is lower than those of lane keeping and left and right lane changes, but the indicators of left and right acceleration lane changes are close to each other. Although the intention recognition module makes some misjudgments, it rarely predicts the opposite type of behavior, which shows that the module has good intention recognition capability and meets the requirements of the vehicle motion forecasting module.
PMETP performance analysis
Figure 13 shows the multi-modal prediction results of the agents over different time horizons for both straight-line driving and lane-changing scenarios. Each plot presents the vehicles' historical trajectories over the past 3 s and the predicted trajectories for the next 6 s. The color intensity in each plot is proportional to the predicted probability of the behavioral intention, so the plot forms a complete heat map of the predicted multi-modal distribution.
Figure 11. Behavioral intention recognition model loss.
Table 1. Confusion matrix for behavioral intention identification (rows: predicted intention; columns: real intention; entries: number of samples).
Predicted \ Real | Lane-keeping | Left change | Right change | Left acceleration lane change | Right acceleration lane change
Lane-keeping | 201598 | 683 | 497 | 462 | 596
Left change | 473 | 28560 | 37 | 274 | 7
Right change | 692 | 21 | 35169 | 17 | 239
Left acceleration lane change | 171 | 89 | 1 | 3346 | 18
Right acceleration lane change | 205 | 8 | 116 | 3 | 3855
Table 2. The performance measures for behavioral intention identification.
Predicted intention | Precision ratio p | Recall ratio r | F1-score F1
Lane-keeping | 0.989 | 0.992 | 0.991
Left change | 0.973 | 0.973 | 0.973
Right change | 0.973 | 0.982 | 0.978
Left acceleration lane change | 0.923 | 0.816 | 0.866
Right acceleration lane change | 0.921 | 0.818 | 0.872
Accuracy ratio a (all intentions): 0.983
Figure 13(a) shows PMETP predictions in the lane-keeping scenario at intervals of 0.84 s. In the first example (top-left) and the second example (top-middle), the PV and the vehicles following it intend to overtake the vehicle in front, so PMETP outputs predicted trajectories that pass the vehicle in front without causing a collision and then keep straight. The third example (top-right) indicates that the current environment does not allow the overtaking maneuver to be completed, and the historical trajectories are approximately parallel to the lane, so there is no obvious overtaking tendency. The predicted vehicle will therefore stay in its lane, and PMETP predicts two probable trajectories for continued driving later in the horizon. Throughout the prediction, PMETP predicts from the historical trajectories that neither the behavioral intention nor the future trajectories involve a lane change.
Figure 13(b) shows how vehicles in the adjacent lanes and in the same lane affect PMETP in the lane-changing scenario, again at intervals of 0.84 s. As can be seen in Figure 13(b), the PV is in the congested rightmost lane, and the middle lane is faster than the right lane. In the first example (top-left), the vehicle's historical trajectory shows no significant tendency to change lanes, so PMETP predicts, based on the current environment, the highest probability of remaining in the lane and a lower probability of moving to the middle lane. In the second example (top-middle), the PV moves to a position between the front and rear vehicles and shows a clear tendency to change lanes; PMETP predicts that all future trajectories of the PV move toward the middle lane, with the two predicted trajectories having similar probabilities. In the third example (top-right), the historical trajectory of the red car continues toward the middle lane. The main trend predicted by PMETP from the current scene is consistent with this, but the model predicts two probabilistic trajectories with similar probabilities in the future time domain.
Figure 12. The accuracy of the behavioral intention recognition model.
Figure 13. Multi-modal trajectory prediction of the PMETP model: (a) prediction results of PMETP in the lane-keeping scenario and (b) prediction results of PMETP in the lane-change scenario.
These results on the HighD test set show that the proposed PMETP performs well in predicting the multi-modal distribution.
In this paper, the Root Mean Square Error (RMSE) and the Negative Log Likelihood (NLL) between the predicted and true trajectories over the 6 s horizon are used as the evaluation metrics for the PMETP prediction results. For trajectory prediction models with a multi-modal distribution, the RMSE is calculated using the maximum-probability trajectory. The NLL of the real trajectories under the predicted trajectory distribution is used to compare the advantages and disadvantages of uni-modal and multi-modal distributions.
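The two metrics can be sketched as follows, assuming each prediction is a mixture of K bivariate Gaussian trajectories with per-sample mode probabilities (a common MDN parameterization). The array layout and function names are hypothetical and are not the authors' code.

```python
import numpy as np

def bivariate_normal_logpdf(pt, mean, std, rho):
    """Log-density of a 2-D Gaussian, broadcasting over leading axes."""
    zx = (pt[..., 0] - mean[..., 0]) / std[..., 0]
    zy = (pt[..., 1] - mean[..., 1]) / std[..., 1]
    one_m_r2 = 1.0 - rho ** 2
    q = (zx ** 2 - 2.0 * rho * zx * zy + zy ** 2) / one_m_r2
    log_norm = np.log(2.0 * np.pi * std[..., 0] * std[..., 1] * np.sqrt(one_m_r2))
    return -0.5 * q - log_norm

def rmse_nll_per_horizon(weights, mean, std, rho, truth):
    """RMSE of the most probable mode and mixture NLL at each future step.

    weights: (N, K)       mode probabilities of each of N test samples
    mean:    (N, K, T, 2) predicted positions in metres
    std:     (N, K, T, 2) standard deviations in metres
    rho:     (N, K, T)    correlation coefficients
    truth:   (N, T, 2)    ground-truth positions in metres
    Returns (rmse, nll), each of shape (T,).
    """
    idx = np.arange(weights.shape[0])
    best = weights.argmax(axis=1)                            # max-probability mode
    err2 = np.sum((mean[idx, best] - truth) ** 2, axis=-1)   # (N, T)
    rmse = np.sqrt(err2.mean(axis=0))

    # log p(truth) = logsumexp_k [log w_k + log N(truth | mode k)]
    log_comp = np.log(weights)[:, :, None] + bivariate_normal_logpdf(
        truth[:, None], mean, std, rho)                      # (N, K, T)
    peak = log_comp.max(axis=1, keepdims=True)
    log_p = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    nll = -log_p.mean(axis=0)
    return rmse, nll
```

A uni-modal model is the special case K = 1 with weight 1. The log-sum-exp shift avoids underflow when the true position lies far from every mode.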
We compare the RMSE and NLL of the following models over 6 s to test the validity of the models.
Constant Velocity (CV): A constant-velocity Kalman filter is used as the baseline model.
LSTM with convolutional social pooling and maneuvers (CS-LSTM): This method was proposed by Deo and Trivedi [25]. It includes a maneuver-based decoder that generates a multi-modal predictive distribution. Each vehicle is modeled using an LSTM, and the hidden states are pooled at each iteration using a social pooling layer. The model was trained using Adam with a learning rate of 0.001. The encoder LSTM has 64-dimensional states, while the decoder has 128-dimensional states.
XY-LSTM: Based on the PMETP architecture, this model uses the position features of the predicted vehicle and the surrounding vehicles. The parameter values are taken from the model proposed in this paper.
V-LSTM: We add the vehicle speed features to XY-LSTM. The parameter values are taken from the model proposed in this paper.
E1-LSTM: Information on the interaction between the predicted vehicle and the surrounding vehicles is added to V-LSTM. The parameter values are taken from the model proposed in this paper.
E2-LSTM: We augment E1-LSTM with information about the interaction among the surrounding vehicles. The parameter values are taken from the model proposed in this paper.
PMETP(M): The complete model described in this paper, which includes behavioral intention recognition and the multi-modal prediction distribution generated by the encoder and decoder. The parameter values are given in the Implementation details section.
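The paper does not list the settings of the CV baseline. A minimal sketch of a constant-velocity Kalman filter, assuming a state [x, y, vx, vy], 5 Hz sampling (dt = 0.2 s) and hypothetical noise settings `accel_var` and `meas_std`, filters the observed history and then extrapolates the mean position over the 6 s horizon.

```python
import numpy as np

def cv_kalman_forecast(history, dt=0.2, horizon_s=6.0, accel_var=0.5, meas_std=0.1):
    """Constant-velocity Kalman filter baseline (illustrative sketch).

    history:   (T_obs, 2) observed x, y positions in metres, spaced dt seconds
    dt:        sampling interval (assumed 0.2 s, i.e. 5 Hz)
    accel_var: variance of the white acceleration noise (hypothetical tuning)
    meas_std:  position measurement noise std in metres (hypothetical tuning)
    Returns (round(horizon_s / dt), 2) predicted mean positions.
    """
    F = np.array([[1.0, 0.0, dt, 0.0],
                  [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])            # state: [x, y, vx, vy]
    H = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])            # only positions observed
    # Piecewise-constant white acceleration noise, applied to each axis.
    g = accel_var * np.array([[dt**4 / 4, dt**3 / 2],
                              [dt**3 / 2, dt**2]])
    Q = np.zeros((4, 4))
    Q[np.ix_([0, 2], [0, 2])] = g
    Q[np.ix_([1, 3], [1, 3])] = g
    R = meas_std**2 * np.eye(2)

    x = np.array([history[0, 0], history[0, 1], 0.0, 0.0])
    P = np.diag([meas_std**2, meas_std**2, 100.0, 100.0])  # velocity unknown
    for z in history[1:]:
        x, P = F @ x, F @ P @ F.T + Q                       # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)                      # Kalman gain
        x = x + K @ (z - H @ x)                             # update
        P = (np.eye(4) - K @ H) @ P

    preds = []
    for _ in range(int(round(horizon_s / dt))):
        x = F @ x                                           # extrapolate mean
        preds.append(x[:2].copy())
    return np.array(preds)
```

Because the forecast only propagates the last velocity estimate, its error grows roughly linearly with the horizon whenever the vehicle accelerates or changes lanes.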
Table 3 gives the RMSE and NLL of each model on the HighD dataset. The RMSE of the models that consider the interaction between the predicted vehicle and the surrounding vehicles (CS-LSTM, E1-LSTM, E2-LSTM and PMETP) is significantly lower than that of XY-LSTM and V-LSTM, indicating that the interaction between agents is an important factor in motion forecasting. The RMSE of the CV model is close to that of the LSTM-based models over short horizons but grows much faster with the horizon (8.67 m at 6 s, compared with 4.63–6.87 m for the LSTM-based models), indicating that the CV model is suitable only for short-term trajectory prediction and that LSTM has a clear advantage for long-term prediction. Furthermore, comparing the RMSE and NLL of all the models on the HighD dataset shows that the proposed PMETP has clear advantages. Averaged over the 6 s horizon, its RMSE is 45.93%, 5.97%, 34.76%, 29.6%, 10.7% and 5.51% lower than that of CV, CS-LSTM, XY-LSTM, V-LSTM, E1-LSTM and E2-LSTM, respectively, and its NLL is 6.61%, 28.29%, 22.49%, 16.06% and 8.7% lower than that of CS-LSTM, XY-LSTM, V-LSTM, E1-LSTM and E2-LSTM, respectively. The multi-modal probability distribution generated by PMETP is therefore more consistent with the real trajectories.
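The percentage reductions above can be checked against Table 3 by averaging each model's values over the six horizons and taking the relative drop of PMETP. The sketch below assumes that this is how the reductions were computed; the dictionary and function names are hypothetical.

```python
# Table 3 values for prediction horizons of 1-6 s.
RMSE_M = {
    "CV":      [0.86, 2.08, 4.52, 5.79, 6.93, 8.67],
    "CS-LSTM": [0.59, 1.22, 2.33, 3.42, 4.19, 4.84],
    "XY-LSTM": [0.72, 1.59, 3.68, 4.79, 6.26, 6.87],
    "V-LSTM":  [0.69, 1.46, 3.25, 4.33, 5.84, 6.59],
    "E1-LSTM": [0.65, 1.33, 2.43, 3.48, 4.31, 5.27],
    "E2-LSTM": [0.63, 1.28, 2.16, 3.23, 4.22, 4.99],
    "PMETP":   [0.61, 1.24, 2.03, 3.14, 3.95, 4.63],
}
NLL = {  # CV has no NLL entry in Table 3
    "CS-LSTM": [0.59, 2.16, 2.78, 3.25, 4.41, 5.25],
    "XY-LSTM": [2.11, 2.98, 3.83, 4.39, 5.29, 5.82],
    "V-LSTM":  [1.65, 2.69, 3.49, 4.03, 5.06, 5.67],
    "E1-LSTM": [1.26, 2.43, 3.06, 3.76, 4.97, 5.38],
    "E2-LSTM": [0.72, 2.23, 2.85, 3.46, 4.66, 5.26],
    "PMETP":   [0.52, 1.98, 2.56, 3.08, 4.31, 5.06],
}

def mean_reduction(table, ours="PMETP"):
    """Percentage reduction of the 1-6 s average of `ours` versus each baseline."""
    ours_mean = sum(table[ours]) / len(table[ours])
    reductions = {}
    for name, values in table.items():
        if name != ours:
            base_mean = sum(values) / len(values)
            reductions[name] = 100.0 * (base_mean - ours_mean) / base_mean
    return reductions

if __name__ == "__main__":
    for label, table in (("RMSE", RMSE_M), ("NLL", NLL)):
        for name, pct in mean_reduction(table).items():
            print(f"{label} vs {name}: {pct:.2f}%")
```

This reproduces, to within rounding, all six RMSE reductions and the NLL reductions for XY-LSTM through E2-LSTM. For CS-LSTM, the same computation on the tabulated NLL values gives about 5.04%, whereas the text reports 6.61%.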
Table 3. Test results of each model on the HighD dataset.
Evaluation metric | Prediction horizon (s) | CV | CS-LSTM | XY-LSTM | V-LSTM | E1-LSTM | E2-LSTM | PMETP
RMSE (m) | 1 | 0.86 | 0.59 | 0.72 | 0.69 | 0.65 | 0.63 | 0.61
RMSE (m) | 2 | 2.08 | 1.22 | 1.59 | 1.46 | 1.33 | 1.28 | 1.24
RMSE (m) | 3 | 4.52 | 2.33 | 3.68 | 3.25 | 2.43 | 2.16 | 2.03
RMSE (m) | 4 | 5.79 | 3.42 | 4.79 | 4.33 | 3.48 | 3.23 | 3.14
RMSE (m) | 5 | 6.93 | 4.19 | 6.26 | 5.84 | 4.31 | 4.22 | 3.95
RMSE (m) | 6 | 8.67 | 4.84 | 6.87 | 6.59 | 5.27 | 4.99 | 4.63
NLL | 1 | - | 0.59 | 2.11 | 1.65 | 1.26 | 0.72 | 0.52
NLL | 2 | - | 2.16 | 2.98 | 2.69 | 2.43 | 2.23 | 1.98
NLL | 3 | - | 2.78 | 3.83 | 3.49 | 3.06 | 2.85 | 2.56
NLL | 4 | - | 3.25 | 4.39 | 4.03 | 3.76 | 3.46 | 3.08
NLL | 5 | - | 4.41 | 5.29 | 5.06 | 4.97 | 4.66 | 4.31
NLL | 6 | - | 5.25 | 5.82 | 5.67 | 5.38 | 5.26 | 5.06
Conclusion
To adequately represent the vehicle behavior prediction
space and address the inherent uncertainty in predic-
tion, we propose a multi-modal expected trajectory pre-
diction model based on probability density. The major
contents and results of this paper are as follows.
1. In this paper, we consider not only the motion state of the predicted vehicle but also its environmental information, that is, the features of the surrounding traffic vehicles, the features of the interaction between the predicted vehicle and the surrounding traffic vehicles, and the interactions among the surrounding traffic vehicles. We also extract each agent's classification label based on its yaw angle, which provides data support for the PMETP model.
2. The driving behavior intention recognition module predicts the probabilities that the target vehicle performs lane keeping, left lane changing, right lane changing, left accelerated lane changing and right accelerated lane changing. The probability distribution of future trajectory positions is predicted by an MDN with Gaussian kernels.
3. The driving behavior intention recognition module is evaluated with accuracy, precision, recall and F1-score, and it achieves an accuracy above 98%. PMETP also produces multi-modal predictions in both lane-keeping and lane-changing scenarios. Compared with the other models, the proposed PMETP reduces the RMSE averaged over 6 s by 45.93%, 5.97%, 34.76%, 29.6%, 10.7% and 5.51% (relative to CV, CS-LSTM, XY-LSTM, V-LSTM, E1-LSTM and E2-LSTM, respectively) and the NLL averaged over 6 s by 6.61%, 28.29%, 22.49%, 16.06% and 8.7% (relative to CS-LSTM, XY-LSTM, V-LSTM, E1-LSTM and E2-LSTM, respectively).
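The MDN output layer mentioned in item 2 follows the mixture density network idea of Bishop [29]: the network emits unconstrained numbers that are mapped to valid mixture parameters. The sketch below assumes a bivariate Gaussian per mode and a hypothetical six-number layout per mode; it is not the authors' implementation. The optional `intention_probs` argument, also hypothetical, shows how mixture weights could instead come from an intention classifier, in the style of maneuver-based decoders such as CS-LSTM.

```python
import numpy as np

def mdn_params(raw, num_modes, intention_probs=None):
    """Map a raw MDN output vector to valid bivariate-Gaussian mixture parameters.

    Hypothetical layout, 6 numbers per mode:
        [logit, mu_x, mu_y, log_sigma_x, log_sigma_y, rho_raw]
    If intention_probs is given, it replaces the softmax weights (an assumption).
    """
    raw = np.asarray(raw, dtype=float).reshape(num_modes, 6)
    if intention_probs is None:
        logits = raw[:, 0]
        e = np.exp(logits - logits.max())  # numerically stable softmax
        weights = e / e.sum()
    else:
        weights = np.asarray(intention_probs, dtype=float)
        weights = weights / weights.sum()  # renormalize to sum to 1
    means = raw[:, 1:3]                    # unconstrained
    sigmas = np.exp(raw[:, 3:5])           # exp keeps standard deviations positive
    rhos = np.tanh(raw[:, 5])              # tanh keeps correlation in (-1, 1)
    return weights, means, sigmas, rhos
```

The exponential and tanh transforms let the network train with an unconstrained output layer while the resulting density stays well defined, which the NLL loss requires.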
However, we focus only on multi-modal trajectory prediction on motorways and do not consider complex scenarios such as congestion, intersections, and mixed pedestrian and vehicle traffic. Future research will take such complex scenarios into account in motion prediction and adopt state-of-the-art methods such as transformers.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest
with respect to the research, authorship, and/or publi-
cation of this article.
Funding
The author(s) disclosed receipt of the following finan-
cial support for the research, authorship, and/or publi-
cation of this article: This research was supported by
the National Natural Science Foundation of China
(grant No. 52202494).
ORCID iDs
Mingxi Bao https://orcid.org/0000-0003-4929-5260
Fei Gao https://orcid.org/0000-0003-4195-5033
References
1. Huang Z, Mo X and Lv C. Multi-modal motion predic-
tion with transformer-based neural network for autono-
mous driving. In: 2022 International conference on
robotics and automation (ICRA), Philadelphia, PA,
USA, 23–27 May 2022, pp.2605–2611. New York, NY:
IEEE.
2. Luo C, Sun L, Dabiri D, et al. Probabilistic multi-modal
trajectory prediction with lane attention for autonomous
vehicles. In: 2020 IEEE/RSJ international conference on
intelligent robots and systems (IROS), Las Vegas, NV,
USA, 24 October 2020–24 January 2021, pp.2370–2376.
New York, NY: IEEE.
3. Casas S, Gulino C, Suo S, et al. The importance of prior
knowledge in precise multimodal prediction. In: 2020
IEEE/RSJ international conference on intelligent robots
and systems (IROS), Las Vegas, NV, USA, 24 October
2020–24 January 2021, pp.2295–2302. New York, NY:
IEEE.
4. Huang Y, Du J, Yang Z, et al. A survey on trajectory-
prediction methods for autonomous driving. IEEE Trans
Intell Vehicles 2022; 7: 652–674.
5. Xiong L, Xia X, Lu Y, et al. IMU-based automated
vehicle body sideslip angle and attitude estimation aided
by GNSS using parallel adaptive Kalman filters. IEEE
Trans Vehicular Technol 2020; 69(10): 10668–10680.
6. Liu W, Xia X, Xiong L, et al. Automated vehicle sideslip
angle estimation considering signal measurement charac-
teristic. IEEE Sens J 2021; 21(19): 21675–21687.
7. Liu W, Xiong L, Xia X, et al. Vision-aided intelligent
vehicle sideslip angle estimation based on a dynamic
model. IET Intell Transp Syst 2020; 14(10): 1183–1189.
8. Zhenhai G. Soft sensor application in vehicle yaw rate
measurement based on Kalman filter and vehicle
dynamics. In: Proceedings of the 2003 IEEE international
conference on intelligent transportation systems, Shanghai,
China, 12–15 October 2003, pp.1352–1354. New York,
NY: IEEE.
9. Polychronopoulos A, Tsogas M, Amditis AJ, et al. Sen-
sor fusion for predicting vehicles’ path for collision
avoidance systems. IEEE Trans Intell Transp Syst 2007;
8(3): 549–562.
10. Mozaffari S, Al-Jarrah OY, Dianati M, et al. Deep
learning-based vehicle behavior prediction for autono-
mous driving applications: A review. IEEE Trans Intell
Transp Syst 2022; 23(1): 33–47.
11. Lefèvre S, Vasquez D and Laugier C. A survey on motion
prediction and risk assessment for intelligent vehicles.
Robomech J 2014; 1(1): 1–14.
12. Kim BD, Kang CM, Kim J, et al. Probabilistic vehicle
trajectory prediction over occupancy grid map via recur-
rent neural network. In: 2017 IEEE 20th international
conference on intelligent transportation systems (ITSC),
Yokohama, Japan, 16–19 October 2017, pp.399–404.
New York, NY: IEEE.
13. Khakzar M, Rakotonirainy A, Bond A, et al. A dual
learning model for vehicle trajectory prediction. IEEE
Access 2020; 8: 21897–21908.
14. Xie DF, Fang ZZ, Jia B, et al. A data-driven lane-chang-
ing model based on deep learning. Transp Res Part C
Emerg Technol 2019; 106: 41–60.
15. Lin L, Li W, Bi H, et al. Vehicle trajectory prediction
using LSTMs with spatial–temporal attention mechan-
isms. IEEE Intell Transp Syst Mag 2022; 14(2): 197–208.
16. Xiao H, Wang C, Li Z, et al. UB-LSTM: a trajectory pre-
diction method combined with vehicle behavior recogni-
tion. J Adv Transport 2020; 2020: 1–12.
17. Tang C and Salakhutdinov RR. Multiple futures predic-
tion. Adv Neural Inf Process Syst 2019; 32: 15398–15408.
18. Lee N, Choi W, Vernaza P, et al. Desire: distant future
prediction in dynamic scenes with interacting agents. In:
Proceedings of the IEEE conference on computer vision
and pattern recognition, Honolulu, HI, USA, 21–26 July
2017, pp.336–345. New York, NY: IEEE.
19. Yuan Y and Kitani K. Diverse trajectory forecasting
with determinantal point processes. arXiv preprint
arXiv:1907.04967, 2019.
20. Gupta A, Johnson J, Fei-Fei L, et al. Social GAN:
socially acceptable trajectories with generative adversarial
networks. In: Proceedings of the IEEE conference on com-
puter vision and pattern recognition, Salt Lake City, UT,
USA, 18–23 June 2018, pp.2255–2264. New York, NY:
IEEE.
21. Cui H, Radosavljevic V, Chou FC, et al. Multimodal tra-
jectory predictions for autonomous driving using deep
convolutional networks. In: 2019 International conference
on robotics and automation (ICRA), Montreal, QC,
Canada, 20–24 May 2019, pp.2090–2096.
22. Phan-Minh T, Grigore EC, Boulton FA, et al. CoverNet:
multimodal behavior prediction using trajectory sets. In:
Proceedings of the IEEE/CVF conference on computer
vision and pattern recognition, Seattle, WA, USA, 13–19
June 2020, pp.14074–14083. New York, NY: IEEE.
23. Rhinehart N, McAllister R, Kitani K, et al. PRECOG:
prediction conditioned on goals in visual multi-agent set-
tings. In: Proceedings of the IEEE/CVF international con-
ference on computer vision, Seoul, South Korea, 27
October–2 November 2019, pp.2821–2830. New York,
NY: IEEE.
24. Biktairov Y, Stebelev M, Rudenko I, et al. Prank: motion
prediction based on ranking. Adv Neural Inf Process Syst
2020; 33: 2553–2563.
25. Deo N and Trivedi MM. Convolutional social pooling
for vehicle trajectory prediction. In: Proceedings of the
IEEE conference on computer vision and pattern recogni-
tion workshops, Salt Lake City, UT, USA, 18–22 June
2018, pp.1468–1476. New York, NY: IEEE.
26. Deo N and Trivedi MM. Multi-modal trajectory predic-
tion of surrounding vehicles with maneuver based
LSTMs. In: 2018 IEEE intelligent vehicles symposium
(IV), Changshu, China, 26–30 June 2018, pp.1179–1184.
New York, NY: IEEE.
27. Deo N, Wolff E and Beijbom O. Multimodal trajectory
prediction conditioned on lane-graph traversals. In:
Proceedings of the 5th Conference on Robot Learning,
London, UK, 2022, pp.203–212.
28. Gers FA, Schmidhuber J and Cummins F. Learning to
forget: Continual prediction with LSTM. Neural Comput
2000; 12(10): 2451–2471.
29. Bishop CM. Mixture density networks. Technical report,
Aston University, 1994.
30. Krajewski R, Bock J, Kloeker L, et al. The highD data-
set: a drone dataset of naturalistic vehicle trajectories on
German highways for validation of highly automated
driving systems. In: 2018 21st international conference on
intelligent transportation systems (ITSC), Maui, HI,
2018, pp.2118–2125. New York, NY: IEEE.