Autonomous Vehicle Trajectory Prediction on Multi-Lane Highways Using Attention Based Model

1st Omveer Sharma
School of Electrical Sciences
Indian Institute of Technology
Bhubaneswar, India
os10@iitbbs.ac.in
2nd N. C. Sahoo
School of Electrical Sciences
Indian Institute of Technology
Bhubaneswar, India
ncsahoo@iitbbs.ac.in
3rd Niladri B. Puhan
School of Electrical Sciences
Indian Institute of Technology
Bhubaneswar, India
nbpuhan@iitbbs.ac.in
Abstract—The autonomous vehicle anticipates its own behaviour and future trajectory based on the expected trajectories of surrounding vehicles to prevent a potential collision in order to navigate through complex traffic scenarios safely and effectively. The estimated trajectories of surrounding vehicles (target vehicles) are also influenced by past trajectory and positions of its surroundings. In this study, a novel Transformer-based network is used to predict autonomous vehicle trajectory in highway driving. Transformer’s multi-head attention method is employed to capture social-temporal interaction between the target vehicle and its surroundings. The performance of the proposed model is compared with Recurrent Neural Network (RNN) based sequential models, using the NGSIM dataset. The results show that the proposed model predicts 5s long trajectory with 10% lower Root-Mean-Square Error (RMSE) than the RNN-based state-of-the-art model.
Index Terms—Trajectory prediction, Transformer, Intelligent vehicle, Sequential network, Autonomous Vehicle.
I. INTRODUCTION
Trajectory prediction is an indispensable task for an autonomous vehicle (AV) in complex driving scenarios. An AV plans its own trajectory after anticipating the future trajectories of surrounding vehicles, as does an advanced driving assistance system (ADAS) [1–4]. As shown in Fig. 1, the AV (black) is surrounded by target vehicles (red) whose trajectories must be predicted. Further, a target vehicle’s previous track and the positions of its surrounding vehicles affect its future trajectory. The phrase ‘surrounding vehicles’ is used throughout the rest of the article to refer to the target vehicle’s immediate neighbouring vehicles.
The trajectory predictors can be split into two major categories: model-based and data-driven approaches. A model-based prediction strategy forecasts a vehicle’s trajectory using a kinematic or statistical model. Trajectory estimation has been done using model-based techniques such as the constant acceleration (CA) model, constant velocity (CV) model, constant yaw rate and acceleration motion model, Kalman filter, hidden Markov model (HMM), and Gaussian mixture model (GMM) [5–9]. Most model-based techniques can only predict short-term trajectories. These models can produce prediction errors if there is even a small difference between the driver’s actual and predicted behaviour. The limitations of model-based prediction techniques have been largely overcome by data-driven prediction techniques.
Fig. 1: An autonomous vehicle, positioned in the center, is predicting the future paths of nearby vehicles.
979-8-3503-1997-2/23/$31.00 ©2023 IEEE
For trajectory prediction, several RNN-based models have been proposed [10–13]. Gated Recurrent Units (GRUs) were incorporated into a generative adversarial imitation learning trajectory prediction model in [11]. To incorporate information from several agents, the Social Generative Adversarial Networks (S-GAN) model in [14] uses both a GAN and a recurrent sequence-to-sequence model. The research mentioned above successfully captures temporal correlation in terms of individual motion states; however, it is equally crucial to consider the spatial relationships between vehicles, because these relationships significantly influence the trajectory of the target vehicle.
In [15], the spatiotemporal attention long short-term memory (STA-LSTM) framework is proposed to predict the target vehicle’s future trajectory. The model, however, only works well when forecasting 1 s long future trajectories. Attention-based LSTM encoder-decoder (LSTM-ED) frameworks, which effectively correlate the time and space dimensions, are introduced in [16–18]. Their primary purpose is to generate a spatial-temporal navigation map. Instead of establishing vehicle spatial relationships, the authors of [19–21] estimated the target vehicle’s future behaviour from past traffic participant tracks to predict the future trajectory. Shi et al. [20] introduced a novel attention network that combines a temporal convolution neural network (TCN) and a bi-directional long short-term memory (Bi-LSTM) network. This network aims to accurately predict lane keep (LK) and lane change (LC) behavior, along with future trajectories.
2023 IEEE 3rd International Conference on Sustainable Energy and Future Electric Transportation (SEFET) | 979-8-3503-1997-2/23/$31.00 ©2023 IEEE | DOI: 10.1109/SEFET57834.2023.10245038
Authorized licensed use limited to: University Haifa. Downloaded on December 24,2023 at 07:41:44 UTC from IEEE Xplore. Restrictions apply.
Occupancy grids are utilized in most spatial-temporal attention-based frameworks [22, 23]. Individual sequential models (encoders) establish temporal correlation and extract temporal-dependent sequences to provide spatiotemporal attention. Lastly, a sequential model-based decoder predicts the target vehicle trajectory utilising the spatial-temporal environment. Complex networks with many encoders make trajectory prediction slow, and RNN-based encoders propagate input data sequentially, limiting parallel operation. Gradient vanishing in RNN-based models reduces trajectory prediction accuracy and causes instability over extended sequences. A Transformer-based encoder-decoder architecture addresses the gradient vanishing and computation time issues by accepting the complete input sequence at once, unlike RNN-based models. Transformer (TF) models are popular for sequence-to-sequence learning problems [24–28]. However, vehicle trajectory prediction on multi-lane highways has not been investigated using TF-based modelling.
In order to forecast the future trajectory utilising a short segment of tracking data, this work introduces a novel pure attention-based spatial-temporal attention framework (STA-TF). The main technical contributions are summarized below:
1) Multi-head attention mechanisms are employed to assess the influence of surrounding vehicle trajectories on the target vehicle.
2) The proposed model leverages the advantages of TF to achieve faster performance compared to existing sequential networks such as LSTM, GRU, and Bi-LSTM.
3) The Transformer model is adapted and customised to address the vehicle trajectory prediction problem.
4) A real-world NGSIM dataset is utilized to evaluate the potential of the proposed model in predicting vehicle trajectories in highway driving scenarios. The results demonstrate that the proposed model outperforms existing RNN-based models.
The structure of this paper is as follows: Section II introduces
the formulation of the trajectory prediction task. The network
architecture of the proposed model is explained in Section
III. Section IV presents the experimental results, and finally,
Section V concludes the work.
II. PROBLEM FORMULATION
The proposed model’s goal is to forecast the target vehicle’s future trajectory using the most recent tracks of the target and its surrounding vehicles as of the observation time ($t_{obs}$). While driving on a highway, drivers exhibit various driving styles, each of which significantly influences the future trajectory of the vehicle. This diversity of driving styles adds complexity to the task of accurately predicting the vehicle’s trajectory.
Fig. 2: Target and surrounding vehicles in modified frame of
reference.
A. Inputs and outputs
The proposed model’s input consists of the previous tracks of the target (T) and its six immediate neighbours [19]: the three preceding vehicles in the left, current, and right lanes (PLL, PCL, and PRL) and the three following vehicles (FLL, FCL, and FRL), as shown in Fig. 2. The past trajectory of a surrounding vehicle $i \in \{PLL, FLL, PCL, FCL, PRL, FRL\}$ is defined as $S^i = [X^i_{t_{obs}-L_{in}+1}, X^i_{t_{obs}-L_{in}+2}, \ldots, X^i_{t_{obs}}]$, where $L_{in}$ is the input sequence length and $X^i_t = [x^i_t, y^i_t]$ is the positional vector. The target vehicle’s historical track is specified as $S^T = [X^T_{t_{obs}-L_{in}+1}, X^T_{t_{obs}-L_{in}+2}, \ldots, X^T_{t_{obs}}]$, where $X^T_t = [x^T_t, y^T_t, v^T_t, \alpha^T_t, class]$ is the feature vector, $v^T_t$ is the target vehicle’s velocity, $\alpha^T_t$ is the target vehicle’s acceleration, and $class$ is the type of target vehicle (bike, car or truck). These past trajectories of the target and its surrounding vehicles are fed into the proposed model as input. The model predicts the target vehicle’s future trajectory (positional feature vectors). This can be stated as follows:
$$O^T = [Y^T_{t_{obs}+1}, Y^T_{t_{obs}+2}, \ldots, Y^T_{t_{obs}+L_{out}}] \tag{1}$$
where $Y^T_t = [x^T_t, y^T_t]$ are the predicted future coordinates for the target vehicle and $L_{out}$ is the output sequence length.
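As an illustration of the input and output shapes defined above, the following minimal sketch assembles one sample with NumPy. The values `L_IN = 15` and `L_OUT = 25` are assumptions derived from the 3 s history and 5 s horizon at 5 Hz described in Section IV-A, and the helper name `build_sample` is hypothetical, not part of the authors' code.

```python
import numpy as np

L_IN, L_OUT = 15, 25   # assumed: 3 s history and 5 s horizon sampled at 5 Hz
NEIGHBOURS = ["PLL", "FLL", "PCL", "FCL", "PRL", "FRL"]

def build_sample(target_track, neighbour_tracks):
    """Assemble S^T and the six S^i for one sample (hypothetical helper).

    target_track: array (L_IN, 5) with columns x, y, v, a, class.
    neighbour_tracks: dict name -> array (L_IN, 2) with columns x, y;
        a neighbour that does not exist is simply left out of the dict.
    """
    s_t = np.asarray(target_track, dtype=float)
    assert s_t.shape == (L_IN, 5)
    s_i = {}
    for name in NEIGHBOURS:
        track = neighbour_tracks.get(name)
        # An absent neighbour is represented by zero position vectors.
        s_i[name] = (np.zeros((L_IN, 2)) if track is None
                     else np.asarray(track, dtype=float))
        assert s_i[name].shape == (L_IN, 2)
    return s_t, s_i

rng = np.random.default_rng(0)
s_t, s_i = build_sample(rng.normal(size=(L_IN, 5)),
                        {"PCL": rng.normal(size=(L_IN, 2))})
o_t = np.zeros((L_OUT, 2))   # placeholder with the shape of the predicted O^T
```

Here only the preceding vehicle in the current lane (PCL) is present, so the other five neighbour tracks are filled with zeros.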
B. Frame of reference
At the observation time tobs, a stationary frame of reference
is established with the target vehicle serving as the origin. In
this frame, the y-axis represents forward motion, while the x-
axis is perpendicular to it, as illustrated in Fig. 2. Because of
this technique, the proposed model is unaffected by vehicle
track generation [19].
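The change of reference frame can be sketched as a simple translation. This is a minimal illustration that assumes the dataset's longitudinal axis is already aligned with the direction of travel, so no rotation is applied; the function name `to_target_frame` and the sample coordinates are hypothetical.

```python
import numpy as np

def to_target_frame(track_xy, target_xy_at_tobs):
    """Shift positions so the target's position at t_obs becomes the origin."""
    return (np.asarray(track_xy, dtype=float)
            - np.asarray(target_xy_at_tobs, dtype=float))

# Columns are (x, y): x is lateral, y is the forward direction. Last row is t_obs.
target = np.array([[10.0, 100.0], [10.0, 106.0], [10.1, 112.0]])
neighbour = np.array([[13.7, 120.0], [13.7, 125.0], [13.7, 130.0]])

origin = target[-1]                       # frame is fixed at t_obs
target_local = to_target_frame(target, origin)
neighbour_local = to_target_frame(neighbour, origin)
```

After the shift, the target sits at the origin at the observation time and the neighbour appears about 3.6 m to the side and 18 m ahead.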
III. NETWORK ARCHITECTURE
Encoder-decoder Transformer models can overcome the difficulties of RNN-based models described in the introduction. The proposed model (STA-TF) overcomes gradient vanishing constraints by processing the entire input sequence with a TF encoder layer. The TF encoder prioritises input segments by using the multi-head attention (MHA) mechanism. In order to make precise assumptions about vehicle motions, it is crucial to have a thorough understanding of the interactions and relationships between traffic participants on the road. Therefore, the proposed model architecture is divided into three key components: (1) Spatial attention network, (2) Encoder, and (3) Decoder. These components are illustrated in Fig. 3. MHA is employed in each of the three components to meet its specific requirements.
Fig. 3: Proposed model architecture
A. Multi-head Attention (MHA) Mechanism
Attention correlates the different positions of a sequence in order to compute a hidden representation of the sequence. The query, key, and value concepts from information retrieval are used to do this. In particular, attention produces a weighted sum of all values, where the keys and the queries decide the weights. As an illustration, consider $Q \in \mathbb{R}^{L_{in} \times d_q}$ to be the query matrix composed of the $d_q$-dimensional query vectors corresponding to the positions in the $L_{in}$-length sequence. Similarly, $K \in \mathbb{R}^{L_{in} \times d_k}$ and $V \in \mathbb{R}^{L_{in} \times d_v}$ represent the key-value pair for the positions in the sequence, where the query, key, and value vector dimensions match ($d_q = d_k = d_v$). The attention weights, denoted as $W^A = \mathrm{softmax}(QK^T/\sqrt{d_k})$, are computed using the matrices $Q$ and $K$. Subsequently, the scaled dot-product attention can be calculated using Eq. 2. In [24], the authors propose linearly projecting the matrices $Q$, $K$, and $V$ multiple times ($h$ times, referred to as heads) and computing the scaled dot-product attention for each head in parallel, a technique known as “Multi-head attention” (MHA). This approach allows the model to jointly attend to multiple representation subspaces. Finally, Eq. 3 is employed to combine the outputs from the various heads.
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V \tag{2}$$
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(head_1, \ldots, head_h)W^o \tag{3}$$
Fig. 4: Multi-head attention mechanism (MHA) [24].
Fig. 4 shows the scaled dot-product attention and MHA procedure. In MHA, the dimension of the key vector $d_k = d_{model}/h$ is calculated from the dimension of the model ($d_{model} = 128$) and the number of heads ($h = 8$), giving $d_k = 16$. It should be noted that the attention weight matrix has a dimension of $L_{in} \times L_{in}$. In this weight matrix, element $W^A_{fg}$ indicates the attention between the $f$th position of $Q$ (feature vector of matrix $Q$ at time instant $f$) and the $g$th position of $K$ (feature vector of matrix $K$ at time instant $g$). Thus, a weighted correlation between all time instants (all positions of $Q$ and $K$) can be inferred from this attention weight matrix.
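Eqs. 2 and 3 can be sketched in a few lines of NumPy. This is an illustrative implementation of the standard mechanism from [24] with randomly initialised projection matrices, not the authors' trained model; $L_{in} = 15$ is an assumption taken from the 3 s history at 5 Hz.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)      # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    """Eq. 2: softmax(Q K^T / sqrt(d_k)) V; also returns the weights W^A."""
    d_k = k.shape[-1]
    weights = softmax(q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k))
    return weights @ v, weights

def multi_head_attention(x_q, x_kv, params, h):
    """Eq. 3: project, split into h heads, attend, concatenate, project."""
    w_q, w_k, w_v, w_o = params

    def split(x):                                # (L, d_model) -> (h, L, d_k)
        length, d_model = x.shape
        return x.reshape(length, h, d_model // h).transpose(1, 0, 2)

    q, k, v = split(x_q @ w_q), split(x_kv @ w_k), split(x_kv @ w_v)
    heads, weights = scaled_dot_product_attention(q, k, v)   # (h, L_q, d_k)
    concat = heads.transpose(1, 0, 2).reshape(x_q.shape[0], -1)
    return concat @ w_o, weights

rng = np.random.default_rng(0)
L_in, d_model, h = 15, 128, 8                    # d_k = d_model / h = 16
params = [rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))
          for _ in range(4)]                     # stand-ins for trained weights
x = rng.normal(size=(L_in, d_model))
out, w_a = multi_head_attention(x, x, params, h)
```

Each of the eight heads yields its own $L_{in} \times L_{in}$ weight matrix, and every row of a weight matrix sums to one because of the softmax.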
B. Spatial attention network
A spatial attention network is used to establish the relationship of the six surrounding vehicles with the target vehicle. In this network, the positional trajectories of the seven traffic participants (the target and six surrounding vehicles, $S^T$ and $S^i$) are used as input and fed to individual feed-forward layers. In the absence of any surrounding vehicle, the position of that vehicle is represented as a two-dimensional zero vector. For the six surrounding vehicles, six MHA layers are used to establish a correlation between each surrounding vehicle's trajectory and the past target vehicle trajectory. The outputs of the six feed-forward layers ($O^i_{S,FF_1} \in \mathbb{R}^{L_{in} \times d_{model}}$) serve as the $K$ and $V$ matrices, and the feed-forward output for the target vehicle trajectory ($O^T_{S,FF_1} \in \mathbb{R}^{L_{in} \times d_{model}}$) serves as matrix $Q$ for the six MHA layers. The outputs of the MHA layers $O^i_{S,MHA} \in \mathbb{R}^{L_{in} \times d_{model}}$ (representing the correlation of the $i$th surrounding vehicle with the target vehicle) and $O^T_{S,FF_1}$ are concatenated and passed through another feed-forward layer to adjust the dimension of the output $O_{S,FF_2} \in \mathbb{R}^{L_{in} \times d_{model}}$.
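The data flow of the spatial attention network can be sketched as follows. For brevity this sketch uses single-head attention where the paper uses MHA, and random matrices stand in for trained feed-forward layers; the helper names `dense`, `attention`, and `spatial_attention` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
L_in, d_model, n_neigh = 15, 128, 6

def dense(d_in, d_out):
    """Random weights standing in for a trained feed-forward layer."""
    w = rng.normal(scale=d_in ** -0.5, size=(d_in, d_out))
    return lambda x: x @ w

def attention(q, k, v):
    """Single-head scaled dot-product attention (simplification of MHA)."""
    s = q @ k.T / np.sqrt(k.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ v

ff_target = dense(2, d_model)
ff_neigh = [dense(2, d_model) for _ in range(n_neigh)]
ff_out = dense((n_neigh + 1) * d_model, d_model)

def spatial_attention(target_xy, neighbours_xy):
    o_t = ff_target(target_xy)                      # O^T_{S,FF1}: query
    parts = [o_t]
    for ff, s_i in zip(ff_neigh, neighbours_xy):
        o_i = ff(s_i)                               # O^i_{S,FF1}: key and value
        parts.append(attention(o_t, o_i, o_i))      # O^i_{S,MHA}
    # Concatenate the target features with the six attention outputs,
    # then project back to d_model to obtain O_{S,FF2}.
    return ff_out(np.concatenate(parts, axis=-1))

target_xy = rng.normal(size=(L_in, 2))
neighbours_xy = [rng.normal(size=(L_in, 2)) for _ in range(n_neigh)]
neighbours_xy[3] = np.zeros((L_in, 2))              # an absent neighbour
o_s_ff2 = spatial_attention(target_xy, neighbours_xy)
```

An absent neighbour contributes an all-zero attention output in this sketch, so it does not disturb the concatenated features.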
C. Encoder layer
The encoder of the proposed model determines the temporal correlation in its input sequence (the output of the spatial attention network). The encoder layer of the vanilla Transformer network is adopted to perform this task [24]. To incorporate temporal information and leverage sequential correlations between time steps, the encoder receives the output ($O_{S,FF_2}$) from the spatial attention network. This output is then passed through a positional encoding layer, which utilizes both sine and cosine functions. The resulting output of the positional encoding layer ($O^{Encoder}_{pos} \in \mathbb{R}^{L_{in} \times d_{model}}$) is calculated as follows:
$$O^{Encoder}_{pos} = O_{S,FF_2} + PE \tag{4}$$
where the positional encoding coefficient matrix is represented by $PE$. After positional encoding, the output is passed to the encoder layer of the TF. The encoder layer is composed of an MHA layer and a fully connected feed-forward network (FFN). The $Q$, $K$, and $V$ inputs for the MHA layer are provided by $O^{Encoder}_{pos}$. The output of the MHA layer ($O^{Encoder}_{MHA} \in \mathbb{R}^{L_{in} \times d_{model}}$) is calculated using Eqs. 2-3. As mentioned earlier, the attention weights allow the MHA layer to establish the hidden temporal relationship in its input sequence. The output of the MHA layer is processed through Add & Normalization (Norm) as follows:
$$O^{Encoder}_{add\&norm_1} = \mathrm{Norm}(O^{Encoder}_{MHA} + O^{Encoder}_{pos}) \tag{5}$$
This output $O^{Encoder}_{add\&norm_1} \in \mathbb{R}^{L_{in} \times d_{model}}$ goes to the FFN, which applies the following transformation at each position:
$$O^{Encoder}_{FFN} = \sigma(O^{Encoder}_{add\&norm_1} W_1 + B_1) W_2 + B_2 \tag{6}$$
The output of the FFN, $O^{Encoder}_{FFN} \in \mathbb{R}^{L_{in} \times d_{model}}$, is processed through another Add & Normalization (Norm) as follows:
$$O^{Encoder}_{add\&norm_2} = \mathrm{Norm}(O^{Encoder}_{FFN} + O^{Encoder}_{add\&norm_1}) \tag{7}$$
The decoder module of the proposed model then receives the output of the encoder ($O^{Encoder} = O^{Encoder}_{add\&norm_2} \in \mathbb{R}^{L_{in} \times d_{model}}$).
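Eqs. 4 to 7 can be traced step by step with the sketch below. Several details are assumptions made for illustration: the sinusoidal encoding follows the vanilla Transformer [24], $\sigma$ is taken to be ReLU, the normalization has no learnable gain or bias, the hidden width `d_ff = 256` is arbitrary, and a single-head self-attention stands in for the MHA layer.

```python
import numpy as np

def positional_encoding(length, d_model):
    """Sinusoidal PE: sine on even columns, cosine on odd columns."""
    pos = np.arange(length)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((length, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

def layer_norm(x, eps=1e-6):
    """Normalize each position to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def self_attention(x):
    """Single-head stand-in for the MHA layer of Eqs. 2-3."""
    s = x @ x.T / np.sqrt(x.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ x

def encoder_layer(o_s_ff2, w1, b1, w2, b2):
    o_pos = o_s_ff2 + positional_encoding(*o_s_ff2.shape)     # Eq. 4
    o_an1 = layer_norm(self_attention(o_pos) + o_pos)         # Eq. 5
    o_ffn = np.maximum(o_an1 @ w1 + b1, 0.0) @ w2 + b2        # Eq. 6 (ReLU assumed)
    return layer_norm(o_ffn + o_an1)                          # Eq. 7

rng = np.random.default_rng(0)
L_in, d_model, d_ff = 15, 128, 256
w1, b1 = rng.normal(scale=0.05, size=(d_model, d_ff)), np.zeros(d_ff)
w2, b2 = rng.normal(scale=0.05, size=(d_ff, d_model)), np.zeros(d_model)
o_encoder = encoder_layer(rng.normal(size=(L_in, d_model)), w1, b1, w2, b2)
```

The residual additions in Eqs. 5 and 7 require every intermediate output to keep the shape $L_{in} \times d_{model}$, which is why the FFN projects back from `d_ff` to `d_model`.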
D. Decoder layer
Two distinct inter-dependencies are integrated into the decoder layer: the first, known as self-attention, is between decoder input and decoder output, whereas the second, known as encoder-decoder attention, is between encoder output and decoder output. The target vehicle’s predicted future trajectory is right-shifted ($\langle \tilde{Y}^T_{t_{obs}+1}, \tilde{Y}^T_{t_{obs}+2}, \ldots, \tilde{Y}^T_{t_{obs}+k} \rangle$ at time $t = t_{obs}+k+1$), combined with the encoder output ($O^{Encoder}$), and used as the decoder’s input to predict the future trajectory ($O^{Decoder}_{t_{obs}+1}, O^{Decoder}_{t_{obs}+2}, \ldots, O^{Decoder}_{t_{obs}+k}, O^{Decoder}_{t_{obs}+k+1}$), as shown in Fig. 3. The performance of the decoder is enhanced by the autoregressive (feedback) technique, which uses the previously predicted trajectory as input. A start-of-sequence (sos) input (a zero vector of size 2) is supplied to the decoder to start processing, since there is no predicted trajectory at time $t_{obs}$ (or $k = 0$).
Similar to the encoder, the decoder’s input (right-shifted decoder output) features are converted to a high-dimensional space using a fully connected layer before being passed through a positional encoding layer in accordance with Eq. 4. The first MHA layer for self-attention receives the output of the positional encoding layer as its input, and the output ($O^{Decoder}_{MHA_1} \in \mathbb{R}^{L_{out} \times d_{model}}$) is calculated using Eqs. 2-3. The matrices $Q$, $K$, and $V$ are constructed using only the decoder input; hence, the first MHA layer can draw out self-attention from this decoder input. In the second MHA layer, matrix $Q$ comes from the first MHA layer (a hidden correlation in the decoder input sequence) and the $K$ and $V$ pair comes from the encoder output ($O^{Encoder}$); thus, attention is computed between these two input signals (encoder output and decoder input). The output of the second MHA layer ($O^{Decoder}_{MHA_2} \in \mathbb{R}^{L_{out} \times d_{model}}$) is fed to the Add & Normalization and FFN of the decoder, and the output of the decoder ($O^{Decoder} \in \mathbb{R}^{L_{out} \times 2}$) is calculated as per Eqs. 5-7.
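The feedback loop of the decoder at inference time can be sketched as below. The function `toy_decoder_step` is a hypothetical stand-in for the trained decoder (it ignores the encoder output and simply moves each input position 1 m forward); only the loop structure, starting from a zero sos vector and feeding predictions back, reflects the description above.

```python
import numpy as np

def autoregressive_decode(decoder_step, o_encoder, l_out):
    """Generate l_out positions one at a time, feeding predictions back."""
    decoder_input = np.zeros((1, 2))           # sos: a zero vector of size 2
    for _ in range(l_out):
        # The decoder sees the right-shifted outputs produced so far and
        # returns one prediction per input position; only the last is new.
        next_xy = decoder_step(decoder_input, o_encoder)[-1:]
        decoder_input = np.vstack([decoder_input, next_xy])
    return decoder_input[1:]                   # drop the sos entry

def toy_decoder_step(decoder_input, o_encoder):
    """Hypothetical stand-in for the trained decoder."""
    return decoder_input + np.array([0.0, 1.0])

o_encoder = np.zeros((15, 128))                # placeholder encoder output
trajectory = autoregressive_decode(toy_decoder_step, o_encoder, l_out=25)
```

With the toy step, the loop yields 25 positions whose forward coordinate grows from 1 m to 25 m, which makes it easy to check that each prediction is built on the previous one.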
The proposed model is trained with a batch size of 64 and a learning rate that is decreased iteratively from 0.00001 to 0.000001. A modified teacher forcing scheme is used: the model is trained with full teacher forcing for the first 10 epochs, after which the teacher forcing factor is incrementally decreased until it equals zero. The model parameters are trained using the Root-Mean-Square Error (RMSE) as the loss function. The experiments are run on a desktop computer with an Intel(R) Xeon(R) Processor E5-2643 V4.
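One possible reading of this training schedule is sketched below. The paper states only the endpoints, so the per-epoch decrement `decay = 0.1`, the geometric shape of the learning-rate decay, and all function names are assumptions made for illustration.

```python
import random

def teacher_forcing_ratio(epoch, full_epochs=10, decay=0.1):
    """1.0 for the first `full_epochs` epochs, then lowered by `decay`
    per epoch until it reaches 0 (the decrement size is an assumption)."""
    if epoch < full_epochs:
        return 1.0
    return max(0.0, 1.0 - decay * (epoch - full_epochs + 1))

def learning_rate(epoch, n_epochs, lr_start=1e-5, lr_end=1e-6):
    """Geometric decay from lr_start to lr_end (assumed decay shape)."""
    if n_epochs <= 1:
        return lr_start
    return lr_start * (lr_end / lr_start) ** (epoch / (n_epochs - 1))

def choose_decoder_input(ground_truth_xy, predicted_xy, ratio, rng=random):
    """Feed the ground truth with probability `ratio`, otherwise the
    model's own previous prediction."""
    return ground_truth_xy if rng.random() < ratio else predicted_xy

schedule = [teacher_forcing_ratio(epoch) for epoch in range(25)]
```

With these assumed settings the ratio stays at 1.0 for epochs 0 to 9, falls linearly over the next ten epochs, and is 0 from epoch 19 onwards, at which point the decoder is trained purely on its own predictions.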
IV. RESULTS AND DISCUSSION
This section evaluates the proposed model and compares it with baseline models; the results show that it outperforms state-of-the-art models in trajectory prediction.
A. Dataset and its preprocessing
The evaluation of the proposed model is conducted using the well-known NGSIM dataset, which comprises data from the southbound US 101 highway in Los Angeles and the Interstate 80 highway section in Emeryville, California. This particular highway segment consists of six lanes, including five motorway lanes and one auxiliary lane, spanning from the on-ramp to the off-ramp [29]. Only lane keep and discretionary lane changing (DLC) trajectories of vehicles travelling in lanes 2, 3, 4, and 5 are taken into consideration. To decrease the complexity and computation time of the proposed model, all trajectories are down-sampled from a rate of 10 Hz to 5 Hz. Each trajectory is broken into segments that last 8 s. The first 3 s of each segment form the vehicle’s historical track, and the subsequent 5 s are predicted by the proposed model. The proposed model is trained using 80% of the trajectory segments, and the remaining 20% is dedicated to testing its performance.
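The preprocessing steps above can be sketched as follows. The paper does not state whether the 8 s segments overlap or how the 80/20 split is drawn, so non-overlapping windows and a random split are assumptions here; the function names and the synthetic track are hypothetical.

```python
import numpy as np

def make_segments(track_10hz, hist_s=3, pred_s=5, hz_in=10, hz_out=5):
    """Down-sample an (N, F) track and cut it into non-overlapping 8 s
    segments, each split into a 3 s history and a 5 s future."""
    step = hz_in // hz_out
    track = np.asarray(track_10hz)[::step]              # 10 Hz -> 5 Hz
    l_in, l_out = hist_s * hz_out, pred_s * hz_out      # 15 and 25 steps
    seg_len = l_in + l_out
    n_seg = len(track) // seg_len
    segs = track[: n_seg * seg_len].reshape(n_seg, seg_len, -1)
    return segs[:, :l_in], segs[:, l_in:]               # history, future

def train_test_split(n, train_frac=0.8, seed=0):
    """Random split of n segment indices (split strategy is an assumption)."""
    idx = np.random.default_rng(seed).permutation(n)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]

# 200 s of synthetic (x, y) samples at 10 Hz, for illustration only.
track = np.arange(2000 * 2, dtype=float).reshape(2000, 2)
history, future = make_segments(track)
train_idx, test_idx = train_test_split(len(history))
```

At 5 Hz an 8 s segment has 40 samples, so the 3 s history covers 15 time steps and the 5 s prediction target covers 25.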
B. Evaluation metric
The effectiveness of the proposed model is evaluated using metrics such as RMSE, Mean-Absolute Error (MAE), and Mean-Square Error (MSE), represented mathematically by Eqs. 8 to 10.
$$RMSE = \sqrt{\frac{1}{t_{pred}} \sum_{t=t_{obs}+1}^{t_{obs}+t_{pred}} \left(\tilde{Y}^T_t - Y^T_t\right)^2} \tag{8}$$
$$MAE = \frac{1}{t_{pred}} \sum_{t=t_{obs}+1}^{t_{obs}+t_{pred}} \left|\tilde{Y}^T_t - Y^T_t\right| \tag{9}$$
$$MSE = \frac{1}{t_{pred}} \sum_{t=t_{obs}+1}^{t_{obs}+t_{pred}} \left(\tilde{Y}^T_t - Y^T_t\right)^2 \tag{10}$$
where $\tilde{Y}^T_t$ and $Y^T_t$ represent the projected position and actual location of the target vehicle (T) at the prediction timestamp ($t$), respectively (here 5 timestamps for 5s).
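The three metrics can be computed as in the sketch below. It assumes that the difference between predicted and actual positions is measured as the Euclidean distance between the two 2-D points, which the equations do not state explicitly; the sample values are made up for illustration.

```python
import numpy as np

def trajectory_errors(pred, true):
    """pred, true: arrays (t_pred, 2) of x, y positions in metres."""
    # Per-timestamp displacement between predicted and actual positions
    # (Euclidean distance is an assumption about the intended norm).
    d = np.linalg.norm(np.asarray(pred) - np.asarray(true), axis=-1)
    mse = float(np.mean(d ** 2))                       # Eq. 10
    return {"RMSE": float(np.sqrt(mse)),               # Eq. 8
            "MAE": float(np.mean(d)),                  # Eq. 9
            "MSE": mse}

true = np.array([[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]])
pred = np.array([[0.0, 1.0], [0.3, 2.4], [0.0, 4.0]])
errors = trajectory_errors(pred, true)
```

The displacements in this example are 0 m, 0.5 m, and 1 m, so the MAE is 0.5 m while the RMSE is larger because squaring weights the 1 m error more heavily. Note also that the MSE carries units of squared metres.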
C. Proposed model performance
Table I demonstrates the effectiveness of the proposed model for prediction horizons up to 5 s, along with the worst 5% and 1% of prediction errors. The proposed model achieves RMSE values of 0.46 m, 0.80 m, 1.19 m, 1.67 m, and 2.24 m for trajectory predictions at 1 s, 2 s, 3 s, 4 s, and 5 s, respectively. Similarly, the model achieves MSE values of 0.25 m², 0.72 m², 1.58 m², 3.10 m², and 5.61 m², and MAE values of 0.38 m, 0.63 m, 0.91 m, 1.23 m, and 1.61 m, for the same horizons. The proposed model’s robustness is indicated by its errors in the worst-case scenarios: even for the worst 1% of predictions, the RMSE at 5 s is 8.99 m. The longitudinal and lateral RMSE of the proposed model are also shown in Fig. 5.
TABLE I: RMSE, MSE and MAE of the proposed model's predictions.
Evaluation metric       Subset      1s      2s      3s      4s      5s
RMSE (m)                All         0.46    0.80    1.19    1.67    2.24
                        Worst 5%    0.80    1.78    3.07    4.62    6.32
                        Worst 1%    1.19    2.68    4.55    6.68    8.99
MSE (m²)                All         0.25    0.72    1.58    3.10    5.61
                        Worst 5%    1.38    4.59    11.34   23.65   43.15
                        Worst 1%    3.22    10.28   24.29   48.55   85.61
MAE (m)                 All         0.38    0.63    0.91    1.23    1.61
                        Worst 5%    0.88    1.70    2.76    4.02    5.40
                        Worst 1%    1.31    2.55    4.10    5.85    7.76
Lateral error (m)       All         0.11    0.16    0.20    0.23    0.26
                        Worst 5%    0.11    0.16    0.21    0.25    0.29
                        Worst 1%    0.15    0.23    0.30    0.35    0.40
Longitudinal error (m)  All         0.45    0.78    1.18    1.65    2.23
                        Worst 5%    0.78    1.77    3.06    4.61    6.32
                        Worst 1%    1.17    2.65    4.53    6.68    8.99
Fig. 5: Lateral, longitudinal and total error (RMSE) of proposed model.
D. Comparative Analysis
In this section, the performance of the proposed trajectory prediction model is evaluated and compared against state-of-the-art models. To demonstrate the model’s effectiveness, the same experimental settings and evaluation metrics as in [16, 21] are adopted. The following models are trained and evaluated on the NGSIM dataset under the same experimental conditions for comparison with the proposed model.
• Vanilla LSTM (V-LSTM) [10]: The LSTM network takes the raw input trajectories of the target and surrounding vehicles as inputs. By employing an LSTM model, future trajectory predictions are made as point estimates.
• Bi-LSTM [13]: The bi-directional LSTM network receives the raw input trajectories as its input.
• TCN [20]: The TCN network receives the raw input trajectories as its input.
• LSTM-ED [18]: An LSTM-based encoder-decoder model is used, in which the raw input trajectories are fed to the LSTM-based encoder and the decoder predicts the future trajectory.
• TCN-LSTM: An encoder-decoder network is constructed with a TCN-based encoder and an LSTM-based decoder.
• STA-LSTM [16]: A spatio-temporal attention-based LSTM encoder-decoder network is used for vehicle trajectory prediction.
• MHA-LSTM [22]: An LSTM-based encoder-decoder network that employs spatial attention to capture the interaction between vehicles.
• V-TF: The spatial attention network is excluded from the proposed model, and therefore, only the raw input trajectories of the target and surrounding vehicles are directly fed into the vanilla Transformer network (V-TF).
TABLE II: Comparative performance of the proposed model against state-of-the-art models. The proposed model attains the best (or tied-best) result at every prediction horizon.
Evaluation metric   Model                 1s      2s      3s      4s      5s
RMSE (m)            V-LSTM [10]           0.90    1.29    1.71    2.20    2.74
                    Bi-LSTM [13]          0.77    1.09    1.51    2.03    2.63
                    TCN [20]              0.71    1.09    1.52    2.01    2.55
                    LSTM-ED [18]          0.53    0.91    1.36    1.90    2.56
                    TCN-LSTM              0.56    0.91    1.34    1.86    2.49
                    STA-LSTM [16]         0.49    0.86    1.31    1.85    2.51
                    MHA-LSTM [22]         0.48    0.85    1.28    1.81    2.47
                    V-TF                  0.47    0.82    1.24    1.73    2.35
                    Proposed (STA-TF)     0.46    0.80    1.19    1.67    2.24
MSE (m²)            V-LSTM [10]           0.93    1.81    3.15    5.15    8.00
                    Bi-LSTM [13]          0.66    1.29    2.45    4.41    7.38
                    TCN [20]              0.58    1.33    2.59    4.50    7.26
                    LSTM-ED [18]          0.33    0.92    2.03    3.99    7.26
                    TCN-LSTM              0.36    0.95    2.03    3.88    6.94
                    STA-LSTM [16]         0.30    0.84    1.90    3.80    6.94
                    MHA-LSTM [22]         0.27    0.79    1.78    3.54    6.59
                    V-TF                  0.25    0.76    1.70    3.37    6.21
                    Proposed (STA-TF)     0.25    0.72    1.58    3.10    5.61
MAE (m)             V-LSTM [10]           0.72    1.00    1.31    1.64    2.01
                    Bi-LSTM [13]          0.64    0.88    1.17    1.51    1.90
                    TCN [20]              0.61    0.89    1.19    1.53    1.90
                    LSTM-ED [18]          0.47    0.74    1.05    1.42    1.84
                    TCN-LSTM              0.49    0.75    1.05    1.39    1.80
                    STA-LSTM [16]         0.46    0.70    1.02    1.38    1.78
                    MHA-LSTM [22]         0.38    0.64    0.94    1.30    1.72
                    V-TF                  0.39    0.64    0.94    1.28    1.68
                    Proposed (STA-TF)     0.38    0.63    0.91    1.23    1.61
The quantitative experimental results are summarised in Table II and shown in Fig. 6. The results show that the spatial attention network, by establishing the hidden relationship between traffic participants, yields a small trajectory prediction error. The proposed method predicts the future trajectory with a 2.24 m RMSE for a 5 s long prediction horizon, which is about 10% less than the state-of-the-art models [16, 22]. In congested traffic, where several vehicles occupy the same drivable area, trajectory prediction must account for surrounding vehicle correlation. Irrespective of the prediction horizon, the proposed model demonstrates superior performance compared to state-of-the-art models. Short-term predictions primarily rely on recent vehicle dynamics, whereas long-term predictions are more influenced by correlation information. Gradient vanishing, which degrades longer forecasts in RNN-based models, does not affect the Transformer’s memory mechanism. Hence, the proposed model (STA-TF) predicts future trajectories more efficiently than current state-of-the-art models due to correlation (spatial attention network) and the Transformer-based architecture.
Fig. 6: Comparing the proposed model with other models based on RMSE.
The model’s computation time during deployment is also examined. The computation time of the proposed model is compared with similar LSTM-based, social contextual attention-based state-of-the-art models [16, 22]. Table III shows the models’ computation times. The proposed model predicts a 5 s trajectory 43% and 61% faster than STA-LSTM [16] and MHA-LSTM [22], respectively. The TF-based model offers the advantage of processing the entire input sequence simultaneously, resulting in faster computation compared to RNN-based models. However, it is important to note that the TF-based model requires a longer training time. The performance evaluation of the proposed model encompasses both lane keeping (LK) and lane change trajectories. Furthermore, the analysis of lane change trajectories is divided into two types: lane change to the left (LCL) and lane change to the right (LCR). The subsequent section provides a comprehensive analysis of the proposed model’s performance in relation to these lateral behaviours (LCL, LCR, and LK).
TABLE III: Computing time comparisons among models.
Model                 Computation time (ms)
Proposed (STA-TF)     3.9
V-TF                  3.1
STA-LSTM [16]         6.8
MHA-LSTM [22]         10.0
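The relative improvements quoted in this section follow directly from the 5 s RMSE values in Table II and the computation times in Table III. The short calculation below reproduces them; the helper name `relative_reduction` is introduced here for illustration.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline

# 5 s RMSE (m) from Table II and computation time (ms) from Table III.
rmse_5s = {"STA-LSTM [16]": 2.51, "MHA-LSTM [22]": 2.47, "V-TF": 2.35}
time_ms = {"STA-LSTM [16]": 6.8, "MHA-LSTM [22]": 10.0}

rmse_gain = {m: round(relative_reduction(v, 2.24), 1) for m, v in rmse_5s.items()}
time_gain = {m: round(relative_reduction(v, 3.9), 1) for m, v in time_ms.items()}
```

The RMSE reduction works out to about 10.8% against STA-LSTM and 9.3% against MHA-LSTM, and the computation time reduction to about 42.6% and 61.0%, consistent with the rounded figures of 10%, 43%, and 61% given in the text. The same calculation shows that adding the spatial attention network lowers the 5 s RMSE by roughly 4.7% relative to V-TF, while Table III shows V-TF is the faster of the two (3.1 ms versus 3.9 ms).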
E. Proposed model’s performance on lane keep and lane
change trajectories
In this section, the investigation focuses on trajectory prediction errors related to lateral behaviours. Table IV presents the prediction errors for each behaviour. Across all behaviours, the lateral error is consistently smaller than the longitudinal error, ranging from 0.22 m to 0.71 m for 5 s long predictions. Notably, both LCL and LCR trajectories exhibit high lateral errors, measuring 0.60 m and 0.71 m, respectively. Similarly, the longitudinal errors are also high for LCL and LCR trajectories, measuring 2.75 m and 3.00 m, respectively.
It should be noted that the highest trajectory error (RMSE) is observed in lane change (LCL and LCR) trajectories, and that the longitudinal error has a significant impact on the overall error, as shown in Table IV. A similar observation can be drawn from the MSE and MAE for lateral behaviour-based trajectories (LCL, LCR, and LK). Fig. 7 displays the lateral error, longitudinal error, RMSE, MSE, and MAE of the proposed model for lateral behaviours.
Fig. 7: (a) Lateral, (b) longitudinal and total error ((c) RMSE, (d) MSE and (e) MAE) of proposed model.
TABLE IV: Lateral error, longitudinal error, RMSE, MSE, and MAE for each lateral behaviour.
Metric                  Lateral behaviour   1s      2s      3s      4s      5s
Lateral error (m)       LCL                 0.33    0.46    0.52    0.56    0.60
                        LCR                 0.35    0.50    0.59    0.66    0.71
                        LK                  0.08    0.12    0.15    0.19    0.22
                        Overall             0.11    0.16    0.20    0.23    0.26
Longitudinal error (m)  LCL                 0.65    1.07    1.53    2.08    2.75
                        LCR                 0.72    1.16    1.67    2.28    3.00
                        LK                  0.43    0.76    1.15    1.62    2.19
                        Overall             0.45    0.78    1.18    1.65    2.23
RMSE (m)                LCL                 0.73    1.17    1.62    2.16    2.81
                        LCR                 0.80    1.27    1.77    2.37    3.08
                        LK                  0.44    0.77    1.15    1.63    2.20
                        Overall             0.46    0.80    1.19    1.67    2.24
MSE (m²)                LCL                 0.61    1.51    2.84    4.99    8.41
                        LCR                 0.69    1.69    3.32    6.00    10.16
                        LK                  0.22    0.66    1.47    2.93    5.34
                        Overall             0.25    0.72    1.58    3.10    5.61
MAE (m)                 LCL                 0.71    1.09    1.42    1.79    2.22
                        LCR                 0.77    1.17    1.56    1.99    2.47
                        LK                  0.36    0.59    0.86    1.18    1.55
                        Overall             0.38    0.63    0.91    1.23    1.61
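The dominance of the longitudinal error can be checked numerically from the 5 s column of Table IV. The sketch below assumes that the total RMSE combines the lateral and longitudinal errors in quadrature; the paper does not state this relation, but the tabulated values are consistent with it to within rounding.

```python
import math

# behaviour: (lateral error, longitudinal error, reported RMSE) at 5 s, in metres
table_iv_5s = {
    "LCL": (0.60, 2.75, 2.81),
    "LCR": (0.71, 3.00, 3.08),
    "LK": (0.22, 2.19, 2.20),
    "Overall": (0.26, 2.23, 2.24),
}

# Assumed relation: RMSE = sqrt(lateral^2 + longitudinal^2).
combined = {k: math.hypot(lat, lon) for k, (lat, lon, _) in table_iv_5s.items()}

# Fraction of the squared total error that comes from the lateral direction.
lateral_share = {k: (lat / combined[k]) ** 2
                 for k, (lat, _, _) in table_iv_5s.items()}
```

Under this assumption the combined values agree with the reported RMSE to within 0.01 m for every behaviour, and the lateral direction accounts for less than 6% of the squared error even for lane changes, which illustrates why the overall error tracks the longitudinal error so closely.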
V. CONCLUSION
This research proposes a novel vehicle trajectory prediction model using raw trajectory data of the target and its surrounding vehicles. The proposed model has three sub-modules: spatial attention network, encoder, and decoder. The spatial attention network establishes the hidden relationship between the target and surrounding vehicles. The tracking modules (encoder and decoder) use the spatial attention network output for trajectory prediction. Finally, thorough quantitative and qualitative experiments on the publicly available NGSIM dataset show that the proposed model outperforms state-of-the-art methods in long-range trajectory prediction and is comparable in short-term prediction. Since trajectory analysis is important to pedestrian trajectory prediction, it would be interesting to modify the proposed model for pedestrian trajectory prediction in future work. Further, implementing the proposed methods for highway risk/collision estimation is also a promising direction.
ACKNOWLEDGMENT
The research grant for the project “Driver Behavior Modelling
for Autonomous Driving” has been provided by KPIT
Technologies Pvt. Ltd., Bangalore, India, offering valuable
support to this work.
REFERENCES
[1] G. S. Aoude, V. R. Desaraju, L. H. Stephens, and J. P. How, “Driver
behavior classification at intersections and validation on large naturalistic
data set,” IEEE Transactions on Intelligent Transportation Systems,
vol. 13, no. 2, pp. 724–736, 2012.
[2] O. Sharma, N. C. Sahoo, and N. B. Puhan, “Recent advances in
motion and behavior planning techniques for software architecture of
autonomous vehicles: A state-of-the-art survey,” Engineering applica-
tions of artificial intelligence, vol. 101, p. 104211, 2021.
[3] M. Brännström, E. Coelingh, and J. Sjöberg, “Model-based threat
assessment for avoiding arbitrary vehicle collisions,” IEEE Transactions
on Intelligent Transportation Systems, vol. 11, no. 3, pp. 658–669, 2010.
[4] O. Sharma, N. C. Sahoo, and N. B. Puhan, “A survey on smooth path
generation techniques for nonholonomic autonomous vehicle systems,”
in IECON 2019 - 45th Annual Conference of the IEEE Industrial
Electronics Society. IEEE, 2019, pp. 5167–5172.
[5] A. Houenou, P. Bonnifait, V. Cherfaoui, and W. Yao, “Vehicle trajectory
prediction based on motion model and maneuver recognition,” in 2013
IEEE/RSJ international conference on intelligent robots and systems.
IEEE, 2013, pp. 4363–4369.
[6] S. Qiao, D. Shen, X. Wang, N. Han, and W. Zhu, “A self-adaptive
parameter selection trajectory prediction approach via hidden markov models,”
IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 1,
pp. 284–296, 2014.
[7] O. Sharma, N. C. Sahoo, and N. B. Puhan, “Highway discretionary
lane changing behavior recognition using continuous and discrete hidden
markov model,” in 2021 IEEE International Intelligent Transportation
Systems Conference (ITSC). IEEE, 2021, pp. 1476–1481.
[8] M. Treiber, A. Hennecke, and D. Helbing, “Congested traffic states in
empirical observations and microscopic simulations,” Physical review E,
vol. 62, no. 2, p. 1805, 2000.
[9] N. Deo, A. Rangesh, and M. M. Trivedi, “How would surround vehicles
move? a unified framework for maneuver classification and motion
prediction,” IEEE Transactions on Intelligent Vehicles, vol. 3, no. 2,
pp. 129–140, 2018.
[10] F. Altché and A. de La Fortelle, “An lstm network for highway trajectory
prediction,” in 2017 IEEE 20th international conference on intelligent
transportation systems (ITSC). IEEE, 2017, pp. 353–359.
[11] A. Kuefler, J. Morton, T. Wheeler, and M. Kochenderfer, “Imitating
driver behavior with generative adversarial networks,” in 2017 IEEE
Intelligent Vehicles Symposium (IV). IEEE, 2017, pp. 204–211.
[12] G. Xie, A. Shangguan, R. Fei, W. Ji, W. Ma, and X. Hei, “Motion
trajectory prediction based on a cnn-lstm sequential model,” Science
China Information Sciences, vol. 63, no. 11, pp. 1–21, 2020.
[13] M. Abdalla, A. Hendawi, H. M. Mokhtar, N. Elgamal, J. Krumm, and
M. Ali, “deepmotions: A deep learning system for path prediction
using similar motions,” IEEE Access, vol. 8, pp. 23 881–23 894, 2020.
[14] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, “Social gan:
Socially acceptable trajectories with generative adversarial networks,”
in Proceedings of the IEEE conference on computer vision and pattern
recognition, 2018, pp. 2255–2264.
[15] L. Lin, W. Li, H. Bi, and L. Qin, “Vehicle trajectory prediction using
lstms with spatial–temporal attention mechanisms,” IEEE Intelligent
Transportation Systems Magazine, vol. 14, no. 2, pp. 197–208, 2021.
[16] M. Fu, T. Zhang, W. Song, Y. Yang, and M. Wang, “Trajectory
prediction-based local spatio-temporal navigation map for autonomous
driving in dynamic highway environments,” IEEE Transactions on
Intelligent Transportation Systems, 2021.
[17] H. Kim, D. Kim, G. Kim, J. Cho, and K. Huh, “Multi-head attention
based probabilistic vehicle trajectory prediction,” in 2020 IEEE Intelli-
gent Vehicles Symposium (IV). IEEE, 2020, pp. 1720–1725.
[18] M. Khakzar, A. Rakotonirainy, A. Bond, and S. G. Dehkordi, “A dual
learning model for vehicle trajectory prediction,” IEEE Access, vol. 8,
pp. 21 897–21 908, 2020.
[19] N. Deo and M. M. Trivedi, “Multi-modal trajectory prediction of
surrounding vehicles with maneuver based lstms,” in 2018 IEEE Intelligent
Vehicles Symposium (IV). IEEE, 2018, pp. 1179–1184.
[20] K. Shi, Y. Wu, H. Shi, Y. Zhou, and B. Ran, “An integrated car-
following and lane changing vehicle trajectory prediction algorithm
based on a deep neural network,” Physica A: Statistical Mechanics and
its Applications, vol. 599, p. 127303, 2022.
[21] N. Deo and M. M. Trivedi, “Convolutional social pooling for vehicle
trajectory prediction,” in Proceedings of the IEEE Conference on Com-
puter Vision and Pattern Recognition Workshops, 2018, pp. 1468–1476.
[22] K. Messaoud, I. Yahiaoui, A. Verroust, and F. Nashashibi, “Attention
based vehicle trajectory prediction,” IEEE Transactions on Intelligent
Vehicles, vol. 6, no. 1, pp. 175–185, 2020.
[23] K. Messaoud, I. Yahiaoui, A. Verroust-Blondet, and F. Nashashibi,
“Non-local social pooling for vehicle trajectory prediction,” in 2019
IEEE Intelligent Vehicles Symposium (IV). IEEE, 2019, pp. 975–980.
[24] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez,
Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances
in neural information processing systems, 2017, pp. 5998–6008.
[25] F. Giuliari, I. Hasan, M. Cristani, and F. Galasso, “Transformer networks
for trajectory forecasting,” in 2020 25th International Conference on
Pattern Recognition (ICPR). IEEE, 2021, pp. 10 335–10 342.
[26] Y. Liu, J. Zhang, L. Fang, Q. Jiang, and B. Zhou, “Multimodal motion
prediction with stacked transformers,” in Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, 2021, pp.
7577–7586.
[27] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training
of deep bidirectional transformers for language understanding,” arXiv
preprint arXiv:1810.04805, 2018.
[28] O. Sharma, N. Sahoo, and N. B. Puhan, “Kernelized convolutional trans-
former network based driver behavior estimation for conflict resolution
at unsignalized roundabout,” ISA transactions, 2022.
[29] A. Vassili, C. James, and H. John, “Next generation simulation fact
sheet, washington, dc, usa.” 2007.