Autonomous Vehicle Trajectory Prediction on
Multi-Lane Highways Using Attention based Model
1st Omveer Sharma
School of Electrical Sciences
Indian Institute of Technology
Bhubaneswar, India
os10@iitbbs.ac.in
2nd N. C. Sahoo
School of Electrical Sciences
Indian Institute of Technology
Bhubaneswar, India
ncsahoo@iitbbs.ac.in
3rd Niladri B. Puhan
School of Electrical Sciences
Indian Institute of Technology
Bhubaneswar, India
nbpuhan@iitbbs.ac.in
Abstract—To navigate complex traffic scenarios safely and effectively, an autonomous vehicle anticipates its own behaviour and future trajectory from the expected trajectories of surrounding vehicles in order to prevent potential collisions. The estimated trajectories of the surrounding vehicles (target vehicles) are in turn influenced by their own past trajectories and the positions of their neighbours. In this study, a novel Transformer-based
network is used to predict autonomous vehicle trajectory in
highway driving. Transformer’s multi-head attention method is
employed to capture social-temporal interaction between the
target vehicle and its surroundings. The performance of the
proposed model is compared with Recurrent Neural Network
(RNN) based sequential models, using the NGSIM dataset. The
results show that the proposed model predicts 5s long trajectory
with 10% lower Root-Mean-Square Error (RMSE) than the
RNN-based state-of-the-art model.
Index Terms—Trajectory prediction, Transformer, Intelligent
vehicle, Sequential network, Autonomous Vehicle.
I. INTRODUCTION
Trajectory prediction is an indispensable task for an au-
tonomous vehicle (AV) in complex and autonomous driving
scenarios. An AV plans its own trajectory after anticipating the future trajectories of surrounding vehicles, as does an advanced driving assistance system (ADAS) [1–4]. The AV (black) is surrounded by red target vehicles whose trajectories must be determined, as shown in Fig. 1. Further, a target vehicle's
previous track and the positions of its surrounding vehicles
affect its future trajectory. The phrase ‘surrounding vehicles’
is used throughout the rest of the article to refer to the target
vehicle’s immediate neighbouring vehicles.
The trajectory predictors can be split into two major cat-
egories: model-based and data-driven approaches. A model-based prediction strategy forecasts a vehicle's trajectory using kinematic or statistical models. Trajectory estimation has been done using model-based techniques such as the constant acceleration (CA) model, constant velocity (CV) model, constant yaw rate and acceleration motion model, Kalman filter, hidden Markov model (HMM), and Gaussian mixture model (GMM) [5–9]. Most model-based techniques can only predict short-term trajectories, and they can produce large prediction errors if there is even a small difference between the driver's actual and predicted behaviour. The limitations of model-based
979-8-3503-1997-2/23/$31.00 ©2023 IEEE
Fig. 1: An autonomous vehicle, positioned in the center, is
predicting the future paths of nearby vehicles.
prediction techniques have been significantly overcome by
data-driven prediction techniques.
For trajectory prediction, several RNN based models have
been proposed [10–13]. Gated Recurrent Units (GRUs) were
incorporated into a generative adversarial imitation learning
trajectory prediction model in [11]. To incorporate informa-
tion from several agents, the Social Generative Adversarial
Networks (S-GAN) model uses both GAN and a recurrent
sequence-to-sequence model in [14]. The research mentioned above successfully illustrates temporal correlation in terms of individual motion states; however, it is equally crucial to consider the spatial relationships between vehicles. These
relationships significantly influence the trajectory of the target
vehicle.
In [15], the spatiotemporal attention long short-term mem-
ory (STA-LSTM) framework is proposed to predict the target
vehicle’s future trajectory. The model, however, only works
well when forecasting future 1 second long trajectories. A
novel approach is introduced through the Attention-based
LSTM encoder-decoder (LSTM-ED) frameworks, which effec-
tively correlate the time dimension and space dimension [16–
18]. Their primary purpose is to generate a spatial-temporal
navigation map. Instead of establishing vehicle spatial rela-
tionships, the authors estimated the target vehicle’s future
behaviour from past traffic participant tracks to predict the
future trajectory [19–21]. Shi et al. [20] introduced a novel
attention network that combines temporal convolution neural
2023 IEEE 3rd International Conference on Sustainable Energy and Future Electric Transportation (SEFET) | 979-8-3503-1997-2/23/$31.00 ©2023 IEEE | DOI: 10.1109/SEFET57834.2023.10245038
network (TCN) and bi-directional long-short term memory
(Bi-LSTM). This network aims to accurately predict lane
keep (LK) and lane change (LC) behavior, along with future
trajectories.
Occupancy grids are utilized in most spatial-temporal
attention-based frameworks [22, 23]. Individual sequential
models (encoders) establish temporal correlation and extract
temporal-dependent sequences to provide spatiotemporal at-
tention. Lastly, a sequential model-based decoder predicts
the target vehicle trajectory utilising the spatial-temporal envi-
ronment. Complex networks with many encoders make trajec-
tory prediction slow and RNN-based encoders propagate input
data sequentially, limiting parallel operation. Gradient vanishing in RNN-based models decreases trajectory prediction accuracy and causes instability over extended sequences. A Transformer-based encoder-decoder architecture addresses the gradient vanishing and computation time issues by accepting the complete
input sequence, unlike RNN-based models. Transformer (TF)
models are popular for sequence-to-sequence learning issues
[24–28]. However, vehicle trajectory prediction on multi-lane
highways has not been investigated using TF-based modelling.
In order to forecast future trajectory utilising a short seg-
ment of tracking data, this work introduces a novel pure
attention-based spatial-temporal attention framework (STA-
TF). The main technological contributions are summarized
below:
1) To assess the influence of surrounding vehicle trajec-
tories on the target vehicle, robust multi-head attention
mechanisms are employed.
2) The proposed model leverages the advantages of TF to
achieve faster performance compared to existing sequen-
tial networks such as LSTM, GRU, and Bi-LSTM.
3) The trajectory prediction issue has been addressed by
adaptation, customisation, and establishment of the pow-
erful Transformer model.
4) A real-world NGSIM dataset is utilized to evaluate the
potential of the proposed model in predicting vehicle
trajectories in highway driving scenarios. The results
demonstrate that the proposed model outperforms ex-
isting RNN-based models.
The structure of this paper is as follows: Section II introduces
the formulation of the trajectory prediction task. The network
architecture of the proposed model is explained in Section
III. Section IV presents the experimental results, and finally,
Section V concludes the work.
II. PROBLEM FORMULATION
The proposed model’s goal is to forecast the target ve-
hicle’s future trajectory using the most recent tracks of the
target and its surrounding vehicles as of the observation time ($t_{obs}$). While driving on a highway, various driving styles are
exhibited by drivers, each of which significantly influences
the future trajectory of the vehicle. This diversity of driving
styles adds complexity to the task of accurately predicting the
vehicle’s trajectory.
Fig. 2: Target and surrounding vehicles in modified frame of
reference.
A. Inputs and outputs

The proposed model's input consists of the previous tracks of the target (T) and its six immediate neighbours [19]: the three vehicles preceding it in the left, current, and right lanes (PLL, PCL, and PRL), as well as the three vehicles following it (FLL, FCL, and FRL), as shown in Fig. 2. The past trajectory of a surrounding vehicle $i \in \{PLL, FLL, PCL, FCL, PRL, FRL\}$ is defined as $S_i = \{X^i_{t_{obs}-L_{in}+1}, X^i_{t_{obs}-L_{in}+2}, \ldots, X^i_{t_{obs}}\}$, where $L_{in}$ is the input sequence length and $X^i_t = (x^i_t, y^i_t)$ is the positional vector. The target vehicle's historical track is specified as $S_T = \{X^T_{t_{obs}-L_{in}+1}, X^T_{t_{obs}-L_{in}+2}, \ldots, X^T_{t_{obs}}\}$, where $X^T_t = (x^T_t, y^T_t, v^T_t, \alpha^T_t, class)$ is the feature vector, $v^T_t$ is the target vehicle's velocity, $\alpha^T_t$ is the target vehicle's acceleration, and class is the type of the target vehicle (bike, car or truck). These past trajectories of the target and its surrounding vehicles are fed into the proposed model as input. The model predicts the target vehicle's future trajectory (positional feature vectors), which can be stated as follows:

$O_T = \{Y^T_{t_{obs}+1}, Y^T_{t_{obs}+2}, \ldots, Y^T_{t_{obs}+L_{out}}\}$   (1)

where $Y^T_t = (x^T_t, y^T_t)$ are the predicted future coordinates for the target vehicle.
B. Frame of reference
At the observation time $t_{obs}$, a stationary frame of reference
is established with the target vehicle serving as the origin. In
this frame, the y-axis represents forward motion, while the x-
axis is perpendicular to it, as illustrated in Fig. 2. Because of
this technique, the proposed model is unaffected by vehicle
track generation [19].
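The change of reference frame can be sketched as follows. This is a minimal numpy sketch; the heading-based rotation and the function name are assumptions, since the text only states that the target vehicle at $t_{obs}$ becomes the origin with the y-axis pointing forward:

```python
import numpy as np

def to_target_frame(tracks, target_pos, heading):
    """Express vehicle tracks in the stationary frame anchored at the
    target vehicle's position at t_obs, with the y-axis pointing along
    the target's direction of travel.

    tracks:     (num_vehicles, L_in, 2) global (x, y) positions
    target_pos: (2,) global position of the target at t_obs
    heading:    target heading angle (radians, measured from the +x axis)
    """
    shifted = tracks - target_pos  # translate: target at t_obs becomes the origin
    c, s = np.cos(heading), np.sin(heading)
    # rotate so the heading direction maps onto the +y (forward) axis
    rot = np.array([[s, -c],
                    [c,  s]])
    return shifted @ rot.T
```

Applying this to every training sample makes the model invariant to where on the highway the track was recorded, which is the stated purpose of the modified frame of reference.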
III. NETWORK ARCHITECTURE
Encoder-decoder Transformer models potentially overcome
RNN-based models’ difficulties, as described in the intro-
duction. The proposed model (STA-TF) overcomes gradient
vanishing constraints by processing the entire input sequence
with a TF encoder layer. The TF encoder prioritizes input segments using the multi-head attention (MHA) mechanism. In order to
make precise assumptions about vehicle motions, it is crucial
to have a thorough understanding of the interactions and rela-
tionships between traffic participants on the road. Therefore,
the proposed model architecture is divided into three key
Fig. 3: Proposed model architecture
components: (1) Spatial attention network, (2) Encoder, and
(3) Decoder. These components are illustrated in Fig. 3. MHA
satisfies the specific requirements of each of the three model
components.
A. Multi-head Attention (MHA) Mechanism
Attention correlates a sequence's numerous locations in order to identify the hidden representation of the sequence. The query, key, and value concepts from the information retrieval approach are used to do this. Attention, in particular, produces a weighted sum of all values, where the keys and the queries decide the weights. As an illustration, consider $Q \in \mathbb{R}^{L_{in} \times d_q}$ to be the query matrix composed of the $d_q$-dimensional query vectors corresponding to the various positions in the $L_{in}$-length sequence. Similar to $Q$, $K \in \mathbb{R}^{L_{in} \times d_k}$ and $V \in \mathbb{R}^{L_{in} \times d_v}$ represent the key-value pairs for the various positions in the sequence, where the query, key, and value vector dimensions match ($d_q = d_k = d_v$). The attention weights, denoted as $W_A = softmax(QK^T/\sqrt{d_k})$, are computed using the matrices $Q$ and $K$. Subsequently, the scaled dot product can be calculated using Eq. 2. In their work [24], the authors propose linearly projecting the matrices $Q$, $K$, and $V$ multiple times (referred to as $h$ times, or heads) to parallelize the scaled dot-product attention for each head, a technique known as “multi-head attention” (MHA). This approach allows the model to jointly attend to multiple representation subspaces. Finally, Eq. 3 is employed to combine the outputs from the various heads.

$Attention(Q, K, V) = softmax\left(\frac{QK^T}{\sqrt{d_k}}\right)V$   (2)

$MultiHead(Q, K, V) = Concat(head_1, \ldots, head_h)W_o$   (3)

Fig. 4: Multi-head attention mechanism (MHA) [24].

Fig. 4 shows the scaled dot-product attention and the MHA procedure. In MHA, the dimension of the key vector, $d_k = d_{model}/h$, is calculated from the model dimension ($d_{model} = 128$) and the number of heads ($h = 8$). It should be noticed that the attention weight matrix has a dimension of $L_{in} \times L_{in}$. In this weight matrix, element $W_A^{fg}$ indicates the attention between the $f$-th position of $Q$ (the feature vector of matrix $Q$ at time instant $f$) and the $g$-th position of $K$ (the feature vector of matrix $K$ at time instant $g$). Thus, a weighted correlation between all time instants (all positions of $Q$ and $K$) can be inferred from this attention weight matrix.
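Eqs. 2-3 can be illustrated with a short numpy sketch; the random matrices below stand in for the learned projection weights ($W_q$, $W_k$, $W_v$, $W_o$), so this is an illustrative sketch rather than the trained model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Eq. 2: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    weights = softmax(Q @ K.swapaxes(-2, -1) / np.sqrt(d_k))  # (L_in, L_in)
    return weights @ V, weights

def multi_head_attention(Q, K, V, h=8, seed=0):
    """Eq. 3 sketch: project h times, attend per head, concatenate, project.
    Random projections stand in for the learned weight matrices."""
    rng = np.random.default_rng(seed)
    d_model = Q.shape[-1]
    d_k = d_model // h  # per-head dimension, e.g. 128 / 8 = 16 in the paper
    heads = []
    for _ in range(h):
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))
        out, _ = scaled_dot_product_attention(Q @ Wq, K @ Wk, V @ Wv)
        heads.append(out)
    Wo = rng.standard_normal((d_model, d_model))
    return np.concatenate(heads, axis=-1) @ Wo
```

Each row of the returned weight matrix sums to one, which is exactly the $L_{in} \times L_{in}$ attention weight matrix $W_A$ discussed above.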
B. Spatial attention network
A spatial attention network is used to establish the relationship of the six surrounding vehicles with the target vehicle. In this network, the positional trajectories of the seven traffic participants (target and six surrounding vehicles), $S_i$ and $S_T$, are used as input and fed to individual feed-forward layers. In the absence of any surrounding vehicle, the position of that vehicle is represented as a two-dimensional zero vector. For the six surrounding vehicles, six MHA layers are used to establish a correlation between each surrounding vehicle trajectory and the past target vehicle trajectory. The outputs of the six feed-forward layers ($O^i_{S,FF1} \in \mathbb{R}^{L_{in} \times d_{model}}$) serve as the $K$ and $V$ pair matrices, and the output for the target vehicle trajectory ($O^T_{S,FF1} \in \mathbb{R}^{L_{in} \times d_{model}}$) serves as matrix $Q$ for the six MHA layers. The outputs of the MHA layers, $O^i_{S,MHA} \in \mathbb{R}^{L_{in} \times d_{model}}$ (representing the correlation of the $i$-th surrounding vehicle with the target vehicle), and $O^T_{S,FF1}$ are concatenated and passed through another feed-forward layer to adjust the dimension of the output $O_{S,FF2} \in \mathbb{R}^{L_{in} \times d_{model}}$.
C. Encoder layer
The proposed model's encoder determines the temporal correlation in the input sequence (the output of the spatial attention network). The encoder layer of the vanilla Transformer network is adopted to perform this task [24]. To incorporate temporal information and leverage sequential correlations between time steps, the encoder receives the output ($O_{S,FF2}$) from the spatial attention network, which is then passed through a positional encoding layer. The positional encoding layer utilizes both sine and cosine functions. The resulting output of the positional encoding layer ($O^{Encoder}_{pos} \in \mathbb{R}^{L_{in} \times d_{model}}$) is calculated as follows:

$O^{Encoder}_{pos} = O_{S,FF2} + PE$   (4)

where the positional encoding coefficient matrix is represented by $PE$. After position encoding, the output is transmitted to the encoder layer of the TF. The encoder layer is composed of an MHA layer and a fully connected feed-forward network (FFN). The $Q$, $K$, and $V$ inputs for the MHA layer are provided by $O^{Encoder}_{pos}$. The output of the MHA layer ($O^{Encoder}_{MHA} \in \mathbb{R}^{L_{in} \times d_{model}}$) is calculated using Eqs. 2-3. As mentioned earlier, the attention weights allow the MHA layer to establish the hidden temporal relationships in its input sequence. The output of the MHA layer is processed through an Add & Normalization (Norm) step as follows:

$O^{Encoder}_{add\&norm1} = Norm(O^{Encoder}_{MHA} + O^{Encoder}_{pos})$   (5)

This output, $O^{Encoder}_{add\&norm1} \in \mathbb{R}^{L_{in} \times d_{model}}$, goes to the FFN, which executes the following linear transformations over the various positions:

$O^{Encoder}_{FFN} = \sigma(O^{Encoder}_{add\&norm1} W_1 + B_1) W_2 + B_2$   (6)

The output of the FFN, $O^{Encoder}_{FFN} \in \mathbb{R}^{L_{in} \times d_{model}}$, is processed through another Add & Normalization (Norm) step:

$O^{Encoder}_{add\&norm2} = Norm(O^{Encoder}_{FFN} + O^{Encoder}_{add\&norm1})$   (7)

The decoder module of the proposed model receives the output of the encoder ($O^{Encoder} = O^{Encoder}_{add\&norm2} \in \mathbb{R}^{L_{in} \times d_{model}}$).
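The sine/cosine positional encoding added in Eq. 4 can be generated as follows (a sketch of the standard formulation from the vanilla Transformer [24]; the function name is illustrative):

```python
import numpy as np

def positional_encoding(L, d_model):
    """PE[t, 2i]   = sin(t / 10000^(2i / d_model))
       PE[t, 2i+1] = cos(t / 10000^(2i / d_model))"""
    pos = np.arange(L)[:, None]                    # (L, 1) time steps
    i = np.arange(0, d_model, 2)[None, :]          # even feature indices
    angles = pos / np.power(10000.0, i / d_model)  # (L, d_model / 2)
    pe = np.zeros((L, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Eq. 4: the spatial-attention output is summed element-wise with PE,
# O_pos = O_S,FF2 + PE, so each time step carries its position.
```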
D. Decoder layer
Two distinct inter-dependencies are integrated into the decoder layer: the first, known as self-attention, is between the decoder input and the decoder output, whereas the second, known as encoder-decoder attention, is between the encoder output and the decoder output. The target vehicle's predicted future trajectory is right-shifted ($\{\tilde{Y}^T_{t_{obs}+1}, \tilde{Y}^T_{t_{obs}+2}, \ldots, \tilde{Y}^T_{t_{obs}+k}\}$ at time $t = t_{obs} + k + 1$), combined with the encoder output ($O^{Encoder}$), and used as the decoder's input to predict the future trajectory ($O^{Decoder}_{t_{obs}+1}, O^{Decoder}_{t_{obs}+2}, \ldots, O^{Decoder}_{t_{obs}+k}, O^{Decoder}_{t_{obs}+k+1}$), as shown in Fig. 3. The performance of the decoder is enhanced by this autoregressive (feedback) technique, which uses the previously predicted trajectory as input. An 'sos' input (a zero vector of size 2) is supplied to the decoder to start processing, since there is no predicted trajectory at time $t_{obs}$ (i.e., $k = 0$).

Similar to the encoder, the decoder's input (the right-shifted decoder output) features are converted to a high-dimensional space using a fully connected layer before being sent through a positional encoding layer in accordance with Eq. 4. The first MHA layer, for self-attention, receives the output of the positional encoding layer as its input, and its output ($O^{Decoder}_{MHA1} \in \mathbb{R}^{L_{out} \times d_{model}}$) is calculated using Eqs. 2-3. The matrices $Q$, $K$, and $V$ are constructed using only the decoder input, hence the first MHA layer can draw out self-attention from this decoder input. In the second MHA layer, matrix $Q$ comes from the first MHA layer (a hidden correlation in the decoder input sequence) and the $K$ and $V$ pair comes from the encoder output ($O^{Encoder}$); thus, attention is computed between these two input signals (encoder output and decoder input). The output of the second MHA layer ($O^{Decoder}_{MHA2} \in \mathbb{R}^{L_{out} \times d_{model}}$) is fed to the Add & Normalization and FFN of the decoder, and the output of the decoder ($O^{Decoder} \in \mathbb{R}^{L_{out} \times 2}$) is calculated as per Eqs. 5-7.

The proposed model is trained with a batch size of 64 and a learning rate that is iteratively decreased from 0.00001 to 0.000001. A modified teacher-forcing scheme is used: the model is trained with full teacher forcing for the first 10 epochs, after which the teacher-forcing factor is incrementally decreased until it equals zero. The model parameters are trained using the Root-Mean-Square Error (RMSE) as the loss function. These studies use a desktop computer with an Intel(R) Xeon(R) Processor E5-2643 V4.
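The modified teacher-forcing schedule can be sketched as below; the linear decay and its duration after epoch 10 are assumptions, as the text only specifies full teacher forcing for the first 10 epochs followed by an incremental decrease to zero:

```python
def teacher_forcing_factor(epoch, warmup=10, decay_epochs=20):
    """Full teacher forcing for the first `warmup` epochs, then a linear
    decrease of the factor until it reaches zero (decay length assumed)."""
    if epoch < warmup:
        return 1.0
    return max(0.0, 1.0 - (epoch - warmup) / decay_epochs)
```

At each decoding step during training, the ground-truth position is fed back with this probability; otherwise the model's own previous prediction is used, easing the transition to fully autoregressive inference.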
IV. RESULTS AND DISCUSSION
The performance of the proposed model is evaluated and compared with baseline models, revealing its superior performance in trajectory prediction over state-of-the-art models.
A. Dataset and its preprocessing
The evaluation of the proposed model is conducted using
the well-known NGSIM dataset, which comprises data from
the southbound US 101 highway in Los Angeles and the
Intersection 80 highway section in Emeryville, California. This
particular highway segment consists of six lanes, including five
motorway lanes and one auxiliary lane, spanning from the on-
ramp to the off-ramp [29]. Only lane keep and discretionary
lane changing (DLC) trajectories are taken into consideration
for vehicles travelling in lanes 2, 3, 4, and 5 throughout this
process. To decrease the complexity and computation time of
the proposed model, all trajectories are down-sampled from a
rate of 10 Hz to 5 Hz. Each trajectory is broken into segments
that last 8s. The vehicle’s historical track is determined by a 3s
long trajectory segment, and the subsequent 5s long trajectory
segments are predicted by the proposed model. The proposed
model is trained using 80% of the trajectory segments, and
the remaining 20% is dedicated to testing its performance.
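The down-sampling and segmentation steps above can be sketched as follows (a minimal numpy sketch; the function name and the non-overlapping windowing are assumptions):

```python
import numpy as np

def make_segments(track_10hz, history_s=3, horizon_s=5, out_hz=5):
    """Down-sample a 10 Hz NGSIM track to 5 Hz and cut it into 8 s windows:
    a 3 s history (model input) followed by a 5 s future (prediction target).
    track_10hz: (num_frames, 2) array of (x, y) positions at 10 Hz."""
    track = track_10hz[::10 // out_hz]   # 10 Hz -> 5 Hz: keep every 2nd frame
    L_in = history_s * out_hz            # 15 input steps
    L_out = horizon_s * out_hz           # 25 output steps
    window = L_in + L_out                # 40 steps = 8 s
    histories, futures = [], []
    for start in range(0, len(track) - window + 1, window):
        seg = track[start:start + window]
        histories.append(seg[:L_in])
        futures.append(seg[L_in:])
    return np.array(histories), np.array(futures)
```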
B. Evaluation metric
The effectiveness of the proposed model is evaluated using the metrics RMSE, Mean-Absolute Error (MAE), and Mean-Square Error (MSE), represented mathematically by Eqs. 8 to 10.

$RMSE = \sqrt{\frac{1}{t_{pred}} \sum_{t=t_{obs}+1}^{t_{obs}+t_{pred}} \left(\tilde{Y}^T_t - Y^T_t\right)^2}$   (8)

$MAE = \frac{1}{t_{pred}} \sum_{t=t_{obs}+1}^{t_{obs}+t_{pred}} \left|\tilde{Y}^T_t - Y^T_t\right|$   (9)

$MSE = \frac{1}{t_{pred}} \sum_{t=t_{obs}+1}^{t_{obs}+t_{pred}} \left(\tilde{Y}^T_t - Y^T_t\right)^2$   (10)

where $\tilde{Y}^T_t$ and $Y^T_t$ represent the predicted position and actual location of the target vehicle (T) at the prediction timestamp $t$, respectively (here 5 timestamps for 5s).
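Reading the per-timestamp error in Eqs. 8-10 as the Euclidean distance between the predicted and actual positions (an assumption of this sketch), the metrics can be computed as:

```python
import numpy as np

def trajectory_errors(pred, true):
    """RMSE, MAE and MSE over the prediction horizon (Eqs. 8-10).
    pred, true: (t_pred, 2) arrays holding (x, y) per timestamp."""
    d = np.linalg.norm(pred - true, axis=-1)  # displacement per timestamp
    mse = np.mean(d ** 2)
    return {"RMSE": np.sqrt(mse), "MAE": np.mean(d), "MSE": mse}
```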
C. Proposed model performance
Table I demonstrates the effectiveness of the proposed model for prediction horizons up to 5s, along with the worst 5% and 1% of prediction errors. The predictive capability of the proposed model is evident from its RMSE values of 0.46m, 0.80m, 1.19m, 1.67m, and 2.24m for trajectory predictions at 1s, 2s, 3s, 4s, and 5s, respectively. Similarly, the model demonstrates MSE values of 0.25m, 0.72m, 1.58m, 3.10m, and 5.61m, and MAE values of 0.38m, 0.63m, 0.91m, 1.23m, and 1.61m, for trajectory lengths of 1s to 5s, further highlighting its predictive accuracy. The proposed model's robustness is demonstrated by the fact that prediction errors even for the worst-case scenarios remain noticeably low. The longitudinal and lateral prediction RMSE of the proposed model is also shown in Fig. 5.

TABLE I: RMSE, MSE and MAE in proposed model prediction.

Evaluation metric               1s     2s     3s     4s     5s
RMSE (m)           All          0.46   0.80   1.19   1.67   2.24
                   Worst 5%     0.80   1.78   3.07   4.62   6.32
                   Worst 1%     1.19   2.68   4.55   6.68   8.99
MSE (m)            All          0.25   0.72   1.58   3.10   5.61
                   Worst 5%     1.38   4.59   11.34  23.65  43.15
                   Worst 1%     3.22   10.28  24.29  48.55  85.61
MAE (m)            All          0.38   0.63   0.91   1.23   1.61
                   Worst 5%     0.88   1.70   2.76   4.02   5.40
                   Worst 1%     1.31   2.55   4.10   5.85   7.76
Lateral (m)        All          0.11   0.16   0.20   0.23   0.26
                   Worst 5%     0.11   0.16   0.21   0.25   0.29
                   Worst 1%     0.15   0.23   0.30   0.35   0.40
Longitudinal (m)   All          0.45   0.78   1.18   1.65   2.23
                   Worst 5%     0.78   1.77   3.06   4.61   6.32
                   Worst 1%     1.17   2.65   4.53   6.68   8.99

Fig. 5: Lateral, longitudinal and total error (RMSE) of proposed model.
D. Comparative Analysis
In this section, the performance of the proposed trajectory
prediction model is evaluated and compared against state-of-
the-art models. To demonstrate the model’s effectiveness, the
same experimental settings and evaluation metrics are adopted as in [16, 21]. The following models are trained and evaluated on the NGSIM dataset under the same experimental conditions for comparison with the proposed model.
•Vanilla LSTM (V-LSTM) [10]: The LSTM network takes
the raw input trajectories of the target and surrounding
vehicles as inputs. By employing an LSTM model, future
trajectory predictions are made as point estimates.
•Bi-LSTM [13]: The bi-directional LSTM network re-
ceives the raw input trajectories as its input.
•TCN [20]: The TCN network receives the raw input
trajectories as its input.
TABLE II: Analyzing the comparative performance of the proposed model against state-of-the-art models. The best result is indicated in bold face.

RMSE (m):
Model                  1s     2s     3s     4s     5s
V-LSTM [10]            0.90   1.29   1.71   2.20   2.74
Bi-LSTM [13]           0.77   1.09   1.51   2.03   2.63
TCN [20]               0.71   1.09   1.52   2.01   2.55
LSTM-ED [18]           0.53   0.91   1.36   1.90   2.56
TCN-LSTM               0.56   0.91   1.34   1.86   2.49
STA-LSTM [16]          0.49   0.86   1.31   1.85   2.51
MHA-LSTM [22]          0.48   0.85   1.28   1.81   2.47
V-TF                   0.47   0.82   1.24   1.73   2.35
Proposed (STA-TF)      0.46   0.80   1.19   1.67   2.24

MSE (m):
Model                  1s     2s     3s     4s     5s
V-LSTM [10]            0.93   1.81   3.15   5.15   8.00
Bi-LSTM [13]           0.66   1.29   2.45   4.41   7.38
TCN [20]               0.58   1.33   2.59   4.50   7.26
LSTM-ED [18]           0.33   0.92   2.03   3.99   7.26
TCN-LSTM               0.36   0.95   2.03   3.88   6.94
STA-LSTM [16]          0.30   0.84   1.90   3.80   6.94
MHA-LSTM [22]          0.27   0.79   1.78   3.54   6.59
V-TF                   0.25   0.76   1.70   3.37   6.21
Proposed (STA-TF)      0.25   0.72   1.58   3.10   5.61

MAE (m):
Model                  1s     2s     3s     4s     5s
V-LSTM [10]            0.72   1.00   1.31   1.64   2.01
Bi-LSTM [13]           0.64   0.88   1.17   1.51   1.90
TCN [20]               0.61   0.89   1.19   1.53   1.90
LSTM-ED [18]           0.47   0.74   1.05   1.42   1.84
TCN-LSTM               0.49   0.75   1.05   1.39   1.80
STA-LSTM [16]          0.46   0.70   1.02   1.38   1.78
MHA-LSTM [22]          0.38   0.64   0.94   1.30   1.72
V-TF                   0.39   0.64   0.94   1.28   1.68
Proposed (STA-TF)      0.38   0.63   0.91   1.23   1.61
•LSTM-ED [18]: An LSTM-based encoder-decoder model in which the raw input trajectories are fed to the LSTM-based encoder and the decoder predicts the future trajectory.
•TCN-LSTM: An encoder-decoder network with a TCN-based encoder and an LSTM-based decoder.
•STA-LSTM [16]: A spatio-temporal attention-based LSTM encoder-decoder network used for vehicle trajectory prediction.
•MHA-LSTM [22]: An encoder-decoder network that employs LSTM with spatial attention to extract the spatial attention between vehicles.
•V-TF: The spatial attention network is excluded from
the proposed model, and therefore, only the raw input
trajectories of the target and surrounding vehicles are
directly fed into the vanilla Transformer network (V-TF).
The quantitative experimental results are summarised in Table
II and are shown in Fig. 6. The results show that the proposed
model has utilised the spatial attention network to establish
the hidden relationship between traffic participants, which has
resulted in a small trajectory prediction error. The proposed
method successfully predicts future trajectory with a 2.24m
RMSE for a 5s long prediction horizon, which is 10% less than
the state-of-the-art models [16, 22]. In congested traffic, where
several vehicles occupy the same drivable area, trajectory
prediction must account for surrounding vehicle correlation.
Irrespective of the prediction horizon, the proposed model
demonstrates superior performance compared to state-of-the-
art models. Short-term predictions primarily rely on recent
vehicle dynamics, whereas long-term predictions are more influenced by correlation information. Gradient vanishing, which impacts longer forecasts, does not affect the Transformer's memory mechanism. Hence, the proposed model (STA-TF) predicts future trajectories more efficiently than current state-of-the-art models due to its correlation modelling (spatial attention network) and Transformer-based architecture.

Fig. 6: Comparing the proposed model with other models based on RMSE.

TABLE III: Computing time comparisons among models.

Model                 Computation time (ms)
Proposed (STA-TF)     3.9
V-TF                  3.1
STA-LSTM [16]         6.8
MHA-LSTM [22]         10.0
During deployment, the model's computation time complexity is examined. The computation time of the proposed model is compared with that of similar LSTM-based social contextual attention-based state-of-the-art models [16, 22]. Table III
shows the models’ computation times. The proposed model
predicts a 5s trajectory 43% and 61% faster than LSTM
[16] and social contextual attention-based models [22], respec-
tively. The TF-based model offers the advantage of processing
the entire input sequence simultaneously, resulting in faster
computation compared to RNN-based models. However, it is
important to note that the TF-based model requires a longer
training time. The performance evaluation of the proposed
model encompasses both lane keeping (LK) and lane change
trajectories. Furthermore, the analysis of lane change trajecto-
ries is further divided into two types: lane change to the left
(LCL) and lane change to the right (LCR). The subsequent
section provides a comprehensive analysis of the proposed
model’s performance in relation to these lateral behaviors
(LCL, LCR, and LK).
E. Proposed model’s performance on lane keep and lane
change trajectories
In this section, the investigation focuses on trajectory pre-
diction errors related to lateral behaviours. Table IV presents
the prediction errors for each behaviour. Across all behaviours, the lateral directional error consistently appears smaller than the longitudinal directional error, with a range of 0.22m to 0.71m for 5-second long predictions. Notably, both LCL and LCR trajectories exhibit high lateral directional errors, measuring 0.60m and 0.71m, respectively. Similarly, the longitudinal directional errors are also high for LCL and LCR trajectories, measuring 2.75m and 3.00m, respectively.

It should be noted that the highest trajectory error (RMSE) is observed in lane change (LCL and LCR) related trajectories, indicating the significant impact of longitudinal error on the overall error, as depicted in Table IV. A similar observation can be drawn from the MSE and MAE for the lateral behaviour-based trajectories (LCL, LCR, and LK). Fig. 7 displays the lateral error, longitudinal error, RMSE, MSE, and MAE of the proposed model for these lateral behaviours.

Fig. 7: (a) Lateral, (b) longitudinal and total error ((c) RMSE, (d) MSE and (e) MAE) of proposed model.

TABLE IV: The RMSE, MSE, and MAE for lateral and longitudinal behaviours.

Metric                  Behaviour   1s     2s     3s     4s     5s
Lateral Error (m)       LCL         0.33   0.46   0.52   0.56   0.60
                        LCR         0.35   0.50   0.59   0.66   0.71
                        LK          0.08   0.12   0.15   0.19   0.22
                        Overall     0.11   0.16   0.20   0.23   0.26
Longitudinal Error (m)  LCL         0.65   1.07   1.53   2.08   2.75
                        LCR         0.72   1.16   1.67   2.28   3.00
                        LK          0.43   0.76   1.15   1.62   2.19
                        Overall     0.45   0.78   1.18   1.65   2.23
RMSE (m)                LCL         0.73   1.17   1.62   2.16   2.81
                        LCR         0.80   1.27   1.77   2.37   3.08
                        LK          0.44   0.77   1.15   1.63   2.20
                        Overall     0.46   0.80   1.19   1.67   2.24
MSE (m)                 LCL         0.61   1.51   2.84   4.99   8.41
                        LCR         0.69   1.69   3.32   6.00   10.16
                        LK          0.22   0.66   1.47   2.93   5.34
                        Overall     0.25   0.72   1.58   3.10   5.61
MAE (m)                 LCL         0.71   1.09   1.42   1.79   2.22
                        LCR         0.77   1.17   1.56   1.99   2.47
                        LK          0.36   0.59   0.86   1.18   1.55
                        Overall     0.38   0.63   0.91   1.23   1.61
V. CONCLUSION
This research proposes a novel vehicle trajectory predic-
tion model using raw trajectory data of the target and its
surrounding vehicles. The proposed model has three sub-
modules: Spatial attention network, encoder, and decoder. The
spatial attention network establishes the hidden relationship between the target and surrounding vehicles, and the tracking modules (encoder and decoder) use its output for trajectory prediction. Finally, thorough quantitative and qualitative experiments on the publicly available NGSIM dataset
show that the proposed model outperforms state-of-the-art
methods in long-range trajectory prediction and is comparable
in short-term prediction. Since trajectory analysis is important
to pedestrian trajectory prediction, it would be interesting to
modify the proposed model for pedestrian trajectory prediction
in future work. Further, implementing the proposed methods
for highway risk/collision estimation is also a promising
direction.
ACKNOWLEDGMENT
The research grant for the project “Driver Behavior Modelling for Autonomous Driving” has been provided by KPIT
Technologies Pvt. Ltd., Bangalore, India, offering valuable
support to this work.
REFERENCES
[1] G. S. Aoude, V. R. Desaraju, L. H. Stephens, and J. P. How, “Driver
behavior classification at intersections and validation on large naturalistic
data set,” IEEE Transactions on Intelligent Transportation Systems,
vol. 13, no. 2, pp. 724–736, 2012.
[2] O. Sharma, N. C. Sahoo, and N. B. Puhan, "Recent advances in motion and behavior planning techniques for software architecture of autonomous vehicles: A state-of-the-art survey," Engineering Applications of Artificial Intelligence, vol. 101, p. 104211, 2021.
[3] M. Brännström, E. Coelingh, and J. Sjöberg, "Model-based threat assessment for avoiding arbitrary vehicle collisions," IEEE Transactions on Intelligent Transportation Systems, vol. 11, no. 3, pp. 658–669, 2010.
[4] O. Sharma, N. C. Sahoo, and N. B. Puhan, “A survey on smooth path
generation techniques for nonholonomic autonomous vehicle systems,”
in IECON 2019 - 45th Annual Conference of the IEEE Industrial
Electronics Society. IEEE, 2019, pp. 5167–5172.
[5] A. Houenou, P. Bonnifait, V. Cherfaoui, and W. Yao, “Vehicle trajectory
prediction based on motion model and maneuver recognition,” in 2013
IEEE/RSJ international conference on intelligent robots and systems.
IEEE, 2013, pp. 4363–4369.
[6] S. Qiao, D. Shen, X. Wang, N. Han, and W. Zhu, "A self-adaptive parameter selection trajectory prediction approach via hidden Markov models," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 1, pp. 284–296, 2014.
[7] O. Sharma, N. C. Sahoo, and N. B. Puhan, "Highway discretionary lane changing behavior recognition using continuous and discrete hidden Markov model," in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC). IEEE, 2021, pp. 1476–1481.
[8] M. Treiber, A. Hennecke, and D. Helbing, "Congested traffic states in empirical observations and microscopic simulations," Physical Review E, vol. 62, no. 2, p. 1805, 2000.
[9] N. Deo, A. Rangesh, and M. M. Trivedi, “How would surround vehicles
move? a unified framework for maneuver classification and motion
prediction,” IEEE Transactions on Intelligent Vehicles, vol. 3, no. 2,
pp. 129–140, 2018.
[10] F. Altché and A. de La Fortelle, "An LSTM network for highway trajectory prediction," in 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2017, pp. 353–359.
[11] A. Kuefler, J. Morton, T. Wheeler, and M. Kochenderfer, “Imitating
driver behavior with generative adversarial networks,” in 2017 IEEE
Intelligent Vehicles Symposium (IV). IEEE, 2017, pp. 204–211.
[12] G. Xie, A. Shangguan, R. Fei, W. Ji, W. Ma, and X. Hei, "Motion trajectory prediction based on a CNN-LSTM sequential model," Science China Information Sciences, vol. 63, no. 11, pp. 1–21, 2020.
[13] M. Abdalla, A. Hendawi, H. M. Mokhtar, N. Elgamal, J. Krumm, and M. Ali, "DeepMotions: A deep learning system for path prediction using similar motions," IEEE Access, vol. 8, pp. 23881–23894, 2020.
[14] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, "Social GAN: Socially acceptable trajectories with generative adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2255–2264.
[15] L. Lin, W. Li, H. Bi, and L. Qin, "Vehicle trajectory prediction using LSTMs with spatial–temporal attention mechanisms," IEEE Intelligent Transportation Systems Magazine, vol. 14, no. 2, pp. 197–208, 2021.
[16] M. Fu, T. Zhang, W. Song, Y. Yang, and M. Wang, “Trajectory
prediction-based local spatio-temporal navigation map for autonomous
driving in dynamic highway environments,” IEEE Transactions on
Intelligent Transportation Systems, 2021.
[17] H. Kim, D. Kim, G. Kim, J. Cho, and K. Huh, “Multi-head attention
based probabilistic vehicle trajectory prediction,” in 2020 IEEE Intelli-
gent Vehicles Symposium (IV). IEEE, 2020, pp. 1720–1725.
[18] M. Khakzar, A. Rakotonirainy, A. Bond, and S. G. Dehkordi, "A dual learning model for vehicle trajectory prediction," IEEE Access, vol. 8, pp. 21897–21908, 2020.
[19] N. Deo and M. M. Trivedi, "Multi-modal trajectory prediction of surrounding vehicles with maneuver based LSTMs," in 2018 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2018, pp. 1179–1184.
[20] K. Shi, Y. Wu, H. Shi, Y. Zhou, and B. Ran, “An integrated car-
following and lane changing vehicle trajectory prediction algorithm
based on a deep neural network,” Physica A: Statistical Mechanics and
its Applications, vol. 599, p. 127303, 2022.
[21] N. Deo and M. M. Trivedi, “Convolutional social pooling for vehicle
trajectory prediction,” in Proceedings of the IEEE Conference on Com-
puter Vision and Pattern Recognition Workshops, 2018, pp. 1468–1476.
[22] K. Messaoud, I. Yahiaoui, A. Verroust, and F. Nashashibi, “Attention
based vehicle trajectory prediction,” IEEE Transactions on Intelligent
Vehicles, vol. 6, no. 1, pp. 175–185, 2020.
[23] K. Messaoud, I. Yahiaoui, A. Verroust-Blondet, and F. Nashashibi,
“Non-local social pooling for vehicle trajectory prediction,” in 2019
IEEE Intelligent Vehicles Symposium (IV). IEEE, 2019, pp. 975–980.
[24] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez,
Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances
in neural information processing systems, 2017, pp. 5998–6008.
[25] F. Giuliari, I. Hasan, M. Cristani, and F. Galasso, "Transformer networks for trajectory forecasting," in 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021, pp. 10335–10342.
[26] Y. Liu, J. Zhang, L. Fang, Q. Jiang, and B. Zhou, “Multimodal motion
prediction with stacked transformers,” in Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, 2021, pp.
7577–7586.
[27] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[28] O. Sharma, N. Sahoo, and N. B. Puhan, "Kernelized convolutional transformer network based driver behavior estimation for conflict resolution at unsignalized roundabout," ISA Transactions, 2022.
[29] V. Alexiadis, J. Colyar, and J. Halkias, "Next generation simulation fact sheet," Washington, DC, USA, 2007.