HAL Id: hal-02195180
Submitted on 5 Aug 2019
Relational Recurrent Neural Networks For Vehicle
Trajectory Prediction
Kaouther Messaoud, Itheri Yahiaoui, Anne Verroust-Blondet, Fawzi Nashashibi
To cite this version:
Kaouther Messaoud, Itheri Yahiaoui, Anne Verroust-Blondet, Fawzi Nashashibi. Relational Recurrent
Neural Networks For Vehicle Trajectory Prediction. ITSC 2019 - IEEE Intelligent transportation
systems conference, Oct 2019, Auckland, New Zealand. ⟨hal-02195180⟩
Kaouther Messaoud¹, Itheri Yahiaoui², Anne Verroust-Blondet¹ and Fawzi Nashashibi¹
Abstract: Scene understanding and future motion prediction of surrounding vehicles are crucial to achieve safe and reliable decision-making and motion planning for autonomous driving in a highway environment. This is a challenging task considering the correlation between the drivers' behaviors. Given the performance of Long Short-Term Memories (LSTMs) in sequence modeling and the power of the attention mechanism to capture long-range dependencies, we bring relational recurrent neural networks (RRNNs) to tackle the vehicle motion prediction problem. We propose an RRNN-based encoder-decoder architecture in which the encoder analyzes the patterns underlying the past trajectories and the decoder generates the future trajectory sequence. The originality of this network is that it combines the advantages of LSTM blocks in representing the temporal evolution of trajectories with those of the attention mechanism in modeling the relative interactions between vehicles. This paper compares the proposed approach with the LSTM encoder-decoder on the new large-scale naturalistic driving dataset highD. The proposed method outperforms the LSTM encoder-decoder in terms of the RMSE of the predicted trajectories. Over a 5 s prediction horizon, it achieves longitudinal and lateral RMSE values of about 3.34 m and 0.48 m, respectively.
For safe and efficient navigation, autonomous vehicles need the ability to analyze and understand different driving situations. They require information about the future intentions of surrounding vehicles in order to assess the driving situation and plan their own future trajectories accordingly. Predicting the trajectory of a vehicle is a challenging task since it is highly correlated with other drivers' behaviors. Many studies tackle this task using traditional data-driven techniques [1], [2], [3] as well as deep learning models [4], [5], [6], [7], [8], [9], [10]. LSTMs have shown great success in modeling temporal data. Therefore, recent studies [9], [11], [12] use an LSTM-based encoder-decoder architecture to model the spatial interactions between neighboring vehicles. However, LSTMs lack the spatio-temporal structure needed to capture both the temporal evolution and the spatial interactions between vehicles in the driving scene.
As a remedy, this paper proposes a new architecture based on human-like reasoning, which selectively focuses attention on a subset of surrounding vehicles and efficiently retains the pieces of information that are likely to influence the driver's future trajectory. For instance, a driver intending to make a lane change focuses more on the vehicles in the target lane. Therefore, the vehicle's future trajectory can be more influenced by distant vehicles in the target lane than by close ones in the other lanes.
¹Inria Paris, 2 rue Simone Iff, 75012 Paris, France
²CReSTIC, Université de Reims Champagne-Ardenne, Reims, France
The proposed architecture is an encoder-decoder based on Relational Recurrent Neural Networks (RRNNs) [13]. It combines the advantages of LSTMs in sequence modeling with the power of the attention mechanism to capture spatial interactions between vehicles. It is characterized by:
• Per-block information storage: input information is selectively stored into separate interacting blocks based on its content.
• Relational reasoning: some vehicles are more likely to be related to, or influenced by, other vehicles because of certain features.
• No distance-constrained analysis: dependence between vehicles is not always tied to proximity in space.
• Different focusing: different relations are encoded based on selective attention to a set of input information.
We use the new publicly available naturalistic vehicle trajectory dataset highD [14] to train and validate our model on the task of trajectory prediction. We then compare our model to an LSTM-based encoder-decoder model and obtain better results in terms of longitudinal and lateral prediction errors.
In their surveys, Lefèvre et al. [15] and Zhan et al. [16] divide vehicle behavior forecasting methods into two main categories, depending on whether or not they consider the interactions between neighboring vehicles.
A. Independent prediction
Independent vehicle motion prediction approaches consider only a single vehicle at a time. Early work predicted future trajectories based on physics-based evolution models such as Switching Kalman Filters [17], the Constant Turn Rate and Velocity model (CTRV) [18], Interacting Multiple Models [19] and the Intelligent Driver Model (IDM) [20]. These models mainly rely on low-level characteristics of motion. Therefore, they are limited to short-term motion prediction.
More recent methods decompose the motion of a vehicle into a set of patterns or maneuvers. They treat motion prediction as a multi-class classification problem, then use the predicted maneuvers to infer the future trajectory [2]. Yoon et al. [21] base their motion prediction on the vehicle's target lane and propose three representative trajectories per lane, depending on how fast the vehicle reaches that lane. They use a Multi-Layer Perceptron (MLP) to estimate the probability of each lane and of each of the possible trajectories.
These models are limited because they do not consider the influence of neighboring vehicles on the predicted trajectory.
B. Interaction-aware prediction
1) Inverse Reinforcement Learning (IRL): The drivers' decision-making process can be modeled as a Markov Decision Process (MDP) in which each vehicle moves so as to minimize a cost function. Sierra González et al. [22] deploy an IRL algorithm to infer the cost function parameters. They then combine it with a heuristic policy model to represent the risk-averse behavior of drivers, and predict the future motion by sequentially applying the actions estimated by this policy. In [23], they combine the driver model with Dynamic Bayesian Networks (DBN) to represent interactions between vehicles.
2) Recurrent Neural Networks (RNNs): Recent advances in sequence modeling result from the use of recurrent neural networks (RNNs). They have shown promising results in diverse domains such as natural language processing (NLP) and speech recognition. Long Short-Term Memories (LSTMs) are a particular type of RNN designed to model long-term dependencies between input features. They operate by storing and retrieving information, which lets them learn to relate inputs over time. Therefore, LSTM-based approaches have been solid candidates for maneuver and trajectory prediction.
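The storing and retrieving mentioned above happens through gates. As an illustration only, the following plain-Python sketch computes one step of the standard LSTM cell with input and hidden size 1; the function names, the `params` layout and the gate values in the example are our assumptions, not taken from the paper (which relies on PyTorch).

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float, params: dict) -> tuple:
    """One step of a scalar LSTM cell (input size 1, hidden size 1).

    `params` maps each gate name ("i", "f", "o", "g") to (w_x, w_h, b).
    """
    def pre(gate: str) -> float:
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("i"))    # input gate: how much new content to store
    f = sigmoid(pre("f"))    # forget gate: how much of the old cell state to keep
    o = sigmoid(pre("o"))    # output gate: how much of the cell state to expose
    g = math.tanh(pre("g"))  # candidate content computed from the current input
    c = f * c_prev + i * g   # store: update the cell state
    h = o * math.tanh(c)     # retrieve: read a hidden state out of the cell
    return h, c


# Forget gate saturated open, input gate closed: the cell keeps its content.
keep = {"i": (0.0, 0.0, -20.0), "f": (0.0, 0.0, 20.0),
        "o": (0.0, 0.0, 20.0), "g": (1.0, 0.0, 0.0)}
h, c = lstm_step(3.0, 0.0, 0.5, keep)
```

With these gates the cell state stays at about 0.5 regardless of the input; swapping the input and forget biases would instead overwrite it with the new candidate content.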
LSTMs have recently been deployed for driver intention prediction, using different LSTM-based approaches. A simple LSTM with one or more layers was used in [5], [6], [7], [10]. Xin et al. [8] use a dual LSTM: the first LSTM performs high-level driver intention recognition, and the second generates the corresponding predicted trajectory. Others [9], [11] deploy an LSTM encoder-decoder architecture. Different input features have also been tested. While Lenz et al. [6] feed the LSTM only the current state of the target and of a set of its surrounding vehicles in order to satisfy the Markov property, other studies [5], [7], [9] consider the sequence of past features to provide the model with temporal evolution patterns and improve the trajectory prediction. They assign the LSTM the task of retaining the relevant events and using them to generate the predicted trajectory.
Regarding the modeling of interactions between surrounding vehicles, most existing models [5], [6], [7], [9] infer the dependencies between vehicles implicitly. They let the LSTM learn the influence of surrounding vehicles on the target vehicle's motion by feeding it a sequence of surrounding vehicles' features as input. However, an LSTM compresses the entire received track sequence into a single hidden vector, which can limit its performance in modeling inter-vehicle dependencies.
Attention mechanisms, and mainly self-attention [24], have been used in many novel neural network architectures [24], [25], [26] because of their good performance at capturing long-range dependencies. Additionally, they reduce the number of local operations needed by directly relating distant elements.
In this work, we predict the future trajectory of a target vehicle by combining the advantages of LSTMs in sequence modeling with the power of the attention mechanism to capture spatial inter-vehicle dependencies. To that end, we bring relational-network-based methods to the problem of interaction-aware vehicle motion prediction. RRNNs extend the LSTM architecture by introducing interacting memory blocks that use Multi-Head Dot Product Attention inside the LSTM block.
Our motion prediction results are compared with those of an LSTM-based encoder-decoder model.
We aim to predict the future positions of a target vehicle $T$, given its track history and the track histories of its surrounding vehicles at the current time $t_{obs}$.
A. Inputs and Outputs
We assume that the input consists of the track histories of the target and of $n$ surrounding vehicles. The input trajectory of a vehicle $i$ is defined as $X_i = [\mathbf{x}_i^1, \ldots, \mathbf{x}_i^{t_{obs}}]$, where $\mathbf{x}_i^t = (x_i^t, y_i^t)$. We denote by $(x_T^t, y_T^t)$ the coordinates of the target vehicle $T$.
The coordinates are expressed in a stationary frame of reference whose origin is the position of the target vehicle at time $t_{obs}$. The $y$ axis points along the direction of motion of the freeway, and the $x$ axis points in the direction perpendicular to it.
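A minimal sketch of this change of frame, assuming the raw tracks are already given as (lateral, longitudinal) pairs in a road-aligned frame. The function name, the `heading` argument and the choice to rotate by 180 degrees for vehicles driving towards decreasing longitudinal values are our assumptions; the text only fixes the origin and the axis directions.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # assumed (lateral, longitudinal) in a road-aligned frame


def to_target_frame(tracks: Dict[str, List[Point]], target: str,
                    obs_index: int, heading: int = 1) -> Dict[str, List[Point]]:
    """Re-express every track in the stationary frame centred on the target at t_obs.

    The returned pairs are (x, y) with x lateral and y longitudinal, as in the text.
    heading = +1 if the target drives towards increasing longitudinal values,
    -1 otherwise; with -1 both axes are flipped (a 180-degree rotation), so that
    y always points in the target's direction of motion.
    """
    if heading not in (1, -1):
        raise ValueError("heading must be +1 or -1")
    x0, y0 = tracks[target][obs_index]  # target position at t_obs becomes the origin
    return {
        vid: [(heading * (x - x0), heading * (y - y0)) for x, y in track]
        for vid, track in tracks.items()
    }


tracks = {"T": [(3.5, 100.0), (3.6, 130.0)], "A": [(7.0, 120.0), (7.0, 150.0)]}
local = to_target_frame(tracks, "T", obs_index=1)
```

Here `obs_index` is the 0-based index of $t_{obs}$ in each track; vehicle A ends up about 3.4 m to the side of the target and 20 m ahead of it.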
We define a 3D spatial grid $H^t$ composed of the coordinates of the target and its surrounding vehicles at time $t$, placed in cells according to their positions at time $t_{obs}$:
$$H^t(m, n, :) = \delta_{mn}(x_i^{t_{obs}}, y_i^{t_{obs}})\,(x_i^t, y_i^t), \quad \forall i \in A_T \qquad (1)$$
where $\delta_{mn}(x, y)$ is an indicator function equal to 1 if and only if $(x, y)$ is in the cell $(m, n)$, and $A_T$ is the set containing the target vehicle and its neighboring vehicles.
The columns correspond to the three lanes. We use a grid of size (13, 3) centered on the target vehicle position and covering a longitudinal distance of 58.5 m (cell length 4.5 m, since 13 × 4.5 m = 58.5 m).
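A minimal sketch of how a vehicle could be assigned to a cell of the 13 × 3 grid and how $H^t$ in eq. (1) could be filled. The row index follows directly from the 4.5 m cells; the column index requires a lane assignment that the text does not detail, so it is derived here from a hypothetical lane width of 3.75 m. The function names and the zero filling of empty cells are also assumptions.

```python
import math
from typing import Dict, List, Optional, Tuple

N_ROWS, N_COLS = 13, 3   # longitudinal cells x lanes, as stated in the text
CELL_LENGTH = 4.5        # m, so the grid covers 13 * 4.5 = 58.5 m
LANE_WIDTH = 3.75        # m, hypothetical value used only to pick a column


def cell_of(x: float, y: float) -> Optional[Tuple[int, int]]:
    """(row, column) of the cell containing (x, y), or None outside the grid.

    x is lateral and y longitudinal, relative to the target at t_obs. Row 0 is
    the rearmost cell; columns 0, 1, 2 are the lane on the negative-x side, the
    target's lane and the lane on the positive-x side (assumed lane assignment).
    """
    half_length = N_ROWS * CELL_LENGTH / 2.0   # 29.25 m behind and ahead
    row = math.floor((y + half_length) / CELL_LENGTH)
    col = round(x / LANE_WIDTH) + 1
    if 0 <= row < N_ROWS and 0 <= col < N_COLS:
        return row, col
    return None


def build_grid(pos_obs: Dict[str, Tuple[float, float]],
               pos_t: Dict[str, Tuple[float, float]]) -> List[List[List[float]]]:
    """H^t as in eq. (1): each vehicle's cell is chosen from its position at t_obs,
    and the cell stores that vehicle's coordinates (x, y) at time t.
    Empty cells hold zeros; if two vehicles share a cell, the last one wins."""
    grid = [[[0.0, 0.0] for _ in range(N_COLS)] for _ in range(N_ROWS)]
    for vid, (x_obs, y_obs) in pos_obs.items():
        cell = cell_of(x_obs, y_obs)
        if cell is not None and vid in pos_t:
            row, col = cell
            grid[row][col] = list(pos_t[vid])
    return grid
```

For instance, the target itself (at the origin at $t_{obs}$) falls in the central cell (6, 1), and a vehicle 30 m ahead lies outside the 58.5 m window.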
Unlike most state-of-the-art works, which consider only the vehicles immediately around the target vehicle, we adopt a grid over the neighboring area. This representation of the scene has the following advantages:
• It models the spatial distances between the vehicles in the scene and represents the drivable areas.
• It enables us to consider different scenarios with different numbers of traffic participants.
• It preserves the lane structure of the highway.
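The output of the model is the sequence of the target vehicle's predicted future positions:
$$Y_{pred} = [\mathbf{y}_{pred}^{t_{obs}+1}, \ldots, \mathbf{y}_{pred}^{t_{obs}+t_f}]$$
where $\mathbf{y}_{pred}^t = (x_{pred}^t, y_{pred}^t)$ is the target vehicle's predicted position at time $t$ and $t_f$ is the prediction horizon.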
Fig. 1. Proposed model (example with per-lane scene embedding, L-RRNN). Diagram labels: scene embedding, memory $M^t$, input embedding $Emb^t$, attention head with queries $Q^t$, keys $K^t$ and values $V^t$.
B. Loss Function
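We first train the model by minimizing the root mean squared error between the real trajectory and the predicted one:
$$RMSE = \sqrt{\frac{1}{t_f}\sum_{t=t_{obs}+1}^{t_{obs}+t_f}\left[(x_T^t - x_{pred}^t)^2 + (y_T^t - y_{pred}^t)^2\right]}$$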
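A plain-Python sketch of the loss above for a single trajectory, useful as a reference when checking an implementation. In training, the same quantity would presumably be computed on PyTorch tensors and averaged over a batch; the function name is ours.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def trajectory_rmse(truth: Sequence[Point], pred: Sequence[Point]) -> float:
    """RMSE between the real future positions (x_T^t, y_T^t) and the predicted
    ones (x_pred^t, y_pred^t), averaged over the t_f predicted time steps."""
    if not truth or len(truth) != len(pred):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = sum((xt - xp) ** 2 + (yt - yp) ** 2
                  for (xt, yt), (xp, yp) in zip(truth, pred))
    return math.sqrt(squared / len(truth))


print(trajectory_rmse([(0.0, 0.0), (0.0, 0.0)], [(3.0, 4.0), (3.0, 4.0)]))  # → 5.0
```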
Fig. 1 shows our proposed model. It consists of a scene embedding (cf. IV-C) and an RRNN-based encoder and decoder (cf. IV-A). The figure illustrates the per-lane scene embedding L-RRNN described in IV-C.2. After the scene grid embedding, the encoder learns the vehicle motion and captures the dependencies in the input data using the Relational Memory Core (RMC) block. At each iteration, the RMC is fed with the previous memory matrix $M^t$ and the current scene embedding $Emb^t$. It applies Multi-Head Dot Product Attention (MHDPA) (cf. IV-B) to make the memory and input slots interact. MHDPA operates by projecting each memory and input slot using row-wise shared weights $W_Q^l$, $W_K^l$ and $W_V^l$ to generate the queries $Q^t$, keys $K^t$ and values $V^t$, respectively. The MHDPA module is followed by a row-wise multilayer perceptron (MLP); the resulting memory is then gated to form the next memory state and an output vector, which are fed to the decoder at $t_{obs}$. The decoder, composed of RRNNs, outputs the predicted future trajectory of the target vehicle.
A. Relational Recurrent Encoder-Decoder
We deploy an encoder-decoder architecture for trajectory prediction:
• RRNN encoder: receives the input sequence embedding, extracts the properties of the target vehicle's past trajectory and the interaction information, compresses them into an encoding vector, and feeds this vector together with the memory block to the decoder.
• RRNN decoder: learns to generate the predicted trajectory based on the received information. At time step $t_{obs}$, the decoder takes as input the encoding vector and the memory block. It makes a prediction for the next time step and generates the next memory block. We then pass the memory blocks forward and reinject the decoder's predictions as the decoder's input at the next time step, to sequentially generate the predicted target vehicle positions.
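The decoding loop described above can be sketched as follows. The decoder cell is abstracted as a hypothetical `step` function, and `toy_step` is a constant-velocity stand-in used only to make the loop runnable; neither is the RRNN decoder itself.

```python
from typing import Any, Callable, List, Tuple

Position = Tuple[float, float]
# Hypothetical decoder cell signature: (input, memory) -> (predicted position, next memory).
DecoderStep = Callable[[Any, Any], Tuple[Position, Any]]


def decode(step: DecoderStep, encoding: Any, memory: Any, t_f: int) -> List[Position]:
    """Autoregressive decoding over t_f future steps.

    The first call receives the encoder's encoding vector and memory block; each
    later call receives the previous prediction and the memory returned by the
    previous call.
    """
    predictions: List[Position] = []
    decoder_input = encoding
    for _ in range(t_f):
        position, memory = step(decoder_input, memory)
        predictions.append(position)
        decoder_input = position  # reinject the prediction as the next input
    return predictions


def toy_step(inp: Any, memory: Any) -> Tuple[Position, Any]:
    """Hypothetical stand-in for the RRNN decoder: the 'memory' is a constant
    displacement per step, and the encoding (None here) is read as the origin."""
    x, y = inp if inp is not None else (0.0, 0.0)
    vx, vy = memory
    return (x + vx, y + vy), memory


print(decode(toy_step, None, (0.0, 2.0), t_f=3))  # → [(0.0, 2.0), (0.0, 4.0), (0.0, 6.0)]
```

Because each prediction becomes the next input, errors made early in the horizon propagate to later steps, which is one reason longer horizons are harder.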
The encoder and decoder are composed of RRNNs. RRNNs are memory-based recurrent neural networks able to perform relational reasoning between input entities over time. They iteratively and selectively store information into blocks and compute interactions between them. Each RRNN block contains a number of memory slots where the pertinent information is stored.
RRNNs operate by splitting the memory and the input into slots and heads and making them interact. Each memory slot is updated at every time step based on:
• Memory-Memory attention: each memory slot attends over the other memory slots. This captures the interactions and dependencies in the stored information.
• Memory-Input attention: each memory slot attends over the input embedding slots. Attention decides which information from the input is stored in which memory slots, based on its relation to what is already contained in the memory. This also infers inter-vehicle interactions.
B. Multi-Head Dot Product Attention (MHDPA):
In each RRNN block, at each time step $t$, we use linear projections of the previous memory $M^t$ and the input embedding $Emb^t$ to generate the queries $Q_l^t = M^t W_Q^l$, keys $K_l^t = [M^t; Emb^t] W_K^l$ and values $V_l^t = [M^t; Emb^t] W_V^l$, where $[M^t; Emb^t]$ denotes the row-wise concatenation of $M^t$ and $Emb^t$.
To enable the memory slots to share different information and represent different interactions, we use multiple attention heads. Therefore, we generate $h$ sets of queries, keys and values, for $l = 1, \ldots, h$, using different projection matrices.
The memory is updated using multi-head dot product attention over the other memory slots and the current input embedding:
$$\tilde{M}_l^{t+1} = A(Q_l^t, K_l^t, V_l^t) = \underbrace{\mathrm{softmax}\left(\frac{Q_l^t (K_l^t)^{\top}}{\sqrt{d_k}}\right)}_{\text{attention weights}} V_l^t$$
$\tilde{M}_l^{t+1}$ is an update of the memory in which each slot is a weighted sum of the projections of the previous memory slots and of the projections of the current input embedding. $\sqrt{d_k}$ is a scaling factor, where $d_k$ is the dimensionality of the key vectors.
We apply the attention operation described above for each head. The resulting memory $\tilde{M}^{t+1}$ is the column-wise concatenation of the memories $\tilde{M}_l^{t+1}$ for $l = 1, \ldots, h$.
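The equations above can be checked with a small plain-Python sketch of MHDPA, leaving out the layer normalization, batching and learned parameters of a real implementation. The function names and the toy weights are ours; the sketch follows the text in taking queries from the memory only and keys and values from $[M^t; Emb^t]$, which assumes that memory and embedding rows have the same width.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
            for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    out = []
    for row in a:
        m = max(row)  # subtract the row maximum for numerical stability
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def mhdpa(memory: Matrix, emb: Matrix,
          heads: Sequence[Tuple[Matrix, Matrix, Matrix]]) -> Matrix:
    """Multi-head dot product attention of the memory over [M; Emb].

    `heads` holds one (W_Q, W_K, W_V) triple per head.
    """
    stacked = memory + emb                  # row-wise concatenation [M; Emb]
    per_head = []
    for w_q, w_k, w_v in heads:
        q = matmul(memory, w_q)             # queries come from the memory only
        k = matmul(stacked, w_k)
        v = matmul(stacked, w_v)
        d_k = len(k[0])
        scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
        per_head.append(matmul(softmax_rows(scores), v))  # one row per memory slot
    # Column-wise concatenation of the h head outputs, slot by slot.
    return [[val for head in per_head for val in head[r]] for r in range(len(memory))]


M = [[1.0, 0.0], [0.0, 1.0]]   # two memory slots of width 2
E = [[1.0, 1.0]]               # one input-embedding row of width 2
zero_k = [[0.0], [0.0]]        # all-zero key projections give uniform attention
heads = [([[1.0], [0.0]], zero_k, [[1.0], [0.0]]),
         ([[0.0], [1.0]], zero_k, [[0.0], [1.0]])]
out = mhdpa(M, E, heads)
```

With all-zero key projections every score is 0, so each head attends uniformly over the three rows of $[M; Emb]$, and all four output entries equal 2/3 up to rounding.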
We employ a residual connection [27] around the MHDPA, followed by an MLP and a second residual connection. These operations are encapsulated into an LSTM cell as described in [13]. The resulting memory block is therefore gated and used as the next memory state $M^{t+1}$.
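The residual structure described above can be sketched as follows. `attend` and `mlp` are hypothetical stand-ins (in the model they would be MHDPA over $[M^t; Emb^t]$ and a learned row-wise MLP), and the LSTM-style gating that produces $M^{t+1}$ in [13] is deliberately left out of this sketch.

```python
from typing import Callable, List

Row = List[float]


def add(a: Row, b: Row) -> Row:
    return [x + y for x, y in zip(a, b)]


def attention_mlp_block(memory: List[Row],
                        attend: Callable[[List[Row]], List[Row]],
                        mlp: Callable[[Row], Row]) -> List[Row]:
    """Residual connection around attention, then a row-wise MLP with a second
    residual connection. Gating into the next memory state is not shown."""
    attended = [add(m, a) for m, a in zip(memory, attend(memory))]  # M + attend(M)
    return [add(r, mlp(r)) for r in attended]                       # r + MLP(r), per row


def halve(rows: List[Row]) -> List[Row]:
    """Toy 'attention' used only for illustration."""
    return [[0.5 * v for v in row] for row in rows]


def relu(row: Row) -> Row:
    """Toy 'MLP' used only for illustration."""
    return [max(0.0, v) for v in row]


print(attention_mlp_block([[1.0, -2.0]], halve, relu))  # → [[3.0, -3.0]]
```

The residual paths let each memory slot keep its previous content when the attention and MLP contributions are small, which is what the subsequent gating then modulates.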
C. Input Embedding
In this work, we use two different ways of embedding the input data, and we compare the results obtained with each.
1) Scene embedding (Sc-RRNN): We consider the whole scene as an input vector. We embed the scene using a fully connected layer to generate an embedding vector. The vectors embedding the scene for time steps $t = 1, \ldots, t_{obs}$ are sequentially fed to the RRNN encoder:
$$Emb^t = \Psi(H^t; W_{emb})$$
The RRNN implicitly infers the interactions and the dependencies between the input vehicles.
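A plain-Python sketch of $\Psi$ for the scene embedding, assuming that the Leaky ReLU (α = 0.1) mentioned in the implementation details is applied directly after the fully connected layer, and that each cell of $H^t$ holds two coordinates, giving 13 × 3 × 2 = 78 inputs. The random weights stand in for the learned $W_{emb}$; the function names are ours.

```python
import random
from typing import List


def leaky_relu(v: float, alpha: float = 0.1) -> float:
    return v if v > 0 else alpha * v


def fully_connected(inp: List[float], weights: List[List[float]],
                    bias: List[float]) -> List[float]:
    """A fully connected layer followed by Leaky ReLU (alpha = 0.1)."""
    return [leaky_relu(sum(w * x for w, x in zip(row, inp)) + b)
            for row, b in zip(weights, bias)]


def scene_embedding(grid: List[List[List[float]]], weights, bias) -> List[float]:
    """Sc-RRNN: flatten the whole 13 x 3 x 2 grid H^t into 78 values, then embed it."""
    flat = [v for row in grid for cell in row for v in cell]
    return fully_connected(flat, weights, bias)


rng = random.Random(0)
W_emb = [[rng.uniform(-0.1, 0.1) for _ in range(13 * 3 * 2)] for _ in range(64)]
b_emb = [0.0] * 64
H = [[[0.0, 0.0] for _ in range(3)] for _ in range(13)]
H[8][2] = [3.8, -1.0]   # a single neighbour in one cell
emb = scene_embedding(H, W_emb, b_emb)   # a vector of size 64
```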
2) Per-lane embedding (L-RRNN): We divide the scene by lane to generate an input matrix. We embed each lane using a fully connected layer $\Psi$ to generate an embedding matrix of size (3, embedding size). The matrices embedding the scene for time steps $t = 1, \ldots, t_{obs}$ are sequentially fed to the RRNN encoder:
$$Emb^t(n, :) = \Psi(H^t(:, n, :); W_{emb}), \quad n = 1, 2, 3$$
This model preserves the lane-wise structure of the road. It captures the spatio-temporal interactions between vehicles in the same and adjacent lanes, and it performs a lane-based attention to focus on lane-changing behavior.
In this model, we use three memory slots to store lane-level information.
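The per-lane equation can be sketched in plain Python as follows: each of the three lane columns of $H^t$ (13 cells × 2 coordinates = 26 values) is embedded separately with the same weights, matching the single $W_{emb}$ in the equation. We assume that the Leaky ReLU (α = 0.1) follows the fully connected layer; the random weights stand in for the learned $W_{emb}$, and the function names are ours.

```python
import random
from typing import List


def leaky_relu(v: float, alpha: float = 0.1) -> float:
    return v if v > 0 else alpha * v


def fully_connected(inp: List[float], weights: List[List[float]],
                    bias: List[float]) -> List[float]:
    """Psi: a fully connected layer followed by Leaky ReLU (alpha = 0.1)."""
    return [leaky_relu(sum(w * x for w, x in zip(row, inp)) + b)
            for row, b in zip(weights, bias)]


def per_lane_embedding(grid: List[List[List[float]]], weights, bias) -> List[List[float]]:
    """L-RRNN: Emb^t(n, :) = Psi(H^t(:, n, :); W_emb) for each lane column n,
    with the same weights shared by all lanes."""
    n_lanes = len(grid[0])
    lanes = [[v for row in grid for v in row[n]] for n in range(n_lanes)]
    return [fully_connected(lane, weights, bias) for lane in lanes]


rng = random.Random(0)
W_emb = [[rng.uniform(-0.1, 0.1) for _ in range(13 * 2)] for _ in range(64)]
b_emb = [0.0] * 64
H = [[[0.0, 0.0] for _ in range(3)] for _ in range(13)]
H[8][2] = [3.8, -1.0]                       # a single neighbour in the third lane column
emb = per_lane_embedding(H, W_emb, b_emb)   # a (3, 64) matrix
```

Since the lanes are embedded independently, the neighbour only affects the third row of the embedding matrix; relations across lanes are left to the attention inside the RRNN.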
D. Training and Implementation Details
The input grid is embedded into an embedding vector of size 64 or an embedding matrix of size (3, 64), depending on the input embedding type. We then apply the Leaky ReLU activation function with $\alpha = 0.1$.
We deploy an RRNN encoder-decoder with two memory slots for Sc-RRNN and three for L-RRNN. Each memory slot has size 64. We employ $h = 2$ parallel attention heads over projected vectors of size 32. We use a batch size of 128 and the Adam optimizer [28]. The model is implemented using PyTorch [29].
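The reported hyperparameters are mutually consistent: 2 heads × 32 = 64, which matches the slot size, as expected when the head outputs are concatenated column-wise. The sketch below records them in a configuration object with that check; the class and field names are hypothetical. Under our reading, the baseline LSTM with "the same total memory size" as Sc-RRNN would then have 2 × 64 = 128 units, a number the paper does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RRNNConfig:
    """Hyperparameters reported in the text; the field names are hypothetical."""
    embedding_size: int = 64      # per vector (Sc-RRNN) or per lane row (L-RRNN)
    num_memory_slots: int = 2     # 2 for Sc-RRNN, 3 for L-RRNN
    slot_size: int = 64
    num_heads: int = 2
    head_size: int = 32
    batch_size: int = 128
    leaky_relu_alpha: float = 0.1

    def __post_init__(self) -> None:
        # The h head outputs are concatenated column-wise into one memory slot.
        if self.num_heads * self.head_size != self.slot_size:
            raise ValueError("num_heads * head_size must equal slot_size")

    @property
    def total_memory_size(self) -> int:
        return self.num_memory_slots * self.slot_size


sc_rrnn = RRNNConfig(num_memory_slots=2)
l_rrnn = RRNNConfig(num_memory_slots=3)
print(sc_rrnn.total_memory_size, l_rrnn.total_memory_size)  # → 128 192
```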
A. Dataset
We are the first to use the new publicly available naturalistic vehicle trajectory dataset highD [14] for the task of trajectory prediction. Previous studies used either private datasets or the Next Generation Simulation (NGSIM) dataset [30], [31]. However, Coifman and Li [32] showed annotation inaccuracies in the NGSIM dataset, which may result in physically unrealistic vehicle behaviors. Besides, highD is larger than NGSIM: it contains about 12 times as many vehicles. Therefore, we choose the highD dataset to train and evaluate our network.
HighD [14] is a new dataset captured in 2017 and 2018. It is recorded at 25 Hz by camera-equipped drones from an aerial perspective over six different German highways. It is composed of 60 recordings of about 17 minutes each, each covering a road segment of about 420 m with two driving directions (Fig. 2). It contains position measurements of 110 000 vehicles, with a total driven distance of 45 000 km. This dataset is valuable since it includes 5 600 recorded complete lane changes and reflects recent driver behaviors.
Fig. 2. Highway drone dataset highD [14]
We split each of the 60 recordings of the highD dataset into train (75%) and test (25%) sets. In this way, both the train and test sets include different driving behaviors at different times of the day and at different locations. This enhances the network's ability to learn generalized behavior over different drivers and driving conditions. Then, we split the trajectories into segments of 8 s, composed of a track history of 3 s and a prediction horizon of 5 s. We downsample each segment to 5 fps to reduce complexity.
TABLE I: RMSE (m) of the compared models
Error         Horizon (s)  V-LSTM  Sc-LSTM  Sc-RRNN  L-RRNN
Total         1            0.31    0.32     0.29     0.22
Total         2            0.81    0.82     0.69     0.65
Total         3            1.51    1.60     1.33     1.31
Total         4            2.48    2.63     2.22     2.22
Total         5            3.71    3.87     3.33     3.38
Lateral       1            0.10    0.10     0.08     0.05
Lateral       2            0.32    0.20     0.18     0.14
Lateral       3            0.46    0.33     0.30     0.26
Lateral       4            0.57    0.45     0.43     0.37
Lateral       5            0.65    0.56     0.53     0.48
Longitudinal  1            0.27    0.31     0.27     0.22
Longitudinal  2            0.74    0.79     0.66     0.63
Longitudinal  3            1.44    1.57     1.30     1.29
Longitudinal  4            2.42    2.59     2.16     2.19
Longitudinal  5            3.65    3.83     3.27     3.34
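The segmentation described above can be sketched as follows, assuming that the 25 Hz highD rate is reduced to 5 fps by keeping every fifth frame, so that the 3 s history and the 5 s horizon correspond to 15 and 25 frames. How consecutive segments overlap is not specified in the text, so the one-second stride between segment starts is a hypothetical choice, as is the function name.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")

SOURCE_FPS = 25            # highD recording rate
TARGET_FPS = 5             # rate after downsampling
HISTORY_S, HORIZON_S = 3, 5


def make_segments(track: Sequence[Frame],
                  stride_s: int = 1) -> List[Tuple[List[Frame], List[Frame]]]:
    """Downsample a 25 Hz track to 5 Hz and cut it into 8 s segments made of
    3 s of history (15 frames) and 5 s to predict (25 frames)."""
    frames = list(track[::SOURCE_FPS // TARGET_FPS])   # keep every 5th frame
    n_hist = HISTORY_S * TARGET_FPS
    n_fut = HORIZON_S * TARGET_FPS
    segments = []
    for start in range(0, len(frames) - n_hist - n_fut + 1, stride_s * TARGET_FPS):
        history = frames[start:start + n_hist]
        future = frames[start + n_hist:start + n_hist + n_fut]
        segments.append((history, future))
    return segments


# A 10 s track at 25 Hz, with frame indices standing in for positions.
segments = make_segments(list(range(250)))
print(len(segments), len(segments[0][0]), len(segments[0][1]))  # → 3 15 25
```

Applying the split to each recording separately before segmenting, as the text does, keeps segments of the same time window from appearing in both sets.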
B. Evaluation Metric
We use the predicted trajectories to compute the Root Mean Squared Error (RMSE). The RMSE is the square root of the mean squared distance between the predicted positions and the ground truth. We also consider the longitudinal and lateral errors separately, to obtain further information about the error during lane changes.
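A minimal sketch of how total, lateral and longitudinal RMSE values per horizon, as reported in Table I, could be computed. We assume that the error at horizon h seconds is evaluated on the position predicted h × 5 steps ahead and pooled over all test samples; the text does not spell this out, and the function name is ours. Because the total error uses the squared Euclidean distance, total² = lateral² + longitudinal² when all three are computed on the same samples.

```python
import math
from typing import Dict, List, Sequence, Tuple

Trajectory = Sequence[Tuple[float, float]]  # (x lateral, y longitudinal) per predicted step


def rmse_per_horizon(truths: List[Trajectory], preds: List[Trajectory],
                     fps: int = 5) -> Dict[int, Dict[str, float]]:
    """Lateral, longitudinal and total RMSE at each whole-second horizon."""
    n_steps = len(truths[0])
    n = len(truths)
    results: Dict[int, Dict[str, float]] = {}
    for second in range(1, n_steps // fps + 1):
        k = second * fps - 1  # 0-based index of the predicted step at this horizon
        sx = sum((t[k][0] - p[k][0]) ** 2 for t, p in zip(truths, preds))
        sy = sum((t[k][1] - p[k][1]) ** 2 for t, p in zip(truths, preds))
        results[second] = {
            "lateral": math.sqrt(sx / n),
            "longitudinal": math.sqrt(sy / n),
            "total": math.sqrt((sx + sy) / n),
        }
    return results


truths = [[(0.0, 0.0)] * 10, [(0.0, 0.0)] * 10]
preds = [[(3.0, 4.0)] * 10, [(3.0, 4.0)] * 10]
print(rmse_per_horizon(truths, preds)[2])  # → {'lateral': 3.0, 'longitudinal': 4.0, 'total': 5.0}
```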
C. Models Compared
We compare our proposed models with an LSTM-based encoder-decoder architecture. For a fair comparison, we use an LSTM with the same total memory size as the relational memory of Sc-RRNN.
• Vanilla LSTM (V-LSTM): an LSTM-based encoder-decoder model. The encoder LSTM takes only the track history of the target vehicle, and the decoder LSTM generates the output trajectory. This represents an independent trajectory prediction model.
• Scene LSTM Encoder Decoder (Sc-LSTM): an encoder-decoder model in which the encoder encodes the trajectories of the target and surrounding vehicles. The encoding vector is fed to the decoder, which generates the trajectory predictions.
• Relational Recurrent Neural Network with scene embedding (Sc-RRNN): the model described in this paper.
• Relational Recurrent Neural Network with per-lane embedding (L-RRNN): the model described in this paper.
D. Results
Table I shows the RMSE values for the compared models. First, we observe that Sc-LSTM and V-LSTM have comparable total RMSE. While Sc-LSTM produces a lower lateral error, it has a larger longitudinal error than V-LSTM. This may imply that the LSTM has limited capability in capturing the effects of surrounding vehicles on the future motion of the target vehicle. It also shows the benefit of considering neighboring vehicles when predicting the lateral motion of the target vehicle.
Both proposed methods, Sc-RRNN and L-RRNN, further reduce the prediction error, suggesting the importance of using multiple memory slots and attention across these memories for motion prediction. We also note that the improvement produced by the RRNNs seems more pronounced for longer prediction horizons. This implies that the LSTM has limited capacity to perform long-term relational reasoning.
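The observation about longer horizons can be checked directly on the total-RMSE rows of Table I. This short sketch (our own arithmetic on the table values, not part of the original evaluation) computes the absolute reduction in total RMSE with respect to V-LSTM; for both RRNN variants, the gap grows monotonically from 1 s to 5 s.

```python
# Total RMSE (m) from Table I, for horizons of 1 to 5 s.
V_LSTM = [0.31, 0.81, 1.51, 2.48, 3.71]
SC_RRNN = [0.29, 0.69, 1.33, 2.22, 3.33]
L_RRNN = [0.22, 0.65, 1.31, 2.22, 3.38]


def absolute_gain(baseline, model):
    """Reduction of total RMSE (m) of `model` with respect to `baseline`, per horizon."""
    return [round(b - m, 2) for b, m in zip(baseline, model)]


print(absolute_gain(V_LSTM, SC_RRNN))  # → [0.02, 0.12, 0.18, 0.26, 0.38]
print(absolute_gain(V_LSTM, L_RRNN))   # → [0.09, 0.16, 0.2, 0.26, 0.33]
```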
Additionally, the per-lane embedding of the scene produced a lower lateral error. This suggests that the explicit lane-wise division of the scene and the memory-input slot interactions via MHDPA provide additional information about inter-lane dependencies. However, we believe that considerable further analysis of the architecture is needed, using other metrics (for example, metrics able to evaluate lane change detection), before concluding on the best way to embed the scene. Besides, the memory update should be studied over time to provide additional evidence of the model's performance.
In this work, we presented a novel way to tackle the task of long-term (5 s) trajectory prediction on highways using relational recurrent neural networks (RRNNs). This approach combines the advantages of the multi-head dot product attention mechanism and of LSTMs to capture the spatio-temporal dependencies between the input tracks. The proposed model provided competitive results with the state of the art on the large-scale naturalistic driving dataset highD, based on the RMSE metric for both longitudinal and lateral position prediction.
The deployed architecture represents a promising approach to predicting the motion of surrounding vehicles for autonomous vehicles. We believe that it can be extended and used to further improve vehicle motion prediction in various driving scenarios such as intersections and roundabouts. Moreover, as part of our future work, we plan to extend and validate the proposed approach to consider heterogeneous and mixed traffic scenarios with different road agents such as buses, trucks, cars, scooters, bicycles or pedestrians.
The work presented in this paper has been financially
supported by PIA French project CAMPUS (Connected
Automated Mobility Platform for Urban Sustainability).
[1] J. Schlechtriemen, A. Wedel, J. Hillenbrand, G. Breuel, and K. Kuh-
nert, “A lane change detection approach using feature ranking with
maximized predictive power,” in 2014 IEEE Intelligent Vehicles Sym-
posium Proceedings, June 2014, pp. 108–114.
[2] A. Houenou, P. Bonnifait, V. Cherfaoui, and W. Yao, “Vehicle trajec-
tory prediction based on motion model and maneuver recognition,”
in 2013 IEEE/RSJ International Conference on Intelligent Robots and
Systems, Nov 2013, pp. 4363–4369.
[3] N. Deo, A. Rangesh, and M. M. Trivedi, “How would surround
vehicles move? A unified framework for maneuver classification and
motion prediction,” IEEE Transactions on Intelligent Vehicles, vol. 3,
no. 2, pp. 129–140, June 2018.
[4] A. Khosroshahi, E. Ohn-Bar, and M. M. Trivedi, “Surround vehicles
trajectory analysis with recurrent neural networks,” in 2016 IEEE
19th International Conference on Intelligent Transportation Systems
(ITSC), Nov 2016, pp. 2267–2272.
[5] D. J. Phillips, T. A. Wheeler, and M. J. Kochenderfer, “Generalizable
intention prediction of human drivers at intersections,” in 2017 IEEE
Intelligent Vehicles Symposium (IV), June 2017, pp. 1665–1670.
[6] D. Lenz, F. Diehl, M. T. Le, and A. Knoll, “Deep neural networks for
markovian interactive scene prediction in highway scenarios,” in 2017
IEEE Intelligent Vehicles Symposium (IV), June 2017, pp. 685–692.
[7] F. Altché and A. de La Fortelle, “An LSTM network for highway
trajectory prediction,” in 2017 IEEE 20th International Conference on
Intelligent Transportation Systems (ITSC), Oct 2017, pp. 353–359.
[8] L. Xin, P. Wang, C. Chan, J. Chen, S. E. Li, and B. Cheng, “Intention-
aware long horizon trajectory prediction of surrounding vehicles using
dual LSTM networks,” in 2018 21st International Conference on
Intelligent Transportation Systems (ITSC), Nov 2018, pp. 1441–1446.
[9] N. Deo and M. M. Trivedi, “Multi-modal trajectory prediction of
surrounding vehicles with maneuver based LSTMs,” in 2018 IEEE
Intelligent Vehicles Symposium (IV), June 2018, pp. 1179–1184.
[10] A. Zyner, S. Worrall, J. Ward, and E. Nebot, “Long short term
memory for driver intent prediction,” in 2017 IEEE Intelligent Vehicles
Symposium (IV), June 2017, pp. 1484–1489.
[11] S. H. Park, B. Kim, C. M. Kang, C. C. Chung, and J. W. Choi,
“Sequence-to-sequence prediction of vehicle trajectory via LSTM
encoder-decoder architecture,” in 2018 IEEE Intelligent Vehicles Sym-
posium (IV), June 2018, pp. 1672–1678.
[12] H. Misawa, K. Takenaka, T. Sugihara, H. Liu, T. Taniguchi, and
T. Bando, “Prediction of driving behavior based on sequence to se-
quence model with parametric bias,” in 2017 IEEE 20th International
Conference on Intelligent Transportation Systems (ITSC), Oct 2017,
pp. 1–6.
[13] A. Santoro, R. Faulkner, D. Raposo, J. Rae, M. Chrzanowski, T. Weber,
D. Wierstra, O. Vinyals, R. Pascanu, and T. Lillicrap, “Relational re-
current neural networks,” in Advances in Neural Information Process-
ing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman,
N. Cesa-Bianchi, and R. Garnett, Eds., 2018, pp. 7299–7310.
[14] R. Krajewski, J. Bock, L. Kloeker, and L. Eckstein, “The highD
dataset: A drone dataset of naturalistic vehicle trajectories on german
highways for validation of highly automated driving systems,” in 2018
21st International Conference on Intelligent Transportation Systems
(ITSC), Nov 2018, pp. 2118–2125.
[15] S. Lefèvre, D. Vasquez, and C. Laugier, “A survey on motion
prediction and risk assessment for intelligent vehicles,” ROBOMECH
Journal, vol. 1, no. 1, pp. 1–14, 2014.
[16] W. Zhan, A. L. de Fortelle, Y. Chen, C. Chan, and M. Tomizuka,
“Probabilistic prediction from planning perspective: Problem formu-
lation, representation simplification and evaluation metric,” in 2018
IEEE Intelligent Vehicles Symposium (IV), June 2018, pp. 1150–1156.
[17] H. Veeraraghavan, N. Papanikolopoulos, and P. Schrater, “Determinis-
tic sampling-based switching kalman filtering for vehicle tracking,” in
2006 IEEE Intelligent Transportation Systems Conference, Sep. 2006,
pp. 1340–1345.
[18] A. Polychronopoulos, M. Tsogas, A. J. Amditis, and L. Andreone,
“Sensor fusion for predicting vehicles’ path for collision avoidance
systems,” IEEE Transactions on Intelligent Transportation Systems,
vol. 8, no. 3, pp. 549–562, Sep. 2007.
[19] R. Toledo-Moreo and M. A. Zamora-Izquierdo, “IMM-based lane-change prediction in highways with low-cost GPS/INS,” IEEE Transactions on Intelligent Transportation Systems, vol. 10, no. 1, pp. 180–185, March 2009.
[20] M. Liebner, M. Baumann, F. Klanner, and C. Stiller, “Driver intent
inference at urban intersections using the intelligent driver model,” in
2012 IEEE Intelligent Vehicles Symposium, June 2012, pp. 1162–1167.
[21] S. Yoon and D. Kum, “The multilayer perceptron approach to lateral
motion prediction of surrounding vehicles for autonomous vehicles,”
in 2016 IEEE Intelligent Vehicles Symposium (IV), June 2016, pp.
[22] D. Sierra González, J. S. Dibangoye, and C. Laugier, “High-speed
highway scene prediction based on driver models learned from demon-
strations,” in 2016 IEEE 19th International Conference on Intelligent
Transportation Systems (ITSC), Nov 2016, pp. 149–155.
[23] D. Sierra González, V. Romero-Cano, J. S. Dibangoye, and C. Laugier,
“Interaction-aware driver maneuver inference in highways using re-
alistic driver models,” in 2017 IEEE International Conference on
Intelligent Transportation Systems (ITSC), Oct. 2017, pp. 1–8.
[24] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N.
Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in
Neural Information Processing Systems (NIPS), 2017.
[25] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, and
Y. Bengio, “Graph attention networks,” in 6th International Con-
ference on Learning Representations, ICLR 2018, Vancouver, BC,
Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018.
[26] H. Zhang, I. J. Goodfellow, D. N. Metaxas, and A. Odena, “Self-attention generative adversarial networks,” CoRR, vol. abs/1805.08318, 2018.
[27] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for
image recognition,” in 2016 IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), June 2016, pp. 770–778.
[28] D. P. Kingma and J. Ba, “Adam: A method for stochastic
optimization,” CoRR, vol. abs/1412.6980, 2014.
[29] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito,
Z. Lin, A. D. L. Antiga, and A. Lerer, “Automatic differentiation in
pytorch,” in NIPS 2017 Autodiff Workshop: The Future of Gradient-
based Machine Learning Software and Techniques, Dec. 2017.
[30] J. Colyar and J. Halkias, “US Highway 101 dataset,” Federal Highway Administration (FHWA), Tech. Rep. FHWA-HRT-07-030, 2007.
[31] J. Colyar and J. Halkias, “Interstate 80 freeway dataset,” Federal Highway Administration (FHWA), Tech. Rep. FHWA-HRT-06-137.
[32] B. Coifman and L. Li, “A critical evaluation of the Next Generation Simulation (NGSIM) vehicle trajectory dataset,” Transportation Research Part B: Methodological, vol. 105, pp. 362–377, Nov. 2017.