
Kernelized convolutional transformer network based driver behavior estimation for conflict resolution at unsignalized roundabout


Abstract

The modelling of driver behavior plays an essential role in developing Advanced Driver Assistance Systems (ADAS) to support the driver in various complex driving scenarios. The behavior estimation of surrounding vehicles is crucial for an autonomous vehicle to safely navigate through an unsignalized intersection. This work proposes a novel kernelized convolutional transformer network (KCTN) with multi-head attention (MHA) mechanism to estimate driver behavior at a challenging unsignalized three-way roundabout. More emphasis has been placed on creating convolution in non-linear space by introducing a kervolution operation into the proposed network. It generalises convolution, improves model capacity, and captures higher-order feature interactions by using Gaussian kernel function. The proposed model is validated using the real-world ACFR dataset, where it outperforms current state-of-the-art in terms of behavior prediction accuracy and provides a significant lead time before potential conflict situations.
ISA Transactions 133 (2023) 13–28
Research article
Omveer Sharma, N.C. Sahoo , Niladri B. Puhan
School of Electrical Sciences, Indian Institute of Technology Bhubaneswar, Odisha, India
article info
Article history:
Received 12 February 2022
Received in revised form 6 July 2022
Accepted 6 July 2022
Available online 16 July 2022
Keywords:
Intelligent vehicle
Deep learning
Driver behavior
Roundabout
Convolutional neural network
Attention mechanism
©2022 ISA. Published by Elsevier Ltd. All rights reserved.
1. Introduction
Future driver assistance and safety systems for vehicles will incorporate contextual information, such as the driver’s intention and expected behavior, to achieve predictive driving. Because driver behavior cannot be measured directly, collecting this information is challenging and remains an open field of research [1–7]. A safety system needs behavior estimation to predict the future positions of surrounding traffic participants from current and past observations of the traffic environment [8–14]. While driving behavior is highly concentrated under organized conditions, especially with strictly graded paths and traffic lights, it is significantly less focused in unmarked street scenes. Driving at intersections without traffic lights is challenging and requires significant interaction with, and estimation of, other drivers. The wide range of driving styles that a driver might display while navigating through an unsignalized intersection adds to the complexity of the problem. One such intersection is the roundabout, which is found extensively in urban areas.
Driver intention estimation at intersections is a widely studied problem. Approaches such as Hidden Markov Models (HMMs) [15,16], Support Vector Machines (SVM) [17] and deep learning-based networks [18] have been applied successfully for intention prediction. Amsalu and Homaifar [15] used HMMs with a Hybrid State System (HSS) framework to estimate turning behaviors near intersections. Aoude et al. [17] also used SVM and HMM to estimate compliant and violating behaviors at road intersections. A fuzzy logic (FL) based model has been proposed to describe driver behavior in the dilemma zone of high-speed signalized intersections [19]. It may be noted that such a model works well only for specific feature patterns. Driver behavior is heavily dependent on the vehicle’s prior states (trajectory), so a driver behavior predictor model should be able to learn the sequential pattern and intelligently propagate past information to estimate behavior in the current state. In [20], the authors proposed a closed Lagrange based solution that uses a reciprocal collaborative process in terms of time and cost to solve collision detection and avoidance.

Corresponding author.
E-mail addresses: os10@iitbbs.ac.in (O. Sharma), ncsahoo@iitbbs.ac.in (N.C. Sahoo), nbpuhan@iitbbs.ac.in (N.B. Puhan).
More recently, deep learning-based techniques have been
explored to deal with complex environments such as intersec-
tions [21]. Jeong et al. [22] designed a Model Predictive Control
(MPC), and long short-term memory (LSTM) based motion plan-
ner to determine acceleration commands on predicted states
of surrounding vehicles at three-way intersections. The LSTM
models have been utilized to classify vehicle tracks into turn-left,
turn-right and go-straight at both three-way and four-way square
intersections [23–25]. Zhang and Fu [26] used a bi-directional
LSTM (Bi-LSTM) network to recognize turning behavior and
achieved 94.2% and 93.5% accuracies after 2 s and 1 s, respectively,
at four-way intersection. Liu et al. [27] used LSTM to classify
driving intents at the intersection with inputs of acceleration,
speed and brake pedal, and achieved a prediction accuracy of
95.3%.
https://doi.org/10.1016/j.isatra.2022.07.004
0019-0578/©2022 ISA. Published by Elsevier Ltd. All rights reserved.
O. Sharma, N.C. Sahoo and N.B. Puhan ISA Transactions 133 (2023) 13–28
These works either consider well-structured intersections or rely on vehicle-to-vehicle communication to convey drivers’ intentions. However, behavior in an unsignalized roundabout is different from that at a normal intersection [28–30]. Zyner et al. [31] used
LSTM model to predict exit destination of vehicles at an unsignal-
ized roundabout by using position, speed and heading of the
target vehicle. The authors also presented ACFR dataset of five
unsignalized roundabouts, a collection of over 23,000 vehicle
trajectories [32]. Zyner et al. [33] introduced clustering process
in conjunction with LSTM model that extracts possible paths in
prediction output and sorts them by probability scores. The opti-
mization process is integrated into reinforcement learning frame-
work to simultaneously decide behavior, desired acceleration, and
action time at the roundabout [34].
The driver’s behavior is highly influenced by the vehicle’s prior states (trajectory), so a driver behavior predictor model should be able to learn the sequential pattern and intelligently propagate past data to forecast behavior in the current state. RNN (recurrent neural network)-based models are commonly employed to deal with sequential data in time series prediction and classification problems [35], although they have two drawbacks. Firstly, such models suffer from the vanishing gradient problem for long input sequences. Secondly, RNN-based models propagate the input recursively, one time step after another; as a result, they require more computation time, and the length of the input sequence has a significant impact on it. In driver behavior modeling, long input sequences increase computation time and decrease prediction accuracy. More recently, the memory mechanism by which RNNs process data in a sequential step-by-step manner, and their ability to replicate social interaction, have also been analyzed [36–40].
Transformer (TF) based models have become popular in recent years for overcoming sequence-to-sequence learning difficulties. Unlike RNN-based models, a TF accepts the complete input sequence at once. As a result, the length of the input sequence has little effect on its computation time, and it can handle the vanishing gradient problem. The authors of [41] addressed the sequential memory mechanism with a new approach based on an attention mechanism in a TF network. The distinction between TF and RNN is that an RNN processes observations sequentially before attempting to forecast auto-regressively, whereas a TF “considers” all available observations and weights them via an attention mechanism.
The generic TF model is deployed for pedestrian trajectory
prediction where the problem is formulated as sequence-to-
sequence learning like in natural language processing (NLP)
[42–44]. However, TF-based modeling has not been explored for
driver behavior prediction, which is a sequence-to-point (clas-
sification) problem. In this work, the generic TF is modified to
perform driver behavior classification task. This work presents
a novel Convolutional Transformer Network (CTN) to estimate
driver intent at an unsignalized roundabout using a short seg-
ment of tracking data that can be collected using overhead cam-
eras and Lidars in real-time. The proposed network is inspired by
recent effectiveness of TF network in natural language processing
[41,45] and takes benefit of TF’s ability to process entire input
sequence in a single time step with less computation time.
The novelty/contribution of present work is in improvements
carried out on generic TF to address driver behavior prediction
task. In particular, the following modifications have been carried
out on generic TF model.
1. The decoder layer of TF is replaced by fully connected
layers. This is done because there is no need for a decoder
as the model’s output is not sequential.
2. A large number of features is a key component of an effective TF model (similar to the word embedding layer in NLP). The raw input sequence, however, contains only four features (two location coordinates, heading direction, and speed). Accordingly, a convolutional neural network (CNN) is deployed to extract multiple temporally dependent features in a high-dimensional space.
3. The activation layer can only give point-wise non-linearity in CNN-based models [46–49]. To increase the proposed CTN’s capacity to establish convolution in non-linear space, the kervolution (kernel convolution) methodology is adopted. The higher-order non-linear feature maps improve the discrimination of subsequent linear classifiers [50]. Thus the proposed model integrates the benefits of both TF and kernelized convolution-based feature extraction techniques. As a result, compared to an RNN-based model, the proposed model is more accurate in predicting driver behavior and takes nearly half the time for computation.
In this work, the proposed CTN and KCTN are tested on real-
world data and results demonstrate that the proposed models
are suitable for driver’s intent classification task and outperform
state-of-the-art prediction techniques. It is expected that these
models can be applied in conjunction with positional informa-
tion of other vehicles to achieve a higher degree of safety in
autonomous driving. The proposed models can be integrated
into autonomous vehicle’s advanced driver assistance systems
(ADAS) to test real-world performance. The contributions of this
study are primarily based on above mentioned ideas. Below we
summarize our main technological contributions:
1. By exploiting the strength of TF, the proposed CTN and KCTN require less computation time than other sequential networks (e.g., LSTM, GTN, Bi-LSTM).
2. Kernel trick is employed to generalize convolution process
in nonlinear space. This enables TF encoder layer to extract
intermediate time features of vehicle trajectories.
3. The proposed CTN and KCTN are evaluated to forecast
drivers’ intentions using a real-world dataset on the round-
about. The experiments illustrate that the proposed CTN
and KCTN perform better than existing RNN-based models.
4. In NLP, the TF has proven its effectiveness for sequence-to-sequence learning. This study attempts to pave the way for future improvements in TF-based models for driver behavior prediction, which can be viewed as a time series classification problem.
Thus the proposed model integrates benefits of both TF and
kernelized convolution operation. As a result, the proposed model
is more accurate in terms of classification and takes less time
to compute. This work is structured as follows. Section 2 presents a summary of the dataset and the problem formulation. Section 3 then explains the network architecture of the proposed model. Section 4 describes the results and Section 5 concludes the work.
2. Dataset and problem formulation
The model uses a real-world dataset of vehicles crossing an unsignalized roundabout in Leith-Croydon, Sydney, Australia [31]. Data are collected on each detected vehicle, including relative X/Y position with respect to the frame of reference (meters), speed (meters/second), heading (radians), size (width/height, meters), classification (bike, car, truck, pedestrian), and classification confidence, at a sampling rate of 25 Hz. The vehicle tracks are labeled with three entrances and three exits from the intersection, as shown in Fig. 1, which also shows three conflict points. A conflict area is defined as a place where two vehicles from different sides can collide for the first time. For example, at the bottom-right conflict point of Fig. 1, vehicles entering from the south can collide with vehicles entering from the east for the first time. As a result, vehicles approaching from the south must be aware of the destination (exit side) of vehicles approaching from the east. Only vehicles that pass through both entry and exit points are selected in this research.

Fig. 1. Diagram of intersection studied. The conflict points are marked with squares [31].

Table 1
Number of trajectory samples in Leith-Croydon intersection (ACFR dataset) [31].
Origin    Destination
          East    North   South   Total
East      0       2418    368     2786
North     2268    0       530     2798
South     1122    688     0       1810
Total     3390    3106    898     7394
The summary of the dataset is shown in Table 1. The dataset contains vehicle tracks, $S = [S_1, S_2, S_3, \ldots, S_N]$, where $N$ is the number of vehicles in the dataset. Each vehicle track $S_i$ contains the entire trajectory in terms of position (lateral ($x_t$) and longitudinal ($y_t$)), heading ($\theta_t$) and speed ($u_t$) over time-steps $t$. All tracks are broken down into all possible sequences of length $L$ and used as input for the model. The extraction of input sequences is summarized in the following equations.

\[ S = \left\{ (X_1, X_2, X_3, \ldots, X_T)_n,\; C_n \right\}_{n=1}^{N} \tag{1} \]

\[ X_t = [x_t, y_t, \theta_t, u_t] \tag{2} \]

\[ I = \left\{ \left\{ (X_p, X_{p+1}, \ldots, X_{p+L-1}),\; C_n \right\}_{p=1}^{T+1-L} \right\}_{n=1}^{N} \tag{3} \]

where $S$ is the set of all trajectory samples, $T$ is the length of the $n$th track in $S$, $X_t$ is the feature vector of the $n$th track at time $t$, $C_n$ is the target class of the $n$th track, and $I$ is the set of extracted sequences that are fed to train and validate the model. All tracks are aligned by the distance traveled from the intersection entry point, which is then used to evaluate the classification accuracy of the proposed model at different traveled distances [31].
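The window extraction of Eq. (3) can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions: the function name `extract_sequences`, the toy tracks and the integer class ids are hypothetical and not taken from the paper or the ACFR tooling.

```python
import numpy as np

def extract_sequences(tracks, labels, L):
    """Split each track into all overlapping windows of length L, as in Eq. (3).

    tracks: list of arrays of shape (T_n, 4) holding [x, y, heading, speed] per time step.
    labels: list of class ids C_n, one per track.
    Returns X of shape (num_windows, L, 4) and y of shape (num_windows,).
    """
    windows, targets = [], []
    for track, c in zip(tracks, labels):
        T = track.shape[0]
        # 0-based p = 0 .. T - L corresponds to p = 1 .. T + 1 - L in Eq. (3);
        # tracks shorter than L contribute no windows.
        for p in range(T - L + 1):
            windows.append(track[p:p + L])
            targets.append(c)
    if not windows:
        return np.empty((0, L, 4)), np.empty((0,), dtype=int)
    return np.stack(windows), np.asarray(targets)

# Toy data (made up): two tracks with 20 and 17 time steps.
rng = np.random.default_rng(0)
toy_tracks = [rng.normal(size=(20, 4)), rng.normal(size=(17, 4))]
toy_labels = [0, 2]
X, y = extract_sequences(toy_tracks, toy_labels, L=15)
print(X.shape, y.shape)  # (9, 15, 4) (9,)
```

Every window inherits the class of its parent track, so a track of length $T$ yields $T + 1 - L$ training samples.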
3. Network architecture
The CNN is a powerful deep learning model primarily used
for image-related applications like image classification [51]. Al-
though CNN with an RNN-based model can be effective in se-
quence learning applications such as driver behavior prediction,
to our knowledge it has not been explored with numeric ACFR
dataset. For instance, [52,53] explore the CNN-LSTM architecture
for driver behavior prediction using the PREVENTION [54] image
dataset. In CNN-RNN based models, convolution layers learn to
extract crucial features from observation sequences and assign
intermediate features to different output classes. The RNN-based
model can find interdependence in time series sequences and
identify the appropriate mode. However, members of RNN family
use inputs sequentially and require high computation time. The
RNN-based model may also experience difficulty due to vanishing
gradient problem with long input sequences. These limitations
can be avoided in proposed CTN by utilizing TF encoder layer
for processing entire input sequence. The encoder is a compo-
nent of TF that determines which sections of input should be
focused on. Encoder is successfully implemented for graphical
inputs (like images) and sequential numeric data for classification
purposes [55]. However, this work uses sequential numeric data
for driver behavior prediction. The motivation for the proposed
model is drawn from successful instances of CNN and TF models
in their respective fields of application [41,45,55–57]. The
proposed model computes intermediate features using CNN and
then encodes them to capitalize on both CNN and TF benefits.
Precisely, only encoder layers of TF are used with sequential input
that contains multiple convolutional features. The original feature
space is expanded using two parallel CNN branches consisting
of 1D convolutions to obtain discriminative features, which are
concatenated with initial input. The decoder layers in vanilla TF
network generate the output sequence for sequence-to-sequence
learning. However, driver behavior prediction can be considered
as a sequence-to-point learning problem. So, decoder layers are
replaced by feed-forward layers to reduce network complexity
and computation time. The proposed network architecture is
shown in Fig. 2.
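As a rough illustration of the feature expansion described above, the NumPy sketch below runs two parallel 1D-convolution branches, concatenates their outputs with the raw input, and adds a sinusoidal positional encoding in the style of [41]. The branch configuration (two layers of 254 kernels, sizes 2 and 3) follows the values given later in this section; the zero "same" padding, the random weights and all function names are assumptions made for the sketch, not the authors' implementation.

```python
import numpy as np

def conv1d_relu(x, w, b):
    """1D convolution over time with zero 'same' padding (assumed), then ReLU.

    x: (L, C_in), w: (k, C_in, C_out), b: (C_out,). Returns (L, C_out).
    """
    L, k = x.shape[0], w.shape[0]
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    xp = np.pad(x, ((pad_left, pad_right), (0, 0)))
    # windows[l, j] is the padded input at time l + j, giving shape (L, k, C_in)
    windows = np.stack([xp[j:j + L] for j in range(k)], axis=1)
    return np.maximum(np.einsum("lkc,kco->lo", windows, w) + b, 0.0)

def cnn_branch(x, kernel_size, n_kernels, rng):
    """Two stacked convolution layers with randomly initialised (untrained) weights."""
    out = x
    for _ in range(2):
        w = rng.normal(scale=0.1, size=(kernel_size, out.shape[1], n_kernels))
        out = conv1d_relu(out, w, np.zeros(n_kernels))
    return out

def positional_encoding(L, d_model):
    """Sinusoidal encoding: sine on even columns, cosine on odd columns."""
    pos = np.arange(L)[:, None]
    two_i = np.arange(0, d_model, 2)[None, :]
    angle = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((L, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

def expand_features(x, rng):
    """Return (O_cnn, O_pos) for a raw input sequence x of shape (L, 4)."""
    f1 = cnn_branch(x, kernel_size=2, n_kernels=254, rng=rng)
    f2 = cnn_branch(x, kernel_size=3, n_kernels=254, rng=rng)
    o_cnn = np.concatenate([x, f1, f2], axis=1)  # (L, 4 + 254 + 254) = (L, 512)
    return o_cnn, o_cnn + positional_encoding(*o_cnn.shape)

rng = np.random.default_rng(0)
x_demo = rng.normal(size=(15, 4))
o_cnn, o_pos = expand_features(x_demo, rng)
print(o_cnn.shape, o_pos.shape)  # (15, 512) (15, 512)
```

Keeping the four raw features as the first columns of the concatenated output is what brings the width to exactly 512, the model dimension used by the encoder.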
In the proposed network, the original input sequence (relative position of vehicle, vehicle speed, and vehicle heading) is fed to both CNN branches to expand the feature space. The input dimension is $L \times 4$, where $L$ is the input sequence length (time steps) with 4 features (relative X/Y position, heading and speed). Each CNN branch has two convolution layers, and each convolution layer uses 254 kernels. Different kernels generate different feature representations, resulting in feature diversity and efficiency. The filters used in these branches are of sizes 2 and 3, respectively. Mathematically, the output of the $t$th layer, $l_t$, is:

\[ l_t = \varphi(r_t \ast a_t + b_t) \tag{4} \]

where $\varphi$ is the ReLU activation function, $r_t$ is the input, $\ast$ denotes the convolution operation, $a_t$ is the learned kernel, and $b_t$ is the bias. The output features from the two parallel branches are concatenated along with the original input features to generate the output ($O_{cnn}$) of dimension $L \times 512$. The output is passed through positional encoding layers to encode temporal information and utilize the sequential correlation of time steps. The positional encoding layer uses both sine and cosine functions [41]. The output of the positional encoding layer ($O_{pos}$) has dimension $L \times 512$ and is calculated as:

\[ O_{pos} = O_{cnn} + PE \tag{5} \]
where $PE$ is the positional encoding coefficient matrix. The output of the positional encoding layer is fed to the encoder layer. Each encoder layer consists of an MHA layer and a fully connected feed-forward network (FFN). A residual connection on each sub-layer is maintained to direct the flow of information and gradient. The residual link and the output of the sub-layer are added and normalized before passing forward. In MHA, eight parallel self-attention layers are concatenated (heads, $h = 8$), as shown in Fig. 3. The numbers of heads and encoder layers are decided based on the outcomes of the experiments; a more detailed explanation is provided in Section 4 (‘Results and discussion’). The encoder’s MHA block generates attention vectors. The MHA generates information at various positions from different representation subspaces. The single-head attention function in the self-attention sub-layer is defined as a mapping of queries ($q$), keys ($k$) and values ($v$) to an output, which is computed as a weighted sum of the values. A compatibility function of the query with the related key determines how much weight (attention) each value gets. To compute the attention function simultaneously in matrix form, the queries, keys and values are represented as matrices $Q$, $K$ and $V$, respectively. The matrices $Q$, $K$, and $V$ can be considered weight matrices that help in keeping the connection between feature vectors in a sequence. Thus the encoder layer (especially the MHA layer) establishes attention between feature vectors at different time instants. In this work, sequential data (multivariate time-series data) is used to train and validate the proposed model. The attention is calculated as:

\[ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^{T}}{\sqrt{d_k}} \right) V \tag{6} \]

where $d_k$ is the dimension of the key vector. For MHA, the output matrix is calculated as:

\[ \mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{o} \tag{7} \]

\[ \mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q},\; K W_i^{K},\; V W_i^{V}) \tag{8} \]

where $W_i^{Q} \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^{K} \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^{V} \in \mathbb{R}^{d_{model} \times d_V}$ and $W^{o} \in \mathbb{R}^{h d_V \times d_{model}}$ are parameter matrices, and $d_V$ is the dimension of the value vector. The encoder layer produces an output of dimension $d_{model} = 512$. In the attention layer of an encoder layer, the keys, values and queries come from the previous encoder layer. Thus, each position in an encoder layer can account for all positions in the previous layer, and the model learns dependencies between input time steps. The MHA mechanism is explored in depth in the following paragraphs.

Fig. 2. Proposed Convolutional Transformer Network (CTN).

Fig. 3. Multi-head attention (MHA) mechanism for eight heads.
To explain the MHA mechanism, an input sample is shown in Table 2. The input sample has a dimension of $N \times L$, as indicated in Table 2, where $N$ is the number of features (4) and $L$ is the sequence length. In this work, sequence lengths ($L$) of 5, 15, and 25 are used. Here, matrices $Q$, $K$, and $V$ are used to compute the attention, which represents the relationship between feature vectors at different time instants of an input sequence (in this work, the number of time instants is the same as the input sequence length). The MHA mechanism is explained below.

Table 2
An illustrative example with input sequence of length 15.
Features  Time
          t0   t1   t2   t3   t4   t5   t6   t7   t8   t9   t10  t11  t12  t13  t14
Speed     4.9  4.9  4.9  4.9  5    5.1  5.1  5.2  5.2  5.2  5.1  5.2  5.2  5.1  5.1
X         9.4  9.3  9.3  9.2  9.2  9.1  9    8.9  8.8  8.7  8.6  8.5  8.4  8.3  8.1
Y         1.4  1.3  1.2  1.1  1    0.8  0.7  0.5  0.3  0.1  0.1  0.3  0.4  0.5  0.7
Heading   1.3  1.3  1.3  1.2  1.2  1.2  1.2  1.2  1.2  1.1  1.1  1.1  1.1  1    1
Firstly, the original input sequence is fed to the CNN to expand the feature space. The dimension of the CNN output ($O_{cnn}$) is $512 \times 15$. Then, to encode temporal information, the output ($O_{cnn}$) is transposed and passed through the positional encoding layers. This output ($O_{pos}$) is then taken as the input ($I_{Encoder}$) to the MHA layer of the encoder. The MHA mechanism on the input ($I_{Encoder}$) is performed in the following steps.

Step 1: Linear transformations are carried out on the input ($I_{Encoder}$) by using three weight matrices $W^{Q} \in \mathbb{R}^{d_{model} \times d_{model}}$, $W^{K} \in \mathbb{R}^{d_{model} \times d_{model}}$ and $W^{V} \in \mathbb{R}^{d_{model} \times d_{model}}$ to compute the matrices $Q$, $K$, and $V$, respectively, where $d_{model}$ is the dimension of the model ($d_{model} = 512$).

\[ [Q]_{15 \times 512} = [I_{Encoder}]_{15 \times 512} \cdot [W^{Q}]_{512 \times 512} \tag{9a} \]

\[ [K]_{15 \times 512} = [I_{Encoder}]_{15 \times 512} \cdot [W^{K}]_{512 \times 512} \tag{9b} \]

\[ [V]_{15 \times 512} = [I_{Encoder}]_{15 \times 512} \cdot [W^{V}]_{512 \times 512} \tag{9c} \]

Step 2: For each head $i$, the $Q$, $K$, and $V$ matrices (as obtained in Step 1) are passed through three linear transformations to obtain $Q_i$, $K_i$ and $V_i$ by using learnable weight matrices $W_i^{Q} \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^{K} \in \mathbb{R}^{d_{model} \times d_k}$ and $W_i^{V} \in \mathbb{R}^{d_{model} \times d_v}$, respectively, where $d_k$ is the dimension of the key vector and $d_v$ is the dimension of the value vector. In the proposed model, 8 heads ($h = 8$) are used. Thus, $d_k = d_v = d_{model}/h = 64$.

\[ [Q_i]_{15 \times 64} = [Q]_{15 \times 512} \cdot [W_i^{Q}]_{512 \times 64} \tag{10a} \]

\[ [K_i]_{15 \times 64} = [K]_{15 \times 512} \cdot [W_i^{K}]_{512 \times 64} \tag{10b} \]

\[ [V_i]_{15 \times 64} = [V]_{15 \times 512} \cdot [W_i^{V}]_{512 \times 64} \tag{10c} \]

Step 3: The attention weights (scores) for the value vectors of the $i$th head are calculated as:

\[ [W_i^{A}]_{15 \times 15} = \mathrm{softmax}\!\left( \frac{[Q_i]_{15 \times 64}\, [K_i^{T}]_{64 \times 15}}{\sqrt{d_k}} \right) \tag{11} \]

It is noted that the dimension of the above-mentioned weight matrix is $15 \times 15$. In this weight matrix, the element $[W_i^{A}]_{fg}$ represents the attention between the feature vectors at time instant $f$ and time instant $g$. Thus this weight matrix represents the weighted relationship between all time instants (here the sequence length is 15).

Step 4: The output of $\mathrm{head}_i$ is calculated as:

\[ [\mathrm{head}_i]_{15 \times 64} = [W_i^{A}]_{15 \times 15} \cdot [V_i]_{15 \times 64} \tag{12} \]

It is noted that the matrix $V_i$ is a linear transformation of the input sequence $I_{Encoder}$ and the attention weight matrix $W_i^{A}$ represents the weighted relationship between time instant features. Thus these matrices perform the same role as an RNN does, without adding to the computation time, by taking the whole sequence at once.

Step 5: The outputs of all 8 heads are concatenated as:

\[ [MHA_{out}]_{15 \times 512} = \mathrm{concat}\left( [\mathrm{head}_1]_{15 \times 64}, [\mathrm{head}_2]_{15 \times 64}, \ldots, [\mathrm{head}_8]_{15 \times 64} \right) \tag{13} \]

Step 6: The final output of the MHA layer is calculated by a linear transformation of the concatenated output from all heads by using the weight matrix $W^{o} \in \mathbb{R}^{h d_v \times d_{model}}$.

\[ [\mathrm{MultiHead}(Q, K, V)]_{15 \times 512} = [MHA_{out}]_{15 \times 512} \cdot [W^{o}]_{(8 \cdot 64) \times 512} \tag{14} \]

Thus it is clear that the input of the MHA layer ($I_{Encoder}$) and the output of the MHA layer ($\mathrm{MultiHead}(Q, K, V)$) have the same dimensionality ($15 \times 512$).
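Steps 1 to 6 can be traced in a few lines of NumPy. This is a minimal sketch with random, untrained weights; the function and variable names are hypothetical, and the weight scaling is an arbitrary choice made only to keep the toy numbers well behaved.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(i_enc, Wq, Wk, Wv, Wq_h, Wk_h, Wv_h, Wo):
    """i_enc: (L, d_model). Per-head weights are stacked as (h, d_model, d_k)."""
    Q, K, V = i_enc @ Wq, i_enc @ Wk, i_enc @ Wv           # Step 1, Eqs. (9a)-(9c)
    heads, weights = [], []
    for WQi, WKi, WVi in zip(Wq_h, Wk_h, Wv_h):
        Qi, Ki, Vi = Q @ WQi, K @ WKi, V @ WVi             # Step 2, Eqs. (10a)-(10c)
        A = softmax(Qi @ Ki.T / np.sqrt(Qi.shape[1]))      # Step 3, Eq. (11): (L, L)
        heads.append(A @ Vi)                               # Step 4, Eq. (12): (L, d_k)
        weights.append(A)
    mha_out = np.concatenate(heads, axis=1)                # Step 5, Eq. (13): (L, h * d_k)
    return mha_out @ Wo, weights                           # Step 6, Eq. (14): (L, d_model)

L, d_model, h = 15, 512, 8
d_k = d_model // h  # 64
rng = np.random.default_rng(0)

def init(*shape):
    # Random stand-in for learned parameters.
    return rng.normal(scale=d_model ** -0.5, size=shape)

i_enc = rng.normal(size=(L, d_model))
out, attn = multi_head_attention(
    i_enc,
    init(d_model, d_model), init(d_model, d_model), init(d_model, d_model),
    init(h, d_model, d_k), init(h, d_model, d_k), init(h, d_model, d_k),
    init(h * d_k, d_model),
)
print(out.shape, attn[0].shape)  # (15, 512) (15, 15)
```

Each row of an attention matrix sums to one, so row $f$ can be read as the distribution of attention that time instant $f$ places on all 15 time instants.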
The output of the MHA layer in the first encoder layer, $O_{MHA_1}$ (as defined in Eq. (7)), is processed through Add & Normalization (Norm). The processed output in the first encoder layer ($O_{add\&norm_1}$) is calculated as:

\[ O_{add\&norm_1} = \mathrm{Norm}(O_{MHA_1} + O_{pos}) \tag{15} \]

The output after the normalization process has dimension $L \times 512$. This output $O_{add\&norm_1}$ is fed to the FFN. In the FFN, the same linear transformation is carried out across different positions. The output of the FFN in the first encoder layer ($O_{FFN_1}$) is calculated as:

\[ O_{FFN_1} = \sigma\!\left( O_{add\&norm_1} W_1 + B_1 \right) W_2 + B_2 \tag{16} \]

where the dimensions of the input ($O_{add\&norm_1}$) and the output ($O_{FFN_1}$) are the same, $L \times 512$. The output of the FFN in the first encoder layer ($O_{FFN_1}$) is processed through Add & Normalization. The processed output of the first encoder layer ($O_{Encoder_1}$) is given as:

\[ O_{Encoder_1} = \mathrm{Norm}(O_{FFN_1} + O_{add\&norm_1}) \tag{17} \]

The output of the first encoder layer ($O_{Encoder_1}$) is then fed to the second encoder layer. The outputs of the second encoder layer ($O_{Encoder_2}$) and third encoder layer ($O_{Encoder_3}$) are calculated similarly to the first layer, as in Eqs. (6)–(17). The output $O_{Encoder_3}$ is flattened and fed to the first feed-forward (FF) layer to generate the output ($O_{FF}$) as per Eq. (16). Then the output $O_{FF}$ is passed through a softmax layer to generate the final output ($O$), which denotes the class (east/north/south) to which the input belongs.
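The sub-layers of Eqs. (15)–(17) and the final classification step can be sketched as follows. Several details are assumptions because the text does not specify them: $\sigma$ is taken to be ReLU, Norm is taken to be layer normalization without learnable scale and shift, the hidden width `d_ff` is arbitrary, and the flattened output goes through a single dense layer before the softmax. All names and weights are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalise each time step (row) to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def encoder_sublayers(o_in, o_mha, W1, B1, W2, B2):
    """Apply Add & Norm, FFN, Add & Norm to one encoder layer's MHA output."""
    o_add_norm = layer_norm(o_mha + o_in)                      # Eq. (15)
    o_ffn = np.maximum(o_add_norm @ W1 + B1, 0.0) @ W2 + B2    # Eq. (16), sigma assumed ReLU
    return layer_norm(o_ffn + o_add_norm)                      # Eq. (17)

def classify(o_encoder, W_out, b_out):
    """Flatten the encoder output and map it to class probabilities."""
    logits = o_encoder.reshape(-1) @ W_out + b_out
    e = np.exp(logits - logits.max())
    return e / e.sum()

L, d_model, d_ff, n_classes = 15, 512, 128, 3  # d_ff is an arbitrary choice
rng = np.random.default_rng(0)
o_pos = rng.normal(size=(L, d_model))
o_mha = rng.normal(size=(L, d_model))  # stand-in for the MHA output of Eq. (7)
W1 = rng.normal(scale=0.05, size=(d_model, d_ff))
W2 = rng.normal(scale=0.05, size=(d_ff, d_model))
o_enc = encoder_sublayers(o_pos, o_mha, W1, np.zeros(d_ff), W2, np.zeros(d_model))
probs = classify(o_enc, rng.normal(scale=0.01, size=(L * d_model, n_classes)),
                 np.zeros(n_classes))
print(o_enc.shape, probs.shape)  # (15, 512) (3,)
```

Because every sub-layer keeps the $L \times 512$ shape, the same function can be applied three times in a row, feeding each layer's output to the next, before the flatten and softmax step.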
The detailed procedure involving training and testing of pro-
posed algorithm is outlined in Table 3. In following, B is batch
size, L is input sequence length, number of features in raw input
sequence is 4, and model dimension (dmodel) is 512.
Table 3
Training and testing process. The dimension of each result is given in square brackets.

(1) TRAINING PROCESS
INPUT: Labeled training data $\tilde{S} = \{S^{(1)}, S^{(2)}, \ldots, S^{(E)}\}$, where $E$ is the total number of classes. [B, L, 4]
Step 1: $F_1 \xleftarrow{\mathrm{CNN}_1} \tilde{S}$. The raw training data ($\tilde{S}$) are sent into the first CNN branch to get the extracted feature vectors, as per Eq. (4). This CNN branch contains 254 kernels of size 2. [B, L, 254]
Step 2: $F_2 \xleftarrow{\mathrm{CNN}_2} \tilde{S}$. The raw training data ($\tilde{S}$) are sent into the second CNN branch to get the extracted feature vectors, as per Eq. (4). This CNN branch contains 254 kernels of size 3. [B, L, 254]
Step 3: $O_{cnn} \xleftarrow{\mathrm{Concat.}} \{\tilde{S}, F_1, F_2\}$. The output features from the two parallel branches are concatenated along with the raw input features to generate the output ($O_{cnn}$). [B, L, 512]
Step 4: $O_{pos} \xleftarrow{PE} O_{cnn}$. The output ($O_{cnn}$) is passed through the positional encoding layers, as per Eq. (5). [B, L, 512]
  $I_{Encoder_1} = O_{pos}$. $I_{Encoder_1}$ is the input to the encoder. [B, L, 512]
Step 5: for $i \leftarrow 1$ to 3 (number of encoder layers)
  $O_{MHA_i} \xleftarrow{\mathrm{MHA}} I_{Encoder_i}$. The encoder input ($I_{Encoder_i}$, equal to $O_{pos}$ for $i = 1$) is passed through the MHA layer, as per Eq. (7). [B, L, 512]
  $O_{add\&norm_i} \xleftarrow{\mathrm{add\ \&\ norm.}} \{I_{Encoder_i}, O_{MHA_i}\}$. The outputs $I_{Encoder_i}$ and $O_{MHA_i}$ are added and then normalized in the ‘Add & Normalization’ layer, as per Eq. (15). [B, L, 512]
  $O_{FFN_i} \xleftarrow{\mathrm{FFN}} O_{add\&norm_i}$. The output ($O_{add\&norm_i}$) is fed to the FFN, where the linear transformation is carried out, as per Eq. (16). [B, L, 512]
  $O_{Encoder_i} \xleftarrow{\mathrm{add\ \&\ norm.}} \{O_{FFN_i}, O_{add\&norm_i}\}$. The outputs $O_{FFN_i}$ and $O_{add\&norm_i}$ are added and then normalized in the ‘Add & Normalization’ layer, as per Eq. (17). [B, L, 512]
  $I_{Encoder_{i+1}} = O_{Encoder_i}$. The output $O_{Encoder_i}$ is passed to the next encoder layer as input $I_{Encoder_{i+1}}$. [B, L, 512]
end
Step 6: $O_{Flat} \xleftarrow{\mathrm{Flatten}} O_{Encoder_3}$. The output of the last encoder layer ($O_{Encoder_3}$) is flattened. [B, L, 512]
Step 7: $O_{FFN} \xleftarrow{\mathrm{FFN}} O_{Flat}$. The flattened output ($O_{Flat}$) is fed to the FFN, as per Eq. (16). [B, L, 512]
Step 8: $O \xleftarrow{\mathrm{softmax}} O_{FFN}$. The output ($O_{FFN}$) is passed through the softmax layer. The cross-entropy loss is calculated and back-propagated to update the learnable parameters of the model. [B, L, 512]

(2) TESTING PROCESS
INPUT: $\tilde{x}$ is an input sample. [B, L, 4]
Step 1: $\hat{c} \xleftarrow{\mathrm{CTN}} \tilde{x}$. $\tilde{x}$ is fed to the proposed CTN. $\hat{c}$ is the probability of $\tilde{x}$ belonging to going straight, left turn and right turn.
Step 2: $c = \max_{j=1}^{E} \hat{c}_j$, $j = 1, 2, 3$; the driver behavior (going straight, left-turn or right-turn) that $\tilde{x}$ belongs to.

Using the kervolution technique, the proposed CTN obtains a further performance improvement. The motivation derives from the effective use of kervolution to improve CNN models in their respective fields of application. Because convolutional layers are linear, they cannot express non-linear behavior properly. Moreover, the non-linearity of activation layers, such as the rectified linear unit (ReLU), is only point-wise. It has been shown that higher-order non-linear feature maps can improve the discrimination of subsequent linear classifiers [50,58–60]. Several authors claim that by using the kernel approach to generalize convolution to non-linear operations, CNN can perform better [61–66]. The kervolution is defined as follows:
\[ Z_i^{p+1} = \left\langle \varphi(r^{p}),\, \varphi(a_i^{p}) \right\rangle \tag{18} \]

where $Z_i^{p+1}$ is the $i$th feature map of the output layer (the $(p+1)$th layer), $r^{p}$ is the feature map of the input layer (the $p$th layer), $a_i^{p}$ is the $i$th kernel for the input layer, and the high-dimensional mapping for $r^{p}$ and $a_i^{p}$ is denoted by $\varphi(\cdot)$. The transition from low to high dimensions results in a significant increase in computational burden. However, the kernel trick can reduce this burden. The computation using the kernel trick is as follows:

\[ \left\langle \varphi(r^{p}),\, \varphi(a_i^{p}) \right\rangle = k(r^{p}, a_i^{p}) \tag{19} \]

where the kernel function of $r^{p}$ and $a_i^{p}$ is represented by $k(r^{p}, a_i^{p}): \mathbb{R}^{n} \times \mathbb{R}^{n} \rightarrow \mathbb{R}$. The Gaussian kernel function is utilized in this work to implement the kervolution procedure, and is defined as:

\[ k(r, a) = \exp\!\left( -\gamma \lVert r - a \rVert^{2} \right) \tag{20} \]

where $r$ denotes the input vector, $a$ the convolution kernel, and $\gamma$ the gamma value.
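A minimal NumPy sketch of a Gaussian kervolution along the time axis is given below. It replaces the inner product of an ordinary convolution with the kernel function of Eq. (20), evaluated between each input patch and each learned kernel. The valid (unpadded) windowing, the example `gamma` value and the function name are assumptions for illustration only.

```python
import numpy as np

def gaussian_kervolution1d(x, kernels, gamma=1.0):
    """Gaussian kervolution over time.

    x: (L, C) input feature map; kernels: (n_k, k, C) kernels a_i.
    Returns (L - k + 1, n_k), where entry [l, i] = exp(-gamma * ||patch_l - a_i||^2).
    """
    n_k, k, _ = kernels.shape
    L = x.shape[0]
    # Every length-k patch of the input: (L - k + 1, k, C)
    patches = np.stack([x[l:l + k] for l in range(L - k + 1)])
    # Squared Euclidean distance between each patch and each kernel
    diff = patches[:, None, :, :] - kernels[None, :, :, :]
    sq_dist = (diff ** 2).sum(axis=(2, 3))
    return np.exp(-gamma * sq_dist)

rng = np.random.default_rng(0)
x_demo = rng.normal(size=(15, 4))
kernels_demo = rng.normal(size=(3, 2, 4))
kernels_demo[0] = x_demo[5:7]  # make kernel 0 identical to the patch starting at t = 5
z = gaussian_kervolution1d(x_demo, kernels_demo, gamma=0.5)
print(z.shape)  # (14, 3)
```

The response equals 1 where a patch coincides with a kernel and decays towards 0 as the two move apart, so no explicit high-dimensional mapping $\varphi(\cdot)$ is ever computed.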
4. Results and discussion
The proposed model's performance is evaluated using a real-world dataset. A total of 7394 vehicle tracks are divided into 5914 tracks for training (with normalization) and 1480 tracks for testing. In [31], data acquisition is performed using a vehicle equipped with a system similar to that of an autonomous vehicle. The center of the parking space occupied by this vehicle is used as the reference point to calculate the relative positions of passing vehicles, as shown in Fig. 4(a). The heading of a passing vehicle is calculated by taking the southward travel direction (north to south) as the reference heading (zero radians). This common frame is used to obtain vehicle trajectories with respect to the vehicle entrance side. The heading and speed profiles along the traveled distance are shown in Fig. 4(b) and Fig. 4(c), respectively. These tracks can be labeled in two ways: based on track destination (east, north or south) and based on turns (left-turn, right-turn or go-straight). The vehicle tracks and their labeling are shown in Table 4. The proposed CTN and KCTN are tested under both labeling conditions. The results of both proposed networks are compared with those obtained from the LSTM-based model [31], the Bi-LSTM model [26], a vanilla TF model (6 encoder layers), a GRU-based model (built by replacing the LSTM cells in [31] with GRU cells), and a CNN model (the first portion of the proposed CTN). It is emphasized that all models (including those of Refs. [26,31]) are trained and tested under identical conditions, using the same training and testing samples. To represent the vehicle state at a specific time instant, all models employ four features (two spatial coordinates, heading direction, and speed).
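As a rough illustration of how fixed-length input sequences of four-feature states could be cut from a track, consider the sketch below. The helper names and the synthetic track are hypothetical. The 25 Hz sampling rate is an assumption inferred from the statement in Section 4.1.1 that 5, 15 and 25 time steps correspond to about 0.2, 0.6 and 1 s of data.

```python
SAMPLE_RATE_HZ = 25  # assumed: 15 time steps correspond to about 0.6 s

def make_windows(track, seq_len):
    """Return every run of `seq_len` consecutive states from one track.
    Each state is a tuple (x, y, heading, speed)."""
    return [track[i:i + seq_len] for i in range(len(track) - seq_len + 1)]

def window_duration_s(seq_len, rate_hz=SAMPLE_RATE_HZ):
    """Duration of observation covered by one input sequence, in seconds."""
    return seq_len / rate_hz

# Synthetic 20-step track, for illustration only.
track = [(0.1 * t, 0.0, 0.0, 5.0) for t in range(20)]
windows = make_windows(track, seq_len=15)
print(len(windows), window_duration_s(15))
```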
Fig. 4. (a) Average tracks of vehicles following left hand driving. (b) Heading profiles. (c) Speed profiles. Profiles of all tracks grouped into the six classes in
‘origin–destination’ pairs.
Table 4
Track labeling based on destinations and turns.

| Origin–destination | Track labeling based on destination | Track labeling based on turn |
|---|---|---|
| East–north | North | Right-turn |
| East–south | South | Left-turn |
| North–east | East | Left-turn |
| North–south | South | Go-straight |
| South–east | East | Right-turn |
| South–north | North | Go-straight |
4.1. Scenario 1: Track classification based on vehicle destination in
a common frame of reference
Track classification based on destination is performed using past tracks. The average vehicle tracks from each origin side of the roundabout are shown in Fig. 4(a). The heading and speed profiles along the traveled distance are shown in Fig. 4(b) and Fig. 4(c), respectively. Based on experimental results, three encoder layers and eight parallel self-attention heads are used in the MHA modules of the proposed model. The experimental results for different numbers of encoder layers and heads are shown in Fig. 5, where an input sequence length of 15 is used. The x-axis represents the distance traveled from the intersection entry; a negative traveled distance indicates that the vehicle has not yet entered the intersection. The results show that accuracy saturates beyond three encoder layers and eight heads. Therefore, taking computational time into account, three encoder layers and eight heads are chosen for the proposed model.
4.1.1. Performance of the proposed model in comparison to current
state-of-the-art models
Three input sequence lengths are selected for evaluation: 5, 15 and 25 time steps, corresponding to about 0.2, 0.6 and 1 s of data, respectively. Fig. 6 shows the performance of the proposed networks (CTN and KCTN) in predicting the vehicle's destination. The x-axis represents the distance traveled from the intersection entrance. The vertical dashed red line indicates the conflict point (12 m, 14 m and 22 m from the east, north and south entrance points, respectively). The lead distance (the distance of the vehicle from the conflict point) is estimated from the traveled distance at which the model achieves an accuracy of 99%; the lead time (the time to reach the conflict point) is then calculated from the lead distance and the average speed of the vehicle. The lead distance and lead time for each vehicle origin are presented in Table 5.
Fig. 5. Accuracy vs distance traveled relative to the start of the intersection. (a) An experimental study varying the number of parallel self attention heads (H) in
MHA module. (b) An experimental study varying the number of encoder layers (L) in the proposed model architecture.
Vehicles entering the intersection from the east side can go towards north or south. The proposed KCTN achieves an accuracy of 99% at 1 m traveled distance for all three sequence lengths. The conflict point for eastern-origin vehicles occurs at 12 m traveled distance, so KCTN offers a lead distance of 11 m. The lead time is calculated from the lead distance and the average speed (4.8 m/s) at the prediction point. Thus, KCTN offers a lead time of 2.29 s for vehicles originating in the east. Vehicles entering from the north can travel either east or south. The proposed KCTN achieves 99% accuracy at 3, 2, and 3 m traveled distance using input sequence lengths of 5, 15 and 25, respectively. The conflict point occurs at a distance of 14 m; therefore, KCTN offers lead distances of 11, 12 and 12 m for sequence lengths of 5, 15 and 25, respectively. Based on the lead distance and the vehicle's average speed (7.9 m/s) at the prediction point, the maximum lead time is 1.52 s, obtained with input sequence lengths of 15 and 25. Vehicles coming from the south can travel towards east or north. The proposed KCTN achieves 99% accuracy at a traveled distance of 16 m for sequence lengths of 15 and 25. The conflict point for southern-origin vehicles is at a distance of 22 m; consequently, the proposed model offers a lead distance of 6 m, which gives a lead time of 1.33 s based on the average speed (4.5 m/s). These findings are also graphically represented in Figs. 7 and 8.
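The lead distance and lead time arithmetic used above can be reproduced with a short sketch. The function names are illustrative; the conflict points, the traveled distances at which 99% accuracy is reached, and the average speeds are the values quoted in the text for KCTN with an input sequence length of 15.

```python
def lead_distance(conflict_point_m, distance_at_99_m):
    """Distance left to the conflict point once 99% accuracy is reached."""
    return conflict_point_m - distance_at_99_m

def lead_time(conflict_point_m, distance_at_99_m, avg_speed_mps):
    """Time to reach the conflict point at the average speed."""
    return lead_distance(conflict_point_m, distance_at_99_m) / avg_speed_mps

# (conflict point [m], traveled distance at 99% accuracy [m], average speed [m/s])
cases = {
    "east": (12, 1, 4.8),
    "north": (14, 2, 7.9),
    "south": (22, 16, 4.5),
}
for origin, (conflict, reached, speed) in cases.items():
    print(origin,
          lead_distance(conflict, reached),
          round(lead_time(conflict, reached, speed), 2))
# east 11 2.29
# north 12 1.52
# south 6 1.33
```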
The results show that both proposed networks (CTN and KCTN) outperform current state-of-the-art models. However, the
Fig. 6. Accuracy vs distance traveled relative to the start of the intersection for scenario 1. (a) Input sequence length 5. (b) Input sequence length 15. (c) Input
sequence length 25.
proposed KCTN provides better performance than CTN. The proposed KCTN achieves 99% accuracy earlier than the other models for all three sequence lengths (5, 15 and 25). The best results are produced with a sequence length of 15. The model offers maximum lead times of 2.29 s, 1.52 s and 1.33 s for east-, north- and south-origin vehicles, respectively.
4.2. Scenario 2: Track classification based on vehicle turning behavior in a common frame of reference
The proposed KCTN is used for turn-based (turning behavior) classification in this scenario. All vehicle tracks are labeled as short left-turn, long right-turn or go-straight, as presented in
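Table 5
Lead distance and lead time comparison among various models for Scenario 1.

| Metric | Input sequence length | Origin | LSTM [31] | GRU | Bi-LSTM [26] | CNN | TF | CTN | KCTN |
|---|---|---|---|---|---|---|---|---|---|
| Lead distance (m) | 5 | Eastern origin | 9 | 9 | 9 | 9 | 10 | 11 | 11 |
| Lead distance (m) | 5 | Northern origin | 9 | 8 | 9 | 8 | 9 | 10 | 11 |
| Lead distance (m) | 5 | Southern origin | 3 | 2 | 4 | 3 | 5 | 5 | 5 |
| Lead distance (m) | 15 | Eastern origin | 9 | 9 | 9 | 9 | 10 | 11 | 11 |
| Lead distance (m) | 15 | Northern origin | 10 | 10 | 10 | 9 | 11 | 11 | 12 |
| Lead distance (m) | 15 | Southern origin | 4 | 3 | 4 | 2 | 5 | 6 | 6 |
| Lead distance (m) | 25 | Eastern origin | 9 | 8 | 9 | 8 | 10 | 11 | 11 |
| Lead distance (m) | 25 | Northern origin | 9 | 9 | 9 | 8 | 10 | 11 | 12 |
| Lead distance (m) | 25 | Southern origin | 3 | 3 | 3 | 3 | 5 | 6 | 6 |
| Lead time (s) | 5 | Eastern origin | 1.88 | 1.88 | 1.88 | 1.88 | 2.08 | 2.29 | 2.29 |
| Lead time (s) | 5 | Northern origin | 1.14 | 1.01 | 1.14 | 1.01 | 1.14 | 1.27 | 1.39 |
| Lead time (s) | 5 | Southern origin | 0.67 | 0.44 | 0.89 | 0.67 | 1.11 | 1.11 | 1.11 |
| Lead time (s) | 15 | Eastern origin | 1.88 | 1.88 | 1.88 | 1.88 | 2.08 | 2.29 | 2.29 |
| Lead time (s) | 15 | Northern origin | 1.27 | 1.27 | 1.27 | 1.14 | 1.39 | 1.39 | 1.52 |
| Lead time (s) | 15 | Southern origin | 0.89 | 0.67 | 0.89 | 0.44 | 1.11 | 1.33 | 1.33 |
| Lead time (s) | 25 | Eastern origin | 1.88 | 1.67 | 1.88 | 1.67 | 2.08 | 2.29 | 2.29 |
| Lead time (s) | 25 | Northern origin | 1.14 | 1.14 | 1.14 | 1.01 | 1.27 | 1.39 | 1.52 |
| Lead time (s) | 25 | Southern origin | 0.67 | 0.67 | 0.67 | 0.67 | 1.11 | 1.33 | 1.33 |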
Fig. 7. Lead distance comparison among various models for Scenario 1.
Table 4. The shapes of the vehicle trajectories and the heading profiles are shown in Fig. 4. The experimental study in Section 4.1 found that an input sequence length of 15 is optimal for the proposed KCTN, taking prediction accuracy and computational complexity into account. The comparative performance of the proposed model with respect to traveled distance is presented in Fig. 9 for a sequence length of 15. The lead time is calculated at the point where the proposed KCTN reaches 99% accuracy, as listed in Table 6. A vehicle coming from the east side can make a short left-turn (east-south) or a long right-turn (east-north). The proposed KCTN offers a lead time of 2.08 s with 99% accuracy. Due to the similar shape of the east-south and north–south tracks over a distance of 11 m to 15 m, there is a loss in prediction accuracy at a distance of 11 m. For vehicles coming from the north, the proposed KCTN offers a lead time of 1.39 s. Vehicles coming from the south can take a long right-turn (south-east) or go straight (south–north). The proposed KCTN offers a lead time of 1.11 s for south-origin vehicles. These findings are also graphically represented in Fig. 10.
In comparison, the proposed KCTN achieves higher accuracy earlier than the other models and offers a larger time window in terms of lead time. For vehicles coming from the north side, the accuracies of the proposed CTN and the other models start to decrease at 11 m traveled distance. Nevertheless, the proposed CTN performs well, with an accuracy drop of only 1%, in contrast to the other models. The proposed KCTN handles this situation with no loss of accuracy over this range (11 m to 15 m). To overcome this issue in CTN, all vehicle tracks are re-framed so that tracks within the same turning behavior-based class look similar. Further explanation is presented in Section 4.3.
4.3. Scenario 3: Track classification based on vehicle turning behavior in entrance-based frames of reference
To re-frame the vehicle tracks, three reference points (one at each vehicle entry point of the intersection) and three reference directions are considered to calculate the relative X/Y position and the heading of the vehicle, respectively. For example, the relative X/Y position of
Fig. 8. Lead time comparison among various models for Scenario 1.
Fig. 9. Accuracy vs distance traveled for turning behavior based classification in scenario 2.
Table 6
Lead time (s) comparison in Scenario 2.

| Origin | LSTM [31] | GRU | Bi-LSTM [26] | CNN | TF | CTN | KCTN |
|---|---|---|---|---|---|---|---|
| Eastern origin | 1.67 | 1.88 | 1.88 | 1.67 | 2.08 | 2.08 | 2.08 |
| Northern origin | 0.89 | 1.14 | 1.14 | 1.14 | 1.27 | 1.27 | 1.39 |
| Southern origin | 0.44 | 0.89 | 0.89 | 0.89 | 1.11 | 1.11 | 1.11 |
a vehicle coming from the east side is calculated by taking the east-side entrance point as the reference point, and its heading is calculated by taking the westward travel direction (east to west) as the reference direction. For a vehicle coming from the north side, the north-side entry point is used as the reference point to calculate the relative X/Y position, and the southward travel direction (north to south) is used as the reference direction. A similar approach is used for vehicles coming from the south side. Thus, the east-south and north-east tracks are re-framed to look similar and represent short left-turn tracks. The east-north and south-east tracks are re-framed to look similar and represent long right-turn tracks. The north–south and south–north tracks are re-framed to look similar and represent go-straight tracks. Thus, all vehicle tracks are labeled as short left-turn, long right-turn or go-straight. The vehicle tracks and the heading profiles are shown in Fig. 11.
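The re-framing step can be sketched as follows, under stated assumptions. The function names, reference point and reference heading are hypothetical inputs, not values from the dataset. The sketch only translates the position to the entrance reference point and offsets the heading by the reference direction, as the text describes; whether the positions are additionally rotated is not stated, so no rotation is applied here.

```python
import math

def wrap_angle(theta):
    """Wrap an angle in radians to the principal range [-pi, pi]."""
    return math.atan2(math.sin(theta), math.cos(theta))

def reframe_state(x, y, heading, ref_point, ref_heading):
    """Express one vehicle state relative to an entrance reference point
    and a reference direction (both are hypothetical inputs here)."""
    rel_x = x - ref_point[0]
    rel_y = y - ref_point[1]
    rel_heading = wrap_angle(heading - ref_heading)
    return rel_x, rel_y, rel_heading

# A vehicle exactly at the entry point, heading along the reference
# direction, maps to the origin with zero relative heading.
print(reframe_state(5.0, 3.0, 1.0, ref_point=(5.0, 3.0), ref_heading=1.0))
```

Applying the same function with a different reference point and direction for each entrance side gives every track a starting state near the origin with a heading near zero, which is what makes tracks of the same turn type comparable.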
Fig. 12 shows the comparative performance of the proposed KCTN in predicting vehicle behavior. The lead time comparison for each vehicle origin is shown in Table 7. The proposed KCTN offers a lead time of 2.08 s for vehicles coming from the east side. For vehicles coming from the north side, the proposed KCTN converges to 99% accuracy and provides a lead time of 1.39 s. For vehicles coming from the south side, the proposed KCTN offers a lead distance of 5 m and a lead time of 1.11 s. These findings are also graphically represented in Fig. 13. The proposed model and the RNN-based models are
Fig. 10. Lead time comparison among various models for Scenario 2 (input sequence length = 15).
Fig. 11. (a) Average tracks of vehicles. (b) Heading profiles of traveling vehicles.
Table 7
Lead time (s) comparison in Scenario 3.

| Origin | LSTM [31] | GRU | Bi-LSTM [26] | CNN | TF | CTN | KCTN |
|---|---|---|---|---|---|---|---|
| Eastern origin | 1.88 | 1.88 | 1.88 | 1.88 | 1.88 | 2.08 | 2.08 |
| Northern origin | 1.14 | 1.14 | 0.89 | 0.89 | 1.14 | 1.27 | 1.39 |
| Southern origin | 0.67 | 0.67 | 0.67 | 0.44 | 0.89 | 0.89 | 1.11 |
initially unable to achieve a high level of prediction accuracy because go-straight and right-turn vehicles follow similar tracks and directions of travel over the initial part of the route.
4.4. Remarks on overall performance
The above results show that the proposed KCTN performs very well in all three scenarios, as illustrated in Sections 4.1, 4.2 and 4.3. However, the authors prefer the first scenario based on accuracy and real-life applicability. The third scenario can be dropped as it offers less lead time than the first scenario. The second scenario may also be dropped because of the decrease in accuracy for vehicles coming from the east side at a traveled distance of 11 m, as shown in Fig. 9. Apart from accuracy and lead time, the second and third scenarios have a further drawback if the vehicle origin (entrance side) is not known in a real-life implementation. At an unsignalized roundabout, the vehicle origin side may not be recorded due to the lack of a vehicle management system or vehicle communication. To explain this, a target vehicle ‘K’ is shown in Fig. 14. To predict the behavior of vehicle ‘K’ in scenario 3, the model needs a sequence of feature vectors that can only be calculated by knowing the entrance side of vehicle ‘K’. Without it, the model will not be able to estimate the driver's turning behavior. In scenario 2, suppose, for example, that the model predicts that vehicle ‘K’ will turn right. Both east-north and south-east tracks are right-turn behavior; thus, after predicting right-turn behavior, the model
Fig. 12. Accuracy vs distance traveled for turning behavior based classification in scenario 3.
Fig. 13. Lead time comparison among various models for Scenario 3 (input sequence length = 15).
cannot estimate the destination without knowing the entrance side of vehicle ‘K’.
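The ambiguity described above can be made concrete with the labeling of Table 4. In the sketch below, the dictionary reproduces Table 4, while the helper name is illustrative. It shows that a predicted turn label alone maps to two possible destinations, and that the destination becomes unique only once the entrance side is known.

```python
# (origin, destination) -> turn label, as listed in Table 4.
TRACK_LABELS = {
    ("East", "North"): "Right-turn",
    ("East", "South"): "Left-turn",
    ("North", "East"): "Left-turn",
    ("North", "South"): "Go-straight",
    ("South", "East"): "Right-turn",
    ("South", "North"): "Go-straight",
}

def destinations_for_turn(turn, origin=None):
    """Destinations compatible with a predicted turn label, optionally
    restricted to a known entrance side."""
    return sorted({
        dest
        for (orig, dest), label in TRACK_LABELS.items()
        if label == turn and (origin is None or orig == origin)
    })

print(destinations_for_turn("Right-turn"))                 # ['East', 'North']
print(destinations_for_turn("Right-turn", origin="East"))  # ['North']
```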
Finally, this situation assessment study shows the importance of correctly selecting the output class labeling and of knowing the vehicle entrance side during online operation of the model. Thus, the authors stand by the destination-based labeling.
The statistical significance of the differences between the proposed and other models is investigated below to better emphasize the effectiveness of the proposed model. The approach used in this work is to predict driver behavior and then consolidate the predictions over the whole dataset. For this purpose, McNemar's test (a within-subjects chi-squared test) is performed to compare the predictive accuracy of the various models [67–69]. McNemar's test is
Table 8
2 × 2 contingency table.

| | Model 2 correct | Model 2 wrong |
|---|---|---|
| Model 1 correct | $N_A$ | $N_B$ |
| Model 1 wrong | $N_C$ | $N_D$ |
based on a 2 × 2 contingency table that compares the predictions of two models, as shown in Table 8, where
$N_A$ = number of instances where both models give correct predictions,
$N_B$ = number of instances where Model 1 gives correct predictions and Model 2 gives incorrect predictions,
Fig. 14. Example of a target vehicle at three-way roundabout.
Table 9
Behavior prediction accuracy and statistical analysis.
Eastern origin: traveled distance 1 m, 558 input samples. Northern origin: traveled distance 2 m, 559 input samples. Southern origin: traveled distance 16 m, 363 input samples.

| Model | Eastern: Accuracy (%) | Eastern: χ²-value | Eastern: p-value | Northern: Accuracy (%) | Northern: χ²-value | Northern: p-value | Southern: Accuracy (%) | Southern: χ²-value | Southern: p-value |
|---|---|---|---|---|---|---|---|---|---|
| LSTM | 92.1 | 40 | 0 | 94.3 | 20.8 | 0.000005 | 97 | 6.1 | 0.0133 |
| GRU | 87.4 | 66 | 0 | 92.5 | 34 | 0 | 95 | 13 | 0.0003 |
| Bi-LSTM | 88.5 | 60 | 0 | 90.5 | 45 | 0 | 95.9 | 10 | 0.0015 |
| CNN | 89.9 | 52 | 0 | 93.7 | 27 | 0 | 92 | 24 | 0.000001 |
| TF | 96.6 | 15 | 0.0001 | 96.9 | 5.3 | 0.0218 | 97.8 | 2.3 | 0.1306 |
| CTN | 99.1 | 1.3 | 0.24 | 97.8 | 4.1 | 0.04 | 98.6 | 0.5 | 0.4795 |
| KCTN | 99.7 | | | 98.9 | | | 99.2 | | |
$N_C$ = number of instances where Model 1 gives incorrect predictions and Model 2 gives correct predictions, and
$N_D$ = number of instances where both models give incorrect predictions.
McNemar's test statistic (chi-squared) is computed as follows:

$$\chi^2 = \frac{\left(|N_B - N_C| - 1\right)^2}{N_B + N_C} \tag{21}$$
In addition to the chi-squared value, the p-value is also determined; it is the probability of obtaining a chi-squared value at least as large as the observed one when the null hypothesis is true. A high chi-squared value or a low p-value indicates a statistically significant performance difference between the two models. The chi-squared distribution (also chi-square or χ²-distribution) is a special case of the gamma distribution and is one of the most extensively used probability distributions in hypothesis testing. The chi-squared distribution does not approximate the p-value well if $N_B$ or $N_C$ is small (i.e., $N_B + N_C < 25$); in that case, the exact p-value is calculated using a binomial distribution. If the p-value is lower than the chosen significance level, the null hypothesis that the performances of the two models are equal is rejected. A significance level (the probability of rejecting the null hypothesis when it is true) of 5% is used in this study. The respective χ² and p-values are reported in Table 9.
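The test described above can be sketched with the Python standard library alone. This is a minimal illustration rather than the authors' code: the function names and the example counts are hypothetical. It assumes the chi-squared statistic is referred to a chi-squared distribution with one degree of freedom, which is the standard choice for McNemar's test and whose upper-tail probability equals erfc(sqrt(χ²/2)), and it uses a two-sided exact binomial p-value when $N_B + N_C < 25$.

```python
import math

def mcnemar_chi2(n_b, n_c):
    """Continuity-corrected McNemar statistic from the discordant counts."""
    if n_b + n_c == 0:
        raise ValueError("no discordant pairs: the statistic is undefined")
    return (abs(n_b - n_c) - 1) ** 2 / (n_b + n_c)

def chi2_p_value_1dof(chi2):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

def exact_binomial_p_value(n_b, n_c):
    """Two-sided exact p-value, assuming discordant pairs split 50/50 under H0."""
    n = n_b + n_c
    k = min(n_b, n_c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

def mcnemar_test(n_b, n_c, small_sample_threshold=25):
    """Return (chi2, p); the exact binomial p-value is used for small counts."""
    chi2 = mcnemar_chi2(n_b, n_c)
    if n_b + n_c < small_sample_threshold:
        return chi2, exact_binomial_p_value(n_b, n_c)
    return chi2, chi2_p_value_1dof(chi2)

# Hypothetical discordant counts, for illustration only.
chi2, p = mcnemar_test(n_b=30, n_c=10)
print(round(chi2, 3), p < 0.05)  # 9.025 True
```

As a consistency check against Table 9, a chi-squared value of 0.5 gives a p-value of about 0.4795 under this one-degree-of-freedom assumption, which matches the CTN entry for the southern origin.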
As reported in Table 5, the results show that both proposed networks (CTN and KCTN) outperform current state-of-the-art models. However, the proposed KCTN provides better performance than CTN. The proposed KCTN achieves 99% accuracy earlier than the other models for all three sequence lengths (5, 15, and 25). The best results of the model are produced with a sequence length of 15 for vehicle destination-based classification.
Table 10
Computing time (millisecond) comparison between models.

| Sequence length | LSTM [31] | GRU | Bi-LSTM [26] | CNN | TF | CTN | KCTN |
|---|---|---|---|---|---|---|---|
| 5 | 3.9 | 3 | 7.7 | 1.2 | 6.2 | 3.8 | 3.9 |
| 15 | 7.8 | 6.9 | 16.6 | 1.5 | 6.8 | 5.1 | 5.3 |
| 25 | 12.5 | 10.9 | 24.6 | 1.7 | 7.9 | 6.5 | 6.9 |
Thus, a contingency table is formulated for each model with respect to KCTN.
The chi-squared values quantify the statistical differences in the models' performances, with high values indicating significant differences. To determine statistical significance precisely, the p-values are compared with the chosen significance level (α = 0.05). As shown in Table 9, all models have p-values less than 0.05 except CTN (eastern and southern origins) and TF (southern origin). For the remaining comparisons, the null hypothesis is rejected and statistically significant differences between the models are established.
The proposed model is trained with a learning rate of 0.0001 and a batch size of 100. The execution time to predict driver intention for a single input sample is listed in Table 10. As shown in Table 10, the length of the input sequence has very little effect on the computation time of CNN. However, the performance of CNN is not satisfactory in terms of lead distance and lead time. Compared with the RNN-based models (LSTM, GRU and Bi-LSTM), the computation times of the TF-based models (TF, CTN, and KCTN) are relatively less affected by varying the input sequence length (5 to 25). The computation time of the generic TF is less influenced by input sequence length, but its behavior prediction accuracy is lower than that of the proposed CTN and KCTN. As a result, the proposed models
are relatively less influenced by input sequence length than the RNN-based models. A desktop computer with an Intel(R) Xeon(R) CPU E5-2643 v4 is used for all these measurements.
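One simple way to quantify this sensitivity is the ratio of the computing time at sequence length 25 to that at sequence length 5, using the values of Table 10. The ratio is an illustrative measure chosen here, not one used in the paper, and the names in the sketch are hypothetical.

```python
# Computing times in milliseconds from Table 10, keyed by sequence length.
TIMES_MS = {
    "LSTM":    {5: 3.9, 15: 7.8,  25: 12.5},
    "GRU":     {5: 3.0, 15: 6.9,  25: 10.9},
    "Bi-LSTM": {5: 7.7, 15: 16.6, 25: 24.6},
    "CNN":     {5: 1.2, 15: 1.5,  25: 1.7},
    "TF":      {5: 6.2, 15: 6.8,  25: 7.9},
    "CTN":     {5: 3.8, 15: 5.1,  25: 6.5},
    "KCTN":    {5: 3.9, 15: 5.3,  25: 6.9},
}

def growth_ratio(model):
    """How many times slower a model is at length 25 than at length 5."""
    return TIMES_MS[model][25] / TIMES_MS[model][5]

for model in TIMES_MS:
    print(f"{model}: {growth_ratio(model):.2f}x")
```

With these values the RNN-based models slow down by a factor of more than 3 when the sequence length grows from 5 to 25, whereas the TF-based models stay below a factor of 2.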
This study demonstrates that the proposed TF-based model is effective for driver behavior modeling (DBM) for conflict resolution at an unsignalized roundabout. However, the study has some limitations. The model requires more data samples for training. The robustness of the proposed model needs to be further evaluated and analyzed in a noisy environment where accurate information on location, velocity, and heading angle may not be available. In this work, the host vehicle's behavior is predicted from its past trajectory without taking the trajectory information of surrounding vehicles into account. Hence, the model remains insensitive to the speed of surrounding objects. A new study involving the dynamics of surrounding objects will be taken up in future work.
5. Conclusion
In this paper, a new CTN for predicting driver intent is proposed to support ADAS in estimating the future position of road users at an urban, single-lane roundabout. The performance of the proposed KCTN model is assessed based on the vehicle's destination and turning behavior. The performance of the proposed network is superior in both cases, but destination-based classification is preferred over turning behavior-based classification. The experiments are conducted with a real-world dataset, and the results show that the proposed model performs competitively with current state-of-the-art models while requiring only a small amount (0.6 s) of observation data.
In the destination-based behavior prediction scenario, the proposed KCTN outperforms the other models and provides lead distances of 11 m, 12 m and 6 m for eastern, northern and southern origin vehicles, respectively. Based on average vehicle speed, the proposed model can provide lead times of 2.29 s, 1.52 s and 1.33 s for eastern, northern and southern origin vehicles, respectively, which can help avoid possible conflict situations. In the turning-based behavior prediction scenario, the proposed KCTN provides lead times of 2.08 s, 1.39 s and 1.11 s for eastern, northern and southern origin vehicles, respectively, which are higher than those of the other models.
This research shows that TF performs well for time series tasks such as prediction and classification. According to the performance results, TF is efficient in terms of computation time. It can avoid the vanishing gradient problem for long sequential inputs, since it processes the entire input sequence at once rather than step by step as RNN-based models do. It is necessary to know the driving behavior of all adjacent traffic participants in order to generate their future trajectories, which are then used to estimate collision risk in the collision avoidance block of ADAS. In future work, the proposed model will be employed as a behavior predictor, and collision estimation will be performed based on its predictions to navigate through the intersection safely.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The research carried out in this paper is supported by KPIT
Technologies Ltd., Bangalore, India, under the research grant for
the project entitled ‘‘Driver Behavior Modelling for Autonomous
Driving’’.
References
[1] Masi S, Xu P, Bonnifait P. Roundabout crossing with interval occupancy
and virtual instances of road users. IEEE Trans Intell Transp Syst 2020.
[2] Sharma O, Sahoo NC, Puhan NB. Recent advances in motion and behavior
planning techniques for software architecture of autonomous vehicles: A
state-of-the-art survey. Eng Appl Artif Intell 2021;101:104211.
[3] Sharma O, Sahoo NC, Puhan NB. A survey on smooth path generation
techniques for nonholonomic autonomous vehicle systems. In: IECON
2019-45th annual conference of the IEEE industrial electronics society. Vol.
1. IEEE; 2019, p. 5167–72.
[4] Yao R, Zeng W, Chen Y, He Z. A deep learning framework for modelling
left-turning vehicle behaviour considering diagonal-crossing motorcycle
conflicts at mixed-flow intersections. Transp Res C 2021;132:103415.
[5] Schindler R, Piccinini GB. Truck drivers’ behavior in encounters with vul-
nerable road users at intersections: Results from a test-track experiment.
Accid Anal Prev 2021;159:106289.
[6] Awad N, Lasheen A, Elnaggar M, Kamel A. Model predictive control with
fuzzy logic switching for path tracking of autonomous vehicles. ISA Trans
2021.
[7] Gu W, Cai S, Hu Y, Zhang H, Chen H. Trajectory planning and tracking
control of a ground mobile robot: A reconstruction approach towards space
vehicle. ISA Trans 2019;87:116–28.
[8] Sharma O, Sahoo N, Puhan N. Highway discretionary lane changing
behavior recognition using continuous and discrete hidden Markov model.
In: 2021 IEEE international intelligent transportation systems conference.
IEEE; 2021, p. 1476–81.
[9] Yang S, Wang W, Jiang Y, Wu J, Zhang S, Deng W. What contributes to
driving behavior prediction at unsignalized intersections? Transp Res C
2019;108:100–14.
[10] Kazemi R, Abdollahzade M. Introducing an evolving local neuro-fuzzy
model application to modeling of car-following behavior. ISA Trans
2015;59:375–84.
[11] Wang H, Gu M, Wu S, Wang C. A driver’s car-following behavior pre-
diction model based on multi-sensors data. EURASIP J Wireless Commun
Networking 2020;2020(1):1–12.
[12] Paden B, Čáp M, Yong SZ, Yershov D, Frazzoli E. A survey of motion
planning and control techniques for self-driving urban vehicles. IEEE Trans
Intell Veh 2016;1(1):33–55.
[13] Zhou D, Ma Z, Sun J. Autonomous vehicles’ turning motion planning
for conflict areas at mixed-flow intersections. IEEE Trans Intell Veh
2019;5(2):204–16.
[14] Li S, Yang L, Gao Z, Li K. Stabilization strategies of a general nonlinear
car-following model with varying reaction-time delay of the drivers. ISA
Trans 2014;53(6):1739–45.
[15] Amsalu SB, Homaifar A. Driver behavior modeling near intersections
using hidden Markov model based on genetic algorithm. In: 2016 IEEE
international conference on intelligent transportation engineering. IEEE;
2016, p. 193–200.
[16] Sharma O, Sahoo NC, Puhan NB. Highway lane-changing prediction using
a hierarchical software architecture based on support vector machine and
continuous hidden Markov model. Int J Intell Transp Syst Res 2022.
[17] Aoude GS, Desaraju VR, Stephens LH, How J. Driver behavior classification
at intersections and validation on large naturalistic data set. IEEE Trans
Intell Transp Syst 2012;13(2):724–36.
[18] Jain A, Singh A, Koppula HS, Soh S, Saxena A. Recurrent neural networks
for driver activity anticipation via sensory-fusion architecture. In: 2016
IEEE international conference on robotics and automation. IEEE; 2016, p.
3118–25.
[19] Hurwitz DS, Wang H, Knodler Jr. MA, Ni D, Moore D. Fuzzy sets to describe
driver behavior in the dilemma zone of high-speed signalized intersections.
Transp Res F Traffic Psychol Behav 2012;15(2):132–43.
[20] Belkhouche F. Collaboration and optimal conflict resolution at an unsignal-
ized intersection. IEEE Trans Intell Transp Syst 2018;20(6):2301–12.
[21] Mozaffari S, Al-Jarrah OY, Dianati M, Jennings P, Mouzakitis A. Deep
learning-based vehicle behavior prediction for autonomous driving
applications: A review. IEEE Trans Intell Transp Syst 2020.
[22] Jeong Y, Kim S, Yi K. Surround vehicle motion prediction using LSTM-
RNN for motion planning of autonomous vehicles at multi-lane turn
intersections. IEEE Open J Intell Transp Syst 2020;1:2–14.
[23] Zyner A, Worrall S, Ward J, Nebot E. Long short term memory for driver
intent prediction. In: 2017 IEEE intelligent vehicles symposium. IEEE; 2017,
p. 1484–9.
[24] Phillips DJ, Wheeler TA, Kochenderfer MJ. Generalizable intention predic-
tion of human drivers at intersections. In: 2017 IEEE intelligent vehicles
symposium. IEEE; 2017, p. 1665–70.
[25] Zhang T, Song W, Fu M, Yang Y, Wang M. Vehicle motion prediction at
intersections based on the turning intention and prior trajectories model.
IEEE/CAA J Autom Sin 2021.
[26] Zhang H, Fu R. A hybrid approach for turning intention prediction based
on time series forecasting and deep learning. Sensors 2020;20(17):4887.
[27] Liu Y, Zhao P, Qin D, Li G, Chen Z, Zhang Y. Driving intention identification
based on long short-term memory and a case study in shifting strategy
optimization. IEEE Access 2019;7:128593–605.
[28] Tian R, Li S, Li N, Kolmanovsky I, Girard A, Yildiz Y. Adaptive game-
theoretic decision making for autonomous vehicle control at roundabouts.
In: 2018 IEEE conference on decision and control. IEEE; 2018, p. 321–6.
[29] Masi S, Xu P, Bonnifait P. A curvilinear decision method for two-lane
roundabout crossing and its validation under realistic traffic flow. In: 2020
IEEE intelligent vehicles symposium. IEEE; 2020, p. 1290–6.
[30] Rodrigues M, McGordon A, Gest G, Marco J. Autonomous navigation in
interaction-based environments—a case of non-signalized roundabouts.
IEEE Trans Intell Veh 2018;3(4):425–38.
[31] Zyner A, Worrall S, Nebot E. A recurrent neural network solution for
predicting driver intention at unsignalized intersections. IEEE Robot Autom
Lett 2018;3(3):1759–64.
[32] Zyner A, Worrall S, Nebot EM. Acfr five roundabouts dataset: Natural-
istic driving at unsignalized intersections. IEEE Intell Transp Syst Mag
2019;11(4):8–18.
[33] Zyner A, Worrall S, Nebot E. Naturalistic driver intention and path
prediction using recurrent neural networks. IEEE Trans Intell Transp Syst
2019;21(4):1584–94.
[34] Zhang Y, Gao B, Guo L, Guo H, Chen H. Adaptive decision-making for auto-
mated vehicles under roundabout scenarios using optimization embedded
reinforcement learning. IEEE Trans Neural Netw Learn Syst 2020.
[35] Xiang S, Qin Y, Zhu C, Wang Y, Chen H. Lstm networks based on
attention ordered neurons for gear remaining life prediction. ISA Trans
2020;106:343–54.
[36] Bai S, Kolter JZ, Koltun V. An empirical evaluation of generic convolutional
and recurrent networks for sequence modeling. 2018, arXiv preprint arXiv:
1803.01271.
[37] Luo W, Yang B, Urtasun R. Fast and furious: Real time end-to-end 3d
detection, tracking and motion forecasting with a single convolutional net.
In: Proceedings of the IEEE conference on computer vision and pattern
recognition. 2018, p. 3569–77.
[38] Schöller C, Aravantinos V, Lay F, Knoll A. What the constant velocity model
can teach us about pedestrian motion prediction. IEEE Robot Autom Lett
2020;5(2):1696–703.
[39] Becker S, Hug R, Hübner W, Arens M. An evaluation of trajectory prediction
approaches and notes on the trajnet benchmark. 2018, arXiv preprint
arXiv:1805.07663.
[40] Becker S, Hug R, Hubner W, Arens M. Red: A simple but effective baseline
predictor for the trajnet benchmark. In: Proceedings of the European
conference on computer vision (ECCV) workshops. 2018.
[41] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al.
Attention is all you need. In: Advances in neural information processing
systems. 2017, p. 5998–6008.
[42] Giuliari F, Hasan I, Cristani M, Galasso F. Transformer networks for
trajectory forecasting. In: 2020 25th International conference on pattern
recognition. IEEE; 2021, p. 10335–42.
[43] Liu Y, Zhang J, Fang L, Jiang Q, Zhou B. Multimodal motion prediction
with stacked transformers. In: Proceedings of the IEEE/CVF conference on
computer vision and pattern recognition. 2021, p. 7577–86.
[44] Yuan Y, Weng X, Ou Y, Kitani K. Agentformer: Agent-aware transformers
for socio-temporal multi-agent forecasting. 2021, arXiv preprint arXiv:
2103.14023.
[45] Devlin J, Chang M-W, Lee K, Toutanova K. Bert: Pre-training of deep
bidirectional transformers for language understanding. 2018, arXiv preprint
arXiv:1810.04805.
[46] Zhao Y, Ding B, Zhang Y, Yang L, Hao X. Online cement clinker quality
monitoring: A soft sensor model based on multivariate time series analysis
and CNN. ISA Trans 2021;117:180–95.
[47] Han T, Liu C, Yang W, Jiang D. Learning transferable features in deep
convolutional neural networks for diagnosing unseen machine conditions.
ISA Trans 2019;93:341–53.
[48] Fang C, He D, Li K, Liu Y, Wang F. Image-based thickener mud layer height
prediction with attention mechanism-based CNN. ISA Trans 2021.
[49] Wang H, Liu Z, Peng D, Cheng Z. Attention-guided joint learning CNN with
noise robustness for bearing fault diagnosis and vibration signal denoising.
ISA Trans 2021.
[50] Zoumpourlis G, Doumanoglou A, Vretos N, Daras P. Non-linear convolution
filters for cnn-based learning. In: Proceedings of the IEEE international
conference on computer vision. 2017, p. 4761–9.
[51] LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to
document recognition. Proc IEEE 1998;86(11):2278–324.
[52] Zhang Y, Li J, Guo Y, Xu C, Bao J, Song Y. Vehicle driving behavior
recognition based on multi-view convolutional neural network with joint
data augmentation. IEEE Trans Veh Technol 2019;68(5):4223–34.
[53] Izquierdo R, Quintanar A, Parra I, Fernández-Llorca D, Sotelo M.
Experimental validation of lane-change intention prediction methodologies
based on CNN and LSTM. In: 2019 IEEE intelligent transportation systems
conference. IEEE; 2019, p. 3657–62.
[54] Izquierdo R, Quintanar A, Parra I, Fernández-Llorca D, Sotelo MA. The
PREVENTION dataset. 2019, https://prevention-dataset.uah.es.
[55] Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T,
et al. An image is worth 16x16 words: Transformers for image recognition
at scale. 2020, arXiv preprint arXiv:2010.11929.
[56] Pandey P, Deepthi A, Mandal B, Puhan NB. FoodNet: Recognizing
foods using ensemble of deep networks. IEEE Signal Process Lett
2017;24(12):1758–62.
[57] Mishra SS, Mandal B, Puhan NB. Multi-level dual-attention based CNN for
macular optical coherence tomography classification. IEEE Signal Process
Lett 2019;26(12):1793–7.
[58] Henriques JF, Caseiro R, Martins P, Batista J. High-speed tracking
with kernelized correlation filters. IEEE Trans Pattern Anal Mach Intell
2014;37(3):583–96.
[59] Cui Y, Zhou F, Wang J, Liu X, Lin Y, Belongie S. Kernel pooling for
convolutional neural networks. In: Proceedings of the IEEE conference on
computer vision and pattern recognition. 2017, p. 2921–30.
[60] Blondel M, Ishihata M, Fujino A, Ueda N. Polynomial networks and
factorization machines: New insights and efficient training algorithms. In:
International conference on machine learning. PMLR; 2016, p. 850–8.
[61] Mairal J, Koniusz P, Harchaoui Z, Schmid C. Convolutional kernel networks.
Adv Neural Inf Process Syst 2014;27:2627–35.
[62] Wang C, Yang J, Xie L, Yuan J. Kervolutional neural networks. In:
Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition. 2019, p. 31–40.
[63] Marsi S, Bhattacharya J, Molina R, Ramponi G. A non-linear convolution
network for image processing. Electronics 2021;10(2):201.
[64] Xu K, Wang C, Chen C, Wu W, Scherer S. AirCode: A robust object encoding
method. IEEE Robot Autom Lett 2022.
[65] Tang L, Xuan J, Shi T, Zhang Q. EnvelopeNet: A robust convolutional neural
network with optimal kernels for intelligent fault diagnosis of rolling
bearings. Measurement 2021;180:109563.
[66] Abdallah HB, Henry CJ, Ramanna S. 1-dimensional polynomial neural
networks for audio signal related problems. Knowl-Based Syst
2022;108174.
[67] McNemar Q. Note on the sampling error of the difference between
correlated proportions or percentages. Psychometrika 1947;12(2):153–7.
[68] Edwards AL. Note on the "correction for continuity" in testing the
significance of the difference between correlated proportions. Psychometrika
1948;13(3):185–7.
[69] Dietterich TG. Approximate statistical tests for comparing supervised
classification learning algorithms. Neural Comput 1998;10(7):1895–923.