
Kernelized convolutional transformer network based driver behavior estimation for conflict resolution at unsignalized roundabout

July 2022
ISA Transactions 133(1)


DOI:10.1016/j.isatra.2022.07.004

Authors:

Omveer Sharma

University of Haifa

Niladri Puhan

Indian Institute of Technology Bhubaneswar



ISA Transactions 133 (2023) 13–28


Research article

Kernelized convolutional transformer network based driver behavior estimation for conflict resolution at unsignalized roundabout

Omveer Sharma, N.C. Sahoo ∗, Niladri B. Puhan

School of Electrical Sciences, Indian Institute of Technology Bhubaneswar, Odisha, India

article info

Article history:

Received 12 February 2022

Received in revised form 6 July 2022

Accepted 6 July 2022

Available online 16 July 2022

Keywords:

Intelligent vehicle

Deep learning

Driver behavior

Roundabout

Convolutional neural network

Attention mechanism

abstract

The modeling of driver behavior plays an essential role in developing Advanced Driver Assistance Systems (ADAS) that support the driver in various complex driving scenarios. Estimating the behavior of surrounding vehicles is crucial for an autonomous vehicle to navigate safely through an unsignalized intersection. This work proposes a novel kernelized convolutional transformer network (KCTN) with a multi-head attention (MHA) mechanism to estimate driver behavior at a challenging unsignalized three-way roundabout. Particular emphasis is placed on performing convolution in a non-linear space by introducing a kervolution operation into the proposed network. Kervolution generalizes convolution, improves model capacity, and captures higher-order feature interactions by using a Gaussian kernel function. The proposed model is validated on the real-world ACFR dataset, where it outperforms current state-of-the-art methods in behavior prediction accuracy and provides a significant lead time before potential conflict situations.

1. Introduction

Future driver assistance and safety systems for vehicles will incorporate contextual information, such as the driver's intention and expected behavior, to achieve predictive driving. Because driver behavior cannot be measured directly, this information is challenging to collect, which makes it an open field of research [1–7]. The behavior estimation function of a safety system is needed to predict the future positions of surrounding traffic participants from current and past observations of the traffic environment [8–14]. Driving behavior is highly regular under organized conditions, especially with well-defined lanes and traffic lights, but far less so in unmarked street scenes. Driving through intersections without traffic lights is challenging and requires substantial interaction with, and anticipation of, other drivers. The wide range of driving styles that a driver may display while navigating an unsignalized intersection adds to the complexity of the problem. One such intersection is the roundabout, which is found extensively in many urban areas.

Driver intention estimation at intersections is a widely studied problem. Approaches such as Hidden Markov Models (HMMs) [15,16], Support Vector Machines (SVMs) [17] and deep learning-based networks [18] have been applied successfully to intention prediction. Amsalu and Homaifar [15] used HMMs within a Hybrid State System (HSS) framework to estimate turning behaviors near intersections. Aoude et al. [17] also used SVM and HMM to estimate compliant and violating behaviors at road intersections. A fuzzy logic (FL) based model has been proposed to describe driver behavior in the dilemma zone of high-speed signalized intersections [19]. It should be noted that such a model works well only for specific feature patterns. Driver behavior depends heavily on the vehicle's prior states (trajectory), so a driver behavior prediction model should be able to learn sequential patterns and intelligently propagate past information to estimate behavior in the current state. In [20], the authors proposed a closed-form, Lagrange-based solution that uses a reciprocal collaborative process, in terms of time and cost, to solve collision detection and avoidance.

∗ Corresponding author.
E-mail addresses: os10@iitbbs.ac.in (O. Sharma), ncsahoo@iitbbs.ac.in (N.C. Sahoo), nbpuhan@iitbbs.ac.in (N.B. Puhan).

More recently, deep learning-based techniques have been explored to deal with complex environments such as intersections [21]. Jeong et al. [22] designed a motion planner based on Model Predictive Control (MPC) and long short-term memory (LSTM) that determines acceleration commands from the predicted states of surrounding vehicles at three-way intersections. LSTM models have been used to classify vehicle tracks into turn-left, turn-right and go-straight at both three-way and four-way square intersections [23–25]. Zhang and Fu [26] used a bi-directional LSTM (Bi-LSTM) network to recognize turning behavior at a four-way intersection and achieved accuracies of 94.2% and 93.5% after 2 s and 1 s, respectively. Liu et al. [27] used an LSTM to classify driving intents at an intersection from acceleration, speed and brake pedal inputs, and achieved a prediction accuracy of 95.3%.

https://doi.org/10.1016/j.isatra.2022.07.004

O. Sharma, N.C. Sahoo and N.B. Puhan ISA Transactions 133 (2023) 13–28

These works either consider well-structured intersections or rely on vehicle-to-vehicle communication to convey drivers' intentions. However, behavior at an unsignalized roundabout differs from that at a normal intersection [28–30]. Zyner et al. [31] used an LSTM model to predict the exit destination of vehicles at an unsignalized roundabout from the position, speed and heading of the target vehicle. The authors also presented the ACFR dataset of five unsignalized roundabouts, a collection of over 23,000 vehicle trajectories [32]. Zyner et al. [33] introduced a clustering process in conjunction with an LSTM model that extracts possible paths from the prediction output and sorts them by probability score. In [34], an optimization process is integrated into a reinforcement learning framework to simultaneously decide the behavior, desired acceleration and action time at the roundabout.

The driver's behavior is highly influenced by the vehicle's prior states (trajectory), so a driver behavior prediction model should be able to learn sequential patterns and intelligently propagate past data to forecast behavior in the current state. RNN (recurrent neural network)-based models are commonly employed for sequential data in time-series prediction and classification problems [35], although they have two drawbacks. Firstly, such models suffer from the vanishing gradient problem for long input sequences. Secondly, RNN-based models propagate the input in a recurrent, step-by-step manner; as a result, they require more computation time, and the length of the input sequence has a significant impact on that time. In driver behavior modeling, long input sequences increase computation time and decrease prediction accuracy. More recently, memory mechanisms for RNNs, which process data sequentially step by step, and their ability to replicate social interaction have also been analyzed [36–40].

Transformer (TF) based models have become popular in recent years for overcoming the difficulties of sequence-to-sequence learning. Unlike RNN-based models, a TF accepts the complete input sequence at once. As a result, the length of the input sequence has little effect on its computation time, and it is far less affected by the vanishing gradient problem. The authors of [41] introduced the attention-based TF network as a new approach that replaces this sequential memory mechanism. The distinction between TF and RNN is that an RNN processes observations sequentially before attempting to forecast auto-regressively, whereas a TF “considers” all available observations at once and weights them via an attention mechanism.

The generic TF model has been deployed for pedestrian trajectory prediction, where the problem is formulated as sequence-to-sequence learning, as in natural language processing (NLP) [42–44]. However, TF-based modeling has not been explored for driver behavior prediction, which is a sequence-to-point (classification) problem. In this work, the generic TF is modified to perform driver behavior classification. This work presents a novel Convolutional Transformer Network (CTN) to estimate driver intent at an unsignalized roundabout using a short segment of tracking data that can be collected in real time with overhead cameras and lidars. The proposed network is inspired by the recent effectiveness of the TF network in natural language processing [41,45] and benefits from the TF's ability to process the entire input sequence in a single time step, with less computation time.

The novelty and contribution of the present work lie in the improvements made to the generic TF to address the driver behavior prediction task. In particular, the following modifications have been made to the generic TF model.

1. The decoder layer of the TF is replaced by fully connected layers, because a decoder is not needed when the model's output is not sequential.

2. A large number of features is a key component of an effective TF model (similar to the word embedding layer in NLP). Here, however, the raw input sequence contains only four features (two location coordinates, heading direction, and speed). Accordingly, a convolutional neural network (CNN) is deployed to extract multiple temporally dependent features in a high-dimensional space.

3. The activation layer can only provide point-wise non-linearity in CNN-based models [46–49]. To increase the proposed CTN's capacity to perform convolution in a non-linear space, the kervolution (kernel convolution) methodology is adopted. Higher-order non-linear feature maps improve the discrimination of subsequent linear classifiers [50]. Thus, the proposed model integrates the benefits of both the TF and kernelized convolution-based feature extraction. As a result, compared to an RNN-based model, the proposed model is more accurate in predicting driver behavior and takes nearly half the computation time.

In this work, the proposed CTN and KCTN are tested on real-world data, and the results demonstrate that the proposed models are suitable for the driver intent classification task and outperform state-of-the-art prediction techniques. These models are expected to be applicable in conjunction with the positional information of other vehicles to achieve a higher degree of safety in autonomous driving. The proposed models can be integrated into an autonomous vehicle's advanced driver assistance systems (ADAS) to test real-world performance. The contributions of this study are primarily based on the ideas mentioned above. Our main technical contributions are summarized below:

1. By exploiting the strengths of the TF, the proposed CTN and KCTN require less computation time than other sequential networks (e.g., LSTM, GTN, Bi-LSTM).

2. The kernel trick is employed to generalize the convolution process to a non-linear space. This enables the TF encoder layer to extract intermediate temporal features of vehicle trajectories.

3. The proposed CTN and KCTN are evaluated on forecasting drivers' intentions using a real-world roundabout dataset. The experiments show that the proposed CTN and KCTN perform better than existing RNN-based models.

4. In NLP, the TF has proven its effectiveness for sequence-to-sequence learning. This study aims to pave the way for future improvements in TF-based models for driver behavior prediction, which can be viewed as a time-series classification problem.

Thus, the proposed model integrates the benefits of both the TF and the kernelized convolution operation. As a result, it is more accurate in classification and takes less time to compute. The rest of this work is structured as follows. Section 2 summarizes the dataset and presents the problem formulation. Section 3 explains the network architecture of the proposed model. Section 4 describes the results, and Section 5 concludes the work.

2. Dataset and problem formulation

The model uses a real-world dataset of vehicles crossing an unsignalized roundabout in Leith-Croydon, Sydney, Australia [31]. The data are collected for each detected vehicle at a sampling rate of 25 Hz and include the relative X/Y position with respect to the frame of reference (meters), speed (meters/second), heading (radians), size (width/height, meters), classification (bike, car, truck, pedestrian), and classification confidence. The vehicle tracks are labeled with the three entrances and three exits of the intersection, as shown in Fig. 1, which also shows three conflict points. The conflict area is defined as the place where two vehicles from different sides can collide for the first time. For example, at the bottom-right conflict point of Fig. 1, vehicles entering from the south can collide with vehicles entering from the east for the first time. As a result, vehicles approaching from the south must be aware of the destination (exit side) of vehicles approaching from the east. Only vehicles that pass through both an entry and an exit point are selected in this research.

Fig. 1. Diagram of the intersection studied. The conflict points are marked with squares [31].

Table 1
Number of trajectory samples in the Leith-Croydon intersection (ACFR dataset) [31].

| Origin \ Destination | East | North | South | Total |
|---|---|---|---|---|
| East | 0 | 2418 | 368 | 2786 |
| North | 2268 | 0 | 530 | 2798 |
| South | 1122 | 688 | 0 | 1810 |
| Total | 3390 | 3106 | 898 | 7394 |
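Table 1 implies a noticeable imbalance between destination classes. As a quick arithmetic check (our illustration, not part of the original method; variable names are ours), the sketch below recomputes the row and column totals of Table 1 and the share of each destination class.

```python
import numpy as np

# Origin-destination counts from Table 1 (rows: origin, columns: destination).
classes = ["East", "North", "South"]
counts = np.array([
    [0, 2418, 368],    # origin East
    [2268, 0, 530],    # origin North
    [1122, 688, 0],    # origin South
])

origin_totals = counts.sum(axis=1)   # tracks entering from each side
dest_totals = counts.sum(axis=0)     # tracks per destination class
grand_total = int(counts.sum())

for name, n in zip(classes, dest_totals):
    print(f"{name:>5}: {n:4d} tracks ({100 * n / grand_total:.1f}%)")
# East 45.8%, North 42.0%, South 12.1%
```

The south destination covers only about 12% of the tracks, which may matter when comparing per-class accuracies.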

The summary of the dataset is shown in Table 1. The dataset contains vehicle tracks $S = [S_1, S_2, S_3, \ldots, S_N]$, where N is the number of vehicles in the dataset. Each vehicle track $S_i$ contains the entire trajectory in terms of position (lateral, $x_t$, and longitudinal, $y_t$), heading ($\theta_t$) and speed ($u_t$) over time steps t. All tracks are broken down into all possible sequences of length L, which are used as input to the model. The extraction of input sequences is summarized in the following equations.

$$S = \left\{ (X_1, X_2, X_3, \ldots, X_T)_n,\; C_n \right\}_{n=1}^{N} \tag{1}$$

$$X_t = [x_t,\; y_t,\; \theta_t,\; u_t] \tag{2}$$

$$I = \left\{ \left\{ (X_p, X_{p+1}, \ldots, X_{p+L-1}),\; C_n \right\}_{p=1}^{T+1-L} \right\}_{n=1}^{N} \tag{3}$$

Here, S is the set of all trajectory samples, T is the length of the nth track in S, $X_t$ is the feature vector of the nth track at time t, $C_n$ is the target class of the nth track, and I is the set of extracted sequences that are fed to train and validate the model. All tracks are aligned by the distance traveled from the intersection entry point, which is then used to evaluate the classification accuracies of the proposed model at different traveled distances [31].
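Eq. (3) amounts to a sliding window with stride 1 over each track, giving T + 1 − L windows per track, each carrying the track's class $C_n$. A minimal NumPy sketch follows; the function name `extract_windows` and the 0-based indexing are our own choices, and a track shorter than L simply contributes no windows.

```python
import numpy as np

def extract_windows(tracks, labels, L):
    """Cut every track into all contiguous windows of length L, as in Eq. (3).

    tracks: list of arrays of shape (T_n, 4) with columns [x, y, heading, speed].
    labels: list with one class label C_n per track.
    Returns (windows, window_labels) with shapes (M, L, 4) and (M,).
    """
    windows, window_labels = [], []
    for X, c in zip(tracks, labels):
        T = X.shape[0]
        # 0-based start index p = 0 .. T - L, i.e. T + 1 - L windows per track
        for p in range(T + 1 - L):
            windows.append(X[p:p + L])
            window_labels.append(c)
    if not windows:
        return np.empty((0, L, 4)), np.array([])
    return np.stack(windows), np.array(window_labels)

# Usage with two synthetic tracks of 40 and 22 time steps.
rng = np.random.default_rng(0)
tracks = [rng.normal(size=(40, 4)), rng.normal(size=(22, 4))]
I, y = extract_windows(tracks, labels=[0, 2], L=15)
print(I.shape, y.shape)  # (34, 15, 4) (34,): 26 + 8 windows
```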

3. Network architecture

The CNN is a powerful deep learning model used primarily for image-related applications such as image classification [51]. Although a CNN combined with an RNN-based model can be effective in sequence learning applications such as driver behavior prediction, to our knowledge this combination has not been explored on the numeric ACFR dataset. For instance, [52,53] explore a CNN-LSTM architecture for driver behavior prediction using the PREVENTION image dataset [54]. In CNN-RNN based models, convolution layers learn to extract crucial features from observation sequences and assign these intermediate features to the different output classes. The RNN-based model can find interdependencies in time-series sequences and identify the appropriate mode. However, members of the RNN family consume their inputs sequentially and require high computation time. RNN-based models may also suffer from the vanishing gradient problem with long input sequences. The proposed CTN avoids these limitations by using the TF encoder layer to process the entire input sequence at once. The encoder is the component of the TF that determines which parts of the input should be focused on. Encoders have been successfully applied to graphical inputs (such as images) and to sequential numeric data for classification purposes [55]; this work uses sequential numeric data for driver behavior prediction. The motivation for the proposed model is drawn from the successful use of CNN and TF models in their respective fields of application [41,45,55–57]. The proposed model computes intermediate features using a CNN and then encodes them, capitalizing on the benefits of both CNN and TF. Specifically, only the encoder layers of the TF are used, with a sequential input that contains multiple convolutional features. The original feature space is expanded using two parallel CNN branches consisting of 1D convolutions to obtain discriminative features, which are concatenated with the initial input. The decoder layers in the vanilla TF network generate the output sequence for sequence-to-sequence learning. However, driver behavior prediction can be considered a sequence-to-point learning problem, so the decoder layers are replaced by feed-forward layers to reduce network complexity and computation time. The proposed network architecture is shown in Fig. 2.

In the proposed network, the original input sequence (relative position, speed, and heading of the vehicle) is fed to both CNN branches to expand the feature space. The input dimension is L × 4, where L is the length of the input sequence (time steps) and 4 is the number of features (relative X/Y position, heading and speed). Each CNN branch has two convolution layers, and each convolution layer uses 254 kernels. Different kernels generate different feature representations, resulting in feature diversity and efficiency. The filters used in the two branches are of size 2 and 3, respectively. Mathematically, the output $l_t$ of the tth layer is:

$$l_t = \varphi(r_t * a_t + b_t) \tag{4}$$

where φ is the ReLU activation function, $r_t$ is the input, ∗ denotes the convolution operation, $a_t$ is the learned kernel, and $b_t$ is the bias.
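Eq. (4), with two layers of 254 kernels per branch, can be sketched in NumPy as below. The paper does not state the padding, stride or weight initialization; here we assume stride 1 and zero "same" padding (so each branch keeps the sequence length L, which the L × 512 concatenation requires), and we use random, untrained weights purely to show shapes. As in most deep learning libraries, the operation is implemented as cross-correlation (the kernel is not flipped). The function names are ours.

```python
import numpy as np

def conv1d_relu(x, kernels, bias):
    """One convolution layer of Eq. (4): relu(r * a + b), stride 1, 'same' padding.

    x: (L, C_in) sequence; kernels: (K, C_in, C_out); bias: (C_out,).
    """
    L, K = x.shape[0], kernels.shape[0]
    left = (K - 1) // 2                      # split the zero padding around the sequence
    xp = np.pad(x, ((left, K - 1 - left), (0, 0)))
    out = np.empty((L, kernels.shape[2]))
    for t in range(L):
        # Contract a K-step window with every kernel over (time, channel).
        out[t] = np.tensordot(xp[t:t + K], kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)              # ReLU activation (phi)

def cnn_branch(x, K, n_kernels=254, rng=None):
    """Two stacked conv layers with n_kernels filters of width K (untrained weights)."""
    rng = np.random.default_rng(0) if rng is None else rng
    h = x
    for _ in range(2):
        w = rng.normal(scale=0.1, size=(K, h.shape[1], n_kernels))
        h = conv1d_relu(h, w, np.zeros(n_kernels))
    return h

x = np.random.default_rng(1).normal(size=(15, 4))   # one L x 4 input window
F1 = cnn_branch(x, K=2)   # branch with kernel size 2
F2 = cnn_branch(x, K=3)   # branch with kernel size 3
print(F1.shape, F2.shape)  # (15, 254) (15, 254)
```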

The output features of the two parallel branches are concatenated with the original input features to generate an output ($O_{cnn}$) of dimension L × 512 (254 + 254 + 4 = 512). This output is passed through a positional encoding layer to encode temporal information and exploit the sequential correlation of time steps. The positional encoding layer uses both sine and cosine functions [41]. The output of the positional encoding layer ($O_{pos}$) has dimension L × 512 and is calculated as:

$$O_{pos} = O_{cnn} + PE \tag{5}$$
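The concatenation and Eq. (5) can be sketched as follows, assuming the standard sinusoidal formula (sine on even feature indices, cosine on odd ones) and assuming the raw input is placed first in the concatenation; the paper specifies neither detail. The branch outputs below are random placeholders for the two CNN branches, and the names are ours.

```python
import numpy as np

def sinusoidal_pe(L, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same)."""
    pos = np.arange(L)[:, None]                 # (L, 1) time-step index
    two_i = np.arange(0, d_model, 2)[None, :]   # (1, d_model/2) even feature indices
    angle = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((L, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

L = 15
rng = np.random.default_rng(0)
raw = rng.normal(size=(L, 4))        # [x, y, heading, speed]
F1 = rng.normal(size=(L, 254))       # placeholder for the kernel-size-2 branch
F2 = rng.normal(size=(L, 254))       # placeholder for the kernel-size-3 branch

O_cnn = np.concatenate([raw, F1, F2], axis=1)       # (15, 4 + 254 + 254) = (15, 512)
O_pos = O_cnn + sinusoidal_pe(L, O_cnn.shape[1])    # Eq. (5)
print(O_cnn.shape, O_pos.shape)  # (15, 512) (15, 512)
```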

Fig. 2. Proposed Convolutional Transformer Network (CTN).

Fig. 3. Multi-head attention (MHA) mechanism for eight heads.

In Eq. (5), PE is the positional encoding coefficient matrix. The output of the positional encoding layer is fed to the encoder layer. Each encoder layer consists of an MHA layer and a fully connected feed-forward network (FFN). A residual connection is maintained around each sub-layer to direct the flow of information and gradients: the residual link and the output of the sub-layer are added and normalized before being passed forward. In the MHA, eight parallel self-attention layers (heads, h = 8) are concatenated, as shown in Fig. 3. The numbers of heads and encoder layers were chosen based on experimental outcomes, as explained in more detail in Section 4 (Results and discussion). The encoder's MHA block generates attention vectors, drawing information at different positions from different representation subspaces. The single-head attention function in the self-attention sub-layer maps queries (q), keys (k) and values (v) to an output, computed as a weighted sum of the values. The weight (attention) assigned to each value is determined by a compatibility function of the query with the corresponding key. To compute the attention function for all positions simultaneously in matrix form, the queries, keys and values are stacked into matrices Q, K and V, respectively. These matrices are obtained from the input through learned weight matrices and help capture the connections between feature vectors in a sequence. Thus, the encoder layer (especially the MHA layer) establishes attention between feature vectors at different time instants. In this work, sequential data (multivariate time-series data) are used to train and validate the proposed model. The attention is calculated as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V \tag{6}$$

where $d_k$ is the dimension of the key vector. For MHA, the output matrix is calculated as:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O} \tag{7}$$

$$\mathrm{head}_i = \mathrm{Attention}\left(Q W_i^{Q},\; K W_i^{K},\; V W_i^{V}\right) \tag{8}$$

where $W_i^{Q} \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^{K} \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^{V} \in \mathbb{R}^{d_{model} \times d_v}$ and $W^{O} \in \mathbb{R}^{h d_v \times d_{model}}$ are parameter matrices, and $d_v$ is the dimension of the value vector. The encoder layer produces an output of dimension $d_{model} = 512$. In the attention layer of each encoder layer, the keys, values and queries come from the previous encoder layer. Thus, each position in an encoder layer can attend to all positions in the previous layer, and the model learns dependencies between input time steps. The MHA mechanism is explored in depth in the following paragraphs.
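Eq. (6) is short enough to write directly. The NumPy sketch below (function names are ours) returns both the output and the attention weights, so one can check that each row of the weights is a probability distribution over time steps.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Eq. (6): softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v).
    Returns the (n_q, d_v) output and the (n_q, n_k) attention weights.
    """
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    return weights @ V, weights

# Toy check: a query aligned with the first key puts almost all weight on it.
K = np.array([[10.0, 0.0], [0.0, 10.0]])
V = np.array([[1.0, 0.0], [0.0, 1.0]])
Q = np.array([[10.0, 0.0]])
out, w = scaled_dot_product_attention(Q, K, V)
print(w.round(4), out.round(4))  # [[1. 0.]] [[1. 0.]]
```

The division by $\sqrt{d_k}$ keeps the dot products from growing with the key dimension, which would otherwise push the softmax towards one-hot weights.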

To explain the MHA mechanism, an input sample is shown in Table 2. The input sample has dimension N × L, where N is the number of features (4) and L is the sequence length. In this work, sequence lengths (L) of 5, 15 and 25 are used. Matrices Q, K and V are used to compute attention, which represents the relationship between feature vectors at different time instants of an input sequence (here, the number of time instants equals the input sequence length). The MHA mechanism is explained below.

Table 2
An illustrative example with an input sequence of length 15.

| Feature | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 | t10 | t11 | t12 | t13 | t14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Speed (m/s) | 4.9 | 4.9 | 4.9 | 4.9 | 5 | 5.1 | 5.1 | 5.2 | 5.2 | 5.2 | 5.1 | 5.2 | 5.2 | 5.1 | 5.1 |
| X (m) | −9.4 | −9.3 | −9.3 | −9.2 | −9.2 | −9.1 | −9 | −8.9 | −8.8 | −8.7 | −8.6 | −8.5 | −8.4 | −8.3 | −8.1 |
| Y (m) | 1.4 | 1.3 | 1.2 | 1.1 | 1 | 0.8 | 0.7 | 0.5 | 0.3 | 0.1 | −0.1 | −0.3 | −0.4 | −0.5 | −0.7 |
| Heading (rad) | −1.3 | −1.3 | −1.3 | −1.2 | −1.2 | −1.2 | −1.2 | −1.2 | −1.2 | −1.1 | −1.1 | −1.1 | −1.1 | −1 | −1 |

First, the original input sequence is fed to the CNN to expand the feature space. The output of the CNN ($O_{cnn}$) has dimension 512 × 15. Then, to encode temporal information, $O_{cnn}$ is transposed and passed through the positional encoding layer. This output ($O_{pos}$) is then taken as the input ($I_{Encoder}$) to the MHA layer of the encoder. The MHA mechanism is applied to $I_{Encoder}$ in the following steps.

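Step 1: Linear transformations are applied to the input ($I_{Encoder}$) using three weight matrices $W^{Q} \in \mathbb{R}^{d_{model} \times d_{model}}$, $W^{K} \in \mathbb{R}^{d_{model} \times d_{model}}$ and $W^{V} \in \mathbb{R}^{d_{model} \times d_{model}}$ to compute matrices Q, K and V, respectively, where $d_{model}$ is the model dimension ($d_{model} = 512$).

$$[Q]_{15 \times 512} = [I_{Encoder}]_{15 \times 512} \cdot [W^{Q}]_{512 \times 512} \tag{9a}$$

$$[K]_{15 \times 512} = [I_{Encoder}]_{15 \times 512} \cdot [W^{K}]_{512 \times 512} \tag{9b}$$

$$[V]_{15 \times 512} = [I_{Encoder}]_{15 \times 512} \cdot [W^{V}]_{512 \times 512} \tag{9c}$$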

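Step 2: For each head i, the Q, K and V matrices obtained in Step 1 are passed through three linear transformations to obtain $Q_i$, $K_i$ and $V_i$ using learnable weight matrices $W_i^{Q} \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^{K} \in \mathbb{R}^{d_{model} \times d_k}$ and $W_i^{V} \in \mathbb{R}^{d_{model} \times d_v}$, respectively, where $d_k$ is the dimension of the key vector and $d_v$ is the dimension of the value vector. The proposed model uses 8 heads (h = 8); thus, $d_k = d_v = d_{model}/h = 64$.

$$[Q_i]_{15 \times 64} = [Q]_{15 \times 512} \cdot [W_i^{Q}]_{512 \times 64} \tag{10a}$$

$$[K_i]_{15 \times 64} = [K]_{15 \times 512} \cdot [W_i^{K}]_{512 \times 64} \tag{10b}$$

$$[V_i]_{15 \times 64} = [V]_{15 \times 512} \cdot [W_i^{V}]_{512 \times 64} \tag{10c}$$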

Step 3: The attention weights (scores) for the value vectors of the ith head are calculated as:

$$[W_i^{A}]_{15 \times 15} = \mathrm{softmax}\!\left(\frac{[Q_i]_{15 \times 64}\, [K_i^{T}]_{64 \times 15}}{\sqrt{d_k}}\right) \tag{11}$$

The dimension of this weight matrix is 15 × 15. Its element $W^{A}_{i,fg}$ represents the attention between the feature vector at time instant f and that at time instant g. Thus, this weight matrix represents the weighted relationship between all time instants (here, the sequence length is 15).

Step 4: The output of $\mathrm{head}_i$ is calculated as:

$$[\mathrm{head}_i]_{15 \times 64} = [W_i^{A}]_{15 \times 15} \cdot [V_i]_{15 \times 64} \tag{12}$$

Matrix $V_i$ is a linear transformation of the input sequence $I_{Encoder}$, and the attention weight matrix $W_i^{A}$ represents the weighted relationship between the features at different time instants. Thus, these matrices play the same role as an RNN, but without its added computation time, because the whole sequence is processed at once.

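Step 5: The outputs of all 8 heads are concatenated as:

$$[MHA_{out}]_{15 \times 512} = \mathrm{concat}\left([\mathrm{head}_1]_{15 \times 64},\; [\mathrm{head}_2]_{15 \times 64},\; \ldots,\; [\mathrm{head}_8]_{15 \times 64}\right) \tag{13}$$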

Step 6: The final output of the MHA layer is calculated by a linear transformation of the concatenated output of all heads, using the weight matrix $W^{O} \in \mathbb{R}^{h d_v \times d_{model}}$:

$$[\mathrm{MultiHead}(Q, K, V)]_{15 \times 512} = [MHA_{out}]_{15 \times 512} \cdot [W^{O}]_{(8 \cdot 64) \times 512} \tag{14}$$

Thus, the input of the MHA layer ($I_{Encoder}$) and its output ($\mathrm{MultiHead}(Q, K, V)$) have the same dimensionality (15 × 512).
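The six steps can be traced with random matrices to confirm the stated shapes. In this NumPy sketch the weights are random stand-ins rather than trained parameters, L = 15 matches Table 2, and all names are ours; the two successive projections of Steps 1 and 2 are kept separate to mirror the text.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d_model, h = 15, 512, 8
d_k = d_v = d_model // h          # 64, as in Step 2

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)     # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def rand_weight(n_in, n_out):
    # Random stand-in for a learned matrix (untrained; for shapes only).
    return rng.normal(scale=n_in ** -0.5, size=(n_in, n_out))

I_enc = rng.normal(size=(L, d_model))         # I_Encoder, i.e. O_pos

# Step 1, Eqs. (9a)-(9c): full-width projections, each 15 x 512
Q = I_enc @ rand_weight(d_model, d_model)
K = I_enc @ rand_weight(d_model, d_model)
V = I_enc @ rand_weight(d_model, d_model)

heads, attn_maps = [], []
for i in range(h):
    # Step 2, Eqs. (10a)-(10c): per-head projections, each 15 x 64
    Q_i = Q @ rand_weight(d_model, d_k)
    K_i = K @ rand_weight(d_model, d_k)
    V_i = V @ rand_weight(d_model, d_v)
    # Step 3, Eq. (11): 15 x 15 weights between time instants f and g
    W_A = softmax(Q_i @ K_i.T / np.sqrt(d_k))
    # Step 4, Eq. (12): weighted sum of the value vectors
    heads.append(W_A @ V_i)
    attn_maps.append(W_A)

MHA_out = np.concatenate(heads, axis=-1)              # Step 5, Eq. (13): 15 x 512
multi_head = MHA_out @ rand_weight(h * d_v, d_model)  # Step 6, Eq. (14): 15 x 512
print(multi_head.shape, attn_maps[0].shape)  # (15, 512) (15, 15)
```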

The output of the MHA layer in the first encoder layer, $O_{MHA1}$ (as defined in Eq. (7)), is processed through Add & Normalization (Norm). The processed output of the first encoder layer ($O_{\mathrm{add\&norm}1}$) is calculated as:

$$O_{\mathrm{add\&norm}1} = \mathrm{Norm}\left(O_{MHA1} + O_{pos}\right) \tag{15}$$

The output after normalization has dimension L × 512. This output ($O_{\mathrm{add\&norm}1}$) is fed to the FFN, in which the same linear transformations are applied at every position. The output of the FFN in the first encoder layer ($O_{FFN1}$) is calculated as:

$$O_{FFN1} = \sigma\left(O_{\mathrm{add\&norm}1} W_1 + B_1\right) W_2 + B_2 \tag{16}$$

where the input ($O_{\mathrm{add\&norm}1}$) and the output ($O_{FFN1}$) have the same dimension, L × 512. The output of the FFN in the first encoder layer ($O_{FFN1}$) is processed through Add & Normalization. The processed output of the first encoder layer ($O_{Encoder1}$) is given as:

$$O_{Encoder1} = \mathrm{Norm}\left(O_{FFN1} + O_{\mathrm{add\&norm}1}\right) \tag{17}$$

The output of the first encoder layer ($O_{Encoder1}$) is subsequently fed to the second encoder layer. The outputs of the second encoder layer ($O_{Encoder2}$) and the third encoder layer ($O_{Encoder3}$) are calculated in the same way as for the first layer, following Eqs. (6)–(17). The output $O_{Encoder3}$ is flattened and fed to the first feed-forward (FF) layer to generate the output ($O_{FF}$) as per Eq. (16). Then $O_{FF}$ is passed through a softmax layer to generate the final output (O), which denotes the class (east/north/south) to which the input belongs.
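Eqs. (15)–(17), the stack of three encoder layers and the flatten, feed-forward and softmax head can be sketched as below. Several details are our assumptions rather than the paper's: Norm is taken to be layer normalization without learned gain and bias, σ in Eq. (16) is taken to be ReLU, the FFN hidden width `d_ff` is set to 512, a single dense layer maps the flattened features to the E = 3 classes, and a hypothetical uniform-averaging function `uniform_mha` stands in for the MHA layer to keep the example short. All weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d_model, d_ff, E = 15, 512, 512, 3   # d_ff is an assumed hidden width

def layer_norm(x, eps=1e-6):
    # "Norm": normalize each time step's feature vector (no learned gain/bias).
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def ffn(x, W1, B1, W2, B2):
    # Eq. (16) with sigma = ReLU, applied identically at every time step.
    return np.maximum(x @ W1 + B1, 0.0) @ W2 + B2

def encoder_layer(I_enc, mha, ffn_params):
    o_add_norm = layer_norm(mha(I_enc) + I_enc)                   # Eq. (15)
    return layer_norm(ffn(o_add_norm, *ffn_params) + o_add_norm)  # Eqs. (16)-(17)

def uniform_mha(x):
    # Hypothetical stand-in for MHA: every time step attends equally to all steps.
    return np.repeat(x.mean(axis=0, keepdims=True), x.shape[0], axis=0)

def random_ffn_params():
    return (rng.normal(scale=d_model ** -0.5, size=(d_model, d_ff)), np.zeros(d_ff),
            rng.normal(scale=d_ff ** -0.5, size=(d_ff, d_model)), np.zeros(d_model))

x = rng.normal(size=(L, d_model))        # O_pos for one input window
for _ in range(3):                       # three stacked encoder layers
    x = encoder_layer(x, uniform_mha, random_ffn_params())

# Head: flatten to L * 512, one dense layer to E classes, then softmax.
W_out = rng.normal(scale=(L * d_model) ** -0.5, size=(L * d_model, E))
z = x.reshape(-1) @ W_out
probs = np.exp(z - z.max()) / np.exp(z - z.max()).sum()
predicted_class = int(np.argmax(probs))  # index of the most probable class
print(x.shape, probs.round(3), predicted_class)
```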

The detailed training and testing procedure of the proposed algorithm is outlined in Table 3. In this table, B is the batch size, L is the input sequence length, the number of features in the raw input sequence is 4, and the model dimension ($d_{model}$) is 512.

Table 3
Training and testing process.

(1) TRAINING PROCESS
INPUT: Labeled training data $\tilde{S} = \{S^{(1)}, S^{(2)}, \ldots, S^{(E)}\}$, where E is the total number of classes. Dimension: B, L, 4
Step 1: $F_1 \xleftarrow{\mathrm{CNN}_1} \tilde{S}$. The raw training data ($\tilde{S}$) are sent into the first CNN branch, which contains 254 kernels of size 2, to extract feature vectors as per Eq. (4). Dimension: B, L, 254
Step 2: $F_2 \xleftarrow{\mathrm{CNN}_2} \tilde{S}$. The raw training data ($\tilde{S}$) are sent into the second CNN branch, which contains 254 kernels of size 3, to extract feature vectors as per Eq. (4). Dimension: B, L, 254
Step 3: $O_{cnn} \xleftarrow{\mathrm{Concat}} \{\tilde{S}, F_1, F_2\}$. The output features of the two parallel branches are concatenated with the raw input features to generate the output ($O_{cnn}$). Dimension: B, L, 512
Step 4: $O_{pos} \leftarrow O_{cnn}$. The output ($O_{cnn}$) is passed through the positional encoding layer, as per Eq. (5); $I_{Encoder1} = O_{pos}$ is the input to the encoder. Dimension: B, L, 512
Step 5: for i = 1 to 3 (number of encoder layers):
  $O_{MHAi} \xleftarrow{\mathrm{MHA}} I_{Encoderi}$. The input ($I_{Encoderi}$) is passed through the MHA layer, as per Eq. (7). Dimension: B, L, 512
  $O_{\mathrm{add\&norm}i} \xleftarrow{\mathrm{add\ \&\ norm}} \{I_{Encoderi}, O_{MHAi}\}$. $I_{Encoderi}$ and $O_{MHAi}$ are added and then normalized in the "Add & Normalization" layer, as per Eq. (15). Dimension: B, L, 512
  $O_{FFNi} \xleftarrow{\mathrm{FFN}} O_{\mathrm{add\&norm}i}$. The output ($O_{\mathrm{add\&norm}i}$) is fed to the FFN, where the linear transformation is carried out as per Eq. (16). Dimension: B, L, 512
  $O_{Encoderi} \xleftarrow{\mathrm{add\ \&\ norm}} \{O_{FFNi}, O_{\mathrm{add\&norm}i}\}$. $O_{FFNi}$ and $O_{\mathrm{add\&norm}i}$ are added and then normalized in the "Add & Normalization" layer, as per Eq. (17). Dimension: B, L, 512
  $I_{Encoder(i+1)} = O_{Encoderi}$. The output $O_{Encoderi}$ is passed to the next encoder layer as its input. Dimension: B, L, 512
end
Step 6: $O_{Flat} \xleftarrow{\mathrm{Flatten}} O_{Encoder3}$. The output of the last encoder layer ($O_{Encoder3}$) is flattened. Dimension: B, L × 512
Step 7: $O_{FFN} \xleftarrow{\mathrm{FFN}} O_{Flat}$. The flattened output ($O_{Flat}$) is fed to the FFN, as per Eq. (16). Dimension: B, E
Step 8: $O_{softmax} \leftarrow O_{FFN}$. The output ($O_{FFN}$) is passed through the softmax layer. The cross-entropy loss is calculated and back-propagated to update the learnable parameters of the model. Dimension: B, E

(2) TESTING PROCESS
INPUT: $\hat{x}$ is an input sample. Dimension: B, L, 4
Step 1: $\hat{c} \xleftarrow{\mathrm{CTN}} \hat{x}$. The sample $\hat{x}$ is fed to the proposed CTN; $\hat{c}$ holds the probabilities of $\hat{x}$ belonging to going straight, left turn and right turn.
Step 2: $c = \arg\max_{j = 1, \ldots, E} \hat{c}_j$, $j = 1, 2, 3$; c is the driver behavior (going straight, left turn or right turn) to which $\hat{x}$ belongs.

Using the kervolution technique, the proposed CTN achieves a further performance improvement. The motivation derives from the effective use of kervolution to improve CNN models in their respective fields of application. Because convolutional layers are linear, they cannot express non-linear behavior properly, and the non-linearity of activation layers such as the rectified linear unit (ReLU) is only point-wise. It has been shown that higher-order non-linear feature maps can improve the discrimination of subsequent linear classifiers [50,58–60]. Prior studies report that CNNs can perform better when the kernel approach is used to generalize convolution to non-linear operations [61–66]. The kervolution is defined as follows:

$$Z_i^{p+1} = \left\langle \varphi(r^{p}),\; \varphi(a_i^{p}) \right\rangle \tag{18}$$

where $Z_i^{p+1}$ is the ith feature map of the output layer (the (p+1)th layer), $r^{p}$ is the feature map of the input layer (the pth layer), $a_i^{p}$ is the ith kernel of the input layer, and φ(·) denotes the high-dimensional mapping applied to $r^{p}$ and $a_i^{p}$. Explicitly mapping from low to high dimensions would significantly increase the computational burden. However, the kernel trick avoids this burden by evaluating the inner product directly, without computing φ explicitly:

$$\left\langle \varphi(r^{p}),\; \varphi(a_i^{p}) \right\rangle = k(r^{p}, a_i^{p}) \tag{19}$$

where $k(r^{p}, a_i^{p}): \mathbb{R}^{n} \times \mathbb{R}^{n} \rightarrow \mathbb{R}$ is the kernel function of $r^{p}$ and $a_i^{p}$. The Gaussian kernel function is used in this work to implement the kervolution procedure and is defined as:

$$k(r, a) = \exp\left(-\gamma \lVert r - a \rVert^{2}\right) \tag{20}$$

where r is the input vector, a is the convolution kernel, and γ is the gamma value.
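A minimal sketch of Eqs. (18)–(20) for a 1-D sequence follows: inside each sliding window, the inner product of ordinary convolution is replaced by the Gaussian kernel of Eq. (20). Our assumptions: "valid" windows (no padding), no bias term and a user-chosen γ; the section does not state these choices, nor exactly which convolution layers of the CTN are replaced. The function name is ours.

```python
import numpy as np

def gaussian_kervolution1d(x, kernels, gamma):
    """Gaussian kervolution over time (Eqs. (18)-(20)), 'valid' windows, no bias.

    x: (L, C) input sequence; kernels: (n_out, K, C).
    Output[t, i] = exp(-gamma * ||r_t - a_i||^2), where r_t = x[t:t+K] is the
    receptive field at step t and a_i is the i-th kernel.
    Returns an array of shape (L - K + 1, n_out).
    """
    n_out, K, _ = kernels.shape
    n_steps = x.shape[0] - K + 1
    out = np.empty((n_steps, n_out))
    for t in range(n_steps):
        r = x[t:t + K]                                              # (K, C)
        sq_dist = ((r[None, :, :] - kernels) ** 2).sum(axis=(1, 2))  # (n_out,)
        out[t] = np.exp(-gamma * sq_dist)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(15, 4))               # one L x 4 window
kernels = rng.normal(size=(8, 3, 4))       # 8 kernels of width 3
kernels[0] = x[2:5]                        # make kernel 0 match the window at t = 2
Z = gaussian_kervolution1d(x, kernels, gamma=0.5)
print(Z.shape, Z[2, 0])  # (13, 8) 1.0
```

Each response lies in (0, 1] and equals 1 only when the window matches the kernel exactly. Because $\lVert r - a \rVert^2 = \lVert r \rVert^2 + \lVert a \rVert^2 - 2\langle r, a \rangle$, the same values could also be computed from an ordinary convolution plus two squared-norm terms, which would avoid the explicit loop.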

4. Results and discussion

The proposed model's performance is evaluated on a real-world dataset. A total of 7394 vehicle tracks are divided into 5914 tracks for training (with normalization) and 1480 tracks for testing. In [31], the data were acquired by a vehicle equipped with a sensing system similar to that of an autonomous vehicle. The center of the parking space occupied by this vehicle is used as the reference point for calculating the relative positions of passing vehicles, as shown in Fig. 4(a). The heading of a passing vehicle is calculated by taking the southward direction of travel (north to south) as the reference heading (zero radians). This common frame is used to obtain the vehicle trajectories for every entrance side. The heading and speed profiles against traveled distance are shown in Fig. 4(b) and Fig. 4(c), respectively. These tracks can be labeled in two ways: by track destination (east, north or south) and by turn (left-turn, right-turn or go-straight). The vehicle tracks and their labels are shown in Table 4. The proposed CTN and KCTN are tested under both labeling conditions. The results of both proposed networks are compared with those of the LSTM-based model [31], the Bi-LSTM model [26], a vanilla TF model (6 encoder layers), a GRU-based model (built by replacing the LSTM cells in [31] with GRU cells), and a CNN model (the first portion of the proposed CTN). All models, including those of [26] and [31], are trained and evaluated under identical conditions, using the same training and testing samples. To represent the vehicle state at a specific time instant, all models use four features: two spatial coordinates, heading direction and speed.
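The following minimal NumPy sketch shows one way to turn a track of these four features into fixed-length model inputs of shape [B, L, 4]. Two aspects are assumptions, because the text does not specify them: windows are taken with stride 1, and normalization is a per-feature z-score using training-set statistics only. The function names are hypothetical.

```python
import numpy as np

def make_windows(track, seq_len):
    """track: (T, 4) array of [x, y, heading, speed] per time step.
    Returns (T - seq_len + 1, seq_len, 4) overlapping windows with stride 1."""
    t = track.shape[0]
    if t < seq_len:
        return np.empty((0, seq_len, track.shape[1]))
    return np.stack([track[i:i + seq_len] for i in range(t - seq_len + 1)])

def fit_normalizer(train_windows):
    # Per-feature mean and standard deviation from training windows only
    flat = train_windows.reshape(-1, train_windows.shape[-1])
    mean, std = flat.mean(axis=0), flat.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return mean, std

def normalize(windows, mean, std):
    # Apply the same training statistics to training and test windows
    return (windows - mean) / std

track = np.arange(40, dtype=float).reshape(10, 4)  # toy 10-step track
windows = make_windows(track, seq_len=5)
mean, std = fit_normalizer(windows)
normed = normalize(windows, mean, std)
```

Setting `seq_len` to 5, 15 or 25 gives the three input lengths evaluated in this section.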

O. Sharma, N.C. Sahoo and N.B. Puhan ISA Transactions 133 (2023) 13–28

Fig. 4. (a) Average tracks of vehicles following left hand driving. (b) Heading profiles. (c) Speed profiles. Profiles of all tracks grouped into the six classes in

‘origin–destination’ pairs.

Table 4
Track labeling based on destinations and turns.

Origin–destination    Label based on destination    Label based on turn
East–north            North                         Right-turn
East–south            South                         Left-turn
North–east            East                          Left-turn
North–south           South                         Go-straight
South–east            East                          Right-turn
South–north           North                         Go-straight

4.1. Scenario 1: Track classification based on vehicle destination in a common frame of reference

Track classification based on destination is performed using past tracks. The average vehicle tracks from each origin side of the roundabout are shown in Fig. 4(a), and the heading and speed profiles against traveled distance are shown in Fig. 4(b) and Fig. 4(c), respectively. Based on experimental results, three encoder layers and eight parallel self-attention heads are used in the MHA modules of the proposed model. Fig. 5 shows results for different numbers of encoder layers and heads with an input sequence length of 15. The x-axis represents the distance traveled from the intersection entry, and a negative traveled distance indicates that the vehicle has not yet entered the intersection. The results show that accuracy saturates beyond three encoder layers and eight heads. Therefore, taking computational time into account, we chose three encoder layers and eight heads for the proposed model.
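The eight parallel heads referred to above can be illustrated with a minimal NumPy sketch of scaled dot-product multi-head self-attention on a single sequence. The model width d_model = 512 is an assumption inferred from the [B, L, 512] dimension in the algorithm listing. The sketch omits masking, positional encoding, residual connections and layer normalization, and uses random weights, so it shows only the attention computation, not the proposed encoder.

```python
import numpy as np

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (L, d_model); w_q, w_k, w_v, w_o: (d_model, d_model) projections.
    Returns the (L, d_model) output and the (H, L, L) attention weights."""
    seq_len, d_model = x.shape
    if d_model % num_heads:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads

    def split_heads(t):
        # (L, d_model) -> (H, L, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = (split_heads(x @ w) for w in (w_q, w_k, w_v))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (H, L, L)
    scores -= scores.max(axis=-1, keepdims=True)           # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over keys
    heads = weights @ v                                    # (H, L, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights

rng = np.random.default_rng(0)
L, D, H = 15, 512, 8  # sequence length 15, assumed width 512, eight heads
x = rng.standard_normal((L, D))
w_q, w_k, w_v, w_o = (0.02 * rng.standard_normal((D, D)) for _ in range(4))
out, attn = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads=H)
```

With H = 8 and d_model = 512, each head attends within a 64-dimensional subspace. The projection matrices have the same size regardless of H.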

4.1.1. Performance of the proposed model in comparison to current state-of-the-art models

Three input sequence lengths are selected for evaluation: 5, 15 and 25 time steps, corresponding to about 0.2 s, 0.6 s and 1 s of data. Fig. 6 shows the performance of the proposed networks (CTN and KCTN) in predicting the vehicle's destination. The x-axis represents the distance traveled from the intersection entrance. The vertical red dashed line indicates the conflict point (12 m, 14 m and 22 m from the east, north and south entrance points, respectively). The lead distance (the vehicle's distance from the conflict point) is the conflict-point distance minus the traveled distance at which the model first reaches 99% accuracy. The lead time (the time to reach the conflict point) is then the lead distance divided by the vehicle's average speed. The lead distance and lead time for each vehicle origin are presented in Table 5.
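The lead-distance and lead-time calculation just described can be written as a short function. It is a sketch under two assumptions: accuracy is sampled at discrete traveled distances, and the first distance at or beyond the 99% threshold is used. The accuracy curve below is made up for illustration. Only the 12 m conflict distance and the 4.8 m/s speed come from the eastern-origin case in the text.

```python
import numpy as np

def lead_distance_and_time(traveled, accuracy, conflict_distance, avg_speed, threshold=0.99):
    """traveled: distances from the entry (m), ascending; accuracy: fraction correct.
    Returns (lead_distance_m, lead_time_s), or None if the threshold is never
    reached before the conflict point."""
    traveled = np.asarray(traveled, dtype=float)
    accuracy = np.asarray(accuracy, dtype=float)
    ok = np.nonzero((accuracy >= threshold) & (traveled <= conflict_distance))[0]
    if ok.size == 0:
        return None
    lead_distance = conflict_distance - traveled[ok[0]]  # first distance reaching the threshold
    return lead_distance, lead_distance / avg_speed

# Hypothetical accuracy curve that first reaches 99% at 1 m traveled distance
result = lead_distance_and_time([-2, -1, 0, 1, 2], [0.80, 0.90, 0.95, 0.991, 0.995],
                                conflict_distance=12.0, avg_speed=4.8)
```

This returns a lead distance of 11 m and a lead time of 11 / 4.8 ≈ 2.29 s, which matches the eastern-origin KCTN entry in Table 5. The same arithmetic gives 12 / 7.9 ≈ 1.52 s and 6 / 4.5 ≈ 1.33 s for the northern and southern origins.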


Fig. 5. Accuracy vs distance traveled relative to the start of the intersection. (a) An experimental study varying the number of parallel self-attention heads (H) in the MHA module. (b) An experimental study varying the number of encoder layers (L) in the proposed model architecture.

Vehicles entering the intersection from the east side can go towards the north or the south. The proposed KCTN achieves 99% accuracy at 1 m traveled distance for all three sequence lengths. Since the conflict point for eastern-origin vehicles lies at 12 m traveled distance, KCTN offers a lead distance of 11 m. The lead time is calculated from the lead distance and the average speed (4.8 m/s) at the prediction point, so KCTN offers a lead time of 2.29 s for eastern-origin vehicles. Vehicles entering from the north can travel either east or south. The proposed KCTN achieves 99% accuracy at 3, 2 and 2 m traveled distance for input sequence lengths of 5, 15 and 25, respectively. The conflict point lies at 14 m; therefore, KCTN offers lead distances of 11, 12 and 12 m for sequence lengths of 5, 15 and 25, respectively. Based on the lead distance and the vehicle's average speed (7.9 m/s) at the prediction point, the maximum lead time is 1.52 s, obtained with input sequence lengths of 15 and 25. Vehicles coming from the south can travel towards the east or the north. The proposed KCTN achieves 99% accuracy at a traveled distance of 16 m for sequence lengths of 15 and 25. The conflict point for southern-origin vehicles is at 22 m, so the proposed model offers a lead distance of 6 m. At an average speed of 4.5 m/s, this gives a lead time of 1.33 s. These findings are also shown graphically in Figs. 7 and 8.

The results show that both proposed networks (CTN and KCTN) outperform the current state-of-the-art models, and that KCTN performs better than CTN. KCTN reaches 99% accuracy earlier than the other models for all three sequence lengths (5, 15 and 25), and its best results are obtained with a sequence length of 15. It offers maximum lead times of 2.29 s, 1.52 s and 1.33 s for eastern, northern and southern origin vehicles, respectively.

Fig. 6. Accuracy vs distance traveled relative to the start of the intersection for scenario 1. (a) Input sequence length 5. (b) Input sequence length 15. (c) Input sequence length 25.

4.2. Scenario 2: Track classification based on vehicle turning behavior in a common frame of reference

In this scenario, the proposed KCTN is used for turn-based (turning behavior) classification. All vehicle tracks are labeled as short left-turn, long right-turn or go-straight, as presented in


Table 5
Lead distance and lead time comparison among various models for Scenario 1.

Lead distance (m)
Seq. length   Origin            LSTM [31]   GRU    Bi-LSTM [26]   CNN    TF     CTN    KCTN
5             Eastern origin    9           9      9              9      10     11     11
5             Northern origin   9           8      9              8      9      10     11
5             Southern origin   3           2      4              3      5      5      5
15            Eastern origin    9           9      9              9      10     11     11
15            Northern origin   10          10     10             9      11     11     12
15            Southern origin   4           3      4              2      5      6      6
25            Eastern origin    9           8      9              8      10     11     11
25            Northern origin   9           9      9              8      10     11     12
25            Southern origin   3           3      3              3      5      6      6

Lead time (s)
Seq. length   Origin            LSTM [31]   GRU    Bi-LSTM [26]   CNN    TF     CTN    KCTN
5             Eastern origin    1.88        1.88   1.88           1.88   2.08   2.29   2.29
5             Northern origin   1.14        1.01   1.14           1.01   1.14   1.27   1.39
5             Southern origin   0.67        0.44   0.89           0.67   1.11   1.11   1.11
15            Eastern origin    1.88        1.88   1.88           1.88   2.08   2.29   2.29
15            Northern origin   1.27        1.27   1.27           1.14   1.39   1.39   1.52
15            Southern origin   0.89        0.67   0.89           0.44   1.11   1.33   1.33
25            Eastern origin    1.88        1.67   1.88           1.67   2.08   2.29   2.29
25            Northern origin   1.14        1.14   1.14           1.01   1.27   1.39   1.52
25            Southern origin   0.67        0.67   0.67           0.67   1.11   1.33   1.33

Fig. 7. Lead distance comparison among various models for Scenario 1.

Table 4. The shapes of the vehicle trajectories and the heading profiles are shown in Fig. 4. The experimental study in Section 4.1 found an input sequence length of 15 to be optimal for the proposed KCTN, considering both prediction accuracy and computational complexity. The comparative performance of the proposed model against traveled distance is therefore presented in Fig. 9 for a sequence length of 15. Table 6 lists the lead times, calculated at the point where the proposed KCTN reaches 99% accuracy. A vehicle coming from the east side can make a short left-turn (east–south) or a long right-turn (east–north), and the proposed KCTN offers a lead time of 2.08 s with 99% accuracy. Because the east–south and north–south tracks have similar shapes between 11 m and 15 m of traveled distance, forecast accuracy drops at 11 m. For vehicles coming from the north, the proposed KCTN offers a lead time of 1.39 s. Vehicles coming from the south can take a long right-turn (south–east) or go straight ahead (south–north). For these vehicles, the proposed KCTN offers a lead time of 1.11 s. These findings are also shown graphically in Fig. 10.

Overall, the proposed KCTN reaches high accuracy earlier than the other models and offers a larger lead-time window. For vehicles coming from the northern side, the proposed CTN and the other models start to lose accuracy at 11 m traveled distance. CTN degrades least among them, with an accuracy drop of only 1%. The proposed KCTN handles this situation and shows no loss of accuracy over this range (11 m to 15 m). To overcome this issue in CTN, all vehicle tracks are re-framed so that tracks with the same turning behavior look similar, as explained in Section 4.3.

4.3. Scenario 3: Track classification based on vehicle turning behavior in entrance-based frames of reference

To re-frame the vehicle tracks, three reference points (the vehicle entry points of the intersection) and three reference directions are used to calculate the relative X/Y position and the heading of the vehicle, respectively. For example, the relative X/Y position of


Fig. 8. Lead time comparison among various models for Scenario 1.

Fig. 9. Accuracy vs distance traveled for turning behavior based classification in scenario 2.

Table 6
Lead time (s) comparison in Scenario 2.

                  LSTM [31]   GRU    Bi-LSTM [26]   CNN    TF     CTN    KCTN
Eastern origin    1.67        1.88   1.88           1.67   2.08   2.08   2.08
Northern origin   0.89        1.14   1.14           1.14   1.27   1.27   1.39
Southern origin   0.44        0.89   0.89           0.89   1.11   1.11   1.11

a vehicle coming from the east side is calculated by taking the east entrance point as the reference point. Its heading is calculated by taking the westward direction of travel (east to west) as the reference direction. For a vehicle coming from the north, the north entry point is the reference point for the relative X/Y position, and the southward direction of travel (north to south) is the reference direction. The same approach is used for vehicles coming from the south. As a result, the east–south and north–east tracks are re-framed to look similar and represent left-turn tracks. Likewise, the east–north and south–east tracks look similar and represent long right-hand turns, and the north–south and south–north tracks look similar and represent go-straight. Thus, all vehicle tracks are labeled as short left-turn, long right-turn or go-straight. The vehicle tracks and heading profiles are shown in Fig. 11.
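The re-framing step amounts to a translation to the entrance point followed by a rotation onto the reference direction. The minimal NumPy sketch below illustrates this under stated assumptions. Angles are measured counter-clockwise from the common frame's x-axis, the reference direction becomes the new +x axis, and headings are wrapped to [-π, π]. The text does not state these axis conventions, and the function name and example values are hypothetical.

```python
import numpy as np

def to_entrance_frame(xy, heading, ref_point, ref_heading):
    """xy: (T, 2) positions in the common frame; heading: (T,) radians.
    ref_point: (2,) entrance point; ref_heading: reference direction (radians)
    that maps to zero heading. Returns positions in a frame whose +x axis points
    along ref_heading, and headings wrapped to [-pi, pi]."""
    c, s = np.cos(ref_heading), np.sin(ref_heading)
    rot = np.array([[c, s], [-s, c]])  # rotation by -ref_heading
    rel_xy = (np.asarray(xy, dtype=float) - np.asarray(ref_point, dtype=float)) @ rot.T
    rel_heading = np.asarray(heading, dtype=float) - ref_heading
    rel_heading = np.arctan2(np.sin(rel_heading), np.cos(rel_heading))  # wrap angle
    return rel_xy, rel_heading

# Hypothetical entrance at (10, 5) whose reference direction points along +y
xy = np.array([[10.0, 8.0]])
heading = np.array([np.pi / 2, np.pi, 0.0])
rel_xy, rel_heading = to_entrance_frame(xy, heading, ref_point=(10.0, 5.0), ref_heading=np.pi / 2)
```

A point 3 m ahead of the entrance along the reference direction maps to (3, 0), and a heading equal to the reference direction maps to zero. Applying each origin's own reference point and direction is what makes, for example, east–south and north–east tracks overlap.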

Fig. 12 shows the comparative performance of the proposed KCTN in predicting vehicle behavior, and Table 7 compares the lead times for each vehicle origin. The proposed KCTN offers a lead time of 2.08 s for vehicles coming from the east side. For vehicles coming from the north side, KCTN reaches 99% accuracy and provides a lead time of 1.39 s. For vehicles coming from the south side, it offers a lead distance of 5 m and a lead time of 1.11 s. These findings are also shown graphically in Fig. 13. The proposed model and the RNN-based models are


Fig. 10. Lead time comparison among various models for Scenario 2 (input sequence length = 15).

Fig. 11. (a) Average tracks of vehicles. (b) Heading profiles of traveling vehicles.

Table 7
Lead time (s) comparison in Scenario 3.

                  LSTM [31]   GRU    Bi-LSTM [26]   CNN    TF     CTN    KCTN
Eastern origin    1.88        1.88   1.88           1.88   1.88   2.08   2.08
Northern origin   1.14        1.14   0.89           0.89   1.14   1.27   1.39
Southern origin   0.67        0.67   0.67           0.44   0.89   0.89   1.11

initially unable to achieve high prediction accuracy. Over the first part of the route, go-straight and right-turn tracks have similar shapes and directions of travel.

4.4. Remarks on overall performance

The above studies show that the proposed KCTN performs well in all three scenarios (Sections 4.1, 4.2 and 4.3). However, the authors prefer the first scenario on the basis of accuracy and real-life applicability. The third scenario can be dropped because it offers less lead time than the first. The second scenario may also be dropped because of the decrease in accuracy for vehicles coming from the east side at a traveled distance of 11 m, as shown in Fig. 9. Beyond accuracy and lead time, the second and third scenarios have a further drawback in real-life implementation if the vehicle origin (entrance side) is not known. At an unsignalized roundabout, the vehicle's origin side may not be recorded because there is no vehicle management system or vehicle communication.

To illustrate this, a target vehicle 'K' is shown in Fig. 14. To predict the behavior of vehicle 'K' in scenario 3, the model needs a sequence of feature vectors that can only be calculated if the entrance side of vehicle 'K' is known. Without it, the model cannot estimate the driver's turning behavior. In scenario 2, suppose the model predicts that vehicle 'K' will turn right. Both east–north and south–east tracks correspond to right-turn behavior, so after predicting a right turn the model


Fig. 12. Accuracy vs distance traveled for turning behavior based classification in scenario 3.

Fig. 13. Lead time comparison among various models for Scenario 3 (input sequence length = 15).

cannot estimate the destination without knowing the entrance side of vehicle 'K'.

Overall, this situation-assessment study shows the importance of selecting the output class labeling correctly and of having information on the vehicle entrance side during online operation. The authors therefore adopt destination-based labeling.

To further demonstrate the effectiveness of the proposed model, the statistical significance of the differences between the proposed and other models is investigated below. The approach used in this work predicts driver behavior and then consolidates the predictions over the whole dataset. For this purpose, McNemar's test (a within-subjects chi-squared test) is performed to compare the predictive accuracy of the various models [67–69]. McNemar's test is

Table 8
2×2 contingency table.

                    Model 2 correct   Model 2 wrong
Model 1 correct     N_A               N_B
Model 1 wrong       N_C               N_D

based on a 2×2 contingency table that compares the predictions of two models, as shown in Table 8, where

$N_A$ = number of instances where both models give correct predictions,

$N_B$ = number of instances where Model 1 gives correct predictions and Model 2 gives incorrect predictions,


Fig. 14. Example of a target vehicle at three-way roundabout.

Table 9
Behavior prediction accuracy and statistical analysis.

           Eastern origin                  Northern origin                 Southern origin
           (traveled distance 1 m,         (traveled distance 2 m,         (traveled distance 16 m,
           558 input samples)              559 input samples)              363 input samples)
Model      Acc. (%)   χ²    p-value        Acc. (%)   χ²     p-value       Acc. (%)   χ²    p-value
LSTM       92.1       40    ≃0             94.3       20.8   0.000005      97         6.1   0.0133
GRU        87.4       66    ≃0             92.5       34     ≃0            95         13    0.0003
Bi-LSTM    88.5       60    ≃0             90.5       45     ≃0            95.9       10    0.0015
CNN        89.9       52    ≃0             93.7       27     ≃0            92         24    0.000001
TF         96.6       15    0.0001         96.9       5.3    0.0218        97.8       2.3   0.1306
CTN        99.1       1.3   0.24           97.8       4.1    0.04          98.6       0.5   0.4795
KCTN       99.7       –     –              98.9       –      –             99.2       –     –

$N_C$ = number of instances where Model 1 gives incorrect predictions and Model 2 gives correct predictions, and

$N_D$ = number of instances where both models give incorrect predictions.

McNemar's test statistic (chi-squared) is computed as follows:

$$\chi^2 = \frac{\left(|N_B - N_C| - 1\right)^2}{N_B + N_C} \qquad (21)$$

In addition to the chi-squared value, the p-value is determined. It is the probability, under the null hypothesis, of obtaining a chi-squared value at least as large as the observed one. A high chi-squared value, or equivalently a low p-value, indicates a statistically significant difference in performance between the two models. The chi-squared distribution (also chi-square or $\chi^2$-distribution) is a special case of the gamma distribution and is one of the most widely used probability distributions in hypothesis testing. It does not approximate the p-value well when $N_B$ or $N_C$ is small (i.e., $N_B + N_C < 25$), and in that case the exact p-value is calculated using a binomial distribution. If the p-value is lower than the chosen significance level (the probability of rejecting the null hypothesis when it is true), the null hypothesis that the two models perform equally is rejected. A significance level of 5% is used here. The respective $\chi^2$ and p-values are listed in Table 9.
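The test procedure can be sketched in standard-library Python. The sketch builds the Table 8 counts from two models' predictions and applies Eq. (21). It computes the p-value from the chi-squared distribution with one degree of freedom, whose survival function is $\mathrm{erfc}(\sqrt{\chi^2/2})$. When $N_B + N_C < 25$, it switches to the exact two-sided binomial test. The function names and the toy counts are illustrative assumptions, not the authors' implementation.

```python
import math

def contingency(y_true, pred1, pred2):
    """Return (N_A, N_B, N_C, N_D) as defined for Table 8."""
    n_a = n_b = n_c = n_d = 0
    for y, p1, p2 in zip(y_true, pred1, pred2):
        ok1, ok2 = p1 == y, p2 == y
        if ok1 and ok2:
            n_a += 1
        elif ok1:
            n_b += 1
        elif ok2:
            n_c += 1
        else:
            n_d += 1
    return n_a, n_b, n_c, n_d

def chi2_sf_df1(x):
    # P(X >= x) for a chi-squared variable with one degree of freedom
    return math.erfc(math.sqrt(x / 2.0))

def mcnemar(n_b, n_c, exact_threshold=25):
    """Return (chi2 from Eq. (21), p-value)."""
    n = n_b + n_c
    if n == 0:
        return 0.0, 1.0
    chi2 = (abs(n_b - n_c) - 1) ** 2 / n
    if n < exact_threshold:
        # Exact two-sided binomial test with success probability 0.5
        k = min(n_b, n_c)
        tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
        p = min(1.0, 2.0 * tail)
    else:
        p = chi2_sf_df1(chi2)
    return chi2, p

chi2_small, p_small = mcnemar(10, 2)   # N_B + N_C = 12 < 25: exact binomial p-value
chi2_large, p_large = mcnemar(30, 10)  # N_B + N_C = 40: chi-squared p-value
```

As a sanity check, `chi2_sf_df1(0.5)` returns about 0.4795, which agrees with the CTN southern-origin entry in Table 9.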

As reported in Table 5, both proposed networks (CTN and KCTN) outperform the state-of-the-art models. KCTN reaches 99% accuracy earlier than all other models for every sequence length, and its best results for destination-based classification are obtained with a sequence length of 15.

Table 10
Computing time (ms) comparison between models.

Sequence length   LSTM [31]   GRU    Bi-LSTM [26]   CNN   TF    CTN   KCTN
5                 3.9         3      7.7            1.2   6.2   3.8   3.9
15                7.8         6.9    16.6           1.5   6.8   5.1   5.3
25                12.5        10.9   24.6           1.7   7.9   6.5   6.9

Accordingly, a contingency table is formulated for every model with respect to KCTN.

The chi-squared values represent statistical differences in the models' performances, and high values indicate significant differences. To determine statistical significance precisely, the p-values can also be compared with the chosen significance level ($\alpha = 0.05$). As shown in Table 9, all comparisons have p-values below 0.05 except those for CTN (eastern and southern origins) and TF (southern origin). For the remaining comparisons, the null hypothesis is rejected, which establishes statistically significant differences between KCTN and those models.

The proposed model is trained with a learning rate of 0.0001 and a batch size of 100. Table 10 lists the execution time for predicting driver intention for a single input sample. The input sequence length has very little effect on the computation time of the CNN, but the CNN's lead distance and lead time are unsatisfactory. Compared with the RNN-based models (LSTM, GRU and Bi-LSTM), the computation times of the TF-based models (TF, CTN and KCTN) are less affected by the input sequence length (5 to 25). Among the TF-based models, the generic TF is the least influenced by input sequence length, but its behavior prediction accuracy is lower than that of the proposed CTN and KCTN. Overall, the proposed models are less influenced by input sequence length than the RNN-based models. All measurements are made on a desktop computer with an Intel(R) Xeon(R) CPU E5-2643 V4.

This study demonstrates that the proposed TF-based model is effective for driver behavior modeling (DBM) for conflict resolution at unsignalized roundabouts, but it has some limitations. First, the model requires more data samples for training. Second, its robustness needs to be further evaluated and analyzed in a noisy environment, where accurate information on location, velocity and heading angle may not be available. Third, the target vehicle's behavior is predicted from its own past trajectory without considering the trajectories of surrounding vehicles, so the model is insensitive to the speed of surrounding objects. A new study involving the dynamics of surrounding objects will be undertaken in future work.

5. Conclusion

In this paper, a new CTN and its kernelized variant (KCTN) are proposed for predicting driver intent, to support ADAS in estimating the future positions of road users at an urban single-lane roundabout. The performance of the proposed KCTN is assessed on the basis of both the vehicle's destination and its turning behavior. The proposed network performs better in both cases, but destination-based classification is preferred over turning behavior-based classification. Experiments on a real-world dataset show that the proposed model is competitive with current state-of-the-art models and requires only a short observation window (0.6 s).

In the destination-based behavior prediction scenario, the proposed KCTN outperforms the other models and provides lead distances of 11 m, 12 m and 6 m for eastern, northern and southern origin vehicles, respectively. Based on average vehicle speed, the proposed model provides lead times of 2.29 s, 1.52 s and 1.33 s for eastern, northern and southern origin vehicles, respectively, which can help avoid potential conflicts. In the turning-based behavior prediction scenario, the proposed KCTN provides lead times of 2.08 s, 1.39 s and 1.11 s for eastern, northern and southern origin vehicles, respectively, which are equal to or higher than those of the other models.

This research shows that TF performs well on time-series tasks such as prediction and classification, and the performance results show that it is also efficient in terms of computation time. Because it processes the entire input sequence at once, rather than step by step as RNN-based models do, it can avoid the vanishing gradient problem for long sequential inputs. Generating the future trajectories of all adjacent traffic participants, which are then used to estimate collision risk in the collision avoidance block of ADAS, requires knowing their driving behavior. In future work, the proposed model will be employed as a behavior predictor, and collision estimation based on its predictions will be used to navigate through intersections safely.

Declaration of competing interest

The authors declare that they have no known competing finan-

cial interests or personal relationships that could have appeared

to influence the work reported in this paper.

Acknowledgments

The research carried out in this paper is supported by KPIT

Technologies Ltd., Bangalore, India, under the research grant for

the project entitled ‘‘Driver Behavior Modelling for Autonomous

Driving’’.

References

[1] Masi S, Xu P, Bonnifait P. Roundabout crossing with interval occupancy

and virtual instances of road users. IEEE Trans Intell Transp Syst 2020.

[2] Sharma O, Sahoo NC, Puhan NB. Recent advances in motion and behavior

planning techniques for software architecture of autonomous vehicles: A

state-of-the-art survey. Eng Appl Artif Intell 2021;101:104211.

[3] Sharma O, Sahoo NC, B. Puhan N. A survey on smooth path generation

techniques for nonholonomic autonomous vehicle systems. In: IECON

2019-45th annual conference of the IEEE industrial electronics society. Vol.

1. IEEE; 2019, p. 5167–72.

[4] Yao R, Zeng W, Chen Y, He Z. A deep learning framework for modelling

left-turning vehicle behaviour considering diagonal-crossing motorcycle

conflicts at mixed-flow intersections. Transp Res C 2021;132:103415.

[5] Schindler R, Piccinini GB. Truck drivers’ behavior in encounters with vul-

nerable road users at intersections: Results from a test-track experiment.

Accid Anal Prev 2021;159:106289.

[6] Awad N, Lasheen A, Elnaggar M, Kamel A. Model predictive control with

fuzzy logic switching for path tracking of autonomous vehicles. ISA Trans

2021.

[7] Gu W, Cai S, Hu Y, Zhang H, Chen H. Trajectory planning and tracking

control of a ground mobile robot:A reconstruction approach towards space

vehicle. ISA Trans 2019;87:116–28.

[8] Sharma O, Sahoo N, Puhan N. Highway discretionary lane changing

behavior recognition using continuous and discrete hidden Markov model.

In: 2021 IEEE international intelligent transportation systems conference.

IEEE; 2021, p. 1476–81.

[9] Yang S, Wang W, Jiang Y, Wu J, Zhang S, Deng W. What contributes to

driving behavior prediction at unsignalized intersections? Transp Res C

2019;108:100–14.

[10] Kazemi R, Abdollahzade M. Introducing an evolving local neuro-fuzzy

model – application to modeling of car-following behavior. ISA Trans

2015;59:375–84.

[11] Wang H, Gu M, Wu S, Wang C. A driver’s car-following behavior pre-

diction model based on multi-sensors data. EURASIP J Wireless Commun

Networking 2020;2020(1):1–12.

[12] Paden B, Čáp M, Yong SZ, Yershov D, Frazzoli E. A survey of motion

planning and control techniques for self-driving urban vehicles. IEEE Trans

Intell Veh 2016;1(1):33–55.

[13] Zhou D, Ma Z, Sun J. Autonomous vehicles’ turning motion planning

for conflict areas at mixed-flow intersections. IEEE Trans Intell Veh

2019;5(2):204–16.

[14] Li S, Yang L, Gao Z, Li K. Stabilization strategies of a general nonlinear

car-following model with varying reaction-time delay of the drivers. ISA

Trans 2014;53(6):1739–45.

[15] Amsalu SB, Homaifar A. Driver behavior modeling near intersections

using hidden Markov model based on genetic algorithm. In: 2016 IEEE

international conference on intelligent transportation engineering. IEEE;

2016, p. 193–200.

[16] Sharma O, Sahoo NC, Puhan NB. Highway lane-changing prediction using

a hierarchical software architecture based on support vector machine and

continuous hidden Markov model. Int J Intell Transp Syst Res 2022.

[17] Aoude GS, Desaraju VR, Stephens LH, How J. Driver behavior classification

at intersections and validation on large naturalistic data set. IEEE Trans

Intell Transp Syst 2012;13(2):724–36.

[18] Jain A, Singh A, Koppula HS, Soh S, Saxena A. Recurrent neural networks

for driver activity anticipation via sensory-fusion architecture. In: 2016

IEEE international conference on robotics and automation. IEEE; 2016, p.

3118–25.

[19] Hurwitz DS, Wang H, Knodler Jr. MA, Ni D, Moore D. Fuzzy sets to describe

driver behavior in the dilemma zone of high-speed signalized intersections.

Transp Res F Traffic Psychol Behav 2012;15(2):132–43.

[20] Belkhouche F. Collaboration and optimal conflict resolution at an unsignal-

ized intersection. IEEE Trans Intell Transp Syst 2018;20(6):2301–12.

[21] Mozaffari S, Al-Jarrah OY, Dianati M, Jennings P, Mouzakitis A. Deep

learning-based vehicle behavior prediction for autonomous driving

applications: A review. IEEE Trans Intell Transp Syst 2020.

[22] Jeong Y, Kim S, Yi K. Surround vehicle motion prediction using LSTM-

RNN for motion planning of autonomous vehicles at multi-lane turn

intersections. IEEE Open J Intell Transp Syst 2020;1:2–14.

[23] Zyner A, Worrall S, Ward J, Nebot E. Long short term memory for driver

intent prediction. In: 2017 IEEE intelligent vehicles symposium. IEEE; 2017,

p. 1484–9.

[24] Phillips DJ, Wheeler TA, Kochenderfer MJ. Generalizable intention predic-

tion of human drivers at intersections. In: 2017 IEEE intelligent vehicles

symposium. IEEE; 2017, p. 1665–70.

[25] Zhang T, Song W, Fu M, Yang Y, Wang M. Vehicle motion prediction at

intersections based on the turning intention and prior trajectories model.

IEEE/CAA J Autom Sin 2021.

O. Sharma, N.C. Sahoo and N.B. Puhan ISA Transactions 133 (2023) 13–28

[26] Zhang H, Fu R. A hybrid approach for turning intention prediction based

on time series forecasting and deep learning. Sensors 2020;20(17):4887.

[27] Liu Y, Zhao P, Qin D, Li G, Chen Z, Zhang Y. Driving intention identification

based on long short-term memory and a case study in shifting strategy

optimization. IEEE Access 2019;7:128593–605.

[28] Tian R, Li S, Li N, Kolmanovsky I, Girard A, Yildiz Y. Adaptive game-

theoretic decision making for autonomous vehicle control at roundabouts.

In: 2018 IEEE conference on decision and control. IEEE; 2018, p. 321–6.

[29] Masi S, Xu P, Bonnifait P. A curvilinear decision method for two-lane

roundabout crossing and its validation under realistic traffic flow. In: 2020

IEEE intelligent vehicles symposium. IEEE; 2020, p. 1290–6.

[30] Rodrigues M, McGordon A, Gest G, Marco J. Autonomous navigation in

interaction-based environments—a case of non-signalized roundabouts.

IEEE Trans Intell Veh 2018;3(4):425–38.

[31] Zyner A, Worrall S, Nebot E. A recurrent neural network solution for

predicting driver intention at unsignalized intersections. IEEE Robot Autom

Lett 2018;3(3):1759–64.

[32] Zyner A, Worrall S, Nebot EM. Acfr five roundabouts dataset: Natural-

istic driving at unsignalized intersections. IEEE Intell Transp Syst Mag

2019;11(4):8–18.

[33] Zyner A, Worrall S, Nebot E. Naturalistic driver intention and path

prediction using recurrent neural networks. IEEE Trans Intell Transp Syst

2019;21(4):1584–94.

[34] Zhang Y, Gao B, Guo L, Guo H, Chen H. Adaptive decision-making for auto-

mated vehicles under roundabout scenarios using optimization embedded

reinforcement learning. IEEE Trans Neural Netw Learn Syst 2020.

[35] Xiang S, Qin Y, Zhu C, Wang Y, Chen H. Lstm networks based on

attention ordered neurons for gear remaining life prediction. ISA Trans

2020;106:343–54.

[36] Bai S, Kolter JZ, Koltun V. An empirical evaluation of generic convolutional

and recurrent networks for sequence modeling. 2018, arXiv preprint arXiv:

1803.01271.

[37] Luo W, Yang B, Urtasun R. Fast and furious: Real time end-to-end 3d

detection, tracking and motion forecasting with a single convolutional net.

In: Proceedings of the IEEE conference on computer vision and pattern

recognition. 2018, p. 3569–77.

[38] Schöller C, Aravantinos V, Lay F, Knoll A. What the constant velocity model

can teach us about pedestrian motion prediction. IEEE Robot Autom Lett

2020;5(2):1696–703.

[39] Becker S, Hug R, Hübner W, Arens M. An evaluation of trajectory prediction

approaches and notes on the trajnet benchmark. 2018, arXiv preprint

arXiv:1805.07663.

[40] Becker S, Hug R, Hubner W, Arens M. Red: A simple but effective baseline

predictor for the trajnet benchmark. In: Proceedings of the European

conference on computer vision (ECCV) workshops. 2018.

[41] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al.

Attention is all you need. In: Advances in neural information processing

systems. 2017, p. 5998–6008.

[42] Giuliari F, Hasan I, Cristani M, Galasso F. Transformer networks for

trajectory forecasting. In: 2020 25th International conference on pattern

recognition. IEEE; 2021, p. 10335–42.

[43] Liu Y, Zhang J, Fang L, Jiang Q, Zhou B. Multimodal motion prediction

with stacked transformers. In: Proceedings of the IEEE/CVF conference on

computer vision and pattern recognition. 2021, p. 7577–86.

[44] Yuan Y, Weng X, Ou Y, Kitani K. Agentformer: Agent-aware transformers

for socio-temporal multi-agent forecasting. 2021, arXiv preprint arXiv:

2103.14023.

[45] Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. 2018, arXiv preprint arXiv:1810.04805.

[46] Zhao Y, Ding B, Zhang Y, Yang L, Hao X. Online cement clinker quality monitoring: A soft sensor model based on multivariate time series analysis and CNN. ISA Trans 2021;117:180–95.

[47] Han T, Liu C, Yang W, Jiang D. Learning transferable features in deep convolutional neural networks for diagnosing unseen machine conditions. ISA Trans 2019;93:341–53.

[48] Fang C, He D, Li K, Liu Y, Wang F. Image-based thickener mud layer height prediction with attention mechanism-based CNN. ISA Trans 2021.

[49] Wang H, Liu Z, Peng D, Cheng Z. Attention-guided joint learning CNN with noise robustness for bearing fault diagnosis and vibration signal denoising. ISA Trans 2021.

[50] Zoumpourlis G, Doumanoglou A, Vretos N, Daras P. Non-linear convolution filters for CNN-based learning. In: Proceedings of the IEEE international conference on computer vision. 2017, p. 4761–9.

[51] LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE 1998;86(11):2278–324.

[52] Zhang Y, Li J, Guo Y, Xu C, Bao J, Song Y. Vehicle driving behavior recognition based on multi-view convolutional neural network with joint data augmentation. IEEE Trans Veh Technol 2019;68(5):4223–34.

[53] Izquierdo R, Quintanar A, Parra I, Fernández-Llorca D, Sotelo M. Experimental validation of lane-change intention prediction methodologies based on CNN and LSTM. In: 2019 IEEE intelligent transportation systems conference. IEEE; 2019, p. 3657–62.

[54] Izquierdo R, Quintanar A, Parra I, Fernández-Llorca D, Sotelo MA. The PREVENTION dataset. 2019, https://prevention-dataset.uah.es.

[55] Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, et al. An image is worth 16x16 words: Transformers for image recognition at scale. 2020, arXiv preprint arXiv:2010.11929.

[56] Pandey P, Deepthi A, Mandal B, Puhan NB. FoodNet: Recognizing foods using ensemble of deep networks. IEEE Signal Process Lett 2017;24(12):1758–62.

[57] Mishra SS, Mandal B, Puhan NB. Multi-level dual-attention based CNN for macular optical coherence tomography classification. IEEE Signal Process Lett 2019;26(12):1793–7.

[58] Henriques JF, Caseiro R, Martins P, Batista J. High-speed tracking with kernelized correlation filters. IEEE Trans Pattern Anal Mach Intell 2014;37(3):583–96.

[59] Cui Y, Zhou F, Wang J, Liu X, Lin Y, Belongie S. Kernel pooling for convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2017, p. 2921–30.

[60] Blondel M, Ishihata M, Fujino A, Ueda N. Polynomial networks and factorization machines: New insights and efficient training algorithms. In: International conference on machine learning. PMLR; 2016, p. 850–8.

[61] Mairal J, Koniusz P, Harchaoui Z, Schmid C. Convolutional kernel networks. Adv Neural Inf Process Syst 2014;27:2627–35.

[62] Wang C, Yang J, Xie L, Yuan J. Kervolutional neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019, p. 31–40.

[63] Marsi S, Bhattacharya J, Molina R, Ramponi G. A non-linear convolution network for image processing. Electronics 2021;10(2):201.

[64] Xu K, Wang C, Chen C, Wu W, Scherer S. AirCode: A robust object encoding method. IEEE Robot Autom Lett 2022.

[65] Tang L, Xuan J, Shi T, Zhang Q. EnvelopeNet: A robust convolutional neural network with optimal kernels for intelligent fault diagnosis of rolling bearings. Measurement 2021;180:109563.

[66] Abdallah HB, Henry CJ, Ramanna S. 1-dimensional polynomial neural networks for audio signal related problems. Knowl-Based Syst 2022;108174.

[67] McNemar Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 1947;12(2):153–7.

[68] Edwards AL. Note on the “correction for continuity” in testing the significance of the difference between correlated proportions. Psychometrika 1948;13(3):185–7.

[69] Dietterich TG. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 1998;10(7):1895–923.
