
Probabilistic Prediction of Vehicle Semantic Intention and Motion

April 2018


Authors:

Yeping Hu

University of California, Berkeley

Wei Zhan

University of California, Berkeley

Masayoshi Tomizuka

University of California, Berkeley


Abstract— Accurately predicting the possible behaviors of traffic participants is an essential capability for future autonomous vehicles. Most current research fixes the number of driving intentions by considering only a specific scenario. However, distinct driving environments usually contain various possible driving maneuvers. Therefore, an intention prediction method that can adapt to different traffic scenarios is needed. To further improve the overall vehicle prediction performance, motion information is usually incorporated with classified intentions. As suggested in some literature, methods that directly predict possible goal locations can achieve better performance for long-term motion prediction than other approaches because they automatically incorporate environment constraints. Moreover, by obtaining the temporal information of the predicted destinations, the optimal trajectories for predicted vehicles as well as the desirable path for the ego autonomous vehicle can be easily generated. In this paper, we propose a Semantic-based Intention and Motion Prediction (SIMP) method, which can be adapted to any driving scenario by using semantically defined vehicle behaviors. It utilizes a probabilistic framework based on a deep neural network to estimate the intentions, final locations, and corresponding time information for surrounding vehicles. An exemplar real-world scenario was used to implement and examine the proposed method.

I. INTRODUCTION

Safety is the most fundamental aspect to consider for both human drivers and autonomous vehicles. Human drivers are capable of using past experience and intuition to avoid potential accidents by predicting the behaviors of other drivers. However, some drivers have poor driving habits, such as changing lanes without using turn signals, which makes prediction more difficult. Moreover, human drivers might easily overlook dangerous situations due to limited concentration. Therefore, Advanced Driver Assistance Systems (ADAS) should be able to simultaneously and accurately anticipate the future behaviors of multiple traffic participants under various driving scenarios, which can then help assure a safe, comfortable, and cooperative driving experience.

There have been numerous works on predicting vehicle behavior, which can be divided into two categories: intention/maneuver prediction and motion prediction.

Many intention estimation problems have been solved using classification strategies, such as Support Vector Machine (SVM) [1], Bayesian classifier [2], Hidden Markov Models (HMMs) [5], and Multilayer Perceptron (MLP) [4]. Most of these approaches were designed for only one particular scenario with a limited set of intentions. For example, [1]-[4] dealt with non-junction segments such as highways, which involve lane keeping (LK), lane change left (LCL), and lane change right (LCR) maneuvers, whereas [5]-[7] concentrated on junction segments such as intersections, which include left turn, right turn, and go straight maneuvers.

Fig. 1. Insertion areas (colored regions) under different driving scenarios for the predicted vehicle.

Y. Hu, W. Zhan and M. Tomizuka are with the Department of Mechanical Engineering, University of California, Berkeley, CA 94720 USA [yeping hu, wzhan, tomizuka@berkeley.edu]

However, for autonomous vehicles to drive through dynamically changing traffic scenes in real life, an intention prediction module that can adapt to different scenarios with various possible driving maneuvers is necessary. [8] proposed a maneuver estimation approach for generic traffic scenarios, but its classified driving maneuvers are too specific, which not only requires multiple manually selected classification thresholds but also raises problems when unclassified maneuvers occur.

As a result, we propose to use semantics to represent the driver intention, which is defined as the intent to enter each insertion area. These areas can be the available gaps between any two vehicles on the road or the lane entrances/exits. Fig. 1 visualizes the insertion areas under distinct environments. An advantage of the semantic approach is that situations can be modeled in a unified way [9], so varying driving scenarios have no effect on our semantics-defined problem. Even for a scenario that combines all the road structures in Fig. 1, the proposed semantic definition still holds.

Motion prediction is mostly treated as a regression problem that forecasts the short-term movements and long-term trajectories of vehicles. By combining motion prediction with intention estimation, both the high-level behavioral information and the future state of the predicted vehicle can be obtained. For short-term motion prediction, various approaches such as constant acceleration (CA), the Intelligent Driver Model (IDM) [7], and Particle Filter (PF) [10] have been suggested. The main limitation of these works, however, is that they either considered simple cases such as car following or did not take environment information into account.

arXiv:1804.03629v1 [cs.LG] 10 Apr 2018

For future trajectory estimation, Dynamic Bayesian Networks (DBN) [11] and other regression models have been used in several studies. Methods based on artificial neural networks (ANN) are also widely applied. In [11], the authors used an LSTM to predict the vehicle trajectory in highway situations. [12] proposed a Deep Neural Network (DNN) to obtain the lateral acceleration and longitudinal velocity. However, these approaches only predicted the most likely trajectory of the vehicle without considering uncertainties in the environment. To counter this issue, a Variational Gaussian Mixture Model (VGMM) was proposed for probabilistic long-term motion prediction [14]. Nevertheless, the method was only tested in a simulation environment, and its input contains history information over a long period of time, which is usually inaccessible in reality. Other studies project the prediction step of a tracking filter forward over time, but the growing uncertainties often cause future positions to end up at physically impossible locations.

In contrast, works such as [15][16] highlighted that by predicting goal locations and assuming that agents navigate toward those locations by following some optimal paths, the accuracy of long-term prediction can be improved. The main advantage of postulating destinations instead of trajectories is that it allows one to represent various dynamics and to automatically incorporate environment constraints for unreachable regions.

Apart from obtaining the possible goals of predicted vehicles, the time required to reach those locations is also essential information, especially for the subsequent trajectory planning of the ego vehicle. Therefore, many attempts have been made to directly predict temporal information. [17] used an LSTM to forecast the time-to-lane-change (TTLC) of vehicles in highway scenarios. A recent work [18] utilized Linear Quantile Regression (LQR) and Quantile Regression Forests (QRF) for the probabilistic regression task of TTLC, and the authors concluded that QRF performs better than LQR.

In this paper, the Semantic-based Intention and Motion Prediction (SIMP) method is proposed. It utilizes a deep neural network to formulate a probabilistic framework that can predict the possible semantic intention and motion of the selected vehicle under various driving scenarios. The semantics introduced for this prediction problem are defined as answering the question "Which area will the predicted vehicle most likely insert into? Where and when?", which incorporates both the goal position and the time information into each insertion area. Moreover, the probabilistic formulation can take into account the uncertainty of drivers as well as the evolution of the traffic situation.

The remainder of the paper is organized as follows: Section II provides the concept of the proposed SIMP method; Section III discusses an exemplar scenario to apply SIMP; evaluations and results are provided in Section IV; and Section V concludes the paper.

II. CONCEPT OF SEMANTIC-BASED INTENTION AND MOTION PREDICTION (SIMP)

In this section, we first provide a brief overview of the Mixture Density Network (MDN), an idea we utilize in our proposed method. Then, the detailed formulation and structure of the SIMP method are presented.

A. Mixture Density Network (MDN)

The Mixture Density Network, first introduced by Bishop [19], is a combination of an ANN and a mixture density model. A mixture density model can be used to estimate the underlying distribution of data, typically by assuming that each data point has some probability under a certain type of distribution. Using a mixture model gives more flexibility to model a completely general conditional density function $p(y\mid x)$, where $x$ is a set of input features and $y$ is a set of outputs. The probability density of the target data is then represented as a linear combination of kernel functions in the form

$$p(y\mid x) = \sum_{m=1}^{M} \alpha_m(x)\,\phi_m(y\mid x), \qquad (1)$$

where $M$ denotes the total number of mixture components and $\alpha_m(x)$ denotes the $m$-th mixing coefficient of the corresponding kernel function $\phi_m(y\mid x)$. Although various choices for the kernel function are possible, in this paper we utilize the Gaussian kernel of the form

$$\phi_m(y\mid x) = \mathcal{N}\big(y\mid \mu_m(x), \sigma_m^2(x)\big). \qquad (2)$$

Such a formulation is called a Gaussian Mixture Model (GMM)-based MDN, where the MDN maps the input $x$ to the parameters of the GMM (mixing coefficient $\alpha_m$, mean $\mu_m$, and variance $\sigma_m^2$), which in turn give a full probability density function of the output $y$. It is important to note that the parameters of the GMM need to satisfy specific conditions in order to be valid: the mixing coefficients $\alpha_m$ should be positive and sum to 1, and the standard deviations $\sigma_m$ should be positive. The softmax function and the exponential operator in (3) fulfill these constraints, while no extra condition is needed for the means $\mu_m$:

$$\alpha_m = \frac{\exp(z_m^{\alpha})}{\sum_{i=1}^{M}\exp(z_i^{\alpha})}, \qquad \sigma_m = \exp(z_m^{\sigma}), \qquad \mu_m = z_m^{\mu}. \qquad (3)$$

The parameters $z_m^{\alpha}$, $z_m^{\sigma}$, and $z_m^{\mu}$ are the direct outputs of the MDN corresponding to the mixture weight, standard deviation, and mean of the $m$-th Gaussian component in the GMM.
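As a minimal sketch of (3), the plain-Python function below maps the raw outputs of one MDN head to valid mixture parameters. The function name `mdn_params` and the list-based interface are illustrative assumptions, not part of the original method; subtracting the maximum logit before exponentiating is a standard numerical-stability step that leaves the softmax unchanged.

```python
import math

def mdn_params(z_alpha, z_sigma, z_mu):
    """Map raw MDN outputs to valid 1D GMM parameters, following (3).

    z_alpha, z_sigma, z_mu: lists of length M (one entry per mixture component).
    Returns (alpha, sigma, mu) with alpha positive and summing to 1, sigma > 0.
    """
    # Softmax over components; subtracting the max avoids overflow in exp().
    z_max = max(z_alpha)
    exps = [math.exp(z - z_max) for z in z_alpha]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # The exponential keeps every standard deviation strictly positive.
    sigma = [math.exp(z) for z in z_sigma]
    # Means are unconstrained, so the raw outputs are used directly.
    mu = list(z_mu)
    return alpha, sigma, mu

alpha, sigma, mu = mdn_params([0.0, 1.0, -1.0], [0.0, -0.5, 0.3], [2.0, -1.0, 0.5])
print([round(a, 3) for a in alpha], sum(alpha))
```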

The MDN is trained by minimizing the negative log-likelihood as the loss function

$$\text{Loss} = -\sum_{n}\log\sum_{m=1}^{M}\alpha_m(x_n)\,\phi_m(y_n\mid x_n), \qquad (4)$$

where $n$ indexes the training samples. The detailed derivation of the closed-form gradients can be found in [19], which demonstrated that the MDN can be trained using backpropagation.
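The loss (4) can be sketched directly for 1D Gaussian kernels as below. This is an illustration under the assumption that the network outputs for each sample have already been converted by (3); the helper names are hypothetical. A practical implementation would usually evaluate the inner sum in log space (log-sum-exp) to avoid underflow when all component densities are tiny.

```python
import math

def gaussian_pdf(y, mu, sigma):
    """Univariate Gaussian kernel N(y | mu, sigma^2) from (2)."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mdn_nll(batch):
    """Negative log-likelihood (4), summed over the training samples in `batch`.

    Each item is (y_n, alpha, mu, sigma): the target and the per-component
    GMM parameters the network produced for input x_n.
    """
    loss = 0.0
    for y_n, alpha, mu, sigma in batch:
        likelihood = sum(a * gaussian_pdf(y_n, m, s) for a, m, s in zip(alpha, mu, sigma))
        loss -= math.log(likelihood)
    return loss

# Two hypothetical samples scored by the same two-component mixture.
batch = [
    (0.1, [0.7, 0.3], [0.0, 3.0], [1.0, 0.5]),
    (2.9, [0.7, 0.3], [0.0, 3.0], [1.0, 0.5]),
]
print(mdn_nll(batch))
```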

B. Proposed SIMP Method

Our task is to generate probability distributions of the designed semantic description given some representation of the current state. We assign a Gaussian Mixture Model (GMM) to each insertion area, so multiple GMMs are involved in one driving scenario. Each Gaussian mixture models the probability distribution of a certain type of motion of the predicted vehicle. Since the insertion location and the arrival time are the focus of our interest, a 2D Gaussian mixture is used and the predicted variables are constructed as a two-dimensional vector $y = [y_s, y_t]^T$. The variable $y_s$, describing the vehicle location, and the variable $y_t$, describing the time information, can be specifically defined according to the driving environment.

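Given the current state features $x$, the probability distribution of $y_a$ over a single area $a$ for the predicted vehicle is of the form

$$f(y_a\mid x) = \sum_{m=1}^{M}\alpha_m\,\mathcal{N}(y_a\mid\mu_m,\Sigma_m) \qquad (5)$$

with mean and covariance constructed as

$$\mu_m = \begin{bmatrix}\mu_{s,m}\\ \mu_{t,m}\end{bmatrix}, \qquad \Sigma_m = \begin{bmatrix}\sigma_{s,m}^2 & \rho_m\sigma_{s,m}\sigma_{t,m}\\ \rho_m\sigma_{s,m}\sigma_{t,m} & \sigma_{t,m}^2\end{bmatrix}, \qquad (6)$$

where $\rho_m\in[-1,1]$ is the correlation coefficient.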
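For the 2D case in (5) and (6), the bivariate Gaussian density has a closed form, so the mixture can be evaluated without a matrix inversion. The sketch below assumes each component is passed as a tuple `(alpha, mu_s, mu_t, sigma_s, sigma_t, rho)`, a hypothetical layout chosen for illustration. Note that at $|\rho_m| = 1$ the covariance in (6) is singular, so the code requires $|\rho_m| < 1$.

```python
import math

def bivariate_normal_pdf(ys, yt, mu_s, mu_t, sigma_s, sigma_t, rho):
    """Density of the 2D Gaussian whose mean and covariance are built as in (6)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly inside (-1, 1)")
    zs = (ys - mu_s) / sigma_s
    zt = (yt - mu_t) / sigma_t
    one_minus_rho2 = 1.0 - rho * rho
    quad = (zs * zs - 2.0 * rho * zs * zt + zt * zt) / one_minus_rho2
    norm = 2.0 * math.pi * sigma_s * sigma_t * math.sqrt(one_minus_rho2)
    return math.exp(-0.5 * quad) / norm

def area_density(ys, yt, components):
    """f(y_a | x) from (5); components are (alpha, mu_s, mu_t, sigma_s, sigma_t, rho) tuples."""
    return sum(alpha * bivariate_normal_pdf(ys, yt, mu_s, mu_t, s_s, s_t, rho)
               for alpha, mu_s, mu_t, s_s, s_t, rho in components)

# Hypothetical single-component area: insertion 10 ft from the reference car, TTLC 1.5 s.
print(area_density(12.0, 1.2, [(1.0, 10.0, 1.5, 3.0, 0.4, 0.3)]))
```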

In addition to formulating a regression model for each insertion area, we also require the probability that the predicted vehicle enters each area. Therefore, a Deep Neural Network (DNN) was used as the basis of our Semantic-based Intention and Motion Prediction (SIMP) structure. The output of the network contains both the necessary parameters for every 2D Gaussian Mixture Model (GMM) and the weight $w_a$ for each insertion area $a$.

For the desired outputs, we expect not only the largest weight to be associated with the area actually entered, but also the highest probability at the correct location and time in the output distribution of that area. Consequently, we define our loss function as

$$L = W_1\Big(-\sum_{n}\log\sum_{a=1}^{N_a}\hat{w}_a^n\, f(y_a^n\mid x^n)\Big) + W_2\Big(-\sum_{n}\sum_{a=1}^{N_a}\hat{w}_a^n\log(w_a^n)\Big), \qquad (7)$$

where $N_a$ denotes the total number of insertion areas and $\hat{w}_a$ denotes the ground truth, i.e., the one-hot encoding of the final area that the predicted vehicle entered. The first term is the negative log-likelihood of the ground-truth location and time under the distribution of the entered area, and the last term is the cross-entropy loss of the area weights. Parameters $W_1$ and $W_2$ need to be manually tuned such that the two loss components have the same order of magnitude during training.
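Because $\hat{w}_a^n$ is one-hot, both sums over areas in (7) reduce to the term of the area actually entered. The sketch below exploits this; the input format and the default values $W_1 = W_2 = 1$ are assumptions for illustration only, since the weights are tuned manually in practice.

```python
import math

def simp_loss(samples, W1=1.0, W2=1.0):
    """Sketch of the loss (7) over a batch.

    Each sample is (true_area, density, weights):
      true_area -- index of the area actually entered (the hot entry of w_hat)
      density   -- f(y_a^n | x^n) of that area, evaluated at the labelled (y_s, y_t)
      weights   -- predicted area weights w_a^n (positive, summing to 1)
    With a one-hot w_hat, the other areas contribute nothing to either term.
    """
    nll = 0.0
    cross_entropy = 0.0
    for true_area, density, weights in samples:
        nll -= math.log(density)
        cross_entropy -= math.log(weights[true_area])
    return W1 * nll + W2 * cross_entropy

# One hypothetical frame in which the vehicle ends up in the fifth area (index 4).
samples = [(4, 0.5, [0.05, 0.05, 0.1, 0.1, 0.7])]
print(simp_loss(samples))
```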

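[Fig. 2 diagram labels: Input; Various Functions; per-area outputs $f(y_{s_1}, y_{t_1}\mid x)$ to $f(y_{s_{N_a}}, y_{t_{N_a}}\mid x)$; area weights up to $w_{N_a}$]

Fig. 2. Structure of the SIMP Method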

The overall architecture of our SIMP method is shown in Fig. 2. Due to the first-order Markov assumption, the input features depend only on the current time step. The network consists of an input layer, several fully connected layers, and a dropout layer, which improves generalization and prevents overfitting of the training data. After the different types of parameters are passed through their corresponding functions, the output satisfies the aforementioned constraints. For $N_a$ insertion areas, the total number of output parameters is $N_a(6M+1)$: there is a weight parameter $w_a$ associated with each area $a \in \{1,\dots,N_a\}$, and for every component $m \in \{1,\dots,M\}$ within an area, six parameters $\{\alpha_m, \mu_{s,m}, \mu_{t,m}, \sigma_{s,m}, \sigma_{t,m}, \rho_m\}$ are needed to formulate the 2D GMM.
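The sketch below checks the output count $N_a(6M+1)$ and shows one possible way to split a flat output vector into per-area parameters. The ordering of the raw vector, the softmax over area weights, and the tanh used to keep $\rho_m$ in $(-1, 1)$ are assumptions: the text only states that each parameter type passes through a corresponding function that enforces its constraint. With $N_a = 5$ (the highway scenario in Section III) and $M = 1$, the network has 35 outputs.

```python
import math

def num_outputs(n_areas, n_components):
    """N_a * (6M + 1): one weight per area plus six GMM parameters per component."""
    return n_areas * (6 * n_components + 1)

def softmax(z):
    z_max = max(z)
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def split_outputs(raw, n_areas, n_components):
    """Split a flat output vector into area weights and per-area 2D GMM parameters.

    Hypothetical layout: n_areas weight logits first, then for each area six
    blocks of n_components values: z_alpha, mu_s, mu_t, z_sigma_s, z_sigma_t, z_rho.
    """
    if len(raw) != num_outputs(n_areas, n_components):
        raise ValueError("unexpected number of outputs")
    m = n_components
    weights = softmax(raw[:n_areas])  # assumed: softmax across areas
    areas = []
    for a in range(n_areas):
        start = n_areas + a * 6 * m
        blocks = [raw[start + i * m:start + (i + 1) * m] for i in range(6)]
        areas.append({
            "alpha": softmax(blocks[0]),                   # positive, sums to 1
            "mu_s": blocks[1],                             # unconstrained
            "mu_t": blocks[2],                             # unconstrained
            "sigma_s": [math.exp(v) for v in blocks[3]],   # positive
            "sigma_t": [math.exp(v) for v in blocks[4]],   # positive
            "rho": [math.tanh(v) for v in blocks[5]],      # assumed map into (-1, 1)
        })
    return weights, areas

print(num_outputs(5, 1))  # 35
```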

III. AN EXEMPLAR HIGHWAY SCENARIO

In this section, we use an exemplar highway scenario to apply the proposed Semantic-based Intention and Motion Prediction (SIMP) method. The data source and the detailed problem formulation are presented.

A. Dataset

All the data we used were taken from the NGSIM US 101 dataset, which is publicly available online at [20]. It contains detailed vehicle trajectory data collected on a highway at a 10 Hz sampling frequency. The measurement area is approximately 640 meters (2100 feet) long and contains five freeway lanes plus an auxiliary lane for the on/off-ramp. For each vehicle that performs a lane change maneuver, we picked up to 40 frames (4 s) before the vehicle's center intersects the lane mark; for vehicles that keep driving in the same lane for a long period, we used their frames as input for the lane keeping maneuver. A total of 17,179 frames were selected from the dataset and split into 80% for training and 20% for testing.

B. Scenario and Problem Description

A representation of the exemplar highway driving scenario is shown in Fig. 3. The yellow car is the vehicle we want to predict; the three blue cars (car2, car4, and car6) are the reference vehicles, selected as the vehicles with the closest Euclidean distance to the predicted vehicle on each of the three lanes (on the predicted vehicle's own lane, we consider only the front vehicle); the four gray cars (car1, car3, car5, and car7) are named 'other vehicles', which are the vehicles in front of and behind each of the two reference cars car2 and car6. If any of these surrounding vehicles is too far from the predicted vehicle, we consider it nonexistent within the range of the current scenario. Therefore, for each input frame, a maximum of three driving lanes and seven vehicles are considered.

[Fig. 3 vehicle labels: car1, car2, car3 (one adjacent lane); car4 (predicted vehicle's lane); car5, car6, car7 (other adjacent lane)]

Fig. 3. An exemplar driving scenario

In Fig. 3, there are five circled areas that our predicted vehicle could end up entering; we name them Dynamic Insertion Areas (DIA). If the predicted vehicle (yellow car) inserts into areas 1-4, a lane change behavior is indicated; if it inserts into area 5, a lane keeping behavior is implied. These areas are dynamic because both their locations and sizes vary at each time step.

In this particular highway scenario, the output $y_s$ represents the absolute distance between the final insertion point and the reference vehicle of the inserted area, and $y_t$ represents the time-to-lane-change (TTLC) of the predicted vehicle. When the center of the vehicle intersects the lane mark, TTLC = 0. For the lane keeping situation, TTLC is set to a large number (4 s) to represent that the vehicle has not yet decided to change lanes.
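The TTLC labels described above can be generated per vehicle as in the following sketch. The helper `ttlc_labels` is hypothetical, and it assumes that the 40-frame window ends at the frame where the vehicle center crosses the lane mark (TTLC = 0), so the earliest frame has TTLC = 3.9 s at 10 Hz; the text does not state whether the crossing frame itself is included.

```python
SAMPLING_HZ = 10       # NGSIM US 101 sampling frequency
WINDOW_FRAMES = 40     # up to 40 frames (4 s) per lane change
LANE_KEEP_TTLC = 4.0   # constant TTLC assigned to lane-keeping frames, in seconds

def ttlc_labels(crossing_frame=None, n_frames=WINDOW_FRAMES):
    """Return (frame_index, TTLC in seconds) pairs for one vehicle.

    crossing_frame: frame at which the vehicle center intersects the lane mark,
    or None for a lane-keeping segment (indices then count from its start).
    Assumes the window ends at the crossing frame (TTLC = 0) and is truncated
    if the trajectory starts later than crossing_frame - n_frames + 1.
    """
    if crossing_frame is None:
        return [(k, LANE_KEEP_TTLC) for k in range(n_frames)]
    first = max(0, crossing_frame - n_frames + 1)
    return [(k, (crossing_frame - k) / SAMPLING_HZ) for k in range(first, crossing_frame + 1)]

labels = ttlc_labels(crossing_frame=100)
print(len(labels), labels[0], labels[-1])
```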

C. Features and Structure Details

For each input frame, a total of 25 input features are selected, which are listed in Table I. Each input frame corresponds to three types of labels extracted from the data: area weight, final goal location, and remaining insertion time. In the data, the longitudinal direction is the driving direction. The current lane center (CLC) denotes the midpoint of the current lane. Because the angle difference between the front reference vehicle and the predicted vehicle is small, only the relative angles of the left and right reference vehicles are considered. Time-to-collision (TTC) is calculated by dividing the relative distance between two vehicles by their speed difference. We compute the inverse of time-to-collision (iTTC) instead, because TTC becomes infinite as the speed difference approaches zero.
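The sketch below computes iTTC for a following/leading vehicle pair. The sign convention (positive when the gap is closing) and the function name are assumptions; in NGSIM, distances and speeds are given in feet and feet per second, so iTTC has units of 1/s.

```python
def inverse_ttc(gap, v_follower, v_leader):
    """Inverse time-to-collision (1/s) between a following and a leading vehicle.

    gap: longitudinal distance between the two vehicles (must be positive).
    TTC = gap / (v_follower - v_leader) diverges when the speeds are equal,
    while iTTC = (v_follower - v_leader) / gap is simply 0 there.
    Positive iTTC means the gap is closing (an assumed sign convention).
    """
    if gap <= 0:
        raise ValueError("gap must be positive")
    return (v_follower - v_leader) / gap

print(inverse_ttc(50.0, 30.0, 25.0))  # 0.1, i.e. TTC = 10 s
print(inverse_ttc(50.0, 25.0, 25.0))  # 0.0 instead of an infinite TTC
```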

As mentioned previously, there are at most seven cars within each input frame. If a vehicle does not exist, we set its longitudinal distance to a large number and its velocity to that of the predicted vehicle. If there is no available lane on one side of the predicted vehicle, we place the three vehicles in that nonexistent lane close to each other, with the reference vehicle directly above/below the predicted vehicle. Similarly, all three of these vehicles are set to the same speed as the predicted vehicle. This setting guarantees the feasibility of the predicted results.

As for the network structure, we use three fully connected layers of 400 neurons each, with the tanh nonlinear activation function, followed by a dropout layer with rate 0.5. The parameter $N_a$ is five for this particular scenario.
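For illustration, a dependency-free forward pass with the stated layer sizes might look like this. The 25 inputs match Table I and the 35 outputs match $N_a(6M+1)$ with $N_a = 5$ and $M = 1$; the final linear layer mapping the 400 hidden units to the raw outputs, the uniform weight initialization, and the inverted-dropout formulation are assumptions not given in the text. A real implementation would use a deep learning framework for training.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Uniform weights in [-1/sqrt(n_in), 1/sqrt(n_in)] and zero biases (assumed init)."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer):
    """Affine map W x + b."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def build_network(n_in=25, hidden=400, n_hidden=3, n_out=35, seed=0):
    rng = random.Random(seed)
    sizes = [n_in] + [hidden] * n_hidden
    hidden_layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
    output_layer = init_layer(hidden, n_out, rng)  # assumed linear output layer
    return hidden_layers, output_layer

def forward(x, net, train=False, drop_rate=0.5, rng=None):
    """Three tanh layers, dropout (training only), then raw outputs for the constraint functions."""
    hidden_layers, output_layer = net
    h = x
    for layer in hidden_layers:
        h = [math.tanh(v) for v in dense(h, layer)]
    if train:
        rng = rng or random.Random()
        keep = 1.0 - drop_rate
        # Inverted dropout: drop units with probability drop_rate and rescale the rest.
        h = [v / keep if rng.random() < keep else 0.0 for v in h]
    return dense(h, output_layer)

net = build_network()
raw = forward([0.1] * 25, net)
print(len(raw))  # 35
```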

TABLE I
FEATURES FOR ONE INPUT FRAME

Group                Feature        Description
Predicted vehicle    pred           Absolute velocity in longitudinal direction
                     CLCpred        Lateral distance to the current lane center
Reference vehicles   ref            Absolute velocity in longitudinal direction
                     ref,pred       Position in longitudinal direction, relative to predicted vehicle
                     (l,r),pred     Relative lateral position between left/right reference vehicle and predicted vehicle
                     θ(l,r),pred    Relative angle between left/right reference vehicle and predicted vehicle
                     iTTCf,pred     Inverse time-to-collision between front reference vehicle and predicted vehicle
Other vehicles       o              Absolute velocity in longitudinal direction
                     o,pred         Position in longitudinal direction, relative to predicted vehicle
                     iTTCo,ref      Inverse time-to-collision relative to corresponding reference vehicle

IV. EVALUATION AND RESULTS

In this section, different evaluation techniques are presented to assess the model quality, and the final results are discussed.

A. Evaluation Setup

1) Baseline Model: To evaluate our SIMP method, we trained a Support Vector Machine (SVM) [21] and a Quantile Regression Forest (QRF) [22] separately. Since SVM is widely used for classification problems, we compared it with the intention prediction part of our framework. QRF is a combination of Quantile Regression and Random Forests [23], which extends tree ensemble learning to probabilistic prediction. Instead of point-estimating the conditional mean of the selected variables like other regression methods, its objective is to estimate an arbitrary conditional quantile. The quantiles provide detailed information about the minimum and maximum values of the dependent variable and encompass the uncertainty estimation. Hence, we compared the motion prediction part of our probabilistic framework with the QRF method. The details of the baseline models are presented below:

• SVM: kernel = (Gaussian) radial basis function (RBF)
• QRF: ntree = 1000, mtry = 5, nodesize = 10

where ntree is the number of trees in the forest, mtry is the number of features randomly sampled as split candidates at each node, and nodesize is the minimal size of terminal nodes. All these parameters were selected using five-fold cross validation.

[Fig. 4 panels: (a) Typical Lane Change, frames 1, 13, 29, and 40 of 40; (b) Sudden Change of Reference Vehicle, frames 1, 19, 20, and 40 of 40. Legend: Predicted Vehicle, Other Vehicle, Reference Vehicle, Sampled Points, Ground Truth.]

Fig. 4. Two example cases to visualize the performance. In each testing frame, 50 points were sampled in two steps: 1) multiply the total number of points by each DIA weight; 2) for every point assigned to a DIA, sample it according to the corresponding distribution of that area. (The unit of the horizontal axis is feet.)
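The two-step sampling in the Fig. 4 caption can be sketched as follows. Rounding $50 w_a$ may not sum to 50, so the allocation below uses a largest-remainder rule; that rule, the tuple layout of the components, and the function names are assumptions. Each point is drawn by first choosing a mixture component according to $\alpha_m$ and then sampling the correlated 2D Gaussian of (6) from two independent standard normals.

```python
import math
import random

def allocate_points(weights, total=50):
    """Step 1: split `total` points across DIAs in proportion to their weights.

    Largest-remainder rounding (an assumption) keeps the counts summing to `total`.
    """
    raw = [total * w for w in weights]
    counts = [math.floor(r) for r in raw]
    leftover = max(0, total - sum(counts))
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts

def sample_area(components, n, rng):
    """Step 2: draw n (y_s, y_t) points from one DIA's 2D GMM.

    components: (alpha, mu_s, mu_t, sigma_s, sigma_t, rho) tuples (hypothetical layout).
    """
    alphas = [c[0] for c in components]
    points = []
    for _, mu_s, mu_t, sigma_s, sigma_t, rho in rng.choices(components, weights=alphas, k=n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Correlated pair with correlation rho, matching the covariance in (6).
        ys = mu_s + sigma_s * z1
        yt = mu_t + sigma_t * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        points.append((ys, yt))
    return points

counts = allocate_points([0.5, 0.25, 0.125, 0.0625, 0.0625])
print(counts, sum(counts))
```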

2) Evaluation for Intention Estimation: For training and testing, each sample in our data was assigned to a semantic intention class $I \in \{area1, area2, area3, area4, area5\}$. However, since these dynamic insertion areas (DIA) change constantly during driving, it is hard to detect the final insertion area at an early stage. Therefore, for better evaluation at the beginning of the input driving segments, we merged the original five semantic intentions into three, $\{LCL, LCR, LK\}$, where $\{area1, area2\} \in LCL$, $\{area3, area4\} \in LCR$, and $\{area5\} \in LK$. During training, the input features for SVM were the same as for our method, and the labels were the corresponding final DIA numbers. The evaluation contains three steps:

i. For all testing data, create the Receiver Operating Characteristic (ROC) curve to compare our method with SVM. (Use the simplified 3 intention classes.)
ii. Find the best threshold from the ROC curve and use it to calculate the recall, precision, and F1 score, as well as the average prediction time, for both methods.
iii. For testing data that have a TTLC smaller than the obtained average prediction time, analyze the performance for each DIA. (Use the original 5 semantic intention classes.)
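Step i relies on collapsing the five area weights into the three maneuver classes. A plausible way, assumed here rather than stated in the text, is to sum the weights of the areas belonging to each maneuver; the decision rule in `classify` is likewise a hypothetical example of how a threshold could be applied.

```python
AREA_TO_MANEUVER = {1: "LCL", 2: "LCL", 3: "LCR", 4: "LCR", 5: "LK"}

def merge_area_weights(area_weights):
    """Collapse DIA weights {area: w_a} into LCL/LCR/LK scores by summation (assumed rule)."""
    merged = {"LCL": 0.0, "LCR": 0.0, "LK": 0.0}
    for area, w in area_weights.items():
        merged[AREA_TO_MANEUVER[area]] += w
    return merged

def classify(merged, threshold):
    """Hypothetical rule: report the more likely lane change if it reaches the threshold, else LK."""
    direction = max(("LCL", "LCR"), key=lambda k: merged[k])
    return direction if merged[direction] >= threshold else "LK"

merged = merge_area_weights({1: 0.1, 2: 0.3, 3: 0.05, 4: 0.05, 5: 0.5})
print(merged, classify(merged, threshold=0.3))
```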

3) Evaluation for Motion Prediction: In our problem setting, two semantically described motions are predicted: the final location in each insertion area (destination) and the remaining time to reach that location (TTLC). For the conditional distribution of each motion, we expect not only a small difference between the predicted mean and the actual value, but also a distribution concentrated around the predicted mean. Hence, we evaluated the root mean squared error (RMSE) of the output mean as well as the confidence intervals for both the QRF and the SIMP method. The number of mixture components $M$ for each DIA was set to one for analysis purposes. For the training of QRF, we trained two separate random forest quantile regressors, where the input features remain the same and the label is either the location or the time information.
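The RMSE of the predicted means is computed in the usual way; a minimal sketch with illustrative names and made-up numbers:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted means and ground-truth values."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted))

# Hypothetical predicted TTLC means versus ground truth, in seconds.
print(rmse([1.9, 1.0, 0.15], [2.0, 1.0, 0.0]))
```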

Two different intervals were chosen to assess the testing results of each method:

• SIMP-1σ: one standard deviation interval
• SIMP-2σ: two standard deviation interval
• QRF-68%: 16% to 84% quantile interval
• QRF-95%: 2.5% to 97.5% quantile interval
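The sketch below contrasts the two interval types. The Gaussian intervals follow directly from the predicted mean and standard deviation, whereas the QRF intervals are empirical quantiles; here they are computed from a plain list of samples with linear interpolation, which only stands in for the forest's conditional distribution. The nominal coverages show that the 16%-84% pair matches ±1σ (about 68.3%), while 2.5%-97.5% corresponds to about ±1.96σ, slightly narrower than ±2σ (about 95.4%).

```python
from statistics import NormalDist

def gaussian_interval(mu, sigma, k):
    """SIMP-style interval: predicted mean plus/minus k standard deviations."""
    return mu - k * sigma, mu + k * sigma

def quantile(sorted_values, q):
    """Empirical q-quantile with linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac

def quantile_interval(samples, lower_q, upper_q):
    """QRF-style interval, e.g. (0.16, 0.84) for QRF-68% or (0.025, 0.975) for QRF-95%."""
    s = sorted(samples)
    return quantile(s, lower_q), quantile(s, upper_q)

std_normal = NormalDist()
coverage_1sigma = std_normal.cdf(1.0) - std_normal.cdf(-1.0)  # nominal coverage of SIMP-1sigma
coverage_2sigma = std_normal.cdf(2.0) - std_normal.cdf(-2.0)  # nominal coverage of SIMP-2sigma
print(round(coverage_1sigma, 4), round(coverage_2sigma, 4))
print(gaussian_interval(1.5, 0.2, 2))
```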

B. Results and Discussion

1) Visualization of Selected Cases: We selected two distinct traffic situations to visualize our results. Each situation had 40 frames (4 s), and we chose four representative frames from each case to illustrate the overall performance. The full video can be found at https://www.youtube.com/watch?v=6A3Hl-mRhbI.

A typical lane change situation is illustrated in Fig. 4(a), where the sampled points are all in the proper DIA for each frame. It is reasonable to have several possible areas at the early stage, since there are multiple choices for the driver and no specific one has been chosen yet. It should be noted that it is difficult to numerically justify the correctness of these early-stage predictions without human-labeled ground truth. However, as soon as the driver decides where to go, our result can be compared with the label extracted from the data. We further used this case to illustrate the TTLC prediction result in Fig. 5. The differences between our predicted mean and the ground truth are all smaller than 0.3 s within three seconds before the lane change; besides, the predicted TTLC values for the other insertion areas remain in reasonable ranges.

Since the reference vehicle switches from one to another while the predicted vehicle is driving, we need to guarantee that our method can handle such cases without large discontinuities in the prediction result. Therefore, we examined one such case, shown in Fig. 4(b), where the sudden change occurs between frames 19 and 20. During this period, our sampled points stay in the correct DIA and are tightly distributed around the red target line.

2) Intention Estimation: The ROC curves of the SIMP and the SVM methods are visualized in Fig. 6. The curves were created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. Similar to [17], we defined two positive classes (lane change left and right) and one negative class (lane keeping). The area under the ROC curve (AUC) can be used as an aggregated measure of classifier performance. A true positive (TP) represents a correct prediction of either lane change left or right, a false positive (FP) indicates a mispredicted lane change direction, and a false negative (FN) means incorrectly predicting a lane change as lane keeping.
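A generic binary ROC/AUC computation is sketched below for reference. It uses the standard one-vs-rest definitions (scores are the predicted probability of the positive class) rather than the specific TP/FP/FN assignment above, in which a false positive is a wrong lane change direction; treat it as an illustration of the curve construction only.

```python
def roc_curve(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores.

    scores: predicted probability of the positive class; labels: 1 positive, 0 negative.
    Assumes both classes are present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))

pts = roc_curve([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
print(pts, auc(pts))
```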

From Fig. 6 and the AUC values, we observe that our method outperforms SVM for lane change maneuvers. A classification threshold of 0.3 was chosen to make the best trade-off between a high TPR and a low FPR. Given the selected threshold, we can further calculate the precision and recall

$$\text{precision} = \frac{TP}{TP+FP}, \qquad \text{recall} = \frac{TP}{TP+FN}, \qquad (8)$$

and the F1 score can be obtained by the formula

$$F_1 = \frac{2\cdot\text{precision}\cdot\text{recall}}{\text{precision}+\text{recall}}, \qquad (9)$$

which summarizes the classification ability as the harmonic mean of precision and recall.
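Equations (8) and (9) translate directly into code. As a quick consistency check, computing (9) from the rounded SVM precision and recall in Table II gives an F1 score of 0.888, matching the reported value.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, eq. (9)."""
    return 2 * precision * recall / (precision + recall)

def precision_recall_f1(tp, fp, fn):
    """Precision and recall from (8) and their F1 score from (9)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_score(precision, recall)

print(round(f1_score(0.859, 0.919), 3))  # 0.888, the SVM row of Table II
```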

Moreover, how early a lane change can be recognized is also a focus of our interest. Thus, we calculated the average prediction time over the testing data that were classified as true positives. The overall performance of the two methods is compared in Table II. The table shows that the proposed method performs better than SVM in terms of both prediction accuracy and average prediction time.

Since, according to Table II, our method can correctly forecast the predicted vehicle's intention approximately 2 s before the actual lane change, we further plotted the ROC curve and calculated the AUC for each dynamic insertion area (DIA) to examine how well SIMP can predict the final insertion region. The obtained AUC values for Area1, Area2, and Area3 are all equal to 1, and Area4 has an AUC value of 0.994. The result implies that the proposed method can detect not only the lane change direction but also the specific dynamic insertion area (DIA) with high accuracy for the selected time window.

TABLE II
PERFORMANCE COMPARISON

Method   Precision   Recall   F1-Score   Avg. Predict Time (s)
SVM      0.859       0.919    0.888      1.911
SIMP     0.936       0.925    0.931      1.957

3) Motion Prediction: The comparison results between

QRF and the proposed method for two motion prediction

tasks are shown in Fig. 7 and Fig. 8. The mean for QRF

was obtained by calculating the 50% quantile (or median)

Fig. 5. TTLC illustration for the case in Fig. 4(a). We sampled 100 points

from the mixture distribution of each related DIA and plotted the mean as

well as the 3σand 1σprediction intervals for these samples. When area

weight is too small to be associated with sampled points, the TTLC result

of that area at the corresponding frame will be colored in gray.

Fig. 6. ROC curve comparison

assuming symmetric distribution. We utilized the testing data

that has a TTLC smaller than the average prediction time

derived in the previous section. The mean and conﬁdence

interval were calculated from the obtained output distribution

of the correct insertion area. As can be seen in the plots,

the RMSE of our approach for both motion predictions are

smaller compared with the QRF method. The RMSE error

of the TTLC tends towards zero for the lane change cases

by using the SIMP method. One thing need to mention is

that the error for the destination prediction is not close to

zero even at t= 0. However, this is not unexpected given

the fact that the predicted distance is relative to the reference

car. Thus, the results might deviate due to the consideration

of any velocity variance of the reference vehicle.

For the confidence interval comparison, our proposed method clearly outperforms QRF, especially for the TTLC prediction, where the 2σ interval of the SIMP method is even narrower than the 68% interval of QRF. The gradually decreasing difference between the 1σ and 2σ intervals, together with the shrinking interval widths, implies that our predicted Gaussian distribution becomes more concentrated around the ground truth as the TTLC approaches zero.

Fig. 7. Comparison of Time-to-Lane-Change (TTLC) Prediction

Fig. 8. Comparison of Destination Prediction
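The comparison above sets SIMP's Gaussian σ intervals against QRF's quantile-based intervals, and the QRF mean is taken as the median under a symmetry assumption. For a Gaussian, the 68% interval corresponds to about ±1σ (the 15.87% and 84.13% quantiles) and ±2σ covers about 95.45%, so a 2σ interval narrower than another model's 68% interval indicates a much smaller spread. The sketch below checks these correspondences with Python's `statistics.NormalDist` and forms an empirical central interval from samples, analogous to a quantile-based prediction interval; the sample data and function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, median


def gaussian_coverage(k: float) -> float:
    """Probability mass of a Gaussian within +/- k standard deviations of its mean."""
    std_normal = NormalDist()  # mean 0, sigma 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


def empirical_quantile(sorted_samples, p: float) -> float:
    """Linearly interpolated p-quantile of already-sorted samples."""
    pos = p * (len(sorted_samples) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_samples[lo] + frac * (sorted_samples[hi] - sorted_samples[lo])


def central_interval(samples, coverage: float):
    """Quantile-based central interval holding `coverage` of the samples."""
    s = sorted(samples)
    tail = (1.0 - coverage) / 2.0
    return empirical_quantile(s, tail), empirical_quantile(s, 1.0 - tail)


if __name__ == "__main__":
    print(f"P(|z| < 1) = {gaussian_coverage(1.0):.4f}")  # ~68% interval
    print(f"P(|z| < 2) = {gaussian_coverage(2.0):.4f}")  # ~95% interval

    rng = random.Random(0)
    # Hypothetical TTLC samples (seconds) from a symmetric predictive distribution.
    samples = [rng.gauss(2.0, 0.5) for _ in range(20000)]
    lo, hi = central_interval(samples, gaussian_coverage(1.0))
    print(f"68% quantile interval: [{lo:.2f}, {hi:.2f}] (theoretical mean +/- 1 sigma: [1.50, 2.50])")
    print(f"mean {fmean(samples):.3f} vs. median {median(samples):.3f}")
```

For a symmetric distribution the mean and median coincide, which is the assumption behind using the QRF median as its mean; for a skewed predictive distribution the two would differ.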

V. CONCLUSIONS

In this paper, a Semantic-based Intention and Motion Prediction (SIMP) method was proposed, which can generate various designated conditional distributions for predicted vehicles under any circumstances. An exemplar highway scenario with real-world data was used to apply the idea of SIMP. First, two representative driving cases were used to visualize the testing results. Then the intention prediction and motion prediction parts were compared separately with two baseline models, SVM and QRF, respectively. Our approach outperforms these baselines in terms of both prediction error and confidence intervals. The key conclusion is that by combining different prediction tasks in a single framework using semantics, we can not only easily generalize the idea to any traffic scenario but also obtain competitive performance compared with traditional methods. The output goal position and time information can further be used to generate optimal trajectories for predicted vehicles and, eventually, a desirable path for our own autonomous vehicle. For future work, we will examine the SIMP method in more complex scenarios and take vehicle occlusion into account.
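The conclusion notes that the predicted goal position and time can be used to generate trajectories. As one illustration, and not the method of this paper, a quintic polynomial connects a current state to a goal state in a fixed time; it is the minimum-jerk solution when position, velocity, and acceleration are fixed at both ends. The sketch below assumes the predicted time is used directly as the maneuver duration and that the goal velocity and acceleration are known; the `BoundaryState` type, function names, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BoundaryState:
    """Position, velocity, and acceleration along one axis (hypothetical type)."""
    position: float
    velocity: float
    acceleration: float


def quintic_coefficients(start: BoundaryState, goal: BoundaryState, duration: float):
    """Coefficients a0..a5 of x(t) = sum(a_i * t**i) matching both boundary states."""
    T = duration
    a0, a1, a2 = start.position, start.velocity, 0.5 * start.acceleration
    # What the t^3..t^5 terms must still add at t = T (position, velocity, acceleration).
    c0 = goal.position - (a0 + a1 * T + a2 * T ** 2)
    c1 = goal.velocity - (a1 + 2 * a2 * T)
    c2 = goal.acceleration - 2 * a2
    a3 = (10 * c0 - 4 * c1 * T + 0.5 * c2 * T ** 2) / T ** 3
    a4 = (-15 * c0 + 7 * c1 * T - c2 * T ** 2) / T ** 4
    a5 = (6 * c0 - 3 * c1 * T + 0.5 * c2 * T ** 2) / T ** 5
    return [a0, a1, a2, a3, a4, a5]


def position(coeffs, t: float) -> float:
    """Evaluate the polynomial at time t."""
    return sum(a * t ** i for i, a in enumerate(coeffs))


def velocity(coeffs, t: float) -> float:
    """Evaluate the polynomial's first derivative at time t."""
    return sum(i * a * t ** (i - 1) for i, a in enumerate(coeffs) if i > 0)


if __name__ == "__main__":
    # Hypothetical lane change: lateral offset 0 m -> 3.7 m, laterally at rest at both ends,
    # with a predicted time of 4.0 s used as the maneuver duration.
    start = BoundaryState(0.0, 0.0, 0.0)
    goal = BoundaryState(3.7, 0.0, 0.0)
    lateral = quintic_coefficients(start, goal, 4.0)
    for step in range(9):
        t = 0.5 * step
        print(f"t = {t:.1f} s, lateral offset = {position(lateral, t):.3f} m")
```

The same function gives a longitudinal profile when the predicted destination and a goal velocity are supplied; whether the result is "optimal" depends on the planner's cost, here the integrated squared jerk.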

REFERENCES

[1] H. M. Mandalia and D. D. Salvucci, “Using support vector machines for lane change detection,” in Proc. of the Human Factors and Ergonomics Society 49th Annual Meeting, 2005.

[2] J. C. McCall, D. P. Wipf, M. M. Trivedi, and B. D. Rao, “Lane change intent analysis using robust operators and sparse Bayesian learning,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 3, pp. 431-440, 2007.

[3] P. Kumar, M. Perrollaz, S. Lefèvre, and C. Laugier, “Learning-based approach for online lane change intention prediction,” in 2013 IEEE Intelligent Vehicles Symposium (IV), Jun. 2013, pp. 797-802.

[4] S. Yoon and D. Kum, “The multilayer perceptron approach to lateral motion prediction of surrounding vehicles for autonomous vehicles,” in 2016 IEEE Intelligent Vehicles Symposium (IV), Jun. 2016, pp. 1307-1312.

[5] T. Streubel and K. H. Hoffmann, “Prediction of driver intended path at intersections,” in 2014 IEEE Intelligent Vehicles Symposium (IV), Jun. 2014, pp. 1189-1194.

[6] D. J. Phillips, T. A. Wheeler, and M. J. Kochenderfer, “Generalizable intention prediction of human drivers at intersections,” in 2017 IEEE Intelligent Vehicles Symposium (IV), Jun. 2017, pp. 1665-1670.

[7] M. Liebner, M. Baumann, F. Klanner, and C. Stiller, “Driver intent inference at urban intersections using the intelligent driver model,” in 2012 IEEE Intelligent Vehicles Symposium (IV), Jun. 2012, pp. 1162-1167.

[8] S. Klingelschmitt, V. Willert, and J. Eggert, “Probabilistic, discriminative maneuver estimation in generic traffic scenes using pairwise probability coupling,” in 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), pp. 1269-1276.

[9] S. Shalev-Shwartz, S. Shammah, and A. Shashua, “On a formal model of safe and scalable self-driving cars,” arXiv:1708.06374, 2017.

[10] S. Hoermann, D. Stumper, and K. Dietmayer, “Probabilistic long-term prediction for autonomous vehicles,” in 2017 IEEE Intelligent Vehicles Symposium (IV), Jun. 2017.

[11] T. Gindele, S. Brechtel, and R. Dillmann, “A probabilistic model for estimating driver behaviors and vehicle trajectories in traffic environments,” in 2010 IEEE International Conference on Intelligent Transportation Systems (ITSC), pp. 1625-1631.

[12] F. Altché and A. De La Fortelle, “An LSTM network for highway trajectory prediction,” in 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC): Workshop. IEEE, 2017.

[13] D. Lenz, F. Diehl, M. T. Le, and A. Knoll, “Deep neural networks for Markovian interactive scene prediction in highway scenarios,” in 2017 IEEE Intelligent Vehicles Symposium (IV), Jun. 2017, pp. 685-692.

[14] J. Wiest, M. Höffken, U. Kreßel, and K. Dietmayer, “Probabilistic trajectory prediction with Gaussian mixture models,” in 2012 IEEE Intelligent Vehicles Symposium (IV), Jun. 2012, pp. 141-146.

[15] B. D. Ziebart, N. Ratliff, G. Gallagher, C. Mertz, K. Peterson, J. A. Bagnell, M. Hebert, A. K. Dey, and S. Srinivasa, “Planning-based prediction for pedestrians,” in IROS, 2009.

[16] E. Rehder and H. Kloeden, “Goal-directed pedestrian prediction,” in Proc. of the 2015 IEEE International Conference on Computer Vision Workshop, pp. 139-147, 2015.

[17] H. Q. Dang, J. Fürnkranz, A. Biedermann, and M. Hoepfl, “Time-to-lane-change prediction with deep learning,” in 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2017.

[18] C. Wissing, T. Nattermann, K. H. Glander, and T. Bertram, “Probabilistic time-to-lane-change prediction on highways,” in 2017 IEEE Intelligent Vehicles Symposium (IV), Jun. 2017, pp. 1452-1457.

[19] C. M. Bishop, “Mixture density networks,” 1994.

[20] U.S. Department of Transportation Intelligent Transportation Systems Joint Program Office (JPO). Available: https://www.its.dot.gov/data/

[21] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.

[22] N. Meinshausen, “Quantile regression forests,” Journal of Machine Learning Research, vol. 7, pp. 983-999, 2006.

[23] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
