Probabilistic Prediction of Vehicle Semantic Intention and Motion
Yeping Hu, Wei Zhan and Masayoshi Tomizuka
Abstract: Accurately predicting the possible behaviors of traffic participants is an essential capability for future autonomous vehicles. Most current studies fix the number of driving intentions by considering only a specific scenario. However, distinct driving environments usually contain various possible driving maneuvers. Therefore, an intention prediction method that can adapt to different traffic scenarios is needed. To further improve the overall vehicle prediction performance, motion information is usually incorporated with classified intentions. As suggested in the literature, methods that directly predict possible goal locations can achieve better performance for long-term motion prediction than other approaches because they automatically incorporate environment constraints. Moreover, by obtaining the temporal information of the predicted destinations, the optimal trajectories for predicted vehicles as well as the desirable path for the ego autonomous vehicle can be readily generated. In this paper, we propose a Semantic-based Intention and Motion Prediction (SIMP) method, which can be adapted to any driving scenario by using semantically defined vehicle behaviors. It utilizes a probabilistic framework based on a deep neural network to estimate the intentions, final locations, and the corresponding time information for surrounding vehicles. An exemplar real-world scenario was used to implement and examine the proposed method.
I. INTRODUCTION
Safety is the most fundamental aspect to consider for both
human drivers and autonomous vehicles. Human drivers are
capable of using past experience and intuitions to avoid
potential accidents by predicting the behaviors of other
drivers. However, some drivers have poor driving habits such
as changing lanes without using turn signals, which adds dif-
ficulties for prediction. Moreover, human drivers might easily
overlook dangerous situations due to limited concentration.
Therefore, the Advanced Driver Assistance Systems (ADAS)
should have the ability to simultaneously and accurately
anticipate future behaviors of multiple traffic participants
under various driving scenarios, which may then assure a
safe, comfortable and cooperative driving experience.
Numerous works have focused on predicting vehicle behavior, and they can be divided into two categories: intention/maneuver prediction and motion prediction. Many intention estimation problems have been solved by using classification strategies, such as Support Vector Machine (SVM) [1], Bayesian classifier [2], Hidden Markov Models (HMMs) [5], and Multilayer Perceptron (MLP) [4]. Most of these approaches were designed for only one particular scenario associated with a limited set of intentions. For example, [1]-[4] dealt with non-junction segments such as highways, which involve lane keeping (LK), lane change left (LCL), and lane change right (LCR) maneuvers, whereas [5]-[7] concentrated on junction segments such as intersections, which include left turn, right turn, and go straight maneuvers. However, for autonomous vehicles to drive through dynamically changing traffic scenes in real life, an intention prediction module that can adapt to different scenarios with various possible driving maneuvers is necessary. [8] proposed a maneuver estimation approach for generic traffic scenarios, but the classified driving maneuvers are too specific, which not only requires multiple manually selected classification thresholds but also raises problems when unclassified maneuvers occur.

Y. Hu, W. Zhan and M. Tomizuka are with the Department of Mechanical Engineering, University of California, Berkeley, CA 94720 USA [yeping hu, wzhan, tomizuka@berkeley.edu]

Fig. 1. Insertion areas (colored regions) under different driving scenarios for the predicted vehicle.
As a result, we propose to use semantics to represent the driver intention, which is defined as the intent to enter each insertion area. These areas can be the available gaps between any two vehicles on the road or the lane entrances/exits. Fig. 1 visualizes the insertion areas under distinct environments. An advantage of the semantic approach is that situations can be modeled in a unified way [9], so that varying driving scenarios have no effect on our semantically defined problem. Even for a scenario that combines all the road structures in Fig. 1, the proposed semantic definition still holds.
Motion prediction is mostly treated as a regression problem, which tries to forecast the short-term movements and long-term trajectories of vehicles. By incorporating motion prediction with intention estimation, not only the high-level behavioral information but also the future state of the predicted vehicle can be obtained. For short-term motion prediction, various approaches such as constant acceleration (CA), Intelligent Driver Model (IDM) [7], and Particle Filter (PF) [10] have been suggested. The main limitation of these works, however, is that they either considered simple cases such as car following or did not take environment information into account.

arXiv:1804.03629v1 [cs.LG] 10 Apr 2018
For future trajectory estimation, Dynamic Bayesian Networks (DBN) [11] and other regression models have been used in several studies. Methods based on artificial neural networks (ANN) are also widely applied. In [11], the authors used an LSTM to predict the vehicle trajectory in highway situations. [12] brought forward a Deep Neural Network (DNN) to obtain the lateral acceleration and longitudinal velocity. However, these approaches only predicted the most likely trajectory for the vehicle without considering uncertainties in the environment. To counter this issue, a Variational Gaussian Mixture Model (VGMM) was proposed for probabilistic long-term motion prediction [14]. Nevertheless, the method was only tested in a simulation environment, and the input contains history information over a long period of time, which is usually inaccessible in reality. There are also studies that project the prediction step of a tracking filter forward over time, but the growing uncertainties often cause future positions to end up at physically impossible locations.
In contrast, works such as [15][16] highlighted that by
predicting goal locations and assuming that agents navigate
toward those locations by following some optimal paths,
the accuracy of long-term prediction can be improved. The
main advantage of postulating destinations instead of trajec-
tories is that it allows one to represent various dynamics
and to automatically incorporate environment constraints for
unreachable regions.
Apart from obtaining the possible goals of predicted vehicles, the time required to reach those locations is also essential information, especially for the subsequent trajectory planning of the ego vehicle. Therefore, many attempts have been made to directly predict temporal information. [17] used an LSTM to forecast the time-to-lane-change (TTLC) of vehicles under highway scenarios. A recent work [18] utilized the Linear Quantile Regression (LQR) and Quantile Regression Forests (QRF) methods for the probabilistic regression task of TTLC. The authors also concluded that QRF performs better than LQR.
In this paper, a Semantic-based Intention and Motion Prediction (SIMP) method is proposed. It utilizes a deep neural network to formulate a probabilistic framework that can predict the possible semantic intention and motion of the selected vehicle under various driving scenarios. The introduced semantics for this prediction problem is defined as answering the question "Which area will the predicted vehicle most likely insert into? Where and when?", which incorporates both the goal position and the time information into each insertion area. Moreover, the adoption of probability can take into account the uncertainty of drivers as well as the evolution of the traffic situations.
The remainder of the paper is organized as follows: Sec-
tion II provides the concept of the proposed SIMP method;
Section III discusses an exemplar scenario to apply SIMP;
evaluations and results are provided in Section IV; and
Section V concludes the paper.
II. CONCEPT OF SEMANTIC-BASED INTENTION AND MOTION PREDICTION (SIMP)
In this section, we first provide a brief overview of Mixture
Density Network (MDN), which is an idea we utilize for
our proposed method. Then, the detailed formulation and
structure of the SIMP method are illustrated.
A. Mixture Density Network (MDN)
A Mixture Density Network is a combination of an ANN and a mixture density model, and was first introduced by Bishop [19]. The mixture density model can be used to estimate the underlying distribution of data, typically by assuming that each data point has some probability under a certain type of distribution. A mixture model gives enough flexibility to model a completely general conditional density function $p(y \mid x)$, where $x$ is a set of input features and $y$ is a set of outputs. The probability density of the target data is then represented as a linear combination of kernel functions in the form

$$p(y \mid x) = \sum_{m=1}^{M} \alpha_m(x)\, \phi_m(y \mid x), \tag{1}$$

where $M$ denotes the total number of mixture components and the parameter $\alpha_m(x)$ denotes the $m$-th mixing coefficient of the corresponding kernel function $\phi_m(y \mid x)$. Although various choices for the kernel function are possible, in this paper we utilize the Gaussian kernel of the form

$$\phi_m(y \mid x) = \mathcal{N}\big(y \mid \mu_m(x), \sigma_m^2(x)\big). \tag{2}$$
Such a formulation is called the Gaussian Mixture Model (GMM)-based MDN, where an MDN maps the input $x$ to the parameters of the GMM (mixing coefficient $\alpha_m$, mean $\mu_m$, and variance $\sigma_m^2$), which in turn gives a full probability density function of the output $y$. It is important to note that the parameters of the GMM need to satisfy specific conditions in order to be valid: the mixing coefficients $\alpha_m$ should be positive and sum to 1, and the standard deviation $\sigma_m$ should be positive. The use of the softmax function and the exponential operator in (3) fulfills these constraints. No extra condition is needed for the mean $\mu_m$.

$$\alpha_m = \frac{\exp(z_m^{\alpha})}{\sum_{i=1}^{M} \exp(z_i^{\alpha})}, \qquad \sigma_m = \exp(z_m^{\sigma}), \qquad \mu_m = z_m^{\mu} \tag{3}$$

The parameters $z_m^{\alpha}$, $z_m^{\sigma}$, and $z_m^{\mu}$ are the direct outputs of the MDN corresponding to the mixture weight, standard deviation, and mean of the $m$-th Gaussian component in the GMM.
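As a minimal sketch of the mapping in (3), the following Python function turns raw network outputs into valid GMM parameters. The function name `mdn_parameters` and the example values are illustrative assumptions, not part of the original method description; subtracting the maximum logit before the softmax is a common numerical-stability step that leaves the result unchanged.

```python
import math

def mdn_parameters(z_alpha, z_sigma, z_mu):
    """Map raw network outputs to GMM parameters as in Eq. (3).

    z_alpha, z_sigma, z_mu: lists of length M (one entry per component).
    """
    # Softmax over the mixing logits: positive values that sum to 1.
    shift = max(z_alpha)  # stability shift; does not change the softmax
    exps = [math.exp(z - shift) for z in z_alpha]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # The exponential keeps every standard deviation strictly positive.
    sigma = [math.exp(z) for z in z_sigma]
    # The means are unconstrained and pass through unchanged.
    mu = list(z_mu)
    return alpha, sigma, mu

# Hypothetical raw outputs for M = 3 components.
alpha, sigma, mu = mdn_parameters([0.0, 1.0, -1.0], [-0.5, 0.0, 0.3], [2.0, -1.0, 0.5])
```

Whatever values the network emits, the returned mixing coefficients and standard deviations satisfy the validity conditions stated above.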
The objective of training the MDN is to minimize the negative log-likelihood, used as the loss function

$$\mathrm{Loss} = -\sum_{n} \log \sum_{m=1}^{M} \alpha_m^{n}(x^{n})\, \phi_m(y^{n} \mid x^{n}), \tag{4}$$

where $n$ indexes the training samples. The detailed derivation of the closed-form gradients can be found in [19], which demonstrated that the MDN can be trained using backpropagation.
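The negative log-likelihood in (4) can be evaluated directly once the mixture parameters for each training sample are known. The sketch below assumes one-dimensional Gaussian kernels as in (2); the function names and the sample layout (a tuple of target, mixing coefficients, means, and standard deviations) are hypothetical choices made only for illustration.

```python
import math

def gaussian_pdf(y, mu, sigma):
    """Density of a 1D Gaussian N(y | mu, sigma^2)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mdn_negative_log_likelihood(samples):
    """Sum over samples of -log(sum_m alpha_m * phi_m(y | x)).

    samples: iterable of (y, alpha, mu, sigma), where alpha, mu and sigma
    are per-component lists already produced by the network for that sample.
    """
    loss = 0.0
    for y, alpha, mu, sigma in samples:
        likelihood = sum(a * gaussian_pdf(y, m, s)
                         for a, m, s in zip(alpha, mu, sigma))
        loss -= math.log(likelihood)
    return loss

# A mixture centred on the target scores a lower loss than one far from it.
good_fit = mdn_negative_log_likelihood([(0.0, [1.0], [0.0], [1.0])])
poor_fit = mdn_negative_log_likelihood([(0.0, [1.0], [3.0], [1.0])])
```

In practice a log-sum-exp formulation is usually preferred to avoid underflow when the likelihood is very small; the direct form is kept here for readability.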
B. Proposed SIMP Method
Our task is to generate probability distributions of the designed semantic description given some representation of the current state. We assign a Gaussian Mixture Model (GMM) to each insertion area, so multiple GMMs are involved in one driving scenario. Each Gaussian mixture models the probability distribution of a certain type of motion for the predicted vehicle. Since the insertion location and the arrival time are our focus, a 2D Gaussian mixture is used and the predicted variables are constructed as a two-dimensional vector $y = [y_s, y_t]^T$. The variable $y_s$, describing the vehicle location, and the variable $y_t$, describing the time information, can be defined specifically according to the driving environment.
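Given the current state features $x$, the probability distribution of $y_a$ over a single area $a$ for the predicted vehicle is of the form

$$f(y_a \mid x) = \sum_{m=1}^{M} \alpha_m\, \mathcal{N}(y_a \mid \mu_m, \Sigma_m) \tag{5}$$

with mean and covariance constructed as

$$\mu_m = \begin{bmatrix} \mu_{s,m} \\ \mu_{t,m} \end{bmatrix}, \qquad \Sigma_m = \begin{bmatrix} \sigma_{s,m}^2 & \rho_m \sigma_{s,m} \sigma_{t,m} \\ \rho_m \sigma_{s,m} \sigma_{t,m} & \sigma_{t,m}^2 \end{bmatrix}, \tag{6}$$

where $\rho_m \in [-1, 1]$ is the correlation coefficient.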
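To make the per-area density concrete, the sketch below evaluates a 2D Gaussian mixture built from the six parameters per component (mixing coefficient, two means, two standard deviations, and correlation). It uses the standard closed form of the bivariate normal density and assumes $|\rho_m| < 1$ strictly, since the density is degenerate at the endpoints. The dictionary keys and function names are illustrative assumptions.

```python
import math

def bivariate_normal_pdf(ys, yt, mu_s, mu_t, sigma_s, sigma_t, rho):
    """Density of a 2D Gaussian with the mean and covariance of Eq. (6).

    Assumes -1 < rho < 1; the covariance is singular at rho = +/-1.
    """
    zs = (ys - mu_s) / sigma_s
    zt = (yt - mu_t) / sigma_t
    one_minus_rho2 = 1.0 - rho * rho
    exponent = -(zs * zs - 2.0 * rho * zs * zt + zt * zt) / (2.0 * one_minus_rho2)
    norm = 2.0 * math.pi * sigma_s * sigma_t * math.sqrt(one_minus_rho2)
    return math.exp(exponent) / norm

def area_density(ys, yt, components):
    """Mixture density f(y_a | x) of Eq. (5) for one insertion area.

    components: list of dicts with keys alpha, mu_s, mu_t, sigma_s,
    sigma_t, rho (hypothetical layout chosen for this sketch).
    """
    return sum(
        c["alpha"] * bivariate_normal_pdf(
            ys, yt, c["mu_s"], c["mu_t"], c["sigma_s"], c["sigma_t"], c["rho"])
        for c in components
    )

unit = {"alpha": 1.0, "mu_s": 0.0, "mu_t": 0.0,
        "sigma_s": 1.0, "sigma_t": 1.0, "rho": 0.0}
peak = area_density(0.0, 0.0, [unit])  # equals 1 / (2 * pi)
```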
In addition to formulating a regression model for each insertion area, we also require the probability of the predicted vehicle entering each area. Therefore, a Deep Neural Network (DNN) is used as the basis for our Semantic-based Intention and Motion Prediction (SIMP) structure. The output of the network contains both the necessary parameters for every 2D Gaussian Mixture Model (GMM) and the weight $w_a$ for each insertion area $a$.
For the desired outputs, we expect not only the largest weight to be associated with the actual inserted area, but also the highest probability at the correct location and time for the output distribution of that area. Consequently, we define our loss function as

$$L = -W_1 \sum_{n} \log\left( \sum_{a=1}^{N_a} \hat{w}_a^{n}\, f(y_a^{n} \mid x) \right) + W_2 \sum_{n} \left( -\sum_{a=1}^{N_a} \hat{w}_a^{n} \log(w_a^{n}) \right), \tag{7}$$

where $N_a$ denotes the total number of insertion areas and $\hat{w}_a$ denotes the ground truth, which is the one-hot encoding of the final area that the predicted vehicle entered. The first term is the negative log-likelihood of the ground-truth area's distribution, and the last term is the cross-entropy loss of the area weights. Parameters $W_1$ and $W_2$ need to be manually tuned such that the two loss components have the same order of magnitude during training.
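The two-part loss described above (a negative log-likelihood term for the ground-truth area's distribution plus a cross-entropy term for the area weights) can be sketched as follows. This is an illustrative reading rather than the authors' implementation: the batch layout, the function name `simp_loss`, and the default values of `w1` and `w2` are assumptions, and the per-area densities are taken as already evaluated at the labelled location and time.

```python
import math

def simp_loss(batch, w1=1.0, w2=1.0):
    """Weighted sum of a likelihood term and a cross-entropy term.

    batch: iterable of (one_hot, area_weights, area_densities), where
      one_hot        - ground-truth area as a one-hot list of length N_a
      area_weights   - predicted weights w_a (positive, summing to 1)
      area_densities - f(y_a | x) evaluated at the ground-truth label
    Assumes the density and weight of the true area are strictly positive.
    """
    nll = 0.0
    cross_entropy = 0.0
    for one_hot, weights, densities in batch:
        # Only the ground-truth area contributes because of the one-hot mask.
        nll -= math.log(sum(h * f for h, f in zip(one_hot, densities)))
        cross_entropy -= sum(h * math.log(w)
                             for h, w in zip(one_hot, weights) if h > 0)
    return w1 * nll + w2 * cross_entropy

# One sample whose true area is the second of three.
sample = ([0, 1, 0], [0.2, 0.5, 0.3], [0.1, 1.0, 0.4])
loss_equal = simp_loss([sample])
loss_weighted = simp_loss([sample], w1=1.0, w2=2.0)
```

Scaling `w2` changes only the cross-entropy contribution, which is the knob the text refers to when balancing the two components.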
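Fig. 2. Structure of the SIMP Method. (Diagram labels: Input $x$; parameter sets $P_1, \dots, P_{N_a}$; Various Functions; outputs $w_1, f(y_{s_1}, y_{t_1} \mid x), \dots, w_{N_a}, f(y_{s_{N_a}}, y_{t_{N_a}} \mid x)$.)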
The overall architecture of our SIMP method is shown in Fig. 2. Due to the first-order Markov assumption, the input features depend only on the current time step. The network consists of an input layer, several fully connected layers, and a dropout layer, which improves generalization and reduces overfitting to the training data. After the different types of parameters are passed through their corresponding functions, the output satisfies the aforementioned constraints. For $N_a$ insertion areas, the total number of output parameters is $N_a \times (M \times 6 + 1)$: there is one weight parameter $w_a$ associated with each area $a \in \{1, \dots, N_a\}$, and for every mixture component $m \in \{1, \dots, M\}$ within an area, six parameters, $P_a^m = \{\alpha_m, \mu_{s,m}, \mu_{t,m}, \sigma_{s,m}, \sigma_{t,m}, \rho_m\}$, are needed to formulate the 2D GMM.
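The parameter count can be checked with a few lines of Python. The helper that splits a flat output vector is only a sketch: the ordering it uses (all area weights first, then six values per component, area by area) is an assumption made for illustration, since the text does not specify how the outputs are laid out.

```python
def simp_output_size(num_areas, num_components):
    """Total number of network outputs: N_a * (6 * M + 1)."""
    return num_areas * (6 * num_components + 1)

def split_outputs(flat, num_areas, num_components):
    """Split a flat output vector into area weights and GMM parameters.

    Assumed (hypothetical) layout: N_a area weights, followed by six raw
    values per component for area 1, then area 2, and so on.
    """
    expected = simp_output_size(num_areas, num_components)
    if len(flat) != expected:
        raise ValueError(f"expected {expected} outputs, got {len(flat)}")
    weights = list(flat[:num_areas])
    areas = []
    offset = num_areas
    for _ in range(num_areas):
        components = []
        for _ in range(num_components):
            components.append(tuple(flat[offset:offset + 6]))
            offset += 6
        areas.append(components)
    return weights, areas

# Five insertion areas with one component each give 35 outputs.
size = simp_output_size(5, 1)
weights, areas = split_outputs(list(range(size)), 5, 1)
```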
III. AN EXEMPLAR HIGHWAY SCENARIO
In this section, we use an exemplar highway scenario to
apply the proposed Semantic-based Intention and Motion
Prediction (SIMP) method. The data source and detailed
problem formulation are presented.
A. Dataset
All the data we used was taken from the NGSIM US 101
dataset which is publicly available online at [20]. It contains
detailed vehicle trajectory data collected on the highway
with 10 Hz sampling frequency. The measurement area is
approximately 640 meters (2100 feet) in length and there are
five freeway lanes plus an auxiliary lane for the on/off-ramp.
For each vehicle that performs a lane change maneuver, we picked up to 40 frames (4 s) before the vehicle's center intersects the lane mark; for vehicles that keep driving in the same lane for a long period, we used those frames as input for the lane keeping maneuver. A total of 17,179 frames were selected from the dataset and split into 80% for training and 20% for testing.
B. Scenario and Problem Description
A representation of the exemplar highway driving scenario is shown in Fig. 3. The yellow car is the vehicle we decide to predict; the three blue cars (car2, car4, and car6) are the reference vehicles, which are selected as having the closest Euclidean distance to the predicted vehicle on each of the three lanes (we consider only the front vehicle on the predicted vehicle's lane); the four gray cars (car1, car3, car5, and car7) are named 'other vehicles', which are the vehicles in front of and behind each of the two reference cars car2 and car6. If any of these surrounding vehicles is too far from the predicted vehicle, we consider it nonexistent within the range of the current scenario. Therefore, for each input frame, a maximum of three driving lanes and seven vehicles are considered.

Fig. 3. An exemplar driving scenario (insertion areas 1-5; vehicles car1-car7)
In Fig. 3, there are five circled areas that our predicted vehicle could end up entering, and we name them Dynamic Insertion Areas (DIAs). If the predicted vehicle (yellow car) inserts into areas 1-4, a lane change behavior is indicated; if it inserts into area 5, a lane keeping behavior is implied. These areas are dynamic because both their locations and sizes vary at each time step.
In this particular highway scenario setting, the output $y_s$ represents the absolute distance between the final insertion point and the corresponding reference vehicle for the inserted area; $y_t$ represents the time-to-lane-change (TTLC) of the predicted vehicle. When the center of the vehicle intersects the lane mark, TTLC = 0. For the lane keeping situation, TTLC is set to a large number (4 s) to represent that the vehicle has not yet decided to change lanes.
C. Features and Structure Details
For each input frame, a total of 25 input features are selected, which are listed in Table I. Each input frame corresponds to three types of labels extracted from the data: area weight, final goal location, and remaining insertion time. In the data, the longitudinal direction is the driving direction. The current lane center (CLC) denotes the midpoint of the current lane. Because the angle difference between the front vehicle and the predicted vehicle is small, relative angle information is considered only for the left and right reference vehicles. Time-to-collision (TTC) is calculated by dividing the relative distance between two vehicles by their speed difference. We compute the inverse of time-to-collision (iTTC) instead, because the TTC value becomes infinite as the speed difference gets close to zero.
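The reason for preferring iTTC can be seen in a short sketch. TTC is the gap divided by the closing speed, so it is undefined when both vehicles travel at the same speed, whereas the inverse is simply zero. The function name, argument names, and sign convention (positive when the rear vehicle is closing in) are assumptions for illustration; NGSIM distances are in feet, but any consistent unit works.

```python
def inverse_ttc(gap, rear_speed, front_speed):
    """Inverse time-to-collision in 1/s.

    gap: positive longitudinal distance between the two vehicles.
    rear_speed, front_speed: longitudinal speeds in distance units per second.
    Positive result means the rear vehicle is closing in on the front one.
    """
    if gap <= 0:
        raise ValueError("gap must be positive")
    return (rear_speed - front_speed) / gap

closing = inverse_ttc(gap=20.0, rear_speed=30.0, front_speed=25.0)     # TTC would be 4 s
same_speed = inverse_ttc(gap=20.0, rear_speed=30.0, front_speed=30.0)  # TTC would be infinite
```

With equal speeds the feature stays finite at zero, which keeps it usable as a network input.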
As mentioned previously, there is a maximum of seven cars within each input frame. If a vehicle does not exist, we set its longitudinal distance to a large number and its velocity to that of the predicted vehicle. If there is no available lane on one side of the predicted vehicle, we set the three vehicles in that nonexistent lane to be close to each other and the reference vehicle to be directly above/below the predicted vehicle. Similarly, all three of these vehicles are set to have the same speed as the predicted vehicle. Such settings guarantee the feasibility of the predicted results.
As for the network structure, we use three fully connected layers of 400 neurons each, with tanh non-linear activation functions. After that, a dropout layer with rate 0.5 is appended. The parameter $N_a$ is five for this particular scenario.
TABLE I
FEATURES FOR ONE INPUT FRAME

| Vehicle group | Feature | Description |
|---|---|---|
| Predicted Vehicle | $v^{y}_{pred}$ | Absolute velocity in longitudinal direction |
| Predicted Vehicle | $d^{x}_{CLC,pred}$ | Lateral distance to the current lane center |
| Reference Vehicles | $v^{y}_{ref}$ | Absolute velocity in longitudinal direction |
| Reference Vehicles | $d^{y}_{ref,pred}$ | Position in longitudinal direction, relative to predicted vehicle |
| Reference Vehicles | $d^{x}_{(l,r),pred}$ | Relative lateral position between left/right reference vehicle and predicted vehicle |
| Reference Vehicles | $\theta_{(l,r),pred}$ | Relative angle between left/right reference vehicle and predicted vehicle |
| Reference Vehicles | $iTTC_{f,pred}$ | Inverse time-to-collision between front reference vehicle and predicted vehicle |
| Other Vehicles | $v^{y}_{o}$ | Absolute velocity in longitudinal direction |
| Other Vehicles | $d^{y}_{o,pred}$ | Position in longitudinal direction, relative to predicted vehicle |
| Other Vehicles | $iTTC_{o,ref}$ | Inverse time-to-collision relative to corresponding reference vehicle |
IV. EVALUATION AND RESULTS
In this section, different evaluation techniques are pre-
sented to assess the model quality and the final results are
discussed.
A. Evaluation Setup
1) Baseline Model: To evaluate our SIMP method, we trained a Support Vector Machine (SVM) [21] and a Quantile Regression Forest (QRF) [22] separately. Since SVM is widely used for classification problems, we compared it with the intention prediction part of our framework. The QRF is a combination of Quantile Regression and Random Forests [23], which extends the concept of tree ensemble learning to probabilistic prediction. Instead of producing a point estimate of the conditional mean for the selected variables like other regression methods, the objective is to estimate an arbitrary conditional quantile. The quantiles provide detailed information about the minimum and maximum values of the dependent variable and encompass the uncertainty estimation. Hence, we compared the motion prediction part of our probabilistic framework with the QRF method. The details of the baseline models are presented below:
SVM: kernel = (Gaussian) radial basis function (RBF)
QRF: ntree = 1000, mtry = 5, nodesize = 10
where ntree is the number of trees in the forest, mtry is the number of features randomly sampled as candidates at each split, and nodesize is the minimal size of terminal nodes. All these parameters were selected using five-fold cross validation.
Fig. 4. Two example cases to visualize the performance: (a) Typical Lane Change; (b) Sudden Change of Reference Vehicle. Legend: Predicted Vehicle, Other Vehicle, Reference Vehicle, Sampled Points, Ground Truth. In each testing frame, 50 points were sampled in two steps: 1) multiply the total number of dots by each DIA weight; 2) for every dot assigned to each DIA, sample it according to the corresponding distribution of that area. (The unit of the horizontal axis is feet.)
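The two-step sampling described in the Fig. 4 caption can be sketched as follows. This is a minimal illustration under several assumptions: each area is modelled by a single 2D Gaussian component (M = 1), the per-area point counts are obtained by rounding, so they may not always add up exactly to the requested total, and all names and parameter values are hypothetical.

```python
import math
import random

def allocate_points(area_weights, total_points=50):
    """Step 1: multiply the total number of dots by each area weight."""
    return [int(round(total_points * w)) for w in area_weights]

def sample_area(params, count, rng):
    """Step 2: draw `count` (location, time) pairs from one area's 2D Gaussian.

    params: dict with mu_s, mu_t, sigma_s, sigma_t, rho (with |rho| <= 1).
    Correlated samples are built from two independent standard normals.
    """
    points = []
    for _ in range(count):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        ys = params["mu_s"] + params["sigma_s"] * z1
        yt = params["mu_t"] + params["sigma_t"] * (
            params["rho"] * z1 + math.sqrt(1.0 - params["rho"] ** 2) * z2)
        points.append((ys, yt))
    return points

def sample_frame(area_weights, area_params, total_points=50, seed=0):
    """Return one list of sampled points per insertion area."""
    rng = random.Random(seed)
    counts = allocate_points(area_weights, total_points)
    return [sample_area(p, c, rng) for p, c in zip(area_params, counts)]

# Hypothetical frame: most of the probability mass sits on the first area.
example_weights = [0.5, 0.3, 0.2, 0.0, 0.0]
example_params = [{"mu_s": 10.0, "mu_t": 2.0, "sigma_s": 1.0,
                   "sigma_t": 0.5, "rho": 0.2}] * 5
frame_samples = sample_frame(example_weights, example_params)
```

Areas with negligible weight receive no points, which matches the behaviour visible in the figure where only plausible areas contain dots.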
2) Evaluation for Intention Estimation: For training and testing, each sample from our data was assigned to a semantic intention class, which is expressed as $I \in \{area_1, area_2, area_3, area_4, area_5\}$. However, since these dynamic insertion areas (DIA) change constantly during the driving period, it is hard to detect the final insertion area at an early stage. Therefore, for better evaluation at the beginning of the input driving segments, we merged the original five semantic intentions into three, $\{LCL, LCR, LK\}$, where $area_1$ and $area_2$ map to LCL, $area_3$ and $area_4$ map to LCR, and $area_5$ maps to LK. During training, the input features for SVM were the same as for our method, and the labels were the corresponding final DIA numbers. The evaluation contains three steps:
i. For all testing data, create the Receiver Operating
Characteristic (ROC) curve to compare our method
with SVM. (Use the simplified 3 intention classes.)
ii. Find the best threshold from the ROC curve and use it
to calculate the recall, precision, F1 score as well as
the average prediction time for both methods.
iii. For testing data that has a TTLC smaller than the
obtained average prediction time, analyze the perfor-
mance of each DIA. (Use the original 5 semantic
intention classes.)
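The merge from five insertion areas to three maneuver classes used in the first two evaluation steps can be written as a small lookup. Summing the area weights within each class is an assumption made for this sketch; the text specifies only which areas belong to which class, not how their scores are combined.

```python
# Area index (1-5) to merged maneuver class, as described in the text.
AREA_TO_INTENTION = {1: "LCL", 2: "LCL", 3: "LCR", 4: "LCR", 5: "LK"}

def merge_area_weights(area_weights):
    """Combine five area weights into LCL / LCR / LK scores.

    Assumption: weights of areas in the same class are added together.
    """
    merged = {"LCL": 0.0, "LCR": 0.0, "LK": 0.0}
    for area, weight in enumerate(area_weights, start=1):
        merged[AREA_TO_INTENTION[area]] += weight
    return merged

def merge_label(area):
    """Map a ground-truth area number to its merged class label."""
    return AREA_TO_INTENTION[area]

merged_scores = merge_area_weights([0.1, 0.2, 0.3, 0.15, 0.25])
```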
3) Evaluation for Motion Prediction: In our problem setting, two semantically described motions are predicted: the final location in each insertion area (destination) and the remaining time to reach that location (TTLC). For the conditional distribution of each motion, we expect not only a small difference between the predicted mean and the actual value, but also a distribution concentrated around the predicted mean. Hence, we evaluated the root mean squared error (RMSE) of the output mean as well as the confidence interval for both the QRF and the SIMP method. The number of mixture components $M$ for each DIA was set to one for analysis purposes. For QRF, we trained two separate random forest quantile regressors, where the input features remain the same and the label is either the location or the time information.
Two different intervals were chosen to assess the testing
results for each method:
SIMP-1σ: one standard deviation interval
SIMP-2σ: two standard deviation interval
QRF-68%: 16% to 84% quantile interval
QRF-95%: 2.5% to 97.5% quantile interval.
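The pairing of these intervals follows from the Gaussian distribution, where roughly 68% of the mass lies within one standard deviation of the mean and roughly 95% within two. The sketch below shows how the RMSE and the two kinds of intervals could be computed; the function names are hypothetical, and the empirical quantile uses simple linear interpolation as a stand-in for the quantiles a QRF model would output.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted means and ground truth."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def gaussian_interval(mean, std, k=1.0):
    """Interval of k standard deviations around the mean (SIMP-1σ, SIMP-2σ)."""
    return (mean - k * std, mean + k * std)

def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(math.floor(position))
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] * (1.0 - fraction) + ordered[upper] * fraction

def quantile_interval(values, low_q, high_q):
    """Interval between two quantiles (e.g. 0.16 to 0.84 for QRF-68%)."""
    return (quantile(values, low_q), quantile(values, high_q))

two_sigma = gaussian_interval(2.0, 0.5, k=2.0)
central_68 = quantile_interval(list(range(101)), 0.16, 0.84)
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Comparing interval widths at matched coverage is what allows a statement such as one method's 2σ interval being narrower than the other's 68% interval.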
B. Results and Discussion
1) Visualization of Selected Cases: We selected two dis-
tinct traffic situations to visualize our results. Each situation
had 40 frames (4s) and we chose four representative frames
from each case to illustrate the overall performance. The full
video can be found at https://www.youtube.com/watch?v=6A3Hl-mRhbI.
A typical lane change situation is illustrated in Fig. 4(a), where the sampled points are all in the proper DIA for each frame. It is reasonable to have several possible areas at the early stage, since there are multiple choices for the driver and no specific one has been chosen yet. It should be noted that it is difficult to numerically justify the correctness of these circumstances without human-labeled ground truth. However, as soon as the driver decides where to go, our result can be compared with the label extracted from the data. We further used this case to illustrate the TTLC prediction result in Fig. 5. The differences between our predicted mean and the ground truth are all smaller than 0.3 s within three seconds before the lane change; in addition, the predicted TTLC values for the other insertion areas remain in reasonable ranges.
Since the reference vehicle may switch from one vehicle to another while the predicted vehicle is driving, we need to ensure that our method can handle such cases without large discontinuities in the prediction result. Therefore, we examined one such case, shown in Fig. 4(b), where the sudden change occurs between frames 19 and 20. During this period, our sampled points stay in the correct DIA and remain tightly distributed around the red target line.
2) Intention Estimation: The ROC curves of the SIMP
and the SVM methods are visualized in Fig. 6. The curves
were created by plotting the true positive rate (TPR) against
the false positive rate (FPR) at various threshold settings.
Similar to [17], we defined two positive classes (lane change
left and right) and one negative class (lane keeping). The area
under the ROC curve (AUC) can be used as an aggregated
measure of the classifier performance. The true positive
(TP) represents correct prediction of either lane change left
or right, the false positive (FP) indicates mispredicting the
lane change direction, and the false negative (FN) means
incorrectly predicting a lane change into lane keeping.
From Fig. 6 and the AUC values, we observe that our method outperforms SVM for lane change maneuvers. A classification threshold of 0.3 was chosen as the best trade-off between a high TPR and a low FPR. Given the selected threshold, we can further calculate the precision and recall as

$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}, \tag{8}$$

and the F1 score can be obtained by the formula

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}, \tag{9}$$

which denotes how good the classification abilities are.
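As a short worked example of (8) and (9), the functions below compute the three metrics from raw counts and the F1 score from a given precision and recall. The counts used in the example are made up for illustration; only the precision and recall plugged into `f1_from` at the end are taken from the SVM row of Table II.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1

def f1_from(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

# Illustrative counts, not taken from the paper.
example_precision, example_recall, example_f1 = precision_recall_f1(tp=90, fp=10, fn=30)

# Reported SVM precision and recall from Table II.
svm_f1 = f1_from(0.859, 0.919)
```

Rounded to three decimals, `svm_f1` agrees with the F1 score reported for SVM in Table II.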
Moreover, how early the lane change can be recognized is also of interest. Thus, we calculated the average prediction time from the testing data that were classified as true positives. The overall performance of the two methods is compared in Table II. It is apparent from the table that the proposed method performs better than SVM in terms of both prediction accuracy and average prediction time.
Since our method can correctly forecast the predicted vehicle's intention approximately 2 s in advance of the actual lane change according to Table II, we further plotted the ROC curve and calculated the AUC for each dynamic insertion area (DIA) to examine how well SIMP can predict the final insertion region. The obtained AUC values for Area1, Area2, and Area3 are all equal to 1, and Area4 has an AUC value of 0.994. The result implies that the proposed method can detect not only the lane change direction but also the specific dynamic insertion area (DIA) with high accuracy for the selected time window.
TABLE II
PERFORMANCE COMPARISON

| Method | Precision | Recall | F1-Score | Avg. Prediction Time (s) |
|---|---|---|---|---|
| SVM | 0.859 | 0.919 | 0.888 | 1.911 |
| SIMP | 0.936 | 0.925 | 0.931 | 1.957 |
3) Motion Prediction: The comparison results between QRF and the proposed method for the two motion prediction tasks are shown in Fig. 7 and Fig. 8. The mean for QRF was obtained by calculating the 50% quantile (or median), assuming a symmetric distribution. We utilized the testing data that has a TTLC smaller than the average prediction time derived in the previous section. The mean and confidence interval were calculated from the obtained output distribution of the correct insertion area. As can be seen in the plots, the RMSE of our approach for both motion predictions is smaller than that of the QRF method. The RMSE of the TTLC tends towards zero for the lane change cases when using the SIMP method. Note that the error for the destination prediction is not close to zero even at $t = 0$. However, this is not unexpected, given that the predicted distance is relative to the reference car. Thus, the results might deviate due to any velocity variation of the reference vehicle.

Fig. 5. TTLC illustration for the case in Fig. 4(a). We sampled 100 points from the mixture distribution of each related DIA and plotted the mean as well as the 3σ and 1σ prediction intervals for these samples. When the area weight is too small to be associated with sampled points, the TTLC result of that area at the corresponding frame is colored in gray.

Fig. 6. ROC curve comparison
For the confidence interval comparison, it is evident that the performance of our proposed method surpasses QRF, especially for the TTLC prediction, where the 2σ interval of the SIMP method is even smaller than the 68% interval of the QRF. The gradually decreasing difference between the one and two standard deviation intervals, as well as the declining interval values, imply that our predicted Gaussian distribution becomes more concentrated around the ground truth as the TTLC approaches zero.

Fig. 7. Comparison of Time-to-Lane-Change (TTLC) Prediction

Fig. 8. Comparison of Destination Prediction
V. CONCLUSIONS
In this paper, a Semantic-based Intention and Motion
Prediction (SIMP) method was proposed, which can generate
designated conditional distributions for predicted
vehicles in any driving scenario. An exemplar highway
scenario with real-world data was used to demonstrate
SIMP. First, two representative driving cases were used to
visualize the test results. Then the intention prediction and
motion prediction parts were compared separately with
two baseline models, SVM and QRF. Our approach
outperformed these methods in terms of both prediction
error and confidence intervals. The key conclusion is that
by combining different prediction tasks through semantics in
a single framework, we can not only generalize the
idea to any traffic scenario but also obtain competitive
performance compared with traditional methods. The output
goal position and time information can further be used
to generate optimal trajectories for the predicted vehicles and
eventually obtain a desirable path for the ego autonomous
vehicle. In future work, we will examine the SIMP method
in more complex scenarios and take vehicle occlusion
into account.
REFERENCES
[1] H. M. Mandalia and D. D. Salvucci, “Using support vector machines
for lane change detection,” in Proc. of the Human Factors and
Ergonomics Society 49th Annual Meeting, 2005.
[2] J. C. McCall, D. P. Wipf, M. M. Trivedi, and B. D. Rao, “Lane change
intent analysis using robust operators and sparse bayesian learning,”
IEEE Transactions on Intelligent Transportation Systems, vol. 8, no.
3, pp. 431-440, 2007.
[3] P. Kumar, M. Perrollaz, S. Lefèvre, and C. Laugier, “Learning-based
approach for online lane change intention prediction,” in 2013 IEEE
Intelligent Vehicles Symposium (IV), Jun. 2013, pp. 797-802.
[4] S. Yoon and D. Kum, “The multilayer perceptron approach to lateral
motion prediction of surrounding vehicles for autonomous vehicles,” in
2016 IEEE Intelligent Vehicles Symposium (IV), Jun. 2016, pp. 1307-
1312.
[5] T. Streubel and K. H. Hoffmann, “Prediction of driver intended path at
intersections,” in 2014 IEEE Intelligent Vehicles Symposium (IV), Jun.
2014, pp. 1189-1194.
[6] D. J. Phillips, T. A. Wheeler, and M. J. Kochenderfer, “Generalizable
Intention Prediction of Human Drivers at Intersections,” in 2017 IEEE
Intelligent Vehicles Symposium (IV), pp. 1665-1670, 2017.
[7] M. Liebner, M. Baumann, F. Klanner, and C. Stiller, “Driver intent
inference at urban intersections using the intelligent driver model,” in
2012 IEEE Intelligent Vehicles Symposium (IV), Jun. 2012, pp. 1162-
1167.
[8] S. Klingelschmitt, V. Willert, and J. Eggert, “Probabilistic, discrim-
inative maneuver estimation in generic traffic scenes using pairwise
probability coupling,” in 2016 IEEE 19th International Conference on
Intelligent Transportation Systems (ITSC), pp. 1269-1276.
[9] S. Shalev-Shwartz, S. Shammah, and A. Shashua, “On a formal model
of safe and scalable self-driving cars,” arXiv:1708.06374, 2017.
[10] S. Hoermann, D. Stumper, and K. Dietmayer, “Probabilistic long-term
prediction for autonomous vehicles,” in 2017 IEEE Intelligent Vehicles
Symposium (IV), Jun. 2017.
[11] T. Gindele, S. Brechtel, and R. Dillmann, “A probabilistic model
for estimating driver behaviors and vehicle trajectories in traffic
environments,” in 2010 IEEE International Conference on Intelligent
Transportation Systems (ITSC), pp. 1625-1631.
[12] F. Altché and A. De La Fortelle, “An LSTM network for highway
trajectory prediction,” in 2017 IEEE 20th International Conference on
Intelligent Transportation Systems (ITSC): Workshop. IEEE, 2017.
[13] D. Lenz, F. Diehl, M. T. Le, and A. Knoll, “Deep neural networks for
Markovian interactive scene prediction in highway scenarios,” in 2017
IEEE Intelligent Vehicles Symposium (IV), Jun. 2017, pp. 685-692.
[14] J. Wiest, M. Höffken, U. Kreßel, and K. Dietmayer, “Probabilistic
Trajectory Prediction with Gaussian Mixture Models,” in 2012 IEEE
Intelligent Vehicles Symposium (IV), Jun. 2012, pp. 141-146.
[15] B. D. Ziebart, N. Ratliff, G. Gallagher, C. Mertz, K. Peterson, J. A.
Bagnell, M. Hebert, A. K. Dey, and S. Srinivasa, “Planning-based
prediction for pedestrians,” in IROS, 2009.
[16] E. Rehder and H. Kloeden, “Goal-Directed Pedestrian Prediction,”
in Proceedings of 2015 IEEE International Conference on Computer
Vision Workshop, pp. 139-147, 2015.
[17] H. Q. Dang, J. Fürnkranz, A. Biedermann, and M. Hoepfl, “Time-to-
Lane-Change Prediction with Deep Learning,” in 2017 IEEE 20th In-
ternational Conference on Intelligent Transportation Systems (ITSC).
IEEE, 2017.
[18] C. Wissing, T. Nattermann, K. H. Glander, and T. Bertram, “Prob-
abilistic time-to-lane-change prediction on highways,” in 2017 IEEE
Intelligent Vehicles Symposium (IV), Jun. 2017, pp. 1452-1457.
[19] C. M. Bishop, “Mixture density networks,” 1994.
[20] U.S. Department of Transportation Intelligent Transportation Systems
Joint Program Office (JPO). Available: https://www.its.dot.gov/data/
[21] C. Cortes and V. Vapnik, “Support-vector networks,” Machine
Learning, vol. 20, no. 3, pp. 273-297, 1995.
[22] N. Meinshausen, “Quantile regression forests,” Journal of Machine
Learning Research, vol. 7, pp. 983-999, 2006.
[23] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp.
5-32, 2001.