Modeling Traffic Incident Duration Using Quantile Regression

Traffic incidents occur frequently on urban roadways and cause incident induced congestion. Predicting incident duration is a key step in managing these events. Ordinary least squares (OLS) regression models can be estimated to relate the mean of incident duration data with its correlates. Because of the presence of larger incidents, duration distributions are often right-skewed; that is, the OLS model underpredicts the durations of larger incidents. Therefore, this study applies a modeling technique known as quantile regression to predict more accurately the skewed distribution of incident durations. Quantile regression estimates the relationships between correlates and a chosen percentile—for example, the 75th or 95th percentile—while the OLS regression is based on the mean of incident duration. With the use of incident data related to more than 85,000 (2013 to 2015) incidents for highways in the Hampton Roads area of Virginia, quantile regression results indicate that the magnitudes of parameters and predictions can be quite different compared with OLS regression. In addition to predicting durations of larger incidents more accurately, quantile regressions can estimate the probability of an incident lasting for a specific duration; for example, incidents involving congestion and delay have an approximately 25% chance of lasting more than 100.8 min, while incidents excluding congestion and delay are estimated to have a 25% chance of lasting more than 43.3 min. Such information is helpful in accurately predicting durations and developing potential applications for using quantile regressions for better traffic incident management.
Transportation Research Record: Journal of the Transportation Research Board,
No. 2554, Transportation Research Board, Washington, D.C., 2016, pp. 139–148.
DOI: 10.3141/2554-15
Traffic incidents occur frequently on roadways, resulting in con-
gestion, commuter anxiety, and harmful vehicular emissions (1–3).
One traffic incident management strategy is to disseminate accurate
incident duration information to travelers (e.g., through variable
message signs), who can then make more informed travel decisions
(4, 5). Another approach would be to actively redirect traffic in a
road network to avoid incident-induced congestion. In both cases,
accurate predictions of incident durations are required.
Incident duration is defined as the time between the occurrence
of an incident and the clearance of the roadway (6–8). Traditionally,
researchers have applied ordinary least squares (OLS) models (i.e.,
the linear regression models) to predict incident duration (9–13). By
definition, OLS models examine the (conditional) mean of incident
durations. Therefore, incidents that are much shorter or longer than
average cannot be accurately captured with OLS models. To model
those incidents, this study proposes to use quantile regression.
Quantile regression is a statistical technique that can relate quantiles
of the incident duration distribution to explanatory variables (14).
While traffic operations managers might be more interested in the
higher quantiles, that is, longer duration incidents, quantile regres-
sion, as shall be shown, is equally suitable for modeling shorter-
than-average incidents. This study discusses potential applications
of quantile regression in traffic incident management. In general,
with quantile regression, transportation professionals (e.g., traffic
operators in transportation management centers) can benefit by
accurately predicting the incident duration and potentially reducing
large-scale incident durations through appropriate solutions.
LITERATURE REVIEW
Various techniques have been reported in the literature for modeling
traffic incident duration. The techniques can be grouped into several
categories: statistical models, tree modeling, intelligence techniques,
and mixed modeling. Brief discussions of each follow.
Statistical models. Linear regression models were estimated to
provide real-time incident information to travelers. OLS regres-
sion, OLS with logarithmic transformation, and a series of truncated
regression models were targeted at skewed data distributions and
sequential availability of incident information in real time (9–13).
Partial least squares regressions were also studied (15). Traditional
negative binomial and modified negative binomial were also used
(16). Various studies have developed parametric accelerated failure
time survival models for incident durations arising from crashes and
hazards and for incidents involving stationary vehicles (17–21).
Tree modeling. Ji used decision trees to predict freeway incident
durations on the basis of the multimodal fusion algorithm (22).
Chang and Chang reported good performance of the classification
tree method for short-duration incident predictions (23).
Intelligence techniques. Neural networks were used in various
studies (24, 25). However, they have not been used to update
duration prediction information dynamically (26).
Mixed modeling. Lin et al. combined a discrete choice model and
a rule-based model for predicting incident duration (27). He et al.
used the hybrid tree-based quantile regression (28). Xiaoqiang et al.
used the classification and regression tree method (29). The classi-
fication tree, rule-based tree model, and discrete choice model were
studied sequentially by Kim et al. to improve prediction accuracy
(30). Li et al. applied topic modeling, the multinomial logistic model,
and the parametric hazard-based model (31).
Model comparison has also been of interest. Li and Shang compared
prediction models, including the classification and regression tree,
chi-squared automatic interaction detector, and exhaustive chi-squared
automatic interaction detector, on the basis of performance criteria such
as mean absolute percentage error and root mean square error (RMSE) (17).
They found that RMSE and mean absolute percentage error were relatively
low for 15- to 45-min-long durations, while for long durations, prediction
accuracy was largely decreased.
Asad J. Khattak, Jun Liu, Behram Wali, Xiaobing Li, and ManWo Ng
A. J. Khattak, 322 John D. Tickle Building; J. Liu, 325 John D. Tickle Building;
and B. Wali and X. Li, 311D John D. Tickle Building, Department of Civil and
Environmental Engineering, College of Engineering, University of Tennessee,
851 Neyland Drive, Knoxville TN 37996-2313. M. Ng, Department of Information
Technology and Decision Sciences, Strome College of Business, Old
Dominion University, Norfolk, VA 23529. Corresponding author: A. J. Khattak,
akhattak@utk.edu.
Researchers have found that the prediction accuracy for long-
duration incidents is generally lower than for short-duration incidents
(23). The benefit of predicting long durations is not as visible as for
shorter durations, as the distribution of incident durations is rather
dispersed (28). Therefore, quantile regression is chosen as the key
method able to account for the dispersed distribution of responses.
Quantile regression has been explored by researchers in various fields.
Machado and Silva successfully applied quantile regression to health
care through a jittering procedure (32). Qin et al. (33) and Qin (34)
explored the application of quantile regression on traffic crash data.
To summarize, the gaps in the existing literature on incident
prediction are related to (a) prediction accuracy of durations and
(b) the practice of using “black box” models to assist in incident
management. In regard to prediction accuracy, while previous studies
have demonstrated the application of various modeling techniques
to predict incident durations, their prediction accuracy has been a
recurring concern, owing to the skewed distribution of incident dura-
tions. Theoretically, quantile regression should provide more accu-
rate incident duration predictions since it can account for dispersed
and skewed distributions of incident durations. In regard to practice,
some researchers have developed models—such as the classification
tree model (23), classification and regression tree, and chi-squared
automatic interaction detector (35)—for predicting short versus long
durations. Their models may be good in predicting the duration of
particular types of incidents. However, these models can be black
boxes that do not provide clear intuition to users about correlations
between various factors and incident durations. The estimation of
correlations is important for incident duration prediction as it can
help develop solutions for incident management. Quantile regression
is able to estimate variations in correlates of incidents, which means
that more focused solutions that address long- and medium-duration
incidents can be developed. Such information can be very helpful for
incident management.
METHODOLOGY
Data Sources
This study used various data sources, including incident data pro-
vided by the Hampton Roads Smart Traffic Center in Virginia Beach,
Virginia. These data were collected by the Safety Service Patrol (SSP)
of the Hampton Roads area. The records cover the incidents that
occurred in the 2013 to 2015 period on freeways; the records include
the start and end times, incident duration, incident type, agencies that
responded to incidents, and so on. Other data sources used include
the road inventory data provided by the Hampton Roads Planning
District Commission.
OLS and Quantile Regression
For completeness, in this section the OLS and quantile regression
techniques to be used in the next section of this paper are briefly
reviewed. This study compares the traditional OLS model with the
quantile regression model, which is considered to be more suitable to
model the dispersed distribution of incident durations.
OLS Model
The OLS model is given by
$$y_i = \beta_0 + \sum_{j=1}^{n} \beta_j x_{ij} + \varepsilon_i \qquad (1)$$
where
yi = dependent variable, that is, duration of ith incident (min),
i = 1, 2, . . . , m;
β0 = intercept;
βj = coefficient of independent variable j, j = 1, 2, . . . , n;
xij = value of independent variables j in ith incident; and
εi = estimation error or residual for ith incident.
The error εi is assumed to be normally distributed with a mean of
zero and a finite variance. Coefficients of the independent variables
are estimated by minimizing the mean squared error criterion:
$$\sum_{i=1}^{m} \Bigl( y_i - \beta_0 - \sum_{j=1}^{n} \beta_j x_{ij} \Bigr)^2 \qquad (2)$$
The resulting least squares estimates of β0 and βj are then denoted
by $\hat{\beta}_0$ and $\hat{\beta}_j$, respectively. OLS models provide
intuitive estimations of the relationship between incident duration and
associated factors: a one-unit increase in an independent variable leads
to an increase of $\hat{\beta}_j$ in the mean incident duration, with all
other variables held constant.
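As a minimal illustration of the least squares criterion in Equation 2, the following Python sketch fits a model with a single independent variable in closed form. The function name `fit_ols_single` and the toy data are hypothetical and are not taken from the study, which uses many correlates.

```python
from statistics import mean

def fit_ols_single(x, y):
    """Closed-form OLS for y = b0 + b1 * x (one independent variable only)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx             # change in mean duration per unit of x
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1

# Toy data (made up): number of involved vehicles vs. duration in minutes.
vehicles = [0, 1, 1, 2, 3]
duration = [10.0, 20.0, 24.0, 35.0, 41.0]
b0, b1 = fit_ols_single(vehicles, duration)

# OLS residuals sum to zero when an intercept is included.
residual_sum = sum(d - (b0 + b1 * v) for v, d in zip(vehicles, duration))
```

Here `b1` plays the role of the estimated coefficient: the change in mean duration associated with one additional vehicle in the toy data.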
Quantile Regression
OLS models may be a good choice for predictions in which the mean
values are of interest. For a more complete picture of the distribution
of incident durations, quantile regression becomes more appropriate.
Particularly, rather than modeling only the average incident duration
(as in OLS regression), quantile regression can model the relationship
of any quantile with a set of explanatory variables (8).
Contrary to OLS models that minimize the mean squared error,
quantile regression minimizes a sum that gives asymmetric penalties
(1 − q)|εi| for overprediction and q|εi| for underprediction, where
q is the quantile point of the outcomes. For example, if one wants to
model the median incident duration, one would choose q = 0.5. The
prediction errors in quantile regression are given by
$$\varepsilon_i^q = y_i - \hat{\beta}_0^q - \sum_{j=1}^{n} \hat{\beta}_j^q x_{ij} \qquad (3)$$
where $\hat{\beta}_0^q$ is the estimated intercept at quantile point q,
0 < q < 1, and $\hat{\beta}_j^q$ is the estimated coefficient of
independent variable j at quantile point q. More specifically, the
coefficients $\hat{\beta}_0^q$ and $\hat{\beta}_j^q$ are estimated
by minimizing the following objective function (14):
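$$\sum_{i:\; y_i \ge \beta_0^q + \sum_{j=1}^{n} \beta_j^q x_{ij}} q \,\Bigl| y_i - \beta_0^q - \sum_{j=1}^{n} \beta_j^q x_{ij} \Bigr| \;+\; \sum_{i:\; y_i < \beta_0^q + \sum_{j=1}^{n} \beta_j^q x_{ij}} (1 - q) \,\Bigl| y_i - \beta_0^q - \sum_{j=1}^{n} \beta_j^q x_{ij} \Bigr| \qquad (4)$$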
where yi is the dependent variable, that is, the duration of the ith incident
(min), i = 1, 2, . . . , m; and xij is the value of independent variable j
in the ith incident.
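The asymmetric objective in Equation 4 can be sketched in a few lines of Python. In this illustration the model has no correlates (only an intercept), so minimizing the objective over candidate constants should recover a sample quantile. The function names and the toy durations are assumptions made for illustration only; a real application would use a dedicated quantile regression routine.

```python
def check_loss(y, pred, q):
    """Sum of asymmetric absolute errors: weight q when y >= pred
    (underprediction) and weight (1 - q) when y < pred (overprediction)."""
    total = 0.0
    for yi, pi in zip(y, pred):
        err = yi - pi
        total += q * err if err >= 0 else (1 - q) * (-err)
    return total

def best_constant(y, q):
    """Intercept-only fit: try each observed value as the constant prediction."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: check_loss(y, [c] * len(y), q))

# Toy right-skewed durations in minutes (made up, nine observations).
durations = [5, 8, 10, 12, 15, 20, 30, 45, 90]
median_fit = best_constant(durations, 0.5)    # minimizer for q = 0.5
upper_fit = best_constant(durations, 0.75)    # minimizer for q = 0.75
```

With these nine toy values the q = 0.5 fit is the middle observation (15 min) and the q = 0.75 fit is the seventh-smallest observation (30 min), which shows how raising q pulls the fitted value toward the longer incidents.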
Incident Duration Prediction
From the perspective of modeling outcomes, OLS models provide
intuitive results, giving a single number that is the predicted mean,
while quantile regression can provide estimates for any quantile q,
where q can be any number between 0 and 1. Thus, quantile regres-
sion can be seen as providing estimates of the entire (conditional)
distribution of incident durations given certain conditions and does
not give incident duration prediction directly, that is, it does not
provide a single number of how many minutes an incident may last.
This study applies a location-based prediction method to predict the
incident durations with quantile regression.
Location-Based Prediction
Location-based prediction can be applied if regional historical inci-
dent data are available (36). It assumes that traffic safety outcomes
do not change dramatically in a short period; the durations of inci-
dents in one segment or intersection remain in the same quantile of
all incidents in a region. For example, if the historical data show that
durations of incidents in one segment are likely to be at the 75th per-
centile, the predicted durations for this segment are approximately
the estimates of quantile regression at the 75th percentile. In this
study, the quantile regressions for duration prediction are made
at the 5th, 15th, 25th, . . . , 95th percentiles, as shown in Figure 1.
Thus, the predicted duration can be obtained at the 5th percentile
regression if the observed value is less than the 10th percentile, or
at the 15th percentile regression if the observed value is within the
10th to the 20th percentile, and so forth. With the location-based
prediction method, the incident duration can be predicted with
$$\hat{y} = \hat{y}_m, \qquad m = \begin{cases} 5, & \text{if } q_0 \le \bar{y} < q_{10} \\ 15, & \text{if } q_{10} \le \bar{y} < q_{20} \\ \;\vdots & \\ 95, & \text{if } q_{90} \le \bar{y} < q_{100} \end{cases} \qquad (5)$$
where
$\hat{y}$ = predicted incident duration using location-based prediction
method,
$\hat{y}_m$ = predicted incident duration at center of interval m (i.e.,
percentile location),
$\bar{y}$ = average of historical incident duration at particular location
(e.g., bottleneck), and
$q_p$ = pth percentile value of durations of incidents in region.
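A minimal Python sketch of the selection rule in Equation 5 follows, assuming the regional decile values q10 through q90 are already available. The function name and the decile values are hypothetical, and assigning an average that falls exactly on a cut point to the upper bin is an assumption of this sketch.

```python
from bisect import bisect_right

def percentile_location(y_bar, deciles):
    """Return m in {5, 15, ..., 95} for a historical average duration y_bar.

    deciles: the nine regional cut points [q10, q20, ..., q90] in minutes,
    sorted in ascending order.
    """
    if len(deciles) != 9:
        raise ValueError("expected nine cut points q10..q90")
    # bin_index is 0 when y_bar < q10 and 9 when y_bar >= q90.
    bin_index = bisect_right(deciles, y_bar)
    return 5 + 10 * bin_index

# Hypothetical regional deciles (minutes); not values reported in the study.
regional_deciles = [3, 5, 7, 9, 14, 20, 30, 48, 95]
m = percentile_location(11.0, regional_deciles)  # between q40 and q50
```

The returned value identifies which percentile regression (5th, 15th, and so on) would supply the prediction for that location.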
Model Comparison
This study compares the two modeling techniques—that is, OLS
and quantile regression models—by calculating the RMSE for the
resulting incident duration predictions. A smaller RMSE indicates a
better prediction. The RMSE can be calculated as follows:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{n}} \qquad (6)$$
where
n = number of observations,
yi = observed duration for ith incident in data set, and
$\hat{y}_i$ = predicted duration for ith incident in data set.
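The RMSE criterion can be computed directly from paired observed and predicted durations. The sketch below is a plain Python version; the function name and the three example values are illustrative only.

```python
from math import sqrt

def rmse(observed, predicted):
    """Root mean square error between observed and predicted durations."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return sqrt(squared / len(observed))

# Made-up durations in minutes.
observed = [10.0, 20.0, 60.0]
predicted = [13.0, 16.0, 60.0]
error = rmse(observed, predicted)
```

Because the errors are squared before averaging, a few badly predicted long incidents can dominate the RMSE.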
MODELING RESULTS
Descriptive Statistics
Table 1 presents descriptive statistics of variables selected for analy-
sis and modeling. Figure 2 shows the distributions of incident dura-
tions of valid observations, N = 85,624. Observations with missing
information were removed from the data set. The descriptive sta-
tistics of selected variables seem to be within reasonable ranges.
The distribution of incident duration is widely dispersed. The mean
duration was 50.96 min, with a standard deviation of 107.13 min.
The maximum incident duration was 1,419 min. Thus, it is clear
that the dispersed distribution of incident duration implies that the
mean duration does not appropriately represent a full picture of all
incidents.
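To illustrate why a mean can misrepresent a right-skewed sample, the sketch below compares the mean with the quartiles of a small set of made-up durations. The data are synthetic and are not drawn from the Hampton Roads records.

```python
from statistics import mean, median, quantiles

# Synthetic right-skewed durations in minutes (illustrative only).
durations = [4, 6, 8, 9, 11, 14, 18, 25, 40, 75, 150, 600]

avg = mean(durations)
med = median(durations)
# quantiles(..., n=4) returns the three quartile cut points.
q25, q50, q75 = quantiles(durations, n=4)
# Share of incidents that are shorter than the mean duration.
share_below_mean = sum(d < avg for d in durations) / len(durations)
```

In this toy sample a single very long incident pulls the mean above the 75th percentile, so most incidents are shorter than the "average" incident.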
The variable “detection source” refers to how an incident is
detected. Seven dummy variables were created: the SSP, closed-
circuit television (CCTV), citizen call, contractor call, field
device or police, Virginia Department of Transportation field
staff, and the Virginia State Police. The majority of incidents,
60.4%, were reported through SSP. In regard to incident type,
disabled-vehicle incidents represented 60.7% of the sampled incidents.
Three roadway types were considered in the analysis: Interstates,
primary roads, and urban roads; 83% of the incidents occurred on
Interstates.
In regard to temporal characteristics, the developed models incor-
porate the associations between a.m. peak (0600 to 1000 hours),
p.m. peak (1600 to 1900 hours), midday (1000 to 1600 hours), and
night (1900 to 0600 hours) and incident durations, respectively.
Definitions for the aforementioned temporal variables are adopted
while taking guidance from several past studies, for example, see the
Urban Mobility Scorecard (37). Of the incidents that occurred, 36%
and 30% occurred during the night and at midday, respectively.
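The time-of-day categories defined above can be encoded as a simple lookup. In this sketch the function name is hypothetical, and the treatment of boundary hours (each period includes its starting hour and excludes its ending hour) is an assumption, since the text does not state it.

```python
def time_of_day(hour):
    """Map an hour of day (0-23) to the time-of-day category."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if 6 <= hour < 10:
        return "a.m. peak"   # 0600 to 1000 hours
    if 10 <= hour < 16:
        return "midday"      # 1000 to 1600 hours
    if 16 <= hour < 19:
        return "p.m. peak"   # 1600 to 1900 hours
    return "night"           # 1900 to 0600 hours

categories = [time_of_day(h) for h in (5, 6, 12, 17, 22)]
```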
FIGURE 1 Intervals and locations of quantile regression (axis: possible incident duration; regression locations at the 5th, 15th, 25th, 35th, 45th, 55th, 65th, 75th, 85th, and 95th percentiles).
TABLE 1 Descriptive Statistics of Incident Data from Hampton Roads, Virginia
Variable  Valid N  Mean  Frequency  SD  Min.  Max.  VIF
Incident duration (min) 85,624 50.960 na 107.134 1 1,419 na
Detection source
SSP 85,624 0.604 51,717 0.488 0 1 na
CCTV 85,624 0.203 17,382 0.402 0 1 1.64
Citizen call 85,624 0.003 257 0.059 0 1 1.01
Contractor call 85,624 0.103 8,819 0.304 0 1 2.11
Field device or police 85,624 0.001 86 0.040 0 1 1.01
Virginia DOT field staff 85,624 0.006 514 0.079 0 1 1.13
VSP 85,624 0.076 6,507 0.266 0 1 1.16
Incident type
Accident 85,624 0.097 8,306 0.296 0 1 na
Congestion/delay 85,624 0.037 3,168 0.189 0 1 2.57
Disabled vehicle 85,624 0.607 51,974 0.488 0 1 4.66
Other 85,624 0.255 21,834 0.436 0 1 7.76
Vehicle fire 85,624 0.002 172 0.045 0 1 1.03
Roadway type
Interstate 85,624 0.830 71,068 0.374 0 1 2.69
Primary 85,624 0.040 3,425 0.197 0 1 1.61
Urban 85,624 0.007 599 0.088 0 1 1.12
Time of day
a.m. peak 85,624 0.176 15,070 0.380 0 1 na
Midday 85,624 0.300 25,687 0.458 0 1 1.92
p.m. peak 85,624 0.161 13,785 0.367 0 1 1.64
Night 85,624 0.362 30,998 0.480 0 1 2.03
Day of week
Weekday 85,624 0.767 65,674 0.422 0 1 na
Weekend 85,624 0.232 19,865 0.422 0 1 1.02
Injury count 85,624 0.017 na 0.175 0 6 1.37
Number of involved vehicles 85,624 0.814 na 0.627 0 11 3.37
Rescue responded (1–yes, 0–no) 85,624 0.029 2,483 0.168 0 1 1.63
Work zone involved (1–yes, 0–no) 85,624 0.002 171 0.046 0 1 1.02
Note: VIF = variance inflation factor; na = not applicable; DOT = department of transportation; VSP = Virginia State Police.
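Because the categorical variables in Table 1 are 0/1 dummy variables, each reported mean should equal the reported frequency divided by the number of valid observations. The sketch below checks this for a few rows copied from the table; the dictionary layout and names are ad hoc.

```python
N = 85_624  # valid observations

# variable: (mean, frequency) as printed in Table 1
rows = {
    "SSP": (0.604, 51_717),
    "CCTV": (0.203, 17_382),
    "Congestion/delay": (0.037, 3_168),
    "Interstate": (0.830, 71_068),
    "Rescue responded": (0.029, 2_483),
}

# Share implied by each frequency.
shares = {name: freq / N for name, (_, freq) in rows.items()}

# Means are printed to three decimals, so allow a rounding error of 0.0005.
consistent = {
    name: abs(shares[name] - reported_mean) <= 0.0005
    for name, (reported_mean, _) in rows.items()
}
```

A check of this kind is a quick way to catch transcription errors when tabulated values are reused in later calculations.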
FIGURE 2 Duration distribution of traffic incidents in sample: Hampton Roads (N = 85,624). Histogram of frequency against incident duration (min).
Moreover, the descriptive statistics reveal that 76.7% of the inci-
dents occurred on weekdays. On average, 0.814 vehicles were
involved in sampled incidents, whereas the mean injury count in
the data set was found to be 0.017. Last, rescue services responded
to only 2.9% of the incidents.
Incident Duration Models
Table 2 presents the outputs of OLS and quantile regression models
estimated at the 25th, 50th, 75th, and 95th percentiles. Most of the
variables are statistically significant (at the 95% level). The signs of
the coefficients are as expected. In general, the coefficients of the
OLS model are within the range of the coefficients estimated by the
quantile regression models.
The OLS model provides only one set of coefficients, indicating
the amount of increase or decrease in the average incident duration
with one unit increase in an independent variable, with other vari-
ables being held constant. Quantile regression provides one set of
coefficients for each quantile considered. For a given quantile, the
interpretation of the coefficients is the same as in an OLS model; it is
the change in the incident duration in a given quantile category, with
one unit increase in the independent variable. Figure 3 presents the
coefficients of key factors at continuous quantiles, relative to the coef-
ficients estimated with OLS regression. The coefficients of quantile
regression vary across different quantiles, while OLS coefficients
are constant.
From the OLS model, it can be seen that compared with SSP
detected incidents, those detected by CCTV, contractor call, and
Virginia State Police are expected to be 27.92, 24.93, and 6.41 min
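TABLE 2 OLS and Quantile Regression Models
Each row lists a coefficient (β) and t-statistic (t) pair for five models, in this order: OLS (mean), 25th percentile, median (50th percentile), 75th percentile, and 95th percentile.
Variable  β  t  β  t  β  t  β  t  β  t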
Detection source
SSP Base Base Base Base Base
CCTV 27.92 31.17 5.00 48.93 11.00 38.73 17.00 13.00 23.00 7.44
Citizen call 1.86 0.39 6.00 11.06 10.00 6.64 10.00 1.44 3.00 0.18
Contractor call 24.93 18.61 5.00 32.66 9.01 21.55 9.00 4.59 36.00 7.78
Field device or police 86.25 12.43 11.00 13.88 22.00 9.99 108.50 10.70 99.00 4.13
Virginia DOT field staff 27.62 7.33 5.00 11.62 9.00 7.53 11.00 2.00 27.00 2.08
VSP 6.41 5.65 8.00 61.69 11.00 30.52 11.00 6.63 8.00 2.04
Incident type
Accident Base Base Base Base Base
Congestion/delay 40.05 15.85 12.00 41.57 27.00 33.66 57.50 15.57 159.00 18.22
Disabled vehicle 15.22 11.97 3.00 20.64 13.00 32.19 27.00 14.52 49.00 11.15
Other 45.92 24.08 2.00 9.18 9.00 14.86 35.50 12.73 343.00 52.06
Vehicle fire 7.88 1.29 9.00 12.93 7.00 3.62 8.00 0.90 14.00 0.67
Roadway type
Interstate 9.93 1.25 2.00 13.99 6.00 15.11 8.00 4.37 21.00 4.86
Primary 32.37 1.81 2.01 9.63 8.00 13.86 15.00 5.64 37.00 5.89
Urban 33.26 3.45 2.03 5.06 10.00 9.11 19.00 3.76 26.00 2.18
Time of day
a.m. peak Base Base Base Base Base
Midday 12.86 15.24 0.00 0.00 2.00 7.47 5.00 4.05 51.00 17.50
p.m. peak 6.91 7.16 0.00 0.00 2.01 6.53 4.00 2.83 47.00 14.10
Night 12.14 14.43 1.00 10.40 1.00 3.74 3.00 2.44 19.00 6.54
Day of week
Weekday Base Base Base Base Base
Weekend 4.13 6.21 0.00 0.00 0 0.00 1.00 1.03 12.00 5.21
Injury count 9.86 5.40 10.50 50.31 8.00 13.79 8.00 3.00 7.00 1.11
Number of involved vehicles 5.03 5.97 3.00 31.10 4.00 14.92 5.50 4.46 5.00 1.71
Rescue responded (1–yes, 0–no) 18.48 8.94 21.00 88.88 25.00 38.03 20.00 6.62 46.00 6.44
Work zone involved (1–yes, 0–no) 10.95 1.85 5.00 7.41 0.00 0.00 7.50 0.87 1.00 0.05
Constant 16.23 6.98 1.00 3.76 12.00 16.26 33.50 9.86 116 14.44
Number of observations 85,624 85,624 85,624 85,624 85,624
Total sum of squared errors 685,567,430 na na na na
Model sum of squared errors 105,690,879 na na na na
R2 .15 .04a .05a .10a .41a
Raw sum of deviations na 837,475.3 1,549,636 2,001,910 1,483,797
Minimum sum of deviations na 807,477.5 1,465,654 1,802,375 872,479.4
a Represents pseudo-R2 for quantile regression; the median (or any other quantile) regression estimates are based on maximum likelihood for double exponential
distribution. The goodness-of-fit measure is calculated as pseudo-R2 = 1 − minimum sum of deviations/raw sum of deviations.
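The goodness-of-fit values in Table 2 can be reproduced from the sums reported in the table, taking pseudo-R2 as one minus the ratio of the minimum to the raw sum of deviations. The sketch assumes that the row labeled "Model sum of squared errors" is the explained sum of squares for the OLS model; the names are ad hoc.

```python
def pseudo_r2(min_sum_dev, raw_sum_dev):
    """Pseudo-R2 = 1 - minimum sum of deviations / raw sum of deviations."""
    return 1.0 - min_sum_dev / raw_sum_dev

# percentile: (raw sum of deviations, minimum sum of deviations) from Table 2
deviations = {
    25: (837_475.3, 807_477.5),
    50: (1_549_636, 1_465_654),
    75: (2_001_910, 1_802_375),
    95: (1_483_797, 872_479.4),
}
fit = {p: round(pseudo_r2(mn, raw), 2) for p, (raw, mn) in deviations.items()}

# OLS R2 as the ratio of the model to the total sum of squares (assumed reading).
ols_r2 = round(105_690_879 / 685_567_430, 2)
```

The rounded results match the tabulated values of .04, .05, .10, and .41 for the quantile models and .15 for OLS.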
FIGURE 3 Coefficients of OLS and quantile regression models based on Hampton Roads incident data. Black broken line shows estimates from OLS regression; 95% confidence intervals
are shown by black dotted lines. Blue line shows estimates from quantile regression; 95% confidence intervals are shown by shaded region (VDOT = Virginia DOT).
longer, respectively. From the quantile regression, the coefficients
vary across different percentiles. The differences between SSP and
the other detection sources are greater for the upper percentiles
(i.e., 75th and 95th percentiles), especially for incidents reported
by CCTV, contractor call, and field device or police. For example, for
long incidents (in the 95th percentile relative to their duration), when
an incident is first reported by CCTV, then the incident duration will
be longer by as much as 23 min compared with when the incident is
reported by SSP.
On average, the incident duration resulting from congestion
or delay is 40.05 min longer than for accidents, while the quantile
regression indicates that the associations between incident type being
“congestion/delay” and incident durations are significantly higher
at the 75th and 95th percentiles. This observation intuitively indi-
cates that once an incident occurs, associations between “congestion/
delay” and incident duration become stronger as incident duration
increases.
Incidents on freeways are positively correlated with incident
durations. On average, an incident on an Interstate is expected to
last 9.93 min longer than an incident on the reference roadway type.
However, quantile regression reveals significantly varying positive
correlation between Interstate incidents and incident durations, with
larger positive correlation at higher quantiles. Likewise, the positive
correlation between incidents occurring on urban routes and incident
durations is higher at higher quantiles than at lower quantiles. The
results from quantile regression thus provide more exhaustive insights
about complex interactions, which can help in the development of
more-informed incident management strategies.
As compared with a.m. peak incidents, incidents occurring during
midday are on average 12.86 min shorter. Nighttime incidents are on
average 12.14 min longer than a.m. peak incidents. The results from
quantile regression, by contrast, suggest that the negative association
between midday incidents and incident duration is much stronger at
higher quantiles than at lower quantiles. There could be several reasons
for this finding. For instance, once an incident turns out to be longer,
there could be other observed or unobserved factors, or both, that
contribute to the incident's longer duration. In the presence of such
unobserved factors that may be associated with longer incident durations,
the influence of a midday incident on incident duration may be relatively
smaller. Quantile regression shows that for incidents that normally last
longer than the median, an incident on the weekend may last even longer,
according to the larger magnitudes of the coefficients at higher quantiles,
as shown in Figure 3. The number of vehicles involved in an incident
and the injury count both have a positive relationship with incident duration.
If rescue responds to an incident, the incident is expected to last on
average 18.48 min longer compared with an incident that does not
receive a response from rescue. The increase would be 46 min at the
95th percentile, indicating a more pronounced positive association.
This is, however, merely a correlation since rescue services may, in
turn, be needed for larger incidents, and the rescue services likely
decrease the duration of the incidents compared with the duration if
rescue had not responded.
Using the coefficients from quantile regression, this study pro-
poses another way to interpret the quantile regression results. Table 3
provides the estimation of incident duration by holding all vari-
ables at their mean values: the mean incident duration is 44.10 min,
6.68 min at the 25th percentile, 13.86 min at the median, 45.45 min
at the 75th percentile, and 186.54 min at the 95th percentile. All
these numbers are close to the distributions of the 85,624 incidents
sampled in the study. Table 3 allows one to predict the incident dura-
tion given a certain value of the independent variable while control-
ling for other variables at their means. Changes in the probability that
an incident with a given duration will occur owing to the change in
values of independent variables are quantified.
For example, suppose all other factors are held at their means and only
the incident type is allowed to vary. The incident duration at the 75th
percentile is estimated to be 45.45 − 2.13 = 43.32 min when the incident is
not related to congestion or delay, meaning there is a 25% chance that
an incident lasts at least 43.32 min if it is not the result of congestion
or delay. When the incident is related to congestion or delay, incident
duration at the 75th percentile is calculated to be 45.45 − 2.13 +
57.50 = 100.82 min, indicating a 25% chance that an incident will
last 100.82 min or longer. Notably, the 75th percentile incident
duration for congestion or delay is 100.82 min, which is close to
the 95th percentile estimation for other (unclassified) incidents. The
associations of other factors with incidents can be interpreted in the
same way. The exact increase or decrease in the chance or probability
can be obtained by comparing estimations at other percentiles,
such as the 25th or 50th.
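The arithmetic in this example can be reproduced from the tables. The sketch assumes that the 2.13-min adjustment is the 75th percentile coefficient for congestion/delay (57.50, Table 2) multiplied by the sample share of congestion/delay incidents (0.037, Table 1). That reading is an inference from the numbers rather than something the text states, and the variable names are ad hoc.

```python
baseline_q75 = 45.45      # 75th percentile duration, all variables at their means
coef_congestion = 57.50   # 75th percentile coefficient, congestion/delay (Table 2)
share_congestion = 0.037  # mean of the congestion/delay dummy (Table 1)

# Contribution of the dummy when it is held at its mean (about 2.13 min).
contribution_at_mean = coef_congestion * share_congestion

# Set the dummy to 0: remove its contribution at the mean.
q75_without = baseline_q75 - contribution_at_mean
# Set the dummy to 1: add the full coefficient.
q75_with = q75_without + coef_congestion
```

Under that assumption the two results round to 43.32 min and 100.82 min, matching the values quoted in the text.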
Performance Comparison
As mentioned earlier, incident durations can be predicted by the OLS
model and by quantile regression models. This study used the
location-based method to obtain the predicted values based on the
estimation of quantile regression. Individual quantile regressions are
estimated at the 5th, 15th, 25th, . . . , 95th percentiles. Next, the
incident durations at each 10th-percentile increment (the 10th, 20th,
. . . , 90th percentiles) are calculated to define bins. If a specific
observed incident duration falls within a bin, the regression at the
center of that bin is used. For example, if the observed duration is
less than the 10th percentile (suppose it is equal to 2 min), then the
5th percentile regression is used to predict incident durations in this
bin. Likewise, if the observed incident duration is between the 40th
and 50th percentile (i.e., greater than 9 and less than 14 min), then
the 45th percentile regression is used to predict the incident duration
in this bin, and so on. Thus, the combined predictions (using the 5th,
15th, 25th, . . . , 95th percentile equations) from quantile regression
are compared with the single equation (mean) OLS predictions.
The RMSEs are calculated with Equation 6. Their values show the
extent of the difference between the predicted and observed incident
durations. The RMSE for OLS is 82.29 min, while for the quan-
tile regression with location-based prediction, it is 57.49 min. The
quantile regression is observed to be significantly better in predicting
incident durations through the location-based method. The location-
based method seems the best in regard to accurately predicting the
incident duration; however, historical data are required for the use
of this method.
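For a sense of scale, the two reported RMSE values can be compared directly. The sketch below only performs arithmetic on the figures quoted above; the variable names are ad hoc.

```python
rmse_ols = 82.29        # reported RMSE for the OLS model (min)
rmse_quantile = 57.49   # reported RMSE for quantile regression (min)

absolute_reduction = rmse_ols - rmse_quantile        # in minutes
relative_reduction = absolute_reduction / rmse_ols   # as a fraction of the OLS RMSE
```

The difference is 24.80 min, or roughly 30% of the OLS RMSE.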
POTENTIAL APPLICATIONS
There are potential applications of the quantile regression method
in traffic incident management. First, the models can more accu-
rately predict incident durations in real time and, second, analysis
of correlates can be used to design strategies for reducing incident
durations. Transportation researchers and professionals in different
areas may use the method proposed in this study to develop their
local quantile regression models for regional incident management.
Predicting Incident Duration
At some critical locations (such as bottlenecks) in the road network,
there may be incidents that normally last longer than the regional
average. If an incident occurs at such a location, then higher
percentile regressions can be applied to predict the incident
duration. For example, incident data in Hampton Roads show that
incidents at the entrances of the Hampton Roads Bridge Tunnel last
longer, with durations around the 75th percentile of incidents in the
region. Therefore, the 75th percentile regression model can be used
to obtain the initial incident duration prediction for this
bottleneck. Other triggers that move the prediction to higher
percentile models include unclassified “other” incidents (as opposed
to accidents), injury counts, and the number of involved vehicles.
Table 2 presents the 75th percentile regression that can be used to
predict the durations of future incidents at this bottleneck.
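One way to operationalize the choice of percentile model for a location is sketched below. This is an assumption for illustration only; the paper does not prescribe a selection rule. The sketch ranks the median duration of past incidents at the location within the regional distribution and picks the nearest percentile for which a regression was estimated (25th, 50th, 75th, or 95th). All names are hypothetical.

```python
from bisect import bisect_right
from statistics import median

# Percentiles for which quantile regressions are reported in the study.
ESTIMATED_PERCENTILES = (25, 50, 75, 95)


def percentile_rank(regional_durations, value):
    """Percentage of regional durations that are less than or equal to value."""
    ordered = sorted(regional_durations)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def choose_percentile_model(regional_durations, location_durations):
    """Pick the estimated percentile closest to the location's typical rank."""
    rank = percentile_rank(regional_durations, median(location_durations))
    return min(ESTIMATED_PERCENTILES, key=lambda p: abs(p - rank))
```

For a bottleneck whose typical incident sits near the 75th percentile of regional durations, this rule returns 75, matching the example in the text.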
Reducing Incident Duration
In addition to incident duration prediction, quantile regression has
the potential to provide transportation practitioners with solutions
to reduce the duration of incidents. Specifically, the correlates of
higher or lower percentile regressions can highlight factors that can
potentially reduce incident durations. For example, at the 25th
percentile (smaller incidents), incidents on Interstates are
associated with durations 2 min longer, whereas at the 95th
percentile (large-scale incidents), they are associated with
durations 21 min longer. Similarly, larger incidents detected
through CCTV may last substantially longer than those detected by
SSP. Strategies that can reduce the number of people injured and the
number of involved vehicles can also reduce the durations of larger
incidents.
LIMITATIONS
The results of this study depend heavily on the accuracy of infor-
mation documented in the database. The data collected were based
on incident reporters and investigators. Reporting errors may exist.
Further, this study analyzed a limited number of factors. If other
variables are included in the model specification, the associations
between incident duration and related factors may be different.
The data used in this study are based on incidents that occurred
in Hampton Roads, Virginia, during the 2013 to 2015 period. The
results may vary if data from other areas are used for estimation.
More detailed data about road geometry and incident characteristics
can potentially enhance the model specification. For example, this
study did not account for shoulder and ramp characteristics or
whether they were affected. Such data can be added and the modeling
framework enhanced to develop more appropriate incident management
solutions.

TABLE 3 Estimation of Incident Duration at Means of Independent Variables

| Variable | X (mean) | OLS (mean) β | OLS (mean) βX | 25th Percentile β | 25th Percentile βX | Median (50th percentile) β | Median (50th percentile) βX | 75th Percentile β | 75th Percentile βX | 95th Percentile β | 95th Percentile βX |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Detection source | | | | | | | | | | | |
| SSP | 0.604 | Base | | Base | | Base | | Base | | Base | |
| CCTV | 0.203 | 27.92 | 5.67 | 5.00 | 1.02 | 11.00 | 2.23 | 17.00 | 3.45 | 23.00 | 4.67 |
| Citizen call | 0.003 | 1.86 | 0.01 | 6.00 | 0.02 | 10.00 | 0.03 | 10.00 | 0.03 | 3.00 | 0.01 |
| Contractor call | 0.103 | 24.93 | 2.57 | 5.00 | 0.52 | 9.01 | 0.93 | 9.00 | 0.93 | 36.00 | 3.71 |
| Field device or police | 0.001 | 86.25 | 0.09 | 11.00 | 0.01 | 22.00 | 0.02 | 108.50 | 0.11 | 99.00 | 0.10 |
| Virginia DOT field staff | 0.006 | 27.62 | 0.17 | 5.00 | 0.03 | 9.00 | 0.05 | 11.00 | 0.07 | 27.00 | 0.16 |
| VSP | 0.076 | 6.41 | 0.49 | 8.00 | 0.61 | 11.00 | 0.84 | 11.00 | 0.84 | 8.00 | 0.61 |
| Incident type | | | | | | | | | | | |
| Accident | 0.097 | Base | | Base | | Base | | Base | | Base | |
| Congestion/delay | 0.037 | 40.05 | 1.48 | 12.00 | 0.44 | 27.00 | 1.00 | 57.50 | 2.13 | 159.00 | 5.88 |
| Disabled vehicle | 0.607 | 15.22 | 9.24 | 3.00 | 1.82 | 13.00 | 7.89 | 27.00 | 16.39 | 49.00 | 29.74 |
| Other | 0.255 | 45.92 | 11.71 | 2.00 | 0.51 | 9.00 | 2.30 | 35.50 | 9.05 | 343.00 | 87.47 |
| Vehicle fire | 0.002 | 7.88 | 0.02 | 9.00 | 0.02 | 7.00 | 0.01 | 8.00 | 0.02 | 14.00 | 0.03 |
| Roadway type | | | | | | | | | | | |
| Interstate | 0.830 | 9.93 | 8.24 | 2.00 | 1.66 | 6.00 | 4.98 | 8.00 | 6.64 | 21.00 | 17.43 |
| Primary | 0.040 | 32.37 | 1.29 | 2.01 | 0.08 | 8.00 | 0.32 | 15.00 | 0.60 | 37.00 | 1.48 |
| Urban | 0.007 | 33.26 | 0.23 | 2.03 | 0.01 | 10.00 | 0.07 | 19.00 | 0.13 | 26.00 | 0.18 |
| Time of day | | | | | | | | | | | |
| a.m. peak | 0.176 | Base | | Base | | Base | | Base | | Base | |
| Midday | 0.300 | 12.86 | 3.86 | 0.00 | 0.00 | 2.00 | 0.60 | 5.00 | 1.50 | 51.00 | 15.30 |
| p.m. peak | 0.161 | 6.91 | 1.11 | 0.00 | 0.00 | 2.01 | 0.32 | 4.00 | 0.64 | 47.00 | 7.57 |
| Night | 0.362 | 12.14 | 4.39 | 1.00 | 0.36 | 1.00 | 0.36 | 3.00 | 1.09 | 19.00 | 6.88 |
| Day of week | | | | | | | | | | | |
| Weekday | 0.767 | Base | | Base | | Base | | Base | | Base | |
| Weekend | 0.232 | 4.13 | 0.96 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.23 | 12.00 | 2.78 |
| Injury count | 0.017 | 9.86 | 0.17 | 10.50 | 0.18 | 8.00 | 0.14 | 8.00 | 0.14 | 7.00 | 0.12 |
| Number of involved vehicles | 0.814 | 5.03 | 4.09 | 3.00 | 2.44 | 4.00 | 3.26 | 5.50 | 4.48 | 5.00 | 4.07 |
| Rescue responded (1–yes, 0–no) | 0.029 | 18.48 | 0.54 | 21.00 | 0.61 | 25.00 | 0.73 | 20.00 | 0.58 | 46.00 | 1.33 |
| Work zone involved (1–yes, 0–no) | 0.002 | 10.95 | 0.02 | 5.00 | 0.01 | 0.00 | 0.00 | 7.50 | 0.02 | 1.00 | 0.00 |
| Constant | — | 16.23 | 16.23 | 1.00 | 1.00 | 12.00 | 12.00 | 33.50 | 33.50 | 116 | 116 |
| Estimate at means Σ(β ∗ X) | | | 44.10 | | 6.68 | | 13.86 | | 45.45 | | 186.54 |
CONCLUSIONS
This study applied the quantile regression technique to predict inci-
dent duration, providing a broader range of information for incident
duration predictions. Unlike OLS regression models that provide
estimates of average incident durations, quantile regression is able
to estimate the entire distribution of incident durations by modeling
its quantiles.
In general, estimates of the OLS model are within the ranges of the
estimates made by the quantile regression models. This study dem-
onstrated the estimation of quantile regression models at the 25th,
50th, 75th, and 95th percentiles. Differences between congestion-
and delay-related incidents compared with accidents are greater at
higher percentiles, especially at the 75th percentile, implying that
congestion has a substantial influence on large incidents that nor-
mally last longer than 75% of all incidents. For factors related to the
number of involved vehicles and the number of injuries, the greater
coefficients are found at higher percentiles. Further, given the quan-
tile regression estimates, this study presented a way to predict the
change of probability that an incident with a given duration will occur
owing to changes in values of independent variables. It is estimated
that compared with the accidents, congestion- and delay-related inci-
dents are associated with a nearly 25% increase in the probability of
having an incident lasting for 100.82 min. Last, the OLS and quantile
regression models were compared in relation to the accuracy of the
incident duration prediction. The comparison showed that the quan-
tile regressions using the location-based method better predicted the
incident duration compared with the OLS model.
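The 100.82-min figure can be traced back to the 75th percentile column of Table 3 with a few lines of arithmetic. The sketch below is an interpretation rather than the authors' stated procedure: it assumes that the 75th percentile estimate at means (45.45 min) includes the congestion/delay term evaluated at its sample mean (0.037), removes that term to approximate an incident without congestion or delay, and then adds the full coefficient (57.50 min). Variable names are illustrative.

```python
# Values taken from the 75th percentile column of Table 3.
estimate_at_means_q75 = 45.45   # sum of beta * X over all variables (min)
beta_congestion_q75 = 57.50     # congestion/delay coefficient (min)
mean_congestion = 0.037         # sample share of congestion/delay incidents

# Contribution of congestion/delay already included in the estimate at means.
contribution_at_mean = beta_congestion_q75 * mean_congestion

# 75th percentile duration with the congestion/delay indicator set to 0.
q75_without_congestion = estimate_at_means_q75 - contribution_at_mean

# 75th percentile duration with the congestion/delay indicator set to 1.
q75_with_congestion = q75_without_congestion + beta_congestion_q75

print(round(q75_without_congestion, 2), round(q75_with_congestion, 2))
```

Under these assumptions the two values are about 43.32 and 100.82 min, which agree with the 43.3-min and 100.8-min thresholds quoted in the abstract as the durations that each group of incidents has roughly a 25% chance of exceeding.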
The information generated by quantile regression is useful for
predicting the durations of particular groups of incidents and for
supporting incident management, especially in areas and on road
segments where incidents normally last longer than elsewhere. The
potential applications discussed can be applied in real-life
contexts, benefiting incident managers in transportation management
centers. Decision support tools that apply these models for
predictive analytics in transportation management centers are under
development by the research team.
ACKNOWLEDGMENTS
The authors thank Hampton Roads Smart Traffic Center, Virginia
Department of Transportation, for sharing valuable data. The statis-
tical software Stata was used for modeling. The authors are thank-
ful for the support received from the Southeastern Transportation
Center through a grant, the Center for Transportation Research, and
the Transportation Engineering and Science Program in the Depart-
ment of Civil and Environmental Engineering at the University of
Tennessee. The Office of the Secretary of Transportation sponsorship
is greatly appreciated.
The views presented in this paper are those of the authors, who are responsible
for the facts and the accuracy of the information provided.
The Standing Committee on Freeway Operations peer-reviewed this paper.
... Diğer çalışmalar temizleme (clearance) süresi [46], [47], [48], [49], [50], [51], müdahale (response) süresi [52], [53], müdahale ve temizleme süresi [54], [55], [56], [57], [58], [59], [60], [61], hazırlık (preparation), seyahat (travel) ve temizleme süresi [62], [63], [64], kaza tıkanıklığı (crash congestion) süresi [65], [66], kaza meydana gelme (accident occurrence) süresi [67], [68] veya diğer zaman aşamalarına [44], [64], [69] odaklanmıştır. Önceki çalışmalardan birinde, müdahale süresi iki kategoriye ayrılmıştır: [70], [71], [72], [73], [74], [75], toplamı binlerce hatta milyonlarca olan çok sayıda olay kaydı içermektedir. Bazı veri kümeleri 100.000 olay kaydından oluşmaktadır [8], [76], [77], [78], [79]. ...
... SVM, karmaşık karar sınırlarını ele alarak hiper düzlemleri kullanarak optimum sınıf ayrımını bulur [134]. [55], [59], [70], [71], [91], Quantile Regresyon [73], [107], Copula Tabanlı Yaklaşım [56], [61], [79], diğer farklı regresyon modelleri [106], [160] ve istatistiksel yaklaşımlar [8], [85] ...
Thesis
Full-text available
Bu doktora tezinin amacı İstanbul’daki trafik kaza verilerine dayalı olarak trafik kaza süresini tahmin etmek ve kaza süresini etkileyen temel faktörleri belirlemektir. Tez çalışmasında İstanbul Büyükşehir Belediyesi ve Emniyet Genel Müdürlüğü kurumlarından elde edilen İstanbul’a ait kaza bilgisi veri setleri kullanılmıştır. Veriler, veri madenciliği kapsamında incelenmiştir. Ayıklanan veri setine istatistik testleri ve makine öğrenmesi algoritmaları uygulanarak trafik kaza süresi tahmini gerçekleştirilmiştir. Elde edilen makine öğrenmesi eğitim sonuçlarına göre en başarılı algoritma R-Kare: 0.85 ile Topluluk Ağacı olurken, test sonuçlarına göre en başarılı algoritma R-Kare: 0.91 ile Sinir Ağları olmuştur. ---------------------------------------------------------------------------------------- The aim of this study is to predict the traffic accident duration based on traffic accident data in Istanbul and to identify the main factors affecting the accident duration. The accident data sets obtained from Istanbul Metropolitan Municipality and General Directorate of Security are used in this study. The data were analyzed within the scope of data mining. Statistical tests and machine learning algorithms were applied to the extracted data set and prediction of traffic accident duration was performed. According to the machine learning training results, the best model is Ensemble Tree with R-Square: 0.85 and according to the test results, the best model is Neural Networks with R-Square: 0.91.
... Such models are based on the assumption that the duration of a traffic incident can be predicted by a set of factors, including the incident type, time of day, and prevailing weather conditions, as well as the traffic volume. These models encompass widely used regression models [6][7][8][9][10], probabilistic statistical models [11][12][13], hazard-based models [14][15][16][17][18], copula-based models [19], finite mixture models [20,21], etc. They typically assume that the data follow a certain distribution and model and predict the distribution of the data. ...
Article
Full-text available
Traffic incidents pose substantial hazards to public safety and wellbeing, and accurately estimating their duration is pivotal for efficient resource allocation, emergency response, and traffic management. However, existing research often faces limitations in terms of limited datasets, and struggles to achieve satisfactory results in both prediction accuracy and interpretability. This paper established a novel prediction model of traffic incident duration by utilizing a tabular network-TabNet model, while also investigating its interpretability. The study incorporates various novel aspects. It encompasses an extensive temporal and spatial scope by incorporating six years of traffic safety big data from Tianjin, China. The TabNet model aligns well with the tabular incident data, and exhibits a robust predictive performance. The model achieves a mean absolute error (MAE) of 17.04 min and root mean squared error (RMSE) of 22.01 min, which outperforms other alternative models. Furthermore, by leveraging the interpretability of TabNet, the paper ranks the key factors that significantly influence incident duration and conducts further analysis. The findings emphasize that road type, casualties, weather conditions (particularly overcast), and the number of motor and non-motor vehicles are the most influential factors. The result provides valuable insights for traffic authorities, thus improving the efficiency and effectiveness of traffic management strategies.
Article
The occurrence of incidents seriously affects the operation of the whole urban railway system and passengers’ travel experience. Accurate delay prediction is important for traffic control and management under incidents. Few studies were reported on incident prediction in urban railway systems because of the unexpected nature of incidents and the lack of comprehensive incident data. Existing models used to predict incident delay can be divided into statistical methods and traditional machine learning methods, as well as ensemble learning methods. This study conducts a methodology review for these models by comparing their performance in predicting incident delays using a large-scale incident dataset collected from an urban railway system in Hong Kong. Three statistical models and six machine/ensemble learning methods are examined: ordinary least squares, accelerated failure time, quantile regression (QR), support vector regression (SVR), K-nearest neighbor, random forest, adaptive boosting, gradient boosting decision tree (GBDT), and extreme gradient boosting (XGBoost) tree. The results indicate that statistical models perform better than machine/ensemble learning models in predicting train delays under incidents. The QR, SVR, and XGBoost tree models outperform other models in incident delay prediction in their respective methodological categories. The factors of the incident type and affected line type present the most significant effects on incident delay prediction in selected models.
Article
Full-text available
Traffic accidents are often inaccurately reported, with incorrect location and disruption duration due to various external factors. This can result in imprecise predictions and inaccurate decision-making in data-driven models. To address these challenges, our study presents a comprehensive framework for traffic disruption segmentation from traffic speed data (obtained from Caltrans Performance Measurements system) in the time-space proximity of reported accidents (from Countrywide Traffic Accident dataset). Furthermore, we evaluate multiple machine learning models on reported, estimated, and manually marked disruption intervals, and demonstrate that our enhanced modelling approach reduces the root mean squared error (RMSE) of traffic accident duration prediction while providing higher similarity with disruptions observed in traffic speed. Our algorithm yields higher disruption detection precision than reported accident timelines. Although using multiple segments offers a slight decrease in the quality of results, it highlights more disruptions. Future research could explore expanding the algorithm’s complexity and applying it to improve traffic incident impact predictions.
Article
Full-text available
Unexpected congestions are a common problem in the lives of urban citizens who need to travel to carry out their activities. This type of congestion causes unexpected delays to drivers and has traffic accidents and their duration as the main factor for their formation. In order to contribute to this problem, this study aimed to analyze the duration of traffic accidents on arterial roads of Fortaleza, Brazil, and their relationship with their causal factors. The duration of accidents was estimated based on traffic data obtained from electronic surveillance equipment, as the accident databases did not have this information. For this purpose, we generated profiles of speed and flow proportion per lane for days with accident and typical days to differentiate the impact on traffic caused by an accident from a typical traffic variability. The method detected the duration of 316 accidents with an average duration of 71 minutes and a standard deviation of 43 minutes. Next, a set of suggested hypotheses to explain the variability of accident duration was analyzed using survival models. The calibrated model showed that the severity of the accident, the traffic conditions at the accident location, the quantity and scheduling of the traffic agents, and the number of vehicles involved can have a significant impact on accident duration.
Article
Full-text available
This study introduces Bayesian learning to neural networks for accurate prediction of incident duration. Network parameters are updated using a Hybrid Monte Carlo algorithm, and yield reasonable accuracy with mean absolute percentage error of 29%. A pedagogical rule extraction algorithm (TREPAN) is applied to extract comprehensible representations from the neural networks. The TREPAN facilitates better comprehensibility with M-of-N expression, and maintains high predictive accuracy to its respective network. Extracted decision trees provide a discovery and explanation of previously unknown relationships present in incident nature, and represent a series of decisions to assist traffic management operators in better decision making. Furthermore, to quantify the importance of variables from the neural network, a connection weight approach is used. Factors appearing in the first splitter of decision tree show high relative importance indicating that they are influential for longer or shorter incident duration. Interpretation of Bayesian neural networks is an important addition to the Advanced Traveler Information Systems toolkit.
Article
Full-text available
The prediction of the traffic incident duration is a very important issue to the Advanced Traffic Incident Management (ATIM). An accurate prediction of incident duration makes a lot contributes to making appropriate decisions to deal with incidents for traffic managers. The paper employed the Partial Least Squares Regression (PLSR) to build model between incident duration and its influence factors. Three models were established for three types of incident correspondingly, i.e. stopped vehicle, lost load and accident. Meanwhile, a model without distinguishing the incident type was built as a comparison. The experiments results indicated that the model obtained high prediction accuracy for those incidents which last 20 minutes to 90 minutes. The models got prediction accuracy of 77.24%, 86.59%, 83.33% and 71.30% for stopped vehicle, lost load, accident and all incidents within 20 minutes error, respectively. The results indicated that the PLSR has a promising application to predict traffic incident duration
Article
Full-text available
The freeway facilities methodology in the 2010 Highway Capacity Manual (HCM) is the only HCM methodology that encompasses undersaturated and congested flow regimes over multiple periods. However, the methodology is limited by its assumption of a fixed capacity threshold between the two flow regimes. The method does not consider the two-capacity phenomenon, which suggests that a drop in the throughput from theoretical capacity is observed after breakdown has occurred. A summary of the available literature on empirical evidence of the capacity drop under queue discharge conditions offers a theoretical evaluation of the impact on queue discharge flow based on shock wave theory relationships that form the basis of the queuing model used in the HCM freeway facilities method. Examples illustrate incorporating queue discharge into a freeway facilities analysis, and implications for practice are discussed.
Chapter
Accurate prediction of incident duration is critical for efficient incident management which aims to minimize the impact of non-recurrent congestion. In this chapter, a hybrid tree-based quantile regression method is proposed for incident duration prediction and quantification of the effects of various incident and traffic characteristics that determine duration. Hybrid tree-based quantile regression incorporates the merits of both quantile regression modeling and tree-structured modeling: robustness to outliers, simple interpretation, flexibility in combining categorical covariates, and capturing nonlinear associations. The predictive models presented here are based on variables associated with incident characteristics as well as the traffic conditions before and after incident occurrence. Compared to previous approaches, the hybrid tree-based quantile regression offers higher predictive accuracy.
Article
A number of approaches have been developed for analyzing incident clearance time data and investigating the effects of different explanatory variables on clearance time. Among these methods, hazard-based duration models (i.e., proportional hazard and accelerated failure time models) have been extensively used. The finite mixture model is an alternative approach in survival data analysis, and offers greater flexibility in describing different shapes of the hazard function. Additionally, the finite mixture model assumes that the incident clearance time dataset contains distinct subpopulations, and it allows the effects of explanatory variables to vary between different subpopulations. In this study, a g-component mixture model is applied to analyze incident clearance time. To demonstrate advantages of the proposed finite mixture model framework, incident clearance time data collected on freeway sections in Seattle, Washington State are analyzed. Estimation and prediction results from the proposed mixture model and the accelerated failure time model are presented and compared. The results suggest that the proposed mixture model can better describe the survival probability and hazard probability of incident clearance time, and can provide more accurate prediction compared to the accelerated failure time model. The mixture model can also provide inferences about the effects of explainable variables on different subpopulations present in incident clearance time data. The additional information obtained from the proposed mixture model can be potentially useful for designing targeted incident management strategies for different incident types. Overall, the findings in this study demonstrate that the mixture modeling approach is a useful and informative method for analyzing heterogeneous incident duration data and predicting incident duration on freeways.
Article
Traffic events involving secondary incidents can be particularly problematic for the public and for incident managers. This paper explores the associations of spatial characteristics, including geometric and land use factors, with secondary and nonsecondary incidents. The data used in this study are 2006 incident records from Hampton Roads in Virginia and roadway inventory data, enhanced through geographic information systems to include detailed spatial information. Secondary incidents in the same and opposite directions were identified by using a queue-based method. Such incidents represented nearly 2% of total recorded incidents but showed longer durations than other incidents. The study found statistically significant differences between the distributions of secondary and nonsecondary incidents, implying that higher risks of secondary incidents in certain roadway segments are not necessarily correlated with relatively high risk of nonsecondary incidents. Poisson, zero-inflated Poisson, and negative binomial regression models were estimated by combining traffic exposure, road segment characteristics, and spatial land use information to explore factors associated with secondary incidents. The models provided helpful information for effective assignment of incident management resources and for support of regionally based strategic planning.
Article
Incident-induced traffic congestion is a major source of travel uncertainty. Sometimes multiple incidents occur sequentially because of queue backups, which substantially increase uncertainty. Such cascading incidents can be grouped into one event because of their spatial and temporal proximity. Events consisting of a primary and its secondary incidents are expected to have longer durations than single incidents and therefore to result in larger impacts on traffic. Though relatively rare, such cascading events are a major concern for transportation operations managers, and they are the focus of this paper. A unique event database, based on incident and road inventory data from Hampton Roads, Virginia, is created. Single-pair events (one primary and one secondary incident) and large-scale events (one primary and multiple secondary incidents) are identified and analyzed. "Event duration" is defined as the time elapsed from the notification of a primary incident to the departure of the last responder from the event scene after removal of the primary and associated secondary incidents. Events are further categorized as either contained or extended. If the primary incident is the last one being cleared during such an event, then it is a contained event; otherwise, it is an extended event. Correlates of contained and extended event durations are identified through a set of rigorous statistical models. The findings of this study provide knowledge that can aid in mitigating the impacts of cascading incidents.
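The event definitions in this abstract can be expressed as a small sketch. The `Incident` record, its field names, and the handling of ties (a secondary incident cleared at the same moment as the primary is treated as contained) are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Incident:
    notified: float  # notification time, minutes from a common reference
    cleared: float   # time the last responder left this incident's scene


def event_duration(primary: Incident, secondaries: List[Incident]) -> float:
    """Time from notification of the primary incident to the departure of
    the last responder from any incident in the event."""
    last_cleared = max([primary.cleared] + [s.cleared for s in secondaries])
    return last_cleared - primary.notified


def event_type(primary: Incident, secondaries: List[Incident]) -> str:
    """'contained' if the primary incident is the last one cleared,
    otherwise 'extended'. Ties count as contained (an assumption)."""
    if all(primary.cleared >= s.cleared for s in secondaries):
        return "contained"
    return "extended"


# Hypothetical single-pair events (one primary, one secondary incident).
primary = Incident(notified=0.0, cleared=60.0)
early_secondary = Incident(notified=20.0, cleared=45.0)
late_secondary = Incident(notified=30.0, cleared=90.0)

print(event_type(primary, [early_secondary]),
      event_duration(primary, [early_secondary]))
print(event_type(primary, [late_secondary]),
      event_duration(primary, [late_secondary]))
```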
Article
Crash data are heterogeneous because they are collected from different sources and locations at different times. This data heterogeneity may cause a significant bias in the estimation of standard errors for the coefficients as well as the coefficients' statistical inferences. In the past decade, several promising modeling strategies have been proposed to handle overdispersed crash data, most of which have focused on estimating the conditional mean crash count. This paper applies an alternative crash modeling approach: quantile regression (QR) in the context of a count data model. The application of QR to model crash frequency is illustrated, and empirical results are interpreted. Poisson gamma, the benchmark statistical model for crash counts, is referenced to estimate the covariate coefficients for the mean crash count. Focusing on the mean may result in important aspects of the data being missed. A more detailed analysis, using a QR model for crash count data, confirms that crash predictors have varying impacts on the different areas of the crash distribution. Moreover, the marginal effects of covariates provide a more direct observation of changes in the quantity, rather than the percentage, of crash frequency when responding to one-unit changes in regressors.
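To make the contrast between mean-based and quantile-based modeling concrete, the sketch below uses the check (pinball) loss that quantile regression minimizes, applied to the simplest case of a constant predictor with no covariates. The crash counts are invented for illustration, and this is not the count-data QR procedure used in the paper, which involves additional steps not described in the abstract.

```python
def pinball_loss(y, q, tau):
    """Average check loss rho_tau(y_i - q) for a constant prediction q."""
    total = 0.0
    for value in y:
        u = value - q
        # Under-predictions are weighted by tau, over-predictions by 1 - tau.
        total += tau * u if u >= 0 else (tau - 1.0) * u
    return total / len(y)


def best_constant(y, tau):
    """Observed value that minimizes the pinball loss, i.e. a sample
    tau-quantile. Only observed values are searched, which is sufficient
    because the loss is piecewise linear with kinks at the data points."""
    return min(sorted(set(y)), key=lambda q: pinball_loss(y, q, tau))


# Hypothetical right-skewed crash counts per road segment.
crash_counts = [0, 0, 1, 1, 2, 2, 3, 4, 7, 20]

mean_count = sum(crash_counts) / len(crash_counts)
print("mean:", mean_count)
print("median (tau = 0.50):", best_constant(crash_counts, 0.50))
print("75th percentile (tau = 0.75):", best_constant(crash_counts, 0.75))
```

In skewed data like this, the mean is pulled upward by a few extreme segments, while each choice of tau targets a different part of the distribution. That is the sense in which focusing on the mean alone can miss how predictors act in the tails.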