Examining driver injury severity in intersection-related crashes using cluster analysis and hierarchical Bayesian models

Zhenning Li
Department of Civil and Environmental Engineering, University of Hawaii at Manoa
2500 Campus Road, Honolulu, HI 96822
Tel: 808-745-9048; Email: li2016@hawaii.edu

Cong Chen, Ph.D.
Research Associate
Center for Urban Transportation Research, University of South Florida
4202 East Fowler Avenue, CUT100, Tampa, FL 33620, USA
Tel: 813-974-2344; Email: congchen1@cutr.usf.edu

Yusheng Ci, Ph.D.
Associate Professor
Department of Transportation Science and Engineering, Harbin Institute of Technology
73 Huanghe Road, Harbin, Heilongjiang 150090
Email: ciyusheng1999@126.com

Guohui Zhang, Ph.D., Corresponding Author
Assistant Professor
Department of Civil and Environmental Engineering, University of Hawaii at Manoa
2500 Campus Road, Honolulu, HI 96822
Email: guohui@hawaii.edu

Qiong Wu, Ph.D.
Department of Civil and Environmental Engineering, University of Hawaii at Manoa
2500 Campus Road, Honolulu, HI 96822
Email: wuqiong@hawaii.edu

Cathy Liu, Ph.D.
Assistant Professor
Department of Civil and Environmental Engineering, University of Utah
110 Central Campus Drive, 2137 MCE, Salt Lake City, UT 84112
Email: cathy.liu@utah.edu

Zhen (Sean) Qian, Ph.D.
Assistant Professor
Civil and Environmental Engineering, Carnegie Mellon University
Pittsburgh, PA 15213-3890
Email: seanqian@cmu.edu

Abstract
Traffic crashes are more likely to occur at intersections, where the traffic environment is complicated. In this study, a hybrid approach combining cluster analysis and hierarchical Bayesian models is developed to examine driver injury severity patterns in intersection-related crashes based on two-year crash data in New Mexico. Three clusters are defined by K-means cluster analysis based on weather and roadway environmental conditions in order to reveal drivers' risk compensation instability under diverse external environments. Hierarchical Bayesian random intercept models are developed for each of the three clusters as well as the whole dataset to identify the factors contributing to multilevel driver injury outcomes: property damage only (Level I), complaint of injury and visible injury (Level II), and incapacitating injury and fatality (Level III). Model comparison with an ordinary multinomial logistic model that omits the hierarchical features of crash data and cross-level interactions verifies the suitability and effectiveness of the proposed hybrid approach. Results show that a number of crash-level variables (time period, weather, light condition, area, and road grade) and vehicle/driver-level variables (traffic controls, vehicle action, vehicle type, seatbelt used, driver age, and drug/alcohol impaired), along with some cross-level interactions (i.e., left turn and night, drug and dark), significantly influence driver injury severity. This study provides insight into the effects of these variables on driver injury severity in intersection-related crashes and beneficial references for developing effective countermeasures for severe crash prevention.

Keywords: Driver injury severity, Intersection-related crash, Cross-level interaction, K-means cluster analysis, Hierarchical Bayesian model

1. Introduction
Intersections are widely considered hazardous locations among road segments because traffic streams (e.g., motorized vehicles, non-motorized vehicles, and pedestrians) cross from conflicting paths. The severe consequences are evidenced by crash statistics from all over the world. In the United States, approximately 26% of fatal crashes and 50% of injury crashes nationwide occurred at intersections (FHWA, 2012), resulting in about 9,000 fatalities per year (NHTSA, 2015). In Canada, over 30% of fatal crashes and 40% of serious injuries occurred at intersections (Barua et al., 2010). Meanwhile, the Japan Metropolitan Police Department reported that intersection and near-intersection crashes comprised 58.0% of total crashes (Hong et al., 2016). Hence, there have been extensive studies focusing on injury severity outcomes in intersection-related crashes. For instance, Chen et al. (2012) assessed the impacts of several risk factors in intersection-related crashes and found that driver age, gender, speed zone, traffic control type, time of day, crash type, and seat belt usage are critical factors associated with crash severity. Wu et al. (2016b) studied the similarities and differences between teenage and adult drivers in intersection-related crashes with two multinomial logit (MNL) regression models, and then proposed several safety solutions for the two driver groups to reduce intersection-related crash injury severities.
Even though contemporary crash injury severity research has already produced many useful conclusions and highlights, some concerns and limitations remain unaddressed, as summarized by Mannering and Bhat (2014). Among these limitations, risk compensation instability, which results from drivers' active behavioral adjustment to adverse weather and roadway environmental conditions in order to reach and maintain a low perceived driving risk, makes it difficult to accurately capture the impact of external environment variables on traffic safety. Given the significant diversity in weather (e.g., sunny, overcast, snow, rain, fog, etc.) and roadway environment (e.g., straight, curved, graded, hilltop, etc.) conditions, the influences of these factors as well as drivers' risk compensation also vary tremendously over crash records, and modeling these influences based on the overall dataset may not always yield accurate estimation. To address this issue, an appropriate technique is to apply cluster analysis to divide the entire dataset into homogeneous sub-datasets, within each of which data records have similar weather and road environmental characteristics and the risk compensation effect is expected to be more stable. Cluster analysis has been widely applied in traffic safety analysis. For instance, a latent class cluster analysis (LCA) method was used to identify homogeneous latent class clusters in pedestrian crashes (Sasidharan et al., 2015). A two-stage clustering method consisting of self-organizing maps (SOM) was developed to analyze data from the General Estimates System (GES) crash database (Prato and Kaplan, 2013). Of all the primary cluster analysis methods, K-means has been the most popular in traffic safety analysis (Feng et al., 2016; Mauro et al., 2013; Mohamed et al., 2013; Zheng et al., 2014). Therefore, in this study, a K-means cluster analysis is conducted to define distinctive sub-datasets based on external environment information for driver injury severity pattern investigation and influence estimation, and also to provide some insights for alleviating the varied effects of the risk compensation instability issue.
Besides the risk compensation instability issue, unobserved heterogeneity, resulting from unobservable contributing factors and data, imposes a significant effect on model estimation accuracy and therefore is also a critical issue to be addressed in traffic safety model development (Mannering et al., 2016). Additionally, unobserved heterogeneity is also hidden behind the observed data and variables. Although different types of crash contributing variables may each have a discrete influence to some extent, they may also have potentially inextricable interrelationships with each other, and thus impose a sophisticated impact on injury severity as well. For example, it is understandable that drivers of different ages may use similar efforts to maintain safe driving status on straight roadways; however, on horizontally or vertically curved road segments, given their physical conditions, senior drivers may need more effort and time to maintain safe driving than drivers of other age groups, and therefore may suffer higher crash risks or more severe injury outcomes (Chen et al., 2015b). A number of studies have been conducted to address this issue using different methods (Haque et al., 2010; Huang et al., 2008; Zeng and Huang, 2014). For instance, de Oña et al. (2011) and Chen et al. (2015a) applied Bayesian network models to illustrate the statistical interdependence among crash injury contributing factors in probabilistic form. Chen et al. (2015b) verified the effects of cross-level interactions on driver injury outcome through a random intercept model.
The multilevel structure of crash data justifies the necessity and advantage of hierarchical/multilevel models in crash severity data modeling, where the between-level variance and within-level correlation of injury severity in the crash data structure are taken into consideration (Huang and Abdel-Aty, 2010). Among the many hierarchical model applications in the traffic safety analysis field, the Bayesian inference method is one of the major approaches (Abdel-Aty et al., 2012; Huang and Abdel-Aty, 2010; Yu and Abdel-Aty, 2013). For instance, Huang et al. (2008) developed a Bayesian hierarchical binomial logistic model to identify the significant factors affecting driver injury severity and vehicle damage in traffic crashes at signalized intersections. Yu and Abdel-Aty (2014) utilized hierarchical Bayesian binary probit models to analyze crash injury severity on a mountainous freeway and an urban expressway. Therefore, in this study, a hierarchical Bayesian modeling technique is used to capture driver injury severity distributions.
The above statements describe risk compensation instability, unobserved heterogeneity, and the hierarchical structure of crash data. Each of these corresponds to an issue in crash data and traffic safety modeling, and it is necessary to address all of them in advanced traffic safety model development. However, previous studies lack models that consider all these issues together. Therefore, in this study, K-means cluster analysis is implemented to handle the risk compensation instability issue, i.e., the varying influence of external environment factors across crash records, and then a hierarchical Bayesian method with a two-level data structure (crash-level and vehicle-level) is developed to analyze the impacts of risk factors and cross-level interactions on driver injury severity in intersection-related crashes. The unobserved heterogeneity is simulated with random parameters following certain predetermined distributions and estimated with a Markov chain Monte Carlo (MCMC) algorithm. The remainder of the paper is organized as follows: Section 2 presents detailed information on the studied crash dataset, followed by Section 3 describing the methodology applied for analyzing risk factors associated with driver injury severity. Section 4 discusses model estimates and marginal effects of the identified contributing factors. Last, Section 5 summarizes the major findings of this study and discusses the policy implications for enhancing traffic safety.

2. Data
A two-year crash dataset including all intersection-related crashes in New Mexico from 2010 to 2011 is obtained for this research from the Traffic Safety Division at the New Mexico Department of Transportation (NMDOT) and the Geospatial and Population Studies Transportation Research Unit at the University of New Mexico. The intersection-related crashes here are defined as the crashes occurring at or around intersections based on NMDOT definitions (NMDOT, 2011a, b). The entire dataset is integrated from three major sub-datasets: a crash dataset presenting the details of each crash, including crash occurrence time and location, the number of vehicles and persons involved, collision type, weather, and roadway geometrical and environmental characteristics at the crash occurrence; a vehicle dataset containing explicit information regarding characteristics of each vehicle in a crash, such as vehicle type, vehicle action, occupant injury outcome, travel lane features, traffic control protocol, etc.; and a driver dataset describing the detailed information of each driver, including driver injury outcome, driver demographic features, and driver behavioral characteristics at crash occurrence. Given the one-to-one association between a vehicle and its driver and the many-to-one association between vehicles and a crash, all these variables are classified into two hierarchical levels: crash-level and vehicle/driver-level.
In this research, 4,603 incomplete records and records with erroneous information in the original datasets, such as those with driver age under ten years old or driver gender unknown, were removed after a careful examination. Finally, 49,073 accurate records are retained for the modeling purpose. The records are evenly distributed across the years. As documented by NMDOT (NMDOT, 2011a, b), driver injury severity is classified on the KABCO scale as K-killed, A-incapacitating injury, B-visible injury, C-complaint of injury, and O-no apparent injury. In this study, the driver injury is classified into three categorical levels: property damage only (Level I, original category O), complaint of injury and visible injury (Level II, original categories B and C), and incapacitating injury and fatality (Level III, original categories A and K), for model structure simplification and estimation efficiency improvement. Continuous integer variables and multi-categorical variables with excessive numbers of original values are modified accordingly to improve modeling efficiency as well, based on our previous studies and engineering experience (Chen et al., 2017, 2015a; Wu et al., 2016). The detailed information of the dataset is shown in Table 1.
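
The recoding from the KABCO scale to the three study levels can be sketched as follows. This is a minimal illustration that assumes the raw records carry single-letter KABCO codes; the names `KABCO_TO_LEVEL`, `recode_severity`, and `level_shares` are hypothetical and do not come from the study.

```python
# Hypothetical lookup table: KABCO letter -> study severity level (1, 2, or 3).
KABCO_TO_LEVEL = {
    "O": 1,  # no apparent injury   -> Level I (property damage only)
    "C": 2,  # complaint of injury  -> Level II
    "B": 2,  # visible injury       -> Level II
    "A": 3,  # incapacitating injury -> Level III
    "K": 3,  # killed               -> Level III
}


def recode_severity(kabco_codes):
    """Map a sequence of KABCO letters to severity levels 1, 2, or 3."""
    return [KABCO_TO_LEVEL[code.strip().upper()] for code in kabco_codes]


def level_shares(levels):
    """Return the percentage share of each level, rounded to two decimals."""
    total = len(levels)
    return {lvl: round(100.0 * levels.count(lvl) / total, 2) for lvl in (1, 2, 3)}


if __name__ == "__main__":
    sample = ["O", "O", "C", "B", "A", "K", "O", "O"]  # made-up records
    levels = recode_severity(sample)
    print(levels)
    print(level_shares(levels))
```

Applied to the full dataset, the same share calculation would produce the Level I, II, and III percentages reported in the first row of Table 1.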

TABLE 1 Variable Definition and Summary Statistics

| Variable | Level I (count) | Level I (%) | Level II (count) | Level II (%) | Level III (count) | Level III (%) | Total |
|---|---|---|---|---|---|---|---|
| Severity | 39,947 | 81.40% | 8,523 | 17.37% | 603 | 1.23% | 49,073 |
| Crash-level variables | | | | | | | |
| Day | | | | | | | |
| Sunday | 3,031 | 80.25% | 697 | 18.45% | 49 | 1.30% | 3,777 |
| Monday | 6,126 | 81.65% | 1,301 | 17.34% | 76 | 1.01% | 7,503 |
| Tuesday | 6,377 | 81.06% | 1,389 | 17.66% | 101 | 1.28% | 7,867 |
| Wednesday | 6,363 | 81.20% | 1,378 | 17.59% | 95 | 1.21% | 7,836 |
| Thursday | 6,435 | 82.33% | 1,287 | 16.47% | 94 | 1.20% | 7,816 |
| Friday | 7,289 | 81.82% | 1,512 | 16.97% | 108 | 1.21% | 8,909 |
| Saturday | 4,326 | 80.63% | 959 | 17.88% | 80 | 1.49% | 5,365 |
| Time Period | | | | | | | |
| Morning | 14,655 | 81.34% | 3,102 | 17.22% | 259 | 1.44% | 18,016 |
| Afternoon | 13,927 | 81.59% | 2,946 | 17.26% | 197 | 1.15% | 17,070 |
| Evening | 9,276 | 81.73% | 1,961 | 17.28% | 112 | 0.99% | 11,349 |
| Night | 2,089 | 79.19% | 514 | 19.48% | 35 | 1.33% | 2,638 |
| Weather | | | | | | | |
| Clear | 37,411 | 81.25% | 8,047 | 17.48% | 582 | 1.27% | 46,040 |
| Rain | 1,382 | 81.82% | 296 | 17.53% | 11 | 0.65% | 1,689 |
| Snow | 1,073 | 86.53% | 160 | 12.90% | 7 | 0.56% | 1,240 |
| Fog | 45 | 75.00% | 13 | 21.67% | 2 | 3.33% | 60 |
| Dust | 36 | 81.82% | 7 | 15.91% | 1 | 2.27% | 44 |
| Light | | | | | | | |
| Dark | 5,300 | 80.44% | 1,205 | 18.29% | 84 | 1.27% | 6,589 |
| Dark with light | 1,246 | 81.44% | 267 | 17.45% | 17 | 1.11% | 1,530 |
| Daylight | 33,401 | 81.56% | 7,051 | 17.22% | 502 | 1.23% | 40,954 |
| Area | | | | | | | |
| Rural | 1,810 | 75.33% | 505 | 20.99% | 91 | 3.78% | 2,406 |
| Urban | 38,137 | 81.72% | 8,018 | 17.18% | 512 | 1.10% | 46,667 |
| Road Character | | | | | | | |
| Straight | 37,896 | 81.23% | 8,165 | 17.50% | 590 | 1.26% | 46,651 |
| Curve | 2,051 | 84.68% | 358 | 14.78% | 13 | 0.54% | 2,422 |
| Road Grade | | | | | | | |
| Level | 35,344 | 81.44% | 7,528 | 17.35% | 527 | 1.21% | 43,399 |
| Hillcrest | 924 | 84.23% | 165 | 15.04% | 8 | 0.73% | 1,097 |
| On grade | 3,584 | 80.34% | 810 | 18.16% | 67 | 1.50% | 4,461 |
| Dip | 95 | 81.90% | 20 | 17.24% | 1 | 0.86% | 116 |
| Number of Vehicles | | | | | | | |
| Single Vehicle | 87 | 73.11% | 26 | 21.85% | 6 | 5.04% | 119 |
| Two Vehicles | 35,607 | 82.37% | 7,123 | 16.48% | 498 | 1.15% | 43,228 |
| Multiple Vehicles | 4,253 | 74.28% | 1,374 | 24.00% | 99 | 1.73% | 5,726 |
| Vehicle/driver-level variables | | | | | | | |
| Road Surface Condition | | | | | | | |
| Dry | 36,942 | 81.21% | 7,972 | 17.53% | 573 | 1.26% | 45,487 |
| Wet | 2,087 | 81.08% | 461 | 17.91% | 26 | 1.01% | 2,574 |
| Snow | 426 | 90.64% | 43 | 9.15% | 1 | 0.21% | 470 |
| Ice | 389 | 92.62% | 29 | 6.90% | 2 | 0.48% | 420 |
| Moring Water | 29 | 85.39% | 5 | 14.71% | 0 | 0.00% | 34 |
| Loose Material | 44 | 83.02% | 8 | 15.09% | 1 | 1.89% | 53 |
| Slush | 30 | 85.71% | 5 | 14.29% | 0 | 0.00% | 35 |
| Road Pavement | | | | | | | |
| Paved Road | 39,796 | 81.38% | 8,503 | 17.39% | 600 | 1.23% | 48,899 |
| Road not paved | 151 | 86.78% | 20 | 11.49% | 3 | 1.72% | 174 |
| Traffic Controls | | | | | | | |
| No Control | 6,776 | 78.91% | 1,687 | 19.65% | 124 | 1.44% | 8,587 |
| Stop-Yield Sign | 8,558 | 85.35% | 1,358 | 13.54% | 111 | 1.11% | 10,027 |
| Signal Control | 24,564 | 80.79% | 5,473 | 18.00% | 367 | 1.21% | 30,404 |
| Railroad Gate | 49 | 89.09% | 5 | 9.09% | 1 | 1.82% | 55 |
| Number of Lanes | | | | | | | |
| One Lane | 10,541 | 82.60% | 2,060 | 16.14% | 161 | 1.26% | 12,762 |
| Two Lanes | 15,821 | 81.37% | 3,387 | 17.42% | 236 | 1.21% | 19,444 |
| Multiple Lanes | 13,585 | 80.54% | 3,076 | 18.24% | 206 | 1.22% | 16,867 |
| Vehicle Type | | | | | | | |
| Passenger Car | 30,431 | 79.87% | 7,161 | 18.79% | 511 | 1.34% | 38,103 |
| Pick-up | 8,831 | 86.11% | 1,334 | 13.01% | 91 | 0.89% | 10,256 |
| Semi | 463 | 96.86% | 14 | 2.93% | 1 | 0.21% | 478 |
| Bus | 222 | 94.07% | 14 | 5.93% | 0 | 0.00% | 236 |
| Age | | | | | | | |
| Young (<25 years) | 10,288 | 82.69% | 2,037 | 16.37% | 116 | 0.93% | 12,441 |
| Middle (25~64 years) | 25,514 | 80.95% | 5,612 | 17.81% | 393 | 1.25% | 31,519 |
| Old (>64 years) | 4,145 | 81.07% | 874 | 17.09% | 94 | 1.84% | 5,113 |
| Action | | | | | | | |
| Straight | 28,913 | 80.11% | 6,719 | 18.62% | 460 | 1.27% | 36,092 |
| Overtaking | 178 | 88.12% | 22 | 10.89% | 2 | 0.99% | 202 |
| Right Turn | 2,961 | 91.28% | 268 | 8.26% | 15 | 0.46% | 3,244 |
| Left Turn | 6,398 | 81.53% | 1,328 | 16.92% | 121 | 1.54% | 7,847 |
| U-Turn | 79 | 88.76% | 9 | 10.11% | 1 | 1.12% | 89 |
| Slowing | 1,060 | 85.83% | 171 | 13.85% | 4 | 0.32% | 1,235 |
| Backing | 358 | 98.35% | 6 | 1.65% | 0 | 0.00% | 364 |
| Seatbelt used | 39,819 | 81.58% | 8,429 | 17.27% | 563 | 1.15% | 48,811 |
| Drug/Alcohol Impaired | 474 | 74.29% | 134 | 21.00% | 30 | 4.70% | 638 |
| Gender | | | | | | | |
| Male | 21,445 | 85.62% | 3,343 | 13.35% | 258 | 1.03% | 25,046 |
| Female | 18,502 | 77.01% | 5,180 | 21.56% | 345 | 1.44% | 24,027 |

3. Methodology
3.1. Methodology Design
A wide range of discrete choice models have been implemented to analyze the impacts of various risk factors on different injury severity levels, including MNL, ordered logit, binary logit, etc. (Mannering et al., 2016). However, there is no consistent conclusion regarding the applicability and effectiveness of these models in crash data analysis, and it is widely acknowledged that the characteristics of the data are critical to optimal model selection. As revealed in Table 1, the dataset from NMDOT is divided into two levels, crash-level and vehicle/driver-level. The hierarchical feature generates between-crash variance and within-crash correlations that need to be adequately represented in the model structure. Besides, it is widely known that external environmental conditions, including weather and roadway conditions, pose a significant influence on injury severity outcome, and that different weather and roadway conditions have distinctive effects and lead to drivers' instability in risk compensation behavior, which is hard to capture accurately with a single model analyzing the overall dataset. Therefore, cluster analysis is necessary to divide the whole dataset into distinctive sub-datasets based on weather and roadway conditions. Additionally, although unobserved heterogeneity has been emphasized and addressed with different models in peer research, an important type of unobserved effect, the interactions between crash-level and vehicle-level risk factors, has been ignored in most circumstances. For example, as partly mentioned in Section 1, young drivers and old drivers may have similar safety performance on straight roadways or in clear weather, but their performance may vary significantly on curved roads or under adverse weather conditions, where older drivers may take additional time and effort to maintain safe driving due to their relatively inferior agility and physical conditions. To the best of our knowledge, Chen et al. (2015b) were the first to verify the significance of these interactions in driver injury severity prediction. Therefore, it is essential to systematically examine these potential interactions in this analysis.
With the above considerations, in this study, the K-means cluster algorithm, based on the internal homogeneity and external heterogeneity of external environment factors at the crash level, is applied to deal with drivers' risk compensation instability. The subtypes classified by K-means cluster analysis, together with the overall sample, are then studied with a hierarchical Bayesian random intercept model, and the cross-level interactions between crash-level and vehicle/driver-level variables are examined. The parameters representing the interactions are assumed to be random parameters following certain predetermined distributions and are estimated with an MCMC simulation algorithm.

3.2. K-means Cluster Analysis
K-means cluster analysis is a multivariate statistical method that classifies observations into a given number of groups, $K$, by their internally homogeneous and externally heterogeneous characteristics. It is a reallocation cluster analysis method, and the general steps of the process are as follows (Jain, 2010): 1) select as many points as the number of desired clusters to create an initial center for each of these clusters; 2) associate each observation with the nearest center to create temporary clusters; 3) calculate the gravity center of each temporary cluster, and take these gravity centers as the new cluster centers; 4) reallocate each observation to the cluster that has the closest center; 5) iterate this procedure until convergence is achieved.
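
The five steps above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the implementation used in the study (which relied on R): the function name `kmeans`, the squared Euclidean distance, the random choice of initial centers, and the stopping rule (assignments no longer change) are all assumptions made for the example.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-means on an (n, d) array; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Step 1: pick k distinct observations as the initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Steps 2 and 4: assign every observation to its nearest center
        # (squared Euclidean distance is an assumption of this sketch).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 5: stop once the assignments no longer change.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each center to the mean (gravity center) of its members;
        # an empty cluster keeps its previous center.
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return labels, centers


if __name__ == "__main__":
    # Two made-up, well-separated groups of points.
    data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    labels, centers = kmeans(data, k=2)
    print(labels)
    print(centers)
```

With categorical crash-level variables, the inputs would first need to be encoded numerically (for example as indicator columns); how the study encoded them is not stated here.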
Several indices can be applied to determine the proper number of clusters, $K$, in the cluster model, namely the KL criterion (Krzanowski and Lai, 1988), the CH criterion (Caliński and Harabasz, 1974), and the CCC criterion (Sarle, 1983). These statistics measure the model fit and simultaneously correct for model complexity (a more parsimonious model is better). Besides, an entropy criterion as in Eq. (1) can be used to assess the quality of the clustering solution, where $p_{ik}$ denotes the posterior probability that case $i$ belongs to cluster $k$, $n$ is the number of cases, and the convention $p_{ik} \ln p_{ik} = 0$ applies if $p_{ik} = 0$. In a case of perfect classification, the criterion is equal to 1, and for the worst-case clustering the value of the criterion is 0.

$$I(K) = 1 - \frac{\sum_{i=1}^{n} \sum_{k=1}^{K} -p_{ik} \ln p_{ik}}{n \ln K} \tag{1}$$

All these indices are used in this study to measure the goodness of fit, and the best model is selected based on the results of these indices.
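
As a rough illustration of the entropy criterion in Eq. (1), the sketch below computes it from an $n \times K$ matrix of membership probabilities. The function name `entropy_criterion` and the input layout are assumptions for the example; how membership probabilities were obtained in the study is not specified in this section.

```python
import numpy as np


def entropy_criterion(post):
    """Entropy criterion I(K) for an (n, K) matrix of membership probabilities."""
    post = np.asarray(post, dtype=float)
    n, k = post.shape
    # Convention from the text: p * ln(p) is taken as 0 when p = 0.
    safe = np.where(post > 0, post, 1.0)  # ln(1) = 0 for the zero entries
    plogp = post * np.log(safe)
    return 1.0 - (-plogp.sum()) / (n * np.log(k))


if __name__ == "__main__":
    perfect = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # every case fully assigned
    worst = [[1 / 3, 1 / 3, 1 / 3]] * 4            # every case equally split
    print(entropy_criterion(perfect))
    print(entropy_criterion(worst))
```

The two toy inputs correspond to the limiting cases named in the text: perfect classification and the worst-case clustering.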

3.3. Random Intercept Model Development and Specification
In this research, a modified random intercept model developed by Chen et al. (2015) is used to estimate the effects of the selected variables on driver injury severity levels. The injury severity of the $i$th driver in the $j$th crash, $Y_{ij} = (Y_{ij1}, \cdots, Y_{ijM})$, is defined as a nominal variable with $M$ categorical levels, and in this study $M = 3$. Let $Y_{ij}$ follow the generalized categorical distribution, thus

$$Y_{ij} \sim \mathrm{categorical}(P_{ij1}, \cdots, P_{ijM}) \tag{2}$$

Then, with regard to Level I severity, the relative probability of the $i$th driver being involved in a crash of the $m$th injury severity (Level II or Level III in this study), $\eta_{ijm}$, can be expressed by the following equation:

$$\eta_{ijm} = \log\left(\frac{P_{ijm}}{P_{ij1}}\right) = \beta_{0jm} + \sum_{p=1}^{P} \beta_{pjm} V_{pij} \tag{3}$$

where $P$ is the total number of vehicle/driver-level variables, $\beta_{0jm}$ is the intercept to be estimated in this model, and $\beta_{pjm}$ is the coefficient of $V_{pij}$, which stands for the $p$th vehicle/driver-level variable for the $i$th vehicle/driver in the $j$th crash. $\beta_{0jm}$ and $\beta_{pjm}$ represent the within-crash correlations summarized from the regression modeling of crash-level variables, and are defined as

$$\beta_{0jm} = \gamma_{00m} + \sum_{n=1}^{N} \gamma_{0nm} C_{nj} + \mu_{0jm} \tag{4}$$

$$\beta_{pjm} = \gamma_{p0m} + \sum_{n=1}^{N} \gamma_{pnm} C_{nj} + \mu_{pjm} \tag{5}$$

where $N$ is the number of crash-level variables; $\gamma_{0nm}$ and $\gamma_{pnm}$ are coefficients for $C_{nj}$, the $n$th crash-level variable in the $j$th crash; $\gamma_{00m}$ and $\gamma_{p0m}$ are the intercepts for $\beta_{0jm}$ and $\beta_{pjm}$; $\mu_{0jm}$ and $\mu_{pjm}$ are random effects representing between-crash variance, which are constant for vehicles in the same crash but vary across different crashes. By combining Eqs. (3)-(5), the final model can be expressed as:

$$\log\left(\frac{P_{ijm}}{P_{ij1}}\right) = \eta_{ijm} = \gamma_{00m} + \sum_{n=1}^{N} \gamma_{0nm} C_{nj} + \mu_{0jm} + \sum_{p=1}^{P} \left(\gamma_{p0m} + \mu_{pjm}\right) V_{pij} + \sum_{p=1}^{P} \sum_{n=1}^{N} \gamma_{pnm} C_{nj} V_{pij} \tag{6}$$

To avoid high model complexity and improve model efficiency in this research, it is assumed that the random term $\mu_{pjm}$ in Eq. (5) is trivial in magnitude and could be kept out of the model, leaving $\beta_{pjm} = \gamma_{p0m} + \sum_{n=1}^{N} \gamma_{pnm} C_{nj}$, and therefore the whole model is converted to a random intercept model:

$$\log\left(\frac{P_{ijm}}{P_{ij1}}\right) = \eta_{ijm} = \gamma_{00m} + \sum_{n=1}^{N} \gamma_{0nm} C_{nj} + \mu_{0jm} + \sum_{p=1}^{P} \gamma_{p0m} V_{pij} + \sum_{p=1}^{P} \sum_{n=1}^{N} \gamma_{pnm} C_{nj} V_{pij} \tag{7}$$

where $\gamma_{pnm} C_{nj} V_{pij}$ is the cross-level interaction effect between the crash-level variable, $C_{nj}$, and the vehicle/driver variable, $V_{pij}$; $\mu_{0jm}$ is a randomly distributed term and works with $\gamma_{00m}$ as the model intercept.
In this research, typical non-informative priors (Chen et al., 2016; Meng et al., 2017; Xie et al., 2017) are provided for the unknown parameters: the terms $\gamma_{00m}$, $\gamma_{0nm}$, $\gamma_{p0m}$, and $\gamma_{pnm}$ are all assumed to follow the normal distribution $N(0, 1000)$, the random effect $\mu_{0jm}$ is assumed to follow the normal distribution $N(0, \sigma_{\mu}^{2})$, and $\sigma_{\mu}^{2}$ follows an inverse Gamma distribution (0.001, 0.001). The model simulation procedure is conducted with a Markov chain Monte Carlo (MCMC) algorithm in the R platform with the package RStan (Carpenter et al., 2016), and the 95% highest density interval (HDI) of each estimated mean is provided to indicate the significance of the examined variables. Specifically, a variable is considered significant if the 95% HDI of its estimated mean does not cover 0, and not significant otherwise (Gelman et al., 2013).
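
The HDI-based significance rule can be illustrated with a short sketch that works on a vector of posterior draws. It assumes a unimodal posterior and computes the HDI as the narrowest interval containing 95% of the sorted draws; the function names `hdi` and `is_significant` are hypothetical and this is not the RStan workflow itself.

```python
import numpy as np


def hdi(samples, cred_mass=0.95):
    """Narrowest interval holding cred_mass of the draws (unimodal case)."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = len(s)
    width = int(np.ceil(cred_mass * n))  # number of draws inside the interval
    if width < 2 or width > n:
        raise ValueError("need more draws for the requested credible mass")
    lows = s[: n - width + 1]   # candidate lower bounds
    highs = s[width - 1:]       # matching upper bounds
    best = int(np.argmin(highs - lows))
    return lows[best], highs[best]


def is_significant(samples, cred_mass=0.95):
    """True if the HDI of the draws does not cover 0."""
    low, high = hdi(samples, cred_mass)
    return not (low <= 0.0 <= high)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    draws = rng.normal(loc=0.8, scale=0.2, size=4000)  # made-up posterior draws
    print(hdi(draws))
    print(is_significant(draws))
```

In practice the draws would be the MCMC samples of a single coefficient after discarding warm-up iterations.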

3.4. Model Performance Measurement
For model performance evaluation, the intra-class correlation coefficient (ICC) and the deviance information criterion (DIC) are applied. The ICC estimates the proportion of the overall variance that is explained by between-crash variance, and has been documented in a number of peer studies. The ICC is defined as follows:

$$ICC = \frac{\sigma_{0}^{2}}{\sigma_{0}^{2} + \sigma_{\varepsilon}^{2}} \tag{8}$$

where $\sigma_{0}^{2}$ is the between-crash variance, which is equal to the variance of the random effect, $\sigma_{\mu}^{2}$, in this research; $\sigma_{\varepsilon}^{2}$ is the vehicle/driver-level variance, which is equal to $\pi^{2}/3 = 3.29$ for a hierarchical logistic distribution (Caceres et al., 2009). A large ICC value close to 1 indicates the significance of between-crash variance in explaining the total variance and demonstrates that a hierarchical model is preferable. The literature shows that ICC values larger than 0.25 can be considered high in terms of explaining the variance at the higher level (the crash level considered herein) (Shaheed et al., 2016).
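
The ICC in Eq. (8) is a one-line calculation once the between-crash variance has been estimated. The sketch below is illustrative only; the function name `icc` and the example variance value are assumptions.

```python
import math

# Vehicle/driver-level variance for the logistic case, pi^2 / 3 (about 3.29).
LOGISTIC_VARIANCE = math.pi ** 2 / 3.0


def icc(between_crash_variance):
    """Intra-class correlation coefficient as in Eq. (8)."""
    return between_crash_variance / (between_crash_variance + LOGISTIC_VARIANCE)


if __name__ == "__main__":
    print(round(LOGISTIC_VARIANCE, 2))
    print(icc(2.0))  # made-up between-crash variance
```

Under this definition, the ICC exceeds the 0.25 threshold mentioned above once the between-crash variance is larger than $\pi^{2}/9$, roughly 1.10.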
DIC is a Bayesian measure that accounts for model complexity and is generally used to compare models of arbitrary structure (Spiegelhalter et al., 2002). The detailed definition of DIC has been described in many peer studies and therefore is omitted here. For model comparison, a model with a lower DIC value is typically preferred.
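
For readers who want the arithmetic behind the comparison, the standard definition from Spiegelhalter et al. (2002) is the posterior mean deviance plus the effective number of parameters, where the latter is the posterior mean deviance minus the deviance evaluated at the posterior mean of the parameters. The sketch below assumes the deviance values have already been computed from the MCMC output; the function name `dic` and the numbers are hypothetical.

```python
import numpy as np


def dic(deviance_samples, deviance_at_posterior_mean):
    """Return (DIC, p_D) from per-draw deviances and the plug-in deviance."""
    d_bar = float(np.mean(deviance_samples))   # posterior mean deviance
    p_d = d_bar - deviance_at_posterior_mean   # effective number of parameters
    return d_bar + p_d, p_d


if __name__ == "__main__":
    # Made-up deviance values for two candidate models.
    dic_a, _ = dic([102.0, 98.0, 100.0], 96.0)
    dic_b, _ = dic([110.0, 108.0, 112.0], 107.0)
    print("preferred:", "A" if dic_a < dic_b else "B")
```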

3.5. Variable Impact Analysis and Pseudo-elasticity
Previous studies showed that the variable impact cannot be directly determined by the sign of the estimated mean when analyzed by logit models with multiple injury levels (Chen et al., 2015b), and pseudo-elasticity analysis is needed to evaluate the influence of indicator variables based on the posterior probability change before and after altering the value of each contributing variable. Direct pseudo-elasticity is calculated as the percentage change in probability when an indicator variable is switched (i.e., from 0 to 1 or from 1 to 0):

$$E_{x_{imj}}^{P_{im}} = \frac{P_{im}\left(x_{imj} = 1\right) - P_{im}\left(x_{imj} = 0\right)}{P_{im}\left(x_{imj} = 0\right)} \tag{9}$$

where $x_{imj}$ is the $j$th variable associated with the $m$th injury severity for the $i$th crash. $P_{im}$ is the probability of the crash injury severity ($m$th) for a given crash ($i$th) and is expressed as

$$P_{im} = \frac{\exp\left(\sum_{j} \beta_{mj} x_{imj}\right)}{\sum_{m'=1}^{M} \exp\left(\sum_{j} \beta_{m'j} x_{im'j}\right)} \tag{10}$$

where $\beta_{mj}$ denotes the coefficient of $x_{imj}$. The direct pseudo-elasticity of a variable value is calculated for each data record, and the average pseudo-elasticity is summarized based on all data records and is used to measure variable influence.
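
The averaging procedure described above can be sketched as follows: for every record, set the indicator to 1 and to 0, compute the outcome probabilities in both cases, take the relative change, and average over records. The sketch uses a plain multinomial logit with the same covariates for every severity level and a zero coefficient row for the reference level, which is a simplifying assumption; the function names and the data are hypothetical.

```python
import numpy as np


def mnl_probabilities(X, B):
    """Multinomial logit probabilities.

    X: (n, J) covariate matrix; B: (M, J) coefficients, one row per severity
    level, with the reference level's row set to zeros.
    """
    util = X @ B.T
    util = util - util.max(axis=1, keepdims=True)  # numerical stability
    expu = np.exp(util)
    return expu / expu.sum(axis=1, keepdims=True)


def average_pseudo_elasticity(X, B, j):
    """Average direct pseudo-elasticity of indicator column j for each level."""
    X = np.asarray(X, dtype=float)
    B = np.asarray(B, dtype=float)
    X1 = X.copy()
    X1[:, j] = 1.0   # indicator switched on for every record
    X0 = X.copy()
    X0[:, j] = 0.0   # indicator switched off for every record
    p1 = mnl_probabilities(X1, B)
    p0 = mnl_probabilities(X0, B)
    return ((p1 - p0) / p0).mean(axis=0)


if __name__ == "__main__":
    # Made-up data: column 0 is a constant, column 1 is the indicator of interest.
    X = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
    B = [[0.0, 0.0], [-1.5, 0.4], [-4.0, 0.9]]
    print(average_pseudo_elasticity(X, B, j=1))
```

Multiplying the returned values by 100 expresses them as percentage changes in the probability of each severity level.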

4. Estimation Results
4.1. K-means Cluster Analysis Results
In order to address the driver risk compensation instability issue under different combinations of road and weather conditions, K-means cluster analysis is conducted in this study. Crash-level variables in the dataset that are related to the road and weather environment, i.e., day, time period, weather, light condition, area, road character, road grade, and the number of lanes, are utilized in the cluster analysis. In this K-means cluster analysis procedure, one of the critical steps is to determine the best number of clusters, K. Thus, a test of K (ranging from 2 to 6) is conducted using the NbClust package in R (Charrad et al., 2014). Each test model (K=2, 3, etc.) is set to keep running until it converges at a statistical significance level of 95% (p-value<0.05).
The KL, CH, and CCC indices together with $I(K)$ are applied to measure the model goodness-of-fit, and the final results of these criteria are shown in Table 2. All the indices reach their maximum values when the entire dataset is clustered into three subsets, indicating that the three-cluster configuration is the most appropriate solution.
The three-cluster K-means analysis converged after 19 iterations. The 49,073 records are classified into three clusters with 20,194, 9,641, and 19,238 records, labeled Cluster 1, Cluster 2, and Cluster 3, respectively. An F-test is employed in the cluster analysis process, and only the variables that are statistically significant at the 95% confidence level (p-value<0.05) are taken into consideration. Table 3 presents the descriptions of the significant variables and their frequency differences between each cluster and the overall sample. The frequencies of the variable values in the three clusters and in the overall sample are given in the first four columns. The frequency differences (in percentage) between each cluster and the overall sample are presented in the last three columns. Values falling outside the range between -10% and 10% are highlighted in boldface.
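As a small illustration of how the last three columns of Table 3 are defined, the sketch below computes the percentage difference (C-O)/O and applies the 10% highlighting rule. The helper names are hypothetical, and because the inputs are the rounded means printed in Table 3, the results only approximate the tabulated percentages, which were presumably computed from unrounded frequencies.

```python
def pct_difference(cluster_mean, overall_mean):
    """Percentage difference between a cluster mean and the overall mean."""
    return 100.0 * (cluster_mean - overall_mean) / overall_mean

def is_highlighted(diff_pct, threshold=10.0):
    """True when the difference falls outside the -10% to 10% band."""
    return abs(diff_pct) > threshold

# Rounded means from Table 3 (Cluster 2 versus the overall sample).
evening = pct_difference(0.38, 0.23)  # about 65%; Table 3 reports 64.28%
urban = pct_difference(0.89, 0.95)    # about -6%; Table 3 reports -6.70%
```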
From Table 3, the major findings are summarized below:
Cluster 1 is characterized by slight differences from the overall sample. Table 3 shows that the deviation between Cluster 1 and the overall sample is small: only a few percentage differences fall outside the range between -10% and 10%. More specifically, Cluster 1 and the overall sample are similar in a majority of variables, and the variables with larger differences tend to be those with low frequencies, e.g., rural (mean=0.04) and curve (mean=0.04). Taking the overall sample as the baseline, Cluster 1 can be labeled as “records with normal road and weather conditions”.
Cluster 2 differs far more from the overall sample. The frequencies of adverse condition values in Cluster 2, e.g., evening and rain, are much higher than those in the other clusters. As shown in Table 3, the frequencies of night and evening in Cluster 2 are more than 50% higher than in the overall sample; furthermore, the difference in frequency is more than 100% for a number of variable values, such as rain, snow, dark, and rural. As a result of these high percentages of adverse conditions, the shares of favorable conditions are the lowest among the three clusters. Therefore, Cluster 2 can be labeled as “records with inferior road and weather conditions”.
Cluster 3 is characterized by relatively low frequencies of the adverse weather and roadway condition values that are frequent in Cluster 2. For instance, the frequencies of rain in Cluster 2 and Cluster 3 are 0.09 (about 1.5 times higher than in the overall sample) and 0.01 (about 85% lower than in the overall sample), respectively. In contrast, the frequencies of good conditions, i.e., clear weather, daylight, and level road (mean=0.98, 0.97, and 0.93, respectively), are the highest among the three clusters. Thus, Cluster 3 is labeled as “records with favorable road and weather conditions”.

TABLE 2 Results of Model Criteria
Number of clusters (K)   KL       CH         CCC        I(K)
2                        3.0016   334.1378   78.4458    0.6737
3                        3.6384   466.6439   171.0089   0.8425
4                        1.0315   235.3864   104.4972   0.6395
5                        2.8309   234.0900   47.4255    0.4093
6                        0.7409   143.9131   81.7309    0.4289

TABLE 3 Variable Descriptions and Frequency Differences between Each Cluster and the Overall Sample
Variables Mean Percentage difference
Cluster 1 (C1) Cluster 2 (C2) Cluster 3 (C3) Overall (O) (C1-O)/O (C2-O)/O (C3-O)/O
Day (Yes=1, No=0)
Monday 0.15 0.10 0.18 0.15 -1.99% 1.01% 1.58%
Tuesday 0.16 0.11 0.19 0.16 -2.42% 1.77% 1.65%
Wednesday 0.15 0.12 0.19 0.16 -4.61% 7.50% 1.08%
Thursday 0.16 0.13 0.17 0.16 0.24% -3.10% 1.30%
Friday 0.18 0.16 0.19 0.18 1.58% -0.87% -1.22%
Saturday 0.12 0.19 0.06 0.11 6.08% -8.45% -2.15%
Sunday 0.08 0.18 0.02 0.08 5.71% -0.81% -5.58%
Time period (Yes=1, No=0)
Morning 0.36 0.25 0.44 0.37 -2.75% -31.23% 18.53%
Afternoon 0.34 0.28 0.38 0.35 -1.29% -18.08% 10.41%
Evening 0.24 0.38 0.15 0.23 3.64% 64.28% -36.03%
Night 0.05 0.08 0.04 0.05 -7.14% 53.71% -19.42%
Weather (Yes=1, No=0)
Clear 0.93 0.83 0.98 0.93 -0.17% -10.62% 5.50%
Rain 0.04 0.09 0.01 0.03 6.18% 157.50% -85.42%
Snow 0.02 0.05 0.00 0.02 -2.05% 170.64% -83.36%
Light condition (Yes=1, No=0)
Daylight 0.82 0.61 0.97 0.83 -2.18% -27.38% 16.01%
Dark 0.12 0.34 0.05 0.13 -10.45% 150.40% -64.41%
Dark with light 0.03 0.07 0.01 0.03 6.57% 117.79% -65.93%
Area (Yes=1, No=0)
Urban 0.96 0.89 0.97 0.95 1.36% -6.70% 1.93%
Rural 0.04 0.11 0.03 0.05 -26.37% 129.96% -37.45%
Road character (Yes=1, No=0)
Straight 0.96 0.90 0.96 0.95 1.19% -4.94% 1.23%
Curve 0.04 0.10 0.04 0.05 -22.84% 95.34% -23.75%
Number of vehicles (Yes=1, No=0)
Single vehicle 0.00 0.00 0.00 0.00 -18.32% -0.53% 19.49%
Two vehicles 0.86 0.85 0.91 0.88 -1.88% -3.31% 3.63%
Multi-vehicle 0.13 0.14 0.09 0.12 14.54% 21.97% -26.28%
Road grade (Yes=1, No=0)
Level 0.88 0.78 0.93 0.88 0.23% -11.82% 5.69%
Hillcrest 0.02 0.03 0.02 0.02 -1.20% 17.85% -7.69%
On grade 0.09 0.19 0.04 0.09 -1.40% 107.83% -52.57%

4.2. Hierarchical Bayesian Random Intercept Model Simulation and Performance
As noted above, the dataset is classified into three clusters by K-means cluster analysis. The major differences among these clusters lie in the variable values describing road and environmental conditions. To identify the variables’ impacts on driver injury severity, hierarchical Bayesian random intercept models are developed for the overall sample and for each of the three clusters. All the crash-level and vehicle/driver-level variables listed in Table 1 are used for modeling. In each model, to avoid pseudo-convergence, which is typically caused by multimodality of the stationary distribution, three chains with different starting values are simulated for 100,000 iterations, and the trace plots of the chains are examined to verify that all chains converge to the same target distribution. The number of iterations needed to reach convergence, known as the “burn-in” (Plummer et al., 2006), is 60,000, 50,000, 40,000, and 50,000 for the overall model, Cluster 1 model, Cluster 2 model, and Cluster 3 model, respectively. After the burn-in, a thinning scheme that retains every tenth sample is used to reduce autocorrelation, resulting in a total of 4,000, 5,000, 6,000, and 5,000 samples for posterior parameter estimation. The model simulation was conducted with a Markov chain Monte Carlo (MCMC) algorithm. Tables 4-7 present the statistically significant variables and their impacts on driver injury severity in terms of the posterior mean, standard deviation (Sd.), and 95% HDI for the overall dataset and Clusters 1-3, respectively. Note that the estimation results are discussed with respect to the reference injury severity outcome (Level I).
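The retained sample counts follow from the run length, burn-in, and thinning interval stated above. The sketch below reproduces that arithmetic; the variable names are illustrative, and the calculation treats the counts as belonging to a single chain of 100,000 iterations, since the text does not say whether samples are pooled across the three chains.

```python
TOTAL_ITERATIONS = 100_000  # iterations simulated per chain
THIN = 10                   # keep every tenth post-burn-in sample

# Burn-in lengths reported for each model.
burn_in = {
    "Overall": 60_000,
    "Cluster 1": 50_000,
    "Cluster 2": 40_000,
    "Cluster 3": 50_000,
}

# Samples left for posterior estimation after burn-in and thinning.
retained = {
    model: (TOTAL_ITERATIONS - discarded) // THIN
    for model, discarded in burn_in.items()
}
```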
As shown in Tables 4, 5, 6, and 7, the variances of the random effect, which indicate the magnitude of the between-crash variation in the four models, are 4.03, 3.89, 3.22, and 1.83, respectively. Thus, based on Eq. (8), the Intra-Class Correlation (ICC) of each model can be calculated as 0.55, 0.54, 0.49, and 0.36, respectively. The results indicate that 55%, 54%, 49%, and 36% of the total variance in injury severity results from between-crash variance in the corresponding dataset, which justifies the use of the hierarchical model structure on the studied datasets and supports the appropriateness of the final outcomes of each proposed model.
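The reported ICC values can be reproduced from the between-crash variances listed in Tables 4-7. Eq. (8) is not reproduced in this section, so the sketch below assumes the standard latent-variable form for logistic models, in which the level-1 residual variance is fixed at π²/3; the names used are illustrative.

```python
import math

# Assumed level-1 variance of the standard logistic distribution.
LEVEL1_VARIANCE = math.pi ** 2 / 3

def icc(between_crash_variance):
    """Share of total latent variance attributable to between-crash variation."""
    return between_crash_variance / (between_crash_variance + LEVEL1_VARIANCE)

# Between-crash variances as listed in Tables 4-7.
variances = {"Overall": 4.03, "Cluster 1": 3.89, "Cluster 2": 3.22, "Cluster 3": 1.83}
iccs = {model: round(icc(v), 2) for model, v in variances.items()}
```

Under this assumption the four variances give 0.55, 0.54, 0.49, and 0.36, matching the ICC values quoted in the text.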
A model performance comparison is conducted by applying both the hierarchical Bayesian random intercept model and an ordinary multinomial logit (MNL) model to the overall dataset, and the results are presented in Table 8. The ordinary MNL model was developed without considering the hierarchical structure of the crash data. As shown in Table 8, the hierarchical Bayesian random intercept model has a lower DIC value even though the cross-level interaction terms make it considerably more complex, indicating that the proposed random intercept model performs better in model fit and parameter estimation, and that taking the data structure into consideration could substantially benefit model development.

4.3. Factors Affecting Driver Injury Severity in Intersection-related Crashes
Tables 4, 5, 6, and 7 show the estimation results of the overall sample, Cluster 1, Cluster 2, and Cluster 3 models, respectively. As noted above, the average pseudo-elasticity is used in this study to evaluate the marginal effects of the significant factors. The resulting pseudo-elasticities are presented in Table 9. Note that Tables 4, 5, 6, 7, and 9 report not only the significant crash-level and vehicle/driver-level variables but also the cross-level interaction variables. Similar to our previous study (Chen et al., 2015), this study primarily focuses on the variables that have significant impacts on driver incapacitating injury and fatality (Level III); the influences of these variables on non-incapacitating injury and complaint of injury (Level II) are discussed briefly where relevant.

4.3.1. Time period
Night is found to be a critical risk factor for driver injury severity when compared to the morning period. As shown in Tables 4, 5, and 6, night significantly aggravates injury severity in the overall sample, Cluster 1, and Cluster 2. Furthermore, the average pseudo-elasticities in Table 9 show that night increases the probabilities of Level II and Level III driver injuries in these three datasets (12.19%, 19.04%, and 15.31% for Level II; 7.97%, 29.72%, and 42.17% for Level III). The finding is in line with previous studies: low visibility, driver fatigue, and a higher proportion of drug and alcohol use could be the causes of severe driver injuries in intersection-related crashes (Wang et al., 2015; Ye and Lord, 2014; Sandt et al., 2016). To improve visibility during nighttime, retroreflective pavement markings and signs can be implemented to delineate the roadway alignment and decision points such as intersections (Smadi et al., 2008). In addition, appropriate signs and markings placed in highly visible locations can help to mitigate the likelihood of severe driver injury, since they provide advance warning of upcoming conditions and more time for drivers to make decisions. Several fatigue detection and management approaches already exist, including: (1) online operator fatigue detection technologies, (2) fitness-for-duty indicators, and (3) bio-mathematical scheduling models (National Academies of Sciences and Medicine, 2016). Although implementing these approaches poses a variety of challenges, since many of these devices have not been tested in third-party randomized controlled studies, they have a wide range of potential applications. In addition, roadway rumble strips, lane departure warning systems, forward collision warning, blind-spot object detection, adaptive cruise control, automatic emergency braking, and other infrastructure-based or vehicle-based techniques can also help mitigate the effects of fatigue.

4.3.2. Weather
The results in Table 9 show that adverse weather conditions, i.e., snow and rain, have different influences on driver injury severity across datasets. Under snowy conditions, the probability of a Level II injury decreases by 39.43% and 33.53% in the overall sample and Cluster 1, whereas it increases by 57.06% in Cluster 2, compared with clear weather. Similarly, rain shows the same reversal across these datasets (-47.00%, -43.29%, and 20.61% for Level III). The results are consistent with previous research in that the influence mechanism of adverse weather conditions remains a controversial topic, although the influence is more stable within each cluster. Some researchers hold that snow or rain may cause drivers to be more cautious and drive more slowly, so that collisions occurring during snow or rain result in less severe injuries (Eluru et al., 2012), which is known as risk compensation. Others, however, reported that inclement weather made roads less skid resistant, which reduced braking and steering capability and worsened impact angles, leading to more severe injuries (Kim et al., 2007), indicating that the risk compensation effect is not always consistent. In this study, with the assistance of cluster analysis, it is found that the weather effects in Clusters 2 and 3 vary significantly, which verifies the risk compensation effect. These diverse effects also justify the necessity of using cluster analysis to separate the overall dataset into different subsets. In contrast, the favorable impact of snow and rain on driver injury severity in datasets with more common weather and road environments, including the overall sample and Cluster 1 in this study, is similar to the results of some studies based on nationwide or statewide datasets without dataset classification (Chen et al., 2016; Wu et al., 2014). Countermeasures that provide hazard warnings ahead of areas suffering adverse weather, such as lighted variable message signs (VMSs), radiant advisory and regulatory variable speed limits, and/or real-time information display devices, can benefit driving safety in adverse weather conditions. In addition, the use of highway advisory radio prior to and within these areas to disseminate weather hazard information should also have favorable effects.

4.3.3. Light condition
Darkness is estimated to be significantly related to driver injury severity in all four datasets. Results in Table 9 show that dark conditions increase the probabilities of Level III injuries by approximately 35% to 75%. This is expected, since poor visibility in dark conditions may make drivers less agile and unable to react in ways that reduce the consequences of a crash as it starts to develop. According to a report of the National Highway Traffic Safety Administration (NHTSA), the driver fatality rate when driving in the dark is three to four times that in better light conditions (Mokdad et al., 2004). Previous studies also reported similar findings and conclusions (Chen et al., 2016, 2015b; Pour-Rouholamin and Zhou, 2016). Considering the high crash risk under dark conditions, installing or enhancing appropriate lighting is an effective countermeasure to ensure sufficient driver sight distance and reaction time, especially in dense urban areas.

4.3.4. Area
This variable represents whether the crash occurred in an urban or rural area. Urban areas appear considerably safer than rural areas, as indicated by the low pseudo-elasticities in all the models for both Level II and Level III injuries. The reason may be that signage and road markings are more common and appropriate in urban areas than in rural areas. Besides, vehicles in urban areas generally travel at lower speeds than those in rural areas due to higher traffic density, denser traffic facility distribution, and more complicated traffic conditions at urban intersections. Previous studies also found that driver fatality rates in most rural counties are almost double those in urban counties (NHTSA, 2016; Schwab, 2009). Some low-cost improvements for rural intersections could help reduce the possibility of severe crashes, including flashing solar-powered LED beacons on advance intersection warning signs and stop signs, dynamic warning signs advising through traffic that a stopped vehicle is at the intersection and may enter it, and extension of the through edge line using a short skip pattern to assist drivers in stopping at the optimum point (Chatterjee and Mcdonald, 2004; Hunter et al., 2012).

4.3.5. Road grade
Road grade at the crash location is also a critical predictor of driver injury severity, as revealed in Tables 4, 5, and 6. The results indicate that graded roadways increase the probabilities of both Level II and Level III injuries in the overall sample, Cluster 1, and Cluster 2. The results are understandable, since drivers must apply the gas pedal and brakes more frequently to keep the vehicle stable on graded roadways, which in turn increases the risk of brake failure and loss of vehicle control. Enhanced delineation treatments can alert drivers in advance of graded roads and can vary by the severity of the grade and the operating speed. In addition, high friction surface treatment can be implemented to help drivers maintain speed when driving on steep roads.
However, another adverse road grade variable, hillcrest, shows a rather complicated effect on driver injury severity in different datasets. Table 9 indicates that this variable could decrease the probability of severe injury (Level III) by 40.65% and 53.08% in the overall sample and Cluster 1, respectively; however, it leads to a 42.49% increase in Level III crashes in Cluster 2. This opposing effect perhaps reflects more careful driving on hillcrest roadways under clear weather conditions, but an increased risk of losing safe vehicle control under the combined condition of hillcrest and adverse weather.

4.3.6. Traffic controls
As expected, the traffic control method plays a significant role in predicting driver injury severity. As shown in Tables 4, 5, 7, and 9, both signal control and stop sign control could decrease the potential of Level II and/or Level III driver injuries in Cluster 1 and Cluster 3. A probable explanation is that both control schemes lead to controlled and relatively low driving speeds compared with driving in areas without traffic control. Besides, roadways with specific control methods, especially traffic signal control, are more likely to be in urban areas, which have been shown above to have a favorable impact on driver injury severity.

4.3.7. Vehicle action
When compared to straight driving, backing and right turn actions are found to be significantly associated with driver injury severity in all four models. Backing leads to apparent reductions in the probabilities of severe injuries, especially Level III injuries. This is reasonable, since backing vehicles maintain low speeds and are likely to contact other objects only at the rear. Thus, both the low speed and the sufficient distance between the driver and the collision point contribute to the low probability of the driver being seriously injured.
The average pseudo-elasticities of the right turn action for Level III in the four models are all less than 0, indicating that the right turn action has a favorable influence on driver injury severity. The reason could be found in traffic engineering theory, where right turn traffic has fewer conflict points than other traffic movements (Cova and Johnson, 2003). Left turn action is found to be significantly and positively associated with driver injuries, especially Level III injuries, according to its means and elasticities in the overall sample, Cluster 1, and Cluster 2. Table 9 also shows that the cross-level effects between left turn and other crash-level variables, including multi-vehicle and night, tend to aggravate driver injury severity. The result is in line with some previous studies; for instance, left turn was found to have the most significant influence leading to a severe crash at signalized intersections (Abdel-Aty and Keller, 2005). The reason for this finding may be that left-turning traffic is more likely to collide with oncoming or other conflicting traffic at high speeds, resulting in head-on or angle collisions and a significant impact on drivers' bodies.

4.3.8. Vehicle type
Buses appear much safer than passenger cars in terms of driver protection, according to the estimated means. As shown in Table 9, buses significantly decrease the probabilities of Level II injury by 65.84% and 75.63% in the overall sample and Cluster 1 models. This result is understandable, since buses have a significant advantage in vehicle size and resistance to structural deformation, and therefore protect drivers from devastating collision impacts. Besides, buses are generally operated on fixed routes with pre-determined schedules to ensure timeliness, so bus drivers are very familiar with the traffic environment, leading to less potential for severe injuries.

4.3.9. Seatbelt use
As shown in Tables 4, 5, 6, and 7, driver seatbelt use is an effective way to protect drivers from severe injuries and fatalities. The results consistently show that drivers who do not use seatbelts have higher probabilities of suffering severe injury than those wearing seatbelts at the time of the crash. Not only is the primary effect of seatbelt use verified in all four models for the overall sample and Clusters 1-3, but a number of cross-level interaction effects between seatbelt use and other crash-level factors are also significant in reducing driver injury severity. The protective effect of seatbelts has been verified and evaluated in many studies (Weiss and Kaplan et al., 2014; Yasmin and Eluru et al., 2014; Chen and Zhang et al., 2016). Therefore, seatbelt use should be enforced at regional and national levels.

4.3.10. Driver age
Results show that young drivers (<24 years old) behave differently under various road geometry and weather conditions. As shown in Tables 6 and 9, young drivers are more likely to be involved in serious crashes (Level II and Level III) when road and weather conditions are unsatisfactory (Cluster 2). In contrast, they are found to be safer than middle-aged drivers when the road and weather environment is normal or favorable, as suggested by the low means and elasticities in Tables 4, 7, and 9. The reason for this dual effect may be that the driving experience and skills needed to deal with complex traffic environments, which young drivers generally lack, are not essential in a normal or better environment but play a significant role in ensuring safe driving in adverse conditions. Besides, the more energetic physical condition and better responsiveness of young drivers allow them to perform as well as or even better than middle-aged drivers in a normal environment. The cross-level effects between young and night and between young and rain also show that young drivers are more likely to be seriously injured when facing complex operating conditions. Graduated driver licensing, learner’s permit length requirements, and pre-licensure and post-licensure driver education can help young drivers acquire driving experience in less risky situations, and therefore decrease the possibility of young drivers suffering severe injuries (Goodwin et al., 2011).
Unlike young drivers, old drivers are found to be more likely to suffer Level III injuries in the proposed models, as indicated by the average pseudo-elasticities in Table 9 (49.62% for the overall sample, 64.75% for Cluster 2, and 85.11% for Cluster 3). Previous studies showed that the primary reasons lie in their chronic medical conditions and functional impairments. Acute manifestations of chronic conditions (e.g., hypoglycemic attacks) and specific medical diagnoses are often associated with impairment of the skills necessary for successful motor vehicle operation and can then lead to severe crashes (Janke, 1994). Besides, functional impairments in, e.g., vision, cognition, and mobility also contribute to their inferior driving performance (Chen et al., 2016; Hu et al., 1993). Many countermeasures and strategies can help old drivers drive more safely. For instance, formal courses, or communications and outreach provided directly to older drivers or to their relatives, can educate and train older drivers to assess their driving capabilities and limitations and to improve their driving skills when possible (DOT, 2003). Some treatments (e.g., eyeglasses, vision-related surgery, etc.) and vehicle adaptations (such as extra mirrors or extended gear shift levers) may also help old drivers adapt to medical or functional conditions that may affect driving (Goodwin et al., 2011). In addition, it is sometimes necessary to restrict or revoke the licenses of old drivers who cannot drive safely in certain situations or at all.

4.3.11. Drug/Alcohol-Impaired
This factor describes the driver's state of consciousness and, as expected, is significantly associated with driver injury severity in all four models. As shown in Table 9, this variable has positive average pseudo-elasticities of considerable magnitude for Level II and Level III injuries in all four models. Beyond the primary effect of drug/alcohol impairment, when it is combined with other dangerous factors, such as night and dark, the pseudo-elasticities increase sharply in the proposed models, especially for Level III injuries (e.g., 282.67% for the interaction with night in the overall sample, and 213.02% for the interaction with dark lighting conditions in Cluster 2). This provides convincing evidence for educating drivers to avoid drugs and alcohol shortly before driving. Moreover, it is necessary for law enforcement to perform driving while impaired (DWI) tests on roadways on a regular basis, especially at night and in dark conditions.
To address impaired driving behavior that leads to high fatality and injury risk, a broad series of strategies can be developed, including: 1) administrative license revocation (ALR), 2) publicized sobriety checkpoints, 3) high visibility saturation patrols, 4) preliminary breath test devices (PBTs), 5) passive alcohol sensors, 6) DWI courts, and so on. Similar implications can also be drawn from previous studies. A study of the long-term effects of license suspension policies across the US found that ALR reduced alcohol-related fatal crash involvement by five percent and saved approximately 800 lives each year (Wagenaar et al., 2007). Some researchers studied the impacts of checkpoints in seven states, and the results showed that alcohol-related fatalities were reduced by 11% to 20% in states that used numerous sobriety checkpoints or other highly visible impaired driving enforcement operations along with intensive publicity of enforcement activities, such as paid advertising (Fell et al., 2008).

4.3.12. Driver gender
Male drivers are less likely to be seriously injured than female drivers regardless of the road and weather environment, as verified by the pseudo-elasticities for all four models in Table 9. Only the primary effect of driver gender on driver injury severity is found to be significant; no significant interaction effects between driver gender and crash-level variables are identified in any of the four models, as shown in Tables 4-7. Several studies have reached similar conclusions (Chen and Chen, 2011; Sivak et al., 2010). A previous study focusing on crashes in different weather and road environments also shows that male drivers are much safer than female drivers (Shaheed et al., 2016).

TABLE 4 Posterior Summaries of Parameter Estimates in the Overall Sample
Variable Injury (Level II) Fatal (Level III)
Mean 95% HDI Mean 95% HDI
Intercept 5.72 (5.55, 5.89) 7.26 (6.71, 7.88)
Time period (Morning=Ref)
Night 1.33 (1.13, 1.57) 2.38 (2.15, 2.63)
Weather (Clear=Ref)
Snow -1.87 (-2.26, -1.36) - -
Rain -1.51 (-1.67, -1.33) -1.69 (-2.27, -0.81)
Light condition (Daylight=Ref)
Dark 1.60 (1.15, 2.23) 2.61 (1.71, 3.95)
Area (Rural=Ref)
Urban -3.31 (-3.37, -3.25) -5.96 (-6.75, -5.01)
Road grade (Level=Ref)
On grade 4.92 (4.13, 5.89) 3.89 (3.69, 4.10)
Hillcrest -2.03 (-2.11, -1.94) - -
Traffic controls (No Control=Ref)
Stop-yield sign -7.06 (-7.33, -6.76) -1.18 (-1.56, -0.61)
Signal control - - -6.38 (-6.48, -6.27)
Vehicle action (Straight=Ref)
Right turn -9.01 (-9.07, -8.96) -3.44 (-3.95, -2.81)
Left turn - - 1.94 (1.58, 2.39)
Backing -0.36 (-0.44, -0.26) - -
Slow -5.48 (-5.71, -5.24) -2.22 (-3.22, -0.56)
Vehicle type (Passenger car=Ref)
Bus -3.05 (-3.99, -1.71) - -
Seatbelt used (Not used=Ref)
Used -2.39 (-2.42, -2.36) -1.65 (-1.89, -1.35)
Driver age (Middle=Ref)
Young -3.51 (-4.52, -2.07) -3.44 (-3.92, -2.63)
Old - - -2.10 (-3.11, -0.39)
Drug/Alcohol-Impaired (Not impaired=Ref)
Impaired 3.43 (3.25, 3.62) 4.31 (3.42, 5.46)
Driver gender (Female=Ref)
Male -2.70 (-3.11, -2.20) -1.73 (-1.79, -1.44)
Interactive effects
Left turn Multi-vehicle 3.03 (2.76, 3.33) 4.17 (3.17, 5.52)
Left turn Night 7.85 (7.83, 7.86) 2.88 (2.73, 3.05)
Signal control Night - - -3.88 (-4.42, -3.24)
Seatbelt On grade -3.14 (-3.17, -3.11) -2.63 (-3.50, -1.35)
Seatbelt Urban -3.79 (-4.30, -3.17) -7.24 (-7.74, -6.69)
Young Night 4.80 (4.74, 4.85) 3.43 (2.79, 4.24)
Young Rain 2.00 (1.68, 2.39) - -
Drug Night 1.95 (1.44, 2.64) 2.37 (1.39, 3.93)
Drug Dark 1.63 (1.42, 1.89) - -
Between-crash variance 4.03 (0.51)
Intra-crash correlation 0.55

TABLE 5 Posterior Summaries of Parameter Estimates in Cluster 1
Variable Injury (Level II) Fatal (Level III)
Mean 95% HDI Mean 95% HDI
Intercept 2.92 (2.83, 3.03) 7.18 (6.26, 8.09)
Time period (Morning=Ref)
Night 4.51 (3.83, 5.35) 3.75 (2.73, 5.16)
Weather (Clear=Ref)
Snow -1.65 (-2.02, -1.27) - -
Rain -8.38 (-9.21, -7.44) -3.79 (-4.30, -3.17)
Light condition (Daylight=Ref)
Dark 3.50 (2.69, 4.93) 2.07 (0.72, 1.59)
Area (Rural=Ref)
Urban -7.36 (-7.58, -7.12) -3.16 (-3.88, -2.21)
Road grade (Level=Ref)
On grade 1.39 (1.06, 1.83) 1.23 (0.96, 1.57)
Hillcrest -6.79 (-8.12, -5.46) - -
Traffic controls (No Control=Ref)
Stop-yield sign -1.91 (-3.61, -0.22) - -
Signal control - - -3.63 (-5.20, -2.06)
Vehicle action (Straight=Ref)
Right turn -7.99 (-9.26, -6.71) -4.81 (-6.29, -3.33)
Left turn - - 3.80 (3.22, 4.50)
Backing -8.70 (-9.87, -7.52) - -
Vehicle type (Passenger car=Ref)
Bus -8.40 (-8.98, -7.76) - -
Seatbelt used (Not used=Ref)
Used -1.75 (-1.82, -1.68) -1.93 (-2.44, -1.24)
Drug/Alcohol-Impaired (Not impaired=Ref)
Impaired 2.57 (1.82, 3.62) 4.78 (4.55, 5.03)
Driver gender (Female=Ref)
Male -3.10 (-3.20, -2.98) -2.26 (-2.45, -2.04)
Interactive effects
Left turn Night 2.78 (2.55, 3.03) 3.32 (2.62, 4.23)
Signal control Night - - -3.86 (-4.68, -2.78)
Seatbelt On grade -4.45 (-5.19, -3.54) - -
Seatbelt Urban -3.47 (-3.76, -3.16) -2.92 (-3.35, -2.42)
Young Night 3.33 (3.07, 3.49) - -
Drug Night 5.41 (4.64, 6.82) - -
Between-crash variance (mean) 3.89 (0.44)
Intra-crash correlation 0.54

TABLE 6 Posterior Summaries of Parameter Estimates in Cluster 2
Variable Injury (Level II) Fatal (Level III)
Mean 95% HDI Mean 95% HDI
Intercept 4.58 (2.25, 7.41) 7.94 (5.08, 10.80)
Time period (Morning=Ref)
Night 3.32 (2.62, 4.23) 2.71 (1.83, 3.99)
Weather (Clear=Ref)
Snow 1.40 (1.15, 1.72) 3.02 (2.84, 3.23)
Rain 1.06 (1.04, 1.08) 1.32 (1.16, 1.51)
Fog 3.13 (1.07, 9.14) 1.45 (1.03, 2.12)
Light condition (Daylight=Ref)
Dark 2.13 (2.10, 2.16) 1.14 (1.08, 1.19)
Area (Rural=Ref)
Urban -3.16 (-3.88, -2.21) -2.43 (-2.62, -2.22)
Road grade (Level=Ref)
On grade 1.24 (1.17, 1.32) 1.33 (1.22, 1.41)
Hillcrest 1.89 (1.52, 2.39) - -
Vehicle action (Straight=Ref)
Left turn 1.06 (0.84, 1.28) 1.32 (1.16, 1.51)
Right turn -6.53 (-6.59, -6.47) -2.98 (-3.62, -2.03)
Backing -0.82 (-1.15, -0.30) - -
Seatbelt used (Not used=Ref)
Used -2.31 (-3.69, -1.44) -2.07 (-3.46, -1.22)
Driver age (Middle=Ref)
Young 1.07 (1.05, 1.10) 1.14 (1.10, 1.17)
Old 1.65 (1.35, 2.06) 1.29 (1.19, 1.40)
Drug/Alcohol-Impaired (Not impaired=Ref)
Impaired 2.02 (1.38, 2.96) 2.31 (1.44, 3.69)
Driver gender (Female=Ref)
Male -3.89 (-5.57, -4.13) -3.38 (-3.80, -2.90)
Interactive effects
Left turn Night 1.83 (1.12, 3.00) 2.11 (1.26, 3.53)
Seatbelt Urban -1.80 (-3.13, -0.47) -2.37 (-3.95, -0.80)
Young Night 1.39 (1.03, 1.89) 1.45 (1.05, 1.99)
Young Rain 1.23 (1.12, 1.32) - -
Young Left turn
23
Old Night 2.05 (1.22, 2.69) - -
Drug Night 6.22 (4.18, 8.27) - -
Drug Dark 3.03 (2.62, 3.44) - -
Between-crash variance (mean) 3.22 (0.57)
Intra-crash correlation 0.49

TABLE 7 Posterior Summaries of the Parameter Estimates in Cluster 3

| Variable | Injury (Level II): Mean | Injury (Level II): 95% HDI | Fatal (Level III): Mean | Fatal (Level III): 95% HDI |
|---|---|---|---|---|
| Intercept | 3.14 | (2.96, 3.44) | 6.22 | (4.18, 8.27) |
| Light condition (Daylight=Ref) | | | | |
| Dark | 2.14 | (1.96, 2.31) | - | - |
| Area (Rural=Ref) | | | | |
| Urban | -4.02 | (-4.90, -2.85) | -3.44 | (-4.22, -2.41) |
| Traffic controls (No Control=Ref) | | | | |
| Stop-yield sign | -2.65 | (-3.40, -1.60) | -2.46 | (-2.89, -1.91) |
| Signal control | -2.19 | (-3.09, -0.76) | - | - |
| Vehicle action (Straight=Ref) | | | | |
| Right turn | -1.67 | (-3.25, -0.10) | - | - |
| Backing | -4.91 | (-6.39, -3.44) | - | - |
| Seatbelt used (Not used=Ref) | | | | |
| Used | -2.05 | (-2.69, -1.22) | -2.48 | (-3.54, -1.34) |
| Driver age (Middle=Ref) | | | | |
| Young | -3.48 | (-4.26, -2.45) | -1.75 | (-1.92, -1.55) |
| Old | 1.30 | (1.02, 1.65) | 1.26 | (1.20, 1.32) |
| Drug/Alcohol-Impaired (Not impaired=Ref) | | | | |
| Impaired | 1.22 | (1.10, 1.37) | 1.58 | (1.41, 1.73) |
| Driver gender (Female=Ref) | | | | |
| Male | -1.61 | (-1.89, -1.35) | -4.12 | (-5.08, -2.85) |
| Interactive effects | | | | |
| Signal control × Night | - | - | -1.52 | (-1.89, -1.03) |
| Seatbelt × On grade | -1.97 | (-2.64, -0.97) | 0.83 | (0.76, 0.91) |
| Seatbelt × Urban | 0.67 | (0.57, 0.71) | 0.73 | (0.55, 0.89) |
| Drug × Dark | 3.03 | (2.58, 3.47) | - | - |
| Between-crash variance (mean) | 1.83 (0.43) | | | |
| Intra-crash correlation | 0.36 | | | |
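
The intra-crash correlations reported with the cluster tables are consistent with the standard latent-variable ICC for a logistic random intercept model, in which the lower-level residual variance is fixed at π²/3. The sketch below assumes that formula (the function name `intra_crash_correlation` is illustrative and does not come from the paper) and recomputes the ICC from the three reported between-crash variances.

```python
import math

def intra_crash_correlation(between_crash_variance: float) -> float:
    """ICC on the latent logistic scale: sigma_u^2 / (sigma_u^2 + pi^2 / 3)."""
    logistic_residual_variance = math.pi ** 2 / 3
    return between_crash_variance / (between_crash_variance + logistic_residual_variance)

# (between-crash variance, reported intra-crash correlation) pairs from the tables.
reported_pairs = [(3.89, 0.54), (3.22, 0.49), (1.83, 0.36)]
for variance, icc_reported in reported_pairs:
    icc = intra_crash_correlation(variance)
    print(f"variance={variance:.2f}: computed ICC={icc:.2f}, reported ICC={icc_reported:.2f}")
```

Under this assumption the computed values round to the reported 0.54, 0.49 and 0.36, which suggests that a substantial share of the latent variation in severity lies between crashes rather than between drivers in the same crash.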

TABLE 8 Results of Model Comparison

| Model | DIC |
|---|---|
| Ordinary multinomial logistic model | 24390.43 |
| Hierarchical multinomial logistic model | 22353.18 |
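
As a reading aid for Table 8, the sketch below assumes the usual definition of the deviance information criterion from Spiegelhalter et al. (2002): the posterior mean deviance plus the effective number of parameters, where the latter is the mean deviance minus the deviance at the posterior mean. The function `dic` and its arguments are illustrative names; only the two DIC values are taken from the table.

```python
from statistics import fmean

def dic(deviance_samples, deviance_at_posterior_mean):
    """DIC = mean posterior deviance + effective number of parameters (pD)."""
    mean_deviance = fmean(deviance_samples)
    p_d = mean_deviance - deviance_at_posterior_mean
    return mean_deviance + p_d

# DIC values reported in Table 8 (lower indicates a better fit/complexity trade-off).
dic_ordinary = 24390.43
dic_hierarchical = 22353.18
dic_reduction = dic_ordinary - dic_hierarchical
print(f"DIC reduction of the hierarchical model: {dic_reduction:.2f}")
```

A commonly cited rule of thumb treats DIC differences above roughly 10 as substantial, so a reduction of about 2037 points favors the hierarchical specification by a wide margin.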

TABLE 9 Pseudo-elasticity Analysis Results for Proposed Models

| Variable | Overall sample I(%)/II(%)/III(%) | Cluster 1 I(%)/II(%)/III(%) | Cluster 2 I(%)/II(%)/III(%) | Cluster 3 I(%)/II(%)/III(%) |
|---|---|---|---|---|
| Night | -2.72/12.19/7.97 | -3.38/19.04/29.72 | -3.53/15.31/42.17 | -/-/- |
| Rain | 0.52/0.90/-47.00 | 1.11/-3.90/-43.29 | -2.74/5.85/20.61 | -/-/- |
| Snow | 9.78/-39.43/-90.38 | 6.50/-33.53/-100.00 | -12.20/57.06/773.21 | -/-/- |
| Fog | -/-/- | -/-/- | -17.79/36.68/668.82 | -/-/- |
| Dark | -13.19/25.30/34.75 | -1.00/25.80/36.91 | -22.61/11.59/78.39 | -14.88/14.88/74.30 |
| Urban | 0.39/-1.07/-10.71 | 0.21/-0.72/-7.54 | 1.22/-4.07/-28.28 | 0.33/-0.82/-5.00 |
| Multi-Vehicle | -8.76/38.16/40.70 | -/-/- | -/-/- | -/-/- |
| Hillcrest | 3.47/-13.40/-40.65 | 2.70/-13.11/-53.08 | -6.43/30.99/42.49 | -/-/- |
| On grade | -1.31/4.54/22.23 | -1.04/7.05/7.50 | -1.47/5.58/26.36 | -/-/- |
| Stop-yield sign | 4.57/-20.68/-10.15 | 3.85/-24.98/-14.45 | -/-/- | 7.58/-23.07/-45.75 |
| Signal control | -0.54/3.16/-8.89 | 0.08/-0.01/-6.37 | -/-/- | -0.12/0.81/-6.06 |
| Backing | 20.82/-90.51/-100.00 | 15.73/-91.14/-100.00 | -/-/- | 25.70/-81.76/-100.00 |
| Slow | 5.44/-20.28/-73.64 | -/-/- | -/-/- | -/-/- |
| Left turn | 0.16/-2.56/25.49 | 0.21/-3.90/37.20 | 0.55/-4.54/17.46 | -/-/- |
| Right turn | 12.13/-52.43/-62.37 | 9.87/-59.55/-28.76 | 9.21/-43.67/-68.31 | 16.62/-51.75/-81.90 |
| Bus | 15.56/-65.84/-100.00 | 13.24/-75.63/-100.00 | -/-/- | -/-/- |
| Belt used | 0.21/-0.57/-6.13 | 0.26/-0.95/-9.32 | 0.34/-1.11/-8.37 | 0.11/-0.20/-2.88 |
| Young | 1.59/-5.73/-24.12 | -/-/- | -0.72/2.28/18.25 | 2.64/-6.53/-38.82 |
| Old | -0.41/-1.58/49.62 | -/-/- | -0.02/-5.63/64.75 | -0.83/-2.70/85.11 |
| Drug | -9.52/37.91/94.93 | -9.04/46.23/146.21 | -17.61/85.49/108.68 | -8.01/24.44/47.24 |
| Male | 5.18/-23.15/-16.17 | 1.38/-8.02/-7.67 | 4.15/-22.26/-2.15 | 1.97/-7.94/-18.35 |
| Left turn × Multi-vehicle | -18.55/73.87/184.92 | -/-/- | -/-/- | -/-/- |
| Left turn × Night | 0.18/-2.23/19.43 | -1.29/4.94/45.33 | -1.03/5.62/73.52 | -/-/- |
| Signal control × Night | -3.11/13.28/18.42 | -3.05/15.08/56.84 | -/-/- | -2.74/9.61/2.91 |
| Seatbelt × On grade | -0.91/3.12/15.94 | -0.76/6.08/-18.57 | -/-/- | -/-/- |
| Seatbelt × Urban | 0.58/-1.59/-15.71 | 0.43/-1.55/-15.69 | 1.55/-5.36/-33.78 | 0.42/-0.93/-7.67 |
| Young × Night | 0.66/-6.96/54.87 | -/-/- | -5.07/26.24/19.83 | -/-/- |
| Young × Rain | -3.23/9.36/81.96 | -/-/- | -6.75/35.35/32.66 | -/-/- |
| Drug × Night | -8.73/20.93/282.67 | -9.96/34.95/390.04 | -15.09/51.49/338.23 | -/-/- |
| Drug × Dark | -7.24/21.42/176.81 | -/-/- | -15.89/66.94/213.02 | -6.72/15.18/121.44 |
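
To make the entries of Table 9 easier to interpret, the sketch below assumes the common definition of a pseudo-elasticity for an indicator variable: the percentage change in each severity level's probability when the indicator switches from 0 to 1. The paper's exact formula is not restated in this section, so the functions, utilities and coefficients here are hypothetical and serve only to show the mechanics for a three-level multinomial logit.

```python
import numpy as np

def outcome_probabilities(utilities):
    """Multinomial logit probabilities from a vector of level utilities."""
    u = np.asarray(utilities, dtype=float)
    exp_u = np.exp(u - u.max())  # subtract the max for numerical stability
    return exp_u / exp_u.sum()

def pseudo_elasticity(base_utilities, indicator_coefficients):
    """Percent change in each level's probability when an indicator goes 0 -> 1."""
    base = np.asarray(base_utilities, dtype=float)
    coef = np.asarray(indicator_coefficients, dtype=float)
    p_off = outcome_probabilities(base)
    p_on = outcome_probabilities(base + coef)
    return 100.0 * (p_on - p_off) / p_off

# Hypothetical utilities for Levels I, II, III with the indicator off (Level I is the base).
base = [0.0, -1.5, -3.0]
# Hypothetical indicator coefficients for Levels I, II, III.
coef = [0.0, 0.4, 0.9]
elasticities = pseudo_elasticity(base, coef)
print(np.round(elasticities, 2))
```

In applied work the quantity is typically computed per observation and then averaged over the sample. A negative Level I value paired with positive Level II and III values, as for "Drug" in Table 9, indicates a shift of probability toward more severe outcomes.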

5. Conclusion
This paper applies K-means cluster analysis and hierarchical Bayesian random intercept models to examine driver injury severity in intersection-related crashes, using a two-year dataset of all intersection-related crashes in New Mexico from 2010 to 2011. It contributes to the literature on several methodological issues in crash injury severity modeling. First, drivers adjust their behavior to complex external environmental conditions, which introduces a risk compensation issue into crash data models, and the compensation effect is unstable because environmental conditions vary. Second, most previous injury severity studies focused on the main, separate effects of crash and vehicle/driver variables but did not consider potential cross-level interactions between crash-level and vehicle/driver-level variables, which have proven significant in driver injury severity prediction (Chen et al., 2015). Third, a sufficient understanding of the data structure is of practical importance for capturing unobserved heterogeneity in crash data, in terms of between-crash variance and within-crash correlation. Therefore, this study applies K-means cluster analysis to divide the dataset into sub-datasets based on weather and roadway environmental conditions, and separate models are used to capture the distinctive environmental effects in each sub-dataset. The hierarchical nature and potential unobserved heterogeneity of the dataset were carefully examined in model development, where between-crash variance, within-crash correlation, and the cross-level interaction effects between crash-level and vehicle/driver-level variables are considered to improve the accuracy of the model results.
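
The partitioning step described above can be sketched as follows. This is a minimal illustration, assuming three hypothetical 0/1 crash-level indicators (adverse weather, dark, on grade) and plain Lloyd's algorithm with a deterministic farthest-point initialization; the feature coding and software actually used are not given in this section, so every name and data value below is an assumption.

```python
import numpy as np

def farthest_point_init(X, k):
    """Pick k starting centroids: the first record, then repeatedly the farthest record."""
    centroids = [X[0]]
    for _ in range(k - 1):
        dist_to_nearest = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[dist_to_nearest.argmax()])
    return np.array(centroids)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    centroids = farthest_point_init(X, k)
    for _ in range(n_iter):
        # Assign each record to its nearest centroid (squared Euclidean distance).
        distances = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = distances.argmin(axis=1)
        # Recompute centroids; keep the previous one if a cluster becomes empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical crash records: columns are [adverse_weather, dark, on_grade].
X = np.array([[0, 0, 0]] * 3 + [[1, 1, 0]] * 3 + [[0, 0, 1]] * 3, dtype=float)
labels, centroids = kmeans(X, k=3)
subsets = {c: X[labels == c] for c in range(3)}  # one sub-dataset per cluster
print(labels)
```

Each sub-dataset in `subsets` would then be fitted with its own severity model, which is what allows the estimated effect of a variable to differ across environmental conditions.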
Three clusters were defined by K-means cluster analysis based on the distribution of weather and roadway environmental conditions. Hierarchical Bayesian random intercept models were applied to the overall data and to each of the three clusters. The ICC and DIC values indicate that the proposed model structure is appropriate and superior to the ordinary MNL model. The main effects and cross-level effects of crash-level and vehicle/driver-level variables were examined for each dataset. Results show that a number of crash-level factors (time period, weather, light condition, area, and road grade), along with some vehicle/driver-level factors (traffic controls, vehicle action, vehicle type, seatbelt use, driver age, drug/alcohol impairment, and driver gender), significantly influence driver injury severity. Several road- and weather-related factors have opposite effects in different clusters, demonstrating the importance of the K-means cluster analysis. Furthermore, a number of cross-level interactions are significantly associated with driver injury severity, indicating the need to include these interaction effects in the model structure. These findings also help in understanding the individual and combined impacts of these attributes under different conditions.
Although this study has produced useful findings, some limitations may lead to inaccurate estimates and affect the applicability of the models. First, records for several variable values (snow, dust, backing, etc.) are few because of data insufficiency, and cluster analysis and multilevel models may produce biased estimates in this circumstance. A better solution is to obtain a sufficiently large dataset of satisfactory quality over larger spatial and temporal domains. Second, although the underlying relationships among the crash-level road- and weather-related variables and the cross-level effects are estimated by K-means cluster analysis and Bayesian models, respectively, the interrelationships of variables within each variable level are not examined in this study.

Acknowledgment
The authors gratefully acknowledge assistance with crash data from the NMDOT. The interpretations are those of the authors and do not necessarily reflect the views of any organizations with which the authors have affiliations.

References
Abdel-Aty, M., Keller, J., 2005. Exploring the overall and specific crash severity levels at signalized intersections. Accident Analysis & Prevention 37, 417–425.
Abdel-Aty, M.A., Hassan, H.M., Ahmed, M., Al-Ghamdi, A.S., 2012. Real-time prediction of visibility related crashes. Transportation Research Part C: Emerging Technologies 24, 288–298. doi:10.1016/j.trc.2012.04.001
Anastasopoulos, P.C., 2016. Random parameters multivariate tobit and zero-inflated count data models: Addressing unobserved and zero-state heterogeneity in accident injury-severity rate and frequency analysis. Analytic methods in accident research 11, 17–32.
Barua, U., Azad, A., Tay, R., 2010. Fatality risk of intersection crashes on rural undivided highways in Alberta, Canada. Transportation Research Record: Journal of the Transportation Research Board, 107–115.
Caceres, A., Hall, D.L., Zelaya, F.O., Williams, S.C., Mehta, M.A., 2009. Measuring fMRI reliability with the intra-class correlation coefficient. Neuroimage 45, 758–768.
Caliński, T., Harabasz, J., 1974. A dendrite method for cluster analysis. Communications in Statistics-theory and Methods 3, 1–27.
Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M.A., Guo, J., Li, P., Riddell, A., 2016. Stan: A probabilistic programming language. J Stat Softw.
Charrad, M., Ghazzali, N., Boiteau, V., Niknafs, A., Charrad, M.M., 2014. Package ‘NbClust’. J. Stat. Soft 61, 1–36.
Chen, C., Chen, Y., Ma, J., Zhang, G., Walton, C.M., 2017. Driver behavior formulation in intersection dilemma zones with phone use distraction via a logit-Bayesian network hybrid approach. Journal of Intelligent Transportation Systems 1–14.
Chen, C., Zhang, G., Huang, H., Wang, J., Tarefder, R.A., 2016. Examining driver injury severity outcomes in rural non-interstate roadway crashes using a hierarchical ordered logit model. Accident Analysis & Prevention 96, 79–87.
Chen, C., Zhang, G., Tarefder, R., Ma, J., Wei, H., Guan, H., 2015a. A multinomial logit model-Bayesian network hybrid approach for driver injury severity analyses in rear-end crashes. Accident Analysis & Prevention 80, 76–88.
Chen, C., Zhang, G., Tian, Z., Bogus, S.M., Yang, Y., 2015b. Hierarchical Bayesian random intercept model-based cross-level interaction decomposition for truck driver injury severity investigations. Accident Analysis & Prevention 85, 186–198.
Chen, C., Zhang, G., Wang, H., Yang, J., Jin, P.J., Walton, C.M., 2015c. Bayesian network-based formulation and analysis for toll road utilization supported by traffic information provision. Transportation Research Part C: Emerging Technologies 60, 339–359.
Chen, C., Zhang, G., Yang, J., Milton, J.C., 2016. An explanatory analysis of driver injury severity in rear-end crashes using a decision table/Naïve Bayes (DTNB) hybrid classifier. Accident Analysis & Prevention 90, 95–107.
Chen, F., Chen, S., 2011. Injury severities of truck drivers in single- and multi-vehicle accidents on rural highways. Accident Analysis & Prevention 43, 1677–1688.
Chen, H., Cao, L., Logan, D.B., 2012. Analysis of risk factors affecting the severity of intersection crashes by logistic regression. Traffic injury prevention 13, 300–307.
de Oña, J., López, G., Mujalli, R., Calvo, F.J., 2013. Analysis of traffic accidents on rural highways using Latent Class Clustering and Bayesian Networks. Accident Analysis & Prevention 51, 1–10.
de Oña, J., Mujalli, R.O., Calvo, F.J., 2011. Analysis of traffic accident injury severity on Spanish rural highways using Bayesian networks. Accident Analysis & Prevention 43, 402–11. doi:10.1016/j.aap.2010.09.010
Depaire, B., Wets, G., Vanhoof, K., 2008. Traffic accident segmentation by means of latent class clustering. Accident Analysis & Prevention 40, 1257–1266.
Eluru, N., Bagheri, M., Miranda-Moreno, L.F., Fu, L., 2012. A latent class modeling approach for identifying vehicle driver injury severity factors at highway-railway crossings. Accident Analysis & Prevention 47, 119–127.
Feng, S., Li, Z., Ci, Y., Zhang, G., 2016. Risk factors affecting fatal bus accident severity: Their impact on different types of bus drivers. Accident Analysis & Prevention 86, 29–39.
FHWA, 2012. Highway Statistics 2010, in: Administration, F.H. (Ed.), Washington, DC.
Hao, W., Daniel, J., 2014. Motor vehicle driver injury severity study under various traffic control at highway-rail grade crossings in the United States. Journal of safety research 51, 41–48.
Haque, M.M., Chin, H.C., Huang, H., 2010. Applying Bayesian hierarchical models to examine motorcycle crashes at signalized intersections. Accident Analysis & Prevention 42, 203–212.
Heinen, T., 1996. Latent class and discrete latent trait models: Similarities and differences. Sage Publications, Inc.
Hong, S., Min, B., Doi, S., Suzuki, K., 2016. Approaching and stopping behaviors to the intersections of aged drivers compared with young drivers. International Journal of Industrial Ergonomics 54, 32–41.
Hu, P.S., Young, J.R., Lu, A., 1993. Highway crash rates and age-related driver limitations: Literature review and evaluation of data bases. Oak Ridge National Lab., TN (United States).
Huang, H., Abdel-Aty, M., 2010. Multilevel data and Bayesian analysis in traffic safety. Accident Analysis & Prevention 42, 1556–1565.
Huang, H., Chin, H.C., Haque, M.M., 2008. Severity of driver injury and vehicle damage in traffic crashes at intersections: a Bayesian hierarchical analysis. Accident Analysis & Prevention 40, 45–54.
Jain, A.K., 2010. Data clustering: 50 years beyond K-means. Pattern recognition letters 31, 651–666.
Janke, M.K., 1994. Age related disabilities that may impair driving and their assessment.
Kim, J.-K., Kim, S., Ulfarsson, G.F., Porrello, L.A., 2007. Bicyclist injury severities in bicycle–motor vehicle accidents. Accident Analysis & Prevention 39, 238–251.
Krzanowski, W.J., Lai, Y., 1988. A criterion for determining the number of groups in a data set using sum-of-squares clustering. Biometrics, 23–34.
Ma, J., Kockelman, K., 2006. Bayesian multivariate Poisson regression for models of injury count, by severity. Transportation Research Record: Journal of the Transportation Research Board, 24–34.
Mannering, F.L., Bhat, C.R., 2014. Analytic methods in accident research: methodological frontier and future directions. Analytic Methods in Accident Research 1, 1–22.
Mannering, F.L., Shankar, V., Bhat, C.R., 2016. Unobserved heterogeneity and the statistical analysis of highway accident data. Analytic methods in accident research 11, 1–16. doi:10.1016/j.amar.2016.04.001
Mauro, R., De Luca, M., Dell’Acqua, G., 2013. Using a K-means clustering algorithm to examine patterns of vehicle crashes in before-after analysis. Modern Applied Science 7, 11.
Meng, F., Xu, P., Wong, S.C., Huang, H., Li, Y.C., 2017. Occupant-level injury severity analyses for taxis in Hong Kong: A Bayesian space-time logistic model. Accident Analysis & Prevention 108, 297–307.
Mohamed, M.G., Saunier, N., Miranda-Moreno, L.F., Ukkusuri, S.V., 2013. A clustering regression approach: A comprehensive injury severity analysis of pedestrian–vehicle crashes in New York, US and Montreal, Canada. Safety science 54, 27–37.
Mokdad, A.H., Marks, J.S., Stroup, D.F., Gerberding, J.L., 2004. Actual causes of death in the United States, 2000. JAMA 291, 1238–1245.
NHTSA, 2016. 2015 motor vehicle crashes: overview. Traffic safety facts research note 2016, 1–9.
NHTSA, 2015. Traffic Safety Facts 2015, in: Administration, N.H.T.S. (Ed.), Washington, DC.
NMDOT, 2011a. Crash Level Analysis File User's Guide, in: Transportation, N.M.D.o. (Ed.), Santa Fe, New Mexico.
NMDOT, 2011b. Vehicle (Detail) Level Analysis File User's Guide, in: Transportation, N.M.D.o. (Ed.), Santa Fe, New Mexico.
Pang, B., Lee, L., Vaithyanathan, S., 2002. Thumbs up?: sentiment classification using machine learning techniques, Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10. Association for Computational Linguistics, pp. 79–86.
Plummer, M., Best, N., Cowles, K., Vines, K., 2006. CODA: Convergence diagnosis and output analysis for MCMC. R news 6, 7–11.
Pour-Rouholamin, M., Zhou, H., 2016. Investigating the risk factors associated with pedestrian injury severity in Illinois. Journal of safety research 57, 9–17.
Prato, C.G., Kaplan, S., 2013. Bus crash patterns in the United States: a clustering approach based on self-organizing maps, WCTR 2013: 13th World Conference on Transportation Research.
Russo, B.J., Savolainen, P.T., Schneider, W.H., Anastasopoulos, P.C., 2014. Comparison of factors affecting injury severity in angle collisions by fault status using a random parameters bivariate ordered probit model. Analytic methods in accident research 2, 21–29.
Sarle, W.S., 1983. Cubic clustering criterion. SAS Institute.
Sasidharan, L., Wu, K.-F., Menendez, M., 2015. Exploring the application of latent class cluster analysis for investigating pedestrian crash injury severities in Switzerland. Accident Analysis & Prevention 85, 219–228.
Schwab, C.V., 2009. Agricultural Equipment on Public Roads.
Shaheed, M.S., Gkritza, K., 2014. A latent class analysis of single-vehicle motorcycle crash severity outcomes. Analytic methods in accident research 2, 30–38.
Shaheed, M.S., Gkritza, K., Carriquiry, A.L., Hallmark, S.L., 2016. Analysis of occupant injury severity in winter weather crashes: a fully Bayesian multivariate approach. Analytic methods in accident research 11, 33–47.
Sivak, M., Schoettle, B., Rupp, J., 2010. Survival in fatal road crashes: body mass index, gender, and safety belt use. Traffic injury prevention 11, 66–68.
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., Van Der Linde, A., 2002. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64, 583–639.
Tay, R., 2015. A random parameters probit model of urban and rural intersection crashes. Accident Analysis & Prevention 84, 38–40.
Thiffault, P., Bergeron, J., 2003. Monotony of road environment and driver fatigue: a simulator study. Accident Analysis & Prevention 35, 381–391.
Wang, K., Yasmin, S., Konduri, K.C., Eluru, N., Ivan, J.N., 2015. Copula-Based Joint Model of Injury Severity and Vehicle Damage in Two-Vehicle Crashes. Transportation Research Record: Journal of the Transportation Research Board, 158–166.
Wu, Q., Chen, F., Zhang, G., Liu, X.C., Wang, H., Bogus, S.M., 2014. Mixed logit model-based driver injury severity investigations in single- and multi-vehicle crashes on rural two-lane highways. Accident Analysis & Prevention 72, 105–115.
Wu, Q., Zhang, G., 2016. Formulating alcohol-influenced driver's injury severities in intersection-related crashes. Transport, 1–12.
Wu, Q., Zhang, G., Chen, C., Tarefder, R., Wang, H., Wei, H., 2016a. Heterogeneous impacts of gender-interpreted contributing factors on driver injury severities in single-vehicle rollover crashes. Accident Analysis & Prevention 94, 28–34.
Wu, Q., Zhang, G., Ci, Y., Wu, L., Tarefder, R.A., Alcántara, A.D., 2016b. Exploratory multinomial logit model–based driver injury severity analyses for teenage and adult drivers in intersection-related crashes. Traffic injury prevention 17, 413–422.
Xie, M., Cheng, W., Gill, G.S., Zhou, J., Jia, X., Choi, S., 2017. Investigation of hit-and-run crash occurrence and severity using real-time loop detector data and hierarchical Bayesian binary logit model with random effects. Traffic injury prevention.
Xie, Y., Zhao, K., Huynh, N., 2012. Analysis of driver injury severity in rural single-vehicle crashes. Accident Analysis & Prevention 47, 36–44.
Ye, F., Lord, D., 2014. Comparing three commonly used crash severity models on sample size requirements: multinomial logit, ordered probit and mixed logit models. Analytic methods in accident research 1, 72–85.
Yu, R., Abdel-Aty, M., 2013. Multi-level Bayesian analyses for single- and multi-vehicle freeway crashes. Accident Analysis & Prevention 58, 97–105.
Yu, R., Abdel-Aty, M., 2014. Using hierarchical Bayesian binary probit models to analyze crash injury severity on high speed facilities with real-time traffic data. Accident Analysis & Prevention 62, 161–167.
Zeng, Q., Huang, H., 2014. Bayesian spatial joint modeling of traffic crashes on an urban road network. Accident Analysis & Prevention 67, 105–112.
Zheng, Y., Wang, J., Li, X., Yu, C., Kodaka, K., Li, K., 2014. Driving risk assessment using cluster analysis based on naturalistic driving data, 17th International IEEE Conference on Intelligent Transportation Systems (ITSC). IEEE, pp. 2584–2589.
Chatterjee, K., Mcdonald, M., 2004. Effectiveness of using variable message signs to disseminate dynamic traffic information: Evidence from field trails in European cities. Transport Reviews 24, 559–585.
DOT, U.S., 2003. Safe Mobility for a Maturing Society: Challenges and Opportunities. Washington, DC web: http://web1.ctaa.org/webmodules/webarticles/articlefiles/safe_mobility.pdf. Last Accessed 7, 2010.
Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B., 2013. Bayesian data analysis. CRC press.
Goodwin, A.H., Thomas, L.J., Hall, W.L., Tucker, M.E., 2011. Countermeasures that work: A highway safety countermeasure guide for state highway safety offices.
Hunter, W., Srinivasan, R., Martell, C., 2012. Evaluation of rectangular rapid flash beacon at Pinellas Trail Crossing in Saint Petersburg, Florida. Transportation Research Record: Journal of the Transportation Research Board 7–13.
National Academies of Sciences, Engineering, and Medicine, 2016. Commercial Motor Vehicle Driver Fatigue, Long-Term Health, and Highway Safety: Research Needs. National Academies Press.
Sandt, L.S., Marshall, S.W., Rodriguez, D.A., Evenson, K.R., Ennett, S.T., Robinson, W.R., 2016. Effect of a community-based pedestrian injury prevention program on driver yielding behavior at marked crosswalks. Accident Analysis & Prevention 93, 169–178.
Smadi, O., Souleyrette, R., Ormand, D., Hawkins, N., 2008. Pavement marking retroreflectivity: analysis of safety effectiveness. Transportation Research Record: Journal of the Transportation Research Board 17–24.
Wagenaar, A.C., Maldonado-Molina, M.M., Ma, L., Tobler, A.L., Komro, K.A., 2007. Effects of legal BAC limits on fatal crash involvement: analyses of 28 states from 1976 through 2002. Journal of safety research 38, 493–499.
... Moreover, drivers aged 18 to 25, drivers aged 26 to 40, and male drivers increased the rate of serious injuries to cyclists by 9.9%, 9%, and 3.4%, respectively. These results may be attributed to the majority of accidents in this accident cluster occurring on non-intersection general roadways, and drivers aged 18 to 40 years and male drivers drive more aggressively and faster on general roadways, leading to more serious accidents [37]. Unlike the C1 accident cluster, the C2 accident cluster had a 2.7% increase in the probability of serious injuries for male cyclists. ...
Article
Full-text available
Bicycle safety has emerged as a pressing concern within the vulnerable transportation community. Numerous studies have been conducted to identify the significant factors that contribute to the severity of cyclist injuries, yet the findings have been subject to uncertainty due to unobserved heterogeneity and class imbalance. This research aims to address these issues by developing a model to examine the impact of key factors on cyclist injury severity, accounting for data heterogeneity and imbalance. To incorporate unobserved heterogeneity, a total of 3,895 bicycle accidents were categorized into three homogeneous sub-accident clusters using Latent Class Cluster Analysis (LCA). Additionally, five over-sampling techniques were employed to mitigate the effects of data imbalance in each accident cluster category. Subsequently, Bayesian Network (BN) structure learning algorithms were utilized to construct 32 BN models after pairing the accident data from the four accident cluster types before and after sampling. The optimal BN models for each accident cluster type provided insights into the key factors associated with cyclist injury severity. The results indicate that the key factors influencing serious cyclist injuries vary heterogeneously across different accident clusters. Female cyclists, adverse weather conditions such as rain and snow, and off-peak periods were identified as key factors in several subclasses of accident clusters. Conversely, factors such as the week of the accident, characteristics of the trafficway, the season, drivers failing to yield to the right-of-way, distracted cyclists, and years of driving experience were found to be key factors in only one subcluster of accident clusters. Additionally, factors such as the time of the crash, gender of the cyclist, and weather conditions exhibit varying levels of heterogeneity across different accident clusters, and in some cases, exhibit opposing effects.
... As per the National Highway Traffic Safety Administration (NHTSA) in 2010, around 22% of traffic accidents in the United States result from left-turning vehicles colliding with oncoming traffic at intersections. These incidents often involve frontal or side-impact collisions, leading to significant impacts to the driver's well-being [50]. Similarly, in Canada, around 30% of road fatalities and almost 40% of severe road accidents were reported at intersections [44]. ...
Article
Full-text available
Driving simulators serve as valuable instruments for traffic safety research because they enable the creation of various scenarios that are hard to replicate in the real world. Eye tracker devices have proven to be immensely beneficial in studying eye movements. In this particular study, the objective was to examine potential variations in the adaptability of young male Chinese and Pakistani student drivers to left-hand traffic (LHT) and right-hand traffic (RHT) infrastructures when navigating under unfamiliar driving rules and environments. To achieve this, twenty-one Pakistani and twenty Chinese young male drivers were recruited to participate in different simulated driving scenarios (LHT and RHT). The factors tested were: (1) hazard perception; (2) time to collision (TTC); and (3) intersectional and lane-changing behavior. Using data collected from the driving simulator and eye tracker, differences in adaptability between both pools of drivers were compared using the ANOVA technique. The results showed that young male Chinese drivers were more vigilant and had a higher adaptability to unfamiliar infrastructure (3), they also had a better hazard perception (1) and time to collision (1 and 2). Young male Pakistani drivers had poorer hazard perception (2) and consequently had the shortest brake response time in the RHT scenario (2).
... Driving risks at signalized intersections have long been explored for driving safety and accident reduction. Statistically, there are over 50% injuries and fatalities that occur near the signalized intersections, where driver errors are the leading cause [3]. Based on the k-means cluster analysis and hierarchical Bayesian random intercept models, driver injury severity patterns of intersection-related crashes fnd that drivers' behavioral adjustment to sophisticated external environmental conditions may compensate for crash loss but to unstable degrees. ...
Article
Driving pattern has been increasingly researched to improve driving safety and develop autonomous vehicles. Oriented towards the complex infrastructures at signalized intersections, this research digs into the risk sources brought by different kinds of road elements, including road lane markings, road curbs, median separators, signal timing, and neighboring vehicles around the ego car. Referring to vehicle speed both in the longitudinal and latitudinal dimensions, risk scope and distribution are quantified with the vehicle position of a torus with a Gaussian cross-section. Then, the risk is summed over all the road elements across all the points involved by the ego car, the level of which should be controlled within the threshold value when the ego vehicle explores to minimize trip delay. Thus, autonomous driving strategies are developed with respect to vehicle speed and steering angle. The proposed model is validated with NGSIM data, where a signalized intersection on Peachtree Street is selected and vehicles moving in different directions are analyzed. It is found that the proposed model manages to control vehicles with risk at the accepted level and to enhance the speed level as well as reduce acceleration fluctuations. This research contributes to improving autonomous driving against complex driving conditions for driving safety and efficiency.
Article
Rear-end crashes of commercial trucks (ReC-CTs) account for the main type of truck traffic crashes, and human factors are important influencing factors that cause ReC-CTs. This study aims to investigate systematic human factors involved in ReC-CTs and further explore relationships between human factors at all levels and induced paths of unsafe acts. In this study, a total of 320 in-depth investigation cases of ReC-CTs in China from 2015 to 2022 were collected, and a novel systematic approach integrating the Human Factors Analysis and Classification System (HFACS) with Bayesian networks (BN) was proposed to identify and quantitatively analyze the human factors of ReC-CTs. The analysis of results leads to the following conclusions: 1) An improved HFACS model was constructed to identify 38 human factors related to ReC-CTs and to conduct a classification analysis at a systemic level; 2) The new systems-based method that integrates HFACS with BN, which can highlight the interrelationships among causal categories at various levels, is an effective method to quantitatively analyze the human factors of ReC-CTs; and 3) The influence relationships between unsafe acts and factors at various levels were quantitatively analyzed at a systemic level; the important influencing factors of each level that lead to unsafe acts were identified, and the most likely induced path for each unsafe act was determined. The research results can provide important guidance for effectively controlling the significant human factors at all levels of HFACS and for the targeted formulation of preventive measures for ReC-CTs.
Article
Background: Highway safety remains a significant issue, with road crashes being a leading cause of fatalities and injuries. While several studies have been conducted on crash severity, few have analyzed and predicted specific types of crashes, such as fatal crashes. Identifying the key factors associated with fatal crashes and predicting their occurrence can help develop effective preventative measures. Objective: This study intended to develop cluster analysis and ML-based models using crash data to extract the prominent factors behind fatal crash occurrences and analyze the inherent pattern of variables contributing to fatal crashes. Methods: Several branches and categories of supervised ML models have been implemented for fatality prediction and their results have been compared. SHAP analysis was conducted using the ML model to explore the contributing factors of fatal crashes. Additionally, the underlying hidden patterns of fatal crashes have been evaluated using K-means clustering, and specific fatal crash scenarios have been extracted. Results: The deep neural networks model achieved 85% accuracy in predicting fatal crashes in Kansas. Factors such as speed limits, nighttime, darker road conditions, two-lane highways, highway interchange areas, motorcycle and tractor-trailer involvement, and head-on collisions were found to be influential. Moreover, the clusters were able to discern certain scenarios of fatal crashes. Conclusion: The study can provide a clear image of the important factors related to fatal crashes, which can be utilized to create new safety protocols and countermeasures to reduce fatal crashes. The results from cluster analysis can provide transportation professionals with representative scenarios, which will help in identifying potential fatal crash conditions.
Article
This study proposes a Bayesian spatial joint model of crash prediction including both road segments and intersections located in an urban road network, through which the spatial correlations between heterogeneous types of entities could be considered. A road network in Hillsborough, Florida, with crash, road, and traffic characteristics data for a three-year period was selected in order to compare the proposed joint model with three site-level crash prediction models, that is, the Poisson, negative binomial (NB), and conditional autoregressive (CAR) models. According to the results, the CAR and Joint models outperform the Poisson and NB models in terms of model fitting and predictive performance, which indicates the reasonableness of considering cross-entity spatial correlations. Although the goodness-of-fit and predictive performance of the CAR and Joint models are equivalent in this case study, spatial correlations between segments and the connected intersections are found to be more significant than those solely between segments or between intersections, which supports the employment of the Joint model as an alternative in road-network-level safety modeling.
Book
Broadening its scope to nonstatisticians, Bayesian Methods for Data Analysis, Third Edition provides an accessible introduction to the foundations and applications of Bayesian analysis. Along with a complete reorganization of the material, this edition concentrates more on hierarchical Bayesian modeling as implemented via Markov chain Monte Carlo (MCMC) methods and related data analytic techniques.
New to the Third Edition
• New data examples, corresponding R and WinBUGS code, and homework problems
• Explicit descriptions and illustrations of hierarchical modeling, now commonplace in Bayesian data analysis
• A new chapter on Bayesian design that emphasizes Bayesian clinical trials
• A completely revised and expanded section on ranking and histogram estimation
• A new case study on infectious disease modeling and the 1918 flu epidemic
• A solutions manual for qualifying instructors that contains solutions, computer code, and associated output for every homework problem, available both electronically and in print
Ideal for Anyone Performing Statistical Analyses
Focusing on applications from biostatistics, epidemiology, and medicine, this text builds on the popularity of its predecessors by making it suitable for even more practitioners and students.
Article
This study aimed to identify the factors affecting the crash-related severity level of injuries in taxis and quantify the associations between these factors and taxi occupant injury severity. Casualties resulting from taxi crashes from 2004 to 2013 in Hong Kong were divided into four categories: taxi drivers, taxi passengers, private car drivers and private car passengers. To avoid any biased interpretation caused by unobserved spatial and temporal effects, a Bayesian hierarchical logistic modeling approach with conditional autoregressive priors was applied, and four different model forms were tested. For taxi drivers and passengers, the model with space-time interaction was proven to most properly address the unobserved heterogeneity effects. The results indicated that time of week, number of vehicles involved, weather, point of impact and driver age were closely associated with taxi drivers' injury severity level in a crash. For taxi passengers' injury severity an additional factor, taxi service area, was influential. To investigate the differences between taxis and other traffic, similar models were established for private car drivers and passengers. The results revealed that although location in the network and driver gender significantly influenced private car drivers' injury severity, they did not influence taxi drivers' injury severity. Compared with taxi passengers, the injury severity of private car passengers was more sensitive to average speed and whether seat belts were worn. Older drivers, urban taxis and fatigued driving were identified as factors that increased taxi occupant injury severity in Hong Kong.
Article
Objective: Most of the extensive research dedicated to identifying the influential factors of hit-and-run (HR) crashes has utilized the typical Maximum Likelihood Estimation Binary Logit models, and none of them have employed the real-time traffic data. To fill this gap, this study focused on investigating contributing factors of HR crashes, as well as the severity levels of HR. Methods: This study analyzed four-year crash and real time loop detector data by employing the hierarchical Bayesian models with random effects within a sequential Logit structure. Along with the evaluation of impact of random effects on model fitness and complexity, the prediction capability of the models was also examined. Stepwise incremental sensitivity and specificity were calculated and ROC (Receiver Operating Characteristic) curve was utilized to graphically illustrate the predictive performance of the model. Results: Among the real-time flow variables, the average occupancy and speed from upstream detector was observed to be positively correlated with HR crash possibility. The average upstream speed and speed difference of upstream and downstream speed were correlated with the occurrence of severe HR crashes. Apart from real-time factors, the other variables found influential for HR and severe HR crashes were length of segment, adverse weather conditions, dark lighting conditions with malfunctioning street light, driving under influence of alcohol, width of inner shoulder, and night time. Conclusions: This study suggests the potential traffic conditions of HR and severe HR occurrence, which refer to relatively congested upstream traffic conditions with high upstream speed and significant speed deviations on long segments. The above findings suggest that traffic enforcement should be directed towards mitigating the risky driving under the aforementioned traffic conditions. Moreover, the enforcement agencies may employ alcohol checkpoints to counter DUI during the night time. 
In terms of engineering improvements, wider inner shoulders may be constructed to potentially reduce HR cases, and street lights should be installed and maintained in working condition to make the roads less prone to such crashes.
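The sensitivity, specificity, and ROC evaluation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the labels, and the predicted probabilities below are all hypothetical, chosen only to show how the two rates are computed at a series of incremental thresholds.

```python
# Minimal sketch (hypothetical names and data, not from the cited study):
# sensitivity/specificity at stepwise thresholds, giving points on a ROC curve.

def sensitivity_specificity(y_true, y_prob, threshold):
    """Return (sensitivity, specificity) when predicting 1 if prob >= threshold."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 0:
            tn += 1
        else:
            fp += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def roc_points(y_true, y_prob, thresholds):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    points = []
    for t in thresholds:
        sens, spec = sensitivity_specificity(y_true, y_prob, t)
        points.append((1.0 - spec, sens))
    return points


# Hypothetical observed outcomes (1 = event) and model-predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]

# Stepwise thresholds from 0.0 to 1.0 in increments of 0.1.
thresholds = [i / 10 for i in range(11)]
curve = roc_points(y_true, y_prob, thresholds)
```

Plotting the pairs in `curve` gives the ROC curve; lowering the threshold raises sensitivity at the cost of specificity.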
Article
This paper uses data collected over a five-year period between 2005 and 2009 in Indiana to estimate random parameters multivariate tobit and zero-inflated count data models of accident injury-severity rates and frequencies, respectively. The proposed modeling approach accounts for unobserved factors that may vary systematically across segments with and without observed or reported accident injury-severities, thus addressing unobserved, zero-accident state and non-zero-accident state heterogeneity. Moreover, the multivariate setting allows accounting for contemporaneous cross-equation error correlation for modeling accident injury-severity rates and frequencies as systems of seemingly unrelated equations. The tobit and zero-inflated count data modeling approaches address the excessive amount of zeros inherent in the two sets of dependent variables (accident injury-severity rates and frequencies, respectively), which are – in nature – continuous and discrete count data, respectively, that are left-censored with a clustering at zero. The random parameters multivariate tobit and zero-inflated count data models are counter-imposed with their equivalent fixed parameters and lower order models, and the results illustrate the statistical superiority of the presented models. Finally, the relative benefits of random parameters modeling are explored by demonstrating the forecasting accuracy of the random parameters multivariate models with the software-generated mean βs of the random parameters, and with the observation-specific βs of the random parameters.
Article
Background: Few studies have comprehensively evaluated the effectiveness of multi-faceted interventions intended to improve pedestrian safety. "Watch for Me NC" is a multi-faceted, community-based pedestrian safety program that includes widespread media and public engagement in combination with enhanced law enforcement activities (i.e., police outreach and targeted pedestrian safety operations conducted at marked crosswalks) and low-cost engineering improvements at selected crossings. The purpose of this study was to estimate the effect of the law enforcement and engineering improvement components of the program on motor vehicle driver behavior, specifically in terms of increased driver yielding to pedestrians in marked crosswalks. Methods: The study used a pre-post design with a control group, comparing crossing locations receiving enforcement and low-cost engineering treatments (enhanced locations) with locations that did not (standard locations) to examine changes in driver yielding over a 6-month period from 2013 to 2014. A total of 24,941 drivers were observed in 11,817 attempted crossing events at 16 crosswalks in five municipalities that were participating in the program. Observations of real pedestrians attempting to use the crosswalks ("naturalistic" crossing) were supplemented by observations of trained research staff attempting the same crossings following an established protocol ("staged" crossings). Generalized estimating equations (GEE) were used to model driver yielding rates, accounting for repeated observations at the crossing locations and controlling other factors that affect driver behavior in yielding to pedestrians in marked crosswalks. Results: At crossings that did not receive enhancements (targeted police operations or low-cost engineering improvements), driver yielding rates did not change from before to after the Watch for Me NC program. 
However, yielding rates improved significantly (between 4 and 7 percentage points on average) at the enhanced locations. This was true for both naturalistic and staged crossings. Conclusions: This study provides evidence that enhanced enforcement and low-cost engineering improvements, as a part of a broader program involving community-based outreach, can increase driver yielding to pedestrians in marked crosswalks. These data are important for the staff and decision-makers involved in pedestrian safety programs to gain a better understanding of the different engineering and behavioral mechanisms that could be used to improve driver yielding rates.
Article
Introduction: Pedestrians are known as the most vulnerable road users, which means their needs and safety require specific attention in strategic plans. Given the fact that pedestrians are more prone to higher injury severity levels compared to other road users, this study aims to investigate the risk factors associated with various levels of injury severity that pedestrians experience in Illinois. Method: Ordered-response models are used to analyze single-vehicle, single-pedestrian crash data from 2010 to 2013 in Illinois. As a measure of net change in the effect of significant variables, average direct pseudo-elasticities are calculated that can be further used to prioritize safety countermeasures. A model comparison using AIC and BIC is also provided to compare the performance of the studied ordered-response models. Results: The results identified many variables associated with severe injuries: older pedestrians (more than 65 years old), pedestrians not wearing contrasting clothing, adult drivers (16-24), drunk drivers, time of day (20:00 to 05:00), divided highways, multilane highways, darkness, and heavy vehicles. On the other hand, crossing the street at crosswalks, older drivers (more than 65 years old), urban areas, and presence of traffic control devices (signal and sign) are associated with decreased probability of severe injuries. Conclusions and Practical Applications: The comparison between three proposed ordered-response models shows that the partial proportional odds (PPO) model outperforms the conventional ordered (proportional odds – PO) model and generalized ordered logit model (GOLM). Based on the findings, stricter rules to address DUI driving are suggested. Educational programs need to focus on older pedestrians given the increasing number of older people in Illinois in the upcoming years. Pedestrians should be educated to use pedestrian crosswalks and contrasting clothing at night. 
In terms of engineering countermeasures, installation of crosswalks where pedestrian activity is high seems a promising practice.
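The AIC and BIC comparison mentioned in this abstract follows standard definitions: AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of estimated parameters, n the number of observations, and ln L the maximized log-likelihood. The sketch below only illustrates that arithmetic; the model names, log-likelihoods, parameter counts, and sample size are hypothetical and are not results from the cited study.

```python
import math

# Minimal sketch of AIC/BIC model comparison (all values below are hypothetical).

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fitted models: name -> (maximized log-likelihood, parameter count).
candidates = {
    "model_a": (-1250.0, 12),
    "model_b": (-1230.0, 16),
    "model_c": (-1228.0, 24),
}
N_OBS = 2000  # hypothetical sample size

scores = {
    name: (aic(ll, k), bic(ll, k, N_OBS))
    for name, (ll, k) in candidates.items()
}

# Lower values indicate a better trade-off between fit and complexity.
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

Because BIC penalizes each extra parameter by ln(n) rather than 2, it tends to favor more parsimonious models than AIC once n exceeds about 7 observations.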