Attribute-Based Safety Risk Assessment. II: Predicting
Safety Outcomes Using Generalized Linear Models
Behzad Esmaeili, A.M.ASCE1; Matthew R. Hallowell, A.M.ASCE2; and Balaji Rajagopalan, A.M.ASCE3
Abstract: One of the recent advancements in preconstruction safety management is the identification and quantification of risks associated
with fundamental attributes of construction work environments that cause injuries. The goal of this paper is to test the validity of using these
fundamental risk attributes to predict safety outcomes. The modeling approach required two steps, as follows: (1) a principal component
analysis was performed on the safety attributes to reduce the dimension of the data and remove collinearity among attributes (the principal
component analysis provided insights into the relative importance of the various attributes and provided an orthogonal decomposition
of the data), and (2) the leading principal components (which are orthogonal by definition) were used as potential predictors in a generalized
linear model with a logit link function to model the probability of different accident categories. The predictive power was then assessed using
a rank probability skill score, which quantified the probabilistic skill of the forecasts over the categories. The analysis shows strong predictive
skill, making the models attractive for safety managers to use to skilfully forecast the probability of a safety incident given identifiable
characteristics of planned work. Researchers in the technology domain may find these models useful in predicting safety outcomes during
design, work packaging, and scheduling. DOI: 10.1061/(ASCE)CO.1943-7862.0000981.© 2015 American Society of Civil Engineers.
Author keywords: Safety risk management; Predictive models; Principal component analysis (PCA); Generalized linear models (GLMs);
Labor and personnel issues.
Introduction
Safe completion of a project is the ultimate goal of any contractor.
To achieve this goal, many contractors are in strict accordance with
Occupational Safety and Health Administration (OSHA) regulations,
which involve providing personal protective equipment and protec-
tive measures. Although these prevention strategies are effective, they
are not enough to achieve excellent safety performance (Hinze et al.
2013). Since most of the compliance practices are passive or reactive,
they do not provide early warnings based on the specific character-
istics of a work environment. For this reason, increasing attention is
being paid to proactive safety strategies that identify precursors of an
incident and assess the risk of potential hazards in advance.
Seeking to predict safety performance and provide early warn-
ing, numerous studies have investigated the relationship between
safety-related outcomes and variables that might affect safety
(Zohar 1980; Tam and Fung 1998; Gillen et al. 2002; Cooper and
Phillips 2004; Chen and Yang 2004; Fang et al. 2006; Johnson
2007; Rozenfeld et al. 2010). Different metrics such as number
of injuries, experience modification rate (EMR), or safe behavior
have been used to measure safety outcomes. Additionally, a variety
of factors have also been used to measure the predictor variables,
such as the characteristics of construction firms, safety program
elements employed, and a firm's safety climate. The main limita-
tions of these previous studies are the following: (1) the dynamic
nature of construction projects has been widely ignored, (2) the
predictions are not based on objective or empirical data, and (3) the
safety climate cannot be measured in early stages of a project.
To address these limitations, other studies (Baradan and Usmen
2006; Hallowell et al. 2011; Esmaeili and Hallowell 2013) suggested
assessing the specific safety risks of tasks or trades as a predictor
of the existing level of hazards onsite. However, this approach
has a practical limitation; because the industry is diverse and ever-
evolving, it is impossible to quantify risks for all potential scenarios.
To address these knowledge gaps, Esmaeili (2012) proposed an
attribute-based risk identification and analysis method that helps
practitioners model safety risk independent of specific activities or
construction objects. In this method, the risk of worker injury is con-
sidered to be the direct result of temporal and spatial interactions
among a limited number of fundamental and identifiable attributes
that characterize the work environment. These attributes, which can
be identified in early stages of the project, are mainly related to the
physical conditions of a jobsite such as the presence of open edges,
overhead power lines, and moving equipment in proximity to
workers. The conceivable benefit of an attribute-based risk identifi-
cation system lies in the fact that hazards coinciding with the specific
features (attributes) of a task may be identified and minimized during
the preconstruction phase of the project, thereby alleviating the
workers' exposure to risk during actual construction.
The fundamental attributes of construction environments were
used to serve as predictor variables in probabilistic safety models
forecasting the potential injury outcomes of those tasks. A sound
and reliable mathematical approach used extensively in meteoro-
logical science for weather forecasting contributed to the formation
of this model. Just as measurements of temperature, wind speed,
1Assistant Professor, Durham School of Architectural Engineering and
Construction, Univ. of Nebraska, 113 Nebraska Hall, Lincoln, NE 68588
(corresponding author). E-mail: besmaeili2@unl.edu
2Beavers Endowed Professor of Construction Engineering and Associ-
ate Chair; and Dept. of Civil, Environmental, and Architectural Engineer-
ing, Univ. of Colorado at Boulder (UCB), 428 UCB, 1111 Engineering Dr.,
Boulder, CO 80309-0428. E-mail: matthew.hallowell@colorado.edu
3Professor and Chair, Dept. of Civil, Environmental, and Architectural
Engineering and Cooperative Institute for Research in Environmental
Sciences, Univ. of Colorado at Boulder (UCB), 428 UCB, 1111 Engineer-
ing Dr., Boulder, CO 80309-0428. E-mail: balajir@colorado.edu
Note. This manuscript was submitted on August 4, 2014; approved
on January 6, 2015; published online on March 27, 2015. Discussion
period open until August 27, 2015; separate discussions must be sub-
mitted for individual papers. This paper is part of the Journal of Con-
struction Engineering and Management, © ASCE, ISSN 0733-9364/
04015022(11)/$25.00.
pressure, humidity, and their interrelationships are used to help
forecast weather, safety attributes provide the raw data needed to
drive the prediction of workplace injuries and safety incidents.
In order to limit the scope of the research reported in this paper,
the focus was placed on struck-by accidents, which are one of the leading
causes of construction fatalities (Hinze et al. 2005). A large and
reliable national database of construction accidents provided the
objective data employed in the testing of the mathematical models.
It is expected that this approach and the resulting models will
improve researchers' ability to forecast injuries and anticipate
high-risk periods on a project. Specifically, the predictive models
developed in this paper can help practitioners to choose alternative
means and methods of construction and identify high-risk periods
of a project.
Safety Predictive Models
There are several studies that have attempted to predict safety-
related outcomes (e.g., Tam and Fung 1998). For example, some
researchers (Zohar 1980;Johnson 2007) attempted to relate safety
outcomes (e.g., injury rate) to the factors that affect safe perfor-
mance (e.g., injury prevention practices). Within these studies, a
common independent variable used to forecast safety performance
during construction is safety climate. Safety climate is considered a
subset of organizational climate and can be defined as the so-called
molar perceptions that workers share about the importance of safety
(Zohar 1980). Researchers into this topic searched for empirical
evidence of the relationship between safety climate and safety per-
formance such as the frequency and severity of accidents. In one
of the seminal studies, Zohar (1980) successfully used safety cli-
mate dimensions to predict safety program effectiveness as judged
by safety inspectors in industrial organizations. Glendon and
Litherland (2001) distributed safety climate questionnaires to ex-
amine the relationship between safety climate and safe behavior.
They assumed that safe behavior leads to less-frequent and less-severe
accidents. In one of the more recent studies in this area, Johnson
(2007) examined the predictive validity of safety climate and found
that safety climate was negatively correlated with the number of
lost workdays due to injury. In another study, Fang et al. (2006)
used logistic regression to investigate the relationship between
safety climate and personal characteristics (e.g., education level).
The Fang et al. (2006) approach was different from the previous
studies in the area of safety climate, because they considered safety
climate as a dependent variable and tried to predict it through per-
sonal characteristics (independent variable).
Other researchers examined other predictive variables. Tam and
Fung (1998) studied the relationship between common safety man-
agement strategies in Hong Kong and their accident rates using
multiple regression analysis. They found that four variables are sig-
nificant in determining safety performance, as follows: (1) postac-
cident investigation, (2) the proportion of subcontracted labor,
(3) safety awards, and (4) safety training. In another study,
Gillen et al. (2002) found a relationship between injured construc-
tion workers' perceptions of workplace safety climate, psychologi-
cal job demands, decision latitude, coworker support, and the
severity of injuries sustained by the workers. Their model explained
23% of the variance in injury severities. Cooper and Phillips (2004)
also used multiple regression and found that the perception of
importance of safety training can predict the actual levels of safe
behavior. Some other researchers attempted to predict safety per-
formance using leading indicators. In one of the recent studies,
Hinze et al. (2013) identified a list of construction-safety strategies
and linked them with the project's recordable injury rate (RIR).
In Singapore, Chua and Goh (2005) considered construction
accidents as random events and successfully fitted a Poisson distri-
bution to a data set of accidents. The major limitation of predicting
accident occurrence using a general probability density function
(PDF) is that this type of predictive model does not consider the
unique characteristics of activities or a project. In another study,
Goh and Chua (2013) used neural network analysis to investigate
the relationship between safety performance and occupational safety
and health elements identified from accident reports. They found
that safety performance in a project is mostly impacted by incident
investigation and analysis, emergency preparedness, and group
meetings. While this approach can be used to identify successful
safety practices in general, it heavily relies on soft factors such as
safety policy, safety training, and group meetings, which are diffi-
cult to measure objectively in a real construction project.
Although the mentioned predictive models can be effective tools
in measuring safety status, they have the following limitations:
(1) the reported relationship between safety climate and safety
behavior is largely dependent on subjective self-reporting instru-
ments (Chen and Yang 2004), (2) these models focus on unsafe
behavior and ignore the importance of physically unsafe condi-
tions, and (3) the proposed models cannot be integrated into
preconstruction safety activities because there is no knowledge of
the safety climate or behavioral issues during the design and
preconstruction stages of a project.
Because accidents are infrequent events, some researchers implic-
itly considered risk as a predictive measure for safety performance.
This group of scholars has attempted to predict hazard by quantify-
ing risks produced by different trades (Baradan and Usmen 2006),
activities (Hallowell and Gambatese 2009), or loss-of-control
events (Rozenfeld et al. 2010). Lee and Halpin (2003) presented
a predictive tool to estimate accident risk in utility-trenching oper-
ations using training, supervision, and preplanning as predictive
variables. In order to assess the condition of different predictive
variables, they used the fuzzy input from the user. Outside of the
construction domain, Chen and Yang (2004) used regular observa-
tion of unsafe acts and conditions to develop a predictive risk index
as an indication of safety performance in a process plant.
Although these approaches offer benefits, there are two main
limitations in these types of predictive models, as follows: (1) there
are numerous activities and loss-of-control events, and quantifying
risks for all of them is impractical; and (2) in most research, risk has
been assessed subjectively, thereby limiting the internal and exter-
nal validity of the estimates. The prominent predictive models in
the construction safety domain, as well as their associated response
and predictor variables, are listed in Table 1. After conducting an
extensive literature review, it can be concluded that developing pre-
dictive models using empirical data is important for obtaining more
reliable and robust knowledge in this area.
Contribution to the Body of Knowledge
The research reported in this paper departs from the current body
of knowledge by testing the validity of several statistical models
to predict hazardous situations in the early stages of a project. The
research reported in this paper is the first to employ an objective
large accident database to forecast safety related outcomes of ac-
cidents using a finite number of measurable attributes. This paper
makes several contributions to both theory and practice, not the
least of which is a predictive model of safety outcomes that is
(1) based on a large volume of internally and externally valid em-
pirical data, (2) explored with an efficient and rigorous technique
derived from established meteorological science, (3) robust enough
to predict outcomes for any combination of attributes that may be
encountered in contemporary worksites, and (4) focused on the
unsafe physical conditions instead of unsafe behavior.
As far as planned theoretical contributions of the paper are
concerned, the results of the research reported in this paper enable
a project manager to predict the probability of different injury out-
comes independent of the tasks or other unique features of a
project. In addition, using a finite number of attributes that can be
identified during the early stages of a construction project provides
several opportunities for the project personnel to change the design
in such a way as to mitigate hazards. Furthermore, the individual-
ized predictive models for different types of construction projects
(e.g., family housing, highway, and so on) help practitioners to
adapt models for different types of construction, such as vertical
and horizontal. To summarize, the applied results of the research
reported in this paper bring the current safety practices one step
closer to the vision of so-called zero injury in that if work injuries
can be predicted, they may be prevented.
Research Method
This paper built upon the previously established content analysis
of 1,771 struck-by accident reports that identified the fundamental
attributes that cause accidents. In this process, the fundamental
attributes that lead to struck-by accidents were identified as pre-
dictor variables and the severity of accidents that were caused by
these attributes was recorded as the response variable (Esmaeili
2012). The presence/absence of groups of hazardous attributes
(independent variable) and the various injury types (dependent var-
iable) became the dataset for the research reported in this paper.
The accident reports came from the Occupational Safety and
Health Administration (OSHA) Integrated Management Informa-
tion System (IMIS; OSHA 2013). The scope of the dataset was
limited to two major groups, as follows: (1) building construction
general contractors and operative builders, and (2) heavy construc-
tion other than building construction contractors (which usually
have a higher rate of struck-by accidents). The OSHA IMIS data-
base classifies the injury reports into different project groups, such
as single-family housing, residential, and industrial. Table 2 shows
the detailed breakdown structure of the construction work groups.
In total, 22 attributes that cause struck-by accidents were iden-
tified (Table 3). The output of the content analysis was a matrix
featuring accident reports (rows) and safety attributes (columns);
within the matrix, if attribute j contributed to accident i, then
x_ij = 1; otherwise x_ij = 0. Several cases were omitted as missing
data because they did not have specific accident severity or the de-
scription in the report was less than two lines. Due to its lack of
sufficient accident reports, Standard Industrial Classification (SIC)
1531 was omitted from the analysis.
Injury severity was used to create the categorical dependent
variable (Lee and Halpin 2003). The severity related to each
accident was recorded and resulted in 26 different types of injury
outcomes. Fatality and severe accidents dominated the accident out-
comes, which was expected because the IMIS database includes
OSHA recordable injuries that have severe consequences. How-
ever, this characteristic can cause problems for predictive models
Table 1. Previous Safety Predictive Models
Number Study Dependent variable Independent variable
1 Tam and Fung (1998) Accident rates Safety management strategies
2 Glendon and Litherland (2001) Percent safe behavior Safety climate
3 Gillen et al. (2002) Severity of accidents Perceived safety climate, job demands,
decision latitudes, and coworker support
4 Lee and Halpin (2003) Safety risk Training, supervision, and preplanning
5 Cooper and Phillips (2004) Safe behavior Safety training
6 Fang et al. (2006) Safety climate Personal characteristics
7 Baradan and Usmen (2006) Safety risk Construction trades
8 Johnson (2007) Lost workdays Safety climate
9 Hallowell and Gambatese (2009) Safety risk Formwork activities
10 Rozenfeld et al. (2010) Safety risk Loss-of-control events
11 Esmaeili and Hallowell (2013) Safety risk profiles Highway maintenance and reconstruction tasks
Table 2. Accident Reports Analyzed (counts given as struck-by incidents / after removing missing data)
Major Group 15, building construction general contractors and operative builders:
  SIC 1521  General contractors, single-family houses: 247 / 149
  SIC 1522  General contractors, residential buildings, other than single-family: 111 / 71
  SIC 1531  Operative builders: 19 / 14
  SIC 1541  General contractors, industrial buildings, and warehouses: 105 / 86
  SIC 1542  General contractors, nonresidential buildings, other than industrial buildings and warehouses: 209 / 178
Major Group 16, heavy construction other than building construction contractors:
  SIC 1611  Highway and street construction, except elevated highways: 501 / 463
  SIC 1622  Bridge, tunnel, and elevated highway construction: 116 / 104
  SIC 1623  Water, sewer, pipeline, and communications and power line construction: 280 / 226
  SIC 1629  Heavy construction, not elsewhere classified: 183 / 159
Total: 1,771 / 1,436
Note: SIC = Standard Industrial Classification.
because the fatality or other serious injuries will become the most
common predicted outcome since less-severe outcomes may not be
representatively included in the database. To resolve this challenge,
the response variables were categorized into three main groups,
as follows: (1) the response variables were dichotomized into fatal and
nonfatal injuries; (2) injury outcomes were classified into the cat-
egories of not severe, mild, and severe; and (3) injury outcomes were
classified into five categories, i.e., first aid, medical case, lost work
time, permanent disablement, and fatality. Whereas categorizing response
variables might improve the quality of predictive models, it does not
help researchers to predict less-severe injuries that are underrepre-
sented in the database. The distribution of each injury outcome in
different SIC codes is shown in Table 4.
Principal component analysis (PCA) was used to identify the
linear combination of attributes that had the greatest explanatory
power and also to reduce the dimension of the multivariate data
into fewer orthogonal components, i.e., principal components
(PCs). The PCs became independent variables in a generalized
linear model (GLM) to predict the probability of each category of
injury severity. The performance of the predictive models was as-
sessed using a rank probability skill score. At the end, the Friedman
two-way ANOVA by ranks was used to compare the predictive
power of the models. The specific research methods employed are
discussed next.
For clarification, the different steps conducted in the research
reported in this paper are summarized in Fig. 1.
Principal Component Analysis
The PCA was introduced by Pearson (1901) and refined by
Hotelling (1933). The main objective of this technique is to reduce
the dimensionality of a dataset consisting of a large number of in-
terrelated variables while retaining the maximum possible variance.
This pursuit is achieved by transforming the dataset into a new set
of orthogonal variables, i.e., the principal components. In this pro-
cess the first principal component accounts for the largest amount
of variance in the data, the second principal component accounts
for the next largest amount of variance and is uncorrelated with the
first, and so on. Several applications of PCA have been stated in the
literature, such as data reduction (Wold et al. 1987), modeling
(Palau et al. 2012), outlier detection (Barnett and Lewis 1994), var-
iable selection (Jolliffe 2002), clustering (Saitta et al. 2008), and
prediction (Salas et al. 2011). Principal component analysis is also
widely used in climate research, wherein global climate data in
space and time need to be analyzed to identify coherent spatial
and temporal patterns for diagnosis and prediction (von Storch and
Zwiers 1999).
To explain the mathematical algorithm briefly, suppose that the
results of the content analysis of the OSHA IMIS database are stored in
matrix X of size N rows by M columns, where N is the number of
accident records and M is the number of attributes. Matrix X can
be shown as

              Accident No. 1   | x_11  ...  x_1M |
  X_(N,M) =        ...         |  ...  ...   ... |   (1)
              Accident No. N   | x_N1  ...  x_NM |

where x_ij = 1 if attribute j contributed to accident i, and x_ij = 0
otherwise. The objective is to find a linear transformation W_(M×M) such that

  Z_(N×M) = X_(N×M) × W_(M×M)   (2)

where Z = score matrix whose kth column is Z_k, the kth PC,
k = 1, 2, ..., M; and W = orthogonal matrix, called the loading matrix, that
Table 3. Struck-by Attributes, i.e., Predictor Variables
Number Struck-by attributes
1 Working in swing area of a boomed vehicle
2 Workers on foot and moving equipment (a)
3 Lack of vision or visibility
4 Flagger on the jobsite
5 Site topography
6 Working with heavy equipment
7 Falling out from heavy equipment
8 Nail gun
9 Working with power tools/large tools
10 Equipment backup
11 Working near active roadway
12 Vehicle accident
13 Flying debris/objects
14 Falling objects
15 Structure collapse
16 Material storage
17 Lifting heavy materials (b)
18 Transporting heavy materials horizontally
19 Working at trench
20 Wind
21 Snow
22 Temperature
(a) For example, workers are assigned to an activity in proximity of an
excavator.
(b) Heavy materials are defined as objects that, if they hit a worker, even at low
speed, can cause an injury.
Table 4. Classifying Injury Types and Their Distribution for Each SIC as Response Variable
Category  Type of injury (response variable)   1521   1522   1541   1542   1611   1622   1623   1629   Average
1         Not fatality                         69.8   67.9   55.9   54.6   24.1   42.1   36.3   30.2   46.0
2         Fatality                             30.2   32.1   44.1   45.4   75.9   57.9   63.7   69.8   54.0
1         Not severe                           10.1    9.9    6.5    9.2    3.0    5.6    2.4    7.0    6.7
2         Mild                                 59.7   58.0   49.4   45.4   21.1   36.5   33.9   23.2   39.3
3         Severe                               30.2   32.1   44.1   45.4   75.9   57.9   63.7   69.8   54.0
1         First aid                            10.1    9.9    6.5    9.2    3.0    5.6    2.4    7.0    6.7
2         Medical case                         20.8   21.0   11.8    8.1    3.6    7.5    7.8    2.9    9.3
3         Lost work time                       30.8   29.6   35.5   30.8   13.3   23.4   22.4   16.9   24.8
4         Permanent disablement                 8.2    7.4    2.2    6.5    4.2    5.6    3.7    3.5    5.3
5         Fatality                             30.2   32.1   44.1   45.4   75.9   57.9   63.7   69.8   54.0
Note: Columns 1521 through 1629 are Standard Industrial Classification codes; values are expressed as percentages.
projects X to Z. The PCA aims to find the elements of W such that
the squared sum of X's projection onto the PC directions is
maximized. Jolliffe (2002) showed that the columns of W (w_i) are
the eigenvectors of X's covariance matrix (C_x).
Another common approach to find PCs is to use a correlation
matrix instead of a covariance matrix. However, Chatfield and
Collins (1980) stated that PCs obtained from a correlation matrix
are not the same as PCs obtained from a covariance matrix. One
of the main drawbacks of using a covariance matrix is that PCs
obtained from this method are sensitive to the units of measure-
ment used for each variable. This means that variables with the largest
variance will dominate the first few PCs. In this case, because all
measurements are made in the same units, the covariance matrix
might be more appropriate. The PCA algorithm was implemented
through the prcomp function (Stats Package 2013) in R,
which is an open-source statistical program (R Development Core
Team 2011).
Selecting the number of PCs in the analysis is an important
issue. One of the common rules for selecting PCs is to drop any PC
with variance less than 1, which is known as Kaiser's rule (Kaiser
1960). However, many scholars consider this the most
inaccurate of all methods (Velicer and Jackson 1990). Other meth-
ods of selecting PCs that have proven to be more effective are to
look at the scree plot or the variance retained by the PCs, since the ith
eigenvalue λ_i is a valid measure of the variance accounted for by the
ith PC (Jolliffe 2002). Therefore, the cumulative variance (CumVar)
retained by the first k PCs can be determined as

  CumVar_k = (Σ_{i=1}^{k} λ_i) / (Σ_{j=1}^{n} λ_j)   (3)
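For readers who wish to trace this step, the following sketch illustrates a covariance-matrix PCA in Python on a hypothetical binary attribute matrix (placeholder data, not the study dataset); the study itself used the prcomp function in R.

```python
import numpy as np

# Sketch of the PCA step on a hypothetical binary attribute matrix
# (rows = accident reports, columns = attributes); placeholder data only.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(150, 22)).astype(float)

# PCA on the covariance matrix: center the data, then eigendecompose.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]                # sort eigenvalues descending
eigvals, W = eigvals[order], eigvecs[:, order]   # W = loading matrix

Z = Xc @ W                                       # scores (PCs), as in Eq. (2)

# Cumulative variance retained by the first k PCs, Eq. (3).
cum_var = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(cum_var, 0.78)) + 1      # threshold mirrors the ~78% for SIC 1521
print(k, round(cum_var[k - 1], 2))
```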
Generalized Linear Models
Regression techniques have been widely used in the construction
industry to predict quarterly new orders for housing, commercial,
and industrial construction projects (Akintoye and Skitmore 1994;
Goh 1999), values of total construction activities (Tang et al. 1990),
and cash flow (Park et al. 2005). In general, regression techniques
aim to model the relationships among variables by quantifying the
degree to which a response variable is related to a set of explanatory
variables. The output of the regression model is a forecasting tool
that can be used to evaluate the impact of various alternative inputs
on the response variable (Goh and Teo 2000).
A classical method for evaluating the relationship between a
predictor and response variable is linear regression. One of the
major assumptions of linear regression (LR) is that the response
variables come from a normally distributed population. However,
in reality, many response variables are categorical and violate
this assumption. In order to overcome this barrier, a more general
approach was adopted that does not have this limitation of LR,
called generalized linear models. This modeling technique pro-
vides a very flexible approach for exploring the relationships
among a variety of variables (discrete, categorical, continuous
and positive, and extreme value) as compared to traditional regres-
sion (McCullagh and Nelder 1989). In a GLM, instead of modeling
the mean directly, a one-to-one continuous differentiable transformation
g(μ_i), called a link function, is used. Depending on the assumed
distribution of the response variable (Y), an appropriate link func-
tion can be defined (McCullagh and Nelder 1989). As mentioned
in the section that described data acquisition, the response variables
in the research reported in this paper are dichotomous (fatality/no
fatality) and categorical (e.g., severe/mild/not severe). Thus, a logit
link function was used

  η_i = g(μ_i) = log[μ_i / (1 − μ_i)]   (4)
where μ_i = expected value of the response variable (injury out-
come); and η_i = linear predictor that transforms the expected value
of the response variable such that

  η_i = x_i′β   (5)

where β = vector of regression coefficients; and x = set of predictors,
which includes N observations (accident reports) and P possible
predictor variables (the leading PCs). For two response categories, the
model is logistic regression, and for three and five categories, the
model is multinomial regression. Model parameters in a GLM
are determined in an iterative process called iterated weighted
least-squares (IWLS). In summary, this method finds a set of model
parameters that maximizes the likelihood of reproducing the data
distribution of the training set. Multinomial logistic regression is
a generalization of the two-category case described previously
(Hastie et al. 2002). The binomial and multinomial logistic regres-
sions were implemented using the library VGAM (Yee and Wild
1996) in the open-source statistical package R. After esti-
mating β, one can predict η, and then the values can be transformed
into the original response using the inverse link function. The same
approach extends to multiple categories.
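As an illustration of this modeling step, the sketch below fits a binomial GLM with a logit link to a few leading PCs using Python's statsmodels; the variables are placeholders, and the paper's actual fits were produced with the VGAM library in R.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder PC scores and fatality indicators; not the study data.
rng = np.random.default_rng(1)
Z = rng.normal(size=(150, 5))          # leading PC scores
y = rng.integers(0, 2, size=150)       # 1 = fatality, 0 = no fatality

# Binomial GLM with a logit link, Eqs. (4)-(5): eta = x'beta, mu = inverse-logit(eta).
X = sm.add_constant(Z[:, :2])          # intercept plus the first two PCs
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()   # fitted by IWLS
print(fit.params)                      # estimated beta coefficients
print(fit.predict(X)[:5])              # back-transformed P(fatality)
```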
Model Pruning
One of the common threats to the validity and usefulness of stat-
istical models is overfitting the dataset, which yields a large number
of insignificant variables in the model. Therefore, the predictor var-
iables of the model should be pruned to find a so-called best model
that contains the right quantity of variables. To do that, a stepwise
regression approach was adopted that minimizes the Akaike infor-
mation criterion (AIC) instead of the likelihood function to evaluate
goodness-of-fit in the stepwise search. Minimizing the AIC strikes a
balance between the number of parameters and goodness-of-fit.
This method measures the ability of the predictive model to
reproduce the variance of the observations with the fewest number
of parameters (Wilks 1995). The AIC value can be calculated from

  AIC = 2K − 2 ln(L)   (6)

where K = number of model parameters; and L = maximized value
of the likelihood function for the model. To minimize the AIC, both
Fig. 1. Research steps
forward and backward searches were conducted in the stepwise
regression.
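The following sketch shows one way AIC-guided pruning could be carried out in Python; for brevity it enumerates small subsets of PCs rather than performing the forward/backward stepwise search described above, and the data are placeholders.

```python
import numpy as np
import statsmodels.api as sm
from itertools import combinations

def best_subset_by_aic(y, Z, max_size=3):
    """Return the PC subset whose binomial GLM has the lowest AIC, Eq. (6)."""
    best_aic, best_cols = np.inf, ()
    for size in range(1, max_size + 1):
        for cols in combinations(range(Z.shape[1]), size):
            X = sm.add_constant(Z[:, cols])
            aic = sm.GLM(y, X, family=sm.families.Binomial()).fit().aic
            if aic < best_aic:
                best_aic, best_cols = aic, cols
    return best_aic, best_cols

rng = np.random.default_rng(2)
Z = rng.normal(size=(150, 5))          # placeholder PC scores
y = rng.integers(0, 2, size=150)       # placeholder outcomes
print(best_subset_by_aic(y, Z))
```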
Evaluation of Model Skill
After developing the model, its predictive power should be mea-
sured objectively. In the research reported in this paper, the perfor-
mance of the model was measured against the observed data
through a rank probability score (RPS), which indicates the degree
to which the model predicts the observed data. To calculate the
RPS, two vectors were constructed, as follows: (1) the forecasted
probabilities (P_j), based on the GLM model predictions; and (2) the
observed events (z_j), from the observed data. Then the cumulative
distribution functions (CDFs) of P_j and z_j were constructed,
resulting in the vectors P_CDF,j and z_CDF,j.
The RPS was computed using

  RPS = (1/N) × Σ_j (P_CDF,j − z_CDF,j)²   (7)
Although RPS is quite informative regarding the predictive
power of the model, there is a possibility that the observed data
may be reproduced by pure chance. Therefore, it is necessary to
compare the RPS of the model against the RPS of the random pro-
cess and assess their effectiveness. This test can be done through
the ranked probability skill score (RPSS), which has been used
in various climatological contexts to compare a model's skill in
predicting categorical rainfall and streamflow quantities (Regonda
et al. 2006). A detailed description of the RPSS method is provided
by Wilks (1995). The RPSS is computed by forming a ratio be-
tween the average RPS values of the model and chance

  RPSS = 1 − RPS_model / RPS_chance   (8)
The RPSS compares the accuracy of a model's predictions
against chance. However, in the research reported in this paper,
rather than simply comparing the model against a 50/50 chance for
fatality/no fatality (or a 33/33/33 chance for three-category re-
sponse variables), it was compared to the ratio of response vari-
ables provided by the original data. In other words, instead of pure
chance, a weighted coin was used, which is a more rigorous test
of model performance. The range for RPSS is from minus infinity
to 1, where negative values indicate that the model results are worse
than chance, 0 means that the model results reproduce chance
events, and positive values show that the model results are closer
to the original observations than chance.
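A sketch of this skill computation in Python is given below; the forecasts and observations are synthetic placeholders, and the reference forecast is the weighted coin built from the observed category frequencies, as described above.

```python
import numpy as np

def mean_rps(forecast_probs, observed_cats, n_categories):
    """Mean rank probability score over N forecasts, Eq. (7)."""
    total = 0.0
    for p, obs in zip(forecast_probs, observed_cats):
        p_cdf = np.cumsum(p)                          # forecast CDF over categories
        z_cdf = np.cumsum(np.eye(n_categories)[obs])  # observed (0/1) CDF
        total += np.sum((p_cdf - z_cdf) ** 2)
    return total / len(observed_cats)

rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=100)                      # observed fatality / no fatality
model_p = np.column_stack([0.8 - 0.6 * y, 0.2 + 0.6 * y])   # toy model forecasts
climatology = np.bincount(y, minlength=2) / len(y)    # weighted-coin reference
chance_p = np.tile(climatology, (len(y), 1))

rpss = 1 - mean_rps(model_p, y, 2) / mean_rps(chance_p, y, 2)   # Eq. (8)
print(round(rpss, 3))    # > 0 indicates skill better than the reference
```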
Results
As mentioned previously, Esmaeili (2012) conducted content
analysis on 1,771 struck-by accident reports to identify fundamen-
tal safety attributes. The scree plot of fractional variance captured
by the various modes for SIC 1521 is illustrated in Fig. 2. Over 78%
of the fractional variance in the predictor set is captured by the first
five PCs, meaning that the 18-dimensional data (only 18 attributes
existed in SIC 1521) can be effectively represented by the five-
dimensional PCs.
As mentioned previously, PCs are linear combinations of dif-
ferent attributes. The loadings obtained by the PCA can be used to
determine the weight of various attributes (variables). The loadings
for the first five PCs of SIC 1521 are provided in Table 5. The first
PC, which captures approximately 31% of the variance, essentially
represents tasks related to working with tools and nail guns. This is
reasonable, because SIC 1521 includes general contractors, single-
family housing projects in which working with nail guns and power
tools are very common activities. Working with heavy equipment
has the highest loading on the second PC. For the last three PCs,
attributes related to struck-by objects, such as falling objects, struc-
ture collapse, lifting heavy materials, and transporting heavy ma-
terials horizontally, were the most influential attributes. The same
procedure was conducted on PCs for the remaining SICs. The se-
lected number of PCs and variance captured for different categories
of SIC is shown in Table 6.
Once the PCs were selected, GLMs with a logit link function
were fit to the selected PCs to predict the probability of fatality.
A stepwise regression approach found the parameter set that mini-
mized the model AIC. The overall results of generalized linear
models for two-category response variables are shown in Table 7.
Fig. 2. Scree plot of variance captured by each PC for SIC 1521
Table 5. Principal Component Analysis Loadings for the First Five PCs of SIC 1521
Number Attributes PC1 PC2 PC3 PC4 PC5
1 Working in swing area of a boomed vehicle 0.123 0.201 0.119 ——
2 Workers on foot and moving equipment ———0.123
3 Site topography 0.151 0.179 0.129
4 Working with heavy equipment 0.247 0.668 0.192 0.145
5 Nail gun 0.493 0.271 0.138
6 Working with power tools/large tools 0.531 0.257 0.123
7 Equipment backup ————0.199
8 Falling objects 0.379 0.361 0.462 0.383 0.456
9 Structure collapse 0.310 0.457 0.471 0.000 0.473
10 Lifting heavy materials 0.322 0.202 0.466 0.732 0.112
11 Transporting heavy materials horizontally 0.216 0.285 0.421 0.447 0.660
Variance captured (%) 30.9 18.9 10.9 9.1 7.9
By adopting a stepwise variable selection method, the number of
PCs was reduced for most of the SIC categories. By looking at sig-
nificant PCs and their related attributes in each SIC group, more
insight can be obtained into the critical attributes that contrib-
ute to fatalities. For example, two variables [(1) PC1, and (2) PC2]
emphasize the importance of working with power tools (e.g., nail
gun) and working with heavy equipment in causing fatality in SIC
1521. The same procedure was performed for the three [(1) not
severe, (2) mild, and (3) severe] and five [(1) first aid, (2) medical
case, (3) lost work time, (4) permanent disablement, and (5) fatality]
categorical response variables. However, only the results for two-
category response variables (i.e., fatality/no fatality) are presented
because these models had superior performance in comparison
to the other models.
For the two-category injury outcomes (i.e., fatality/no fatality),
since the response to be modeled varies binomially, the logit link
function was used to transform responses, x, into the linear predic-
tor. By estimating parameters (β), link functions (ηi) can be calcu-
lated, and by back-transforming link functions with the inverse
logit, probabilities of fatalities can be obtained. For example, the
underlying formula of the model for SIC 1521 is

  ln[P(fatality) / (1 − P(fatality))] = β_0 + β_1 × PC1 + β_2 × PC2   (9)

By substituting the values β_0 = intercept = −0.908, β_1 = 0.899,
and β_2 = 1.030

  ln[P(fatality) / (1 − P(fatality))] = −0.908 + 0.899 × PC1 + 1.030 × PC2   (10)
Model Validation
To measure the predictive power of the models, the research re-
ported in this paper implemented the widely-used measure for
categorical data, i.e., rank probability skill score. As stated previ-
ously, RPSS is one of the strictest verification measures. The RPSS
can vary from minus infinity (no skill) to 1 (perfect skill), and the
expected value of RPSS is less than zero (Mason 2004), which
means that any value greater than zero indicates superior perfor-
mance of the model to the reference forecast. Unfortunately, there
is no established acceptable range for RPSS; however, these values
can be used to compare the skill performance among different mod-
els. For example, the RPSS values obtained as reported in this paper
can be used as a baseline to compare the performance of future
predictive models in the construction safety domain.
The results of the RPSS for different categories of SIC are
shown in Table 8. The RPSS values for all 27 models demonstrate
a strong model performance. The lowest RPSS values for the
two-category response variable models belong to SIC 1611 (0.047),
SIC 1623 (0.047), and SIC 1542 (0.088), which have a high rate of
fatalities, i.e., 76, 64, and 45%, respectively. Similar patterns can be
observed in the three-category and five-category response variables
models. The best RPSS for all three different types of predictive
models belongs to SIC 1522, which has 0.246 for two-category,
0.226 for three-category, and 0.222 for five-category response
variables. The range of RPSS values is comparable with other pre-
dictive models for the control of mixed-mode buildings (RPSS =
0.154–0.186; May-Ostendorp et al. 2011) and weather forecasting
(RPSS = 0.0–0.50; Clark et al. 2004). In general, the obtained re-
sults indicate strong predictive power for all 27 predictive models.
The RPSS values for different groups were compared to find any
significant difference between them.
Table 6. Number of PCs Selected and Total Variance Captured
SIC code:          1521   1522   1541   1542   1611   1622   1623   1629   Total
Number of PCs:        5      5      8      8     13      9      6      9      12
Total variance:    0.78   0.75   0.89   0.84   0.96   0.92   0.71   0.88    0.88
Note: SIC codes refer to the Standard Industrial Classification; the definition of each category is provided in Table 2.
Table 7. Overall Results of the Stepwise Generalized Linear Models for
Two-Category Response Variables
Number  SIC code (a)  Predictor (b)  Estimate (c)  Standard error (d)  t (e)  Significance (f)
1 1521 Intercept 0.908 0.201 4.53 0.000
PC1 0.899 0.310 2.91 0.004
PC2 1.030 0.311 3.31 0.001
2 1522 Intercept 0.881 0.295 2.98 0.003
PC1 1.294 0.498 2.60 0.009
PC2 0.808 0.470 1.72 0.086
PC4 1.004 0.712 1.41 0.159
3 1541 Intercept 0.247 0.228 1.09 0.278
PC2 0.611 0.419 1.46 0.145
PC7 1.931 0.834 2.31 0.021
4 1542 Intercept 0.316 0.158 2.00 0.046
PC1 0.422 0.263 1.61 0.108
PC2 0.408 0.271 1.51 0.132
PC5 0.818 0.412 1.99 0.047
PC6 1.057 0.493 2.14 0.032
5 1611 Intercept 1.189 0.114 10.42 0.000
PC1 0.442 0.201 2.20 0.028
PC2 0.741 0.267 2.78 0.006
PC4 0.670 0.301 2.23 0.026
6 1622 Intercept 0.437 0.221 1.98 0.048
PC8 3.243 1.029 3.15 0.002
PC9 2.155 1.091 1.98 0.048
7 1623 Intercept 0.599 0.141 4.24 0.000
PC2 0.744 0.298 2.50 0.013
8 1629 Intercept 1.339 0.294 4.55 0.000
PC1 1.880 0.506 3.71 0.000
PC2 0.720 0.506 1.42 0.155
PC3 1.604 0.695 2.31 0.021
PC6 1.805 1.065 1.70 0.090
PC7 2.084 1.145 1.82 0.069
PC8 1.329 0.679 1.96 0.050
PC9 3.026 1.511 2.00 0.045
9 All data (g)
Intercept 0.420 0.057 7.32 0.000
PC1 1.042 0.106 9.80 0.000
PC2 0.536 0.105 5.11 0.000
PC3 0.319 0.142 2.25 0.024
PC6 0.623 0.179 3.48 0.001
PC7 0.738 0.194 3.81 0.000
PC8 0.581 0.202 2.88 0.004
Note: In reference to fatality/not fatality.
(a) SIC = Standard Industrial Classification. The definition of each category is
provided in Table 2.
(b) Independent variables or predictors that were selected to be included in the
predictive model.
(c) Regression coefficient for each variable.
(d) Standard error (σ) of the variable's regression coefficient, which measures the
dispersion of the regression coefficient over the sampling distribution.
(e) Value of the t-statistic, calculated by dividing the value of the coefficient
by its standard error and compared to the theoretical t-distribution.
(f) Significance of the t-statistic (P-value), which is the estimated probability of
obtaining the sample results when the true regression coefficient is zero.
(g) All data includes data points from the eight SIC categories.
Repeated-Measures Design
The repeated-measures ANOVA was selected to compare RPSS
values obtained from different categories of response variables
because the same set of response variables was used under different
conditions. This test has three important univariate statistical as-
sumptions (Vogt 1999), as follows: (1) randomness of the sample,
(2) homogeneity of variance of differences between treatment lev-
els, and (3) normal distribution of the differences between treatment
levels. Assumption 1 implies that the sample should be randomly
selected from the population. In this case, all accident reports in
each SIC category were used to develop the predictive models. Although there would
be some underreporting of accidents (e.g., near-misses and mild
injuries), the data can be considered as a representative sample of
struck-by accidents across the diverse geographical and cultural
regions of the United States.
Assumption 2, which is usually referred to as sphericity, re-
quires the equality of the variances of the differences between treat-
ment levels. Assumption 2, which is similar to homogeneity of
variance in between-group ANOVA, is a more general condition
than compound symmetry. Mauchly's test for sphericity is the most
common way to test the hypothesis that the variances of the differ-
ences between conditions are equal. If data violate the sphericity
assumption, the literature suggests several corrections to modify
the F-ratio, such as Greenhouse–Geisser (Greenhouse and Geisser
1959), lower-bound, and Huynh–Feldt (Huynh and Feldt 1976).
SPSS 17 was used to conduct the analysis (Field 2013). Mauchly's
test indicated that the assumption of sphericity has been met (χ² =
4.250, P-value = 0.119 > 0.05).
Assumption 3 implies that the differences between treatment
levels should be normally distributed. Kolmogorov–Smirnov and
Shapiro–Wilk tests were conducted to test normality, and the results
appear in Table 9. Some of these data are not normally distributed
(P-value < 0.05); therefore, the last assumption is not satisfied.
Because the assumptions of repeated-measures ANOVA are
not satisfied, the research reported in this paper used an alternative
nonparametric test called the Friedman two-way ANOVA by ranks
(Friedman 1937). The Friedman two-way ANOVA by
ranks tests the null hypothesis that the k repeated measures or
matched groups come from the same population or populations
with the same median (Siegel and Castellan 1988). If the result of
the Friedman test is significant, at least one of the groups of the
samples is different from the other samples. The results of the
test indicate that there is a significant difference (χ² = 9.750,
P-value = 0.005 < 0.05) between the RPSS values of the two-,
three-, and five-category response variables.
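These tests can be traced from the Table 8 values; the Python sketch below uses the RPSS of the eight SIC groups (treating the "All data" column as excluded, which is an assumption about how the reported statistics were formed) and reproduces the Friedman statistic of 9.75 quoted above.

```python
from scipy.stats import friedmanchisquare, shapiro, wilcoxon

# RPSS values for the eight SIC groups, taken from Table 8.
two   = [0.149, 0.246, 0.139, 0.088, 0.047, 0.204, 0.047, 0.206]
three = [0.120, 0.226, 0.073, 0.074, 0.034, 0.040, 0.036, 0.158]
five  = [0.189, 0.222, 0.103, 0.086, 0.029, 0.050, 0.037, 0.152]

# Normality of the paired differences (cf. Table 9), then the Friedman test.
print(shapiro([t - h for t, h in zip(two, three)]))
print(friedmanchisquare(two, three, five))    # statistic = 9.75, as reported

# Post hoc pairwise comparison (cf. Table 10), judged against alpha = 0.05/3.
print(wilcoxon(two, three))
```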
Post-Hoc Tests
When the result of the Friedman test is significant, it indicates that
at least one of the categories differs from at least one other category;
however, it does not show how many of the groups or which one
of the groups is different. To supplement this finding, the research
reported in this paper used two post hoc tests. The Wilcoxon signed
ranks test evaluated the relative magnitude as well as direction of
differences between the assorted groups (Table 10). Applying a
Bonferroni correction, all effects were compared with a 0.0167
level of significance. It could thus be concluded that only the differ-
ence between the RPSS values for the two-category and three-
category procedures was significant (P-value = 0.004 < 0.0167).
The second way to conduct multiple comparisons between
groups is to determine the differences |R_u − R_v| for all pairs of
conditions or groups. Then, the significance of individual pairs of
differences can be tested as per Siegel and Castellan (1988)

  |R_u − R_v| ≥ z_(α/[k(k−1)]) × √[N k (k + 1) / 6]   (11)
where R_u = sum of ranks for category u; R_v = sum of ranks for
category v; N = number of cases; k = number of categories;
Table 8. Rank Probability Skill Score Values for Two-Category, Three-Category, and Five-Category Response Variables
Response variables   1521   1522   1541   1542   1611   1622   1623   1629   All data   Mean
Two-category         0.149  0.246  0.139  0.088  0.047  0.204  0.047  0.206  0.116      0.138
Three-category       0.120  0.226  0.073  0.074  0.034  0.040  0.036  0.158  0.085      0.094
Five-category        0.189  0.222  0.103  0.086  0.029  0.050  0.037  0.152  0.083      0.106
Note: Columns 1521 through 1629 are Standard Industrial Classification codes.
Table 9. Test of Normality for Differences between Treatment Levels
Differences between treatment levels   Kolmogorov–Smirnov (statistic, significance)   Shapiro–Wilk (statistic, significance)
Two-category and three-category        0.251, 0.146                                   0.720, 0.004
Two-category and five-category         0.279, 0.067                                   0.750, 0.008
Three-category and five-category       0.339, 0.007                                   0.710, 0.003
Table 10. Results of Wilcoxon Signed Ranks Test
Summary statistics                      Two-category and three-category   Two-category and five-category   Three-category and five-category
Z (a)                                   2.521                             1.680                            1.260
Sum of negative ranks                   36.000                            30.000                           9.000
Sum of positive ranks                   0.000                             6.000                            27.000
Effect size (b)                         0.630                             0.420                            0.315
Asymptotic significance, two-tailed     0.012                             0.093                            0.208
Exact significance, one-tailed          0.004                             0.055                            0.125
(a) Based on positive ranks.
(b) According to Rosenthal (1991), effect size can be calculated as r = Z/√N, in which N = 16.
and z_(α/[k(k−1)]) is the abscissa value of the unit normal distribution
above which lies a proportion α/[k(k−1)] of the distribution. For α =
0.05 and k = 3, z will be equal to 2.394. As a result, the right-hand
side of the inequality will be calculated as 9.576. By looking at
Table 10, if the difference between the sums of ranks is bigger than
or equal to the critical difference, then that difference is significant.
Performing the calculations, it is found that only the difference be-
tween the first and second types of response variables (12) exceeds
the critical difference (9.576), which means that the difference
between these categories is significant.
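The critical difference quoted above can be checked numerically; the sketch assumes N = 8 matched groups (the eight SIC categories), which reproduces the reported value of 9.576, and the rank sums shown are derived from the Table 8 RPSS values for illustration.

```python
from math import sqrt
from scipy.stats import norm

alpha, k, N = 0.05, 3, 8                       # N = 8 SIC groups (assumed)
z = norm.ppf(1 - alpha / (k * (k - 1)))        # abscissa of the unit normal, ~2.394
critical_diff = z * sqrt(N * k * (k + 1) / 6)  # ~9.576, Eq. (11) right-hand side

# Rank sums of the two-, three-, and five-category models over the 8 SIC groups
# (computed from the Table 8 RPSS values, shown here for illustration).
R_two, R_three, R_five = 23, 11, 14
print(round(critical_diff, 3), abs(R_two - R_three) >= critical_diff)   # 12 vs 9.576
```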
In general, the RPSS values for the two-category response
variable models are, on average, better than those of the three-category
or five-category response variable models. This result was expected
because fatalities were the dominant response variable in most
of the SIC groups. In addition, dividing the nonfatal responses in the
three-category or five-category response variables would give a
higher weight to fatality and decrease the predictive power of the
models.
A diagnostic test was also conducted, and the residual plots
(actual Y less predicted Y, plotted against predicted Y) show
a random distribution. This confirms that the assumption about
normality is valid. There was no need to analyze multicollinearity
among variables because the PCs are orthogonal and PCA removes
any multicollinearity.
Practical Implications
While the mathematics behind the models is complicated, the find-
ings can be easily used in practice. For example, to calculate the
probability of a fatality for an activity in SIC 1521, the following
steps should be followed:
1. The list of struck-by attributes should be reviewed by a practi-
tioner to decide which attributes workers would be exposed to
during the activity. For example, assume that there are three
main attributes, as follows: (1) nail gun, (2) falling objects,
and (3) material storage. The matrix of observation can be
constructed by assigning a one to attributes that exist and a
zero to attributes that do not exist. The matrix would appear
like this

  X = [0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0]_(1×22)   (12)
2. The PCs that would be entered into the predictive model
should be calculated. As mentioned previously, PCs can be
calculated from Eq. (2)

  Z(PCs)_(1×22) = X(observations)_(1×22) × W(loadings)_(22×22)   (13)

Having already completed Step 1, the observation matrix is
ready, and the loading matrix has been calculated as shown in
Table 5. Therefore, the PCs can be calculated as

  Z(PCs) = [PC1 = −0.098; PC2 = −0.425; PC3 = 0.690; PC4 = 0.386; PC5 = 0.312; ...]   (14)
3. The resulting PC1 and PC2 can then be inserted into the pre-
dictive model for SIC 1521. The probability of fatality can be
calculated as
  ln[P(fatality) / (1 − P(fatality))] = −0.908 + 0.899 × (−0.098) + 1.030 × (−0.425)
                                      = −1.434
  which yields P(fatality) = 0.192   (15)
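The three steps above can also be scripted; the Python sketch below follows Eqs. (12)-(15), taking the PC scores directly from Eq. (14) because the full 22 × 22 loading matrix W is not reproduced in the paper.

```python
import numpy as np

# Step 1: observation vector for attributes 8 (nail gun), 14 (falling objects),
# and 16 (material storage), Eq. (12).
x = np.zeros(22)
x[[7, 13, 15]] = 1

# Step 2: in practice z = x @ W, with W from the SIC 1521 PCA (Table 5);
# here the leading scores are taken from Eq. (14).
pc1, pc2 = -0.098, -0.425

# Step 3: SIC 1521 model, Eq. (10), back-transformed with the inverse logit.
eta = -0.908 + 0.899 * pc1 + 1.030 * pc2
p_fatality = 1.0 / (1.0 + np.exp(-eta))
print(round(float(p_fatality), 3))   # ~0.192, matching Eq. (15)
```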
There are several practical implications for these results. For
example, empowered with this risk-assessment tool, a designer can
see the effect of different design elements on safety and alter the
design to provide a safer construction environment. If the hazards
cannot be prevented during the design, more attention should be
paid to mitigate them during the construction phase. In addition,
a project manager can compare alternative means and methods to
see which ones provide fewer hazards for the workers. Further-
more, a supervisor can identify hazardous activities or situations to
highlight them during job hazard analysis or toolbox meetings.
Limitations
Although the impact of these results on the current preconstruction
safety management is significant, there are several limitations re-
lated to this paper. First, splitting the data and conducting cross
validation is a robust method to check the validity of the model;
however, more studies should be conducted to test the validity of
the model in predicting hazards in real projects. Second, the exter-
nal validity of the research reported in this paper is limited because
the IMIS database includes only severe accidents that are required
to be reported by OSHA regulations. Therefore, the models are
applicable for accidents with serious outcomes. The predictive
models would likely change if the distribution of the input data
were to change. However, this limitation is intrinsic to the OSHA
IMIS database that was available. Future research should be con-
ducted to investigate predictive models for minor injuries and even
near-misses. Third, the safety risk can be mitigated by implement-
ing different practices. Further study should be conducted to evalu-
ate the effect of such injury-prevention practice implementation on
changing the injury outcome. In addition, the predictive models
developed here are based on classification of whether an attribute
contributed to an accident or not; however, in reality, the level of
contribution of each attribute can vary as a continuous variable be-
tween 0 and 1. By considering the impact of injury prevention prac-
tices on reducing the risk of an attribute, researchers can enter attributes
as continuous variables into predictive models. Fourth, this paper
focused on the safety attributes that can be identified during the
preconstruction phase. These attributes mainly address physically
unsafe conditions in a project, but accidents occur due to the interaction
between unsafe conditions and unsafe behavior. Another study
should be conducted to predict the injury outcomes while consid-
ering both attributes related to unsafe behavior and unsafe physical
conditions. Fifth, these models predict the severity of the outcome
if an incident occurs, but they do not predict the probability of an
event, which is a very small number. Sixth, this model considers
only elements that contribute to the probability of an injury but
does not consider any risk mitigating counterparts. A unified model
is needed. Seventh, while GLM is a strong statistical technique to
develop predictive models, future studies should be conducted to
compare the performance of predictive models developed using GLM
with other nonlinear modeling techniques such as neural networks
and support vector machines. Finally, the practical implications
of the developed models should be tested in real construction projects.
Developing decision support systems to facilitate and automate
adoption of attribute-based safety risk management can be
rewarding.
Despite these limitations, the proposed predictive models offer a practical and straightforward way for designers, jobsite engineers, and safety managers who are not versed in extensive mathematical calculations to reliably predict the level of hazard in a project.
Conclusions
Identifying the level of risk that a project entails before construction begins is an essential step toward implementing proactive safety management and achieving excellent safety performance. Unfortunately, the current models that predict the safety outcomes of projects are not based on objective data, ignore unsafe physical conditions, and cannot be used in the preconstruction phase of a project. To address these limitations, the research reported in this paper used an attribute-based risk analysis approach to develop predictive models that forecast injury outcomes and that may be implemented during the early stages of projects. To achieve this objective, the research built upon a previously established content analysis of 1,771 struck-by accident reports from the OSHA IMIS database (Esmaeili 2012), which identified 22 fundamental attributes (Table 3) that cause struck-by accidents and recorded the severity of the injuries resulting from them. In the research reported in this paper, the safety attributes (e.g., flagger on the jobsite) were the predictor variables and the severity of injury (e.g., medical case) was the response variable. The safety attributes served as inputs to the predictive models, acting as leading indicators of construction safety and forming the basis for injury-prevention activities.
To reduce the dimension of the data and to remove any possible multicollinearity among the variables, the matrix of observations was subjected to principal component analysis. The influential PCs were then entered into GLMs, and three series of mathematical models were developed (for nine different groups of data), as follows: (1) models that predict the probability of fatality; (2) models that predict the probabilities of nonsevere, mild, and severe injuries; and (3) models that predict the probabilities of first aid, medical case, lost work time, permanent disablement, and fatality. To evaluate the predictive power of the models, the RPSS of each model was calculated and the performance of the different models was compared. The RPSS results indicated that the developed models perform better than chance and are valid.
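For readers who wish to reproduce the general workflow, the following minimal R sketch illustrates the three-step pipeline summarized above: PCA on binary attribute indicators, a logit-link GLM fit to the leading PC scores, and an RPSS comparison against a climatological (base-rate) reference. The data are simulated, and the sample size, number of retained components, and variable names are illustrative assumptions rather than the values used in this paper; only the binary fatality model is shown, for which the ranked probability score reduces to the Brier score.

## A minimal sketch, not the authors' original script: simulate binary
## attribute data, reduce dimension with PCA, fit a logit-link GLM on the
## leading PCs, and score the forecasts with an RPSS.

set.seed(1)

## Hypothetical data: rows are accident reports, columns are 0/1 indicators
## of the 22 fundamental attributes (1 = attribute contributed to the case).
n <- 500
X <- matrix(rbinom(n * 22, 1, 0.3), nrow = n,
            dimnames = list(NULL, paste0("attr", 1:22)))
y <- rbinom(n, 1, 0.2)               # 1 = fatality, 0 = nonfatal (illustrative)

## Step 1: principal component analysis (orthogonal, removes collinearity).
pca    <- prcomp(X, scale. = TRUE)
scores <- pca$x[, 1:5]               # number of retained PCs is illustrative

## Step 2: generalized linear model with a logit link on the PC scores.
dat   <- data.frame(y = y, scores)
fit   <- glm(y ~ ., data = dat, family = binomial(link = "logit"))
p_hat <- predict(fit, type = "response")   # forecast probability of fatality

## Step 3: rank probability skill score relative to climatology. With two
## categories, the ranked probability score reduces to the Brier score.
rps    <- function(p, o) mean((p - o)^2)
p_clim <- mean(y)                          # climatological forecast: base rate
rpss   <- 1 - rps(p_hat, y) / rps(rep(p_clim, n), y)
print(rpss)                                # > 0 indicates skill better than chance

For the three- and five-category models, the same skill score would be computed from cumulative category probabilities (Wilks 1995), and, as in the paper, split-sample cross validation rather than in-sample scoring should be used to assess predictive skill.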
The research reported in this paper resulted in several reliable and valid predictive models that can be used by practitioners, managers, supervisors, and researchers to accurately forecast the probability of different types of injuries. Safety predictions are the cornerstone of an effective proactive safety management program. Just as meteorologists build moderately accurate weather forecasts by monitoring information (e.g., temperature and wind speed) from various sources, safety managers can use the fundamental safety risk attributes to predict the probabilities of different injury outcomes. In meteorological terms, the models tell the project manager whether a storm (safety issue) is approaching the project.
Before the research reported in this paper, project managers had little in the way of a systematic method for predicting the possible injury outcomes of a project. With a dataset of sufficient size and quality, it is now possible to apply statistical techniques and create reliable mathematical models. This paper augments recent studies that have recognized the importance of identifying hazards created by unsafe physical conditions in the early stages of a project and lays the foundation for future work on preconstruction safety management and hazard mitigation. It is expected that these predictive models could change the way potential injuries are considered during the planning, project financing, and safety control stages, and will empower managers pursuing the ever-desirable goal of a zero-injury project.
Acknowledgments
The National Science Foundation is thanked for supporting the
research reported in this paper through an Early Career Award
(i.e., the CAREER Program). This paper is based upon work
supported by the National Science Foundation under Grant No.
1253179. Any opinions, findings, and conclusions or recommen-
dations expressed in this material are those of the writers and do not
necessarily reflect the views of the National Science Foundation.
Bentley Systems is also recognized for its financial support of the research reported in this paper, and Mr. Dean Bowman in particular provided invaluable insight into the application of this method.
References
Akintoye, A., and Skitmore, M. (1994). "Models of UK private sector quarterly construction demand." J. Constr. Manage. Econ., 12(1), 3–13.
Baradan, S., and Usmen, M. A. (2006). "Comparative injury and fatality risk analysis of building trades." J. Constr. Eng. Manage., 10.1061/(ASCE)0733-9364(2006)132:5(533), 533–539.
Barnett, V., and Lewis, T. (1994). Outliers in statistical data, 3rd Ed., Wiley, Chichester, U.K.
Chatfield, C., and Collins, A. J. (1980). Introduction to multivariate analysis, Chapman and Hall, New York.
Chen, J. R., and Yang, Y. T. (2004). "A predictive risk index for safety performance in process industries." J. Loss Prev. Process Ind., 17(3), 233–242.
Chua, D. K. H., and Goh, Y. M. (2005). "A Poisson model of construction incident occurrence." J. Constr. Eng. Manage., 10.1061/(ASCE)0733-9364(2005)131:6(715), 715–722.
Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B., and Wilby, R. (2004). "The Schaake shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields." J. Hydrometeorol., 5(1), 243–262.
Cooper, M. D., and Phillips, R. A. (2004). "Exploratory analysis of the safety climate and safety behavior relationship." J. Saf. Res., 35(5), 497–512.
Esmaeili, B. (2012). "Identifying and quantifying construction safety risks at the attribute level." Ph.D. thesis, Univ. of Colorado, Boulder, CO.
Esmaeili, B., and Hallowell, M. R. (2013). "Integration of safety risk data with highway construction schedules." J. Constr. Manage. Econ., 31(6), 528–541.
Fang, D. P., Chen, Y., and Louisa, W. (2006). "Safety climate in construction industry: A case study in Hong Kong." J. Constr. Eng. Manage., 10.1061/(ASCE)0733-9364(2006)132:6(573), 573–584.
Field, A. (2013). Discovering statistics using IBM SPSS statistics, 4th Ed., Sage, Los Angeles.
Friedman, M. (1937). "The use of ranks to avoid the assumption of normality implicit in the analysis of variance." J. Am. Stat. Assoc., 32(200), 675–701.
Gillen, M., Baltz, D., Gassel, M., Kirch, L., and Vaccaro, D. (2002). "Perceived safety climate, job demands, and coworker support among union and nonunion injured construction workers." J. Saf. Res., 33(1), 33–51.
Glendon, A. I., and Litherland, D. K. (2001). "Safety climate factors, group differences and safety behavior in road construction." J. Saf. Sci., 39(3), 157–188.
Goh, B. H. (1999). "An evaluation of the accuracy of the multiple regression approach in forecasting sectoral construction demand in Singapore." J. Constr. Manage. Econ., 17(2), 231–241.
Goh, B. H., and Teo, H. P. (2000). "Forecasting construction industry demand, price and productivity in Singapore: The Box–Jenkins approach." J. Constr. Manage. Econ., 18(5), 607–618.
Goh, Y. M., and Chua, D. K. H. (2013). "Neural network analysis of construction safety management systems: A case study in Singapore." J. Constr. Manage. Econ., 31(5), 460–470.
Greenhouse, S. W., and Geisser, S. (1959). "On methods in the analysis of profile data." Psychometrika, 24(2), 95–112.
Hallowell, M. R., Esmaeili, B., and Chinowsky, P. (2011). "Safety risk interactions among highway construction work tasks." J. Constr. Manage. Econ., 29(4), 417–429.
Hallowell, M. R., and Gambatese, J. A. (2009). "Activity-based safety and health risk quantification for formwork construction." J. Constr. Eng. Manage., 10.1061/(ASCE)CO.1943-7862.0000071, 990–998.
Hastie, T., Tibshirani, R., and Friedman, J. (2002). The elements of statistical learning: Data mining, inference, and prediction, 2nd Ed., Springer, Amsterdam, Netherlands.
Hinze, J., Hallowell, M., and Baud, K. (2013). "Construction-safety best practices and relationships to safety performance." J. Constr. Eng. Manage., 10.1061/(ASCE)CO.1943-7862.0000751, 04013006.
Hinze, J., Huang, X., and Terry, L. (2005). "The nature of struck-by accidents." J. Constr. Eng. Manage., 10.1061/(ASCE)0733-9364(2005)131:2(262), 262–268.
Hotelling, H. (1933). "Analysis of a complex of statistical variables into principal components." J. Educ. Psychol., 24(7), 498–520.
Huynh, H., and Feldt, L. S. (1976). "Estimation of the Box correction for degrees of freedom from sample data in randomized block and split plot designs." J. Educ. Stat., 1(1), 69–82.
Johnson, S. E. (2007). "The predictive validity of safety climate." J. Saf. Res., 38(5), 511–521.
Jolliffe, I. T. (2002). Principal component analysis, 2nd Ed., Springer, Berlin.
Kaiser, H. F. (1960). "The application of electronic computers to factor analysis." Educ. Psychol. Meas., 20(1), 141–151.
Lee, S., and Halpin, D. W. (2003). "Predictive tool for estimating accident risk." J. Constr. Eng. Manage., 10.1061/(ASCE)0733-9364(2003)129:4(431), 431–436.
Mason, S. J. (2004). "On using climatology as a reference strategy in the Brier and ranked probability skill scores." Mon. Weather Rev., 132(7), 1891–1895.
May-Ostendorp, P., Henze, G., Corbin, C., Rajagopalan, B., and Felsmann, C. (2011). "Model predictive control of mixed-mode buildings with rule extraction." Build. Environ., 46(2), 428–437.
McCullagh, P., and Nelder, J. A. (1989). Generalized linear models, Chapman and Hall, London.
OSHA (Occupational Safety and Health Administration). (2013). "Statistics and data, standard industry classification (SIC) system search." ⟨http://goo.gl/iC2dV⟩ (Feb. 8, 2015).
Palau, C. V., Arregui, F. J., and Carlos, M. (2012). "Burst detection in water networks using principal component analysis." J. Water Resour. Plann. Manage., 10.1061/(ASCE)WR.1943-5452.0000147, 47–54.
Park, H. K., Han, S. H., and Russell, J. S. (2005). "Cash flow forecasting model for general contractors using moving weights of cost categories." J. Manage. Eng., 10.1061/(ASCE)0742-597X(2005)21:4(164), 164–172.
Pearson, K. (1901). "On lines and planes of closest fit to systems of points in space." Philos. Mag., 2(6), 559–572.
R Development Core Team. (2011). "R: A language and environment for statistical computing." Rep. Prepared for the R Foundation for Statistical Computing, Vienna, Austria.
Regonda, S., Rajagopalan, B., and Clark, M. (2006). "A new method to produce categorical streamflow forecasts." Water Resour. Res., 42(9), W09501.
Rosenthal, R. (1991). Meta-analytic procedures for social research, Sage, Newbury Park, CA.
Rozenfeld, O., Sacks, R., Rosenfeld, Y., and Baum, H. (2010). "Construction job safety analysis." J. Saf. Sci., 48(4), 491–498.
Saitta, S., Kripakaran, P., Raphael, B., and Smith, I. F. C. (2008). "Improving system identification using clustering." J. Comput. Civ. Eng., 10.1061/(ASCE)0887-3801(2008)22:5(292), 292–302.
Salas, J. D., Fu, C., and Rajagopalan, B. (2011). "Long-range forecasting of Colorado streamflows based on hydrologic, atmospheric, and oceanic data." J. Hydrol. Eng., 10.1061/(ASCE)HE.1943-5584.0000343, 508–520.
Siegel, S., and Castellan, N. J., Jr. (1988). Nonparametric statistics: For the behavioral sciences, McGraw-Hill, New York.
SPSS 17 [Computer software]. Cary, NC, SAS Institute.
Stats Package. (2013). "'prcomp' function." ⟨http://goo.gl/3PqhvM⟩ (Feb. 8, 2015).
Tam, C. M., and Fung, I. W. H. (1998). "Effectiveness of safety management strategies on safety performance in Hong Kong." J. Constr. Manage. Econ., 16(1), 49–55.
Tang, J. C. S., Karasudhi, P., and Tachopiyagoon, P. (1990). "Thai construction industry: Demand and projection." J. Constr. Manage. Econ., 8(3), 249–257.
Velicer, W. F., and Jackson, D. N. (1990). "Component analysis versus common factor-analysis: Some further observations." Multivariate Behav. Res., 25(1), 97–114.
Vogt, W. P. (1999). Dictionary of statistics and methodology: A non-technical guide for the social sciences, 2nd Ed., Sage, Thousand Oaks, CA.
von Storch, H., and Zwiers, F. W. (1999). Statistical analysis in climate research, Cambridge University Press, Cambridge, U.K.
Wilks, D. S. (1995). Statistical methods in the atmospheric sciences, Academic, New York.
Wold, S., Esbensen, K., and Geladi, P. (1987). "Principal component analysis." Chemom. Intell. Lab. Syst., 2(1–3), 37–52.
Yee, T. W., and Wild, C. J. (1996). "Vector generalized additive models." J. Roy. Stat. Soc. B, 58, 481–493.
Zohar, D. (1980). "Safety climate in industrial organizations: Theoretical and applied implications." J. Appl. Psychol., 12, 78–85.