Data Paper
Earthquake Spectra
1–21
© The Author(s) 2022
Article reuse guidelines:
sagepub.com/journals-permissions
DOI: 10.1177/87552930211061167
journals.sagepub.com/home/eqs
A relational database to support post-earthquake building damage and recovery assessment
Morolake Omoya, M.EERI¹, Itohan Ero¹, Mohsen Zaker Esteghamati, M.EERI², Henry V Burton, M.EERI¹,
Scott Brandenberg¹, Han Sun¹, Zhengxiang Yi¹, Hua Kang¹, and Chukuebuka C Nweke¹
Abstract
Systematically collected and curated data sets from historical events provide a strong
basis for simulating the physical and functional effects of natural hazards on the built
environment. This article develops a relational database to support post-earthquake
damage and recovery modeling of building portfolios. The current version of the
database has been populated with information on the 3695 buildings affected by the
2014 South Napa, California, earthquake. The associated data categories include gen-
eral building characteristics, site properties and shaking intensities, building damage
and repair permitting (timing and type) information, and census-block-level sociode-
mographics. The Napa data set can be used to validate post-earthquake recovery
simulation methodologies and explore the effectiveness of different modeling tech-
niques in predicting damage. The database can be expanded to include other earth-
quakes and the overall framework can be adapted to other types of natural hazards
(e.g. hurricanes, flooding).
Keywords
Relational database, post-earthquake damage and recovery assessment, 2014 South
Napa earthquake
Date received: 1 February 2021; accepted: 28 October 2021
¹Department of Civil & Environmental Engineering, The University of California, Los Angeles, Los Angeles, CA, USA
²Department of Civil and Environmental Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
Corresponding author:
Morolake Omoya, Department of Civil & Environmental Engineering, The University of California, Los Angeles, Los Angeles, CA 90095, USA.
Email: morolake@ucla.edu
Introduction
Post-earthquake damage and recovery assessment of building portfolios is essential to seis-
mic risk mitigation and resilience planning. Developing prediction models to estimate the
damage to buildings is a necessary step in quantifying the socioeconomic impacts of an
earthquake. For example, the probable state of damage to a building conditioned on the
ground shaking intensity at its site is a key input into regional seismic loss estimation mod-
els. Damage assessments are also useful for quantifying the societal cost of earthquakes
such as the loss of housing and essential services (e.g. education, healthcare, emergency
operations) and the need for temporary building facilities (e.g. temporary shelter). The
current focus on seismic resilience has also underscored the importance of post-earthquake
recovery models, which are useful for quantifying the initial (immediately following an
event) and cumulative (during the period following an event) loss of functionality in build-
ing portfolios.
Like many other types of engineering models, the ones used to assess post-earthquake
damage and recovery rely on empirical data from buildings that have been subjected to
historical seismic events. The HAZUS methodology (Federal Emergency Management
Agency (FEMA), 2003a), which represents the current state of the art in regional earth-
quake impact assessment, utilizes a hybrid or semi-empirical model for estimating building
damage. More specifically, while the relationship between ground shaking and building
response is based on simplified engineering models (i.e. the advanced engineering building
module), the damage conditioned on the response (i.e. via fragility functions) is largely
informed by empirical data. The second-generation performance-based earthquake engi-
neering (PBEE) framework (FEMA, 2012), which is increasingly being used for regional
seismic impact assessments, utilizes more sophisticated (relative to HAZUS) mechanics-
based simulations to quantify structural response. However, similar to HAZUS, the fragi-
lity functions that link the engineering demand parameters from response history analysis
to component damage are informed by empirical data. In addition, researchers have begun
to explore the use of machine learning models to predict earthquake-induced building
damage (e.g. Mangalathu and Burton, 2019; Mangalathu et al., 2020). Unlike the HAZUS
and PBEE methods, these models are exclusively reliant on empirical data.
Because of the complex dynamic interactions that they attempt to simulate, mathemati-
cal models of post-earthquake recovery (building-specific or portfolio) are heavily reliant
on empirical data. One type of input to such models consists of variables that capture the
duration of recovery-related activities (e.g., time to inspection, time to obtain financing and any
necessary permits, repair time), which are inescapably linked to empirical observations. As
a specific example from structural engineering practice, the resilience-based earthquake
design initiative (REDi) (Almufti and Willford, 2013) framework is used by structural engi-
neering practitioners to estimate building and portfolio-scale post-earthquake recovery.
The framework specifies probability distributions for the times associated with so-called
“impeding factors” (e.g., time to inspection, permit and finance acquisition, and engineering
and contractor mobilization) that are based on empirical data. In the research literature,
empirical data have been used to evaluate (Kang et al., 2018) and calibrate (Miles and
Chang, 2007) post-earthquake recovery models. Besides the duration-based variables,
sociodemographic data for the affected population have been used as inputs into post-
earthquake recovery models that seek to capture stakeholder decision-making (Burton
et al., 2019) and their influence on building recovery trajectories (Kang et al., 2018).
This article presents a relational database (RDB) to support post-earthquake damage
and recovery modeling for building portfolios and includes site characteristics, damage,
recovery, and sociodemographic data for buildings affected by the 2014 South Napa
earthquake. The database is publicly available (Omoya et al., 2021) and is hosted on the
DesignSafe cyberinfrastructure (Rathje et al., 2017). The database is structured so that it
can be expanded in the future to include other events. The next section provides a high-level
summary of the information contained in the data set and the sources used to develop it.
A detailed description of the data is then presented, followed by a discussion of
the structure of the RDB, including the schema, a description of the tables, attributes, and
relationships. The tools that have been developed to facilitate querying the data are also
presented along with some illustrative examples. Specific examples of prior and possible
future applications and extensions of the data set are also discussed.
Summary and sources used to develop the database
The data set includes information that is relevant to building damage and recovery assess-
ments. The assembled data are from the 2014 South Napa earthquake. A high-level sum-
mary of the data set is presented in Table 1, which includes the relevant sources, categories,
and number of observations for each data type. The data categories for this event include
sociodemographic data, general information about each building and its respective site and
shaking intensity, and damage and recovery data that are specific to that event. Prior
empirical research has demonstrated the influence of sociodemographics on the pace and
effectiveness of disaster recovery (e.g., Elliott, 2015; Kang et al., 2018; Zhang and Peacock,
2009). All of the demographic information in Table 1 was obtained from census data
(United States Census, 2020). The general building (e.g., number of stories, construction
year, building value) and site (location) information are relevant to both post-earthquake
damage and recovery assessments. The time-averaged shear wave velocity to a 30 m depth
(VS30) for each site, which is an important parameter in ground motion models (e.g., Boore
et al., 2014), was obtained indirectly based on correlations with simplified geologic units
(Wills et al., 2015). VS30 has also been used as a feature in machine learning–based building
damage prediction models (e.g., Mangalathu et al., 2020). The site class for each building
location is determined using the United States Geological Survey (USGS, 2020) applica-
tion program interface based on the VS30 and criteria suggested by FEMA 450-1 (FEMA,
2003b). A study by Boatwright et al. (2015) found that the extent of building damage dur-
ing the 2014 South Napa earthquake was strongly correlated with the underlying sedimen-
tary basin. For this reason, three basin depth parameters are provided at each site,
including the vertical distance from the ground surface to shear wave velocity isosurfaces
corresponding to 1.0 km/s (z1.0), 1.5 km/s (z1.5), and 2.5 km/s (z2.5) (Brocher et al., 2006).
The shaking intensity measures at each site, namely the spectral acceleration at 0.3 s (Sa0.3s)
and the peak ground acceleration (pga), are obtained through interpolation using the kriging
algorithm (Mangalathu et al., 2020) and 381 pairs of horizontal ground motion component
recordings from the 2014 earthquake (Center for Engineering Strong Motion Data
(CESMD), 2018). The details of the in-person assessments (i.e., inspection date, damage
descriptions, ATC-20 tag) performed by volunteer building professionals were obtained
from the Earthquake Engineering Research Institute (EERI, 2016) clearinghouse website.
These data can be used to develop and/or validate the accuracy of different types of dam-
age assessment models (e.g., Mangalathu and Burton, 2019; Mangalathu et al., 2020) or as
inputs (direct or indirect) into post-earthquake recovery models (e.g., Kang et al., 2018).
The permit information acquired from the Napa Building Department website (City of
Napa, 2020b) can be used to obtain timestamps for the start and completion of specific
recovery activities (i.e. permitting and repair). The associated activity durations can be
used to reconstruct “observed” recovery trajectories or to calibrate simulation models (e.g.
Kang et al., 2018).
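The site class assignment described above rests on the VS30 thresholds of FEMA 450-1. The database obtained its SiteClass values through the USGS application program interface; the Python function below is not that interface but a minimal local sketch, assuming the standard NEHRP VS30 ranges in m/s and an assumed convention for values that fall exactly on a class boundary. It could be used, for example, to spot-check the SiteClass column against the VS30 column.

```python
def nehrp_site_class(vs30: float) -> str:
    """Map VS30 (m/s) to a NEHRP/FEMA 450 site class (A to E).

    Assumed boundary convention: the lower bound of each range is exclusive,
    except for Class D, which includes 180 m/s. Site Class F requires a
    site-specific evaluation and cannot be assigned from VS30 alone.
    """
    if vs30 <= 0:
        raise ValueError("VS30 must be positive")
    if vs30 > 1500:
        return "A"  # hard rock
    if vs30 > 760:
        return "B"  # rock
    if vs30 > 360:
        return "C"  # very dense soil and soft rock
    if vs30 >= 180:
        return "D"  # stiff soil
    return "E"  # soft soil


# Lower bound, mean, and upper bound of VS30 reported for the Napa building sites
for v in (176.0, 319.0, 519.0):
    print(v, nehrp_site_class(v))
```

Applied to the reported VS30 range for the Napa sites (176 to 519 m/s), the sketch yields Classes E, D, and C, which matches the mix of site classes described for the inventory.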
Table 1. Summary of the assembled data set and sources for the 2014 South Napa earthquake
| Data category | Data type | Number of data points | Data source |
|---|---|---|---|
| Sociodemographics | Percentage of English-speaking households | 3695 | US Census Bureau |
| | Percentage of population aged 25 years and older with a high-school diploma | 3695 | |
| | Percentage of population that is Hispanic or Latino | 3695 | |
| | Percentage of population that is Black or African American | 3695 | |
| | Percentage of population that is Asian | 3695 | |
| | Percentage of households without children below 18 years of age | 3695 | |
| | Per capita income | 3695 | |
| | Percentage of owner-occupied housing units | 3695 | |
| General building information | Number of stories | 3559 | Napa County Tax Assessor |
| | Floor area | 3575 | |
| | Number of units | 2909 | |
| | Construction year | 2862 | |
| | Building value | 3507 | |
| | Number of occupants | - | |
| | Occupancy type | 3030 | |
| | Construction type (material) | - | |
| | Lateral force resisting system | - | |
| Site information | Latitude and longitude | 3695 | EERI Clearinghouse |
| | City | 3695 | |
| | Basin depth parameters | 3695 | Brocher et al. (2006) |
| | Time-averaged shear wave velocity to 30 m depth | 3695 | Wills et al. (2015) |
| | Site class | 3695 | FEMA (2003b) |
| 2014 South Napa earthquake shaking intensity information | Joyner–Boore distance | 3695 | Mangalathu et al. (2020) |
| | Spectral acceleration at a period of 0.3 s | 3695 | |
| | Peak ground acceleration | 3695 | |
| 2014 South Napa earthquake building damage information | Observed damage description | 3441 | EERI Clearinghouse |
| | Inspection date | 3695 | |
| | ATC-20 tag | 3695 | |
| 2014 South Napa earthquake building permit information | Permit description | 1012 | Napa County Building Department |
| | Permit type | 912 | |
| | Date of application for permit | 3677 | |
| | Date of permit approval | 1062 | |
| | Date of completion for permit-related work | 736 | |
EERI: Earthquake Engineering Research Institute; FEMA: Federal Emergency Management Agency; ATC: Applied Technology Council. A dash (-) indicates that no data are currently available.
The database includes fields for the number of occupants, the construction type, and the
lateral force resisting system of each building. The data associated with these fields are
currently unavailable and are therefore not included in the present version of the database.
However, these three critical pieces of information are expected to become available and be
added to the database in the future.
Description of data
General information and site characteristics
Figure 1 presents a map of the Napa region showing the locations of the buildings included
in the data set, which also correspond to the ones inspected and tagged (per Applied
Technology Council (ATC)-20) (ATC, 1995) after the 2014 earthquake. The building markers are
color-coded by construction-year range. Approximately 35% of the buildings were constructed
before 1950, and roughly 15% are more than a century old. Only 4.2% have been constructed since
2000. The majority of buildings in the data set are single-family residences (67%), with the
remainder split equally between multifamily residences and commercial buildings. This, in part,
explains why 69% of the buildings in the data set have a floor area of less than 5000 ft².
According to the Napa County Tax Assessor's website (City of Napa, 2020a), the average property
values for single-family residential, multifamily residential, and commercial buildings are
$450,000, $1.1 million, and $3.6 million, respectively. The number of stories ranges from one to
four, with 96% of the buildings having one or two stories.
Figure 1. Map showing the locations of buildings in the Napa data set.
The VS30 values for the Napa building sites range from approximately 176 to 519 m/s, with an
average of 319 m/s. Most buildings (72%) are located on Site Class D (stiff soil), and nearly
all of the remainder are on Site Class C (dense soil/soft rock); only one building in the
inventory is located on Site Class E (soft soil). The average values for the three basin depth
parameters are z1.0 = 527 m, z1.5 = 1092 m, and z2.5 = 1303 m. These values suggest that Napa
Valley is underlain by shallower sediment deposits than adjacent regions such as the Central
Valley (Sacramento, Fresno) (Brocher et al., 2006; Chen and Lee, 2017). This range of site
conditions is expected to produce a complex seismic site response in which amplification and
de-amplification lead to significant spatial variability in shaking intensity, which also
depends on the magnitude and location of the earthquake (Boore et al., 2014; Seyhan and
Stewart, 2014).
Sociodemographics
While sociodemographic factors do not directly affect post-earthquake building recovery,
unequal processes that discriminate based on these factors can affect recovery outcomes.
Based on the availability of the relevant information and a review of prior studies on the
factors that have been shown to be correlated with the pace of disaster recovery (e.g.
Elliott, 2015; Kang et al., 2018; Zhang and Peacock, 2009), the following sociodemo-
graphic variables are included in the Napa data set:
• Percentage of households where English is spoken (HHEng).
• Percentage of the population aged 25 years and older that has at least a high-school diploma (PN.25+HS).
• Percentage of the total population that is Hispanic or Latino (PNHisp).
• Percentage of the total population that is Black or African American (PNBlack).
• Percentage of the total population that is Asian (PNAsian).
• Percentage of households without individuals below the age of 18 years (HH≠<18).
• Household earnings in the past 12 months (IncAnn).
• Percentage of housing units that are owner-occupied (HU%Own).
Histograms showing the distribution of the sociodemographic factors at the census
block level (Manson et al., 2017) are shown in Figure 2. Histograms are not included for
PNBlack and PNAsian because these two demographics are not well represented in Napa
County. More specifically, the population of Asian and African Americans in Napa
County is approximately 0.7% and 3%, respectively (United States Census, 2020).
Shaking intensity and building damage from the 2014 South Napa earthquake
The spatial distribution of the ATC-20 tags (red, yellow, and green) assigned to each
building and the epicenter of the 2014 earthquake are shown in Figure 3. While most of the
severe damage appears to be concentrated in the downtown area (as evidenced by the cluster of
red tags), buildings as far west as the Browns Valley District and as far east as the
Shurtleff neighborhood (as evidenced by the widespread presence of yellow tags) were also
affected. Only 5.4% of the inspected buildings received red tags, and the percentages of
green (46.8%) and yellow (47.8%) tags are approximately equal. The damage descriptions
documented during the field inspections indicate that most of the yellow- and red-tagged
buildings suffered some form of chimney damage. Although less prevalent, damage to the
superstructure and foundation walls (e.g. cripple walls) and to fireplaces was also observed,
and a few buildings suffered partial or total porch collapses.
Recall that the Sa0.3s and pga at each site are determined by applying the kriging algorithm
to the strong motion recordings from 381 sites. Figure 4 shows a histogram of the Sa0.3s
distribution, with each bin disaggregated by the percentage of buildings assigned red, yellow,
and green tags. The severity of damage generally increases with shaking intensity: as Sa0.3s
increases, red- and yellow-tagged buildings make up a growing share of each bin, while the
share of green-tagged buildings decreases.
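The kriging step can be illustrated with a minimal ordinary kriging sketch in pure Python. The exponential variogram, its sill and correlation range, the planar (x, y) station coordinates in km, and interpolation in natural-log units are all assumptions made for illustration; the variogram model and parameters actually used by Mangalathu et al. (2020) are not described here. The weights come from the standard ordinary kriging system, whose extra Lagrange-multiplier row forces them to sum to one.

```python
import math


def exponential_variogram(h, sill=1.0, corr_range=10.0):
    """Assumed exponential semivariogram with no nugget; h and corr_range in km."""
    return sill * (1.0 - math.exp(-h / corr_range))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(stations, values, site, **variogram_params):
    """Predict the value at `site` from (x, y) station coordinates and observed values."""
    n = len(stations)

    def gamma(p, q):
        return exponential_variogram(math.dist(p, q), **variogram_params)

    # Ordinary kriging system: [Gamma 1; 1^T 0] [w; mu] = [gamma_0; 1]
    a = [[gamma(stations[i], stations[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(s, site) for s in stations] + [1.0]
    weights = solve_linear(a, b)[:n]  # drop the Lagrange multiplier mu
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical station coordinates (km) and Sa0.3s values (g), for illustration only
stations = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]
sa = [0.60, 0.45, 0.50, 0.30]
# Interpolate in natural-log units (an assumption) and transform back to g
ln_sa = [math.log(v) for v in sa]
sa_site = math.exp(ordinary_kriging(stations, ln_sa, (2.0, 2.0)))
print(round(sa_site, 3))
```

Ordinary kriging is an exact interpolator: evaluated at a recording station, it returns that station's value, and away from the stations it blends the recordings according to the assumed spatial correlation.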
Permitting and recovery information
Figure 5 shows a map of the buildings whose permit and repair times are included in the Napa
data set, with the markers color-coded to reflect the relative time values associated with
each variable. The permit time is computed as the number of days from the date that the
building was inspected (also included in the data set) to the date that the permit was
approved. The repair time is taken as the number of days between the permit approval date and
the repair approval date, as documented by the building department. The buildings that have
been permitted (880) outnumber those with “officially” completed repairs (i.e. certified by the
building department) (672). This is because it is fairly common for owners to perform the
repairs outlined in the permit without pursuing the final sign-off from the city. A much less
common scenario is when the necessary permits are obtained but the repairs are never performed
or are done in a manner that is inconsistent with the permit.
Figure 2. Sociodemographic distribution: (a) HHEng, (b) PN.25+HS, (c) PNHisp, (d) HH≠<18, (e) IncAnn, and (f) HU%Own.
Figure 3. Map showing the distribution of ATC-20 tags for buildings in the Napa data set.
Figure 4. Histogram showing the relationship between the ATC-20 tags and shaking intensity (Sa0.3s) for buildings in the Napa data set.
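The permit and repair times defined above are day counts between dated events. The sketch below uses Python's standard datetime module and hypothetical dates to show how they can be computed, leaving the repair time undefined for a building without a final city sign-off (the situation behind the gap between 880 permitted and 672 officially completed buildings). Which PermitInfo date column corresponds to the permit approval date is not specified in this sketch.

```python
from datetime import date
from typing import Optional


def days_between(start: Optional[date], end: Optional[date]) -> Optional[int]:
    """Whole days from start to end, or None if either date is missing."""
    if start is None or end is None:
        return None
    return (end - start).days


# Hypothetical dates for one building (not taken from the data set)
inspection_date = date(2014, 8, 26)
permit_approval_date = date(2014, 10, 15)
repair_approval_date = None  # no final sign-off recorded by the building department

# Permit time: inspection date -> permit approval date
permit_time = days_between(inspection_date, permit_approval_date)
# Repair time: permit approval date -> repair approval date (undefined without sign-off)
repair_time = days_between(permit_approval_date, repair_approval_date)
print(permit_time, repair_time)
```

Returning None rather than a placeholder number keeps buildings without an official completion out of any repair-time statistics by default.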
In general, longer permit and repair times are correlated with more severe damage. Anecdotal
evidence for this observation is provided by the darker red colors (longer times) in the
downtown area in Figure 5a and b. Figure 6 shows a histogram of the recovery time, with the bins
disaggregated by the percentage of each ATC-20 tag. The recovery time is taken as the sum of
the inspection, permit, and repair times. Figure 6 provides further evidence of the positive
correlation between the severity of damage and the permit and repair times: as the recovery
time increases, the percentage of red- and yellow-tagged buildings in each bin increases, while
the opposite is true for green-tagged buildings. In addition, the mean recovery times for
green-, yellow-, and red-tagged buildings are 197, 212, and 308 days, respectively.
Figure 5. Map showing the distribution of (a) permit and (b) repair times for 880 and 672 buildings, respectively, in the Napa data set.
Figure 6. Histogram showing the relationship between the ATC-20 tags and recovery time for buildings in the Napa data set.
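Under the additional assumption that the inspection time is counted in days from the earthquake date (24 August 2014) to the inspection date, the recovery time and its mean by ATC-20 tag can be sketched as follows. The records are hypothetical and are not meant to reproduce the 197-, 212-, and 308-day means reported above.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

EARTHQUAKE_DATE = date(2014, 8, 24)  # 2014 South Napa earthquake


def recovery_time(inspection, permit_approval, repair_approval):
    """Sum of inspection, permit, and repair times, in days.

    Assumption: the inspection time is counted from the earthquake date.
    """
    inspection_time = (inspection - EARTHQUAKE_DATE).days
    permit_time = (permit_approval - inspection).days
    repair_time = (repair_approval - permit_approval).days
    return inspection_time + permit_time + repair_time


# Hypothetical records: (ATC-20 tag, inspection, permit approval, repair approval)
records = [
    ("Green", date(2014, 8, 26), date(2014, 10, 1), date(2015, 2, 1)),
    ("Yellow", date(2014, 8, 25), date(2014, 11, 3), date(2015, 4, 15)),
    ("Yellow", date(2014, 8, 27), date(2014, 9, 30), date(2015, 1, 5)),
    ("Red", date(2014, 8, 25), date(2015, 1, 12), date(2015, 8, 20)),
]

# Group recovery times by tag, then average within each group
times_by_tag = defaultdict(list)
for tag, insp, permit_ok, repair_ok in records:
    times_by_tag[tag].append(recovery_time(insp, permit_ok, repair_ok))

mean_recovery = {tag: mean(times) for tag, times in times_by_tag.items()}
print(mean_recovery)
```

Because the three intervals are contiguous under this assumption, their sum reduces to the number of days from the earthquake to the repair approval date.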
RDB structure
Background on RDBs
An open-source RDB is developed using MySQL (2020) to support post-earthquake damage and
recovery modeling for building portfolios. RDBs are a better alternative to spreadsheets
because they store and organize data more efficiently, offer better visualization features,
and make it easier to link interrelated fields. RDBs are especially suitable for incorporating
multiple data sources, reducing data redundancy, and increasing user efficiency.
A well-structured database containing a large number of buildings affected by multiple
earthquakes would provide a wealth of information that can be used for modeling, risk
management, and resilience planning (Zaker Esteghamati et al., 2020). The RDB developed
as part of the current study represents the initial step toward the creation of such a resource
with the inclusion of 3695 buildings affected by a single (the 2014 South Napa) earthquake.
Schema
The structure of the database is defined by a series of interconnected tables that are linked
through shared fields called keys. A single table consists of a combination of keys and
attributes. A primary key uniquely identifies each record in a table, while a foreign key links
tables: the foreign key in one table references the primary key of another table to establish
a relationship between them. An attribute describes a property of the entities stored in a
table and takes on a data type such as character, integer, or date. The structure of the
database, or schema, is the combination of tables, attributes, and relationships between tables.
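The key concepts above can be illustrated with a small in-memory SQLite database built with Python's standard sqlite3 module. Only a few columns of the SiteInfo and Building tables are reproduced, and SQLite stands in for MySQL purely so that the sketch runs without a database server; the DDL of the published database may differ. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Two tables from the schema, reduced to a few columns
conn.executescript("""
CREATE TABLE SiteInfo (
    SiteID    INTEGER PRIMARY KEY,                 -- primary key: one row per site
    VS30      NUMERIC(15, 10),
    SiteClass VARCHAR(1)
);
CREATE TABLE Building (
    BuildingID INTEGER PRIMARY KEY,                -- primary key: one row per building
    NoStories  INTEGER,
    SiteID     INTEGER REFERENCES SiteInfo(SiteID) -- foreign key to SiteInfo
);
""")

# Hypothetical rows: two buildings that reference the same site (one-to-many)
conn.execute("INSERT INTO SiteInfo VALUES (1, 319.0, 'D')")
conn.executemany("INSERT INTO Building VALUES (?, ?, ?)", [(1, 1, 1), (2, 2, 1)])

# A building that references a nonexistent site violates the foreign key
try:
    conn.execute("INSERT INTO Building VALUES (3, 1, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# Queries join tables on the shared key (SiteID)
rows = conn.execute("""
    SELECT b.BuildingID, s.SiteClass
    FROM Building AS b
    JOIN SiteInfo AS s ON b.SiteID = s.SiteID
    ORDER BY b.BuildingID
""").fetchall()
print(fk_rejected, rows)
```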
The structure of the database developed for the current study is shown in Figure 7
where the primary and foreign keys are highlighted in green and blue, respectively. The
schema was developed through an iterative process to ensure efficiency and ease of access
(through querying). Altogether, there are 45 attributes (variables) included in the database
and they are grouped into five categories: (1) building and site properties, (2) damage
assessment information, (3) recovery assessment information, (4) census-block-level demo-
graphics, and (5) earthquake properties.
Description of tables, attributes, and relationships
The attributes of each table are summarized in Tables 2 to 11. The attribute data types
include integers (INT), dates (DATE), and decimals with specified precision and scale
(DECIMAL(P, S), which MySQL treats as equivalent to NUMERIC(P, S)). Three string types that
differ in their maximum length (VARCHAR, MEDIUMTEXT, and LONGTEXT) are also used.
Figure 7. Relational database schema.
The building and site properties category contains the site information table (Table 3), which
documents the location and soil properties unique to each building; the occupancy type table
(Table 4); the construction type table (Table 5); and the general building information table
(Table 2). The building table includes foreign keys from the site, occupancy, and construction
type tables to reflect the one-to-many relationships (i.e. each occupancy/site/construction
type is associated with many buildings). The primary key of the sociodemography table
(SociodemID) is also a foreign key in the building table because a single set of
census-block-level sociodemographic values can be associated with multiple buildings.
Table 2. Building table (Building)
| Attribute | Abbreviated attribute name | Data type | Examples | Key |
|---|---|---|---|---|
| Building ID | BuildingID | INT | 1, 2, 3, etc. | PK |
| Number of stories | NoStories | INT | 1, 2, 3, etc. | |
| Floor area | FlArea | INT | 20500, etc. | |
| Number of units | NoUnits | INT | 1, 2, 3, etc. | |
| Construction year | ConstrYear | INT | 1, 2, 3, etc. | |
| Building value | BuildVal | INT | 1, 2, 3, etc. | |
| Site ID | SiteID | INT | 1, 2, 3, etc. | FK |
| Occupancy ID | OccuID | INT | 1, 2, 3, etc. | FK |
| Construction type ID | ConstTypeID | INT | 1, 2, 3, etc. | FK |
| Sociodemography ID | SociodemID | INT | 1, 2, 3, etc. | FK |
PK: primary key; FK: foreign key.
Table 3. Site information table (SiteInfo)
| Attribute | Abbreviated attribute name | Data type | Examples | Key |
|---|---|---|---|---|
| Site ID | SiteID | INT | 1, 2, 3, etc. | PK |
| Longitude | Longitude | DECIMAL(15,10) | -122.12 | |
| Latitude | Latitude | DECIMAL(15,10) | 38.123 | |
| City | City | VARCHAR(45) | Napa, etc. | |
| VS30 | VS30 | DECIMAL(15,10) | 1.234, etc. | |
| Basin depth Z1.0 | Basin_Depth _Z1.0 | DECIMAL(15,10) | 1.234, etc. | |
| Basin depth Z1.5 | Basin_Depth _Z1.5 | DECIMAL(15,10) | 1.234, etc. | |
| Basin depth Z2.5 | Basin_Depth _Z2.5 | DECIMAL(15,10) | 1.234, etc. | |
| Site class | SiteClass | VARCHAR(1) | B, C, etc. | |
PK: primary key.
Table 4. Occupancy table (Occupancy)
| Attribute | Abbreviated attribute name | Data type | Examples | Key |
|---|---|---|---|---|
| Occupancy ID | OccuID | INT | 1, 2, 3, etc. | PK |
| Number of occupants | NoOccu | INT | 1, 2, 3, etc. | |
| Occupancy type | OccuType | VARCHAR(45) | Commercial, office, etc. | |
PK: primary key.
Table 5. Construction type table (ConstType)
| Attribute | Abbreviated attribute name | Data type | Examples | Key |
|---|---|---|---|---|
| Construction type ID | ConstTypeID | INT | 1, 2, 3, etc. | PK |
| Construction type-material | ConstTypeMat | VARCHAR(45) | Wood, steel, etc. | |
| Lateral force resisting system | LFRD | MEDIUMTEXT | Braced moment frame, etc. | |
PK: primary key.
The damage assessment information category contains the shaking intensity (Table 6) and
inspection information (Table 7) tables. The latter includes the observed damage description,
the date of inspection, and the ATC-20 tag. The shaking intensity table includes the
Joyner–Boore distance, Sa0.3s, and pga. The records in these tables are specific to a
particular building and earthquake, hence the BuildingID and EarthquakeID foreign keys.
The recovery assessment category includes the observed recovery table (Table 8), which
documents the inspection, permit, and repair times. The permit information table (Table 9),
which includes the permit descriptions, permit types, and recovery-related dates, also falls
under the recovery assessment category. Each permit record belongs to a single building and is
therefore linked to the building table through a foreign key (BuildingID). In turn, PermitID is
a foreign key in the observed recovery table because the inspection, permit, and repair times
are computed using the information contained in the permit information table.
The census-block-level demographics category (Table 10) includes information about race,
education, tenure (owner or renter), age, and income of the population in the census block
where each building is located. As noted earlier, these variables are linked to the building
table through a foreign key. The earthquake properties table (Table 11) contains specific
information about a given event (e.g. magnitude, epicenter location), and EarthquakeID is a
foreign key in the shaking intensity, inspection, and observed recovery tables. As reflected in
the one-to-many relationships, each earthquake is associated with multiple intensities,
inspections, and recoveries.
Table 6. Shaking intensity information table (ShakeInt)
| Attribute | Abbreviated attribute name | Data type | Examples | Key |
|---|---|---|---|---|
| Shaking intensity info ID | ShakeIntID | INT | 1, 2, 3, etc. | PK |
| Joyner–Boore distance | Rjb | DECIMAL(20,16) | 5.67, etc. | |
| Sa0.3s | Sa0.3s | DECIMAL(20,16) | 0.654, etc. | |
| PGA | PGA | DECIMAL(20,16) | 0.654, etc. | |
| Building ID | BuildingID | INT | 1, 2, 3, etc. | FK |
| Earthquake ID | EarthquakeID | INT | 1, 2, 3, etc. | FK |
PK: primary key; FK: foreign key.
Table 7. Inspection information table (Inspection)
| Attribute | Abbreviated attribute name | Data type | Examples | Key |
|---|---|---|---|---|
| Inspection ID | InspID | INT | 1, 2, 3, etc. | PK |
| Observed damage description | DamaDesc | VARCHAR(1000) | Chimney damage, etc. | |
| Inspection date | InspDate | DATE | 12/12/2012, etc. | |
| ATC-20 tag | ATC_Tag | VARCHAR(45) | Green, Red, etc. | |
| Building ID | BuildingID | INT | 1, 2, 3, etc. | FK |
| Earthquake ID | EarthquakeID | INT | 1, 2, 3, etc. | FK |
PK: primary key; FK: foreign key.
Table 8. Observed recovery table (ObservedReco)
| Attribute | Abbreviated attribute name | Data type | Examples | Key |
|---|---|---|---|---|
| Observed recovery ID | ObservedRecoID | INT | 1, 2, 3, etc. | PK |
| Inspection time (days) | InspTime | INT | 1, 2, 3, etc. | |
| Permit time (days) | PermitTime | INT | 1, 2, 3, etc. | |
| Repair time (days) | RepairTime | INT | 1, 2, 3, etc. | |
| Permit ID | PermitID | INT | 1, 2, 3, etc. | FK |
| Earthquake ID | EarthquakeID | INT | 1, 2, 3, etc. | FK |
PK: primary key; FK: foreign key.
Table 9. Permit information table (PermitInfo)
| Attribute | Abbreviated attribute name | Data type | Examples | Key |
|---|---|---|---|---|
| Permit ID | PermitID | INT | 1, 2, 3, etc. | PK |
| Permit description | Permit Desc | LONGTEXT | Repair Foundation | |
| Permit type | PermitType | VARCHAR(45) | Foundation | |
| Date applied | DateApplied | DATE | 12/12/2012, etc. | |
| Date received | DateRecvd | DATE | 12/12/2012, etc. | |
| Date completed | DateComp | DATE | 12/12/2012, etc. | |
| Date expired | DateExpired | DATE | 12/12/2012, etc. | |
| Building ID | BuildingID | INT | 1, 2, 3, etc. | FK |
PK: primary key; FK: foreign key.
Table 10. Sociodemography table (Sociodemography)
| Attribute | Abbreviated attribute name | Data type | Examples | Key |
|---|---|---|---|---|
| Sociodemography ID | SociodemID | INT | 1, 2, 3, etc. | PK |
| HHEng | English_hh | DECIMAL(15,10) | 85.9, etc. | |
| PN.25+HS | Diploma_hh | DECIMAL(15,10) | 85.9, etc. | |
| PNHisp | Latino_hh | DECIMAL(15,10) | 85.9, etc. | |
| PNBlack | AfrAmer_hh | DECIMAL(15,10) | 85.9, etc. | |
| PNAsian | Asian_hh | DECIMAL(15,10) | 85.9, etc. | |
| HH≠<18 | Young_hh | DECIMAL(15,10) | 85.9, etc. | |
| IncAnn | Income | DECIMAL(15,10) | 250,000, etc. | |
| HU%Own | Ownedunits | DECIMAL(15,10) | 85.9, etc. | |
PK: primary key.
Table 11. Earthquake table (Earthquake)
| Attribute | Abbreviated attribute name | Data type | Examples | Key |
|---|---|---|---|---|
| Earthquake ID | EarthquakeID | INT | 1 | PK |
| Earthquake name | EarthquakeName | VARCHAR(45) | 2014 Napa Valley Earthquake | |
| Earthquake date | EarthquakeDate | DATE | 8/24/2014 | |
| Epicenter longitude | EpiLong | DECIMAL(15,10) | -122.31 | |
| Epicenter latitude | EpiLat | DECIMAL(15,10) | 38.217 | |
| Magnitude | Magnitude | DECIMAL(3,2) | 6.0 | |
PK: primary key.
Illustrative examples
The RDB is made publicly available as “earthquake_recovery_db” through the DesignSafe
cyberinfrastructure. In this section, four example queries, published in a Jupyter Notebook,
are presented to demonstrate how targeted information can be extracted from the database on
DesignSafe. Note that these are meant to serve as illustrative examples and therefore do not
show the full range of information covered in the database. Figure 8 shows the script used to
import the database for querying through Jupyter Hub on DesignSafe.
The first query is used to generate site information for the red-tagged buildings. The
associated script and a subset of the generated data are shown in Figure 9. With this type
of query, one can examine the VS30, basin depth, and location associated with the most
severely damaged buildings.
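The query scripts themselves appear only as figures. As a hedged, runnable stand-in, the sketch below expresses the first query in SQL against a tiny in-memory SQLite copy of three tables (Inspection, Building, and SiteInfo) holding hypothetical rows. Table and column names follow Tables 2, 3, and 7, but the exact tag strings (for example, 'Red') and the query text are assumptions rather than the published notebook code. The join runs from each inspection record to its building and then to the building's site.

```python
import sqlite3

# Minimal in-memory stand-in for three tables of the schema (subset of columns)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SiteInfo (SiteID INTEGER PRIMARY KEY, Latitude REAL, Longitude REAL,
                       VS30 REAL, SiteClass TEXT);
CREATE TABLE Building (BuildingID INTEGER PRIMARY KEY,
                       SiteID INTEGER REFERENCES SiteInfo(SiteID));
CREATE TABLE Inspection (InspID INTEGER PRIMARY KEY, ATC_Tag TEXT,
                         BuildingID INTEGER REFERENCES Building(BuildingID),
                         EarthquakeID INTEGER);
-- Hypothetical rows, for illustration only
INSERT INTO SiteInfo VALUES (1, 38.30, -122.29, 250.0, 'D'), (2, 38.31, -122.31, 400.0, 'C');
INSERT INTO Building VALUES (10, 1), (11, 2), (12, 1);
INSERT INTO Inspection VALUES (100, 'Red', 10, 1), (101, 'Green', 11, 1), (102, 'Yellow', 12, 1);
""")

# Query 1 (sketch): site properties of red-tagged buildings
red_sites = conn.execute("""
    SELECT b.BuildingID, s.Latitude, s.Longitude, s.VS30, s.SiteClass
    FROM Inspection AS i
    JOIN Building AS b ON i.BuildingID = b.BuildingID
    JOIN SiteInfo AS s ON b.SiteID = s.SiteID
    WHERE i.ATC_Tag = 'Red'
    ORDER BY b.BuildingID
""").fetchall()
print(red_sites)
```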
Figure 10 shows the script used to query the observed recovery time information for all
buildings that have ATC-20 tags and the subsequent results. With the data generated by
this query, the effect of the level of damage on the various recovery time parameters
(inspection, permit, and repair) can be investigated.
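A corresponding sketch of the second query follows, under the same assumptions (SQLite stand-in, hypothetical rows, subset of columns, assumed tag strings). Because the observed recovery table carries PermitID rather than BuildingID, the join to the ATC-20 tag has to pass through the permit information table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PermitInfo (PermitID INTEGER PRIMARY KEY, BuildingID INTEGER);
CREATE TABLE ObservedReco (ObservedRecoID INTEGER PRIMARY KEY, InspTime INTEGER,
                           PermitTime INTEGER, RepairTime INTEGER,
                           PermitID INTEGER REFERENCES PermitInfo(PermitID),
                           EarthquakeID INTEGER);
CREATE TABLE Inspection (InspID INTEGER PRIMARY KEY, ATC_Tag TEXT,
                         BuildingID INTEGER, EarthquakeID INTEGER);
-- Hypothetical rows, for illustration only (building 12 has no permit)
INSERT INTO PermitInfo VALUES (1, 10), (2, 11);
INSERT INTO ObservedReco VALUES (1, 3, 60, 150, 1, 1), (2, 2, 20, NULL, 2, 1);
INSERT INTO Inspection VALUES (100, 'Red', 10, 1), (101, 'Yellow', 11, 1), (102, 'Green', 12, 1);
""")

# Query 2 (sketch): observed recovery times together with the ATC-20 tag.
# ObservedReco has no BuildingID, so the join goes through PermitInfo.
recovery = conn.execute("""
    SELECT p.BuildingID, i.ATC_Tag, o.InspTime, o.PermitTime, o.RepairTime
    FROM ObservedReco AS o
    JOIN PermitInfo AS p ON o.PermitID = p.PermitID
    JOIN Inspection AS i ON i.BuildingID = p.BuildingID
                        AND i.EarthquakeID = o.EarthquakeID
    WHERE i.ATC_Tag IS NOT NULL
    ORDER BY p.BuildingID
""").fetchall()
print(recovery)
```

Inner joins drop inspected buildings that have no permit record (building 12 above); starting from Inspection and using LEFT JOINs instead would keep them, with NULL recovery times.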
The next query is used to find the shaking intensities associated with red-tagged build-
ings. The association between the extent of damage and shaking intensity can be inferred
from this type of query. The associated script and a subset of the generated data are shown in
Figure 11.
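The third query can be sketched the same way, again as a SQLite stand-in with hypothetical rows and assumed tag strings. The column name Sa0.3s contains a dot, so it must be quoted: double quotes work in SQLite, whereas MySQL uses backticks unless the ANSI_QUOTES mode is enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ShakeInt (ShakeIntID INTEGER PRIMARY KEY, Rjb REAL, "Sa0.3s" REAL,
                       PGA REAL, BuildingID INTEGER, EarthquakeID INTEGER);
CREATE TABLE Inspection (InspID INTEGER PRIMARY KEY, ATC_Tag TEXT,
                         BuildingID INTEGER, EarthquakeID INTEGER);
-- Hypothetical rows, for illustration only
INSERT INTO ShakeInt VALUES (1, 4.1, 0.82, 0.41, 10, 1), (2, 9.5, 0.45, 0.22, 11, 1);
INSERT INTO Inspection VALUES (100, 'Red', 10, 1), (101, 'Green', 11, 1);
""")

# Query 3 (sketch): shaking intensities at red-tagged buildings.
# Both tables carry BuildingID and EarthquakeID, so the join uses both keys.
red_shaking = conn.execute("""
    SELECT sh.BuildingID, sh.Rjb, sh."Sa0.3s", sh.PGA
    FROM ShakeInt AS sh
    JOIN Inspection AS i ON i.BuildingID = sh.BuildingID
                        AND i.EarthquakeID = sh.EarthquakeID
    WHERE i.ATC_Tag = 'Red'
    ORDER BY sh.BuildingID
""").fetchall()
print(red_shaking)
```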
The final query is used to find the repair times associated with specific sociodemographic
characteristics for all buildings. The script used for the DesignSafe query and part of the
generated data are shown in Figure 12. This type of data can provide insight into disparities
in the pace of recovery based on factors such as race, income, and tenure (owner- or
renter-occupied).
Figure 8. Jupyter Hub script used to import and query the database on DesignSafe.
Figure 9. Jupyter Hub script and results from the query seeking site properties associated with the red-tagged buildings.
Figure 10. Jupyter Hub script and results from the query seeking the observed recovery information and ATC-20 tags of all buildings.
Figure 11. Jupyter Hub script and results from the query seeking the shaking intensity information for all red-tagged buildings.
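The sociodemographic query joins each building to its census-block attributes and aggregates repair times by group. In the sketch below, the BlockID link, the CensusInfo table, the precomputed RepairDays column, and the income threshold of 60,000 dollars are all hypothetical choices for illustration, and the rows are toy values. Buildings without a recorded repair time (NULL) drop out of the averages because SQL aggregates skip NULLs.

```python
import sqlite3

# Hypothetical schema: BlockID links each building to an illustrative
# CensusInfo table; RepairDays is assumed precomputed (e.g., DateComp minus
# DateRecvd). Toy values only.
SCRIPT = """
CREATE TABLE CensusInfo (BlockID TEXT PRIMARY KEY, MedianIncome REAL);
CREATE TABLE BuildingInfo (BuildingID INT PRIMARY KEY, BlockID TEXT, RepairDays REAL);
INSERT INTO CensusInfo VALUES ('A', 45000), ('B', 95000);
INSERT INTO BuildingInfo VALUES (1, 'A', 300), (2, 'A', 200),
                                (3, 'B', 100), (4, 'B', 140), (5, 'B', NULL);
"""

# Group buildings by the median household income of their census block.
# COUNT(col) and AVG(col) skip NULLs, so buildings without a repair time drop out.
QUERY = """
SELECT CASE WHEN c.MedianIncome < :threshold THEN 'lower income'
            ELSE 'higher income' END AS IncomeGroup,
       COUNT(b.RepairDays) AS n_repaired,
       AVG(b.RepairDays)   AS mean_repair_days
FROM BuildingInfo AS b
JOIN CensusInfo   AS c ON c.BlockID = b.BlockID
GROUP BY IncomeGroup
ORDER BY mean_repair_days DESC
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCRIPT)
rows = conn.execute(QUERY, {"threshold": 60000}).fetchall()
for group, n, mean_days in rows:
    print(f"{group}: {n} repaired buildings, mean repair time {mean_days:.0f} days")
```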
Summary of prior applications of the data set
The authors have already used different parts of the database in several studies. Kang et al.
(2018) used the time-to-permit and repair times for 456 buildings affected by the 2014
South Napa earthquake to validate and extend a post-earthquake recovery simulation
methodology. This data subset was used to establish an observed recovery trajectory for
the buildings. First, the following three building-level recovery states were defined:
pre-construction, during-construction (or “construction”), and post-construction (or
“complete”). The observed recovery trajectory was then constructed by assigning recovery levels
to each state and using the permit and repair times from the data subset. A time-based
stochastic process model was used to perform a “blind” simulation where a replication of the
observed trajectory was attempted without using the empirical data from the 2014 earth-
quake. This simulation was able to capture the overall shape of the observed trajectory,
including the sharp increase during the period immediately following the event and slower
pace of recovery in the later stages. However, the blind model also overpredicted the recov-
ery level by as much as a factor of 2.6. As expected, using the mean permit and repair times
from the empirical data significantly improved the replication of the observed recovery. To
generalize the time-based stochastic process model, the Random Forests (RF) algorithm
was used to link the time-to-permit and repair time to 12 predictors related to the level of
damage, general characteristics of the building (e.g. age, occupancy, property value), and
census-block-level sociodemographic variables (e.g. percentage of ethnic minorities).
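Building an observed recovery trajectory from building-level states can be sketched as follows. Each building is in the pre-construction state until its permit is obtained, under construction until its repairs are complete, and complete thereafter. The portfolio recovery level at time t is then the mean of the levels assigned to each building's state. The numeric levels (0, 0.5, 1) and the toy durations are assumptions, since the values used by Kang et al. (2018) are not given here.

```python
from dataclasses import dataclass

# HYPOTHETICAL recovery level assigned to each building-level state.
LEVELS = {"pre-construction": 0.0, "construction": 0.5, "complete": 1.0}

@dataclass
class Building:
    days_to_permit: float  # days from the event until the permit is obtained
    repair_days: float     # days from permit until repairs are complete

def state_at(b: Building, t: float) -> str:
    """Recovery state of building b at t days after the earthquake."""
    if t < b.days_to_permit:
        return "pre-construction"
    if t < b.days_to_permit + b.repair_days:
        return "construction"
    return "complete"

def recovery_trajectory(buildings, times):
    """Mean recovery level of the portfolio at each time in `times`."""
    return [sum(LEVELS[state_at(b, t)] for b in buildings) / len(buildings)
            for t in times]

# Toy portfolio of three buildings (not study data).
portfolio = [Building(30, 60), Building(90, 120), Building(10, 20)]
traj = recovery_trajectory(portfolio, [0, 15, 40, 100, 250])
print(traj)
```

For these toy durations, the recovery level rises from 0 immediately after the event to 1 once every building has reached the complete state.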
Figure 12. Jupyter Hub script and results from querying the sociodemographics and repair times for all buildings.
Another subset of the database has been used to explore the effectiveness of different
machine learning techniques in predicting earthquake damage to buildings (Mangalathu
et al., 2020). Classification models for predicting the ATC-20 tag were developed using
the discriminant analysis, k-nearest neighbors, decision trees, and RF algorithms and a
data set comprising 2276 buildings. The features or model inputs included seismic
parameters such as Sa(0.3 s), Rjb, and Vs30 as well as variables related to the building vulnerability,
including age, number of stories, the presence of plan irregularities, and the value (in dol-
lars) and total floor area of the building. For all four algorithms, the machine learning
model was trained using 70% of the data and evaluated using the remaining 30% (i.e. the
testing set). The RF algorithm had the best performance, predicting the ATC-20 tags in the
testing set with an overall accuracy of 66%. The skewed distribution of the tags (i.e. low
number of red tags relative to green and yellow) was the main challenge in developing the
model. For example, the recall (percentage of the actual tags that are correctly assigned)
of the RF red-tag predictions was only 13% compared with 52% and 79% for the green
and yellow tags, respectively. It was also noted that an RF model with 65% classification
accuracy was achieved using only Sa(0.3 s), Rjb, and the building age as the predictors.
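The gap between overall accuracy and red-tag recall can be illustrated with a short, dependency-free computation. Recall for a tag is the fraction of buildings that truly carry that tag and are also predicted to carry it. The labels below are toy values chosen to mimic the class imbalance, not results from the study.

```python
from collections import Counter

def accuracy_and_recall(y_true, y_pred):
    """Overall accuracy and per-class recall for multiclass labels.

    Recall for class c = (# buildings with true tag c that are predicted as c)
                         / (# buildings with true tag c).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    support = Counter(y_true)                                  # true count per tag
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per tag
    recall = {c: hits[c] / n for c, n in support.items()}
    return correct / len(y_true), recall

# Toy labels with few red tags (illustrative only).
y_true = ["Green"] * 5 + ["Yellow"] * 4 + ["Red"] * 1
y_pred = ["Green", "Green", "Green", "Yellow", "Green",
          "Yellow", "Yellow", "Yellow", "Green",
          "Yellow"]
acc, rec = accuracy_and_recall(y_true, y_pred)
print(acc, rec)
```

Here the overall accuracy is 0.7 even though the single red tag is never recovered (red recall of 0.0), which is an exaggerated version of the pattern reported above.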
Mangalathu and Burton (2019) also utilized a subset of the Napa data to explore the use
of text-based descriptions (those generated during in-person inspections) as the features in
predicting building damage. Using written descriptions from 3423 buildings, they trained a
long short-term memory (LSTM) deep learning model to predict the ATC-20 tags.
The LSTM model achieved an accuracy of 86% on the testing set. Similar to the
other study (Mangalathu et al., 2020), the lowest recall was associated with the red tag pre-
dictions (63% compared with 94% and 84% for green and yellow tags, respectively).
Possible future applications and extensions of the data set
There are several additional potential future applications of the assembled database that
can be undertaken by other researchers. All of the studies described in the previous section
are based on a single data set from the 2014 South Napa earthquake. Because of this, the
findings cannot be generalized. As such, additional studies that utilize data sets that are
diverse in terms of the scale of earthquake damage and target region are much needed.
For the study by Kang et al. (2018), the effect of lifeline damage and restoration on post-
earthquake building recovery was not considered. This again points to the need for future
studies that leverage integrated data on building and lifeline damage and recovery in simula-
tion modeling. The Kang et al. (2018) study incorporated the building permit data by using
the mean observed permit and repair time values in the recovery simulation model. This
approach significantly biases the resulting model toward a single event and therefore limits
its generality. A more systematic and balanced approach would be to use Bayesian inference
to update the prior temporal parameters using the data from the 2014 event. The
recovery literature would also benefit from longitudinal studies that use data acquired
from repeated visits to affected buildings; such studies are needed to benchmark the timestamps
provided by the permit data against functional restoration. Finally, the empirical data assembled by
this study and future studies can complement or update the damage and recovery-related
recommendations that are based on expert opinion. In this regard, recovery models can be
developed using machine learning algorithms in combination with expert opinion.
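As one illustration of the Bayesian updating suggested above, the sketch below uses a conjugate model swapped in for simplicity. Repair durations are treated as exponential with an unknown rate, and that rate is given a Gamma prior (for example, one derived from expert opinion). This is not the stochastic process model of Kang et al. (2018), and the prior and data values are hypothetical. Unlike substituting the sample mean, the posterior blends the prior with the observed durations.

```python
def update_gamma_exponential(alpha, beta, durations):
    """Conjugate update for an exponential duration model (an ASSUMED model).

    Durations ~ Exponential(rate=lam), prior lam ~ Gamma(shape=alpha, rate=beta).
    Posterior: Gamma(alpha + n, beta + sum(durations)).
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    return alpha + len(durations), beta + sum(durations)

def expected_duration(alpha, beta):
    """E[1/lam] = beta / (alpha - 1) under Gamma(alpha, beta), valid for alpha > 1."""
    if alpha <= 1:
        raise ValueError("mean of 1/lam is undefined for alpha <= 1")
    return beta / (alpha - 1)

# Hypothetical prior chosen so the expected repair time is 200 days.
prior = (6.0, 1000.0)
observed = [120.0, 365.0, 90.0, 150.0]  # toy repair durations in days
post = update_gamma_exponential(*prior, observed)
print(expected_duration(*prior), expected_duration(*post))
```

With these toy numbers, the expected repair time moves from 200 days under the prior to about 191.7 days under the posterior; a larger event-specific data set would pull the estimate further toward the observed durations.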
The studies that developed machine learning models to predict building damage utilized
data from a single event. Despite the use of the training-testing data split in the development
of these models, the extent to which they can be generalized is questionable at best. The valid-
ity of these machine learning–based damage prediction models, which have been developed
using a single-event data set, can be investigated by evaluating them using data from a
different event. For example, the model developed using the Napa data can be tested against
observations from the 1994 Northridge earthquake. The performance of such a model can
also be benchmarked against others that have been developed using multi-event data sets. As
a standalone data set, the Napa data can also be used to develop multimodal machine learn-
ing models, which combine features from different modalities (e.g. text, images, categorical or
continuous variables) into a single algorithm (Baltrusaitis et al., 2018).
Conclusion
A relational database to support post-earthquake building damage and recovery assess-
ment is developed using MySQL and made publicly available through the DesignSafe
cyberinfrastructure. In its current form, the database includes 3695 buildings impacted by
the 2014 South Napa, California, earthquake. The information provided in the database is
categorized into earthquake properties, building and site characteristics, damage and
recovery assessment (permitting) information, and census-block-level demographics. Most
of the buildings in the Napa data set are single- and multifamily dwellings with one or two
stories. Included in the building damage information are brief descriptions of the physical
impacts and the ATC-20 tags assigned during the in-person field inspections. The shaking
intensity at each site is documented in terms of peak ground acceleration and spectral
acceleration at a period of 0.3 s. The recovery-related data include the dates corresponding
to the completion of inspection, permitting, and repairs (based on the building department
certification). These data types have already been shown to be useful for developing or
evaluating the efficacy of post-earthquake damage and recovery assessment models.
There are several opportunities to expand and enhance the current database. One obvious
extension is to include additional data sets that comprise buildings affected by other earth-
quakes. The fidelity of future data sets relative to the current one could also be improved. For
example, in the Napa data, the repair duration is inferred from the dates associated with permit
acquisition and completion. However, in many instances, the repairs may be completed long
before the work is certified by the building department. In addition, while fields were included
for the building construction type, lateral force resisting system, and number of occupants, they
have not been populated in the current version of the Napa data set. Expansion of the data-
base to include buildings affected by multiple events will create opportunities for developing
more generalizable predictive models (damage and recovery). Also, data on the time it takes to
acquire recovery financing from different sources (e.g., loans, local and federal assistance)
would make a valuable addition to the database. It is also important to note that the set of
sociodemographic factors that are included in the data set is by no means comprehensive and
additional data should be assembled that better reflects the extensive social science literature on
this topic. Finally, while the current database can only accommodate data sets related to build-
ings affected by earthquakes, the overall structure can be adapted to consider other types of
natural hazards (e.g., hurricanes, floods, and wildfires) and infrastructure (e.g., lifelines).
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/
or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/
or publication of this article: This research is supported by National Science Foundation Award No.
1554714.
ORCID iDs
Mohsen Zaker Esteghamati https://orcid.org/0000-0002-2144-2938
Scott Brandenberg https://orcid.org/0000-0003-2493-592X
Chukuebuka C Nweke https://orcid.org/0000-0002-8939-571X
References
Almufti I and Willford M (2013) REDi™: Resilience-Based Earthquake Design Initiative (REDi™)
Rating System. London: Arup Group.
Applied Technology Council (ATC) (1995) Procedures for Post-Earthquake Building Safety
Evaluation Procedures (ATC-20). Redwood City, CA: ATC.
Baltrusaitis T, Ahuja C and Morency LP (2018) Multimodal machine learning: A survey and
taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence 41: 423–443.
Boatwright J, Blair JL, Aagaard BT and Wallis K (2015) The distribution of red and yellow tags in
the City of Napa. Seismological Research Letters 86(2A): 361–368.
Boore DM, Stewart JP, Seyhan E and Atkinson GM (2014) NGA-West2 equations for predicting
PGA, PGV, and 5% damped PSA for shallow crustal earthquakes. Earthquake Spectra 30(3):
1057–1085.
Brocher TM, Aagaard BT, Simpson RW and Jachens RC (2006) The USGS 3D seismic velocity
model for northern California. Paper presented at the 2006 fall meeting (Abstract id S51B-1266),
American Geophysical Union (AGU), San Francisco, CA, 11–15 December.
Burton HV, Kang H, Miles SB, Nejat A and Yi Z (2019) A framework and case study for integrating
household decision-making into post-earthquake recovery models. International Journal of
Disaster Risk Reduction 37: 101167.
Center for Engineering Strong Motion Data (CESMD) (2018) Data for latest earthquakes. Available
at: https://www.strongmotioncenter.org (accessed on January 2021).
Chen P and Lee EJ (2017) UCVM 17.3.0 documentation. Available at: http://hypocenter.usc.edu/research/ucvm/17.3.0/docs/index.html (accessed on January 2021).
City of Napa (2020a) City of Napa assessor parcel data. Available at: https://www.countyofnapa.org/150/Assessor-Parcel-Data (accessed on January 2021).
City of Napa (2020b) City of Napa community development department. Available at: https://etrakit.cityofnapa.org/etrakit/Search/permit.aspx (accessed on January 2021).
Earthquake Engineering Research Institute (EERI) (2016) 2014 South Napa earthquake data.
Available at: http://eqclearinghouse.org/map/2014-08-24-south-napa/ (accessed on January 2021).
Elliott JR (2015) Natural hazards and residential mobility: General patterns and racially unequal
outcomes in the United States. Social Forces 93(4): 1723–1747.
Federal Emergency Management Agency (FEMA) (2003a) Multi-Hazard Loss Estimation
Methodology—Earthquake Model: HAZUS MH-MR4 Technical Manual. Washington, DC:
Department of Homeland Security, FEMA.
Federal Emergency Management Agency (FEMA) (2003b) NEHRP Recommended Provisions for
Seismic Regulations for New Buildings and Other Structures (FEMA 450-1). Washington, DC:
Department of Homeland Security, FEMA.
Federal Emergency Management Agency (FEMA) (2012) Seismic Performance Assessment of
Buildings. Redwood City, CA: Applied Technology Council (ATC).
Kang H, Burton HV and Miao H (2018) Replicating the recovery following the 2014 South Napa
earthquake using stochastic process models. Earthquake Spectra 34(3): 1247–1266.
Mangalathu S and Burton HV (2019) Deep learning-based classification of earthquake-impacted
buildings using textual damage descriptions. International Journal of Disaster Risk Reduction 36:
101111.
Mangalathu S, Sun H, Nweke CC, Yi Z and Burton HV (2020) Classifying earthquake damage to
buildings using machine learning. Earthquake Spectra 36: 183–208.
Manson S, Schroeder J, Van Riper D and Ruggles S (2017) IPUMS National Historical Geographic
Information System: Version 12.0 (Database). Minneapolis, MN: University of Minnesota, p. 39.
Miles SB and Chang SE (2007) A simulation model of urban disaster recovery and resilience:
Implementation for the 1994 Northridge earthquake. Technical report MCEER-07-0014, 7
September. Buffalo, NY: Multidisciplinary Center for Earthquake Engineering Research,
University at Buffalo.
MySQL (2020) Open source database. Available at: https://www.mysql.com/ (accessed on January
2021).
Omoya M, Ero I, Zaker Esteghamati M, Burton HV, Brandenberg S and Nweke C (2021) Relational
Database for Post-Earthquake Damage and Recovery Assessment: 2014 South Napa Earthquake
(DesignSafe-CI). Available at: https://doi.org/10.17603/ds2-3nvj-4127 (accessed on January
2021).
Rathje EM, Dawson C, Padgett JE, Pinelli JP, Stanzione D, Adair A, Arduino P, Brandenberg SJ,
Cockerill T, Dey C, Esteva M, Haan FL, Hanlon M, Kareem A, Lowes L, Mock S and
Mosqueda G (2017) DesignSafe: New cyberinfrastructure for natural hazards engineering.
Natural Hazards Review 18(3): 06017001.
Seyhan E and Stewart JP (2014) Semi-empirical nonlinear site amplification from NGA-West2 data
and simulations. Earthquake Spectra 30(3): 1241–1256.
United States Census (2020) United States Census Bureau. Available at: https://www.census.gov/quickfacts/napacountycalifornia (accessed on January 2021).
United States Geological Survey (USGS) (2020) The United States Geological Survey hazard maps.
Available at: https://earthquake.usgs.gov/ws/designmaps/asce7-16.html (accessed on January
2021).
Wills CJ, Gutierrez CI, Perez FG and Branum DM (2015) A next generation VS30 map for
California based on geology and topography. Bulletin of the Seismological Society of America
105(6): 3083–3091.
Zaker Esteghamati M, Lee J, Musetich M and Flint MM (2020) INSSEPT: An open-source
relational database of seismic performance estimation to aid with early design of buildings.
Earthquake Spectra 36: 2177–2197.
Zhang Y and Peacock WG (2009) Planning for housing recovery? Lessons learned from Hurricane
Andrew. Journal of the American Planning Association 76(1): 5–24.