The southern portion of the San Andreas fault, between Cajon Creek and
Bombay Beach, has not seen a major event since 1690, and has therefore
accumulated a slip deficit of 5-6 m. The potential for this portion of
the fault to rupture in a single M7.7 event is a major component of
seismic hazard in southern California and northern Mexico. TeraShake is
a large-scale, fourth-order finite-difference simulation of such an
event based on Olsen's Anelastic Wave Propagation Model (AWM) code, and
conducted in the context of the Southern California Earthquake Center
Community Modeling Environment (CME). The fault geometry is taken from
the 2002 USGS National Hazard Maps. The kinematic slip function is
transported and scaled from published inversions for the 2002 Denali
(M7.9) earthquake. The three-dimensional crustal structure is the SCEC
Community Velocity Model. The 600 km x 300 km x 80 km simulation domain
extends from the Ventura Basin and Tehachapi region in the north to
Mexicali and Tijuana in the south. It includes all major population
centers in southern California, and is modeled at 200 m resolution using
a rectangular, 1.8 giganode, 3000 x 1500 x 400 mesh. The simulated
duration is 200 seconds, with a temporal resolution of 0.01 seconds and a
maximum frequency of 0.5 Hz, for a total of 20,000 time steps. The
simulation is planned to run at the San Diego Supercomputer Center
(SDSC) on 240 processors of the IBM Power4 DataStar machine. Validation
runs conducted at one sixteenth (4D) resolution have shown that this is
the optimal configuration in the trade-off between computational and I/O
demands. The full run will consume about 18,000 CPU-hours. Each time
step produces a 21.6 GByte snapshot of the ground-motion velocity
vectors over the entire mesh. A 4D wavefield containing 2,000 time steps, amounting
to 43 Tbytes of data, will be stored at SDSC. Surface data, totaling
1 Tbyte, will be archived for every time step for engineering analysis
of synthetic seismograms. The data will be registered with the SCEC
Digital Library supported by the SDSC Storage Resource Broker (SRB).
Data collections will be annotated with simulation metadata, which will
allow data discovery through metadata-based queries. The binary
output will be described using HDF5 headers. Each file will be
fingerprinted with MD5 checksums to preserve and validate data
integrity. Data access, management and data product derivation will be
provided through a set of SRB APIs, including Java, C, web service and
data grid workflow interfaces. High resolution visualizations of the
wave propagation phenomena will be produced from diverse camera views.
The surface data will be analyzed online by remote web clients plotting
synthetic seismograms. Data mining operations, spectral analysis and
data subsetting are planned as future work. The TeraShake simulation
project has provided insights into the cyberinfrastructure needed to
advance computational geoscience, which we will discuss.
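
The mesh and storage figures quoted above follow from simple arithmetic,
and the short Python sketch below reproduces them as a consistency check.
It is not part of the TeraShake code. It assumes three velocity components
per node, 4-byte single-precision values, and decimal units (1 GByte =
1e9 bytes); none of these are stated in the abstract, but they are
consistent with the quoted sizes. The wall-clock estimate further assumes
all 240 processors are fully used for the whole run. All names in the
sketch are hypothetical.

```python
# Consistency check of the mesh and storage figures quoted in the abstract.
# Assumptions (not stated in the abstract): 3 velocity components per node,
# 4-byte single-precision values, decimal units (1 GByte = 1e9 bytes).

DOMAIN_KM = (600, 300, 80)   # simulation domain extent
SPACING_M = 200              # grid resolution
DURATION_S = 200             # simulated duration
STEPS_PER_SECOND = 100       # 0.01 s temporal resolution
SAVED_VOLUME_STEPS = 2000    # time steps kept in the 4D wavefield
CPU_HOURS = 18_000
PROCESSORS = 240

COMPONENTS = 3               # assumed: x, y, z velocity components
BYTES_PER_VALUE = 4          # assumed: single-precision floats


def mesh_dims(domain_km, spacing_m):
    """Number of grid nodes along each axis of the domain."""
    return tuple(km * 1000 // spacing_m for km in domain_km)


nx, ny, nz = mesh_dims(DOMAIN_KM, SPACING_M)
nodes = nx * ny * nz
time_steps = DURATION_S * STEPS_PER_SECOND

# Full-volume snapshot written at one time step.
snapshot_bytes = nodes * COMPONENTS * BYTES_PER_VALUE
# 4D wavefield: full-volume snapshots for the saved time steps.
volume_bytes = SAVED_VOLUME_STEPS * snapshot_bytes
# Surface layer only, saved at every time step.
surface_bytes = nx * ny * COMPONENTS * BYTES_PER_VALUE * time_steps
# Idealized wall-clock time if every processor is busy throughout.
wall_hours = CPU_HOURS / PROCESSORS

if __name__ == "__main__":
    print(f"mesh: {nx} x {ny} x {nz} = {nodes / 1e9:.1f} giganodes")
    print(f"time steps: {time_steps}")
    print(f"snapshot: {snapshot_bytes / 1e9:.1f} GByte")
    print(f"4D wavefield: {volume_bytes / 1e12:.1f} TByte")
    print(f"surface archive: {surface_bytes / 1e12:.2f} TByte")
    print(f"idealized wall-clock time: {wall_hours:.0f} hours")
```

Under these assumptions the sketch recovers the 3000 x 1500 x 400 mesh,
the 20,000 time steps, the 21.6 GByte snapshot, and totals of about
43 TByte for the 4D wavefield and about 1 TByte for the surface archive.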