Seismic Event and Phase Detection Using Time-Frequency Representation and Convolutional Neural Networks

by Ramin M. H. Dokht, Honn Kao, Ryan Visser, and Brindley Smith
ABSTRACT
The availability of abundant digital seismic records and the successful application of deep learning to pattern recognition and classification problems make it possible to build a reliable earthquake detection framework. To overcome the limitations of conventional methods, which stem mainly from incomplete sets of template waveforms and low signal-to-noise ratios, we design a generalized model that improves discrimination between earthquake and noise recordings using a deep convolutional network (ConvNet). Trained exclusively on a dataset of over 4900 earthquakes recorded over 3 yrs in western Canada, a multilayer ConvNet learns the general characteristics of background noise and earthquake signals in the time–frequency domain. In the next step, we train a secondary network on the wavelet transform of the major seismic arrivals to separate P from S waves and estimate their approximate arrival times. Validation experiments show promising performance, with an average accuracy of nearly 99% for both networks. To investigate the applicability of our algorithm, we apply the trained model to an independent dataset recently recorded in northeastern British Columbia (NE BC). We find that the deep-learning-based methods outperform traditional techniques, detecting more seismic events at significantly lower computational cost.
Electronic Supplement: Tables reporting the performance of convolutional networks trained directly in the time domain, and figures showing the accuracy of the validation set and P- and S-wave error measurements.
INTRODUCTION
The recent increase in the rate of earthquake occurrence in western Canada, a region with a historically low level of background seismicity, has been largely attributed to the development of unconventional hydrocarbon resources (Horner et al., 1994; Schultz et al., 2014; Farahbod et al., 2015; Rubinstein and Mahani, 2015; Atkinson et al., 2016). Reliable ground-motion analyses and seismic hazard assessments require a complete earthquake catalog containing both natural and induced earthquakes. However, most conventional automated techniques fail to identify low-magnitude induced events, which makes characterizing and locating these events challenging. Manual picking of seismic events achieves a higher detection rate, but the process is extremely laborious and remains subject to the analyst's experience.
Among the various detection algorithms, the short-term average/long-term average (STA/LTA) technique, which measures a signal-to-noise ratio (SNR) function and declares a detection when it exceeds a triggering threshold, has been widely used to detect moderate-to-large earthquakes (Allen, 1978; Withers et al., 1998). Cross-correlation-based techniques (also called template matching), on the other hand, have been used extensively to identify repeating earthquakes of lower magnitude based on similarity measurements of entire waveforms (Gibbons and Ringdal, 2006; Skoumal et al., 2015; Caffagni et al., 2016). Although less sensitive to high noise levels, template matching is computationally intensive, and its application is limited to earthquakes sharing the same source region and mechanism (Eisner et al., 2006). This can be partially remedied by generalizing template matching, either by clustering similar waveform fingerprints through a set of hash functions (Yoon et al., 2015; Bergen et al., 2016) or by subspace analysis of a set of representative waveforms (Barrett and Beroza, 2014). Recent studies take advantage of the spectral information in seismic records to improve the detection accuracy for weak microseismic events (Galiana-Merino et al., 2008; Vaezi and Van der Baan, 2015; Mousavi et al., 2016), although a priori knowledge of the noise characteristics is required.
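As a point of reference for the two conventional approaches above, the following is a minimal NumPy sketch (not the implementation used in any of the cited studies) of a trailing-window STA/LTA ratio computed on signal energy, and of a normalized cross-correlation detector as used in template matching. The function names and the window lengths in any call are illustrative assumptions.

```python
import numpy as np


def sta_lta(x, n_sta, n_lta):
    """Ratio of short-term to long-term average signal energy.

    Both averages use trailing windows ending at each sample. Samples before
    the long window is full (the first n_lta - 1) are set to 0.
    """
    energy = np.asarray(x, dtype=float) ** 2
    csum = np.concatenate(([0.0], np.cumsum(energy)))  # csum[k] = sum of energy[:k]
    ratio = np.zeros(len(energy))
    i = np.arange(n_lta - 1, len(energy))
    sta = (csum[i + 1] - csum[i + 1 - n_sta]) / n_sta
    lta = (csum[i + 1] - csum[i + 1 - n_lta]) / n_lta
    ratio[i] = np.divide(sta, lta, out=np.zeros_like(sta), where=lta > 0)
    return ratio


def normalized_xcorr(template, trace):
    """Pearson correlation between a template and every same-length window of a trace."""
    t = np.asarray(template, dtype=float)
    t = (t - t.mean()) / (t.std() * len(t))  # assumes a non-constant template
    x = np.asarray(trace, dtype=float)
    out = np.zeros(len(x) - len(t) + 1)
    for k in range(len(out)):
        w = x[k:k + len(t)]
        s = w.std()
        if s > 0:
            out[k] = np.sum(t * (w - w.mean())) / s
    return out
```

A detection would typically be declared where `sta_lta` exceeds a chosen trigger level, or where `normalized_xcorr` approaches 1; both thresholds are tuning choices. The per-lag loop in `normalized_xcorr` also illustrates why template matching against many templates is computationally intensive.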
Deep learning is a set of representation-learning algorithms with multiple layers of nonlinear transformations (Bengio et al., 2013; LeCun et al., 2015; Najafabadi et al., 2015). Unlike conventional machine-learning techniques, which rely on carefully hand-engineered features (Wang and Teng, 1995; Gentili and Michelini, 2006; Maity et al., 2014), deep learning allows a model to learn a general-purpose representation of raw data from a training set (Dahl et al., 2013; Wang and Yeung, 2013). Deep representation learning has been widely applied in research areas such as natural language understanding (Collobert et al., 2011; Mikolov et al., 2013), image classification (Krizhevsky et al., 2012), and speech recognition (Dahl et al., 2012; Hinton et al., 2012). Recent studies have investigated the application of deep-learning techniques to earthquake detection (e.g., Perol et al., 2018; Ross et al., 2018; Zhu and Beroza, 2018) and seismic imaging (Araya-Polo et al., 2018; Moseley et al., 2018).
doi: 10.1785/0220180308 Seismological Research Letters Volume XX, Number XX 2019 (SRL Early Edition)
Our research is motivated by the successful application of deep convolutional networks (ConvNets) to overcome the limitations of traditional techniques in studying induced seismicity in Oklahoma, United States (Perol et al., 2018). The basic idea is to train a ConvNet model on a large dataset of previously recorded earthquakes so that the classifier generalizes to seismic events different from those used in training. Compared with the model proposed by Perol et al. (2018), the current study incorporates both the temporal and spectral information of three-component seismograms to enhance detection accuracy. We divide the automatic earthquake detection process into two steps. A pretrained model first separates earthquakes from nonearthquake signals in the time–frequency domain. We then build a secondary supervised classification system that uses higher-resolution spectral images of earthquake records to discriminate between P and S waves.
METHOD AND DATASET
Convolutional Neural Networks
ConvNets are feed-forward, multilayer neural networks that were introduced to process multidimensional arrays (LeCun et al., 1998). A ConvNet transforms an input volume of data features into output class probabilities through a sequence of hidden layers: convolutional, pooling, and fully connected. Each convolutional layer contains a bank of linear filters that extract local features at all locations of the previous layer (Cireşan et al., 2012) and passes the resulting convolutional responses through a nonlinear activation unit. ConvNets have been found to train several times faster when a rectified linear unit maps all negative responses to zero (Nair and Hinton, 2010; Glorot et al., 2011):
$$x_j = \max\Big(0,\; b_j + \sum_i w_{ij}\, x_i\Big), \qquad (1)$$
in which $b_j$ and $w_{ij}$ are the bias and weights of the $j$th neuron, which passes the input $x_i$ from the previous layer to the output feature map $x_j$. It is common to spatially downsample the resulting activations by merging similar local features into one (LeCun et al., 2015). This subsampling, also known as pooling, can significantly reduce the number of free parameters, thereby improving the performance of the network and helping to avoid overfitting (Krizhevsky et al., 2012). Max pooling is the most widely used subsampling operator; it computes the maximum activation over nonoverlapping local neighborhoods (Serre et al., 2005; Cireşan et al., 2012) and thus reduces sensitivity to small temporal/spatial transformations (LeCun et al., 1990; Yang et al., 2010; Farabet et al., 2013). The output feature maps produced by a sequence of convolutional, nonlinear activation, and pooling layers are concatenated and passed to a fully connected layer in which every neuron is linearly connected to all activations in the previous layer. The output of the last fully connected layer is fed to a normalized exponential function (softmax classifier), which computes a probability distribution over $C$ possible classes (Peterson and Söderberg, 1989):
$$p_j = \frac{\exp(x_j)}{\sum_{k=1}^{C} \exp(x_k)}, \qquad j = 1, \ldots, C. \qquad (2)$$
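Equations (1) and (2), together with the max pooling described above, can be illustrated with a minimal NumPy sketch. Equation (1) is written here for a fully connected layer; in a convolutional layer the same weighted sum and rectification are applied to each local patch of the input maps. The function names are hypothetical, and the pooling keeps partial windows at the edges ("ceil" rounding), an assumption on our part that reproduces the map sizes listed in Table 1.

```python
import numpy as np


def dense_relu(x, w, b):
    """Eq. (1) for a fully connected layer: x_j = max(0, b_j + sum_i w_ij x_i).

    x: (n_in,) inputs, w: (n_in, n_out) weights, b: (n_out,) biases.
    """
    return np.maximum(0.0, b + x @ w)


def max_pool2d(a, kh, kw):
    """Max over non-overlapping kh x kw neighborhoods (stride equal to kernel size).

    Partial windows at the bottom/right edges are kept ("ceil" rounding)
    by padding with -inf before taking the maximum.
    """
    h = -(-a.shape[0] // kh)  # ceiling division
    w = -(-a.shape[1] // kw)
    padded = np.full((h * kh, w * kw), -np.inf)
    padded[:a.shape[0], :a.shape[1]] = a
    return padded.reshape(h, kh, w, kw).max(axis=(1, 3))


def softmax(x):
    """Eq. (2): p_j = exp(x_j) / sum_k exp(x_k); subtracting max(x) avoids overflow."""
    z = np.exp(x - np.max(x))
    return z / z.sum()
```

Subtracting the maximum before exponentiating leaves Equation (2) unchanged (the factor cancels in the ratio) but prevents overflow for large logits.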
The objective of the current study is to find the set of learnable free parameters that minimizes the misfit between the predicted scores $p$ and the ground-truth scores $g$ of $N$ instances using an L2-regularized multinomial logistic loss function:
$$J = -\frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{C} g_k^{n} \log p_k^{n} + \lambda \sum_i \lVert W_i \rVert_2^2. \qquad (3)$$
The regularization parameter $\lambda$ controls the trade-off between the data misfit and the model constraints, and $W_i$ denotes the model parameters (bias and weights) of the $i$th layer. The objective function is minimized using gradient descent, which computes the update of the network parameters at each iteration $t$ as a linear combination of the negative gradient of the loss function and the model update from the previous iteration $t-1$:
$$\Delta W_t = \mu\, \Delta W_{t-1} - \alpha\, \nabla J(W_t), \qquad (4)$$
in which the learning rate $\alpha$ weights the negative gradient, and the momentum $\mu$ controls how much of the previous update is carried into the current one. For a large training set, it is more efficient to use a stochastic approximation of the cost function computed on a small random subset (mini-batch) of the training set at each iteration.
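Equations (3) and (4) translate directly into code. The sketch below is illustrative only: the helper names are hypothetical, the gradient is assumed to be supplied by backpropagation (not shown), and the parameter update $W_{t+1} = W_t + \Delta W_t$ is the standard way Equation (4) is applied.

```python
import numpy as np


def regularized_loss(p, g, weights, lam):
    """Eq. (3): mean multinomial log loss plus an L2 penalty on all layer parameters.

    p, g    : (N, C) arrays of predicted probabilities and one-hot ground truth
    weights : list of parameter arrays W_i, one per layer
    """
    eps = 1e-12  # guards against log(0)
    data_term = -np.sum(g * np.log(p + eps)) / p.shape[0]
    penalty = lam * sum(np.sum(w ** 2) for w in weights)
    return data_term + penalty


def momentum_step(w, delta_prev, grad, lr, mu):
    """Eq. (4): delta_t = mu * delta_{t-1} - lr * grad, then w <- w + delta_t."""
    delta = mu * delta_prev - lr * grad
    return w + delta, delta
```

With the hyperparameters quoted in the Earthquake Detection section, a training step would call `momentum_step(w, delta, grad, lr=0.01, mu=0.9)`, where `grad` is computed on a mini-batch of 200 samples and includes the gradient $2\lambda W$ of the penalty term.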
Earthquake Detection
In the first step of our earthquake detection framework, we design a ConvNet to scan continuous seismic records and separate noise from earthquake signals. Trained exclusively on a comprehensive earthquake catalog from western Canada, the classifier learns to identify coherent, high-power earthquake signals across three-component seismograms. In this region, the Geological Survey of Canada reported 4914 earthquakes with local magnitudes $M_L$ ranging from 0.1 to 4.9 between January 2014 and December 2016 (Fig. 1; Visser et al., 2017). After deconvolving the instrument response from the ground velocity records, we apply a 2 Hz high-pass filter and resample the data to 20 Hz. The resulting waveforms are visually inspected, and low-quality data are discarded. The final dataset contains 13,949 earthquake–receiver pairs with epicentral distances up to 275 km (Fig. 1c) and a total of 148,000 noise records selected from time segments free of earthquake arrivals. To avoid overfitting due to the uneven number of observations per category, the earthquake signals are randomly repeated, time-shifted, and contaminated with Gaussian noise.
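The preprocessing and augmentation steps above can be sketched with SciPy. This is a hedged illustration, not the authors' code: the Butterworth design, filter order 4, zero-phase filtering, circular time shifts, and noise scaled to each window's standard deviation are all our assumptions, and the helper names are hypothetical. The input sampling rate is not stated in the text; 100 Hz is used only as an example.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import butter, resample_poly, sosfiltfilt


def preprocess(trace, fs_in, fs_out=20.0, f_hp=2.0, order=4):
    """High-pass filter at f_hp Hz, then resample from fs_in to fs_out Hz.

    Assumes the instrument response has already been removed. The Butterworth
    design, filter order, and zero-phase filtering are our assumptions.
    """
    sos = butter(order, f_hp, btype="highpass", fs=fs_in, output="sos")
    filtered = sosfiltfilt(sos, trace)
    ratio = Fraction(fs_out / fs_in).limit_denominator(1000)
    # resample_poly applies its own anti-aliasing low-pass filter
    return resample_poly(filtered, ratio.numerator, ratio.denominator)


def augment(windows, n_out, max_shift, noise_level, rng):
    """Oversample a class of (n, 3, n_samples) windows to n_out examples.

    Each example is a randomly chosen window, circularly shifted in time by up
    to +/- max_shift samples, plus Gaussian noise with a standard deviation of
    noise_level times the window's standard deviation.
    """
    picks = rng.integers(0, len(windows), size=n_out)
    out = np.empty((n_out,) + windows.shape[1:])
    for k, i in enumerate(picks):
        shifted = np.roll(windows[i], rng.integers(-max_shift, max_shift + 1), axis=-1)
        out[k] = shifted + noise_level * shifted.std() * rng.standard_normal(shifted.shape)
    return out
```

For example, `preprocess(x, 100.0)` on a 60-s, 100 Hz record returns 1200 samples at 20 Hz. In practice, a time shift would more naturally be taken by cutting a shifted window from the longer continuous record; `np.roll` keeps the sketch self-contained.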
Unlike previous studies (Perol et al., 2018; Ross et al., 2018), which perform the classification in the time domain, the current model learns the general characteristics of earthquake and nonearthquake signals in the time–frequency domain. Because seismic event and noise recordings have intrinsically different spectral contents, the performance of the network can be further improved by learning from spectro-temporal representations of the waveform data (Wang and Teng, 1995; Vaezi and Van der Baan, 2015). The input layer of the ConvNet consists of spectrograms of 100-s-long data segments computed using the short-time Fourier transform (STFT; Portnoff, 1980). The spectrograms are computed with a moving Hann (Hanning) window of 40 samples and 50% overlap between successive windows (Fig. 2). Each spectrogram is then normalized by its maximum spectral value and therefore becomes independent of earthquake magnitude.
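The spectrogram settings above can be sketched with `scipy.signal.spectrogram`. A 100-s window at 20 Hz holds 2000 samples; a 40-sample (2 s) Hann window with a 20-sample hop then yields (2000 − 20) / 20 = 99 time frames. The FFT length of 64 (zero padding) is our assumption: it gives 33 frequency bins, which together with the 99 frames matches the 33 × 99 input maps in Table 1. Normalizing each component separately, and using power rather than amplitude, are further assumptions.

```python
import numpy as np
from scipy.signal import spectrogram

FS = 20.0           # sampling rate after resampling (Hz)
WIN_SECONDS = 100   # length of each detection window (s)


def three_component_spectrogram(window, nfft=64):
    """STFT power spectrograms of a (3, n_samples) window, each scaled to a maximum of 1.

    Hann window of 40 samples (2 s) with 50% overlap (20 samples), as in the text.
    nfft=64 (zero padding) and per-component normalization are assumptions;
    nfft=64 gives 33 frequency bins, matching the 33 x 99 input maps of Table 1.
    """
    maps = []
    for component in window:
        f, t, sxx = spectrogram(component, fs=FS, window="hann", nperseg=40,
                                noverlap=20, nfft=nfft)
        maps.append(sxx / sxx.max())
    return f, t, np.stack(maps)
```

For a (3, 2000) window, the function returns maps of shape (3, 33, 99), with frequencies from 0 to the 10 Hz Nyquist frequency.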
The architecture of the earthquake detection ConvNet consists of a sequence of four convolutional layers, each followed by a pooling layer (Table 1). The supervised classifier is trained on a random subset containing 75% of the spectrograms (training set), and its performance is evaluated in terms of the detection accuracy on the remaining 25% of the observations (validation set). Training runs for 50,000 iterations with a trade-off parameter $\lambda = 10^{-4}$, an initial learning rate of 0.01 decaying to 0.005, and a fixed momentum of 0.9. Each iteration uses a mini-batch of 200 samples, and training takes approximately 4 hrs on an Intel Core i7 CPU (4 GHz).
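The feature-map sizes in Table 1 (and in Table 2 for the phase identification network) follow from simple arithmetic: each unpadded, stride-1 convolution shrinks a dimension by kernel − 1, and each pooling layer divides it by the stride, rounding up. The rounding-up rule is our inference (it is what reproduces the published sizes, e.g., 93 → 47), and the helper names are hypothetical. The flattened size of the last pooling output, 16 × 5 × 4 = 320 and 16 × 4 × 3 = 192, coincides with the size listed for the first fully connected layer in each table.

```python
import math


def conv_out(size, kernel):
    """'Valid' convolution: no padding, stride 1."""
    return size - kernel + 1


def pool_out(size, kernel, stride):
    """Pooling output size with ceil rounding (partial edge windows kept)."""
    return math.ceil((size - kernel) / stride) + 1


def trace_shapes(h, w, layers, n_maps=16):
    """Print the feature-map size after each layer and return the flattened size."""
    for kind, (kh, kw), (sh, sw) in layers:
        if kind == "conv":
            h, w = conv_out(h, kh), conv_out(w, kw)
        else:
            h, w = pool_out(h, kh, sh), pool_out(w, kw, sw)
        print(f"{kind:5s} {kh}x{kw} -> {n_maps} maps of {h} x {w}")
    return n_maps * h * w


# Table 1 (event detection), input 3 maps of 33 x 99
TABLE1 = [("conv", (5, 7), (1, 1)), ("pool", (1, 2), (1, 2)),
          ("conv", (5, 5), (1, 1)), ("pool", (1, 2), (1, 2)),
          ("conv", (3, 3), (1, 1)), ("pool", (2, 2), (2, 2)),
          ("conv", (3, 3), (1, 1)), ("pool", (2, 2), (2, 2))]
# Table 2 (phase identification), input 3 maps of 80 x 101
TABLE2 = [("conv", (3, 5), (1, 1)), ("pool", (2, 2), (2, 2))] * 4
```

Calling `trace_shapes(33, 99, TABLE1)` prints the sequence 29 × 93, 29 × 47, 25 × 43, 25 × 22, 23 × 20, 12 × 10, 10 × 8, 5 × 4 and returns 320.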
Phase Identification
The goal of the next experiment is to design a secondary classifier, using the existing earthquake catalog, that separates different body-wave phases. Robust measurements of phase arrivals can be achieved by analyzing nonstationary, multicomponent seismic signals with the wavelet transform (WT) (Zhang et al., 2003; Ahmed et al., 2007; Galiana-Merino et al., 2008). Compared with the STFT spectrogram, the WT has higher temporal resolution, but its energy is spread out along the frequency axis. The synchrosqueezing wavelet transform (SWT) is a relatively recent time–frequency analysis technique that combines the conventional WT with a frequency reassignment method to sharpen time–frequency localization (Daubechies and Maes, 1996; Daubechies et al., 2011). Assuming that the energy of a seismic event is concentrated in a few high-amplitude wavelet coefficients, Mousavi et al. (2016) introduced an SWT-based technique that simultaneously suppresses nonstationary random noise and detects the onset times of weak microseismic events using a characteristic function of the thresholded wavelet coefficients.
In this step, we train a separate network on a dataset of high-resolution wavelet power spectra of major seismic arrivals picked by expert analysts. The new dataset consists of 11,500 P-wave, 11,500 S-wave, and 45,000 noise windows, each 5 s long. Windows containing both P and S waves are repeated and centered separately on the arrival time of each phase. We apply the same data augmentation strategy described in the Earthquake Detection section to ensure an equal number of observations in each class. We then use a Morlet wavelet, the product of a complex exponential and a Gaussian envelope, as the basis function to compute the SWT of the windowed data (Fig. 3). The final dataset is divided into training and validation sets containing time–frequency representations of the waveform data, normalized by the maximum wavelet coefficient across all three components. The architecture and model parameters of the phase identification network are similar to those of the earthquake detection network. However, the input layer consists of three maps of 80 × 101 neurons, and all convolutional layers have a fixed filter size of 3 × 5. The pooling layers have local receptive fields of size 2 × 2 with a constant stride of 2 in both dimensions, the first fully connected layer has 192 neurons, and the output layer is a vector of 3 class labels (Table 2).
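The SWT builds on a continuous wavelet transform. As a partial illustration only, the sketch below computes the underlying Morlet CWT (it does not implement the synchrosqueezing, or frequency reassignment, step) and the joint three-component normalization described above. The central frequency w0 = 6, the 1/s scale normalization, the use of magnitude rather than power, and the frequency grid are our assumptions; the text does not specify them, and the function names are hypothetical.

```python
import numpy as np


def morlet_cwt(x, fs, freqs, w0=6.0):
    """Continuous wavelet transform of x with an analytic Morlet wavelet, via the FFT.

    Returns complex coefficients of shape (len(freqs), len(x)). The scale s is chosen
    so that the wavelet's center frequency w0 / (2*pi*s) equals each requested frequency.
    With 1/s (L1) normalization, a sinusoid's response peaks at its own frequency.
    NOTE: plain CWT only; the synchrosqueezing (reassignment) step is not included.
    """
    n = len(x)
    omega = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)  # angular frequency of each FFT bin
    x_hat = np.fft.fft(x)
    coeffs = np.empty((len(freqs), n), dtype=complex)
    for row, f in enumerate(freqs):
        s = w0 / (2 * np.pi * f)
        # Fourier transform of the scaled Morlet: Gaussian centered on w0 / s,
        # positive frequencies only (constant factors omitted)
        psi_hat = np.exp(-0.5 * (s * omega - w0) ** 2) * (omega > 0)
        coeffs[row] = np.fft.ifft(x_hat * psi_hat)
    return coeffs


def normalized_magnitude(window, fs, freqs):
    """|CWT| of each component of a (3, n_samples) window, divided by the maximum over all three."""
    mags = np.stack([np.abs(morlet_cwt(c, fs, freqs)) for c in window])
    return mags / mags.max()
```

With 80 frequencies and a 101-sample window at 20 Hz, the output has the 80 × 101 shape of the Table 2 input maps, but whether the authors used such a grid is unknown. Normalizing by the maximum over all three components, rather than per component, preserves the relative amplitudes of the horizontal and vertical motions, which carry information about the phase type.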
Figure 1. (a) Distribution of earthquakes (circles) and seismic stations (white triangles) used in this study. Warm and cold colors correspond to shallow and deep earthquakes, respectively, and circle sizes represent earthquake magnitudes. (b) Histogram of event local magnitudes, with mean and standard deviation of 1.8 and 0.55, respectively. (c) Distribution of source–receiver distances using a 20 km bin size.
RESULT
We evaluate the performance of the proposed framework on validation datasets selected independently of the training sets. While the solver minimizes the objective function using the training samples (learning curves in Fig. 4a,b), the validation data are used to measure the generalization capability of the trained network, that is, its ability to correctly classify new samples never seen during training (Table 3). The learning curves show that the earthquake detection network converges rapidly, reaching a stable solution after 20,000 iterations, whereas the objective function of the seismic phase identification model takes about 40,000 iterations to flatten. The neural network produces a vector of confidence scores for all class labels, indicating the probability that a particular input belongs to a given category. The classification accuracy is computed every 5000 iterations, and the model with the highest accuracy on the validation set is selected as the final classifier (Fig. S1, available in the electronic supplement to this article).
For earthquake detection, the network accuracy reaches 99.8% on the validation set, and only 40 earthquake windows are mislabeled. This is comparable to the training set, which yields accuracies of 99.7% and 99.95% for the earthquake and nonearthquake categories, respectively (see Table 3). The phase identification network, on the other hand, yields a slightly lower average accuracy. The second classifier gives an average error of 0.7% for the entire training set and predicts correct labels for approximately 98.7% and 98.4% of the P- and S-wave windows in the validation set, respectively.
Table 1
Architecture of Convolutional Network (ConvNet) Used for Seismic Event Detection
Layer | Type | Kernel Size | Stride | Output
1 | Input data |  |  | 3 maps of 33 × 99 neurons
2 | ConvReLU | 5 × 7 |  | 16 maps of 29 × 93 neurons
3 | Maxpool | 1 × 2 | (1, 2) | 16 maps of 29 × 47 neurons
4 | ConvReLU | 5 × 5 |  | 16 maps of 25 × 43 neurons
5 | Maxpool | 1 × 2 | (1, 2) | 16 maps of 25 × 22 neurons
6 | ConvReLU | 3 × 3 |  | 16 maps of 23 × 20 neurons
7 | Maxpool | 2 × 2 | (2, 2) | 16 maps of 12 × 10 neurons
8 | ConvReLU | 3 × 3 |  | 16 maps of 10 × 8 neurons
9 | Maxpool | 2 × 2 | (2, 2) | 16 maps of 5 × 4 neurons
10 | FC |  |  | 320 neurons
11 | FC |  |  | 2 neurons
ConvReLU, a convolutional layer followed by a rectified linear unit; Maxpool, a max pooling layer; FC, a fully connected layer.
Figure 2. Three-component waveforms (station CN.BMBC, channels BHE, BHN, and BHZ) of (a) a magnitude 2.4 earthquake recorded at 164 km distance on 25 February 2014 and (b) a postevent noise window recorded at the same station. (c–e) Short-time Fourier transforms of the earthquake records in (a), shown as frequency (Hz) versus time. (f–h) Same as (c–e) but computed for the noise window in (b).
We compare the performance of the proposed model with that of a model trained directly in the time domain on the same dataset. Although the difference in accuracy between the two earthquake detection networks is small, our model produces only about 40% as many misclassified samples as the time-domain classifier (see the classification accuracies in Table 3 and Table S1 and the confusion matrices in Table 4 and Table S2). The difference becomes more pronounced for phase identification: on average, the model trained on time–frequency features improves the accuracy by up to 3.2%. Whereas the time-domain model misidentifies 580 P phases and 588 S phases in the validation set, our model achieves a higher recall, missing only 137 P and 79 S waves (see the confusion matrices in Table 5 and Table S3).
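The per-class counts quoted above can be recomputed from Table 5. The sketch below reads the table with rows as true labels and columns as predicted labels, which is the reading implied by the 137 missed P and 79 missed S windows quoted in the text. Under this reading, the validation-set values in Table 3 (98.7%, 98.4%, and 99.4%) coincide with the column-wise ratios (precision), while the row-wise ratios (recall) are 98.81%, 99.31%, and 98.45%.

```python
import numpy as np

LABELS = ["P-wave", "S-wave", "Noise"]
# Table 5: rows = true labels, columns = predicted labels (our reading, see above)
CM = np.array([[11363,    90,    47],
               [   58, 11421,    21],
               [   87,    91, 11322]])

missed = CM.sum(axis=1) - np.diag(CM)     # off-diagonal entries in each true-label row
recall = np.diag(CM) / CM.sum(axis=1)     # fraction of each true class recovered
precision = np.diag(CM) / CM.sum(axis=0)  # fraction of each predicted label that is correct

for name, m, r, p in zip(LABELS, missed, recall, precision):
    print(f"{name:7s} missed={int(m):4d} recall={100 * r:5.2f}% precision={100 * p:5.2f}%")
```

The same two lines of arithmetic applied to Table 4 give the 40 missed earthquake windows quoted in the Result section.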
The earthquake detection and phase identification results indicate that classification performance depends on the minimum accepted confidence score (Fig. 4). We define a hard-thresholding function that discards candidate events whose scores are below a given probability threshold (LeCun et al., 1990). The probability threshold must be selected carefully: a low threshold can increase the number of false detections, whereas a high threshold may erroneously reject true events. The number of earthquakes as a function of detection score increases gradually for confidence scores greater than 0.6, with a sudden rise at a detection score of 0.95 (Fig. 4c). The phase separation results show the same behavior, with nearly 98.5% of correctly predicted examples falling in the last probability bin (Fig. 4d).
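The hard-thresholding rule is a one-line filter on the softmax output. In this minimal sketch the function name is hypothetical, and placing the earthquake class in column 0 is an assumption about the output layout.

```python
import numpy as np


def hard_threshold(scores, threshold=0.95, event_class=0):
    """Indices of windows whose probability for event_class is at least threshold.

    scores: (n_windows, n_classes) softmax outputs. Placing the earthquake class
    in column 0 is an assumption of this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    return np.flatnonzero(scores[:, event_class] >= threshold)
```

Sweeping `threshold` over a labeled validation set and counting accepted true and false events is one way to choose the operating point; the application in northeast British Columbia uses 0.95 for the first classifier.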
Induced Seismicity in Northeast British Columbia
After training and validation, the ConvNet models were used to monitor seismic activity in Fort St. John and near Dawson Creek, an area with an increased number of potentially induced earthquakes in recent years (Horner et al., 1994; British Columbia Oil and Gas Commission, 2014; Visser et al., 2017). The networks were applied to continuous records from an array of nine broadband stations, deployed in August 2017, that were included in neither the training nor the validation set. For each station, the daily records are first divided into 100-s-long windows (with 50% overlap between windows), and the STFT representations of the windowed data are computed. If the first classifier detects a seismic event at a probability threshold of 0.95, the SWT of the corresponding time window is computed and scanned by the second network to identify P and S waves. The earthquake detection step takes 5 s for a three-component daily recording, 30 times faster than the STA/LTA technique. To assess the detection performance and generalization power of the model, all detected events are visually inspected. Only 13 events turn out to be false detections (2% of the total number of detected events), and 9 of these are eventually eliminated by the phase identification classifier. In comparison, the time-domain model returns 25% more false positives than its time–frequency counterpart. The phase identification network is capable of detecting multiple P- and S-wave arrivals if more than one event occurs within a time window (Fig. 5 and Fig. S2). Our technique finds 652 events with epicentral distances ranging from 1 to 28 km on 5 September 2017 (Fig. 6). The local magnitudes of the detected earthquakes vary from −0.5 to 1.1, and nearly 30% of the events fall below the minimum magnitude in the training catalog ($M_L$ 0.1).
Figure 3. (a) Three-component waveforms (station CN.BMBC, channels BHE, BHN, and BHZ) of the earthquake shown in Figure 2a. Red and blue dashed lines indicate the P- and S-wave arrival times, respectively. (b–d) The normalized synchrosqueezing wavelet transform (SWT) of a 5-s-long window centered on the P-wave arrival. (e–g) The normalized SWT of a 5-s-long window centered on the S-wave arrival.
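The two-stage scan described above can be summarized in a short sketch. `detect` and `identify` are hypothetical callables standing in for the trained STFT-based and SWT-based networks, and windows that would extend past the end of the record are simply skipped, which is a simplification on our part. For a full day at 20 Hz, `window_starts` yields 1727 overlapping 100-s windows per station.

```python
import numpy as np

FS = 20.0            # sampling rate (Hz)
WIN = int(100 * FS)  # 100-s detection window, in samples
HOP = WIN // 2       # 50% overlap between successive windows


def window_starts(n_samples, win=WIN, hop=HOP):
    """Start indices of overlapping windows that fit entirely inside the record."""
    return np.arange(0, n_samples - win + 1, hop)


def scan_day(record, detect, identify, threshold=0.95):
    """Two-stage scan of a (3, n_samples) record.

    detect(window)   -> earthquake probability (stage 1, STFT-based network)
    identify(window) -> phase picks (stage 2, SWT-based network)
    The more expensive second stage runs only on windows passing the threshold.
    Returns a list of (window start time in s, picks) tuples.
    """
    results = []
    for start in window_starts(record.shape[-1]):
        window = record[:, start:start + WIN]
        if detect(window) >= threshold:
            results.append((start / FS, identify(window)))
    return results
```

Gating the second stage on the first keeps the costly wavelet transforms limited to the small fraction of windows that contain candidate events.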
The automatic detection results corroborate the observations from manual picking and show significant temporal variations (Fig. 6). The ConvNet model detects 20% more earthquakes than previously reported by an expert analyst, although P waves are identified for only 40% of the detected events. A possible explanation is the difficulty of separating P and S phases for earthquakes recorded at short distances. For the remaining 60%, the P-wave arrivals fall mostly within the S-wave windows because the differential times between the two phases are less than 1.5 s (nearly 80% of the detected events are recorded at epicentral distances ≤10 km). Notably, the time–frequency phase classifier identifies 52 more P-wave windows than the model trained in the time domain. The ConvNet models trained in the time–frequency and time domains miss 31 and 39 of the events identified by a human analyst, respectively. A lower probability threshold allows the network to detect smaller events, but it may also produce more false detections.
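The link between short epicentral distances and short P–S separation can be checked with a back-of-the-envelope calculation: in a uniform medium, the S−P time grows linearly with distance, t_S − t_P = d (1/V_S − 1/V_P). The sketch below uses V_P = 6.0 km/s and V_S = 3.5 km/s, generic crustal values assumed for illustration rather than the velocity model of this study, and it treats epicentral distance as the ray length (i.e., it ignores focal depth).

```python
def s_minus_p(distance_km, vp=6.0, vs=3.5):
    """S-P differential time (s) for a straight ray in a uniform medium."""
    return distance_km * (1.0 / vs - 1.0 / vp)


def distance_for_sp(sp_seconds, vp=6.0, vs=3.5):
    """Inverse relation: distance (km) at which the S-P time equals sp_seconds."""
    return sp_seconds / (1.0 / vs - 1.0 / vp)
```

Under these assumptions, `s_minus_p(10.0)` is about 1.19 s and `distance_for_sp(1.5)` is 12.6 km, consistent with S−P times below 1.5 s for most events recorded within 10 km, so both phases easily share a single 5-s window.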
DISCUSSION
Most existing earthquake detection techniques are poorly suited to identifying low-magnitude events, and their application in areas with no record of seismicity is limited by the absence of template earthquake waveforms (Yoon et al., 2015). The main goal of this research is to present a detection framework that is independent of earthquake magnitude, epicentral distance, and noise level. We propose a method that combines multiresolution time–frequency analysis with advances in deep learning to achieve robust earthquake detection. Compared with time-domain analysis, applying the SWT enables the network to learn the spectral structures of noise and seismic signals, which was found to increase the accuracy of the phase identification process by up to 3%. However, computing the WT of continuous waveforms is computationally expensive and may not scale efficiently to large-array data. To reduce processing time, the phase identification network scans only the earthquake time windows detected by the preliminary ConvNet using the STFT-based spectrograms.
Figure 4. Learning curves of the (a) earthquake detection and (b) phase identification networks, showing the reduction in objective function value with iteration (0–50k) for the training and validation sets. Distributions of correctly labeled (c) earthquakes and (d) seismic phases as a function of detection score (probability from 0.5 to 1.0; counts on a logarithmic axis).
Table 2
Architecture of ConvNet Used for Seismic Phase Identification
Layer | Type | Kernel Size | Stride | Output
1 | Input data |  |  | 3 maps of 80 × 101 neurons
2 | ConvReLU | 3 × 5 |  | 16 maps of 78 × 97 neurons
3 | Maxpool | 2 × 2 | (2, 2) | 16 maps of 39 × 49 neurons
4 | ConvReLU | 3 × 5 |  | 16 maps of 37 × 45 neurons
5 | Maxpool | 2 × 2 | (2, 2) | 16 maps of 19 × 23 neurons
6 | ConvReLU | 3 × 5 |  | 16 maps of 17 × 19 neurons
7 | Maxpool | 2 × 2 | (2, 2) | 16 maps of 9 × 10 neurons
8 | ConvReLU | 3 × 5 |  | 16 maps of 7 × 6 neurons
9 | Maxpool | 2 × 2 | (2, 2) | 16 maps of 4 × 3 neurons
10 | FC |  |  | 192 neurons
11 | FC |  |  | 3 neurons
ConvReLU, a convolutional layer followed by a rectified linear unit; Maxpool, a max pooling layer; FC, a fully connected layer.
The learning ability of a neural network can vary with
changes in its structure (Wang and Teng, 1995). The sensitiv-
ity of the proposed method with respect to the network archi-
tecture was explored by (1) increasing the number of filters in
convolutional layers, and (2) replacing the pooling with strided
convolutions (Perol et al., 2018). We found no significant
variation in model performance in both cases, though the
Table 3
Classification Accuracies of the Earthquake Detection and
Seismic Phase Identification ConvNets Trained in the
TimeFrequency Domain
Training Set (%) Validation Set (%)
Earthquake detection accuracy
Seismic event 99.7 99.6
Noise 99.95 99.9
Phase identification accuracy
P-wave 99.1 98.7
S-wave 99.1 98.4
Noise 99.7 99.4
XL.MG03.HHE
XL.MG03.HHN
XL.MG03.HHZ
Detection score
0
1
0
5
10
Frequency (Hz)
0
5
10
Frequency (Hz)
0
5
10
Frequency (Hz)
P-wave
S-wave
02:26:27 02:26:47 02:27:07 02:27:27 02:27:47
02:26:27 02:26:47 02:27:07 02:27:27 02:27:47
XL.MG03.HHE
XL.MG03.HHN
XL.MG03.HHZ
(a)
(b)
(c)
(d)
(e)
0 0.2 0.4 0.6 0.8 1.0
Figure 5. (a) A 100-s-long time segment labeled as seismic event
by the earthquake detection network. The second network identi-
fies two separate events within this time segment using a 5-s-long
sliding window. Red and blue colors represent windows containing
Pand Swaves, respectively. Vertical red and blue lines mark the
P-andS-wave arrival times using their corresponding detection
scores, respectively. Dashed gray and black lines indicate the man-
ually picked Pand Swaves, respectively. (b) Continuous functions
of the P- (red) and S-wave (blue) detection scores. (ce) The SWT
of waveform records presented in (a).
Table 5
Validation Set Confusion Matrix Calculated for the Phase
Identification Network Trained Based on the
TimeFrequency Feature Maps
Predicted Labels
True labels
P-wave S-wave Noise
P-wave 11,363 90 47
S-wave 58 11,421 21
Noise 87 91 11,322
04
812
16
20
0
20
40
60
80
100 ConvNet
Manual
Number of events
Time (hour)
Figure 6. Distribution of detected earthquakes over the period
of one day, on 5 September 2017, in the Dawson Creek area.
Convolutional network (ConvNet) detects a total number of 652
events (red bars), which is 20% more than the manually picked
earthquakes (gray bars).
Table 4
Validation Set Confusion Matrix Calculated for the Earthquake Detection Network Trained Based on the Time–Frequency Feature Maps

                            Predicted Labels
True Labels          Seismic event      Noise
Seismic event               37,460         40
Noise                          142     36,882
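The per-class rates in Tables 3–5 can be recomputed from the validation confusion matrices. The NumPy sketch below is an illustration, not the authors' evaluation code: it takes the counts from Tables 4 and 5 (rows are true labels, columns are predicted labels) and returns row-normalized recall, column-normalized precision, and overall accuracy. The function name `per_class_metrics` is hypothetical, and which normalization Table 3 uses is not stated in this section.

```python
import numpy as np

# Validation confusion matrices copied from Tables 4 and 5
# (rows = true labels, columns = predicted labels).
EVENT_DETECTION = np.array([
    [37_460, 40],       # true seismic event
    [142, 36_882],      # true noise
])
PHASE_ID = np.array([
    [11_363, 90, 47],   # true P wave
    [58, 11_421, 21],   # true S wave
    [87, 91, 11_322],   # true noise
])


def per_class_metrics(cm):
    """Return per-class recall and precision and overall accuracy, all in %."""
    cm = np.asarray(cm, dtype=float)
    diag = np.diag(cm)
    recall = 100.0 * diag / cm.sum(axis=1)     # normalize each true-label row
    precision = 100.0 * diag / cm.sum(axis=0)  # normalize each predicted-label column
    accuracy = 100.0 * diag.sum() / cm.sum()
    return {"recall": recall, "precision": precision, "accuracy": float(accuracy)}


if __name__ == "__main__":
    for name, cm in (("event detection", EVENT_DETECTION),
                     ("phase identification", PHASE_ID)):
        m = per_class_metrics(cm)
        print(name)
        print("  recall (%):   ", np.round(m["recall"], 2))
        print("  precision (%):", np.round(m["precision"], 2))
        print("  accuracy (%): ", round(m["accuracy"], 2))
```

Rounded to one decimal place, the column-normalized values (99.6% and 99.9% for event and noise; 98.7%, 98.4%, and 99.4% for P, S, and noise) coincide with the validation column of Table 3, whereas the row-normalized values do not. This is an observation from the published counts, not a statement of how Table 3 was computed. The overall validation accuracies come out at about 99.76% for event detection and 98.86% for phase identification.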
Seismological Research Letters Volume XX, Number XX 2019 7
SRL Early Edition
Downloaded from https://pubs.geoscienceworld.org/ssa/srl/article-pdf/doi/10.1785/0220180308/4616095/srl-2018308.1.pdf
by Natural Resources Canada Library-Ottawa user
on 16 January 2019
training time rises dramatically as the number of free parameters increases (the former case). To avoid overfitting, a constant dropout rate of 10% was applied to the input and convolutional layers, which forces the network to learn more robust features by ignoring a random subset of neurons during training. An optimal dropout rate provides the best trade-off between training-data misfit and validation-set accuracy. The near-identical performance of the proposed model on the training and validation sets suggests that it generalizes well to examples it was not trained on, rather than merely fitting the earthquake catalog used for training.
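Dropout itself is simple to express. The following is a minimal NumPy sketch of inverted dropout at the 10% rate quoted above; it is not Mocha's implementation (the networks were trained with Mocha for Julia), and the function name and arguments are assumptions. During training each activation is zeroed with probability `rate` and the survivors are scaled by 1/(1 − rate), so the expected activation is unchanged and no rescaling is needed at inference.

```python
from typing import Optional

import numpy as np


def dropout(x: np.ndarray, rate: float = 0.1, training: bool = True,
            rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Inverted dropout (hypothetical helper, not the Mocha layer).

    During training, each activation is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate); at inference the input is returned
    unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= rate  # True for units that survive
    return np.where(keep, x / (1.0 - rate), 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_map = np.ones((64, 64))  # stand-in for a time-frequency input patch
    out = dropout(feature_map, rate=0.1, rng=rng)
    print("fraction dropped:", np.mean(out == 0.0))
    print("mean activation:", out.mean())
```

Applying it to a time–frequency feature map before a convolutional layer mirrors the placement described above; with `training=False` the input passes through unchanged.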
In addition to phase separation, the second classifier provides a means to estimate phase arrival times, which are required to determine earthquake locations and other source parameters. Ross et al. (2018) suggest that, when multiple successive windows identify the same phase, the location of the maximum detection score can be used to approximate that phase's arrival time (see Fig. 5b). The maximum difference between the manual picks and those obtained from the detection scores is on the order of one half of the sliding-window length (Fig. 7). The accuracy of automatic picking generally depends on the SNR, with weaker signals producing larger errors (Fig. S3). However, the noise reduction afforded by the WT could further reduce the picking error and possibly allow the earthquake location process to be automated (a topic beyond the scope of this article; we refer readers to Mousavi et al., 2016, for a more detailed discussion).
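The peak-picking idea can be sketched as follows. Several assumptions here are not specified in this section: windows are described by their start times, a run of consecutive windows whose score is at or above a hypothetical threshold of 0.5 counts as one detection, the pick is placed at the center of the highest-scoring window, and the 0.5-s step is arbitrary; only the 5-s window length comes from the Fig. 5 caption. `pick_arrivals` and `pick_error_stats` are hypothetical names; the latter computes the mean and standard deviation of |t_manual − t_predicted|, the quantities summarized in Fig. 7.

```python
import numpy as np

WINDOW_LEN = 5.0  # s; sliding-window length quoted in the Fig. 5 caption
STEP = 0.5        # s; hypothetical window step, not given in this section
THRESHOLD = 0.5   # hypothetical detection-score threshold


def pick_arrivals(win_starts, scores, threshold=THRESHOLD, window_len=WINDOW_LEN):
    """Return one arrival-time estimate per run of consecutive windows whose
    detection score is >= threshold. Each pick is the center of the
    highest-scoring window in its run (an assumed convention)."""
    win_starts = np.asarray(win_starts, dtype=float)
    scores = np.asarray(scores, dtype=float)
    above = scores >= threshold
    picks = []
    i = 0
    while i < len(scores):
        if not above[i]:
            i += 1
            continue
        j = i
        while j < len(scores) and above[j]:  # extend the run of detections
            j += 1
        best = i + int(np.argmax(scores[i:j]))  # window with the maximum score
        picks.append(win_starts[best] + window_len / 2.0)
        i = j
    return np.array(picks)


def pick_error_stats(manual, predicted):
    """Mean and standard deviation of |t_manual - t_predicted| (cf. Fig. 7)."""
    err = np.abs(np.asarray(manual, dtype=float) - np.asarray(predicted, dtype=float))
    return float(err.mean()), float(err.std())


if __name__ == "__main__":
    # Synthetic P-wave score curve peaking in the window that starts at 10 s.
    starts = np.arange(0.0, 20.0, STEP)
    scores = np.exp(-0.5 * ((starts - 10.0) / 1.0) ** 2)
    print(pick_arrivals(starts, scores))  # prints [12.5]
```

Because each pick is tied to a window rather than to a sample, the error under this convention can approach half the window length (2.5 s for a 5-s window) when the onset lies near a window edge, which is in line with the bound noted above.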
CONCLUSION
Building on recent advances in deep learning, we propose a ConvNet algorithm for robust detection of seismic events that addresses the shortcomings of existing methods. Our technique relies on the detectability of earthquake signals in the time–frequency domain and takes advantage of the spectral characteristics of phase arrivals to separate P and S waves. Compared with manual detection, our technique can identify 20% more events while substantially reducing processing time. In addition to event detection, this approach provides initial estimates of phase onset times, which can be used to determine preliminary earthquake locations. Joint application of the event detection and phase identification ConvNets yields highly improved accuracy and reliable rejection of false detections. The proposed approach could potentially be used to enhance real-time monitoring of both natural and induced seismicity.
DATA AND RESOURCES
The regional earthquake catalog used in this study was compiled by the Geological Survey of Canada (http://publications.gc.ca/site/eng/9.856883/publication.html, last accessed September 2018). Waveform data can be collected from the Incorporated Research Institutions for Seismology (IRIS) Data Management Center at https://ds.iris.edu/ds/nodes/dmc/ (last accessed September 2018). We used Mocha, a deep-learning framework for Julia, to train the convolutional networks (the latest version of Mocha is available at https://mochajl.readthedocs.io/en/latest/, last accessed September 2018). Some figures were generated using the Generic Mapping Tools (GMT) v.5.4.2 (www.soest.hawaii.edu/gmt, last accessed September 2018; Wessel and Smith, 1998).
ACKNOWLEDGMENTS
The authors wish to thank Guest Editor Karianne Bergen and
two anonymous reviewers for useful comments and suggestions
that helped improve the quality of this article. This is Natural
Resources Canada (NRCan) Contribution Number 20180263.
REFERENCES
Ahmed, A., M. Sharma, and A. Sharma (2007). Wavelet based automatic phase picking algorithm for 3-component broadband seismological data, J. Seismol. Earthq. Eng. 9, nos. 1/2, 15–24.
Allen, R. V. (1978). Automatic earthquake recognition and timing from single traces, Bull. Seismol. Soc. Am. 68, no. 5, 1521–1532.
Araya-Polo, M., J. Jennings, A. Adler, and T. Dahlke (2018). Deep-learning tomography, The Leading Edge 37, no. 1, 58–66.
Atkinson, G. M., D. W. Eaton, H. Ghofrani, D. Walker, B. Cheadle, R. Schultz, R. Shcherbakov, K. Tiampo, J. Gu, R. M. Harrington, et al. (2016). Hydraulic fracturing and seismicity in the western Canada sedimentary basin, Seismol. Res. Lett. 87, no. 3, 631–647.
Barrett, S. A., and G. C. Beroza (2014). An empirical approach to subspace detection, Seismol. Res. Lett. 85, no. 3, 594–600.
Bengio, Y., A. Courville, and P. Vincent (2013). Representation learning: A review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell. 35, no. 8, 1798–1828.
Bergen, K., C. Yoon, and G. C. Beroza (2016). Scalable similarity search in seismology: A new approach to large-scale earthquake detection, International Conf. on Similarity Search and Applications, Tokyo, Japan, 24–26 October 2016, 301–308.
British Columbia Oil and Gas Commission (2014). Investigation of observed seismicity in the Montney Trend, available at http://www.bcogc.ca/investigationobservedseismicitymontneytrend (last accessed December 2018).
[Figure 7 graphic: histograms of |t_manual − t_predicted| (s, 0 to 3.5) versus Normalized frequency (%); (a) P waves: Mean = 0.6 s, Standard deviation = 0.5 s; (b) S waves: Mean = 0.8 s, Standard deviation = 0.7 s.]
Figure 7. Distributions of errors between manual time picks and those predicted from the detection scores for (a) P and (b) S waves. The average error in predicted P-wave times is smaller than that for S waves.
Caffagni, E., D. W. Eaton, J. P. Jones, and M. van der Baan (2016). Detection and analysis of microseismic events using a Matched Filtering Algorithm (MFA), Geophys. J. Int. 206, no. 1, 644–658.
Cireşan, D., U. Meier, J. Masci, and J. Schmidhuber (2012). Multi-column deep neural network for traffic sign classification, Neural Networks 32, 333–338.
Collobert, R., J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa (2011). Natural language processing (almost) from scratch, J. Mach. Learn. Res. 12, 2493–2537.
Dahl, G. E., T. N. Sainath, and G. E. Hinton (2013). Improving deep neural networks for LVCSR using rectified linear units and dropout, 2013 IEEE International Conf. on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada, 26–31 May 2013, 8609–8613.
Dahl, G. E., D. Yu, L. Deng, and A. Acero (2012). Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Trans. Audio Speech Lang. Process. 20, no. 1, 30–42.
Daubechies, I., and S. Maes (1996). A nonlinear squeezing of the continuous wavelet transform based on auditory nerve models, in Wavelets in Medicine and Biology, A. Aldroubi and M. Unser (Editors), CRC Press, Boca Raton, Florida, 527–546.
Daubechies, I., J. Lu, and H.-T. Wu (2011). Synchrosqueezed wavelet transforms: An empirical mode decomposition-like tool, Appl. Comput. Harmon. Anal. 30, no. 2, 243–261.
Eisner, L., T. Fischer, and J. H. Le Calvez (2006). Detection of repeated hydraulic fracturing (out-of-zone growth) by microseismic monitoring, The Leading Edge 25, no. 5, 548–554.
Farabet, C., C. Couprie, L. Najman, and Y. LeCun (2013). Learning hierarchical features for scene labeling, IEEE Trans. Pattern Anal. Mach. Intell. 35, no. 8, 1915–1929.
Farahbod, A. M., H. Kao, D. M. Walker, and J. F. Cassidy (2015). Investigation of regional seismicity before and after hydraulic fracturing in the Horn River basin, northeast British Columbia, Can. J. Earth Sci. 52, no. 2, 112–122.
Galiana-Merino, J. J., J. L. Rosa-Herranz, and S. Parolai (2008). Seismic P phase picking using a Kurtosis-based criterion in the stationary wavelet domain, IEEE Trans. Geosci. Remote Sens. 46, no. 11, 3815–3826.
Gentili, S., and A. Michelini (2006). Automatic picking of P and S phases using a neural tree, J. Seismol. 10, no. 1, 39–63.
Gibbons, S. J., and F. Ringdal (2006). The detection of low magnitude seismic events using array-based waveform correlation, Geophys. J. Int. 165, no. 1, 149–166.
Glorot, X., A. Bordes, and Y. Bengio (2011). Deep sparse rectifier neural networks, Proceedings of the Fourteenth International Conf. on Artificial Intelligence and Statistics, Fort Lauderdale, Florida, 11–13 April 2011, 315–323.
Hinton, G., L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, et al. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Process. Mag. 29, no. 6, 82–97.
Horner, R. B., J. E. Barclay, and J. M. MacRae (1994). Earthquakes and hydrocarbon production in the Fort St. John area of northeastern British Columbia, Can. J. Explor. Geophys. 30, no. 1, 39–50.
Krizhevsky, A., I. Sutskever, and G. E. Hinton (2012). ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, Lake Tahoe, Nevada, 3–6 December 2012, 1097–1105.
LeCun, Y., Y. Bengio, and G. Hinton (2015). Deep learning, Nature 521, no. 7553, 436.
LeCun, Y., B. E. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. E. Hubbard, and L. D. Jackel (1990). Handwritten digit recognition with a back-propagation network, Advances in Neural Information Processing Systems, Denver, Colorado, 27–30 November 1989, 396–404.
LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner (1998). Gradient-based learning applied to document recognition, Proc. IEEE 86, no. 11, 2278–2324.
Maity, D., F. Aminzadeh, and M. Karrenbach (2014). Novel hybrid artificial neural network based autopicking workflow for passive seismic data, Geophys. Prospect. 62, no. 4, 834–847.
Mikolov, T., I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013). Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, Lake Tahoe, Nevada, 5–10 December 2013, 3111–3119.
Moseley, B., A. Markham, and T. Nissen-Meyer (2018). Fast approximate simulation of seismic waves with deep learning, available at https://arxiv.org/abs/1807.06873 (last accessed December 2018).
Mousavi, S. M., C. A. Langston, and S. P. Horton (2016). Automatic microseismic denoising and onset detection using the synchrosqueezed continuous wavelet transform, Geophysics 81, no. 4, V341–V355.
Nair, V., and G. E. Hinton (2010). Rectified linear units improve restricted Boltzmann machines, Proc. of the 27th International Conf. on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010, 807–814.
Najafabadi, M. M., F. Villanustre, T. M. Khoshgoftaar, N. Seliya, R. Wald, and E. Muharemagic (2015). Deep learning applications and challenges in big data analytics, J. Big Data 2, no. 1, 1.
Perol, T., M. Gharbi, and M. Denolle (2018). Convolutional neural network for earthquake detection and location, Sci. Adv. 4, no. 2, e1700578.
Peterson, C., and B. Söderberg (1989). A new method for mapping optimization problems onto neural networks, Int. J. Neural Syst. 1, no. 1, 3–22.
Portnoff, M. (1980). Time–frequency representation of digital signals and systems based on short-time Fourier analysis, IEEE Trans. Acoust. Speech Signal Process. 28, no. 1, 55–69.
Ross, Z. E., M. A. Meier, E. Hauksson, and T. H. Heaton (2018). Generalized seismic phase detection with deep learning, available at https://arxiv.org/abs/1805.01075 (last accessed December 2018).
Rubinstein, J. L., and A. B. Mahani (2015). Myths and facts on wastewater injection, hydraulic fracturing, enhanced oil recovery, and induced seismicity, Seismol. Res. Lett. 86, no. 4, 1060–1067.
Schultz, R., V. Stern, and Y. J. Gu (2014). An investigation of seismicity clustered near the Cordel Field, west central Alberta, and its relation to a nearby disposal well, J. Geophys. Res. 119, no. 4, 3410–3423.
Serre, T., L. Wolf, and T. Poggio (2005). Object recognition with features inspired by visual cortex, IEEE Computer Society Conf. on Computer Vision and Pattern Recognition 2005 (CVPR 2005), Vol. 2, San Diego, California, 20–25 June 2005, 994–1000.
Skoumal, R. J., M. R. Brudzinski, and B. S. Currie (2015). Earthquakes induced by hydraulic fracturing in Poland Township, Ohio, Bull. Seismol. Soc. Am. 105, no. 1, 189–197.
Vaezi, Y., and M. Van der Baan (2015). Comparison of the STA/LTA and power spectral density methods for microseismic event detection, Geophys. J. Int. 203, no. 3, 1896–1908.
Visser, R., B. Smith, H. Kao, A. Babaie Mahani, J. Hutchinson, and J. E. McKay (2017). A comprehensive earthquake catalogue for northeastern British Columbia and western Alberta, 2014–2016, Geol. Surv. of Canada, Open-File 8335, doi: 10.4095/306292.
Wang, J., and T.-L. Teng (1995). Artificial neural network-based seismic detector, Bull. Seismol. Soc. Am. 85, no. 1, 308–319.
Wang, N., and D.-Y. Yeung (2013). Learning a deep compact image representation for visual tracking, Advances in Neural Information Processing Systems, Lake Tahoe, Nevada, 5–10 December 2013, 809–817.
Wessel, P., and W. H. Smith (1998). New, improved version of Generic Mapping Tools released, Eos Trans. AGU 79, no. 47, 579.
Withers, M., R. Aster, C. Young, J. Beiriger, M. Harris, S. Moore, and J. Trujillo (1998). A comparison of select trigger algorithms for automated global seismic phase and event detection, Bull. Seismol. Soc. Am. 88, no. 1, 95–106.
Yang, J., K. Yu, and T. Huang (2010). Supervised translation-invariant sparse coding, 2010 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), San Francisco, California, 13–18 June 2010, 3517–3524.
Yoon, C. E., O. O'Reilly, K. J. Bergen, and G. C. Beroza (2015). Earthquake detection through computationally efficient similarity search, Sci. Adv. 1, no. 11, e1501057.
Zhang, H., C. Thurber, and C. Rowe (2003). Automatic P-wave arrival detection and picking with multiscale wavelet analysis for single-component recordings, Bull. Seismol. Soc. Am. 93, no. 5, 1904–1912.
Zhu, W., and G. C. Beroza (2018). PhaseNet: A deep-neural-network-based seismic arrival time picking method, available at http://arxiv.org/abs/1803.03211v1 (last accessed December 2018).
Ramin M. H. Dokht
Honn Kao
Ryan Visser
Brindley Smith
Pacific Geoscience Centre
Natural Resources Canada
Geological Survey of Canada
9860 West Saanich Road
Sidney, British Columbia
Canada V8L 4B2
ramin.mohammadhosseinidokht@canada.ca
Published Online 16 January 2019
... Because of the extensive archives of seismic data and the availability of handpicked labels from earthquake catalogs, event detection and phase picking can be considered supervised learning tasks (Ross et al., 2018;Dokht et al., 2019;Pardo et al., 2019;Woollam et al., 2019;Mousavi and Beroza, 2022;Saad et al., 2023). These tasks can be approached either independently or simultaneously Zhu and Beroza, 2019;Mousavi et al., 2020). ...
... Because of the extensive archives of seismic data and the availability of handpicked labels from earthquake catalogs, event detection and phase picking can be considered supervised learning tasks (Ross et al., 2018;Dokht et al., 2019;Pardo et al., 2019;Woollam et al., 2019;Mousavi and Beroza, 2022;Saad et al., 2023). These tasks can be approached either independently or simultaneously Zhu and Beroza, 2019;Mousavi et al., 2020). ...
Article
Seismic networks worldwide are designed to monitor seismic ground motion. This process includes identifying seismic events in the signals, picking and associating seismic phases, determining the event’s location, and calculating its magnitude. Although machine-learning (ML) methods have shown significant improvements in some of these steps individually, there are other stages in which traditional non-ML algorithms outperform ML approaches. We introduce SeisMonitor, a Python open-source package to monitor seismic activity that uses ready-made ML methods for event detection, phase picking and association, and other well-known methods for the rest of the steps. We apply these steps in a totally automated process for almost 7 yr (2016–2022) in three seismic networks located in Colombian territory, the Colombian seismic network and two local and temporary networks in northern South America: the Middle Magdalena Valley and the Caribbean-Mérida Andes seismic arrays. The results demonstrate the reliability of this method in creating automated seismic catalogs, showcasing earthquake detection capabilities and location accuracy similar to standard catalogs. Furthermore, it effectively identifies significant tectonic structures and emphasizes local crustal faults. In addition, it has the potential to enhance earthquake processing efficiency and serve as a valuable supplement to manual catalogs, given its ability at detecting minor earthquakes and aftershocks.
... Machine learning algorithms are also other aspects of seismic monitoring. Determination of seismic phases (Ross et al., 2018b;Dokht et al., 2019;Woollam et al., 2019; and first motion polarity (Ross et al., 2018a) are studied. Focal mechanisms (Kuang et al., 2021;Zhang et al., 2021) of the seismic events and seismic sources such as earthquakes (Perol et al., 2018;Tang et al., 2020;Yeck et al., 2021;Yang et al., 2021), volcanoes (Titos et al., 2018;Cortés et al., 2019) and geothermal (Holtzman et al., 2018) events and their positions are also analyzed via machine learning tools. ...
Article
Full-text available
Separation of seismic sources of seismic events such as earthquakes and quarry blasts is a complex task and, in most cases, require manual inspection. In this study, artificial neural network models are developed to automatically identify the events that occurred in North-East Italy, where earthquakes and quarry blasts may share the same area. Due to the proximity of the locations of the active fault lines and mining sites, many blasts are registered as earthquakes that can contaminate earthquake catalogues. To be able to differentiate various sources of seismic events 11,821 seismic records from 1463 earthquakes detected by various seismic networks and 9822 seismic records of 727 blasts manually labelled by the Slovenian Environment Agency are used. Three-component seismic records with 90 s length and their frequency contents are used as an input. Ten different models are created by changing various features of the neural networks. Regardless of the features of the created models, results show that accuracy rates are always around 99 %. The performance of our models is compared with a previous study that also used artificial neural networks. It is found that our models show significantly better performance with respect to the models developed by the previous study which performs badly due to differences in the data. Our models perform slightly better than the new model created by using our dataset, but with the previous study’s architecture. Developed model can be useful for the discrimination of the earthquakes from quarry blasts in North-East Italy, which may help us to monitor seismic events in the region.
... Recently machine learning has been used to detect and identify events in continuous seismic waveforms. In particular, Deep Learning (DL) approaches using Convolutional Neural Networks (CNNs) have been successfully trained to identify and pick P-and S-wave onsets (e.g., Dokht et al., 2019;Perol et al., 2018;Ross et al., 2018;Zhu et al., 2019). DL approaches can identify more events than were originally detected in network catalogs (Mousavi et al., 2020), suggesting opportunities to improve event detection. ...
Article
Full-text available
Template matching has proven to be an effective method for seismic event detection, but is biased toward identifying events similar to previously known events, and thus is ineffective at discovering events with non‐matching waveforms (e.g., those dissimilar to existing catalog events). In principle, this limitation can be overcome by cross‐correlating every segment (possible template) of a seismogram with every other segment to identify all similar event pairs, but doing so has been previously considered computationally infeasible for long time series. Here we describe a method, called the ‘Matrix Profile’ (MP), a “correlate everything with everything” calculation that can be efficiently and scalably computed. The MP returns the maximum value of the correlation coefficient of every sub‐window of continuous data with every other sub‐window, as well as the best‐correlated sub‐window location. Here we show how MP methods can obtain valuable results when applied to months and years of continuous seismic data in both local and global case studies. We find that the MP can identify many new events in Parkfield, California seismicity that are not contained in existing event catalogs and that it can efficiently find clusters of similar earthquakes in global seismic data. Either used by itself, or as a starting point for subsequent template matching calculations, the MP is likely to provide a useful new tool for seismology research.
Article
Full-text available
Ambient noise source localization is of great significance for estimating seismic noise source distribution, understanding source mechanisms and imaging subsurface structures. The commonly used methods for source localization, such as the matched field processing and the full-waveform inversion, are time-consuming and not applicable for time-lapse monitoring of the noise source distribution. We propose an efficient alternative of using deep learning for noise source localization. In the neural network, the input data are noise cross-correlation functions and the output are matrices containing the information of noise source distribution. It is assumed that the subsurface structure is a horizontally layered earth model and the model parameters are known. A wavefield superposition method is employed to efficiently simulate ambient noise data with quantities of local noise sources labelled as training datasets. We use a weighted binary cross-entropy loss function to address the prediction inaccuracy caused by a sparse label matrix during training. The proposed deep learning framework is validated by synthetic tests and two field data examples. The successful applications to locate an anthropogenic noise source and a carbon dioxide (CO2) degassing area demonstrate the accuracy and efficiency of the proposed deep learning method for noise source localization, which has great potential for monitoring the changes of the noise source distribution in a survey area.
Article
Full-text available
Given the recent developments in machine-learning technology, its application has rapidly progressed in various fields of earthquake seismology, achieving great success. Here, we review the recent advances, focusing on catalog development, seismicity analysis, ground-motion prediction, and crustal deformation analysis. First, we explore studies on the development of earthquake catalogs, including their elemental processes such as event detection/classification, arrival time picking, similar waveform searching, focal mechanism analysis, and paleoseismic record analysis. We then introduce studies related to earthquake risk evaluation and seismicity analysis. Additionally, we review studies on ground-motion prediction, which are categorized into four groups depending on whether the output is ground-motion intensity or ground-motion time series and the input is features (individual measurable properties) or time series. We discuss the effect of imbalanced ground-motion data on machine-learning models and the approaches taken to address the problem. Finally, we summarize the analysis of geodetic data related to crustal deformation, focusing on clustering analysis and detection of geodetic signals caused by seismic/aseismic phenomena. Graphical Abstract
Article
Full-text available
Plain Language Summary Earthquake monitoring often involves measuring arrival times of P‐ and S‐waves of earthquakes from continuous seismic data. With the advancement of artificial intelligence, state‐of‐the‐art phase picking methods use deep neural networks to examine seismic data from each station independently; this is in stark contrast to the way that human experts annotate seismic data, in which waveforms from the whole network containing multiple stations are examined simultaneously. With the performance gains of single‐station algorithms approaching saturation, it is clear that meaningful future advances will require algorithms that can naturally examine data for entire networks at once. Here we introduce a multi‐station phase picking algorithm based on a recently developed machine learning paradigm called Neural Operator. Our algorithm, called Phase Neural Operator, leverages the spatial‐temporal information of earthquake signals from an input seismic network with arbitrary geometry. This results in superior performance over leading baseline algorithms by detecting many more earthquakes, picking many more seismic wave arrivals, yet also greatly improving measurement accuracy.
Article
Full-text available
As the number of seismic sensors grows, it is becoming increasingly difficult for analysts to pick seismic phases manually and comprehensively, yet such efforts are fundamental to earthquake monitoring. Despite years of improvements in automatic phase picking, it is difficult to match the performance of experienced analysts. A more subtle issue is that different seismic analysts may pick phases differently, which can introduce bias into earthquake locations. We present a deep-neural-network-based arrival-time picking method called "PhaseNet" that picks the arrival times of both P and S waves. Deep neural networks have recently made rapid progress in feature learning, and with sufficient training, have achieved super-human performance in many applications. PhaseNet uses three-component seismic waveforms as input and generates probability distributions of P arrivals, S arrivals, and noise as output. We engineer PhaseNet such that peaks in probability provide accurate arrival times for both P and S waves, and have the potential to increase the number of S-wave observations dramatically over what is currently available. This will enable both improved locations and improved shear wave velocity models. PhaseNet is trained on the prodigious available data set provided by analyst-labeled P and S arrival times from the Northern California Earthquake Data Center. The dataset we use contains more than seven million waveform samples extracted from over thirty years of earthquake recordings. We demonstrate that PhaseNet achieves much higher picking accuracy and recall rate than existing methods.
Technical Report
Full-text available
To gain a better understanding of induced seismicity in northeast British Columbia and western Alberta, we conducted an intensive analysis of seismic data to locate earthquakes that occurred within the area of 52˚N–61˚N, 126˚W–115˚W for the years of 2014 through 2016. Continuous seismic waveforms from as many as 43 stations operated by various organizations in the region were used in this study. A total of 5478 events were identified and located; but only 4916 solutions were deemed acceptable by our quality criteria. The number of earthquakes in our final catalogue is approximately three times the base level of the Canadian National Seismograph Network catalogue. In this report, we describe in detail our location procedures and how each source parameter (origin time, epicenter, focal depth, and magnitude) is determined. The earthquake catalogue is summarized in a table, while the phase picking data for individual events are presented in an ASCII file as a supplement to this report. The total numbers of events in 2014, 2015, and 2016 are 1287, 1575, and 2057, respectively. The overall magnitude of completeness of our catalogue is ML 1.8, an improvement from the value of 2.3 for the CNSN catalogue.
Article
Full-text available
Velocity-model building is a key step in hydrocarbon exploration. The main product of velocity-model building is an initial model of the subsurface that is subsequently used in seismic imaging and interpretation workflows. Reflection or refraction tomography and full-waveform inversion (FWI) are the most commonly used techniques in velocity-model building. On one hand, tomography is a time-consuming activity that relies on successive updates of highly human-curated analysis of gathers. On the other hand, FWI is very computationally demanding with no guarantees of global convergence. We propose and implement a novel concept that bypasses these demanding steps, directly producing an accurate gridding or layered velocity model from shot gathers. Our approach relies on training deep neural networks. The resulting predictive model maps relationships between the data space and the final output (particularly the presence of high-velocity segments that might indicate salt formations). The training task takes a few hours for 2D data, but the inference step (predicting a model from previously unseen data) takes only seconds. The promising results shown here for synthetic 2D data demonstrate a new way of using seismic data and suggest fast turnaround of workflows that now make use of machine-learning approaches to identify key structures in the subsurface.
Article
Full-text available
The recent evolution of induced seismicity in Central United States calls for exhaustive catalogs to improve seismic hazard assessment. Over the last decades, the volume of seismic data has increased exponentially, creating a need for efficient algorithms to reliably detect and locate earthquakes. Today's most elaborate methods scan through the plethora of continuous seismic records, searching for repeating seismic signals. In this work, we leverage the recent advances in artificial intelligence and present ConvNetQuake, a highly scalable convolutional neural network for earthquake detection and location from a single waveform. We apply our technique to study the induced seismicity in Oklahoma (USA). We detect 20 times more earthquakes than previously cataloged by the Oklahoma Geological Survey. Our algorithm is orders of magnitude faster than established methods.
Article
Full-text available
Typical microseismic data recorded by surface arrays are characterized by low signal-to-noise ratios (S/Ns) and highly nonstationary noise that make it difficult to detect small events. Currently, array or crosscorrelation-based ap-proaches are used to enhance the S/N prior to processing. We have developed an alternative approach for S/N improve-ment and simultaneous detection of microseismic events. The proposed method is based on the synchrosqueezed continuous wavelet transform (SS-CWT) and custom thresholding of sin-gle-channel data. The SS-CWT allows for the adaptive filter-ing of time-and frequency-varying noise as well as offering an improvement in resolution over the conventional wavelet transform. Simultaneously, the algorithm incorporates a de-tection procedure that uses the thresholded wavelet coeffi-cients and detects an arrival as a local maxima in a characteristic function. The algorithm was tested using a syn-thetic signal and field microseismic data, and our results have been compared with conventional denoising and detection methods. This technique can remove a large part of the noise from small-amplitudes signal and detect events as well as es-timate onset time.
A new Matched Filtering Algorithm (MFA) is proposed for detecting and analyzing microseismic events recorded by downhole monitoring of hydraulic fracturing. This method requires a set of well-located template ('parent') events, which are obtained using conventional microseismic processing and selected on the basis of high signal-to-noise (S/N) ratio and representative spatial distribution of the recorded microseismicity. Detection and extraction of 'child' events are based on stacked, multi-channel cross-correlation of the continuous waveform data, using the parent events as reference signals. The location of a child event relative to its parent is determined using an automated process, by rotation of the multi-component waveforms into the ray-centered coordinates of the parent and maximizing the energy of the stacked amplitude envelope within a search volume around the parent's hypocentre. After correction for geometrical spreading and attenuation, the relative magnitude of the child event is obtained automatically using the ratio of stacked envelope peak with respect to its parent. Since only a small number of parent events require interactive analysis such as picking P- and S-wave arrivals, the MFA approach offers the potential for significant reduction in effort for downhole microseismic processing. Our algorithm also facilitates the analysis of single-phase child events, i.e. microseismic events for which only one of the S- or P-wave arrivals is evident due to unfavorable S/N conditions. A real-data example using microseismic monitoring data from 4 stages of an open-hole slickwater hydraulic fracture treatment in western Canada demonstrates that a sparse set of parents (in this case, 4.6 per cent of the originally located events) yields a significant increase (more than fourfold) in the number of located events compared with the original catalog.
Moreover, analysis of the new MFA catalog suggests that this approach leads to more robust interpretation of the induced microseismicity and novel insights into dynamic rupture processes based on the average temporal (foreshock-aftershock) relationship of child events to parents.
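The detection step (stacked multi-channel cross-correlation against a parent template) can be sketched as follows. This is a minimal illustration under simplifying assumptions: the template and data are already aligned channel by channel (no moveout), the stack is a plain average of normalized correlations, and the 0.7 threshold and function names are hypothetical. The location, rotation, and relative-magnitude steps are not shown.

```python
import numpy as np

def normalized_xcorr(template, trace):
    """Pearson correlation between `template` and every equal-length window of `trace`."""
    n = len(template)
    t = (template - template.mean()) / (template.std() * n)
    windows = np.lib.stride_tricks.sliding_window_view(trace, n)  # (len(trace) - n + 1, n)
    w = windows - windows.mean(axis=1, keepdims=True)
    w_std = windows.std(axis=1)
    w_std[w_std == 0] = np.inf  # flat windows get a correlation of 0
    return (w @ t) / w_std

def stacked_detections(templates, data, threshold=0.7):
    """Average per-channel correlations, then flag lags whose stacked value exceeds threshold."""
    cc = np.mean([normalized_xcorr(tpl, tr) for tpl, tr in zip(templates, data)], axis=0)
    return cc, np.flatnonzero(cc > threshold)

# Synthetic example: a 3-channel parent waveform buried at sample 400 in weak noise.
rng = np.random.default_rng(0)
templates = rng.standard_normal((3, 50))
data = 0.1 * rng.standard_normal((3, 1000))
data[:, 400:450] += templates
cc, hits = stacked_detections(templates, data)
print(hits)
```

Averaging across channels suppresses chance correlations on individual channels, which is why stacking lets a lower per-channel similarity still produce a reliable detection.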
To optimally monitor earthquake‐generating processes, seismologists have sought to lower detection sensitivities ever since instrumental seismic networks were started about a century ago. Recently, it has become possible to search continuous waveform archives for replicas of previously recorded events (i.e., template matching), which has led to at least an order of magnitude increase in the number of detected earthquakes and greatly sharpened our view of geological structures. Earthquake catalogs produced in this fashion, however, are heavily biased in that they are completely blind to events for which no templates are available, such as in previously quiet regions or for very large‐magnitude events. Here, we show that with deep learning, we can overcome such biases without sacrificing detection sensitivity. We trained a convolutional neural network (ConvNet) on the vast hand‐labeled data archives of the Southern California Seismic Network to detect seismic body‐wave phases. We show that the ConvNet is extremely sensitive and robust in detecting phases even when masked by high background noise and when the ConvNet is applied to new data that are not represented in the training set (in particular, very large‐magnitude events). This generalized phase detection framework will significantly improve earthquake monitoring and catalogs, which form the underlying basis for a wide range of basic and applied seismological research.
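A phase-detection ConvNet maps a fixed-length, three-component window to class probabilities (P, S, or noise). The sketch below shows only the shape of such a forward pass, with a single convolutional layer and untrained random weights; the layer sizes, window length, sampling rate, and function names are assumptions and do not reproduce the architecture of the network described above.

```python
import numpy as np

def conv1d_valid(x, kernels, bias):
    """Multi-channel 1-D convolution (cross-correlation, as in most deep-learning libraries).
    x: (n_in, n_samples); kernels: (n_out, n_in, k); returns (n_out, n_samples - k + 1)."""
    k = kernels.shape[-1]
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=1)  # (n_in, n_pos, k)
    return np.einsum("ipk,oik->op", windows, kernels) + bias[:, None]

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def classify_window(x, params):
    """Toy forward pass: conv -> ReLU -> global max pooling -> dense -> softmax."""
    h = np.maximum(conv1d_valid(x, params["k1"], params["b1"]), 0.0)
    pooled = h.max(axis=1)  # one feature per filter
    return softmax(params["W"] @ pooled + params["b2"])

# Untrained, randomly initialised weights: 3 input components, 16 filters, 3 classes.
rng = np.random.default_rng(0)
params = {
    "k1": 0.1 * rng.standard_normal((16, 3, 21)),
    "b1": np.zeros(16),
    "W": 0.1 * rng.standard_normal((3, 16)),
    "b2": np.zeros(3),
}
window = rng.standard_normal((3, 400))  # e.g. 4 s of three-component data at 100 Hz (assumed)
probs = classify_window(window, params)
print(dict(zip(["P", "S", "noise"], probs.round(3))))
```

With untrained weights the probabilities are meaningless; training on labeled P, S, and noise windows is what makes the outputs useful as phase detections.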
While logistic sigmoid neurons are more biologically plausible than hyperbolic tangent neurons, the latter work better for training multi-layer neural networks. This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability at zero, creating sparse representations with true zeros, which seem remarkably suitable for naturally sparse data. Even though they can take advantage of semi-supervised setups with extra unlabelled data, deep rectifier networks can reach their best performance on purely supervised tasks with large labelled data sets without requiring any unsupervised pre-training. Hence, these results can be seen as a new milestone in the attempts at understanding the difficulty in training deep but purely supervised neural networks, and closing the performance gap between neural networks learnt with and without unsupervised pre-training.
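The sparsity claim can be checked numerically: on zero-mean inputs, a rectifier outputs exact zeros for roughly half of the units, whereas sigmoid and tanh outputs are essentially never exactly zero. This is a minimal illustration with random Gaussian pre-activations (an assumption), not an experiment from the paper; the zero-at-zero gradient convention in `relu_grad` is also an assumed choice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Hard non-linearity: exactly zero for z <= 0, identity for z > 0.
    return np.maximum(z, 0.0)

def relu_grad(z):
    # Not differentiable at 0; by convention the (sub)gradient there is taken as 0.
    return (z > 0).astype(float)

rng = np.random.default_rng(0)
pre_activations = rng.standard_normal(10_000)  # stand-in for one layer's inputs

zero_fraction = {
    name: float(np.mean(f(pre_activations) == 0.0))
    for name, f in [("sigmoid", sigmoid), ("tanh", np.tanh), ("relu", relu)]
}
print(zero_fraction)
```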
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state of the art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
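Dropout can be sketched in a few lines. The version below follows the formulation in which units are dropped only during training and all units are kept at test time with outputs scaled by the keep probability, so that test-time outputs match the expected training-time outputs. Many current libraries instead use "inverted" dropout, which rescales during training. The function name and array size are illustrative.

```python
import numpy as np

def dropout(activations, p_drop, rng, train):
    """During training, zero each unit independently with probability p_drop;
    at test time, keep all units and scale outputs by the keep probability (1 - p_drop)."""
    if train:
        keep = rng.random(activations.shape) >= p_drop
        return activations * keep
    return activations * (1.0 - p_drop)

rng = np.random.default_rng(0)
h = np.ones(100_000)  # hypothetical fully connected layer output
train_out = dropout(h, 0.5, rng, train=True)
test_out = dropout(h, 0.5, rng, train=False)
# The average training-time output should match the deterministic test-time output.
print(train_out.mean(), test_out.mean())
```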
Extracting earthquake signals from continuous waveform data recorded by networks of seismic sensors is a critical and challenging task in seismology. Earthquakes occur infrequently in long-duration data and may produce weak signals, which are challenging to detect while limiting the number of false discoveries. Earthquake detection based on waveform similarity has demonstrated success in detecting weak signals from small events, but existing techniques either require prior knowledge of the event waveform or have poor scaling properties that limit use to small data sets. In this paper, we describe ongoing research into the use of similarity search for large-scale earthquake detection. We describe Fingerprint and Similarity Thresholding (FAST), a new earthquake detection method that leverages locality-sensitive hashing to enable waveform-similarity-based earthquake detection in long-duration continuous seismic data. We demonstrate the detection capability of FAST and compare different fingerprinting schemes by performing numerical experiments on test data, with an emphasis on false alarm reduction.
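The core of similarity search with locality-sensitive hashing can be sketched as MinHash signatures plus banding over binary fingerprints. This sketch assumes binary fingerprints have already been computed from the continuous data (the fingerprinting step, for which the work above compares several schemes, is omitted). The function names, number of hash functions, and band size are illustrative assumptions, not FAST's actual parameters.

```python
import numpy as np
from collections import defaultdict
from itertools import combinations

def minhash_signatures(fingerprints, n_hashes, seed=0):
    """For each random permutation of bit positions, record the smallest permuted position
    among a fingerprint's active bits. Two fingerprints agree on an entry with probability
    equal to their Jaccard similarity."""
    n_bits = fingerprints.shape[1]
    rng = np.random.default_rng(seed)
    perms = np.array([rng.permutation(n_bits) for _ in range(n_hashes)])  # (n_hashes, n_bits)
    sigs = np.empty((len(fingerprints), n_hashes), dtype=np.int64)
    for i, fp in enumerate(fingerprints):
        sigs[i] = perms[:, np.flatnonzero(fp)].min(axis=1)  # assumes >= 1 active bit
    return sigs

def lsh_candidate_pairs(sigs, n_bands):
    """Banding: fingerprints whose signatures agree on every row of any band become candidates."""
    rows = sigs.shape[1] // n_bands
    pairs = set()
    for b in range(n_bands):
        buckets = defaultdict(list)
        for i, sig in enumerate(sigs):
            buckets[tuple(sig[b * rows:(b + 1) * rows])].append(i)
        for members in buckets.values():
            pairs.update(combinations(members, 2))
    return pairs

# Synthetic fingerprints: 1 is a near-copy of 0 (a repeating event), 2 is unrelated.
rng = np.random.default_rng(1)
n_bits = 512
fps = np.zeros((3, n_bits), dtype=bool)
active = rng.choice(n_bits, size=40, replace=False)
fps[0, active] = True
fps[1, active[2:]] = True  # same bits except two, so Jaccard similarity = 38/40
fps[2, rng.choice(n_bits, size=40, replace=False)] = True
pairs = lsh_candidate_pairs(minhash_signatures(fps, n_hashes=40), n_bands=10)
print(pairs)
```

Banding is what avoids comparing every pair of windows: only fingerprints that collide in at least one band are reported as candidates, and the rows-per-band setting trades missed detections against false candidates.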