Seismic Event and Phase Detection Using Time-Frequency Representation and Convolutional Neural Networks

January 2019
Seismological Research Letters

DOI:10.1785/0220180308

Authors:

Ramin Dokht

Natural Resources Canada

Honn Kao

Natural Resources Canada

Ryan Visser

Natural Resources Canada

Brindley Smith

McGill University

by Ramin M. H. Dokht, Honn Kao, Ryan Visser, and Brindley Smith

ABSTRACT

The availability of abundant digital seismic records and successful application of deep learning in pattern recognition and classification problems enable us to achieve a reliable earthquake detection framework. To overcome the limitations and challenges of conventional methods, which are mainly due to an incomplete set of template waveforms and low signal-to-noise ratio, we design a generalized model to improve discrimination between earthquake and noise recordings using a deep convolutional network (ConvNet). Exclusively based on a dataset of over 4900 earthquakes recorded over a period of 3 yrs in western Canada, a multilayer ConvNet is trained to learn general characteristics of background noise and earthquake signals in the time–frequency domain. In the next step, we train a secondary network using the wavelet transform of the major seismic arrivals to separate P from S waves and estimate their approximate arrival times. The results of validation experiments demonstrate promising performance and achieve an average accuracy of nearly 99% for both networks. To investigate the applicability of our algorithm, we apply the trained model on an independent dataset recently recorded in northeastern British Columbia (NE BC). It is found that deep-learning-based methods are superior to traditional techniques in detecting a higher number of seismic events at significantly less computational cost.

Electronic Supplement: Tables reporting the performance of

convolutional networks trained directly in the time domain,

and figures showing the accuracy of the validation set and

P- and S-wave error measurements.

INTRODUCTION

The recent increase in the rate of earthquake occurrence in western Canada, a region with a historically low level of background seismicity, has been largely attributed to the development of unconventional hydrocarbon resources (Horner et al., 1994; Schultz et al., 2014; Farahbod et al., 2015; Rubinstein and Mahani, 2015; Atkinson et al., 2016). Reliable ground-motion analyses and seismic hazard assessments require a complete earthquake catalog containing both natural and induced earthquakes. However, most conventional automated techniques fail to identify induced events of low magnitude, which makes characterizing and locating these events a challenging task. In comparison, manual picking of seismic events has a higher detection rate, but the process is extremely laborious and remains subject to the analyst's experience.

Among the various detection algorithms, the short-term average/long-term average (STA/LTA) technique, which measures the signal-to-noise ratio (SNR) function, has been widely used for detecting moderate-to-large earthquakes if a certain triggering threshold is exceeded (Allen, 1978; Withers et al., 1998). On the other hand, cross-correlation-based techniques (also called template matching) have been extensively used to identify repeating earthquakes of lower magnitudes based on similarity measurements of the entire waveforms (Gibbons and Ringdal, 2006; Skoumal et al., 2015; Caffagni et al., 2016). Although less sensitive to high noise levels, template matching is computationally intensive and its application is limited to identifying earthquakes sharing the same source region and mechanism (Eisner et al., 2006). This can be partially remedied by generalizing template matching using clustering of similar waveform fingerprints through a set of hash functions (Yoon et al., 2015; Bergen et al., 2016), and subspace analysis of a set of representative waveforms (Barrett and Beroza, 2014). Recent studies take advantage of spectral information of seismic records to improve the accuracy associated with detection of weak microseismic events (Galiana-Merino et al., 2008; Vaezi and Van der Baan, 2015; Mousavi et al., 2016), though a priori knowledge of noise characteristics is required.

Deep learning is a set of representation-learning algorithms with multiple layers of nonlinear transformations (Bengio et al., 2013; LeCun et al., 2015; Najafabadi et al., 2015). Unlike conventional machine-learning techniques, which rely on carefully hand-engineered features (Wang and Teng, 1995; Gentili and Michelini, 2006; Maity et al., 2014), deep learning allows a model to learn a general-purpose representation of raw data

doi: 10.1785/0220180308 Seismological Research Letters Volume XX, Number XX –2019 1

SRL Early Edition

Downloaded from https://pubs.geoscienceworld.org/ssa/srl/article-pdf/doi/10.1785/0220180308/4616095/srl-2018308.1.pdf

by Natural Resources Canada Library-Ottawa user

on 16 January 2019

using a training set (Dahl et al., 2013; Wang and Yeung, 2013). Deep representation learning has been widely applied to several research areas such as natural language understanding (Collobert et al., 2011; Mikolov et al., 2013), image classification (Krizhevsky et al., 2012), and speech recognition (Dahl et al., 2012; Hinton et al., 2012). Recent studies have investigated the application of deep-learning techniques for earthquake detection (e.g., Perol et al., 2018; Ross et al., 2018; Zhu and Beroza, 2018) and seismic imaging (Araya-Polo et al., 2018; Moseley et al., 2018).

Our research is motivated by the successful application of deep convolutional networks (ConvNets) to overcome the limitations of traditional techniques in studying induced seismicity in Oklahoma, United States (Perol et al., 2018). The basic idea is to train a ConvNet model on a large dataset of previously recorded earthquakes, so that the classifier can be generalized to identify seismic events different from those used in training. In comparison with the model proposed by Perol et al. (2018), the current study presents additional improvements by including both temporal and spectral information of three-component seismograms to enhance the detection accuracy. We divide the automatic earthquake detection process into two steps. A pretrained model is first adopted to separate earthquakes from nonearthquake signals in the time–frequency domain. Then, we build up a secondary supervised classification system using higher-resolution spectral images of earthquake records to discriminate between P and S waves.

METHOD AND DATASET

Convolutional Neural Networks

ConvNets are feed-forward, multilayer neural networks, which were introduced to process multidimensional arrays (LeCun et al., 1998). A ConvNet transforms an input volume of data features to output class probabilities through a sequence of several hidden units consisting of convolutional, pooling, and fully connected layers. Each convolutional layer contains a bank of linear filters to extract local features at all parts of the previous layer (Cireşan et al., 2012) and passes the resulting convolutional responses through a nonlinear activation unit. ConvNets have been found to train several times faster by mapping all negative responses to zero using a rectified linear unit (Nair and Hinton, 2010; Glorot et al., 2011):

$$x_j = \max\left(0,\; b_j + \sum_i w_{ij}\, x_i\right), \tag{1}$$

in which $b_j$ and $w_{ij}$ are the bias and weights of the $j$th neuron passing the input from the previous layer $x_i$ to the output feature map $x_j$, respectively. It is common to spatially downsample the resulting activations by merging similar local features into one (LeCun et al., 2015). The subsampling operation, also known as pooling, can significantly reduce the number of free parameters, thereby improving the performance of the network and avoiding overfitting (Krizhevsky et al., 2012). Max pooling is the most widely used subsampling operator; it calculates the maximum activation over nonoverlapping local neighborhoods (Serre et al., 2005; Cireşan et al., 2012), and thus reduces the sensitivity to small temporal/spatial transformations (LeCun et al., 1990; Yang et al., 2010; Farabet et al., 2013). The output feature maps, resulting from a sequence of convolutional, nonlinear activation, and pooling layers, are concatenated and passed to a fully connected layer in which every neuron is linearly connected to all activations in the previous layer. The output of the last fully connected layer is fed to a normalized exponential function (softmax classifier), which calculates a probability distribution over $C$ different possible classes (Peterson and Söderberg, 1989):

$$p_j = \frac{\exp(x_j)}{\sum_{k=1}^{C} \exp(x_k)}, \qquad j = 1, \ldots, C. \tag{2}$$

The objective of the current study is to find a set of learnable free parameters that minimizes the misfit between the predicted scores $p$ and ground-truth scores $g$ of $N$ instances using an L2-regularized multinomial logistic loss function:

$$J = \frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{C} -g_k^{n} \log p_k^{n} + \lambda \sum_i \lVert W_i \rVert_2^2. \tag{3}$$

The regularization parameter $\lambda$ controls the trade-off between the data misfit and model constraints, and $W_i$ are the model parameters (bias and weights) of the $i$th layer. The solution to the objective function is found using the gradient descent technique, which calculates the updates on network parameters at each iteration $t$ by a linear combination of the negative gradient of the loss function and the model update from the previous iteration $t-1$:

$$\Delta W_t = \mu\, \Delta W_{t-1} - \alpha\, \nabla J(W_t), \tag{4}$$

in which the learning rate $\alpha$ is the weight of the negative gradient, and the momentum $\mu$ controls the network parameter update at each iteration. For a large training set, it would be more efficient to estimate the stochastic approximation of the cost function by drawing a small random selection (mini-batch) of the training set at each iteration.
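To make the regularized loss of equation (3) and the momentum update of equation (4) concrete, the following sketch trains a tiny linear softmax classifier on synthetic two-class data with mini-batches. The model, the data, and the helper names `loss_and_grad` and `train` are hypothetical stand-ins chosen for brevity; the hyperparameter defaults (learning rate 0.01, momentum 0.9, regularization 1e-4) merely mirror the values quoted in this article.

```python
import math
import random

def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_grad(W, batch, lam):
    """Eq. (3) for a linear softmax model: mean cross-entropy plus lam * ||W||_2^2.

    W holds one weight row per class; batch is a list of (features, class_index).
    """
    n_classes, n_feat = len(W), len(W[0])
    loss = lam * sum(w * w for row in W for w in row)
    grad = [[2.0 * lam * W[k][i] for i in range(n_feat)] for k in range(n_classes)]
    for x, label in batch:
        p = softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])
        loss -= math.log(p[label]) / len(batch)
        for k in range(n_classes):
            err = (p[k] - (1.0 if k == label else 0.0)) / len(batch)
            for i in range(n_feat):
                grad[k][i] += err * x[i]
    return loss, grad

def train(data, n_classes, iterations=200, batch_size=20,
          alpha=0.01, mu=0.9, lam=1e-4, seed=0):
    """Mini-batch gradient descent with momentum, following Eq. (4)."""
    rng = random.Random(seed)
    n_feat = len(data[0][0])
    W = [[0.0] * n_feat for _ in range(n_classes)]
    dW = [[0.0] * n_feat for _ in range(n_classes)]
    for _ in range(iterations):
        batch = rng.sample(data, batch_size)          # random mini-batch
        _, grad = loss_and_grad(W, batch, lam)
        for k in range(n_classes):
            for i in range(n_feat):
                dW[k][i] = mu * dW[k][i] - alpha * grad[k][i]
                W[k][i] += dW[k][i]
    return W

# Synthetic data: one informative feature plus a constant 1.0 acting as the bias input.
rng = random.Random(1)
data = ([([rng.gauss(1.0, 0.5), 1.0], 0) for _ in range(100)]
        + [([rng.gauss(-1.0, 0.5), 1.0], 1) for _ in range(100)])
initial_loss, _ = loss_and_grad([[0.0, 0.0], [0.0, 0.0]], data, 1e-4)
W = train(data, n_classes=2)
final_loss, _ = loss_and_grad(W, data, 1e-4)
```

With all weights at zero the two classes are equally probable, so the initial loss equals log 2; the momentum updates then drive the loss down on this toy problem.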

Earthquake Detection

In the first step of our earthquake detection framework, we design a ConvNet to scan continuous seismic records and separate noise from earthquake signals. Exclusively based on a comprehensive earthquake catalog from western Canada, a classifier is trained to identify coherent high-power earthquake signals across three-component seismograms. In this region, the Geological Survey of Canada has reported 4914 earthquakes with local magnitudes $M_L$ ranging from 0.1 to 4.9 between January 2014 and December 2016 (Fig. 1; Visser et al., 2017). After deconvolving the instrument response from the ground velocity records, we apply a 2 Hz high-pass filter and resample the data to 20 Hz. The resulting waveforms are visually inspected and low-quality data are discarded. The final dataset contains 13,949 earthquake–receiver pairs with epicentral distances up to 275 km (Fig. 1c) and a total of 148,000 noise records selected from time segments free of earthquake arrivals. To avoid overfitting due to the uneven number of observations per category, the earthquake signals are randomly repeated, time-shifted, and contaminated with Gaussian noise.
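The augmentation described above (repetition, time shifting, and added Gaussian noise) can be sketched as follows. This is a hedged illustration only: the circular shift, the shift range, the noise level, and the function name `augment` are assumptions, because the text does not specify how the shifts and noise amplitudes were chosen. For scale, 148,000 noise records against 13,949 earthquake records implies roughly 10.6 copies per earthquake record to balance the two classes.

```python
import random

def augment(waveform, n_copies, max_shift, noise_std, rng):
    """Return n_copies of a trace, each circularly shifted and perturbed by Gaussian noise.

    The circular shift and the parameter values are illustrative assumptions.
    """
    n = len(waveform)
    copies = []
    for _ in range(n_copies):
        shift = rng.randint(-max_shift, max_shift)            # shift in samples
        shifted = [waveform[(i - shift) % n] for i in range(n)]
        copies.append([s + rng.gauss(0.0, noise_std) for s in shifted])
    return copies

rng = random.Random(42)
# Toy 100-sample trace with a short wavelet starting at sample 40.
event = [0.0] * 40 + [1.0, -0.8, 0.6, -0.4, 0.2] + [0.0] * 55
copies = augment(event, n_copies=10, max_shift=20, noise_std=0.01, rng=rng)
peak_positions = [max(range(len(c)), key=lambda i: c[i]) for c in copies]
```

Each copy keeps the waveform shape while moving its onset, so the classifier cannot simply memorize where in the window an arrival occurs.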

Unlike the previous studies (Perol et al., 2018; Ross et al., 2018), which perform the classification task in the time domain, the current model attempts to learn the general characteristics associated with the earthquake and nonearthquake signals in the time–frequency domain. Because seismic event and noise recordings have intrinsically different spectral contents, the performance of the network can be further improved by learning from the spectro-temporal representations of waveform data (Wang and Teng, 1995; Vaezi and Van der Baan, 2015). The first layer of the ConvNet is represented by spectrograms of 100-s-long data segments calculated using the short-time Fourier transform (STFT; Portnoff, 1980). The spectrograms are computed using a moving Hanning window of 40 samples with 50% overlap between successive windows (Fig. 2). The resulting spectrograms are then normalized by their maximum spectral values and therefore become independent of earthquake magnitude.
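A minimal sketch of this spectrogram computation is given below, using a direct discrete Fourier transform from the Python standard library so that it has no dependencies. In practice a library routine such as SciPy's `scipy.signal.stft` would normally be used. The window length (40 samples), the 50% overlap, and the 100 s segment at 20 Hz come from the text; the FFT length of 64 is an assumption, chosen because it yields the 33 × 99 input size listed in Table 1, and normalizing each trace by its own maximum is likewise an assumption.

```python
import cmath
import math

def hann(n):
    """Symmetric Hann (Hanning) window of n samples."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def spectrogram(trace, win_len=40, hop=20, nfft=64):
    """Magnitude STFT; rows are frequency bins (nfft // 2 + 1), columns are time frames."""
    window = hann(win_len)
    n_bins = nfft // 2 + 1
    # DFT kernel exp(-2*pi*i*k*n/nfft); samples beyond win_len are implicit zero padding.
    kernel = [[cmath.exp(-2j * math.pi * k * n / nfft) for n in range(win_len)]
              for k in range(n_bins)]
    starts = range(0, len(trace) - win_len + 1, hop)
    spec = [[0.0] * len(starts) for _ in range(n_bins)]
    for col, start in enumerate(starts):
        frame = [trace[start + n] * window[n] for n in range(win_len)]
        for k in range(n_bins):
            spec[k][col] = abs(sum(f * c for f, c in zip(frame, kernel[k])))
    return spec

def normalize(spec):
    """Divide by the maximum spectral value so that the peak equals 1."""
    peak = max(max(row) for row in spec)
    return [[v / peak for v in row] for row in spec] if peak > 0 else spec

fs = 20.0                                             # sampling rate in Hz
trace = [math.sin(2.0 * math.pi * 5.0 * n / fs) for n in range(int(100 * fs))]
spec = normalize(spectrogram(trace))                  # 33 frequency bins x 99 frames
column = [row[50] for row in spec]
peak_frequency = column.index(max(column)) * fs / 64  # bin index times fs / nfft
```

For the synthetic 5 Hz tone used here, the spectral peak falls in the bin centred on 5 Hz, and the normalization makes the result independent of the input amplitude.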

The architecture of the earthquake detection ConvNet consists of a sequence of four convolutional layers, each followed by a pooling layer (Table 1). The supervised classifier is trained on a random subset consisting of 75% of spectrograms (training set) and its performance is evaluated in terms of detection accuracy of the remaining 25% of observations (validation set). The training is run for 50,000 iterations with a trade-off parameter $\lambda = 10^{-4}$, an initial learning rate of 0.01 decaying to 0.005, and a fixed momentum of 0.9. The training process is performed using a mini-batch of 200 samples per iteration and takes approximately 4 hrs on an Intel Core i7 CPU (4 GHz).

Phase Identification

The goal of the next experiment is to design a secondary classifier using the existing earthquake catalog for separating between different body waves. Robust measurements of phase arrivals are achieved by analyzing the nonstationary, multicomponent seismic signals using the wavelet transform (WT) (Zhang et al., 2003; Ahmed et al., 2007; Galiana-Merino et al., 2008). Compared with the STFT spectrogram, the WT has higher temporal resolution, but it suffers from energy spreading along the frequency axis. The synchrosqueezing wavelet transform (SWT) is a new time–frequency analysis technique that combines the conventional WT with a frequency reassignment method to enhance the time–frequency localization (Daubechies and Maes, 1996; Daubechies et al., 2011). Assuming that the energy of a seismic event is concentrated in a few high-amplitude wavelet coefficients, Mousavi et al. (2016) introduced an SWT-based technique to simultaneously suppress nonstationary random noise and detect onset times of weak microseismic events using a characteristic function of the thresholded wavelet coefficients.
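The conventional wavelet transform that the SWT builds on can be sketched as below. This example computes only the plain continuous wavelet transform by direct summation; the frequency reassignment (synchrosqueezing) step is deliberately omitted. The Morlet wavelet with centre parameter `omega0 = 6`, the three analysis frequencies, and the function names are assumptions for illustration.

```python
import cmath
import math

def morlet(t, omega0=6.0):
    """Morlet mother wavelet: a complex exponential multiplied by a Gaussian envelope."""
    return math.pi ** -0.25 * cmath.exp(1j * omega0 * t) * math.exp(-0.5 * t * t)

def cwt_power(trace, fs, freqs, omega0=6.0):
    """Wavelet power |W(s, b)|^2 by direct summation; one row per analysis frequency."""
    dt = 1.0 / fs
    power = []
    for f in freqs:
        scale = omega0 / (2.0 * math.pi * f)      # scale whose centre frequency is about f
        row = []
        for b in range(len(trace)):
            coeff = sum(x * morlet((n - b) * dt / scale, omega0).conjugate()
                        for n, x in enumerate(trace)) * dt / math.sqrt(scale)
            row.append(abs(coeff) ** 2)
        power.append(row)
    return power

fs = 20.0                                          # sampling rate in Hz
trace = [math.cos(2.0 * math.pi * 2.0 * n / fs) for n in range(101)]   # 5 s, 2 Hz tone
freqs = [1.0, 2.0, 4.0]
power = cwt_power(trace, fs, freqs)
centre = [row[50] for row in power]                # power at the middle of the window
dominant = freqs[centre.index(max(centre))]
```

For the 2 Hz test tone the power at the window centre is largest in the 2 Hz row, but it is smeared over neighbouring scales; sharpening that smearing is the purpose of the reassignment step in the SWT.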

In this step, we train a separate network using a dataset of high-resolution wavelet power spectra of major seismic arrivals picked by expert analysts. The new dataset consists of 11,500 P-wave, 11,500 S-wave, and 45,000 noise windows, each 5 s long. The windows containing both P and S waves are repeated and centered on the arrival time of each phase separately. We applied the same data augmentation strategy described in the Earthquake Detection section to ensure that there is an equal number of observations in each class. Then, we employed a Morlet wavelet, the product of a complex exponential and a Gaussian envelope, as the basis function to calculate the SWT of windowed data (Fig. 3). The final dataset is eventually divided into the training and validation sets including time–frequency representations of waveform data normalized by the maximum value of wavelet coefficients of all three components. The architecture of the phase identification network and its model parameters are similar to those of earthquake detection. However, the input layer consists of three maps of 80 × 101 neurons, and all convolutional layers have a fixed filter size of 3 × 5. The pooling layers have local receptive fields of size 2 × 2 with a constant stride of 2 in both dimensions, the first fully connected layer has 192 neurons, and the output layer is a vector of 3 class labels (Table 2).
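The layer sizes quoted for both networks can be checked by propagating the input dimensions through the convolution and pooling stages. The sketch below assumes unpadded, unit-stride convolutions and pooling whose output size is rounded up at the edges; these conventions are inferred from the tabulated sizes in Tables 1 and 2 rather than stated in the text, and the helper names are hypothetical.

```python
import math

def conv_out(shape, kernel):
    """Output size of an unpadded convolution with unit stride."""
    return (shape[0] - kernel[0] + 1, shape[1] - kernel[1] + 1)

def pool_out(shape, kernel, stride):
    """Output size of pooling when partial windows at the edge are kept (rounded up)."""
    return tuple(math.ceil((n - k) / s) + 1 for n, k, s in zip(shape, kernel, stride))

def trace_shapes(input_shape, layers, n_filters=16):
    """Return the map size after every layer and the flattened size fed to the first FC layer."""
    shapes, shape = [], input_shape
    for kind, kernel, stride in layers:
        shape = conv_out(shape, kernel) if kind == "conv" else pool_out(shape, kernel, stride)
        shapes.append(shape)
    return shapes, n_filters * shape[0] * shape[1]

# Earthquake detection network (Table 1): 3 input maps of 33 x 99.
detection_layers = [
    ("conv", (5, 7), None), ("pool", (1, 2), (1, 2)),
    ("conv", (5, 5), None), ("pool", (1, 2), (1, 2)),
    ("conv", (3, 3), None), ("pool", (2, 2), (2, 2)),
    ("conv", (3, 3), None), ("pool", (2, 2), (2, 2)),
]
# Phase identification network (Table 2): 3 input maps of 80 x 101.
phase_layers = [("conv", (3, 5), None), ("pool", (2, 2), (2, 2))] * 4

detection_shapes, detection_flat = trace_shapes((33, 99), detection_layers)
phase_shapes, phase_flat = trace_shapes((80, 101), phase_layers)
```

Under these assumptions the final feature maps are 5 × 4 and 4 × 3, so the 16 maps flatten to the 320 and 192 neurons of the first fully connected layers.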

▴ Figure 1. (a) Distributions of earthquakes (circles) and seismic stations (white triangles) used in this study. The warm and cold colors correspond to shallow and deep earthquakes, respectively. The circle sizes represent the earthquake magnitudes. (b) A histogram of event local magnitude distribution with mean and standard deviation values of 1.8 and 0.55, respectively. (c) Distribution of source–receiver distances using a 20 km bin size.

RESULTS

We evaluate the performance of the proposed framework on validation datasets selected independent of training sets. While the solver tries to minimize the objective function using the training samples (learning curves in Fig. 4a,b), validation data are employed to measure the generalization capability of the trained network for correct classification of new samples that have never been considered during the training phase (Table 3). The learning curves show that the earthquake detection algorithm rapidly converges to the optimal solution after 20,000 iterations, but it takes ∼40,000 iterations until the objective function becomes flat for the seismic phase identification model. The neural network produces a vector of confidence scores for all class labels, indicating the probability of a particular input belonging to a given category. The accuracy of correct classification is calculated every 5000 iterations and the model with the highest accuracy on the validation set is selected as the final classifier (Ⓔ Fig. S1, available in the electronic supplement to this article).

For earthquake detection, the network accuracy reaches 99.8% on the validation set and only 40 earthquake windows are mislabeled. This is comparable to that obtained from the training dataset, which yields respective accuracies of 99.7% and 99.95% for earthquake and nonearthquake categories (see Table 3). On the other hand, the phase identification network results in a slightly reduced accuracy on average. The second classifier gives an average error of 0.7% for the entire training set and predicts

Table 1
Architecture of Convolutional Network (ConvNet) Used for Seismic Event Detection

| Layer | Type | Kernel Size | Stride | Output |
| --- | --- | --- | --- | --- |
| 1 | Input data | – | – | 3 maps of 33 × 99 neurons |
| 2 | ConvReLU | 5 × 7 | – | 16 maps of 29 × 93 neurons |
| 3 | Maxpool | 1 × 2 | (1, 2) | 16 maps of 29 × 47 neurons |
| 4 | ConvReLU | 5 × 5 | – | 16 maps of 25 × 43 neurons |
| 5 | Maxpool | 1 × 2 | (1, 2) | 16 maps of 25 × 22 neurons |
| 6 | ConvReLU | 3 × 3 | – | 16 maps of 23 × 20 neurons |
| 7 | Maxpool | 2 × 2 | (2, 2) | 16 maps of 12 × 10 neurons |
| 8 | ConvReLU | 3 × 3 | – | 16 maps of 10 × 8 neurons |
| 9 | Maxpool | 2 × 2 | (2, 2) | 16 maps of 5 × 4 neurons |
| 10 | FC | – | – | 320 neurons |
| 11 | FC | – | – | 2 neurons |

ConvReLU, a convolutional layer followed by a rectified linear unit; Maxpool, a max pooling layer; FC, a fully connected layer.

▴ Figure 2. Three-component waveforms of (a) a magnitude 2.4 earthquake recorded by CN.BMBC at 164 km distance on 25 February 2014, and (b) a postevent noise window recorded at the same station. (c–e) The short-time Fourier transform of earthquake records in (a). (f–h) Same as (c–e) but calculated for the noise window in (b). Channels shown: CN.BMBC.BHE, BHN, and BHZ; spectrogram axes: time versus frequency (Hz).

correct labels for approximately 98.7% and 98.4% of P- and S-wave windows in the validation set, respectively.

The performance of the proposed model is compared with that obtained from a model that is trained directly in the time domain using the same dataset. Although the difference between the earthquake detection networks is insignificant, the total number of misclassified samples obtained from our model is nearly 40% of that of the time-domain classifier (see classification accuracies in Table 3 and Ⓔ Table S1 and confusion matrices in Table 4 and Ⓔ Table S2). However, the difference becomes more pronounced for the phase identification process. On average, the model trained using the time–frequency features improves the accuracy by up to 3.2%. Whereas the time-domain model misidentifies 580 P phases and 588 S phases in the validation set, our model provides a higher recall rate and misses only 137 P and 79 S waves (see confusion matrices in Table 5 and Ⓔ Table S3).
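The figures quoted above can be reproduced from the validation confusion matrices in Tables 4 and 5. The sketch below treats rows as true labels and columns as predicted labels, which is consistent with the 137 and 79 missed phases; the helper names are hypothetical.

```python
def recall(matrix, k):
    """Fraction of true class k (row k) that is predicted as class k."""
    return matrix[k][k] / sum(matrix[k])

def precision(matrix, k):
    """Fraction of predictions of class k (column k) that truly belong to class k."""
    return matrix[k][k] / sum(row[k] for row in matrix)

def accuracy(matrix):
    """Fraction of all samples on the diagonal."""
    return sum(matrix[k][k] for k in range(len(matrix))) / sum(map(sum, matrix))

# Counts copied from Table 4 (seismic event, noise) and Table 5 (P-wave, S-wave, noise).
detection = [[37460, 40],
             [142, 36882]]
phase = [[11363, 90, 47],
         [58, 11421, 21],
         [87, 91, 11322]]

missed_p = sum(phase[0]) - phase[0][0]
missed_s = sum(phase[1]) - phase[1][1]
detection_accuracy = round(100 * accuracy(detection), 1)
phase_precision = [round(100 * precision(phase, k), 1) for k in range(3)]
detection_precision = [round(100 * precision(detection, k), 1) for k in range(2)]
phase_recall = [round(100 * recall(phase, k), 1) for k in range(3)]
```

With these counts, the overall detection accuracy rounds to 99.8%, and the column-wise ratios round to 98.7%, 98.4%, and 99.4% for the phase classes and to 99.6% and 99.9% for the detection classes. These match the validation column of Table 3, which suggests that the tabulated percentages correspond to what is computed here as precision, whereas the row-wise ratios give the recall behind the missed-phase counts.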

The earthquake detection and phase identification results indicate that the classification performance can be affected by the minimum accepted confidence score (Fig. 4). We define a hard-thresholding function, which discards earthquakes whose scores are less than a given probability threshold (LeCun et al., 1990). A careful selection of probability threshold is required because a low threshold value can increase the number of false detections, while higher threshold values may result in erroneous rejection of true events. The distribution of the number of earthquakes as a function of detection score shows a gradual increase for confidence scores greater than 0.6 with a sudden rise at a detection score of 0.95 (Fig. 4c). The same observations are made for the phase separation results as nearly 98.5% of correctly predicted examples fall in the last probability bin (Fig. 4d).
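The trade-off governed by the probability threshold can be illustrated with a short sketch of a hard-thresholding function. The scores and labels below are invented for illustration and the function names are hypothetical; only the idea of discarding detections below a chosen score comes from the text.

```python
def hard_threshold(detections, threshold):
    """Keep only detections whose confidence score is at least `threshold`."""
    return [d for d in detections if d["score"] >= threshold]

def count_outcomes(detections, threshold):
    """Count true detections, false detections, and missed events at a given threshold."""
    kept = hard_threshold(detections, threshold)
    true_kept = sum(d["is_event"] for d in kept)
    false_kept = len(kept) - true_kept
    missed = sum(d["is_event"] for d in detections) - true_kept
    return {"true": true_kept, "false": false_kept, "missed": missed}

# Hypothetical classifier outputs paired with a reference label.
detections = [
    {"score": 0.99, "is_event": True},
    {"score": 0.97, "is_event": True},
    {"score": 0.90, "is_event": True},
    {"score": 0.80, "is_event": False},
    {"score": 0.62, "is_event": False},
    {"score": 0.55, "is_event": True},
]
low = count_outcomes(detections, 0.60)    # permissive threshold
high = count_outcomes(detections, 0.95)   # strict threshold
```

In this toy set the permissive threshold admits two false detections, whereas the strict threshold removes them at the cost of rejecting additional true events.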

Induced Seismicity in Northeast British Columbia

After successful training and validation, the ConvNet models were employed for monitoring the seismic activity in Fort St. John and near Dawson Creek, an area with an increased number of potentially induced earthquakes in recent years (Horner et al., 1994; British Columbia Oil and Gas Commission, 2014; Visser et al., 2017). The network was applied to continuous records from an array of nine broadband stations, deployed in August 2017, which are not included in either the training or validation set. For each station, the daily records are first divided into 100-s-long windows (with 50% overlap between windows) and the STFT representations of windowed data are calculated. If the first classifier detects a seismic event at a probability threshold of 0.95, the SWT of the corresponding time window is calculated and scanned by the second network to identify P and S waves. The earthquake detection step takes ∼5 s for a three-component daily recording, which is ∼30 times faster than the STA/LTA technique. To assess the detection performance and generalization power of the model, all detected events are visually inspected. It turns out that only 13 events are false detections (2% of the total number of detected events), 9 of which are eventually eliminated by the phase identification classifier. In comparison, the time-domain model returns ∼25% more false

▴ Figure 3. (a) Three-component waveforms of the earthquake shown in Figure 2a. Red and blue dashed lines indicate the P- and S-wave arrival times, respectively. (b–d) The normalized synchrosqueezing wavelet transform (SWT) of a 5-s-long window centered on the P-wave arrival. (e–g) The normalized SWT of a 5-s-long window centered on the S-wave arrival. Channels shown: CN.BMBC.BHE, BHN, and BHZ; SWT axes: time versus frequency (Hz).

positives than its time–frequency counterpart. The phase identification network is capable of detecting multiple P- and S-wave arrivals if more than one event exists within a time window (Fig. 5 and Ⓔ Fig. S2). Our technique finds 652 events with epicentral distances ranging from 1 to 28 km on 5 September 2017 (Fig. 6). The local magnitudes of detected earthquakes vary from −0.5 to 1.1; nearly 30% of these events fall below the minimum magnitude in the training catalog.
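The two-stage scanning procedure described in this section (100-s windows with 50% overlap, a detection threshold of 0.95, and phase identification only on flagged windows) can be sketched as follows. The two classifier functions are hypothetical stand-ins that return fixed values, since the trained networks are not reproduced here, and the function names are illustrative.

```python
def window_starts(n_samples, win_len, overlap=0.5):
    """Start indices of fixed-length windows with the given fractional overlap."""
    hop = int(win_len * (1.0 - overlap))
    return list(range(0, n_samples - win_len + 1, hop))

def scan_day(n_samples, detect, identify, fs=20.0, win_s=100.0, threshold=0.95):
    """Run the detector on every window and pass confident detections to the phase classifier."""
    win_len = int(win_s * fs)
    results = []
    for start in window_starts(n_samples, win_len):
        score = detect(start, start + win_len)
        if score >= threshold:                      # second stage runs only on detections
            results.append((start / fs, score, identify(start, start + win_len)))
    return results

# Hypothetical stand-ins for the two trained networks (sample indices at 20 Hz).
def fake_detector(start, stop):
    """Pretend an event sits at sample 30000 and score windows that contain it."""
    return 0.99 if start <= 30000 < stop else 0.10

def fake_phase_classifier(start, stop):
    """Pretend picks, returned as sample indices."""
    return {"P": 30000, "S": 30040}

day_samples = int(86400 * 20.0)                     # one day sampled at 20 Hz
starts = window_starts(day_samples, 2000)
hits = scan_day(day_samples, fake_detector, fake_phase_classifier)
```

At 20 Hz a day of data yields 1727 overlapping windows per component, and because of the 50% overlap the single synthetic event is flagged in two consecutive windows. Restricting the more expensive wavelet-based stage to such flagged windows is what keeps the overall cost low.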

The automatic detection results corroborate the observations obtained from manual picking and show significant temporal variations (Fig. 6). The ConvNet model detects ∼20% more earthquakes than previously reported by an expert analyst, though P waves are identified for only 40% of detected events. A possible explanation for this observation includes the difficulties in separating P and S phases for earthquakes recorded at shorter distances. For the remaining 60%, the P-wave arrivals are included mostly in the S-wave windows because the differential times between the two phases are less than 1.5 s (nearly 80% of detected events are recorded at epicentral distances ≤10 km). It is worth noting that the time–frequency phase classifier identifies 52 more P-wave windows than the model trained in the time domain. The ConvNet models trained in the time–frequency and time domains miss 31 and 39 of the events identified by a human analyst, respectively. A lower probability threshold allows the network to detect smaller events, but this may result in false detections.

DISCUSSION

Most of the existing earthquake detection techniques are poorly suited for identifying low-magnitude events, and their application in areas with no record of seismicity is limited by the absence of template earthquake waveforms (Yoon et al., 2015). The main goal of this research is to present a detection framework independent of earthquake magnitude, epicentral distance, and noise level. We propose a method utilizing multiresolution time–frequency analysis and advances in deep learning to achieve a robust earthquake detection. In comparison with the time-domain analysis, the

▴ Figure 4. Learning curves of the (a) earthquake detection, and (b) phase identification networks showing the reduction in objective function value during the training process. Distributions of correctly labeled (c) earthquakes and (d) seismic phases as a function of detection score. Axes: (a, b) objective value versus iteration (0 to 50k) for the training and validation sets; (c) number of earthquakes versus earthquake detection probability (0.5 to 1.0); (d) number of windows containing P or S waves versus phase identification probability (0.5 to 1.0), with counts on a logarithmic scale.

Table 2
Architecture of ConvNet Used for Seismic Phase Identification

| Layer | Type | Kernel Size | Stride | Output |
| --- | --- | --- | --- | --- |
| 1 | Input data | – | – | 3 maps of 80 × 101 neurons |
| 2 | ConvReLU | 3 × 5 | – | 16 maps of 78 × 97 neurons |
| 3 | Maxpool | 2 × 2 | (2, 2) | 16 maps of 39 × 49 neurons |
| 4 | ConvReLU | 3 × 5 | – | 16 maps of 37 × 45 neurons |
| 5 | Maxpool | 2 × 2 | (2, 2) | 16 maps of 19 × 23 neurons |
| 6 | ConvReLU | 3 × 5 | – | 16 maps of 17 × 19 neurons |
| 7 | Maxpool | 2 × 2 | (2, 2) | 16 maps of 9 × 10 neurons |
| 8 | ConvReLU | 3 × 5 | – | 16 maps of 7 × 6 neurons |
| 9 | Maxpool | 2 × 2 | (2, 2) | 16 maps of 4 × 3 neurons |
| 10 | FC | – | – | 192 neurons |
| 11 | FC | – | – | 3 neurons |

ConvReLU, a convolutional layer followed by a rectified linear unit; Maxpool, a max pooling layer; FC, a fully connected layer.

application of SWT enables the network to learn the spectral structures of noise and seismic signals, which has been found to increase the accuracy of the phase identification process by up to ∼3%. However, the calculation of WTs of continuous waveforms is computationally expensive and may not scale efficiently to large-array data. To reduce the processing time, the phase identification network scans only the earthquake time windows that are detected by the preliminary ConvNet using the STFT-based spectrograms.

The learning ability of a neural network can vary with changes in its structure (Wang and Teng, 1995). The sensitivity of the proposed method with respect to the network architecture was explored by (1) increasing the number of filters in convolutional layers, and (2) replacing the pooling with strided convolutions (Perol et al., 2018). We found no significant variation in model performance in either case, though the

Table 3
Classification Accuracies of the Earthquake Detection and Seismic Phase Identification ConvNets Trained in the Time–Frequency Domain

| Network | Class | Training Set (%) | Validation Set (%) |
| --- | --- | --- | --- |
| Earthquake detection | Seismic event | 99.7 | 99.6 |
| Earthquake detection | Noise | 99.95 | 99.9 |
| Phase identification | P-wave | 99.1 | 98.7 |
| Phase identification | S-wave | 99.1 | 98.4 |
| Phase identification | Noise | 99.7 | 99.4 |

▴ Figure 5. (a) A 100-s-long time segment labeled as a seismic event by the earthquake detection network. The second network identifies two separate events within this time segment using a 5-s-long sliding window. Red and blue colors represent windows containing P and S waves, respectively. Vertical red and blue lines mark the P- and S-wave arrival times using their corresponding detection scores, respectively. Dashed gray and black lines indicate the manually picked P and S waves, respectively. (b) Continuous functions of the P- (red) and S-wave (blue) detection scores. (c–e) The SWT of waveform records presented in (a). Channels shown: XL.MG03.HHE, HHN, and HHZ; the detection score ranges from 0 to 1.0 and the SWT panels plot frequency (Hz).

Table 5
Validation Set Confusion Matrix Calculated for the Phase Identification Network Trained Based on the Time–Frequency Feature Maps

                      Predicted Labels
True Labels      P-wave      S-wave      Noise
P-wave           11,363      90          47
S-wave           58          11,421      21
Noise            87          91          11,322

▴Figure 6. Distribution of detected earthquakes over the period of one day, on 5 September 2017, in the Dawson Creek area, plotted as number of events versus time (hour). The convolutional network (ConvNet) detects a total of 652 events (red bars), which is ∼20% more than the manually picked earthquakes (gray bars).

Table 4
Validation Set Confusion Matrix Calculated for the Earthquake Detection Network Trained Based on the Time–Frequency Feature Maps

                      Predicted Labels
True Labels         Seismic event    Noise
Seismic event       37,460           40
Noise               142              36,882
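As a cross-check, the validation accuracies in Table 3 can be compared with the confusion matrices in Tables 4 and 5. The sketch below assumes that rows are true labels and columns are predicted labels, as the table headers suggest; the function names are illustrative only. That Table 3 reports column-wise ratios is an inference from the numbers, not something the text states.

```python
# Rows are true labels, columns are predicted labels (assumed layout of Tables 4 and 5).
detection = [[37460, 40],
             [142, 36882]]
phase = [[11363, 90, 47],
         [58, 11421, 21],
         [87, 91, 11322]]

def column_percent(confusion):
    """Percent of each predicted class (column) whose true label matches."""
    n = len(confusion)
    column_totals = [sum(row[j] for row in confusion) for j in range(n)]
    return [100.0 * confusion[j][j] / column_totals[j] for j in range(n)]

def row_percent(confusion):
    """Percent of each true class (row) that is predicted correctly."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(confusion)]

detection_by_column = [round(v, 1) for v in column_percent(detection)]
phase_by_column = [round(v, 1) for v in column_percent(phase)]
phase_by_row = [round(v, 1) for v in row_percent(phase)]

print(detection_by_column, phase_by_column, phase_by_row)
```

Under these assumptions the column-wise values round to 99.6 and 99.9 for detection and to 98.7, 98.4, and 99.4 for phase identification, which agree with the validation column of Table 3. The row-wise values for the phase network (98.8, 99.3, and 98.5) do not.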

Seismological Research Letters Volume XX, Number XX –2019 7

SRL Early Edition


training time rises dramatically with the increasing number of free parameters (the former case). To avoid overtraining, a constant dropout rate of 10% was applied to the input and convolutional layers, which forces the network to learn more robust features by ignoring a random subset of neurons during the training process. An optimum dropout rate provides the best trade-off between the training data misfit and validation set accuracy. The near-identical performance of the proposed model on both training and validation sets reflects the generalization power of the model beyond the existing earthquake catalog.
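The dropout step mentioned above can be illustrated with a minimal sketch. The 10% rate comes from the text; the "inverted" scaling of the surviving units is a common convention and an assumption here, because the article does not say which variant the framework applies. The function and variable names are illustrative.

```python
import random

def dropout(activations, rate=0.10, training=True, rng=random):
    """Inverted dropout on a flat list of activations.

    While training, each unit is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate) so the expected value is unchanged.
    At inference time the activations are returned unchanged.
    """
    if not training or rate == 0.0:
        return list(activations)
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * scale for a in activations]

rng = random.Random(42)          # fixed seed so the sketch is repeatable
features = [1.0] * 100_000       # toy activations
dropped = dropout(features, rate=0.10, rng=rng)
fraction_zeroed = dropped.count(0.0) / len(dropped)
print(f"fraction of units ignored: {fraction_zeroed:.3f}")
```

Because a different random subset is ignored on every pass, the network cannot rely on any single unit, which is the mechanism behind the more robust features described in the text.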

In addition to phase separation, the second classifier provides a means to estimate the phase arrival times, which are required to determine earthquake location and other source parameters. Ross et al. (2018) suggest that the location of the maximum detection score of each phase can be used to measure its approximate travel time if multiple successive windows identify the same phase (see Fig. 5b). The maximum difference between the manual picks and those obtained from the detection scores is on the order of one half of the sliding window length (Fig. 7). The accuracy of automatic time picking is generally affected by the SNR, and weaker signals result in larger errors (Ⓔ Fig. S3). However, the implementation of the WT enables noise reduction, which can further reduce the picking error and possibly automate the earthquake location process (a topic beyond the scope of this article, but we refer readers to Mousavi et al., 2016, for a more detailed discussion).
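The picking rule described above can be sketched in a few lines. This is a minimal illustration of the idea attributed to Ross et al. (2018), not their code or the code used in this study. The score threshold of 0.5, the minimum of two successive windows, and the choice of the longest run are all assumptions made for the example, as are the function name and the toy scores.

```python
def pick_arrival(window_times, scores, threshold=0.5, min_run=2):
    """Return the time of the peak score inside the longest run of successive
    windows scoring above `threshold`, or None if no run has `min_run` windows."""
    best_start, best_len, start = 0, 0, None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for i, score in enumerate(list(scores) + [float("-inf")]):
        if score > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_len < min_run:
        return None
    run = range(best_start, best_start + best_len)
    peak = max(run, key=lambda i: scores[i])
    return window_times[peak]

times = [float(t) for t in range(10)]  # reference time of each sliding window (s)
p_scores = [0.0, 0.1, 0.9, 0.2, 0.1, 0.6, 0.95, 0.7, 0.1, 0.0]
p_pick = pick_arrival(times, p_scores)
print(p_pick)
```

In the toy series, the isolated high score at 2 s is ignored because no neighboring window agrees with it, and the pick falls on the peak of the three-window run instead. Requiring agreement between successive windows is what guards against single-window false detections.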

CONCLUSION

Based on recent advances in deep learning, we propose a ConvNet algorithm for robust detection of seismic events to address the shortcomings of existing methods. Our technique relies on the detectability of earthquake signals in the time–frequency domain and takes advantage of the spectral characteristics of phase arrivals to separate P and S waves. In comparison with manual detection, our technique can identify ∼20% more events while significantly reducing the processing time and improving efficiency. In addition to event detection, this approach provides initial estimates of phase onset times, which can be used to determine preliminary earthquake locations. Highly improved accuracy and reliable rejection of false detections are achieved by joint application of the event detection and phase identification ConvNets. The proposed approach can potentially be used to enhance real-time monitoring of both natural and induced seismicity.

DATA AND RESOURCES

The regional earthquake catalog used in this study was compiled by the Geological Survey of Canada (http://publications.gc.ca/site/eng/9.856883/publication.html, last accessed September 2018). Waveform data can be collected from the Incorporated Research Institutions for Seismology (IRIS) Data Management Center at https://ds.iris.edu/ds/nodes/dmc/ (last accessed September 2018). We used Mocha, a deep-learning framework for Julia, to train the convolutional networks (the latest version of Mocha is available at https://mochajl.readthedocs.io/en/latest/, last accessed September 2018). Some figures were generated using the Generic Mapping Tools (GMT) v.5.4.2 (www.soest.hawaii.edu/gmt, last accessed September 2018; Wessel and Smith, 1998).

ACKNOWLEDGMENTS

The authors wish to thank Guest Editor Karianne Bergen and two anonymous reviewers for useful comments and suggestions that helped improve the quality of this article. This is Natural Resources Canada (NRCan) Contribution Number 20180263.

REFERENCES

Ahmed, A., M. Sharma, and A. Sharma (2007). Wavelet based automatic phase picking algorithm for 3-component broadband seismological data, J. Seismol. Earthq. Eng. 9, nos. 1/2, 15–24.

Allen, R. V. (1978). Automatic earthquake recognition and timing from single traces, Bull. Seismol. Soc. Am. 68, no. 5, 1521–1532.

Araya-Polo, M., J. Jennings, A. Adler, and T. Dahlke (2018). Deep-learning tomography, The Leading Edge 37, no. 1, 58–66.

Atkinson, G. M., D. W. Eaton, H. Ghofrani, D. Walker, B. Cheadle, R. Schultz, R. Shcherbakov, K. Tiampo, J. Gu, R. M. Harrington, et al. (2016). Hydraulic fracturing and seismicity in the western Canada sedimentary basin, Seismol. Res. Lett. 87, no. 3, 631–647.

Barrett, S. A., and G. C. Beroza (2014). An empirical approach to subspace detection, Seismol. Res. Lett. 85, no. 3, 594–600.

Bengio, Y., A. Courville, and P. Vincent (2013). Representation learning: A review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell. 35, no. 8, 1798–1828.

Bergen, K., C. Yoon, and G. C. Beroza (2016). Scalable similarity search in seismology: A new approach to large-scale earthquake detection, International Conf. on Similarity Search and Applications, Tokyo, Japan, 24–26 October 2016, 301–308.

British Columbia Oil and Gas Commission (2014). Investigation of observed seismicity in the Montney Trend, available at http://www.bcogc.ca/investigation‑observed‑seismicity‑montney‑trend (last accessed December 2018).

▴Figure 7. Distributions of errors between manual time picks and those predicted from the detection scores for (a) P and (b) S waves, plotted as normalized frequency (%) versus |t_manual − t_predicted| (s). Panel annotations: (a) mean = 0.6 s, standard deviation = 0.5 s; (b) mean = 0.8 s, standard deviation = 0.7 s. The average error in predicted P-wave time is less than that of the S-wave.


Caffagni, E., D. W. Eaton, J. P. Jones, and M. van der Baan (2016). Detection and analysis of microseismic events using a Matched Filtering Algorithm (MFA), Geophys. J. Int. 206, no. 1, 644–658.

Cireşan, D., U. Meier, J. Masci, and J. Schmidhuber (2012). Multi-column deep neural network for traffic sign classification, Neural Networks 32, 333–338.

Collobert, R., J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa (2011). Natural language processing (almost) from scratch, J. Mach. Learn. Res. 12, 2493–2537.

Dahl, G. E., T. N. Sainath, and G. E. Hinton (2013). Improving deep neural networks for LVCSR using rectified linear units and dropout, 2013 IEEE International Conf. on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada, 26–31 May 2013, 8609–8613.

Dahl, G. E., D. Yu, L. Deng, and A. Acero (2012). Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Trans. Audio Speech Lang. Process. 20, no. 1, 30–42.

Daubechies, I., and S. Maes (1996). A nonlinear squeezing of the continuous wavelet transform based on auditory nerve models, in Wavelets in Medicine and Biology, A. Aldroubi and M. Unser (Editors), CRC Press, Boca Raton, Florida, 527–546.

Daubechies, I., J. Lu, and H.-T. Wu (2011). Synchrosqueezed wavelet transforms: An empirical mode decomposition-like tool, Appl. Comput. Harmon. Anal. 30, no. 2, 243–261.

Eisner, L., T. Fischer, and J. H. Le Calvez (2006). Detection of repeated hydraulic fracturing (out-of-zone growth) by microseismic monitoring, The Leading Edge 25, no. 5, 548–554.

Farabet, C., C. Couprie, L. Najman, and Y. LeCun (2013). Learning hierarchical features for scene labeling, IEEE Trans. Pattern Anal. Mach. Intell. 35, no. 8, 1915–1929.

Farahbod, A. M., H. Kao, D. M. Walker, and J. F. Cassidy (2015). Investigation of regional seismicity before and after hydraulic fracturing in the Horn River basin, northeast British Columbia, Can. J. Earth Sci. 52, no. 2, 112–122.

Galiana-Merino, J. J., J. L. Rosa-Herranz, and S. Parolai (2008). Seismic P phase picking using a Kurtosis-based criterion in the stationary wavelet domain, IEEE Trans. Geosci. Remote Sens. 46, no. 11, 3815–3826.

Gentili, S., and A. Michelini (2006). Automatic picking of P and S phases using a neural tree, J. Seismol. 10, no. 1, 39–63.

Gibbons, S. J., and F. Ringdal (2006). The detection of low magnitude seismic events using array-based waveform correlation, Geophys. J. Int. 165, no. 1, 149–166.

Glorot, X., A. Bordes, and Y. Bengio (2011). Deep sparse rectifier neural networks, Proceedings of the Fourteenth International Conf. on Artificial Intelligence and Statistics, Fort Lauderdale, Florida, 11–13 April 2011, 315–323.

Hinton, G., L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, et al. (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Process. Mag. 29, no. 6, 82–97.

Horner, R. B., J. E. Barclay, and J. M. MacRae (1994). Earthquakes and hydrocarbon production in the Fort St. John area of northeastern British Columbia, Can. J. Explor. Geophys. 30, no. 1, 39–50.

Krizhevsky, A., I. Sutskever, and G. E. Hinton (2012). Imagenet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, Lake Tahoe, Nevada, 3–6 December 2012, 1097–1105.

LeCun, Y., Y. Bengio, and G. Hinton (2015). Deep learning, Nature 521, no. 7553, 436.

LeCun, Y., B. E. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. E. Hubbard, and L. D. Jackel (1990). Handwritten digit recognition with a back-propagation network, Advances in Neural Information Processing Systems, Denver, Colorado, 27–30 November 1989, 396–404.

LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner (1998). Gradient-based learning applied to document recognition, Proc. IEEE 86, no. 11, 2278–2324.

Maity, D., F. Aminzadeh, and M. Karrenbach (2014). Novel hybrid artificial neural network based autopicking workflow for passive seismic data, Geophys. Prospect. 62, no. 4, 834–847.

Mikolov, T., I. Sutskever, K. Chen, G. S. Corrado, and J. Dean (2013). Distributed representations of words and phrases and their compositionality, Advances in Neural Information Processing Systems, Lake Tahoe, Nevada, 5–10 December 2013, 3111–3119.

Moseley, B., A. Markham, and T. Nissen-Meyer (2018). Fast approximate simulation of seismic waves with deep learning, available at https://arxiv.org/abs/1807.06873 (last accessed December 2018).

Mousavi, S. M., C. A. Langston, and S. P. Horton (2016). Automatic microseismic denoising and onset detection using the synchrosqueezed continuous wavelet transform, Geophysics 81, no. 4, V341–V355.

Nair, V., and G. E. Hinton (2010). Rectified linear units improve restricted Boltzmann machines, Proc. of the 27th International Conf. on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010, 807–814.

Najafabadi, M. M., F. Villanustre, T. M. Khoshgoftaar, N. Seliya, R. Wald, and E. Muharemagic (2015). Deep learning applications and challenges in big data analytics, J. Big Data 2, no. 1, 1.

Perol, T., M. Gharbi, and M. Denolle (2018). Convolutional neural network for earthquake detection and location, Sci. Adv. 4, no. 2, e1700578.

Peterson, C., and B. Söderberg (1989). A new method for mapping optimization problems onto neural networks, Int. J. Neural Syst. 1, no. 1, 3–22.

Portnoff, M. (1980). Time–frequency representation of digital signals and systems based on short-time Fourier analysis, IEEE Trans. Acoust. Speech Signal Process. 28, no. 1, 55–69.

Ross, Z. E., M. A. Meier, E. Hauksson, and T. H. Heaton (2018). Generalized seismic phase detection with deep learning, available at https://arxiv.org/abs/1805.01075 (last accessed December 2018).

Rubinstein, J. L., and A. B. Mahani (2015). Myths and facts on wastewater injection, hydraulic fracturing, enhanced oil recovery, and induced seismicity, Seismol. Res. Lett. 86, no. 4, 1060–1067.

Schultz, R., V. Stern, and Y. J. Gu (2014). An investigation of seismicity clustered near the Cordel Field, west central Alberta, and its relation to a nearby disposal well, J. Geophys. Res. 119, no. 4, 3410–3423.

Serre, T., L. Wolf, and T. Poggio (2005). Object recognition with features inspired by visual cortex, IEEE Computer Society Conf. on Computer Vision and Pattern Recognition 2005 (CVPR 2005), Vol. 2, San Diego, California, 20–25 June 2005, 994–1000.

Skoumal, R. J., M. R. Brudzinski, and B. S. Currie (2015). Earthquakes induced by hydraulic fracturing in Poland Township, Ohio, Bull. Seismol. Soc. Am. 105, no. 1, 189–197.

Vaezi, Y., and M. Van der Baan (2015). Comparison of the STA/LTA and power spectral density methods for microseismic event detection, Mon. Not. R. Astron. Soc. 203, no. 3, 1896–1908.

Visser, R., B. Smith, H. Kao, A. Babaie Mahani, J. Hutchinson, and J. E. McKay (2017). A comprehensive earthquake catalogue for northeastern British Columbia and western Alberta, 2014–2016, Geol. Surv. of Canada, Open-File 8335, doi: 10.4095/306292.

Wang, J., and T.-L. Teng (1995). Artificial neural network-based seismic detector, Bull. Seismol. Soc. Am. 85, no. 1, 308–319.

Wang, N., and D.-Y. Yeung (2013). Learning a deep compact image representation for visual tracking, Advances in Neural Information Processing Systems, Lake Tahoe, Nevada, 5–10 December 2013, 809–817.

Wessel, P., and W. H. Smith (1998). New, improved version of Generic Mapping Tools released, Eos Trans. AGU 79, no. 47, 579.


Withers, M., R. Aster, C. Young, J. Beiriger, M. Harris, S. Moore, and J. Trujillo (1998). A comparison of select trigger algorithms for automated global seismic phase and event detection, Bull. Seismol. Soc. Am. 88, no. 1, 95–106.

Yang, J., K. Yu, and T. Huang (2010). Supervised translation-invariant sparse coding, 2010 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), San Francisco, California, 13–18 June 2010, 3517–3524.

Yoon, C. E., O. O'Reilly, K. J. Bergen, and G. C. Beroza (2015). Earthquake detection through computationally efficient similarity search, Sci. Adv. 1, no. 11, e1501057.

Zhang, H., C. Thurber, and C. Rowe (2003). Automatic P-wave arrival detection and picking with multiscale wavelet analysis for single-component recordings, Bull. Seismol. Soc. Am. 93, no. 5, 1904–1912.

Zhu, W., and G. C. Beroza (2018). PhaseNet: A deep-neural-network-based seismic arrival time picking method, available at http://arxiv.org/abs/1803.03211v1 (last accessed December 2018).

Ramin M. H. Dokht

Honn Kao

Ryan Visser

Brindley Smith

Pacific Geoscience Centre

Natural Resources Canada

Geological Survey of Canada

9860 West Saanich Road

Sidney, British Columbia

Canada V8L 4B2

ramin.mohammadhosseinidokht@canada.ca

Published Online 16 January 2019


Colombian Seismic Monitoring Using Advanced Machine-Learning Algorithms

Article

Full-text available

May 2024

Colombian Seismic Monitoring Using Advanced Machine-Learning Algorithms

Article

May 2024

Seismic networks worldwide are designed to monitor seismic ground motion. This process includes identifying seismic events in the signals, picking and associating seismic phases, determining the event’s location, and calculating its magnitude. Although machine-learning (ML) methods have shown significant improvements in some of these steps individually, there are other stages in which traditional non-ML algorithms outperform ML approaches. We introduce SeisMonitor, a Python open-source package to monitor seismic activity that uses ready-made ML methods for event detection, phase picking and association, and other well-known methods for the rest of the steps. We apply these steps in a totally automated process for almost 7 yr (2016–2022) in three seismic networks located in Colombian territory, the Colombian seismic network and two local and temporary networks in northern South America: the Middle Magdalena Valley and the Caribbean-Mérida Andes seismic arrays. The results demonstrate the reliability of this method in creating automated seismic catalogs, showcasing earthquake detection capabilities and location accuracy similar to standard catalogs. Furthermore, it effectively identifies significant tectonic structures and emphasizes local crustal faults. In addition, it has the potential to enhance earthquake processing efficiency and serve as a valuable supplement to manual catalogs, given its ability at detecting minor earthquakes and aftershocks.

Seismic Signal Discrimination of Earthquakes and Quarry Blasts in North-East Italy Using Deep Neural Networks

Article

Full-text available

Apr 2024
PURE APPL GEOPHYS

Separation of seismic sources of seismic events such as earthquakes and quarry blasts is a complex task and, in most cases, require manual inspection. In this study, artificial neural network models are developed to automatically identify the events that occurred in North-East Italy, where earthquakes and quarry blasts may share the same area. Due to the proximity of the locations of the active fault lines and mining sites, many blasts are registered as earthquakes that can contaminate earthquake catalogues. To be able to differentiate various sources of seismic events 11,821 seismic records from 1463 earthquakes detected by various seismic networks and 9822 seismic records of 727 blasts manually labelled by the Slovenian Environment Agency are used. Three-component seismic records with 90 s length and their frequency contents are used as an input. Ten different models are created by changing various features of the neural networks. Regardless of the features of the created models, results show that accuracy rates are always around 99 %. The performance of our models is compared with a previous study that also used artificial neural networks. It is found that our models show significantly better performance with respect to the models developed by the previous study which performs badly due to differences in the data. Our models perform slightly better than the new model created by using our dataset, but with the previous study’s architecture. Developed model can be useful for the discrimination of the earthquakes from quarry blasts in North-East Italy, which may help us to monitor seismic events in the region.

The Matrix Profile in Seismology: Template Matching of Everything With Everything

Article

Full-text available

Feb 2024

Template matching has proven to be an effective method for seismic event detection, but is biased toward identifying events similar to previously known events, and thus is ineffective at discovering events with non‐matching waveforms (e.g., those dissimilar to existing catalog events). In principle, this limitation can be overcome by cross‐correlating every segment (possible template) of a seismogram with every other segment to identify all similar event pairs, but doing so has been previously considered computationally infeasible for long time series. Here we describe a method, called the ‘Matrix Profile’ (MP), a “correlate everything with everything” calculation that can be efficiently and scalably computed. The MP returns the maximum value of the correlation coefficient of every sub‐window of continuous data with every other sub‐window, as well as the best‐correlated sub‐window location. Here we show how MP methods can obtain valuable results when applied to months and years of continuous seismic data in both local and global case studies. We find that the MP can identify many new events in Parkfield, California seismicity that are not contained in existing event catalogs and that it can efficiently find clusters of similar earthquakes in global seismic data. Either used by itself, or as a starting point for subsequent template matching calculations, the MP is likely to provide a useful new tool for seismology research.

A Seismic Phase-Picking Model Transfer Learning Approach Based on Maximum Mean Discrepancy

Conference Paper

Dec 2023

Noise source localization using deep learning

Article

Full-text available

May 2024

Ambient noise source localization is of great significance for estimating seismic noise source distribution, understanding source mechanisms and imaging subsurface structures. The commonly used methods for source localization, such as the matched field processing and the full-waveform inversion, are time-consuming and not applicable for time-lapse monitoring of the noise source distribution. We propose an efficient alternative of using deep learning for noise source localization. In the neural network, the input data are noise cross-correlation functions and the output are matrices containing the information of noise source distribution. It is assumed that the subsurface structure is a horizontally layered earth model and the model parameters are known. A wavefield superposition method is employed to efficiently simulate ambient noise data with quantities of local noise sources labelled as training datasets. We use a weighted binary cross-entropy loss function to address the prediction inaccuracy caused by a sparse label matrix during training. The proposed deep learning framework is validated by synthetic tests and two field data examples. The successful applications to locate an anthropogenic noise source and a carbon dioxide (CO2) degassing area demonstrate the accuracy and efficiency of the proposed deep learning method for noise source localization, which has great potential for monitoring the changes of the noise source distribution in a survey area.

CREDIT-X1local: A reference dataset for machine learning seismology from ChinArray in Southwest China

Article

Apr 2024

Recent advances in earthquake seismology using machine learning

Article

Full-text available

Feb 2024
EARTH PLANETS SPACE

Given the recent developments in machine-learning technology, its application has rapidly progressed in various fields of earthquake seismology, achieving great success. Here, we review the recent advances, focusing on catalog development, seismicity analysis, ground-motion prediction, and crustal deformation analysis. First, we explore studies on the development of earthquake catalogs, including their elemental processes such as event detection/classification, arrival time picking, similar waveform searching, focal mechanism analysis, and paleoseismic record analysis. We then introduce studies related to earthquake risk evaluation and seismicity analysis. Additionally, we review studies on ground-motion prediction, which are categorized into four groups depending on whether the output is ground-motion intensity or ground-motion time series and the input is features (individual measurable properties) or time series. We discuss the effect of imbalanced ground-motion data on machine-learning models and the approaches taken to address the problem. Finally, we summarize the analysis of geodetic data related to crustal deformation, focusing on clustering analysis and detection of geodetic signals caused by seismic/aseismic phenomena. Graphical Abstract

Seismic Phase Picking Using Synchrosqueezed Transform and Attention Mechanism

Conference Paper

Oct 2023

Phase Neural Operator for Multi‐Station Picking of Seismic Arrivals

Article

Full-text available

Dec 2023
GEOPHYS RES LETT

Plain Language Summary Earthquake monitoring often involves measuring arrival times of P‐ and S‐waves of earthquakes from continuous seismic data. With the advancement of artificial intelligence, state‐of‐the‐art phase picking methods use deep neural networks to examine seismic data from each station independently; this is in stark contrast to the way that human experts annotate seismic data, in which waveforms from the whole network containing multiple stations are examined simultaneously. With the performance gains of single‐station algorithms approaching saturation, it is clear that meaningful future advances will require algorithms that can naturally examine data for entire networks at once. Here we introduce a multi‐station phase picking algorithm based on a recently developed machine learning paradigm called Neural Operator. Our algorithm, called Phase Neural Operator, leverages the spatial‐temporal information of earthquake signals from an input seismic network with arbitrary geometry. This results in superior performance over leading baseline algorithms by detecting many more earthquakes, picking many more seismic wave arrivals, yet also greatly improving measurement accuracy.

PhaseNet: A Deep-Neural-Network-Based Seismic Arrival Time Picking Method

Article

Full-text available

Mar 2018

As the number of seismic sensors grows, it is becoming increasingly difficult for analysts to pick seismic phases manually and comprehensively, yet such efforts are fundamental to earthquake monitoring. Despite years of improvements in automatic phase picking, it is difficult to match the performance of experienced analysts. A more subtle issue is that different seismic analysts may pick phases differently, which can introduce bias into earthquake locations. We present a deep-neural-network-based arrival-time picking method called "PhaseNet" that picks the arrival times of both P and S waves. Deep neural networks have recently made rapid progress in feature learning, and with sufficient training, have achieved super-human performance in many applications. PhaseNet uses three-component seismic waveforms as input and generates probability distributions of P arrivals, S arrivals, and noise as output. We engineer PhaseNet such that peaks in probability provide accurate arrival times for both P and S waves, and have the potential to increase the number of S-wave observations dramatically over what is currently available. This will enable both improved locations and improved shear wave velocity models. PhaseNet is trained on the prodigious available data set provided by analyst-labeled P and S arrival times from the Northern California Earthquake Data Center. The dataset we use contains more than seven million waveform samples extracted from over thirty years of earthquake recordings. We demonstrate that PhaseNet achieves much higher picking accuracy and recall rate than existing methods.

A comprehensive earthquake catalogue for northeastern British Columbia and western Alberta, 2014–2016

Technical Report

Full-text available

Dec 2017

To gain a better understanding of induced seismicity in northeast British Columbia and western Alberta, we conducted an intensive analysis of seismic data to locate earthquakes that occurred within the area of 52˚N–61˚N, 126˚W–115˚W for the years of 2014 through 2016. Continuous seismic waveforms from as many as 43 stations operated by various organizations in the region were used in this study. A total of 5478 events were identified and located; but only 4916 solutions were deemed acceptable by our quality criteria. The number of earthquakes in our final catalogue is approximately three times the base level of the Canadian National Seismograph Network catalogue. In this report, we describe in detail our location procedures and how each source parameter (origin time, epicenter, focal depth, and magnitude) is determined. The earthquake catalogue is summarized in a table, while the phase picking data for individual events are presented in an ASCII file as a supplement to this report. The total numbers of events in 2014, 2015, and 2016 are 1287, 1575, and 2057, respectively. The overall magnitude of completeness of our catalogue is ML 1.8, an improvement from the value of 2.3 for the CNSN catalogue.

Deep-learning tomography

Article

Full-text available

Jan 2018

Velocity-model building is a key step in hydrocarbon exploration. The main product of velocity-model building is an initial model of the subsurface that is subsequently used in seismic imaging and interpretation workflows. Reflection or refraction tomography and full-waveform inversion (FWI) are the most commonly used techniques in velocity-model building. On one hand, tomography is a time-consuming activity that relies on successive updates of highly human-curated analysis of gathers. On the other hand, FWI is very computationally demanding with no guarantees of global convergence. We propose and implement a novel concept that bypasses these demanding steps, directly producing an accurate gridding or layered velocity model from shot gathers. Our approach relies on training deep neural networks. The resulting predictive model maps relationships between the data space and the final output (particularly the presence of high-velocity segments that might indicate salt formations). The training task takes a few hours for 2D data, but the inference step (predicting a model from previously unseen data) takes only seconds. The promising results shown here for synthetic 2D data demonstrate a new way of using seismic data and suggest fast turnaround of workflows that now make use of machine-learning approaches to identify key structures in the subsurface.

Convolutional Neural Network for Earthquake Detection and Location

Article

Full-text available

Feb 2017

The recent evolution of induced seismicity in Central United States calls for exhaustive catalogs to improve seismic hazard assessment. Over the last decades, the volume of seismic data has increased exponentially, creating a need for efficient algorithms to reliably detect and locate earthquakes. Today's most elaborate methods scan through the plethora of continuous seismic records, searching for repeating seismic signals. In this work, we leverage the recent advances in artificial intelligence and present ConvNetQuake, a highly scalable convolutional neural network for earthquake detection and location from a single waveform. We apply our technique to study the induced seismicity in Oklahoma (USA). We detect 20 times more earthquakes than previously cataloged by the Oklahoma Geological Survey. Our algorithm is orders of magnitude faster than established methods.

Automatic microseismic denoising and onset detection using the synchrosqueezed continuous wavelet transform

Article

Jul 2016

Typical microseismic data recorded by surface arrays are characterized by low signal-to-noise ratios (S/Ns) and highly nonstationary noise that make it difficult to detect small events. Currently, array or crosscorrelation-based approaches are used to enhance the S/N prior to processing. We have developed an alternative approach for S/N improvement and simultaneous detection of microseismic events. The proposed method is based on the synchrosqueezed continuous wavelet transform (SS-CWT) and custom thresholding of single-channel data. The SS-CWT allows for the adaptive filtering of time- and frequency-varying noise as well as offering an improvement in resolution over the conventional wavelet transform. Simultaneously, the algorithm incorporates a detection procedure that uses the thresholded wavelet coefficients and detects an arrival as a local maximum in a characteristic function. The algorithm was tested using a synthetic signal and field microseismic data, and our results have been compared with conventional denoising and detection methods. This technique can remove a large part of the noise from small-amplitude signals and can detect events as well as estimate onset times.

Detection and analysis of microseismic events using a Matched Filtering Algorithm (MFA)

Article

May 2016
GEOPHYS J INT

A new Matched Filtering Algorithm (MFA) is proposed for detecting and analyzing microseismic events recorded by downhole monitoring of hydraulic fracturing. This method requires a set of well-located template (‘parent’) events, which are obtained using conventional microseismic processing and selected on the basis of high signal-to-noise (S/N) ratio and representative spatial distribution of the recorded microseismicity. Detection and extraction of ‘child’ events are based on stacked, multi-channel cross-correlation of the continuous waveform data, using the parent events as reference signals. The location of a child event relative to its parent is determined using an automated process, by rotation of the multi-component waveforms into the ray-centered co-ordinates of the parent and maximizing the energy of the stacked amplitude envelope within a search volume around the parent's hypocentre. After correction for geometrical spreading and attenuation, the relative magnitude of the child event is obtained automatically using the ratio of stacked envelope peak with respect to its parent. Since only a small number of parent events require interactive analysis such as picking P- and S-wave arrivals, the MFA approach offers the potential for significant reduction in effort for downhole microseismic processing. Our algorithm also facilitates the analysis of single-phase child events, i.e. microseismic events for which only one of the S- or P-wave arrivals is evident due to unfavorable S/N conditions. A real-data example using microseismic monitoring data from 4 stages of an open-hole slickwater hydraulic fracture treatment in western Canada demonstrates that a sparse set of parents (in this case, 4.6 per cent of the originally located events) yields a significant (more than four-fold) increase in the number of located events compared with the original catalog. Moreover, analysis of the new MFA catalog suggests that this approach leads to more robust interpretation of the induced microseismicity and novel insights into dynamic rupture processes based on the average temporal (foreshock-aftershock) relationship of child events to parents.
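
The detection step described in this abstract (stacked, multi-channel cross-correlation against a parent waveform) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `normalized_xcorr` and `stacked_detection`, the 0.8 threshold, and the toy waveforms are all assumptions chosen for illustration, and the sketch omits the location and magnitude steps entirely.

```python
import math

def normalized_xcorr(template, trace):
    """Slide `template` along `trace`; return the normalized correlation at each lag."""
    n = len(template)
    t_mean = sum(template) / n
    t_dev = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    out = []
    for lag in range(len(trace) - n + 1):
        window = trace[lag:lag + n]
        w_mean = sum(window) / n
        w_dev = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(d * d for d in w_dev))
        denom = t_norm * w_norm
        # Flat windows carry no waveform shape, so report zero correlation.
        cc = sum(a * b for a, b in zip(t_dev, w_dev)) / denom if denom > 1e-12 else 0.0
        out.append(cc)
    return out

def stacked_detection(templates, traces, threshold=0.8):
    """Average the per-channel correlations and return lags at or above `threshold`."""
    per_channel = [normalized_xcorr(t, x) for t, x in zip(templates, traces)]
    n_lags = min(len(c) for c in per_channel)
    stack = [sum(c[i] for c in per_channel) / len(per_channel) for i in range(n_lags)]
    picks = [i for i, v in enumerate(stack) if v >= threshold]
    return stack, picks

# Hypothetical parent waveform and two channels holding a scaled copy at sample 5.
parent = [0.0, 1.0, -1.0, 0.5, 0.0]
channel_a = [0.0] * 5 + [0.3 * v for v in parent] + [0.0] * 5
channel_b = [0.0] * 5 + [2.0 * v for v in parent] + [0.0] * 5

stack, picks = stacked_detection([parent, parent], [channel_a, channel_b])
print(picks)
```

Because the correlation is normalized, the weaker copy on `channel_a` scores as highly as the stronger one on `channel_b`, which is why a child event much smaller than its parent can still be detected.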

Generalized Seismic Phase Detection with Deep Learning

Article

Aug 2018

To optimally monitor earthquake‐generating processes, seismologists have sought to lower detection sensitivities ever since instrumental seismic networks were started about a century ago. Recently, it has become possible to search continuous waveform archives for replicas of previously recorded events (i.e., template matching), which has led to at least an order of magnitude increase in the number of detected earthquakes and greatly sharpened our view of geological structures. Earthquake catalogs produced in this fashion, however, are heavily biased in that they are completely blind to events for which no templates are available, such as in previously quiet regions or for very large‐magnitude events. Here, we show that with deep learning, we can overcome such biases without sacrificing detection sensitivity. We trained a convolutional neural network (ConvNet) on the vast hand‐labeled data archives of the Southern California Seismic Network to detect seismic body‐wave phases. We show that the ConvNet is extremely sensitive and robust in detecting phases even when masked by high background noise and when the ConvNet is applied to new data that are not represented in the training set (in particular, very large‐magnitude events). This generalized phase detection framework will significantly improve earthquake monitoring and catalogs, which form the underlying basis for a wide range of basic and applied seismological research.

Deep Sparse Rectifier Neural Networks

Article

Jan 2011
J MACH LEARN RES

While logistic sigmoid neurons are more biologically plausible than hyperbolic tangent neurons, the latter work better for training multi-layer neural networks. This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability at zero, creating sparse representations with true zeros, which seem remarkably suitable for naturally sparse data. Even though they can take advantage of semi-supervised setups with extra-unlabelled data, deep rectifier networks can reach their best performance without requiring any unsupervised pre-training on purely supervised tasks with large labelled data sets. Hence, these results can be seen as a new milestone in the attempts at understanding the difficulty in training deep but purely supervised neural networks, and closing the performance gap between neural networks learnt with and without unsupervised pre-training.
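
The "sparse representations with true zeros" mentioned in this abstract can be seen in a small sketch that compares a rectifier with a hyperbolic tangent on the same layer. The weights, inputs, and the helper name `dense_layer` are hypothetical values picked for illustration; they do not come from the paper.

```python
import math

def relu(x):
    """Rectifier: pass positive values through, clamp the rest to exactly zero."""
    return max(0.0, x)

def dense_layer(inputs, weights, activation):
    """Apply one fully connected layer (no bias) followed by `activation`."""
    return [activation(sum(w * x for w, x in zip(row, inputs))) for row in weights]

# Hypothetical input vector and weight matrix (4 units, 3 inputs).
inputs = [1.0, -2.0, 0.5]
weights = [
    [0.5, 0.25, -1.0],
    [-1.0, 0.5, 0.0],
    [0.2, -0.4, 0.6],
    [-0.3, -0.1, -0.8],
]

relu_out = dense_layer(inputs, weights, relu)
tanh_out = dense_layer(inputs, weights, math.tanh)

relu_zeros = sum(1 for v in relu_out if v == 0.0)
tanh_zeros = sum(1 for v in tanh_out if v == 0.0)
print(relu_zeros, tanh_zeros)
```

Units with a negative pre-activation are switched off exactly under the rectifier, whereas the hyperbolic tangent maps them to small negative values that still propagate through the network.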

Imagenet classification with deep convolutional neural networks

Conference Paper

Jan 2012

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

Conference Paper

Oct 2016

Extracting earthquake signals from continuous waveform data recorded by networks of seismic sensors is a critical and challenging task in seismology. Earthquakes occur infrequently in long-duration data and may produce weak signals, which are challenging to detect while limiting the number of false discoveries. Earthquake detection based on waveform similarity has demonstrated success in detecting weak signals from small events, but existing techniques either require prior knowledge of the event waveform or have poor scaling properties that limit use to small data sets. In this paper, we describe ongoing research into the use of similarity search for large-scale earthquake detection. We describe Fingerprint and Similarity Thresholding (FAST), a new earthquake detection method that leverages locality-sensitive hashing to enable waveform-similarity-based earthquake detection in long-duration continuous seismic data. We demonstrate the detection capability of FAST and compare different fingerprinting schemes by performing numerical experiments on test data, with an emphasis on false alarm reduction.
