IEEE TRANSACTIONS ON GEOSCIENCES AND REMOTE SENSING , VOL. XX, NO. X, NOV 2016 1
Automatic Recognition of Long Period Events from
Volcano Tectonic Earthquakes at Cotopaxi Volcano
Román Lara-Cueva, Member, IEEE, Diego S. Benítez, Senior Member, IEEE, Enrique V. Carrera, Member, IEEE,
Mario Ruiz, and José Luis Rojo-Álvarez, Senior Member, IEEE
Abstract
Geophysics experts are interested in understanding the behavior of volcanoes and forecasting possible eruptions by monitoring
and detecting increases in volcano-seismic activity, with the aim of safeguarding human lives and reducing material losses. This work
presents an automatic volcanic event detection and classification system, which considers feature extraction and feature selection
stages, in order to reduce the processing time towards a reliable real-time volcano early warning system (RT-VEWS). We built
the proposed approach in terms of the seismicity presented during 2009 and 2010 at Cotopaxi Volcano located in Ecuador. In
the detection stage, the recordings were time-segmented by using a non-overlapping 15-second window, and in the classification
stage, the detected seismic signals were 1 min long. For each detected signal conveying seismic events, a comprehensive set of
statistical, temporal, spectral, and scale-domain features was compiled and extracted, aiming to separate Long Period (LP) events
from Volcano Tectonic (VT) earthquakes. We benchmarked two commonly used types of feature selection techniques, namely,
wrapper (recursive feature elimination) and embedded (cross-validation and pruning). Each technique was used within a suitable
classification algorithm, either Support Vector Machine (SVM) or Decision Trees. The best result was obtained
by using the SVM classifier, yielding up to 99% accuracy in the detection stage, and 97% accuracy and sensitivity in the event
classification stage. Selected features and their interpretation were consistent among different input spaces in simple terms of
spectral content of the frequency bands at 3.1 and 6.8 Hz. A comparative analysis showed that the most relevant features for
automatic discrimination between LP and VT events were 1 in the time domain, 5 in the frequency domain, and 9 in the scale
domain. Our study provides the framework for an event classification system with high accuracy and reduced computational
requirements, according to the orientation towards a future RT-VEWS.
Index Terms
Volcanic Events, Seismic Event Detection, Seismic Event Classification, Support Vector Machines, Decision Trees, Feature
Extraction, Feature Selection.
I. INTRODUCTION
In the case of natural disasters or anthropogenic hazards, early warning systems are necessary in order to save human lives
[1], [2]. Volcanic eruptions are among the main natural hazards [3], and in this regard, several monitoring systems have been
Manuscript received April 7, 2015. RALC and EVC are with Departamento de Eléctrica y Electrónica, Universidad de las Fuerzas Armadas ESPE, Quito-Ecuador, 171-5-231B (see http://wicom.espe.edu.ec/contactos.html). DSB is with Colegio de Ciencias e Ingenierías El Politécnico, Universidad San Francisco de Quito, Quito, Ecuador. MR is with Instituto Geofísico, Escuela Politécnica Nacional, Quito, Ecuador. JLRA is with Department of Signal Theory and Communications, Rey Juan Carlos University, Fuenlabrada-Spain.
This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.
The final version of record is available at http://dx.doi.org/10.1109/TGRS.2016.2559440
Copyright (c) 2016 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.
previously developed and deployed around the world to better understand this phenomenon [4]. The seismic signatures
of volcanic events are registered today by powerful seismometers or geophones, and the main types of seismic signals on active
volcanoes are Volcano Tectonic (VT) earthquakes, Long Period (LP) events, Tremors (TRE), and Hybrid (HYB) events [5], [6].
Figures 2(c) and 2(d) show signal examples of VT and LP events originating at Cotopaxi Volcano. VT events are earthquakes
taking place in a volcanic environment, with a variable duration, usually less than 30 s, and wide spectral content, typically
above 5 Hz. In contrast, LP events, also known as low-frequency events, show a lack of distinguishable phases and a typical
duration below 90 s, with spectral content limited to narrow frequency bands between 2 and 5 Hz. HYB earthquakes share the
features of VT and LP events, and they are characterized by high-frequency signals, usually with a wide spectral bandwidth
above 10 Hz. Finally, TRE are characterized by their constant amplitude and long duration. They are the most distinctive signals
generated by volcanoes, and they are widely considered the most complex type of event, with durations ranging from a
few minutes to several days [7].
Our main interest is to help geophysicists to understand the behavior of volcanoes and to try to predict eruptions when
possible. For this purpose, we designed a volcano monitoring system that includes a high accuracy event detection followed by
a classification component. A Volcano Early Warning System (VEWS) based on our proposal may allow authorities to alert the
population, as soon as possible, about the risk of eventual disasters. In this setting, and as a first step to the future development
of a real-time VEWS (RT-VEWS), the aim of this work is to build two connected systems, namely, an automatic detection
system for volcanic seismicity, followed by a high-accuracy event classification system. Our hypothesis is that
the use of carefully designed detector and classifier systems, both of them based on appropriate feature extraction and machine
learning techniques, will reduce the processing time currently required by human experts for examination of the seismic signals.
Specifically, we addressed the design of an automatic system for first separating LP and VT events from Background Noise
(BN), and then identifying LP from VT with the highest possible accuracy. We are mainly interested in LP and VT events,
since these types of events are crucial for helping to forecast eruptions [8]–[10]. The classification of other types of events is
beyond the scope of this work, but it could be readily addressed with our proposed approach.
After a literature review, a general procedure was defined and designed in order to process, analyze, and identify volcano-seismic
signals. We used a preprocessing stage, which removes noise and signals of non-volcanic origin, such as thunder, by
using band-pass filters [11]. Then, a stage capable of detecting volcanic events is proposed [12], and it is followed by a
feature extraction stage [13] and a feature selection stage [14]. Following the state of the art of preceding works in geophysics, the use
of filter and embedded methods allowed us to determine the main features in the time, frequency, and scale domains. Finally,
a classification stage is used to identify VT and LP events [15]. In this setting, we benchmarked two commonly used feature
selection techniques, namely, wrapper and embedded, each of them within a suitable classification algorithm,
namely, Support Vector Machines (SVM) and Decision Trees (DT), as discussed later in detail.
There are two main contributions from this paper. First, the whole system (i.e., event detection and classification) is based on
machine learning techniques, and our proposed approach applies a two-stage solution, consisting of event detection followed
by event-classification. Previous works reported in the literature [16]–[19] did not apply any detection strategy based on
classification, but instead they just used sliding windows (with fixed or variable length) and a ratio triggering algorithm for
event detection. Therefore, our proposed event detector based on classification techniques is novel for this application. In
this setting, we also employ feature extraction and feature selection techniques in order to reduce the processing time towards
a reliable real-time system. As a second contribution, we focus our analysis on the data obtained from Cotopaxi Volcano.
Although there are several studies about seismological events at Cotopaxi Volcano, including lahar studies and analyses of
very-long-period and long-period event activity [20], [21], none of them focused on algorithms for automatic real-time detection and
classification of seismic events. To the best of our knowledge, this is the first attempt to create an automatic process for signals
from this volcano.
The rest of the paper is organized as follows. Section II summarizes previous works and results about the automatic
classification of seismic events by using machine learning techniques. Section III describes the proposed approach and the
experimental study including event detection, feature extraction, feature selection, and event classification. Section IV presents
the results obtained in the automatic classification process when considering different classifiers. Finally, Section V presents
the discussion and the main conclusions.
II. RELATED WORK
Our study concerns Cotopaxi, an active volcano which is part of the so-called Ring of Fire, located at latitude 0°41′05″
S and longitude 78°25′54.8″ W in the Andean mountain region of Ecuador. On this volcano, a monitoring system has been
previously deployed by the Instituto Geofísico de la Escuela Politécnica Nacional (IGEPN) (see Fig. 1), which currently has
installed: (a) six short period (SP) seismological stations (PITA, NAS2, VC1, REF, CAMI, and TAM), four of them with
vertical-axis sensors and two of them with three-axis sensors, and all of them with a frequency response range of 1–50 Hz; and (b)
six broadband (BB) stations (VC2, REF, NAS, TAM, MORU, and VCES), with a frequency response range of 0.1–50 Hz [22].
Currently, expert scientists must analyze seismograms of volcanic signals by visual inspection in order to label and classify
events, which is a subjective and labor-intensive process. This task also requires a significant amount of time
and considerable experience. A reliable automatic classification system can significantly reduce the effort required and make the
classification faster and more objective [16], [17].
In this context, a strategy to classify VT from other events of non-volcanic origin, such as quarry blasts (QB) and thunder,
was proposed for the Vesuvius Volcano (Italy) [18]. In that work, a feature extraction strategy based on Linear Predictive
Coding was developed, yielding a set of 60 parameters of temporal and spectral characteristics. By using Artificial Neural
Networks (ANN) as classifier, the system reached 100% accuracy in separating VT from QB. Similarly, in [23] a Hidden
Markov Model (HMM) classifier was applied to data from San Cristóbal Volcano (Nicaragua), in order to distinguish LP events from
two types of explosions and from BN in raw seismic data, yielding 80% accuracy. Another system was proposed in [24], in which
a fuzzy algorithm was used for automatic classification of local and regional earthquakes, and two types of events of non-volcanic
origin, namely, QB and machinery noise. Six main features were considered in the time and frequency domains, yielding a
96% classification accuracy.
A feature selection strategy was developed in [25] for Nevado del Ruiz Volcano (Colombia), considering VT, LP, and Ice
Quake. Two classifiers were considered, based on parametric Power Spectral Density (PSD) estimation of events. A Bayesian
Fig. 1. Seismological stations deployed at Cotopaxi Volcano.
classifier built on dissimilarity representations and a k-Nearest Neighbor (k-NN) classifier were used, obtaining accuracy rates
of 81% and 84%, respectively. In [26], the authors considered HB, LP, TRE, and VT events, and worked on the stochastic
variability of a wide set of time-variant features. With this approach, the classification rate improved from 78% to 88% when
k-NN was used instead of HMM. In [27], several feature sets proposed in [28]–[32] were benchmarked in order to discriminate
HYB, LP, TRE, and VT events, yielding the best accuracy of 83% when a k-NN classifier was used.
Meanwhile, in [28], a segmentation window of 30 s and a combination of ANN with Genetic Algorithms were applied
to data from Villarrica Volcano (Chile), obtaining a baseline recognition rate of 93% by using an input space of temporal
and spectral features when considering LP, TRE, and Energetic Tremor events. In [33], the seismic signals were segmented
with a rectangular window of 1 min, and a feature extraction strategy was applied by using circular statistics, obtaining the
instantaneous phase and the mean energy of the events with Hilbert and Wavelet transforms, in order to identify LP events
and VT earthquakes at Llaima Volcano (Chile). A linear discriminant classifier reached 92.5% accuracy. In this setting, an
SVM classifier obtained a baseline recognition accuracy of 80%, by using a set of features related to amplitude, frequency,
and phase of the seismic signals [19]. The segmentation window spanned 10% of the entire record, with 50% overlap,
so its length differed between recordings, owing to the variable lengths of the seismic events at Villarrica Volcano.
According to this review, previous works have demonstrated the possibilities of using machine learning techniques in this
setting. Nevertheless, the literature still lacks supportive evidence about which are the main design parameters to be considered
in each stage, from preprocessing to classification, in order to satisfy RT requirements.
III. PROPOSED METHODOLOGY
As already mentioned, our main interest in the medium term is to develop a VEWS that can achieve RT capabilities
for monitoring and decision making. In this work, we focus on developing an automatic classifier to distinguish LP and VT
events as a supporting tool that can be used by experts to quantify seismic activity and to raise alerts in emergency
situations. The proposed system consists of several stages:
1) Acquired signals are initially treated by a preprocessing subsystem.
2) Next, a subsystem previously proposed by our research group [34] is adapted here to detect the presence of an event
(of any type) against BN.
3) Then, an additional subsystem based on a machine learning approach is specifically devoted to classifying the
previously detected events into one of two target classes, namely VT and LP. In designing this subsystem, attention has
to be paid to the stages of feature extraction and feature selection.
These subsystems, together with their performance evaluation, are described in this section.
A. Preprocessing Subsystem
Data used in this work were provided by the IGEPN. Data consisted of N = 914 independent volcano-seismic recordings
(759 LP, 116 VT, 30 HYB, and 9 TRE) sampled at 100 Hz. These particular recordings were extracted, identified, and
labelled by experts at the IGEPN from continuous monitoring recordings of seismograms. Each available recording conveyed
a single volcanic event, and it was preprocessed with a 128th-order band-pass finite impulse response filter. The filter had a
passband between 1 and 15 Hz, given that undesired sea microseisms generate spectral power content at about 0.2 Hz [35]. In
addition, it is known that the main spectral content for LP events is expected to be in the frequency range of 2-to-4 Hz, and
for VT earthquakes in the 5-to-10 Hz bandwidth [36]. The resulting filtered record was then normalized in order to have a
zero-mean and unit-variance signal. Figure 2 shows an example of a recording segment including a VT event, both for the
raw recording as registered by the deployed sensors (a), and for its corresponding filtered and normalized signal (b).
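The preprocessing described above can be sketched as follows. This is a minimal illustration only: the text specifies a 128th-order band-pass FIR filter with a 1–15 Hz passband at a 100 Hz sampling rate, but not the design method, so the windowed-sinc design with a Hamming window is an assumption, and the function names `bandpass_fir`, `fir_filter`, and `normalize` are hypothetical.

```python
import math


def bandpass_fir(order=128, f_lo=1.0, f_hi=15.0, fs=100.0):
    """Windowed-sinc band-pass FIR taps (order + 1 coefficients).

    Assumption: Hamming window; the paper does not state the design method.
    """
    mid = order / 2
    taps = []
    for n in range(order + 1):
        k = n - mid
        if k == 0:
            ideal = 2.0 * (f_hi - f_lo) / fs
        else:
            # Difference of two ideal low-pass impulse responses.
            ideal = (math.sin(2 * math.pi * f_hi * k / fs)
                     - math.sin(2 * math.pi * f_lo * k / fs)) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / order)
        taps.append(ideal * window)
    return taps


def fir_filter(x, taps):
    """Causal FIR filtering: y[n] = sum_k taps[k] * x[n - k]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k < 0:
                break
            acc += h * x[n - k]
        y.append(acc)
    return y


def normalize(x):
    """Rescale a signal to zero mean and unit variance."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / std for v in x]
```

A recording `x` (a list of samples) would then be processed as `normalize(fir_filter(x, bandpass_fir()))`. The causal filter delays the signal by half the filter order (64 samples).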
B. Event Detection and Segmentation Subsystem
Volcanic events in the database need to be extracted from BN as a first stage. In a previous work, we developed an LP
event detector with high detection capabilities with respect to BN (see [34] for details). Given that our current system aims to
distinguish between LP and VT events, we adapted that method to provide an accurate event detector in the presence of the
LP, VT, HYB, and TRE events available in the database, before addressing the event classification.
Fig. 2. Examples for preprocessing and feature extraction. (a) Raw recorded waveform conveying a VT event. (b) The same signal after filtering and
normalizing. (c,d) Examples of VT and LP events after detection and segmentation. (e) Welch periodogram for the example events (LP in blue continuous,
VT in red discontinuous). (f) Wavelet decomposition at level 6 for the example events (LP in blue continuous, VT in red discontinuous).
For this purpose, a non-overlapping sliding window of length $w = 1500$ samples (15 s) was first applied for segmentation of
each signal. This produced a data matrix that was used to train a classifier for distinguishing between BN and the presence of
any event, according to the event labeling provided by the IGEPN experts. We used supervised learning, specifically with DT
as the machine learning technique. The DT classifier predicted a class $y$ for each segment, which was labeled with
$y = +1$ in the case of existing events of any kind, and $y = -1$ otherwise. The performance obtained with this event detection
stage is analyzed in the results section.
The sequence of ±1's generated by the detector for the sliding windows of the same recording was post-processed
for event delimitation. The average time position of consecutive detected segments was computed and denoted as $t_d$.
We decided to use a time window of 60 s, since this value is a good trade-off for conveying LP and VT event durations at
Cotopaxi Volcano [22]. Therefore, the starting ($t_s$) and ending ($t_e$) time-stamps of each event were obtained as $t_s = t_d - 30$
and $t_e = t_d + 30$, respectively. This post-processing stage provided us with a data matrix $\mathbf{T}$, given by:
$$\mathbf{T} = [\mathbf{t}_1^T, \mathbf{t}_2^T, \ldots, \mathbf{t}_i^T, \mathbf{t}_{i+1}^T, \ldots, \mathbf{t}_M^T]^T, \qquad (1)$$
where $M$ is the number of vectors, and $\mathbf{t}_i$ are the signal segments in each recording containing some event within a 60 s
window. We had available 350 LP and 116 VT events in this matrix, since initial recordings only have a single event per
record. To avoid bias in the classifier, we used balanced classes for training, including 116 segments of each class (hence
$M = 232$ signal vectors, and each vector $\mathbf{t}_i$ conveyed 6000 elements or time samples).
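The segmentation and delimitation steps can be illustrated with the short sketch below. It assumes that $t_d$ is the mean of the center times of the detected windows in the longest run of consecutive +1 labels, which is one possible reading of "average time position of consecutive detected segments". The function names `segment` and `delimit_event` are hypothetical, and the detector labels are taken as given.

```python
def segment(signal, w=1500):
    """Split a signal into non-overlapping windows of w samples (15 s at 100 Hz)."""
    return [signal[i:i + w] for i in range(0, len(signal) - w + 1, w)]


def delimit_event(labels, win_s=15.0, half_span_s=30.0):
    """Return (t_s, t_e) in seconds from a sequence of +1/-1 window labels.

    Assumption: t_d is the mean center time of the longest run of +1 windows.
    Returns None when no window is labeled +1.
    """
    best_start, best_len, start = None, 0, None
    # A trailing -1 sentinel closes a run that reaches the last window.
    for i, y in enumerate(list(labels) + [-1]):
        if y == 1:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_start is None:
        return None
    centers = [(j + 0.5) * win_s for j in range(best_start, best_start + best_len)]
    t_d = sum(centers) / len(centers)
    return t_d - half_span_s, t_d + half_span_s
```

For instance, if only the third and fourth windows of a recording are labeled +1, their centers are 37.5 s and 52.5 s, so $t_d = 45$ s and the 60 s event window spans 15 s to 75 s, that is, 6000 samples at 100 Hz.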
Figures 2(c) and 2(d) show two examples of waveforms conveying VT and LP events after detection and segmentation. In
these panels, black vertical lines denote the segments (or sliding windows) used for the event detector, whereas the red line
indicates the detection (i.e., +1) or not (i.e., −1) of an event in each window. Since the whole delimiting time window lasts
60 s, the VT event is fully included in the window (4 segments), whereas the LP event, although it has a longer duration, is still
mostly included in the window.
C. Design of Feature Extraction Stage
The VT–LP classification subsystem was also designed in a machine learning framework. The input vector to the classifier
was given either by the raw signal in vector $\mathbf{t}_i$, or by a set of features (measurements on specific properties of the signal)
which aimed to enhance the relevant information to be fed to the classifier. We scrutinized several feature extraction strategies,
in order to extract a set of relevant features from each row of data matrix $\mathbf{T}$, yielding a feature vector which can be denoted
by $\mathbf{x}_i = g(\mathbf{t}_i)$, where $g(\cdot)$ is the feature extraction operator to be defined for each feature extraction strategy. Two different
feature extraction strategies were benchmarked, and they are summarized next.
(a) Features from PSD. The PSD was calculated for each signal by using its Welch periodogram [37], yielding feature
vectors $\mathbf{x}_i^F = g_F(\mathbf{t}_i)$, where $g_F$ denotes the PSD computing operator. The Welch periodogram used a boxcar window with
50% overlap, unit amplitude, and a length of 512 samples. Each resulting feature vector had a number of features given by
the number of samples in the one-sided Fourier representation of the Welch periodogram, therefore yielding 257 features. Figure 2(e)
shows the PSD features for the example events in Figures 2(c) and 2(d).
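As an illustration of how 257 PSD features arise from 512-sample boxcar windows with 50% overlap, the following sketch averages one-sided periodograms over the overlapping segments. It is not the authors' implementation: the density scaling and the function name `welch_psd` are assumptions, and a naive DFT is used only to keep the example self-contained (an FFT routine would normally be preferred).

```python
import cmath
import math


def welch_psd(x, fs=100.0, nperseg=512, overlap=0.5):
    """One-sided Welch PSD estimate with a boxcar (rectangular) window.

    Returns (freqs, psd), each of length nperseg // 2 + 1 (257 for nperseg=512).
    Assumption: power-spectral-density scaling, |X|^2 / (fs * nperseg).
    """
    step = int(nperseg * (1.0 - overlap))
    n_bins = nperseg // 2 + 1
    # Precomputed twiddle factors exp(-2*pi*j*m / nperseg).
    tw = [cmath.exp(-2j * math.pi * m / nperseg) for m in range(nperseg)]
    psd = [0.0] * n_bins
    n_seg = 0
    for start in range(0, len(x) - nperseg + 1, step):
        seg = x[start:start + nperseg]
        for k in range(n_bins):
            X = sum(seg[n] * tw[(k * n) % nperseg] for n in range(nperseg))
            p = abs(X) ** 2 / (fs * nperseg)
            if 0 < k < nperseg // 2:
                p *= 2.0  # fold negative frequencies into the one-sided estimate
            psd[k] += p
        n_seg += 1
    psd = [p / n_seg for p in psd]
    freqs = [k * fs / nperseg for k in range(n_bins)]
    return freqs, psd
```

With `fs = 100` Hz, the frequency resolution is 100/512 ≈ 0.195 Hz, and a 6000-sample event yields 22 overlapping segments to average.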
(b) Features from Signal Parameters in Several Domains. Besides the spectral representation, many other approaches
have been proposed in the literature in order to analyze volcanic events with a number of signal parameters. In this setting, we
can define a parameter as a measurement of some signal quality, either in the time, frequency, or scale domain. In this case,
the number of parameters used as features was 80 (see [38] and references therein for details), and they were compiled as
follows:
• First, a total of 12 features were obtained in the time domain, namely: mean time ($\mu_t$), standard deviation ($\sigma_t$), signal
entropy ($En_t$), concentration of the signal around the mean value of the distribution ($kurtosis_t$), multi-scale entropy
($MSE_t$), time to reach the maximum peak ($t_{t,mp}$), difference between the maximum and minimum peaks ($P2P_t$), root
mean square value (RMS), difference between the maximum and minimum peaks in the RMS signal ($P2P_t^{RMS}$), signal
energy in RMS ($E_t^{RMS}$), zero crossing number ($Z_t$), and peaks density in the RMS signal.
• Second, a total of 19 features were obtained in the frequency domain by applying the Fast Fourier Transform (FFT) to
each signal in the data matrix. Thus, we have: the number of peaks over a threshold of 0.9 ($\psi_f$), the maximum detected
frequency ($max_f$), the mean frequency ($\mu_f$), the frequency standard deviation ($\sigma_f$), the spectral entropy ($En_f$), the spectral
energy ($E_f$), the spectral kurtosis ($kurtosis_f$), the spectral multi-scale entropy ($MSE_f$), the maximum peak value ($U$)
and maximum frequency in the frequency bands from 10 to 20 Hz ($max_f^{10-20}$) and from 20 to 30 Hz ($max_f^{20-30}$), the root
mean square value ($RMS_f$), the difference between the maximum and minimum peaks ($P2P_f$), the difference between
the maximum and minimum peaks in RMS ($P2P_f^{RMS}$), the signal energy in RMS ($E_f^{RMS}$), the peaks density in RMS,
the highest peak value in RMS ($H_f$), and the second ($H'_f$) and the third ($H''_f$) highest peaks in RMS. In all these features,
the subscript $f$ indicates their frequency domain dependence.
• Finally, a total of 49 features in the scale domain were extracted, where a Wavelet transform was applied to each signal
by using a 10th-order symlet mother wavelet. Note that the use of the Wavelet transform aimed to overcome the resolution
limitations exhibited by the Fourier transform [39]. Knowing the main frequency bands where events are usually present
(from 2 to 4 Hz for LP and from 5 to 10 Hz for VT), we decided to work with a decomposition ($\delta$) at level 6 of the
Wavelet transform, obtaining the corresponding approximation (cA) and detail (cD) Wavelet coefficients. Hence, we
retrieved 7 features: the mean value ($\mu_w$), the frequency of the maximum value ($f_{max,w}$), the energy percentage ($E_{\%w}^{\delta}$),
the difference between the maximum and minimum peaks ($P2P_w^{\delta}$), the maximum frequency ($V_w^{\delta}$), the root mean square
value ($RMS_w^{\delta}$), and the difference between the maximum and minimum peaks in RMS ($P2P_w^{\delta,RMS}$), for each of the coefficient sets
cA6, cD6, cD5, cD4, cD3, cD2, and cD1. Figures 2(e) and 2(f) allow comparing the Welch periodogram with the
level-6 Wavelet coefficients, respectively, for the preceding examples of VT and LP events. The subscript $w$ indicates their
scale domain dependence.
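A few of the time-domain parameters listed above can be computed as in the sketch below. It covers only a subset of the 12 time-domain features, using textbook definitions that may differ in detail from those in [38]. The function name `time_features` and the dictionary keys are hypothetical.

```python
import math


def time_features(x, fs=100.0):
    """Compute a subset of simple time-domain features of a signal segment.

    Assumption: standard definitions (population moments, sign-change count).
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    peak_index = max(range(n), key=lambda i: abs(x[i]))
    return {
        "mean": mean,
        "std": math.sqrt(var),
        # Fourth standardized moment (equals 3 for a Gaussian signal).
        "kurtosis": m4 / var ** 2 if var > 0 else float("nan"),
        "rms": math.sqrt(sum(v * v for v in x) / n),
        "p2p": max(x) - min(x),
        # Number of sign changes between consecutive samples.
        "zero_crossings": sum(1 for a, b in zip(x, x[1:]) if a * b < 0),
        # Time (in seconds) at which the largest absolute amplitude occurs.
        "time_to_peak": peak_index / fs,
    }
```

Applied to each 6000-sample row of the data matrix, such a function would contribute one entry per key to the corresponding feature vector.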
D. Classification Algorithms and Feature Selection Strategies
For the event classification stage, we also considered a supervised machine learning approach. Thus, we scrutinized the
suitability of two well-known classification algorithms, namely, DT [15], [40] and SVM [41].
DT is a non-parametric supervised learning method used for approximating discrete-valued target functions in classification
and regression. The aim of this method is to create a model which can predict the $c$ possible values of a target variable by
learning simple decision rules inferred from the data features. Each node in the tree specifies a rule for some attribute of the
instance, and each branch descending from that node corresponds to one of the possible values for that attribute. An instance is
classified by starting at the root node of the tree, testing the attribute specified by this node, and then moving down the branch
corresponding to the value of the attribute. This process is then repeated for the subtree rooted at the new node. The algorithm
promotes small trees instead of large ones, which yields classifiers with good generalization capabilities. The depth or leafiness
of the tree is the free parameter for this machine learning technique, which is measured in terms of the information actually
contained by the child nodes. Here, we used two indices for this purpose, namely, the average amount of information
contained in each event (as given by the entropy, $E(t)$), and the statistical dispersion and inequality (as given by the Gini
index, $G(t)$) [42], as follows:
$$E(t) = -\sum_{i=0}^{c-1} p(i|t) \log_2 p(i|t), \qquad (2)$$
$$G(t) = 1 - \sum_{i=0}^{c-1} [p(i|t)]^2, \qquad (3)$$
where $p(i|t)$ denotes the fraction of records belonging to output value $i$ at a given node $t$, and $c$ is the number of possible
values of the target variable.
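Equations (2) and (3) can be evaluated directly from the class labels reaching a node, as in this short sketch. The function names are hypothetical, and the convention $0 \log_2 0 = 0$ is applied by skipping empty classes.

```python
import math
from collections import Counter


def class_fractions(labels):
    """Fractions p(i|t) of each class among the records at a node."""
    counts = Counter(labels)
    n = len(labels)
    return [count / n for count in counts.values()]


def entropy(labels):
    """Node entropy E(t) in bits, as in Eq. (2)."""
    return -sum(p * math.log2(p) for p in class_fractions(labels) if p > 0)


def gini(labels):
    """Gini index G(t), as in Eq. (3)."""
    return 1.0 - sum(p * p for p in class_fractions(labels))
```

For a root node holding the balanced training set (116 LP and 116 VT events), both fractions are 0.5, which gives $E(t) = 1$ bit and $G(t) = 0.5$, the maximum impurity for two classes. A pure node gives zero for both indices.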
Conventional machine learning classifiers, such as Gaussian maximum likelihood or artificial neural networks, can be strongly
affected by the high dimensionality of input observation vectors, and they can tend to over-fit to the data in the presence of
noise, or to perform poorly with a low number of available training samples [43]. In the last few years, the use of SVMs
[44], [45] for machine learning practical applications has received wide attention because the method integrates in the same
classification procedure: (1) a simple way for dealing with nonlinear classification boundaries, as samples that are nonlinearly
separable in the input space are mapped to a higher dimensional space where a simpler (linear) classification is performed;
(2) an intrinsic regularization procedure, which controls with efficiency the model complexity; and (3) the minimization of an
upper bound of the generalization error, thus following the Structural Risk Minimization principle.
The ν-SVM algorithm for classification was used in this work. It is summarized next, and the interested reader
is referred to, e.g., [45] for details. Given a labeled training data set $\{\mathbf{x}_i, y_i\}_{i=1}^{n}$, where $\mathbf{x}_i \in \mathbb{R}^N$ and $y_i \in \{-1, +1\}$,
and given a nonlinear mapping $\phi(\cdot)$, the ν-SVM method solves:
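$$\min_{\mathbf{w}, \xi_i, b, \rho} \left\{ \frac{1}{2} \|\mathbf{w}\|^2 - \nu\rho + \frac{1}{n} \sum_{i=1}^{n} \xi_i \right\} \qquad (4)$$
subject to:
$$y_i \left( \langle \phi(\mathbf{x}_i), \mathbf{w} \rangle + b \right) \geq \rho - \xi_i \quad \forall i = 1, \ldots, n \qquad (5)$$
$$\rho \geq 0, \quad \xi_i \geq 0 \quad \forall i = 1, \ldots, n \qquad (6)$$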
where $\mathbf{w}$ and $b$ define a linear classifier in the feature space, and $\xi_i$ are positive slack variables that allow for errors. The
appropriate choice of nonlinear mapping $\phi$ makes the transformed samples more likely to be linearly separable
in the (higher dimensional) feature space. In this formulation, variable $\rho$ is controlled with coefficient $\nu$, which adds another
degree of freedom to the margin, the size of the margin increasing linearly with $\rho$. Therefore, the trade-off between the training
error and the generalization error is controlled in the ν-SVM formulation by adjusting $\nu$ in the range [0, 1], which acts as an
upper bound on the fraction of margin errors and as a lower bound on the fraction of support vectors.
Primal problem (4) is solved by using its dual problem counterpart, yielding $\mathbf{w} = \sum_{i=1}^{n} y_i \alpha_i \phi(\mathbf{x}_i)$ (see [45] for further
theoretical and algorithmic details), and the decision function for any test vector $\mathbf{x}_*$ is finally given by
$$f(\mathbf{x}_*) = \mathrm{sgn}\left( \sum_{i=1}^{n} y_i \alpha_i K(\mathbf{x}_i, \mathbf{x}_*) + b \right), \qquad (7)$$
where $\alpha_i$ are the Lagrange multipliers corresponding to the constraints in (4), the support vectors (SVs) being those training samples
$\mathbf{x}_i$ with non-zero Lagrange multipliers, $\alpha_i \neq 0$; and the bias term $b$ is calculated by using the unbounded Lagrange multipliers
as $b = \frac{1}{k} \sum_{i=1}^{k} \left( y_i - \langle \phi(\mathbf{x}_i), \mathbf{w} \rangle \right)$, where $k$ is the number of unbounded Lagrange multipliers ($0 < \alpha_i < C$). Note that a
particularity of SVM is that decision function f(x)is a function of a small subset of the training examples, which are the
support vectors. Those are the examples that are the closest to the decision boundary and lie on the margin, as well as those
wrong-class examples. The existence of such support vectors is at the origin of the computational properties of SVM and their
competitive classification performance. The interested reader can see [46] for more details about the algorithm with linear and
non-linear SVM.
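Another key point is the use of Mercer kernels, $K(\mathbf{x}_i, \mathbf{x}_*) = \langle \boldsymbol{\phi}(\mathbf{x}_i), \boldsymbol{\phi}(\mathbf{x}_*) \rangle$, to handle the nonlinear algorithm implementations. In this work we use two well-known Mercer kernels: the linear kernel, $K(\mathbf{x},\mathbf{z}) = \langle \mathbf{x},\mathbf{z} \rangle$, and the Gaussian kernel, $K(\mathbf{x},\mathbf{z}) = \exp\left(-\frac{\|\mathbf{x}-\mathbf{z}\|^2}{2\sigma^2}\right)$, where $\sigma$ is the width free parameter, which is tuned together with the free parameter $\nu$ during the training and validation stage.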
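As an illustration of how the two kernels enter the decision function (7), the following minimal Python sketch evaluates $f(\mathbf{x}_*)$ for a given set of support vectors. It is not the implementation used in this work (the experiments were run in Matlab); the function names and the support vectors, labels, multipliers, and bias are hypothetical values chosen only for the example, and sgn(0) is mapped to +1 by convention.

```python
import math


def linear_kernel(x, z):
    """Linear kernel K(x, z) = <x, z>."""
    return sum(a * b for a, b in zip(x, z))


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decide(x_new, support_vectors, labels, alphas, b, kernel):
    """Evaluate f(x*) = sgn(sum_i y_i alpha_i K(x_i, x*) + b)."""
    score = sum(y * a * kernel(sv, x_new)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if score + b >= 0 else -1  # sgn(0) mapped to +1


# Hypothetical support vectors, labels, and multipliers (not fitted values).
svs = [(1.0, 1.0), (-1.0, -1.0)]
ys = [+1, -1]
alphas = [0.5, 0.5]
bias = 0.0

label_linear = decide((2.0, 0.5), svs, ys, alphas, bias, linear_kernel)
label_gauss = decide((-0.5, -2.0), svs, ys, alphas, bias,
                     lambda x, z: gaussian_kernel(x, z, sigma=1.0))
print(label_linear, label_gauss)
```

Only the kernel changes between the linear and the non-linear classifier; the decision rule itself is the same sum over support vectors.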
We used a feature selection strategy in order to identify the $p$ most relevant features, with the aim of reducing the processing time $t_p$ while preserving or improving the overall performance. We worked with Recursive Feature Elimination (RFE), a well-known wrapper method that ranks the features by backward feature elimination [47], [48], removing the least relevant ones step by step in order to find the combination that maximizes model performance. RFE-SVM is a weight-based method in which, at each step, the coefficients of the SVM weight vector are used as the feature ranking criterion.
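The weight-based ranking can be sketched in a few lines of Python. This is an illustrative sketch only, under stated assumptions: the linear classifier is trained with a simple full-batch subgradient descent on the regularized hinge loss, used here as a stand-in for a proper SVM solver, and the function names, hyperparameters, and toy data are hypothetical. At each step the feature with the smallest squared weight is removed and the classifier is retrained on the remaining features.

```python
def train_linear(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def rfe(X, y, n_keep=1):
    """Backward elimination: repeatedly drop the feature with smallest w_j^2."""
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        X_active = [[row[j] for j in active] for row in X]
        w, _ = train_linear(X_active, y)
        worst = min(range(len(active)), key=lambda j: w[j] ** 2)
        eliminated.append(active.pop(worst))
    return active, eliminated


# Toy data (hypothetical): feature 0 separates the classes, 1 and 2 are noise.
X = [[2.0, 0.1, -0.2], [1.5, -0.1, 0.1], [2.5, 0.2, 0.0],
     [-2.0, 0.1, 0.1], [-1.5, -0.2, -0.1], [-2.5, 0.0, 0.2]]
y = [1, 1, 1, -1, -1, -1]

selected, eliminated = rfe(X, y, n_keep=1)
print(selected, eliminated)
```

Because the ranking criterion is $w_j^2$, the features should be on comparable scales before ranking; otherwise the weights reflect the units of each feature rather than its relevance.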
Free parameters were tuned for both classification techniques by following the usual cross-validation process. For DT, the tree depth was determined, while for SVM, the free parameter $\nu$ was tuned to control the number of support vectors.
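The tuning loop can be sketched generically as follows. This is a minimal sketch, not the procedure used in this work: the folds are contiguous and unshuffled, and the model is a hypothetical one-dimensional threshold classifier whose free parameter is an offset added to the midpoint between the class means, used here as a simple stand-in for $\nu$ or the tree depth. All names and data are illustrative assumptions.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous, nearly equal test folds."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def select_by_cv(candidates, fit, error, X, y, k=3):
    """Return the candidate with the lowest mean validation error."""
    folds = k_fold_indices(len(X), k)
    best, best_err = None, float("inf")
    for c in candidates:
        fold_errs = []
        for test_idx in folds:
            held_out = set(test_idx)
            x_tr = [X[i] for i in range(len(X)) if i not in held_out]
            y_tr = [y[i] for i in range(len(X)) if i not in held_out]
            model = fit(x_tr, y_tr, c)
            x_te = [X[i] for i in test_idx]
            y_te = [y[i] for i in test_idx]
            fold_errs.append(error(model, x_te, y_te))
        mean_err = sum(fold_errs) / k
        if mean_err < best_err:
            best, best_err = c, mean_err
    return best, best_err


def fit_threshold(x_train, y_train, offset):
    """Hypothetical model: midpoint between class means plus an offset."""
    pos = [x for x, t in zip(x_train, y_train) if t > 0]
    neg = [x for x, t in zip(x_train, y_train) if t < 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0 + offset


def error_rate(threshold, x_test, y_test):
    """Fraction of test samples misclassified by the threshold rule."""
    wrong = sum(1 for x, t in zip(x_test, y_test)
                if (1 if x > threshold else -1) != t)
    return wrong / len(x_test)


X = [-3.0, 3.0, -2.0, 2.0, -1.0, 1.0]   # toy one-dimensional feature
y = [-1, 1, -1, 1, -1, 1]
best_offset, best_error = select_by_cv([-2.0, 0.0, 2.0],
                                       fit_threshold, error_rate, X, y, k=3)
print(best_offset, best_error)
```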
E. Performance
Both the detection and the classification performance were measured in terms of Accuracy (A), Precision (P), Sensitivity or Recall (R), Specificity (S), and Balanced Error Rate (BER), which are defined as follows:
$$A(\%) = \frac{N_C}{N_T} \times 100, \tag{8}$$
$$P(\%) = \frac{N_{TP}}{N_{TP} + N_{FP}} \times 100, \tag{9}$$
$$R(\%) = \frac{N_{TP}}{N_{TP} + N_{FN}} \times 100, \tag{10}$$
$$S(\%) = \frac{N_{TN}}{N_{TN} + N_{FP}} \times 100, \tag{11}$$
$$BER = 1 - \frac{R + S}{2 \times 100}, \tag{12}$$
where $N_C$ is the number of correctly classified events, $N_T$ is the total number of events used to feed the classifier, $N_{TP}$ is the number of true positives, $N_{FN}$ is the number of false negatives, $N_{TN}$ is the number of true negatives, and $N_{FP}$ is the number of false positives. We calculated these performance measures using training and testing folds. The time consumption of the process in the system, denoted by $t_p$, was also considered.
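For reference, the measures (8) to (12) can be computed from the four confusion-matrix counts as in the following Python sketch. The function name and the counts in the example are hypothetical (they are not results from this work), and the sketch assumes $N_C = N_{TP} + N_{TN}$ and $N_T = N_{TP} + N_{FP} + N_{TN} + N_{FN}$.

```python
def performance(n_tp, n_fp, n_tn, n_fn):
    """Return A, P, R, S (in percent) and BER (as a fraction)."""
    n_t = n_tp + n_fp + n_tn + n_fn           # total number of events
    n_c = n_tp + n_tn                         # correctly classified events
    a = 100.0 * n_c / n_t                     # accuracy, Eq. (8)
    p = 100.0 * n_tp / (n_tp + n_fp)          # precision, Eq. (9)
    r = 100.0 * n_tp / (n_tp + n_fn)          # recall, Eq. (10)
    s = 100.0 * n_tn / (n_tn + n_fp)          # specificity, Eq. (11)
    ber = 1.0 - (r + s) / (2.0 * 100.0)       # balanced error rate, Eq. (12)
    return {"A": a, "P": p, "R": r, "S": s, "BER": ber}


# Hypothetical counts for a balanced set of 116 events (58 per class).
metrics = performance(n_tp=50, n_fp=4, n_tn=54, n_fn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that BER is expressed as a fraction in [0, 1], whereas the other four measures are percentages, which matches the ranges reported in the tables.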
IV. RESULTS
This section presents the results obtained by applying the proposed methodology to our dataset from Cotopaxi Volcano. The entire dataset was divided into training and testing sets, and the independence among all the records in each of these sets was ensured. The experiments were carried out in Matlab™, on a Core i5 PC at 3.1 GHz with 4 GB of RAM.
The detection stage showed a correct event detection rate of 99% for discriminating LP and VT events from BN. Figure 3 shows examples of four detected events. As described, a time duration of 1 min was set for each analysis window, $w_d$, since this is the mean duration of this kind of event at Cotopaxi Volcano. Figures 3(c) and 3(d) show that each excerpt contained the VT earthquakes entirely, since VT earthquakes last less than 1 min.
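The time segmentation can be sketched in Python as follows: the record is split into non-overlapping 15 s windows for detection, and a 1 min excerpt is cut from the window where an event is detected. This is an illustrative sketch only; the function names, the sampling rate, and the placeholder record are assumptions, and the detector that decides which window contains an event is not shown.

```python
def segment(signal, fs, window_s=15.0):
    """Split a signal into non-overlapping windows of window_s seconds.

    Trailing samples that do not fill a whole window are dropped.
    """
    size = int(round(fs * window_s))
    n_windows = len(signal) // size
    return [signal[i * size:(i + 1) * size] for i in range(n_windows)]


def excerpt(signal, fs, window_index, window_s=15.0, duration_s=60.0):
    """Cut a duration_s excerpt starting at the given detection window."""
    start = window_index * int(round(fs * window_s))
    return signal[start:start + int(round(fs * duration_s))]


fs = 2.0                          # hypothetical sampling rate (Hz)
record = [0.0] * 400              # 200 s of placeholder samples
windows = segment(record, fs)     # 15 s -> 30 samples per window
event = excerpt(record, fs, window_index=3)
print(len(windows), len(windows[0]), len(event))
```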
A. Results Using DT
In the frequency domain, the DT algorithm obtained the model by using a training set containing $N$=116 instances (58 each for LP and VT), which made it possible to induce a top-down tree, as depicted in Figure 4. The DT algorithm selected 3 key features $X_i$, where $i$ is the index of the selected feature, each corresponding to the amplitude of the PSD at a given frequency. Beginning from the top node, the rules were $X_{16} \geq -0.42$ (amplitude value at 3 Hz), $X_{21} \geq -0.58$ (amplitude value at 4 Hz), and $X_{36} \geq 0.47$ (amplitude value at 6.8 Hz), which made it possible to classify into any 1 of the 6 possible leaves. Figures 4(d) to 4(f) show a mesh representation of 10 segments, where segments 1 to 5 (6 to 10) correspond to examples of LP (VT) events. We can observe that the DT algorithm retrieved about 3 effective features in each segment in order to predict the outcome.
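The correspondence between a feature index and its frequency can be checked with a short Python sketch. It assumes a sampling rate of 100 Hz and a 512-point FFT, which gives 257 one-sided PSD bins; these two values are assumptions, chosen because they reproduce the frequencies quoted above, and are not stated in this section. The function name is hypothetical.

```python
def feature_frequency(index, fs=100.0, n_fft=512):
    """Frequency (Hz) of the 1-based PSD feature X_index.

    Assumes a one-sided PSD with n_fft // 2 + 1 bins spaced fs / n_fft apart.
    """
    return (index - 1) * fs / n_fft


n_bins = 512 // 2 + 1   # 257 bins under the assumed FFT length
for i in (16, 21, 36):
    print(f"X{i}: {feature_frequency(i):.2f} Hz")
```

Under these assumptions, $X_{16}$, $X_{21}$, and $X_{36}$ fall at about 2.93, 3.91, and 6.84 Hz, in agreement with the rounded values of 3, 4, and 6.8 Hz.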
Cross-validation and pruning methods were performed to control leafiness. Cross-validation selected 2 key features, beginning from the top node, with rules $X_{16} \geq -0.43$ and $X_{36} \geq 0.37$, for classifying into 1 of the 3 possible leaves, as depicted in
[Figure 3: four panels (a)-(d); horizontal axis: Time (s), from 0 to 1200; vertical axis: Normalized Amplitude, from −1 to 1.]
Fig. 3. Event detection representation with a time duration of 1 min for each excerpt (red), and with segmentation window w = 15 s (black), for LP events (a-b) and VT earthquakes (c-d).
[Figure 4: decision trees with LP/VT leaves, panels (a)-(c), and mesh plots, panels (d)-(f). Node rules shown in the panels: (a) x16 >= −0.42, x21 >= −0.58, x36 >= 0.47, x36 >= −0.01, x16 >= 0.51; (b) x16 >= −0.43, x36 >= 0.37; (c) x16 >= −0.43, x21 >= −0.58, x36 >= 0.47, x36 >= −0.01.]
Fig. 4. Tree representation considering frequency features, for cross-validation and pruning feature selection methods in DT. Segments from 1 to 5 (from 6 to 10) correspond to examples of LP (VT) events: (a) main features in the tree, and (d) features verified in each signal; (b) cross-validation and (c) pruning feature selection methods, with main features in the tree; (d), (e), (f) features retrieved in each signal.
Figure 4(b). Pruning selected 3 key features: the top node was kept with the same rule, and 2 more rules were defined, given by $X_{21} \geq -0.58$ and $X_{36} \geq 0.47$, which correspond to 4 Hz and 6.8 Hz, respectively, allowing classification into 1 of the 5 possible leaves, as depicted in Figure 4(c). We observed in Figures 4(d), 4(e), and 4(f) that VT earthquakes have their main spectral content above 4 Hz, which was retrieved by the DT algorithm at 4 Hz ($X_{21}$) and 6.8 Hz ($X_{36}$), whilst LP events present
[Figure 5: decision trees with LP/VT leaves, panels (a)-(c). Node rules shown in the panels: (a) x27 >= 0.3, x67 >= −0.6, x25 >= −0.1, x31 >= −0.5, x57 >= −0.1, x57 >= 0.7, x5 >= −0.4, x1 >= −0.04, x51 >= −3.7, x1 >= −0.3, x55 >= −0.8, x66 >= −1.4, x1 >= −0.2, x2 >= −1.2, x10 >= −0.5, x48 >= −0.4, x45 >= 1.8, x27 >= 1, x20 >= −0.01; (b) x27 >= 0.3; (c) x27 >= 0.3, x67 >= −0.6, x31 >= −0.45, x57 >= −0.1, x1 >= −0.04, x1 >= −0.27.]
Fig. 5. Tree representation considering parameter features, for cross-validation and pruning feature selection methods: (a) main features in the tree, (b) cross-validation and (c) pruning feature selection methods.
TABLE I
EXPERIMENTAL PERFORMANCE RESULTS FOR CLASSIFICATION WHEN APPLYING THE FEATURE SELECTION STAGE WITH THE DT CLASSIFIER.
Matrix                   N. Features  Method      A(%)   P(%)   R(%)   S(%)   BER        tp(s)
Frequency / parameters   4/15         Default     90/94  96/90  83/98  96/89  0.1/0.06   1.5/1
Frequency / parameters   2/1          Cross-Val.  83/84  78/78  92/91  74/78  0.17/0.16  0.7/0.6
Frequency / parameters   3/5          Pruning     90/94  96/90  83/98  96/89  0.1/0.06   1/0.7
their main spectral component at 3 Hz ($X_{16}$), with negligible values above 6 Hz.
The DT algorithm retrieved 15 key features by considering the parameter features, beginning from the top node, with rule $X_{27} \geq 0.3$, making it possible to classify into any 1 of the 20 possible leaves, as depicted in Figure 5(a). By using cross-validation, the DT algorithm selected one key feature, the top node with rule $X_{27} \geq 0.3$, which classified into one of the two possible leaves, as depicted in Figure 5(b), whilst pruning selected 5 key features corresponding to $E^{RMS}_f$ ($X_{27}$), $P2P^{\delta_5}_w$ ($X_{67}$), $E^{\delta_3}_{\%w}$ ($X_{57}$), $H''_p$ ($X_{31}$), and $\mu_t$ ($X_1$). The top node was kept with the same rule and 5 more rules were defined, which allowed classification into 1 of the 6 possible leaves, as depicted in Figure 5(c).
Table I shows the results obtained with the testing set containing $N$=116 independent cases. The pruning method identified 5 key features, which made it possible to reach the best system performance. The main differences with respect to the original features were in terms of the BER and $t_p$.
B. Results Using Linear and Non-linear SVM Classifiers
SVM is often sensitive to class imbalance, and for this reason we first determined the training percentage for our database that maximized the system performance. These values were set to 35% and 50% for linear and non-linear SVM, respectively. Then we set the free parameter $\nu$ to 0.05 for the linear SVM classifier, and to 0.1 and 0.15 for the non-linear SVM classifier, with frequency and parameter
TABLE II
EXPERIMENTAL RESULTS FOR CLASSIFICATION BY USING LINEAR AND NON-LINEAR¹ SVM CLASSIFIERS.
Matrix                   N. Features  Method    A(%)   P(%)   R(%)   S(%)   BER        tp(s)
Frequency / parameters   257/80       Default   92/97  93/99  91/96  93/99  0.08/0.03  0.23/0.08
Frequency / parameters   4/15         RFE       93/97  94/97  92/96  94/98  0.07/0.03  0.06/0.06
Frequency / parameters   257/80       Default¹  84/97  86/96  81/98  87/96  0.16/0.03  7/3.9
Frequency / parameters   15/15        RFE¹      93/93  96/93  91/94  96/93  0.06/0.06  1.9/1.9
TABLE III
EXPERIMENTAL RESULTS FOR CLASSIFICATION BY USING THE LINEAR SVM CLASSIFIER, BY CONSIDERING INDEPENDENTLY FEATURES IN THE TIME (T), FREQUENCY (F), AND SCALE (W) DOMAINS.
Domain  N. Features  A(%)  P(%)  R(%)  S(%)  BER   tp(s)
T       1            90    94    85    94    0.10  0.04
F       5            97    97    96    97    0.03  0.05
W       9            93    92    95    92    0.07  0.05
features. Parameter $\gamma$ was set to 8.86e-5 and 4.00e-3, respectively, through an iterative process aimed at maximizing the system performance. Finally, we selected the main features by using RFE. The results are summarized in Table II. For both feature sets, and for both linear and non-linear SVM, the use of RFE improved the performance in terms of $t_p$. RFE selected 4 key features for frequency features and 15 for parameter features in the case of linear SVM, whilst for non-linear SVM, 15 key features were selected for both feature sets. We obtained the best results with linear SVM and the RFE method, which identified 15 key features among the parameter features; the main difference with respect to the other methods was in terms of $t_p$ and BER.
Considering frequency features with linear SVM, the algorithm recognized the relevance of the frequency bands corresponding to 3.11, 3.3, 3.5, and 7 Hz, as depicted in Figure 6(a). Meanwhile, non-linear SVM identified 15 bands, corresponding to 1.17, 2.5, 3, 3.11, 3.3, 3.5, 4.5, 6.8, 7, 7.2, 7.4, 22.3, 36.2, 39, and 41.6 Hz. We observed that not only low frequencies but also high-frequency bands were identified, as shown in Figure 6(b). In particular, Figure 6(c) shows that non-linear SVM considers frequencies above 35 Hz. In general, linear and non-linear SVM were able to distinguish LP from VT in simple terms of amplitude and spectral content.
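By using parameter features and the RFE feature selection method, linear SVM retrieved 15 key features: $Z_t$ ($X_{11}$), $\psi_f$ ($X_{13}$), $MSE_f$ ($X_{20}$), $U_{10-20}$ ($X_{21}$), $RMS_f$ ($X_{24}$), $E^{RMS}_f$ ($X_{27}$), $V^{\delta_5}_w$ ($X_{39}$), $\mu^{\delta_5}_w$ ($X_{41}$), $\mu^{\delta_1}_w$ ($X_{53}$), $E^{\delta_5}_{\%w}$ ($X_{56}$), $E^{\delta_2}_{\%w}$ ($X_{59}$), $P2P^{\delta_5}_w$ ($X_{68}$), $P2P^{\delta_2}_w$ ($X_{76}$), $P2P^{\delta_2,RMS}_w$ ($X_{77}$), $P2P^{\delta_1}_w$ ($X_{80}$), and non-linear SVM also retrieved 15 key features: $Z_t$ ($X_{11}$), $\psi_f$ ($X_{13}$), $E^{RMS}_f$ ($X_{19}$), $MSE_f$ ($X_{20}$), $RMS_f$ ($X_{24}$), $E^{RMS}_f$ ($X_{27}$), $V^{\delta_5}_w$ ($X_{39}$), $P2P^{\delta_6}_w$ ($X_{42}$), $RMS^{\delta_5}_w$ ($X_{51}$), $\mu^{\delta_1}_w$ ($X_{53}$), $P2P^{\delta_5}_w$ ($X_{68}$), $P2P^{\delta_2}_w$ ($X_{76}$), $P2P^{\delta_2,RMS}_w$ ($X_{77}$), $V^{\delta_5}_w$ ($X_{78}$), $RMS^{\delta_5}_w$ ($X_{79}$), $P2P^{\delta_1}_w$ ($X_{80}$). We identified 11 matching features, which corresponded to $Z_t$, $\psi_f$, $MSE_f$, $RMS_f$, $E^{RMS}_f$, $V^{\delta_5}_w$, $\mu^{\delta_1}_w$, $P2P^{\delta_5}_w$, $P2P^{\delta_2}_w$, $P2P^{\delta_2,RMS}_w$, and $P2P^{\delta_1}_w$.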
In the feature selection block, the RFE method improved all metrics by at least 1% when considering frequency features and using 4 of the 257 features. Moreover, the pruning method gave similar results with the DT classifier considering both matrices, whilst cross-validation gave results at least 10% worse in each metric. This strategy reduced $t_p$ from 1500 ms to 700 ms. Linear SVM-RFE identified 15 main features: 1 in the time domain T ($Z_t$), 5 in the frequency domain F ($\psi_f$, $MSE_f$, $U_{10-20}$, $RMS_f$, $E^{RMS}_f$), and 9 in the scale domain W ($V^{\delta_5}_w$, $\mu^{\delta_5}_w$, $\mu^{\delta_1}_w$, $E^{\delta_5}_{\%w}$, $E^{\delta_2}_{\%w}$, $P2P^{\delta_5}_w$, $P2P^{\delta_2}_w$, $P2P^{\delta_2,RMS}_w$,
[Figure 6: panels (a), (b), and (c).]
Fig. 6. Tree representation considering frequency features, RFE feature selection method: features retrieved in each $g_i$ for linear (a), non-linear (b), and zoomed non-linear SVM above 35 Hz (c).
$P2P^{\delta_1}_w$). Table III shows the results with each set tested independently, where the best results were obtained with 5 features in the frequency domain, reaching similar performance values while reducing $t_p$ to 50 ms (0.05 s in Table III). Additionally, we identified 2 bands in the scale domain, corresponding to detail level 2, $f_{s_2} \in (0, 25)$ Hz, and detail level 5, $f_{s_5} \in (0, 3.13)$ Hz, where just a few parameters allowed us to discriminate VT from LP events with acceptable accuracy. Figure 7 shows possible feature combinations, by considering the 3 most relevant features in each classifier. We observed in Figure 7(e), for example, two well-defined clusters, whose linear separability makes it possible to distinguish LP from VT. This was achieved with the linear SVM classifier and $U_{10-20}$, $RMS_f$, and $E^{RMS}_f$, which are in the frequency domain.
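The upper limits of the two scale-domain bands follow from a dyadic decomposition, as the short Python sketch below illustrates. It assumes a sampling rate of 100 Hz, which is consistent with the quoted limits but is not stated in this section, and the function name is hypothetical. In a conventional dyadic wavelet decomposition, detail level $j$ would nominally span from half of this upper limit up to the limit itself; that lower edge is an assumption about the decomposition and is not taken from the text, which quotes the bands as starting at 0 Hz.

```python
def detail_upper_edge(level, fs=100.0):
    """Upper frequency limit (Hz) of wavelet detail level `level`.

    Assumes a dyadic decomposition, where each level halves the band.
    """
    return fs / 2 ** level


for level in (2, 5):
    print(f"detail level {level}: up to {detail_upper_edge(level):.3f} Hz")
```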
V. DISCUSSION AND CONCLUSIONS
LP and VT events have proven to be key for monitoring volcanic activity, including that of Cotopaxi, since they provide important information about the internal state of the volcano [8]–[10]. With appropriate RT constraints for monitoring, detecting, and classifying volcanic signals, a VEWS may be used as a tool for detecting potential abnormal increments in seismic activity. Such information, in combination with other parameters such as the external activity of the volcano, may be used by the local authorities to issue an effective early warning to the population.
Our proposed approach applies a two-stage solution, consisting of event detection followed by event classification. Previous works did not apply any detection strategy based on classification, but instead used a rectangular sliding window of fixed or variable size [19], [28], [33]. In our setting, the use of the detection stage has simplified the learning problem of distinguishing LP, VT, and events of non-volcanic origin. In fact, the first stage separates seismic events from BN with high accuracy, therefore reducing the computational requirements of the system. The detection accuracy was near 99%. In our study, the proposed event detector showed superior performance in relation to previous studies; however, a further study with standard datasets should
[Figure 7: six 3-D scatter plots. Axes per panel: (a) X14, X19, X34; (b) X14, X16, X34; (c) X12, X16, X35; (d) X27, X39, X68; (e) X21, X24, X27; (f) X24, X27, X76.]
Fig. 7. Scatter plots of the selected main features by considering: G matrix (a-b-c) and H matrix (d-e-f), using DT classifier (a-d), linear SVM (b-e), and non-linear SVM (c-f), respectively. Red dots represent VT events while blue dots represent LP events.
be conducted to allow a quantitative comparison.
In the classification stage, the DT algorithm showed an accuracy close to 90% by considering the frequency features, while a linear SVM reached 97% considering the parameter features. These values are comparable to the best results obtained in previous studies that applied different methodologies to the classification problem, such as Hilbert and Wavelet transforms [33], or Bayesian and k-NN classifiers [25].
Additionally, our experiments have shown that the best results were obtained with the parameter features, where a set of 80 features in the time, frequency, and scale domains was considered. We took into account well-known features, some of them provided by experts of IGEPN, who had obtained them through heuristic and observation processes. An error rate of 3% was obtained using the 15 main features retrieved by RFE and a linear SVM classifier. The main difference between linear and non-linear SVM was in $t_p$, equal to 60 ms and 1900 ms, respectively. In the frequency domain, the linear SVM classifier was able to reach a performance similar to the one obtained when using the features optimized with RFE, but by considering only 5 main features ($\psi_f$, $MSE_f$, $U_{10-20}$, $RMS_f$, $E^{RMS}_f$). The proposed strategy has been developed for use in RT analysis, since it reduces the processing time to values lower than approximately 1 min.
Although the experimental results show the feasibility and reliability of the proposed system, specific aspects of the detection and classification stages still have to be considered in order to generalize the system to identify all volcano-seismic signals. As future work, we are interested in classifying LP, VT, HYB, TRE, and signals of non-volcanic origin such as thunder, by using linear or non-linear multi-class SVM. It should be possible to improve the detection stage by using well-known automatic detection algorithms applied in speech recognition. These algorithms could determine the starting and ending time stamps of an event; hence, a better analysis could be performed over the entire duration of the volcanic event, while maintaining or improving the
accuracy rate.
Regarding feature selection, we have addressed the problem in this paper from the machine learning theory point of view, and most of the main features found by the algorithms were consistent with common features used in other works for event classification at other volcanoes [24], [28], [30]. Although the volcanological significance of the most important features related to Cotopaxi Volcano is still unclear at this moment, and requires further investigation, feature selection techniques were used in this paper to simplify the classification problem. We are planning to address the problem of volcano-specific characteristic significance in future studies. Furthermore, analyzing and combining data from multiple stations may help us to improve the performance of the system; this will also be addressed in future work. Finally, benchmarking our algorithm with seismological data acquired with similar equipment at other volcanoes is among the next steps.
ACKNOWLEDGMENT
The authors gratefully acknowledge the contribution of Universidad de las Fuerzas Armadas ESPE for the financial support in the development of this project through Research Grants 2013-PIT-014 and 2015-PIC-004. This work has also been partly supported by Research Grants PRINCIPIAS (TEC2013-48439-C4-1-R) from the Spanish Government and PRICAM (S2013/ICE-2933) from Comunidad de Madrid. Finally, we want to thank the IGEPN for providing us with the dataset used in this work.
REFERENCES
[1] G. Berz, W. Kron, T. Loster, E. Rauch, J. Schimetschek, J. Schmieder, A. Siebert, A. Smolka, and A. Wirtz, “World map of natural hazards–a global
view of the distribution and intensity of significant exposures,” Natural hazards, vol. 23, no. 2-3, pp. 443–465, 2001.
[2] K. Hewitt, Regions of risk: A geographical introduction to disasters. Routledge, 2014.
[3] S. Makowski Giannoni, R. Rollenbeck, K. Trachte, and J. Bendix, “Natural or anthropogenic? On the origin of atmospheric sulfate deposition in the Andes of southeastern Ecuador,” Atmospheric Chemistry and Physics, vol. 14, no. 20, pp. 11297–11312, 2014.
[4] F. A. Pfaffl and W.-C. Dullo, “The first ascent to the Volcano Cotopaxi in Ecuador by Wilhelm Reiss (1838–1908),” International Journal of Earth
Sciences, vol. 103, no. 4, pp. 1175–1179, 2014.
[5] S. R. McNutt, “Volcanic seismology,” Annu. Rev. Earth planet. Sci., vol. 32, pp. 461–491, 2005.
[6] H. Langer, S. Falsaperla, T. Powell, and G. Thompson, “Automatic classification and a-posteriori analysis of seismic event identification at Soufriere
Hills Volcano, Montserrat,” Journal of Volcanology and Geothermal Research, vol. 153, no. 1, pp. 1–10, 2006.
[7] J. Ibáñez and E. Carmona, “Sismicidad volcánica,” Curso Internacional de Volcanología y Geofísica Volcánica, Serie Casa de los Volcanes, no. 7, pp. 269–282, 2000.
[8] B. A. Chouet, “Long-period volcano seismicity: its source and use in eruption forecasting,” Nature, vol. 380, no. 6572, pp. 309–316, 1996.
[9] C. J. Bean, L. De Barros, I. Lokmer, J.-P. Métaxian, G. O’Brien, and S. Murphy, “Long-period seismicity in the shallow volcanic edifice formed from slow-rupture earthquakes,” Nature geoscience, vol. 7, no. 1, pp. 71–75, 2014.
[10] P. Cusano, M. Palo, and M. West, “Long-period seismicity at Shishaldin Volcano (Alaska) in 2003–2004: Indications of an upward migration of the
source before a minor eruption,” Journal of Volcanology and Geothermal Research, vol. 291, pp. 14–24, 2015.
[11] A. M. Baig, M. Campillo, and F. Brenguier, “Denoising seismic noise cross correlations,” Journal of Geophysical Research: Solid Earth (1978–2012),
vol. 114, no. B8, 2009.
[12] A. Ruano, G. Madureira, O. Barros, H. Khosravani, M. Ruano, and P. Ferreira, “Seismic detection using support vector machines,” Journal of
Neurocomputing, vol. 135, no. 0, pp. 273 – 283, 2014.
[13] I. Guyon, Feature extraction: foundations and applications. Springer Science & Business Media, 2006.
[14] H. Liu and H. Motoda, Computational methods of feature selection. CRC Press, 2007.
[15] T. M. Mitchell, Machine Learning. Burr Ridge, IL: McGraw Hill, 1997, vol. 45.
[16] T. S. Newman and A. K. Jain, “A survey of automated visual inspection,” Journal of Computer Vision and Image Understanding, vol. 61, no. 2, pp.
231–262, 1995.
[17] D. Mery and O. Medina, “Automated visual inspection of glass bottles using adapted median filtering,” Journal of Image Analysis and Recognition, pp.
818–825, 2004.
[18] S. Scarpetta, F. Giudicepietro, E. Ezin, S. Petrosino, E. Del Pezzo, M. Martini, and M. Marinaro, “Automatic classification of seismic signals at Mt.
Vesuvius Volcano, Italy, using neural networks,” Bulletin of the Seismological Society of America, vol. 95, no. 1, pp. 185–196, 2005.
[19] M. Curilem, J. Vergara, C. San Martín, G. Fuentealba, C. Cardona, F. Huenupan, M. Chacón, M. S. Khan, W. Hussein, and N. B. Yoma, “Pattern recognition applied to seismic signals of the Llaima Volcano (Chile): An analysis of the events’ features,” Journal of Volcanology and Geothermal Research, vol. 282, pp. 134–147, 2014.
[20] E. Aguilera, M. Pareschi, M. Rosi, and G. Zanchetta, “Risk from lahars in the northern valleys of Cotopaxi Volcano (Ecuador),” Natural Hazards,
vol. 33, no. 2, pp. 161–189, 2004.
[21] I. Molina, H. Kumagai, A. García-Aristizábal, M. Nakano, and P. Mothes, “Source process of very-long-period events accompanying long-period signals at Cotopaxi Volcano, Ecuador,” Journal of Volcanology and Geothermal Research, vol. 176, no. 1, pp. 119–133, 2008.
[22] H. D. Ortiz Erazo, “Estudio de los efectos de sitio para la construcción de un índice de actividad sísmica en el volcán Cotopaxi,” Master’s thesis, Escuela Politécnica Nacional, 2013.
[23] L. Gutiérrez, J. Ibáñez, G. Cortés, J. Ramírez, C. Benítez, V. Tenorio, and A. Isaac, “Volcano-seismic signal detection and classification processing using Hidden Markov Models. Application to San Cristóbal Volcano, Nicaragua,” in Symposium in Geoscience and Remote Sensing, vol. 4, 2009, pp. 522–525.
[24] E. H. A. Laasri, E.-S. Akhouayri, D. Agliz, D. Zonta, and A. Atmani, “A fuzzy expert system for automatic seismic signal classification,” Expert Systems
with Applications, vol. 42, no. 3, pp. 1013 – 1027, 2015.
[25] M. Orozco, M. E. García, R. P. Duin, and C. G. Castellanos, “Dissimilarity-based classification of seismic signals at Nevado del Ruiz Volcano,” Earth Sciences Research Journal, vol. 10, no. 2, pp. 57–66, 2006.
[26] D. Cárdenas-Peña, M. Orozco-Alzate, and G. Castellanos-Domínguez, “Selection of time-variant features for earthquake classification at the Nevado del Ruiz Volcano,” Computers & Geosciences, vol. 51, pp. 293–304, 2013.
[27] P. A. Castro-Cabrera, M. Orozco-Alzate, A. Adami, M. Bicego, J. M. Londoño-Bonilla, and G. Castellanos-Domínguez, “A comparison between time-frequency and cepstral feature representations for the classification of seismic-volcanic signals,” in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Springer, 2014, pp. 440–447.
[28] G. Curilem, J. Vergara, G. Fuentealba, G. Acuña, and M. Chacón, “Classification of seismic signals at Villarrica Volcano (Chile) using neural networks and genetic algorithms,” Journal of Volcanology and Geothermal Research, vol. 180, no. 1, pp. 1–8, 2009.
[29] M. Ibs-von Seht, “Detection and identification of seismic signals recorded at Krakatau Volcano (Indonesia) using artificial neural networks,” Journal of
Volcanology and Geothermal Research, vol. 176, no. 4, pp. 448–456, 2008.
[30] I. Álvarez, L. García, G. Cortés, C. Benítez, and A. De la Torre, “Discriminative feature selection for automatic classification of volcano-seismic signals,” IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 2, pp. 151–155, 2012.
[31] R. Avesani, A. Azzoni, M. Bicego, and M. Orozco-Alzate, “Automatic classification of volcanic earthquakes in HMM-induced vector spaces,” in Progress
in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Springer, 2012, pp. 640–647.
[32] M. Bicego, C. Acosta-Muñoz, and M. Orozco-Alzate, “Classification of seismic volcanic signals using Hidden Markov Model based generative embeddings,” IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 6, pp. 3400–3409, 2013.
[33] C. San-Martín, C. Melgarejo, C. Gallegos, G. Soto, M. Curilem, and G. Fuentealba, “Feature extraction using circular statistics applied to volcano monitoring,” in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Springer, 2010, pp. 458–466.
[34] R. Lara-Cueva, D. Benítez, E. Carrera, M. Ruiz, and J. Rojo-Álvarez, “Feature selection of seismic waveforms for long period event detection at Cotopaxi Volcano,” Journal of Volcanology and Geothermal Research, vol. 316, pp. 34–49, 2016.
[35] B. Kenneth, The seismic wavefield, Volume I: Introduction and Theoretical Development. Cambridge Univ. Press, USA, 2001.
[36] J. Lahr, B. Chouet, C. Stephens, J. Power, and R. Page, “Earthquake classification, location, and error analysis in a volcanic environment: Implications
for the magmatic system of the 1989–1990 eruptions at Redoubt Volcano, Alaska,” Journal of Volcanology and Geothermal Research, vol. 62, no. 1,
pp. 137–151, 1994.
[37] S. M. Kay and S. L. Marple Jr, “Spectrum analysis—a modern perspective,” Proceedings of the IEEE, vol. 69, no. 11, pp. 1380–1419, 1981.
[38] R. Lara-Cueva, P. Bernal, G. Saltos, D. Benítez, and J. L. Rojo-Álvarez, “Time and frequency feature selection for seismic events from Cotopaxi Volcano,” in Asia-Pacific Conference on Computer Aided System Engineering (APCASE), 2014, 2015, pp. 1–6.
[39] L. Grafakos, Classical and modern Fourier analysis. Pearson / Prentice-Hall, 2004.
[40] C. M. Bishop et al., Pattern recognition and machine learning. Springer, 2006, vol. 1.
[41] B. Scholkopf, K.-K. Sung, C. J. Burges, F. Girosi, P. Niyogi, T. Poggio, and V. Vapnik, “Comparing support vector machines with gaussian kernels to
radial basis function classifiers,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2758–2765, 1997.
[42] G. Dougherty, Pattern recognition and classification: an introduction. Springer Science & Business Media, 2012.
[43] G. Hughes, “On the mean accuracy of statistical pattern recognizers,” IEEE Transactions on Information Theory, vol. 14, no. 1, pp. 55–63, Jan 1968.
[44] N. Cristianini and J. Shawe-Taylor, An introduction to support Vector Machines: and other kernel-based learning methods. New York, NY, USA:
Cambridge University Press, 2000.
[45] B. Schölkopf and A. Smola, Learning with Kernels, 1st ed. Cambridge, MA: MIT Press, 2002.
[46] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, “Gene selection for cancer classification using support vector machines,” Machine learning, vol. 46,
no. 1-3, pp. 389–422, 2002.
[47] G. H. John, R. Kohavi, K. Pfleger et al., “Irrelevant features and the subset selection problem,” in Proceedings of the Eleventh International Conference
on Machine Learning, 1994, pp. 121–129.
[48] R. Kohavi and D. Sommerfield, “Feature subset selection using the wrapper method: Overfitting and dynamic search space topology,” in KDD, 1995,
pp. 192–197.