Conference Paper

Comparative analysis of automated classifiers applied to volcano event identification


Abstract

The classification of different types of volcanic events can be used to determine the intrinsic behavior of a volcano, and the identified behavior can help provide an early warning of imminent volcanic activity. An efficient algorithm capable of identifying seismic activity is therefore very useful for this purpose. In this sense, this work evaluates several machine learning techniques that have been applied to classify seismic events, taking into account quality and performance parameters. To test the algorithms, a seismic database from the Cotopaxi volcano in Ecuador was used. This database was collected by the Geophysical Institute at Escuela Politécnica Nacional between January and June 2010. The analysis focused on two major types of events: long-period (LP) and volcano-tectonic (VT). For each event, 79 key features in the time and frequency domains were extracted. These features were used to train three well-known classifiers: k-nearest neighbors (k-NN), decision trees, and neural networks. Finally, a feature selection technique was employed to find the features with the greatest impact, improving classifier performance. Our approach identifies three main features that yield an accuracy of 98% when using k-NN.
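The abstract reports that k-NN reached 98% accuracy on three selected features. The following is a minimal sketch of a plain k-NN classifier (Euclidean distance, majority vote), not the authors' implementation; the synthetic two-class data, the value k = 3, the hold-out scheme, and the function name `knn_predict` are illustrative assumptions.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    """Label each test row by majority vote among its k nearest training rows (Euclidean distance)."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    # Squared Euclidean distances between every test and training row: shape (n_test, n_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    # Majority vote; on equal counts, the label of the closer neighbor wins (insertion order)
    return np.array([Counter(y_train[row]).most_common(1)[0][0] for row in nearest])


# Hypothetical data: 3 selected features per event, two synthetic classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 3)), rng.normal(2.0, 0.5, size=(50, 3))])
y = np.array(["LP"] * 50 + ["VT"] * 50)

test_mask = np.arange(len(y)) % 5 == 0  # hold out every 5th event
y_hat = knn_predict(X[~test_mask], y[~test_mask], X[test_mask], k=3)
accuracy = float(np.mean(y_hat == y[test_mask]))
print(f"hold-out accuracy: {accuracy:.2f}")
```

In practice, features would be standardized before computing distances, since k-NN is sensitive to feature scale.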
... They often employ signal processing techniques in conjunction with machine learning schemes to achieve highly accurate event classifiers aimed at monitoring active volcanoes [38]. For example, [39] reported a performance comparison between three well-known supervised MLCs, the k-nearest neighbor algorithm (kNN), decision tree (DT), and an artificial neural network (ANN), where the kNN reached the highest accuracy (ACC) value of 98%. In [40], the authors presented a recognition system based on HMM methods to classify VT earthquakes, LP, volcanic tremor, and hybrid events. ...
... Since both operations run in linear time (see Table V), the execution time is shorter and the algorithmic complexity is lower than those of the vast majority of state-of-the-art methods. These facts make the proposed method a practical and beneficial scheme that ... [garbled table excerpt: rows for ANN [39] and DT [39], with values of 97 and 96, respectively] ...
Article
This work proposes a new approach based on a suitable combination of mathematical morphology and similarity criteria to classify long-period and volcano-tectonic seismic events of the Cotopaxi volcano. The proposed method explores the seismic signal domain to compute a new feature space based on the edge map of the seismic event patterns represented in gray-level spectrogram images, which is used to feed a set of similarity-based classifiers. The $L_{2}$-norm was selected as the best metric for the proposed method. In terms of classification performance, the $L_{2}$-norm was statistically superior to the other metrics on the D1 data set (seismic events with overlapping signals of nonvolcanic origin) and similar to them on the D2 data set (events without overlapping signals), reaching mean accuracy scores of 93.34% and 96.88%, respectively. These results demonstrate that the computed edge-map feature space separates the two types of seismic events better than the original gray-level space. Regarding execution time, the total time (TT) and time per sample (TS) did not exceed 0.388 s and 0.002 s, respectively, during the training stage. During the testing stage, a TS of no more than 0.012 s was achieved. Finally, the method runs faster and has lower algorithmic complexity than state-of-the-art methods, which makes it a practical and beneficial scheme for real-time classification of seismic events.
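The abstract describes computing an edge map of gray-level spectrogram images with mathematical morphology and then classifying by similarity under the $L_{2}$-norm. The following is a minimal sketch, assuming a flat 3×3 morphological gradient (grey dilation minus grey erosion) as the edge operator and a nearest-reference decision rule; the structuring element, the function names, and the toy images and labels are hypothetical and not taken from the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def morphological_gradient(img, size=3):
    """Edge map = grey dilation minus grey erosion with a flat size x size window."""
    pad = size // 2
    padded = np.pad(np.asarray(img, dtype=float), pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))  # shape (H, W, size, size)
    return windows.max(axis=(-2, -1)) - windows.min(axis=(-2, -1))


def classify_l2(query, references, labels):
    """Return the label of the reference whose edge map is closest to the query's in L2-norm."""
    q = morphological_gradient(query).ravel()
    dists = [np.linalg.norm(q - morphological_gradient(r).ravel()) for r in references]
    return labels[int(np.argmin(dists))]


# Hypothetical 32x32 gray-level "spectrograms"
h, w = 32, 32
horizontal = np.zeros((h, w))
horizontal[14:18, :] = 1.0  # energy in a narrow band along one axis
vertical = np.zeros((h, w))
vertical[:, 14:18] = 1.0    # energy in a narrow band along the other axis
query = np.zeros((h, w))
query[13:17, :] = 1.0       # slightly shifted version of the first pattern
print(classify_l2(query, [horizontal, vertical], ["A", "B"]))
```

Comparing edge maps rather than raw intensities makes the distance depend on pattern boundaries, which is the intuition behind the edge-map feature space in the abstract.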
... to address the problem of volcano seismic event classification, e.g., long-period (LP) and volcano-tectonic (VT) seismic events, as shown in Fig. 1. Machine learning classifiers (MLC) such as hidden Markov models (HMM) [1], boosting strategies [34], decision trees (DT) [16], random forest (RF) [26,28], Gaussian mixture models (GMM) [33], support vector machine (SVM) methods [7,26], and artificial neural networks (ANN) [2,5] were combined with classical time-, frequency- and scale-domain features and with non-traditional features, such as intensity statistics and shape and texture features extracted from spectrogram images [26], to differentiate seismic events. ...
Chapter
In this work, we proposed a new method to classify long-period and volcano-tectonic spectrogram images using eight different deep learning architectures: three deep convolutional neural networks (DCNN1, DCNN2, and DCNN3), three deep convolutional neural networks combined with deep recurrent neural networks (DCNN-RNN1, DCNN-RNN2, and DCNN-RNN3), and two autoencoder neural networks (AE1 and AE2). The models were trained to maximize the area under the receiver operating characteristic curve (AUC) on a dataset of volcano seismic spectrogram images. The three models based on deep recurrent neural networks yielded the worst results due to overfitting caused by the small number of samples in the training sets. DCNN1 was selected over the remaining models, obtaining AUC and accuracy scores of 0.98 and 95%, respectively. Although these values were not the highest for each metric, they were not statistically different from the results obtained by more algorithmically complex models. The proposed DCNN1 model showed similar or superior accuracy compared with the majority of state-of-the-art methods. Therefore, it can be considered a successful scheme for classifying LP and VT seismic events based on their spectrogram images.
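The AUC used to compare these models can be computed directly from classifier scores. A minimal sketch, using the standard equivalence between the ROC AUC and the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count one half); the labels and scores below are hypothetical.

```python
import numpy as np


def roc_auc(y_true, scores):
    """AUC-ROC as P(score of random positive > score of random negative), ties counted as 1/2."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true]
    neg = scores[~y_true]
    # Compare every positive score with every negative score
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Hypothetical binary LP-vs-VT scores (1 = positive class)
y = [1, 1, 1, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(roc_auc(y, s))
```

This pairwise form is quadratic in the number of samples; rank-based formulas give the same value more efficiently for large test sets.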
... In this regard, a wide variety of approaches have been used in recent years to address the problem of volcano seismic event classification, e.g., long-period (LP) and volcano-tectonic (VT) seismic events, as shown in Fig. 1. Machine learning classifiers (MLC) such as hidden Markov models (HMM) [3], boosting strategies [4], decision trees (DT) [5], random forest (RF) [6], [7], Gaussian mixture models (GMM) [8], support vector machine (SVM) methods [7], [9], and artificial neural networks (ANN) [10], [11] were combined with classical time-, frequency- and scale-domain features and with non-traditional features, such as intensity statistics and shape and texture features extracted from spectrogram images [7], to differentiate seismic events. On the other hand, convolutional neural networks (CNN) are particular ANN architectures that are gaining more attention in image analysis contexts [13]. ...
Article
Earthquake catalogs are essential to analyze the evolution of active fault systems. The background seismicity rate, or rate of earthquakes that are not directly triggered by other earthquakes, directly relates to the stressing rate, a crucial quantity for understanding seismic hazards. Determining the background seismicity rate is challenging because aftershock sequences may dominate the seismicity rate. Separating aftershocks from background events in earthquake catalogs (known as catalog declustering) is a common practice, and most declustering solutions rely on spatiotemporal distances between events, such as the nearest-neighbor-distance algorithm, which is widely used in various contexts. This algorithm assumes that the nearest-neighbor distance (NND) follows a bimodal distribution related to the background seismicity and to the aftershocks. Constraining these two distributions is crucial to accurately distinguish aftershocks from background events. Recent work often uses linear splitting based on the NND, ignoring the potential overlap between the two populations and resulting in a biased identification of background earthquakes and aftershock sequences. We revisit this problem with machine-learning algorithms. After testing several popular algorithms, we show that a random forest trained with various synthetic catalogs generated by an Epidemic-Type Aftershock Sequence model outperforms approaches such as k-means, Gaussian-mixture models, and support vector machine classification. We apply our model to two different earthquake catalogs: the relocated Southern California earthquake center catalog and the GeoNet catalog of New Zealand. Our model capably adapts to these two different tectonic contexts, highlighting the differences in aftershock productivity between crustal and intermediate-depth seismicity.
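The abstract does not state the NND formula. A minimal sketch, assuming the widely used space-time-magnitude definition of Zaliapin and co-authors, $\eta_{ij} = t_{ij}\, r_{ij}^{d_f}\, 10^{-b m_i}$ for an earlier event $i$ and a later event $j$, where the NND of $j$ is the minimum over all earlier events; the values $b = 1$ and $d_f = 1.6$, the planar distances, and the toy catalog are illustrative assumptions.

```python
import numpy as np


def nearest_neighbor_distance(t, x, y, m, b=1.0, d_f=1.6):
    """For each event j, return (eta_j, parent_j), where
    eta_ij = t_ij * r_ij**d_f * 10**(-b * m_i) is minimized over earlier events i.
    Events with no earlier event get eta = inf and parent = -1."""
    t, x, y, m = (np.asarray(a, dtype=float) for a in (t, x, y, m))
    n = len(t)
    eta = np.full(n, np.inf)
    parent = np.full(n, -1)
    for j in range(n):
        dt = t[j] - t                     # inter-event times (positive only for earlier events)
        r = np.hypot(x[j] - x, y[j] - y)  # planar epicentral distances (simplifying assumption)
        earlier = dt > 0
        if not earlier.any():
            continue
        cand = dt[earlier] * r[earlier] ** d_f * 10.0 ** (-b * m[earlier])
        k = np.argmin(cand)
        eta[j] = cand[k]
        parent[j] = np.flatnonzero(earlier)[k]
    return eta, parent


# Hypothetical three-event catalog
t = [0.0, 1.0, 100.0]  # origin times (days)
x = [0.0, 0.1, 50.0]   # planar coordinates (km)
y = [0.0, 0.0, 0.0]
m = [5.0, 2.0, 3.0]
eta, parent = nearest_neighbor_distance(t, x, y, m)
print(np.log10(eta), parent)
```

A linear split would threshold $\log_{10}\eta$ to separate clustered from background events; the paper instead replaces that split with a trained classifier.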
Article
Sustained seismic monitoring of volcanoes has led to a greater understanding of the relationship between a volcano and its environment. This monitoring also provides information on the relationship between the seismic activity of a volcano and possible eruptions. Automatic microearthquake detection systems are therefore of great importance for this task. In this sense, the aim of this work is to develop a microearthquake detector that estimates the seismic source signal by reducing the noise present in the signals recorded by the sensors. For this purpose, we propose using homomorphic deconvolution to estimate the volcanic microearthquake signal from the signals received by the different seismic sensors that monitor the Cotopaxi volcano. The homomorphic deconvolution is combined with the root mean square energy, used as the characteristic function g(x) of the Short-Time Average over Long-Time Average (STA/LTA) algorithm. The proposed detector was evaluated on several datasets, including a record of approximately 5 days (7000 min) containing 350 seismic events, provided and labeled by experts from IGEPN, on which a detection probability of 98.29% and an accuracy of 99.31% were obtained. Additionally, after homomorphic deconvolution, the volcanic microearthquake signals show an average gain of 10 dB in the estimated signal-to-noise ratio (SNR).
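The detection step can be illustrated with an STA/LTA trigger that uses the windowed RMS as the characteristic function g(x). A minimal sketch, assuming both windows end at the current sample and the ratio is left at zero until the long window is full; the homomorphic deconvolution stage is not reproduced, and the sampling rate, window lengths, trigger threshold, and synthetic trace are hypothetical.

```python
import numpy as np


def rms_sta_lta(signal, n_sta, n_lta):
    """STA/LTA ratio with g(x) = RMS over each window; both windows end at sample i.
    Values before the long window is full are left at 0."""
    x2 = np.asarray(signal, dtype=float) ** 2
    csum = np.concatenate(([0.0], np.cumsum(x2)))  # prefix sums give O(1) window energies
    n = len(x2)
    ratio = np.zeros(n)
    for i in range(n_lta - 1, n):
        sta = np.sqrt((csum[i + 1] - csum[i + 1 - n_sta]) / n_sta)
        lta = np.sqrt((csum[i + 1] - csum[i + 1 - n_lta]) / n_lta)
        ratio[i] = sta / lta if lta > 0 else 0.0
    return ratio


fs = 100  # hypothetical sampling rate (Hz)
rng = np.random.default_rng(1)
trace = rng.normal(0.0, 1.0, 60 * fs)  # 60 s of synthetic background noise
trace[3000:3300] += 8.0 * np.sin(2 * np.pi * 5.0 * np.arange(300) / fs)  # synthetic event at 30 s
ratio = rms_sta_lta(trace, n_sta=1 * fs, n_lta=10 * fs)
onsets = np.flatnonzero(ratio > 2.0)  # hypothetical trigger threshold
print(onsets[0] / fs if onsets.size else None)
```

With RMS (rather than energy) as g(x), the ratio for a strong, sustained event saturates near the square root of the LTA/STA length ratio, so the trigger threshold must be chosen below that bound.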
Chapter
We explored four different clustering-based classifiers to categorize two different types of volcanic seismic events and to find possible overlapping signals that could occur at the same time as, or immediately after, a seismic event. The BFR classifier with \(k=2\) was chosen as the statistically best (\(p<0.05\)) of the 36 models explored, reaching a mean accuracy of 88%. This result represents a satisfactory and competitive classification performance compared with state-of-the-art methods. The CURE classifier with \(k=3\) achieved a mean accuracy of 87% (\(p<0.05\)) and was the only model capable of detecting seismic events with overlapping signals. Therefore, the proposed clustering-based exploration was effective in providing competitive models for seismic event classification and overlapping-signal detection.
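A clustering algorithm becomes a classifier once each cluster is mapped to a class label. A minimal sketch of that cluster-then-label idea, using plain Lloyd's k-means as a stand-in for BFR (a scalable k-means extension) and a majority-vote mapping; neither BFR nor CURE is implemented here, and the function names, data, and labels are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        assign = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty)
        new = np.array([X[assign == c].mean(axis=0) if np.any(assign == c) else centroids[c]
                        for c in range(k)])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    assign = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return centroids, assign


def label_clusters(assign, y, k):
    """Map each non-empty cluster index to the majority training label among its members."""
    mapping = {}
    for c in range(k):
        members = y[assign == c]
        if members.size:
            values, counts = np.unique(members, return_counts=True)
            mapping[c] = values[counts.argmax()]
    return mapping


def predict(X_new, centroids, mapping):
    """Assign samples to the nearest centroid and return that cluster's label (None if unmapped)."""
    d2 = ((np.asarray(X_new, dtype=float)[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.array([mapping.get(int(c)) for c in d2.argmin(axis=1)])


# Hypothetical two-class feature vectors
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.5, size=(60, 2)), rng.normal(4.0, 0.5, size=(60, 2))])
y = np.array(["LP"] * 60 + ["VT"] * 60)
centroids, assign = kmeans(X, k=2)
mapping = label_clusters(assign, y, k=2)
acc = float(np.mean(predict(X, centroids, mapping) == y))
print(mapping, acc)
```

Using more clusters than classes (as with \(k=3\) above) leaves room for a cluster that captures atypical events, such as those with overlapping signals.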
Conference Paper
This paper presents a study to select the most relevant time- and frequency-domain features for the classification of seismic signals obtained from the Cotopaxi volcano. The Fourier and wavelet transforms (WT) were used in the analysis, and a total of 79 different features were studied. Feature selection was performed with CART, using Gini, Standard Deviation, Twoing Rule, Gram-Schmidt, and Interaction Information as relevance indices. A comparative analysis of the selected features indicates that the most relevant features for identifying seismic events are the Maximum Peak Value in the 10-20 Hz range, the High Frequency in WT A6, and the Percentage of Energy in the D2 and D5 WT levels.
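Of the relevance indices listed, the Gini criterion is the simplest to illustrate. A minimal sketch that scores each feature by the Gini impurity decrease of its best single threshold split (a one-level CART stump); full CART importance would aggregate decreases over a whole tree, so this is a simplifying assumption, and the data and function names are hypothetical.

```python
import numpy as np


def gini(y):
    """Gini impurity 1 - sum_c p_c**2 of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def best_split_gain(x, y):
    """Largest Gini impurity decrease achievable by one threshold split on feature x."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    n = len(y)
    parent = gini(y)
    best = 0.0
    for i in range(1, n):
        if x[i] == x[i - 1]:
            continue  # no threshold can separate equal values
        child = (i * gini(y[:i]) + (n - i) * gini(y[i:])) / n  # weighted child impurity
        best = max(best, parent - child)
    return best


def rank_features(X, y):
    """Return feature indices sorted by decreasing Gini gain, plus the gains themselves."""
    gains = np.array([best_split_gain(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(gains)[::-1], gains


# Hypothetical data: feature 0 separates the classes, feature 1 is noise
rng = np.random.default_rng(5)
y = np.array([0] * 50 + [1] * 50)
X = np.column_stack([3.0 * y + rng.normal(0.0, 0.1, 100), rng.normal(0.0, 1.0, 100)])
order, gains = rank_features(X, y)
print(order, gains)
```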
Conference Paper
The analysis and classification of seismic patterns, which are typically registered as digital signals, can be used to monitor and understand the underlying geophysical phenomena beneath volcanoes. In recent years, there has been increasing interest in the development of automated systems for labeling those signals according to a number of predefined volcanic, tectonic, and environmental classes. The first and crucial stage in the design of such systems is the definition or adoption of an appropriate representation of the raw seismic signals, so that the subsequent classification stage becomes easier or more accurate. This paper describes and discusses the representations most commonly applied in the literature on classification of seismic-volcanic signals, namely time-frequency features and cepstral coefficients. They are compared in terms of two criteria: (i) the leave-one-out nearest-neighbor error, which provides a parameterless measure of discriminative representational power, and (ii) a visual examination of representational quality via a scatter plot of the three best selected features.
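The first criterion is easy to state in code: each sample is classified by its nearest other sample, and the error rate is the fraction of mismatches. A minimal sketch with Euclidean distance assumed; it has no tunable parameter, which is what makes it a parameterless measure of representational power. The function name and toy inputs are hypothetical.

```python
import numpy as np


def loo_1nn_error(X, y):
    """Leave-one-out 1-nearest-neighbor error: classify each sample by its nearest *other* sample."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a sample may not be its own neighbor
    nn = d2.argmin(axis=1)
    return float(np.mean(y[nn] != y))


# Hypothetical representations: well separated vs. interleaved classes
print(loo_1nn_error([[0.0], [0.1], [5.0], [5.1]], ["a", "a", "b", "b"]))
print(loo_1nn_error([[0.0], [1.0], [2.0], [3.0]], [0, 1, 0, 1]))
```

A lower error suggests that the representation places same-class signals closer together.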
Article
We consider the problem of classifying completely or partially unlabeled data by using inequalities that contain absolute values of the data. This allows each data point to belong to either one of two classes by entering the inequality with a plus or minus value. By using such absolute value inequalities in linear and nonlinear support vector machines, unlabeled or partially labeled data can be successfully partitioned into two classes that capture most of the correct labels dropped from the unlabeled data.
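The key constraint can be made concrete for a fixed separating plane $x \cdot w = \gamma$: an unlabeled point satisfies the absolute value inequality $|x \cdot w - \gamma| \ge 1$ on either side of the plane, and the sign of $x \cdot w - \gamma$ supplies its implied label. A minimal sketch that only evaluates this constraint for a given $w$ and $\gamma$; the optimization over $w$ and $\gamma$ that the paper addresses for linear and nonlinear SVMs is not reproduced, and the function name and data are hypothetical.

```python
import numpy as np


def absolute_value_margin(X_unlabeled, w, gamma):
    """For the plane x.w = gamma, return the labels implied by sign(x.w - gamma)
    and the total violation of the constraints |x.w - gamma| >= 1."""
    s = np.asarray(X_unlabeled, dtype=float) @ np.asarray(w, dtype=float) - gamma
    labels = np.where(s >= 0, 1, -1)
    violation = np.maximum(0.0, 1.0 - np.abs(s)).sum()
    return labels, violation


# Hypothetical unlabeled points and candidate plane
X_u = np.array([[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]])
labels, violation = absolute_value_margin(X_u, w=[1.0, 0.0], gamma=0.0)
print(labels, violation)
```

The third point falls inside the margin on the positive side, so it receives label +1 but contributes a violation of 0.5.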
Article
Data mining techniques often require solving optimization problems. Supervised classification, and support vector machines in particular, can be seen as a paradigmatic instance. In this paper, some links between mathematical optimization methods and supervised classification are emphasized. It is shown that many different areas of mathematical optimization play a central role in off-the-shelf supervised classification methods. Moreover, mathematical optimization turns out to be extremely useful for addressing important issues in classification, such as identifying relevant variables, improving the interpretability of classifiers, and dealing with vagueness or noise in the data.
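To make the link between SVMs and optimization concrete, the soft-margin linear SVM can be written as minimizing $\frac{\lambda}{2}\lVert w\rVert^2 + \frac{1}{n}\sum_i \max(0, 1 - y_i(w \cdot x_i + b))$. A minimal sketch that minimizes this objective by full-batch subgradient descent; the solver, $\lambda$, learning rate, iteration count, and data are assumptions, and practical SVM libraries use dedicated solvers instead.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, lr=0.1, n_iter=1000):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))) by full-batch
    subgradient descent. Labels y must be -1 or +1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        margins = y * (X @ w + b)
        active = margins < 1  # samples inside the margin or misclassified contribute to the hinge loss
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical two-class data
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)), rng.normal(2.0, 0.5, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w, b = train_linear_svm(X, y)
acc = float(np.mean(np.sign(X @ w + b) == y))
print(w, b, acc)
```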
Article
We investigated the dynamics of the continuous seismic signal recorded before and during the 2011–2012 El Hierro eruption (Canary Islands) using the innovative approach of the Fisher-Shannon method, a suitable statistical tool for detecting dynamic changes in complex systems. Our findings identify dynamic changes in the seismic signal that can be correlated with different stress states of the magmatic setting and the plumbing system in the volcano at El Hierro. The results contribute to the understanding of the fracturing pattern in the crust during a new intrusion and eruption of an overpressurized batch of magma.
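The abstract does not give the formulas. A minimal sketch, assuming the standard Fisher-Shannon quantities for a signal window with density $f$: the Shannon entropy power $N_X = e^{2H_X}/(2\pi e)$ with $H_X = -\int f \ln f\,dx$, the Fisher information measure $I_X = \int (f')^2/f\,dx$, and their product $C = N_X I_X \ge 1$, with equality for a Gaussian. Estimating $f$ with a Gaussian kernel density estimate (Silverman bandwidth) on a grid, and the synthetic windows, are further assumptions.

```python
import numpy as np


def gaussian_kde_on_grid(samples, grid):
    """Gaussian kernel density estimate on a grid, with Silverman's rule-of-thumb bandwidth."""
    s = np.asarray(samples, dtype=float)
    n = len(s)
    h = 1.06 * s.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - s[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))


def fisher_shannon(samples, n_grid=2001):
    """Return (entropy power N, Fisher information I, FS product N * I) for one window."""
    s = np.asarray(samples, dtype=float)
    pad = 5 * s.std(ddof=1)
    grid = np.linspace(s.min() - pad, s.max() + pad, n_grid)
    dx = grid[1] - grid[0]
    f = gaussian_kde_on_grid(s, grid)
    ok = f > 1e-12  # skip numerically empty tails (avoids log(0) and division by ~0)
    H = -np.sum(f[ok] * np.log(f[ok])) * dx
    df = np.gradient(f, dx)
    I = np.sum(df[ok] ** 2 / f[ok]) * dx
    N = np.exp(2 * H) / (2 * np.pi * np.e)
    return N, I, N * I


# Hypothetical windows: Gaussian-like noise vs. a bimodal amplitude distribution
rng = np.random.default_rng(2)
gauss = rng.normal(0.0, 1.0, 1000)
bimodal = np.concatenate([rng.normal(-3.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])
_, _, C_gauss = fisher_shannon(gauss)
_, _, C_bimodal = fisher_shannon(bimodal)
print(round(C_gauss, 2), round(C_bimodal, 2))
```

Applied to successive windows of a continuous trace, the resulting sequence of $(N_X, I_X)$ pairs is what would be tracked for dynamic changes.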
Article
This paper proposes a computer-based classifier to automatically identify four classes of seismic events at the Llaima volcano, one of the most active volcanoes in the Southern Andes, situated in the Araucanía Region of Chile. A combination of features that provided good recognition performance in our previous papers on the Llaima and Villarica (located 100 km south of Llaima) volcanoes is used to train the classifiers. These features are extracted from the amplitude, frequency, and phase of the seismic signals. Unlike previous works, where fixed-length windows were used to obtain the seismic signals, this paper employs variable-length signals that span the entire seismic event. The classifiers are implemented using support vector machines. A confidence analysis is also included to improve the reliability of the classification. Results indicate that the features used to recognize events at the Villarica volcano also provide good recognition results for the Llaima volcano, yielding a classification accuracy of over 80%.
Article
The automated classification of seismic volcanic signals has been addressed with several different pattern recognition approaches. Among them, hidden Markov models (HMMs) have been advocated as a cost-effective option, with the advantages of a straightforward Bayesian interpretation and the capacity to deal with seismic sequences of different lengths. In volcano seismology, HMM-based classification schemes had been based only on a standard, purely generative scheme, i.e., the Bayes rule: training one HMM per class and assigning an incoming seismic signal to the class whose model yields the highest likelihood. In this paper, a novel HMM-based classification approach for pretriggered seismic volcanic signals is proposed. The main idea is to enrich the classical HMM scheme with a discriminative step that can recover from situations in which the classical Bayes classification rule is not sufficient. In more detail, a generative embedding scheme is used, which employs the models to map the signals into a vector space, called the generative embedding space. In such a space, any discriminative vector-based classifier can be applied. A thorough set of experiments, carried out on pretriggered signals recorded at Galeras Volcano in Colombia, shows that the proposed approach typically outperforms standard HMM-based classification schemes, including in some cross-station cases.
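Both the Bayes rule and a generative embedding rely on evaluating each class HMM on a sequence. A minimal sketch, assuming discrete observation symbols and using the scaled forward algorithm for $\log P(\text{obs} \mid \text{HMM})$; the vector of per-class log-likelihoods is assumed here as a simple generative embedding, which may differ from the embedding used in the paper, and the toy models are hypothetical.

```python
import numpy as np


def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-output HMM.
    pi: (S,) initial probabilities, A: (S, S) transitions, B: (S, V) emission probabilities."""
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    log_p = np.log(c)
    alpha = alpha / c  # rescale to avoid numerical underflow on long sequences
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()
        log_p += np.log(c)
        alpha = alpha / c
    return log_p


def generative_embedding(obs, models):
    """Map a variable-length sequence to a fixed-length vector of per-model log-likelihoods."""
    return np.array([log_likelihood(obs, *m) for m in models])


# Hypothetical two-state, three-symbol class models
pi = np.array([0.6, 0.4])
A0 = np.array([[0.7, 0.3], [0.4, 0.6]])
B0 = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
A1 = np.array([[0.9, 0.1], [0.2, 0.8]])
B1 = np.array([[0.1, 0.1, 0.8], [0.6, 0.3, 0.1]])
models = [(pi, A0, B0), (pi, A1, B1)]

seq = [0, 2, 1, 1, 0]
emb = generative_embedding(seq, models)
bayes_class = int(np.argmax(emb))  # classical Bayes rule (equal class priors)
print(emb, bayes_class)
```

In the discriminative variant, embeddings of labeled training sequences would be used to fit any vector classifier instead of taking the argmax.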
Chapter
Volcanic processes operate over a wide range of time scales that require a variety of instruments and techniques for their study. Short-period seismology typically covers the band 0.1–1 s, while broadband seismology can extend that band to 0.1–100 s. Borehole strainmeters may be used to cover the very wide band from 0.1 s to 100 days and Global Positioning System (GPS) surveys are useful to track deformation over time scales ranging from days to decades. Arrays of three-component broadband seismographs coupled with arrays of borehole strainmeters are required to monitor the dynamics of magmatic and hydrothermal fluids prior to and during eruptions. Inversions of broadband data may be performed to image the forces operating at a source and infer the fluid pathway geometry and mass transport balance in a volcano. The accuracy of such inversions depends on the degree of resolution achieved for the volcanic structure. High-resolution tomography based on iterative inversions of seismic travel-time data can image three-dimensional structures at a scale of a few hundred meters provided adequate local short-period earthquake data are available. Hence, forces in a volcano are potentially resolvable for periods longer than a few seconds. Studies of long-period events and tremor at periods ≤ 1 s offer constraints on pressure fluctuations resulting from unsteady mass transport, and observations of volcano-tectonic activity tell us about hydraulic fracturing processes, as well as brittle response of the rock matrix to stress changes induced by rapid injection and/or withdrawal of fluids. Short-period networks are routinely used to monitor, locate, and assess the source properties of volcano-tectonic earthquakes and long-period events, while small-aperture seismic antennas are required to track tremor and separate source, path, and site effects in tremor wavefields.
This chapter reviews quantitative methods used in the interpretation of source processes and structures in volcanoes and discusses some of the major challenges faced by volcano seismologists in their quest to understand eruptive behavior.