
Supervised Machine Learning Application for Developing a Predictive Model of the Monthly Phase of the Pacific Decadal Oscillation

April 2021
Research in Computing Science 150(4):15-26




Indalecio Mendoza Uribe

Instituto Mexicano de Tecnología del Agua, Mexico

indalecio_mendoza@tlaloc.imta.mx

Abstract. In this work, supervised machine learning with regression trees was applied to develop a predictive model of the monthly phase of the Pacific Decadal Oscillation (PDO). This oscillation is associated with the alteration of weather patterns, mainly in the North Pacific and southwestern North America. The PDO phase records of the 24 months prior to the forecast target month were used as characteristics (input features). The resulting predictive model showed an acceptable capacity to estimate the monthly phase of the PDO, according to the performance evaluation statistics Mean Absolute Error, Maximum Error, Mean Squared Error and Pearson's Correlation, which obtained ranges of [0.55, 1.07], [1.58, 3.29], [0.55, 1.82] and [0.30, 0.74] respectively on the 20% of test data for the period 1854-2020.

Keywords: Artificial intelligence, climate, regression trees.

ISSN 1870-4069. Research in Computing Science 150(4), 2021, pp. 15–26; rec. 2020-12-13; acc. 2021-02-10

1 Introduction

In climatology, machine learning has great potential, especially for phenomena that develop over long time scales, as is the case of the Pacific Decadal Oscillation (PDO). The PDO is mainly characterized by changes in sea surface temperature (SST) in the Pacific Ocean north of 20° N latitude, as well as by variations in sea level pressure and wind patterns. The study of the PDO has gained relevance in recent years due to its association with the alteration of weather patterns, mainly in the North Pacific and southwestern North America [1, 2, 3, 4]. Alterations in the climate have significant socioeconomic impacts, especially in countries that base their development on the management of their natural resources [5].

In the area of artificial intelligence, various machine learning techniques have been applied to understand, describe and predict the behavior of natural phenomena. Ovando et al. [6] developed a neural network model to predict the occurrence of frost in Argentina from meteorological data on temperature, relative humidity, cloud cover, and wind direction and speed. Téllez-Valero et al. [7] developed a system based on machine learning methods that improves the acquisition of natural disaster data by automatically populating a database with information extracted from online news reports. Haro-Rivera [8] applied a decision tree to identify the predominant meteorological variables in the province of Chimborazo, Ecuador. Finally, Suárez et al. [9] used data analysis to study the meteorological phenomenon known as DANA, which caused serious floods, loss of life, and economic and infrastructure damage in southeastern Spain in September 2019.

Machine learning is a data analysis technique that gives computers the ability to learn from experience without relying on a predetermined equation as a model. Its algorithms look for natural patterns in the data that generate knowledge, and they adaptively improve their performance as the number of samples available for learning increases. Broadly, machine learning techniques can be classified as supervised or unsupervised.

A supervised learning algorithm takes a set of known data (inputs) and the known responses to these data (outputs) and trains a model that can generate reasonable predictions in response to new data. Supervised learning uses classification and regression techniques to develop predictive models. In comparison, unsupervised learning looks for hidden patterns or intrinsic structures in the data; it is used to draw inferences from data sets that consist of input data without labeled responses. Common unsupervised learning techniques include certain neural networks [10] and k-means [11], among others.

The objective of this work was to apply supervised machine learning with regression trees to develop a predictive model of the monthly phase of the Pacific Decadal Oscillation. The PDO phase records of the 24 months prior to the forecast target month were used as characteristics.

2 Method

The predictive model was developed in three steps. First, the historical data set of monthly PDO values was obtained for the period 1854-2020; these data were organized by month and divided into training and test data. Second, for each month of the year, the regression tree of the corresponding predictive model was generated from the training data. Third, the monthly predictive models were applied to the test data sets. The results were evaluated using three continuous error metrics and one correlation metric.

2.1 Dataset

The PDO is a pattern of SST anomalies whose value oscillates between -4 and 4 degrees Celsius; negative and positive values correspond to the cold and warm phase respectively. The PDO values indicate the variation of the SST with respect to the historical average. The data were obtained from the National Oceanic and Atmospheric Administration (NOAA) through the URL https://www.ncdc.noaa.gov/teleconnections/pdo/data.csv. The data set corresponds to the monthly deviation of the SST for the period 1854-2020 (see Fig. 1). For each forecast month (label), the values of the previous 24 months were assigned as characteristics. The characteristics and labels for each month of the year were grouped in separate files to facilitate their processing.
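As a minimal sketch of how such a data set could be assembled, the following Python code builds, for every label, the vector of the 24 preceding values and then keeps only the rows of a given target month. The function names, the oldest-first ordering of the lagged values and the toy series are assumptions made for illustration; they are not taken from the original processing scripts.

```python
def build_lagged_dataset(series, n_lags=24):
    """Return (features, labels): each label is series[t] and its
    features are the n_lags values immediately preceding it."""
    features, labels = [], []
    for t in range(n_lags, len(series)):
        features.append(series[t - n_lags:t])  # oldest value first
        labels.append(series[t])
    return features, labels


def select_target_month(features, labels, first_label_month, target_month):
    """Keep only the rows whose label falls in target_month (0 = January).
    first_label_month is the calendar month (0-11) of the first label."""
    rows = [i for i in range(len(labels))
            if (first_label_month + i) % 12 == target_month]
    return [features[i] for i in rows], [labels[i] for i in rows]


# Toy series standing in for the monthly PDO values (made-up numbers),
# assumed to start in January.
toy_series = [round(0.1 * i, 1) for i in range(40)]
X, y = build_lagged_dataset(toy_series, n_lags=24)

# The first label is month index 24, which is again a January (24 % 12 == 0).
X_jan, y_jan = select_target_month(X, y, first_label_month=0, target_month=0)
```

With this layout, one pair of feature and label lists per calendar month corresponds to the separate monthly files described above.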

Fig. 1. Monthly anomaly of the Pacific Decadal Oscillation for the period 1854-2020 [12].

Machine learning consists of learning some properties of a data set and then verifying those properties with another data set. A common practice is to evaluate an algorithm by dividing the data into two subsets. The larger subset is the training data, from which the algorithm learns some properties. The second subset is the test data, with which the ability of the model to predict from the learned properties is verified. For this study, the data set was divided into training and test subsets in proportions of 80% and 20% respectively.
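The 80/20 division can be sketched with the train_test_split function of Scikit-Learn. The arrays below are random placeholders rather than PDO data, and the seed value is an arbitrary choice made only so that the example is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 samples with 24 lagged values each (not real PDO data).
rng = np.random.default_rng(0)
characteristics = rng.normal(size=(100, 24))
labels = rng.normal(size=100)

# 80% of the rows go to training and 20% to testing; random_state fixes
# the shuffle so the same partition is obtained on every run.
train_X, test_X, train_y, test_y = train_test_split(
    characteristics, labels, train_size=0.80, test_size=0.20, random_state=5
)
print(train_X.shape, test_X.shape)
```

Note that train_test_split shuffles the rows by default; passing shuffle=False would instead keep the chronological order of the samples.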

2.2 Generation of the Predictive Model

For each month of the year, the regression tree of the corresponding predictive model was generated. Each predictive model was trained with the corresponding 80% of training data.

Classification and regression trees (CART) were developed by Breiman et al. [13]. Tree models in which the target variable takes a finite set of values are called classification trees, whereas trees in which the target variable takes continuous values are called regression trees.

Let Y be the response variable and x the vector of predictor variables. The problem consists of establishing a relationship between Y and x such that Y can be predicted from the values of x; mathematically, this means estimating the probability P(Y | x1, x2, …, xk).

The tree is constructed following a recursive binary division approach. Let N be the number of data and Nj the number of cases in class j.

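The probability that a case is in class j, given that it was located in the terminal node t, is given by Eq. 1:

$$P(j \mid t) = \frac{P(j, t)}{P(t)} \tag{1}$$

where $P(j, t)$ is the probability that a case belongs to class j and falls in node t, and $P(t)$ is the probability that a case falls in node t. These probabilities comply with:

$$\sum_{j} P(j \mid t) = 1 \tag{2}$$

Thus, the values P(j|t) are the relative proportions of the cases of class j at node t [8].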

To obtain the optimal tree, the algorithm evaluates the candidate divisions at the root node and at each subsequent node, measures the quality of the predictions achieved by each one, and selects the best. Fig. 2 shows a simplified form of a regression tree.
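The selection of the best division at a single node can be illustrated with a short sketch. It assumes one predictor and a squared-error criterion, and it searches exhaustively over the midpoints between neighbouring values; the function names and the toy data are hypothetical, and this is not the Scikit-Learn implementation.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) of the binary division x <= threshold
    that minimises the summed squared error of the two child nodes."""
    best = (None, sse(y))  # no division: error of the parent node
    candidates = sorted(set(x))
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2  # midpoint between neighbouring values
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Toy data with two clearly separated groups.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [0.1, 0.2, 0.3, 2.1, 2.2, 2.3]
threshold, error = best_split(x, y)
```

A full tree repeats this search recursively on each child node, and over all available predictors, until a stopping rule is met.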

Fig. 2. Simplified form of a regression tree.

In this study, machine learning was applied through the Scikit-Learn library for the Python programming language, which integrates a wide range of machine learning algorithms for supervised and unsupervised problems [14].

Specifically, the tree.DecisionTreeRegressor class was used to create the instance of the predictive model; train_test_split to divide the data set into training and test subsets; mean_absolute_error, mean_squared_error and max_error to measure the mean absolute error, mean squared error and maximum error respectively; and, finally, the plot_tree function to graph the regression trees.


2.3 Statistical Validation of the Predictive Model

The monthly predictive models were applied to the corresponding test data sets. To evaluate the monthly predictive model of the PDO phase, three continuous error metrics and Pearson's correlation were used. These metrics, which are recommended for evaluating deterministic forecasts, are described below.

The Mean Absolute Error (MAE) measures the magnitude of the errors in a set of predictions, regardless of their direction [15, 16]. It corresponds to the average of the absolute differences between the prediction and the observation, where all individual differences have the same weight (Eq. 3):

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert P_i - O_i \rvert \tag{3}$$

where $P_i$ is the prediction at position i, $O_i$ is the value observed at position i and n is the sample size.

The Maximum Error (ME) identifies the largest absolute difference between the prediction and the observation (Eq. 4). It belongs to the set of objective functions used for the calibration of models [17]:

$$\mathrm{ME} = \max_{1 \le i \le n} \lvert P_i - O_i \rvert \tag{4}$$

The Root Mean Square Error (RMSE) measures the mean magnitude of the error. It corresponds to the square root of the average of the squared differences between the prediction and the observation, and it has been used in the evaluation of forecasting models [18, 19]. Because the differences are squared before averaging, it amplifies and penalizes errors of greater magnitude more strongly (Eq. 5):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (P_i - O_i)^2} \tag{5}$$
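The three error metrics of Eqs. 3 to 5 can be written directly from their definitions. The sketch below uses only the Python standard library; the function names and the sample values are illustrative assumptions, not the functions used in this study.

```python
import math


def mae(pred, obs):
    """Mean Absolute Error (Eq. 3)."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def max_err(pred, obs):
    """Maximum Error (Eq. 4)."""
    return max(abs(p - o) for p, o in zip(pred, obs))


def rmse(pred, obs):
    """Root Mean Square Error (Eq. 5)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


# Made-up predictions and observations with errors 0, 1, 0 and -4.
predictions = [1.0, 2.0, 3.0, 4.0]
observations = [1.0, 1.0, 3.0, 8.0]

mae_value = mae(predictions, observations)       # (0 + 1 + 0 + 4) / 4
me_value = max_err(predictions, observations)    # largest absolute error
rmse_value = rmse(predictions, observations)     # sqrt((0 + 1 + 0 + 16) / 4)
```

For any data set, the RMSE is never smaller than the MAE and never larger than the ME, which offers a quick consistency check on reported values.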

Pearson's Correlation, denoted r (Eq. 6), is a normalized measure widely used to establish relationships between two continuous quantitative variables [20, 21]. It shows their joint variability and therefore characterizes the behavior of the data. The coefficient takes values from -1.0 to 1.0 and is interpreted as follows: values close to 1.0 indicate a strong positive association between the variables, that is, they increase or decrease together.

On the other hand, values close to -1.0 indicate a strong negative association, that is, as one variable increases, the other decreases. A value of 0.0 indicates that there is no linear correlation [22].

$$r = \frac{\sum_{i=1}^{n} (P_i - \bar{P})(O_i - \bar{O})}{\sqrt{\sum_{i=1}^{n} (P_i - \bar{P})^2} \, \sqrt{\sum_{i=1}^{n} (O_i - \bar{O})^2}} \tag{6}$$

where $\bar{P}$ is the mean value of the predictions and $\bar{O}$ is the mean value of the observations.
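As an illustration of Eq. 6, the coefficient can be computed from its definition with the Python standard library. The function name and the sample values are assumptions for this sketch; in practice a library routine such as scipy.stats.pearsonr gives the same quantity.

```python
import math


def pearson_r(pred, obs):
    """Pearson's correlation between predictions and observations (Eq. 6)."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    # Numerator: joint variation around the two means.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    # Denominator: product of the two root sums of squares.
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


r_positive = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # perfectly aligned
r_negative = pearson_r([1.0, 2.0, 3.0], [6.0, 4.0, 2.0])   # perfectly opposed
```

The coefficient is undefined when either series is constant, because the denominator is then zero.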

3 Results

To create the monthly predictive models based on regression trees, the constructor of the DecisionTreeRegressor class was used. Table 1 lists the parameters with which the best results were obtained.

Table 1. Predictive model creation parameters.

| Parameter | Value | Description |
|---|---|---|
| criterion | mse | Function to measure the quality of the division. |
| splitter | best | Strategy used to choose the division at each node. |
| max_depth | None | Maximum depth of the tree. None indicates that nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples. |
| min_samples_split | 2 | The minimum number of samples required to divide an internal node. |
| min_samples_leaf | 1 | The minimum number of samples required to be in a leaf node. |
| max_features | 12 | The number of features to consider when looking for the best division. |
| random_state | 5 | Controls the randomness of the estimator. To obtain deterministic behavior during fitting, random_state must be set to an integer. |

As part of the training, the algorithm quantifies the importance of each characteristic for the forecast. As can be seen in Table 2, 12 characteristics generally account for more than 90% of the importance in the forecast.

These 12 characteristics are not the same for all months of the year. Therefore, all 24 characteristics are supplied in the training stage, but the algorithm is instructed (through max_features = 12) to consider only 12 characteristics when looking for the best division. This reduction in dimensions allows the algorithm to be optimized by eliminating characteristics that do not contribute to the forecast.
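The importance of each characteristic can be read from a fitted tree through its feature_importances_ attribute. The sketch below uses synthetic data in which the last column drives the label, so the numbers it produces are not results of this study. In Scikit-Learn, max_features limits how many characteristics are examined at each division, and the candidates are drawn at random.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the 24 lagged characteristics (not PDO data).
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 24))
# Label driven mostly by the last column, plus a little noise.
y = 0.9 * X[:, 23] + 0.1 * rng.normal(size=200)

model = DecisionTreeRegressor(max_features=12, random_state=5)
model.fit(X, y)

# Importances are normalised so that they sum to 1.
importances = model.feature_importances_
order = np.argsort(importances)[::-1]          # most important first
cumulative = np.cumsum(importances[order])

# Smallest number of characteristics whose importances reach 90%.
n_needed = int(np.searchsorted(cumulative, 0.90) + 1)
print(order[:5], n_needed)
```

Applied to each monthly model, the same cumulative sum gives the number of characteristics needed to reach a chosen share of the importance.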

Table 2. Percentage of importance by characteristics for monthly predictive models.

[Table body not recoverable from the source. Columns: months Jan to Dec; rows: characteristics, with their percentage of importance.]

Algorithm 1 presents, in simplified form, the sequence of steps to divide the data into the training and test subsets, train the regressor (predictive model) with the training data, apply it to the test data, calculate the performance evaluation metrics, and graph and store the results. The algorithm does not detail the modules dataReadingMonth() and graphingStorage().


Algorithm 1: Simplified sequence to generate, train, apply and validate the monthly predictive model

    import scipy.stats as sc
    from sklearn import tree
    from sklearn.metrics import max_error, mean_absolute_error, mean_squared_error
    from sklearn.model_selection import train_test_split

    # dataReadingMonth() and graphingStorage() are helper modules not detailed here
    for month in range(0, 12):
        totalCharacteristics, totalLabels = dataReadingMonth(month)
        # Divide the data into training (80%) and test (20%) subsets
        trainingCharacteristics, testCharacteristics, trainingLabels, testLabels = train_test_split(
            totalCharacteristics, totalLabels, train_size=0.80, test_size=0.20, random_state=5)
        # Creation of the instance (object) of type DecisionTreeRegressor (predictive model).
        # 'squared_error' is the name used since scikit-learn 1.0 for the criterion
        # previously called 'mse' (the value listed in Table 1).
        predictiveModel = tree.DecisionTreeRegressor(
            criterion='squared_error', splitter='best', max_depth=None,
            min_samples_split=2, min_samples_leaf=1, max_features=12, random_state=5)
        # Train the predictive model with the training data
        predictiveModel.fit(trainingCharacteristics, trainingLabels)
        # Apply the predictive model to the test data set
        predictedLabels = predictiveModel.predict(testCharacteristics)
        # Calculate MAE, ME, RMSE and r performance metrics
        mae = round(mean_absolute_error(testLabels, predictedLabels), 2)
        me = round(max_error(testLabels, predictedLabels), 2)
        # mean_squared_error returns the MSE; its square root is the RMSE of Eq. 5
        rmse = round(mean_squared_error(testLabels, predictedLabels) ** 0.5, 2)
        pearson = sc.pearsonr(testLabels, predictedLabels)
        r = round(pearson[0], 2)
        graphingStorage(month, predictiveModel, mae, me, rmse, r)

As arbitrary examples, Figures 3 and 4 show the trees of the predictive models for the months of June and December, respectively.

Fig. 3. Regression tree for the month of June. The predictive model was trained with 80% of the data from the period 1854-2020. The intensity of the fill color indicates the extremity of the values predicted at each node.

Fig. 4. Regression tree for the month of December. The predictive model was trained with 80% of the data from the period 1854-2020. The intensity of the fill color indicates the extremity of the values predicted at each node.

The monthly predictive models were applied to the 20% of test data. Table 3 shows the results of the four statistical metrics for each monthly predictive model, and Fig. 5 shows the scatter plots comparing the observed and predicted data.

Table 3. Result of the statistical metrics of the monthly predictive models.

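| Target month | MAE | ME | RMSE | r |
|---|---|---|---|---|
| January | 0.79 | 2.49 | 1.03 | 0.59 |
| February | 0.66 | 2.09 | 0.70 | 0.64 |
| March | 0.61 | 2.44 | 0.66 | 0.72 |
| April | 0.74 | 1.58 | 0.74 | |
| May | 0.86 | 2.52 | 1.13 | 0.62 |
| June | 0.64 | 1.79 | 0.62 | 0.77 |
| July | 1.01 | 2.47 | 1.44 | 0.55 |
| August | 1.07 | 3.29 | 1.82 | 0.30 |
| September | 1.01 | 2.83 | 1.58 | 0.38 |
| October | 0.69 | 1.60 | 0.68 | 0.74 |
| November | 0.55 | 1.75 | 0.55 | 0.77 |
| December | 0.89 | 3.53 | 1.31 | 0.47 |

For April, only three values survive in the source; the third (0.74) is shown under RMSE and the remaining cell is left blank.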

Fig. 5. Monthly scatter plots of observed versus predicted values for the 20% of test data for the 1854-2020 period.

4 Conclusions

Of the 24 characteristics considered, characteristic 23 (in eleven months) and characteristic 22 (in July) predominated as the root node in the trees of the predictive models; that is, these characteristics have the greatest impact on the forecasts. In addition, 12 characteristics account for more than 90% of the importance in the forecast.

The predictive model developed using machine learning showed an acceptable capacity to estimate the monthly phase of the PDO, according to the performance evaluation statistics MAE, ME, RMSE and r obtained for the 20% of test data, with ranges of [0.55, 1.07], [1.58, 3.29], [0.55, 1.82] and [0.30, 0.74] respectively. Therefore, the predictive model can be considered a reference forecasting tool, but not an exact one.

As future work, it is proposed to continue validating and adjusting the predictive model for its application to longer time windows, such as seasonal (3-month) or even annual forecasts.

The Scikit-Learn library turned out to be easy to use and very efficient. The computational cost of training and testing the predictive model was of the order of seconds on a personal computer.

References

1. Mantua, N.J., Hare, S.R.: The Pacific Decadal Oscillation. Journal of Oceanography, 58, 35–44 (2002). doi: https://doi.org/10.1023/A:1015820616384

2. Cayan, D.R., Dettinger, M.D., Diaz, H.F., Graham, N.E.: Decadal variability of precipitation over western North America. Journal of Climate, 11, 3148–3166 (1998). doi: https://doi.org/10.1175/1520-0442(1998)011<3148:DVOPOW>2.0.CO;2

3. Higgins, R.W., Leetmaa, A., Xue, Y., Barnston, A.: Dominant factors influencing the seasonal predictability of U.S. precipitation and surface air temperature. Journal of Climate, 13(22), 3994–4017 (2000). doi: https://doi.org/10.1175/1520-0442(2000)013<3994:DFITSP>2.0.CO;2

4. Gutzler, D.S., Kann, D.M., Thornbrugh, C.: Modulation of ENSO-based long-lead outlooks of Southwest U.S. Winter precipitation by the Pacific Decadal Oscillation. Weather and Forecasting, 17, 1163–1172 (2002).

5. Méndez-González, J., Ramírez-Leyva, A., Zárate-Lupercio, A., Cavazos-Pérez, T.: Teleconexiones de la Oscilación Decadal del Pacífico (PDO) a la precipitación y temperatura en México. Investigaciones Geográficas, 73, 57–70 (2010).

6. Ovando, G., Bocco, M., Sayago, S.: Redes neuronales para modelar predicción de heladas. Agricultura Técnica, 65(1), 65–73 (2005). doi: http://dx.doi.org/10.4067/S0365-28072005000100007

7. Téllez-Valero, A., Montes, M., Villaseñor-Pineda, L.: Using Machine Learning for Extracting Information from Natural Disasters News Reports. Computación y Sistemas, 13(1), 33–44 (2009).

8. Haro-Rivera, S.M.: Árbol de decisión, aplicación con datos meteorológicos. KnE Engineering, 5(2), 37–46 (2020). doi: https://doi.org/10.18502/keg.v5i2.6217


9. Suárez, L., Alarcon, P.A.: Inteligencia artificial para la comprensión de desastres naturales. Telefónica Digital, España (2020).

10. Mercado-Polo, D., Pedraza-Caballero, L., Martínez-Gómez, E.: Comparación de Redes Neuronales aplicadas a la predicción de Series de Tiempo. Prospectiva, 13(2), 88–95 (2015). doi: http://dx.doi.org/10.15665/rp.v13i2.491

11. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An efficient k-means clustering algorithm: analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 881–892 (2002). doi: https://doi.org/10.1109/TPAMI.2002.1017616

12. NOAA (National Oceanic and Atmospheric Administration) Pacific Decadal Oscillation (PDO), https://www.ncdc.noaa.gov/teleconnections/pdo, last accessed 2021/02/22.

13. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Chapman & Hall/CRC, New York (1984). doi: https://doi.org/10.1201/9781315139470

14. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D.: Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(85), 2825–2830 (2011).

15. Willmott, C.J., Matsuura, K.: Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate Research, 30, 79–82 (2005).

16. Karamirad, M., Omid, M., Alimardani, R., Mousazadeh, H., Heidari, S.N.: ANN based simulation and experimental verification of analytical four and five parameters models of PV modules. Simulation Modelling Practice and Theory, 34, 86–98 (2013).

17. Gupta, H.V., Sorooshian, S., Yapo, P.O.: Toward improved calibration of hydrologic models: Multiple and noncommensurable measures of information. Water Resources Research, 34(4), 751–763 (1998).

18. González-Leyva, F., Ibáñez-Castillo, L.A., Valdés, J.B., Vázquez-Peña, M.A., Ruiz-García, A.: Pronóstico de caudales con Filtro de Kalman Discreto en el río Turbio. Tecnología y Ciencias del Agua, 6(4), 5–24 (2015).

19. Vázquez, M.: Predicción de series de tiempo usando un modelo híbrido basado en la descomposición wavelet. Comunicaciones estadísticas, 11(2), 257–283 (2018).

20. Restrepo, L.F., González, J.: De Pearson a Spearman. Revista Colombiana de Ciencias Pecuarias, 20, 183–192 (2007).

21. Martínez-Curbelo, G., Cortés-Cortés, M.E., Pérez-Fernández, A.C.: Metodología para el análisis de correlación y concordancia en equipos de mediciones similares. Universidad y Sociedad, 8(4), 65–70 (2016).

22. Anderson, R.B., Doherty, M.E., Friedrich, J.C.: Sample size and correlational inference. Journal of Experimental Psychology: Learning, Memory and Cognition, 34(4), 929–944 (2008). doi: https://doi.org/10.1037/0278-7393.34.4.929

Indalecio Mendoza Uribe

Research in Computing Science 150(4), 2021 ISSN 1870-4069

ResearchGate has not been able to resolve any citations for this publication.

Árbol De Decisión, Aplicación Con Datos Meteorológicos/Decision Tree, Application With Meteorological Data

Article

Full-text available

Jan 2020

Silvia Haro

La minería de datos es una técnica que hoy en día se aplica en muchas áreas de las ciencias, es por ello que con el objetivo de identificar variables meteorológicas predominantes a ocho intervalos de tiempo se aplicó la técnica supervisada árbol de clasificación en data mining. La información se obtuvo de la estación Alao, misma que se encuentra ubicada a 3064 m.s.m en la provincia de Chimborazo, Ecuador. El estudio se realizó mediante código desarrollado en el software estadístico R; los datos corresponden a información por hora del año 2016, las variables analizadas fueron; temperatura del aire, humedad relativa, presión barométrica, radiación solar difusa, radiación solar global, temperatura del suelo a −20cm y velocidad de viento. El árbol mostró que la principal variable en esta zona es la radiación solar global, a horas comprendidas de 06h00 a 08h00, si ésta es mayor o igual a 120w/m2, entonces se puede determinar la presión barométrica de 09h00 a 11h00 de la mañana; y si ésta es mayor o igual que 709w/m2, entonces se predice la temperatura del aire. El árbol de decisión es una técnica que permitió identificar variables meteorológicas relevantes, en determinadas horas donde se encuentra ubicada la estación Alao. Abstract: Data mining is a technique that today is applied in many areas of science, which is why in order to identify predominant meteorological variables at eight time intervals the supervised tree classification technique was applied in data mining. The information was obtained from the Alao station, which is located at 3064 m.s.m in the province of Chimborazo, Ecuador. The study was carried out using a code developed in statistical software R, the data correspond to information by hour of the year 2016, the variables analyzes were air temperature, relative humidity, barometric pressure, diffuse solar radiation, global solar radiation, soil temperature at −20cm and wind speed. 
The showed that the main variable in this area is the global solar radiation, at hours between 06h00 and 08h00, if it is greater than or equal to 120w/m2, then the barometric pressure can be determined from 09h00 to 11h00 of the morning, if, and it is great than or equal to 709w/m2, then the air temperature is predicted. The decision tree is a technique that allowed us to identify relevant meteorological variables in certain hours where the Alao station is located. Palabras clave: árboles de decisión, datos meteorológicos. Keywords: decision tree, meteorological data.

Comparación de Redes Neuronales aplicadas a la predicción de Series de Tiempo.

Article

Full-text available

Dec 2015

El presente estudio tiene como objetivo principal presentar la comparativa de las redes neuronales artificiales (RNA) tipo perceptrón multicapa (MLP) y de funciones de base radial (RBF) aplicadas a la predicción de series de tiempo. Se utilizó resilient backpropagation como algoritmo de aprendizaje para la red MLP y una combinación entre el algoritmo de los k-emanes y el método de la matriz pseudoinversa para la RBF. La implementación de las RNA se realizó utilizando un sistema basado en arquitectura cliente-servidor, previendo una futura integración con aplicaciones en tiempo real. Para la evaluación de las RNA se utilizaron conjuntos de datos de diferentes características y cantidad de datos.De acuerdo a los resultados obtenidos se concluye que para la utilización e integración de técnicas de inteligencia computacional en sistemas web, es preferible el uso de las RBF, debido a que obtiene mejores tiempos de ejecución. Es importante resaltar también que en calidad de respuesta los dos tipos de redes neuronales obtienen resultados similares.

Pronóstico de caudales con Filtro de Kalman Discreto en el río Turbio

Article

Full-text available

Sep 2015

This paper proposes the use of the discreet Kalman filter (DKF) along with an autoregressive model with exogenous inputs (ARX) for short-term streamflow forecasting with lead times of 24, 48, 72 and 96 hours. This model was applied to the Turbio River basin, located in the state of Guanajuato and a portion of the state of Jalisco, Mexico. This area is vulnerable to flooding during rainy periods which normally occur in the region. The forecasting was based on available precipitation and streamflow data from the years 2003 and 2004. The results indicate that the forecasts performed with one-step ahead, that is with a 24-hour lead time, present better fits than 48, 72 and 96-hour lead times in terms of Nash-Sutcliffe, MSE and RMSE.

Teleconexiones de la Oscilación Decadal del Pacífico (PDO) a la precipitación y temperatura en México

Article

Full-text available

Dec 2010

Esta investigación resalta la variabilidad de la precipitación (PP), temperatura máxima (TM) y mínima (tm) y su relación con las teleconexiones climáticas de la Oscilación Decadal del Pacífico (PDO) en México durante el periodo de 19502007. Correlaciones no paramétricas realizadas con una significancia estadística del 90% o mayor, en 550 estaciones climatológicas distribuidas en todo el territorio mexicano, fueron calculadas a escala mensual. Los resultados sugieren los siguientes patrones de PDOPP, TM y tm: periodos húmedos por arriba de lo normal durante el invierno boreal (novabr) coinciden con la fase cálida (positiva) de la PDO, así como condiciones cálidas durante verano boreal (mayoct) extendiéndose sobre el noreste de México. Los resultados confirman las teleconexiones de la PDO y la amplificación de signos climáticos en México a escalas locales y regionales.

Predicción de series de tiempo usando un modelo híbrido basado en la descomposición wavelet

Article

Dec 2018

Michael Vasquez

El pronóstico de series de tiempo que exhiben una estructura de segundo orden que vara en función del tiempo ha recibido especial atención debido a la dificultad de obtener buenos pronósticos, especialmente cuando existe una estructura poco homogénea al final de los datos. En este trabajo, se usa una metodología adecuada para pronosticar series de tiempo, con un alto nivel de ruido que evidencien no estacionariedad. Especialmente, se combina la transformación wavelet discreta de máximo traslape (MODWT) con el modelo ARFIMA-HYGARCH y redes neuronales. Ambos modelos se aplican para pronosticar la tasa de cambio USD/COP. Los resultados sugieren que la metodología basada en wavelets y redes neuronales, proveen pronósticos más precisos para pronosticar una apreciación/depreciación del tipo de cambio.

Classification And Regression Trees

Book

Oct 2017

The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.

Revista Colombiana de Ciencias Pecuarias

Article

Jan 1997

María S. González Domínguez

Using Machine Learning for Extracting Information from Natural Disaster News Reports Usando Aprendizaje Automático para Extraer Información de Noticias de Desastres Naturales

Article

Disasters caused by natural phenomena have been present throughout human history; nevertheless, their consequences grow greater each time. This tendency will not be reversed in the coming years; on the contrary, natural phenomena are expected to increase in number and intensity due to global warming. Given this situation, it is of great interest to have sufficient data on natural disasters, since these data are necessary to analyze their impact and to establish links between their occurrence and their effects. In response to this need, this paper describes a system based on Machine Learning methods that improves the acquisition of natural disaster data. The system automatically populates a natural disaster database by extracting information from online news reports. In particular, it extracts information about five types of natural disasters: hurricanes, earthquakes, forest fires, inundations, and droughts. Experimental results on a collection of Spanish news show the effectiveness of the proposed system for detecting relevant documents about natural disasters (reaching an F-measure of 98%), as well as for extracting relevant facts to be inserted into a given database (reaching an F-measure of 76%).

ANN based simulation and experimental verification of analytical four-and five-parameters models of PV modules

Article

May 2013
SIMUL MODEL PRACT TH

In this article, an artificial neural network (ANN) is adopted to predict photovoltaic (PV) panel behavior under realistic weather conditions. ANN results are compared with analytical four- and five-parameter models of a PV module. The inputs of the models are the daily total irradiation, air temperature and module voltage, while the outputs are the current and power generated by the panel. Analytical models of PV modules, based on the manufacturer datasheet values, are simulated in the Matlab/Simulink environment. A multilayer perceptron is used to predict the operating current and power of the PV module. The best network configuration for predicting panel current had a 3–7–4–1 topology, so this two-hidden-layer topology was selected as the best model for predicting panel current under similar conditions. Results obtained from the PV module simulation and the optimal ANN model have been validated experimentally. Results showed that the ANN model provides a better prediction of the current and power of the PV module than the analytical models. The coefficient of determination (R²), mean square error (MSE) and mean absolute percentage error (MAPE) values for the optimal ANN model were 0.971, 0.002 and 0.107, respectively. A comparative study of the ANN and analytical models was also carried out. Among the analytical models, the five-parameter model, with MAPE = 0.112, MSE = 0.0026 and R² = 0.919, gave better predictions than the four-parameter model (with MAPE = 0.152, MSE = 0.0052 and R² = 0.905). Overall, the 3–7–4–1 ANN model outperformed the four-parameter model and was marginally better than the five-parameter model.

Advantages of the Mean Absolute Error (MAE) over the Root Mean Square Error (RMSE) in Assessing Average Model Performance

Article

Dec 2005

The relative abilities of two dimensioned statistics, the root-mean-square error (RMSE) and the mean absolute error (MAE), to describe average model-performance error are examined. The RMSE is of special interest because it is widely reported in the climatic and environmental literature; nevertheless, it is an inappropriate and misinterpreted measure of average error. RMSE is inappropriate because it is a function of three characteristics of a set of errors, rather than of one (the average error). RMSE varies with the variability within the distribution of error magnitudes and with the square root of the number of errors (n^(1/2)), as well as with the average-error magnitude (MAE). Our findings indicate that MAE is a more natural measure of average error and (unlike RMSE) is unambiguous. Dimensioned evaluations and inter-comparisons of average model-performance error, therefore, should be based on MAE.
