
Spatio-Temporal Analysis of Water Quality Parameters in Machángara River with Nonuniform Interpolation Methods

November 2016
Water 8(11):507

DOI:10.3390/w8110507

License
CC BY 4.0

Authors:

Patricio Vizcaíno

Universidad de las Fuerzas Armadas-ESPE

Enrique V. Carrera

Universidad de las Fuerzas Armadas-ESPE

Marga Sanromán

King Juan Carlos University




water

Article

Spatio-Temporal Analysis of Water Quality Parameters in Machángara River with Nonuniform Interpolation Methods

Iván P. Vizcaíno 1,*, Enrique V. Carrera 1, Margarita Sanromán-Junquera 2, Sergio Muñoz-Romero 2, José Luis Rojo-Álvarez 2 and Luis H. Cumbal 3

1 Departamento de Eléctrica y Electrónica, Universidad de las Fuerzas Armadas ESPE, Av. General Rumiñahui s/n, P.O. Box 171-5-231B, Sangolquí, Ecuador; evcarrera@espe.edu.ec

2 Department of Signal Theory and Communications and Telematic Systems and Computation, Universidad Rey Juan Carlos, Camino del Molino S/N, 28942 Fuenlabrada, Madrid, Spain; marga.sanroman@urjc.es (M.S.-J.); sergio.munoz@urjc.es (S.M.-R.); joseluis.rojo@urjc.es (J.L.R.-Á.)

3 Centro de Nanociencia y Nanotecnología, Universidad de las Fuerzas Armadas ESPE, Av. General Rumiñahui s/n, P.O. Box 171-5-231B, Sangolquí, Ecuador; lhcumbal@espe.edu.ec

* Correspondence: ipvizcaino@espe.edu.ec; Tel.: +593-2398-9400 (ext. 1873)

Academic Editor: Richard Skeffington

Received: 25 July 2016; Accepted: 28 October 2016; Published: 4 November 2016

Abstract: Water quality measurements in rivers are usually performed at intervals of days or months in monitoring campaigns, but little attention has been paid to the spatial and temporal dynamics of those measurements. In this work, we propose scrutinizing the scope and limitations of state-of-the-art interpolation methods aiming to estimate the spatio-temporal dynamics (in terms of trends and structures) of relevant water quality variables usually measured in rivers. We used a database with several water quality measurements from the Machángara River between 2002 and 2007 provided by the Metropolitan Water Company of Quito, Ecuador. This database included flow rate, temperature, dissolved oxygen, and chemical oxygen demand, among other variables. For visualization purposes, the absence of measurements at intermediate points in an irregular spatio-temporal sampling grid was addressed by using deterministic and stochastic interpolation methods, namely, Delaunay and k-Nearest Neighbors (kNN). For data-driven model diagnosis, a study on model residuals was performed comparing the quality of both kinds of approaches. For most variables, a value of k = 15 yielded a reasonable fit when Mahalanobis distance was used, and water quality variables were better estimated when using the kNN method. The use of kNN provided the best estimation capabilities in the presence of atypical samples in the spatio-temporal dynamics in terms of leave-one-out absolute error, and it was better for variables with slow-changing dynamics, though its performance degraded for variables with fast-changing dynamics. The proposed spatio-temporal analysis of water quality measurements provides relevant and useful information, hence complementing and extending the classical statistical analysis in this field, and our results encourage the search for new methods overcoming the limitations of the analyzed traditional interpolators.

Keywords: water quality; interpolation; smoothing; Delaunay; kNN

1. Introduction

Pollution refers to the introduction into the environment of substances of anthropogenic or natural origin that are harmful or toxic to humans and ecosystems. Pollution usually alters the chemical, physical, biological, or radiological integrity of soil, water, and living species, resulting in alterations of the food chain, with effects on human health [ ]. In particular, water pollution is mainly due to the increase in urban and industrial density. The waste of a growing population poses a threat to public health and jeopardizes the continuous use of water reserves [ ]. For example, contamination of watercourses is a consequence of wastewater discharge from municipal, industrial, or farming runoffs [ ]. Typically, urban wastewater is a complex mixture containing water (usually over 99%) mixed with organic and inorganic compounds, both in suspension and dissolved, at very small concentrations (mg/L) [ ]. Globally, two million tons of wastewater are discharged into the world's waterways [ ]. Wastewater Treatment Plants (WWTPs) are used to combat the water pollution of rivers in communities (municipalities) by reducing suspended solids and the organic load to accelerate the natural process of water purification [3,7].

On the other hand, several properties and factors are usually considered in water quality analysis and in the monitoring of polluted water sources in order to assess the impact of water pollution on flora, fauna, and humans. Water appearance, color or turbidity, temperature, taste, and smell often describe the physical properties of drinking water, whereas the chemical characterization of water includes the analysis of organic and inorganic substance concentrations. Microbiological features are related to pathogenic agents (bacteria, viruses, and protozoa), which are relevant to public health and usually modify the water chemistry. In addition, radiological factors can also be considered in areas where water comes into contact with radioactive substances [ ]. Other specifications such as water hardness, pH, acidity, oils, and fats can also be taken into account in water quality analysis.

Water quality monitoring focuses on the programmed sampling, measurement, and recording of regulated water quality parameters. Water quality management in rivers can be more efficient when: (1) river monitoring is continuous, so that seasonal behavior can be characterized; (2) the sampling period is based on the spatio-temporal dynamics (trends or patterns) of the measured variables; (3) the choice of sampling sites takes into account the basin irregularities; and (4) other factors in the study area are taken into consideration, such as population and industrial growth. Measurements are not usually taken uniformly at fixed locations and times during monitoring campaigns, and pollutant concentrations in river waters do not follow linear variations. Therefore, mathematical models based on basic physics (which govern the pollutant transport process) and linear models can be complemented with data-driven models for modeling river contaminant dynamics, in the sense of trends and spatio-temporal structures [9].

For these reasons, several scientific works have scrutinized the spatio-temporal behavior of water quality from a statistical point of view, in order to understand it and to help design water decontamination strategies more efficiently. Siyue et al. analyzed up to 41 sites on the Han River (China) during 2005 and 2006 in order to explore the spatio-temporal variations in the basin [ ]. Cluster methods and analysis of variance (ANOVA) grouped the 41 sampling sites into five statistically significant clusters. Results showed that dissolved inorganic nitrogen and nitrates had large spatial variability, while nitrogen had a relatively higher concentration in wet seasons than in dry seasons, and phosphorus showed the opposite trend. On the other hand, Serre et al. used Bayesian maximum entropy to analyze the spatio-temporal variability of water quality parameters in the case of phosphate estimation along the Raritan River basin (New Jersey, USA) between 1990 and 2002 [ ]. The database consisted of 1305 phosphate measurements at 55 monitoring stations. Their results showed that spatio-temporal analysis improves on purely spatial analysis when water samples are noisy and scarce. In addition, Duan et al. proposed a multivariate statistical analysis, including cluster analysis, discriminant analysis, and principal component analysis/factor analysis, to distinguish the spatio-temporal variation of water quality and contaminants [ ]. Fourteen parameters were studied at 28 sites of the Eastern Poyang Lake Basin, Jiangxi Province, China, from January 2012 to April 2015. This work also pointed out spatio-temporal analysis as a tool to help optimize water quality monitoring programs. The impact of wastewater was also scrutinized in a detailed anthropogenic study of the Henares River (Spain) [ ]. The Henares River runs through residential, industrial, and farming areas. Thus, strategic points were chosen along the river, with five stations upstream of a WWTP and five stations downstream. Six monitoring campaigns were carried out between April and June 2010, assembling 36 water samples altogether. Descriptive statistics, such as the frequency or mean of pollutant concentrations, and one-dimensional graphical representations were used to analyze their spatial and temporal evolution, showing the influence of the wastewater discharge and of the proximity of farm areas. For example, high concentrations of polycyclic aromatic hydrocarbons, which are usually adsorbed on river sediments, persisted along the Henares River regardless of season. Note that all these results point out the relevance of the observable dynamics of these pollutants with respect to time and space. This work highlighted the importance of spatio-temporal analysis for visualizing the trends of some compounds in rivers, which could reveal a possible relationship between river water contamination and wastewater effluent discharges.

However, to the best of our knowledge, the variability of measurements jointly expressed in space and time has not been explored for analyzing the spatio-temporal distributions of water quality variables. In the present work, we propose scrutinizing the scope and limitations of state-of-the-art interpolation methods aiming to estimate the spatio-temporal dynamics (in terms of trends and structures) of relevant water quality variables usually measured in rivers. For visualization purposes, the absence of measurements at intermediate points in an irregular spatio-temporal sampling grid is addressed by using deterministic and stochastic interpolation methods, namely, Delaunay and k-Nearest Neighbors (kNN). For data-driven model diagnosis, a study on model residuals is performed, allowing the model quality of both kinds of approaches to be compared.

These methods are applied here to pollution measurements from the Machángara River and its tributaries in Quito, Ecuador. Although several studies of Machángara River pollution have been conducted since 1977 [ ], they performed statistical analyses of specific variables such as phosphates, pesticides, nitrates, and hydrocarbons, and a more detailed and complete view of the wastewater dynamics remains to be addressed.

The rest of this paper is organized as follows. In Section 2, the materials and methods are explained, including the mathematical description of the interpolation algorithms and details of the database used for this analysis. In Section 3, results are presented for a number of measured variables, the algorithmic performance is benchmarked, a comparative analysis is made of the data-driven residuals of the models obtained with both methods, and the analysis of several environmental variables and their spatio-temporal dynamics is described. In Section 4, the results are discussed, and in Section 5, the main conclusions are presented.

2. Materials and Methods

2.1. Study Area

Quito, the capital of Ecuador, is located at approximately 2815 m above sea level, at UTM WGS84 coordinates with northing 9973588.50 m (approximately 0° S) and easting 776529.41 m (approximately 78° W), as depicted in Figure 1, and it had an average temperature of 14 °C between 2002 and 2007. The Machángara River was chosen for this study because it is the main wastewater collector of Quito. This river runs through the city from south to north, collecting wastewater along approximately 22 km, and it receives about 75% of the city's waste. Along the river pathway, 25 water quality monitoring stations are installed [ ]. For our work, six stations were chosen in the upstream section of the river, within a reach of about 10 km, in order to monitor large amounts of wastewater. The identification of the monitoring stations is shown in Table 1, and the water quality parameters analyzed are described in Table 2.

Sixty-four monitoring campaigns were carried out to measure 15 water quality parameters between 2002 and 2007. Note that one value of each parameter is usually collected in each campaign. However, some water quality parameters were sometimes not collected, and more than one value was registered in several campaigns. The number of water quality measurements available for each variable is shown in Table 3. In the same time period, rainfall measurements were taken at one weather station near the study area, and these measurements were assembled to compare rainfall with water quality variables at the Machángara River.


[Figure 1: map of the study area in UTM WGS84 coordinates (easting 750000–800000 m, northing 9960000–9995000 m; scale 1:200,000). Other map labels: Pichincha, Colombia, Peru, Pacific Ocean. Rivers shown: San Pedro, Guambi, Machángara, Pita, Pusuqui, Chiche, Sam Bachi, and Uravia. Station codes shown: 1.09, 2.07–2.11, 2.14, 3.02–3.04, 4.02, 4.03, 4.08, 6.02, and 6.03.]

Figure 1. Monitoring station location at the Machángara River. The station name and numeric codes were provided by the Metropolitan Water Company of Quito, Ecuador.

The preprocessing of the water quality database required the design of the following modules in Matlab™ (R2014b, The MathWorks Inc., Natick, MA, USA): (1) station selection, which allowed the graphical selection of water quality monitoring stations, together with their measurements, from a map of Quito; and (2) model estimation with smoothing interpolation methods and their representation, which allowed us to work with the database of the selected monitoring stations in specific sections along the Machángara River. The latter module also helped to calculate the Mean Absolute Error (MAE) for the two studied interpolation algorithms, namely, Delaunay and kNN.

Table 1. Monitoring stations of the Machángara River. ST1 is the first station, and d is the distance from each station to the first one. Each monitoring station name is followed by the original code provided by the Metropolitan Water Company of Quito, Ecuador.

Station Name                    Code   d (km)
R. Mch. El Recreo (2.07)        ST1    0.00
R. Mch. Villaflora (2.08)       ST2    1.75
R. Mch. El Sena (2.09)          ST3    2.75
R. Mch. El Trébol (2.10)        ST4    4.91
R. Mch. Las Orquídeas (2.11)    ST5    6.31
Q. El Batán (1.09)              ST6    9.49


Table 2. Studied water quality parameters for the case study of the Machángara River.

Variable                      Acronym    Units
Flow rate                     Q          m³/s
Temperature                   T          °C
Dissolved Oxygen              DO         mg/L
Biochemical Oxygen Demand     BOD        mg/L
Chemical Oxygen Demand        COD        mg/L
BOD/COD ratio                 BOD/COD    –
Total Dissolved Solids        TDS        mg/L
Total Suspended Solids        TSS        mg/L
Ammonia                       NH₃        mg/L
Total Nitrogen                TNK        mg/L
Nitrate                       NO₃        mg/L
Phosphates                    PO₄        mg/L
Detergents                    DET        mg/L
Oils and Fats                 O&F        mg/L
Total Escherichia coli        ColiT      mg/L

Table 3. Interpolation errors for each variable with nonuniform interpolation methods.

Variable  No. Meas.  MAE       MAE_r     MAE       MAE_r     MAE       MAE_r
                     Dela_lin  Dela_lin  Dela_nea  Dela_nea  kNN       kNN
Q         306        0.60      0.23      0.71      0.27      0.58      0.22
T         393        1.88      0.11      2.00      0.12      1.88      0.11
DO        329        1.16      0.47      1.28      0.52      1.03      0.42
BOD       396        47.24     0.31      52.14     0.34      49.14     0.32
COD       396        106.99    0.30      122.36    0.34      114.96    0.32
BOD/COD   396        0.66      0.25      0.65      0.25      0.79      0.30
TSS       136        122.54    0.47      142.42    0.55      123.94    0.47
TDS       392        51.18     0.17      54.34     0.18      50.93     0.16
NH₃       377        3.09      0.15      3.08      0.15      4.66      0.22
TNK       82         2.90      0.08      2.70      0.08      4.25      0.12
NO₃       286        0.59      0.33      0.63      0.35      0.65      0.36
PO₄       382        0.90      0.32      0.91      0.32      1.08      0.38
DET       381        0.27      0.26      0.27      0.27      0.22      0.22
O&F       270        10.71     0.72      10.64     0.72      10.37     0.70
ColiT     345        1.53      0.46      1.54      0.47      1.54      0.46
Average   324        –         0.31      –         0.33      –         0.32

Note: Mean Absolute Error (MAE) and relative error of MAE (MAE_r) for the Delaunay linear (Dela_lin), Delaunay nearest (Dela_nea), and k-Nearest Neighbors (kNN, with k = 15) methods.

2.2. Interpolation Algorithms

Our aim in the present work is to show that statistical interpolation can yield relevant information on the underlying dynamics of the analyzed variables, which can complement the current knowledge and analysis of the measurements themselves. The proposed interpolation techniques not only improve data visualization, but they also allow the identification of trends and structures that are consistently supported by measurements that are close in time or space. The result of such an interpolation process can provide an enhanced view of the information to assist water analysts. Note that we are actually working here with two conventional and well-known approaches, namely, deterministic interpolation (given by Delaunay) and statistical interpolation (given by kNN). The first one provides no more than a grid visualization of spatio-temporal data, whereas the second one is well known in the machine learning literature for being able to reveal the dynamics or trends in the underlying evolution of the measured variables. As a result, these trends are more easily and consistently observed with statistical interpolation approaches, especially when noise and perturbations are clearly present in the measurements. Note that, in this case, the interpolation process can be seen as a smoothing estimation process, which identifies the consistent trends and separates them from the system perturbations, as estimated by the model residuals.

In studies of multi-dimensional variables, it can be useful to search for dependencies among them; therefore, the mathematical models constructed should be able to describe those existing relationships. Regression models can explain the dependency relationship between a response variable and one or more independent (explanatory) variables, in such a way that these models can estimate new response values from a new, unobserved set of measurements of the explanatory variables.

The use of nonparametric regression is sometimes suitable when a response is difficult to obtain in terms of physical models or when the measuring methods are expensive. The main objective of interpolation is to estimate unknown values of the response variable at new values of the independent variables, from a given set of simultaneously measured samples of the independent variables and the response variable.

An intermediate goal in this work is to fill the regular grid of the quantitative representation of water quality measurements at those times when no monitoring campaigns were conducted and at those river locations where there are no monitoring stations. The interpolation methods used in this work were Delaunay triangulation and kNN.
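To make the grid-filling step concrete, here is a minimal Python sketch (ours, not the authors' Matlab modules) that builds a regular (x, t) grid and evaluates any interpolator at its nodes. The helper names, the resolution (20 distance nodes, 73 monthly time nodes), and the units (x in km from ST1, t in years from the first campaign) are hypothetical; only the 9.49 km span comes from Table 1.

```python
from typing import Callable, List, Tuple

Node = Tuple[float, float]  # (x, t): distance along the river (km), time (years); assumed units


def regular_grid(x_max: float, t_max: float, nx: int, nt: int) -> List[Node]:
    """Return nx * nt evenly spaced (x, t) nodes covering [0, x_max] x [0, t_max]."""
    if nx < 2 or nt < 2:
        raise ValueError("need at least two nodes along each axis")
    xs = [x_max * (i / (nx - 1)) for i in range(nx)]
    ts = [t_max * (j / (nt - 1)) for j in range(nt)]
    # Time-major order: all distances for the first time, then the next time, and so on.
    return [(x, t) for t in ts for x in xs]


def fill_grid(grid: List[Node], estimate: Callable[[float, float], float]) -> List[float]:
    """Evaluate any interpolator (Delaunay, kNN, ...) at every grid node."""
    return [estimate(x, t) for x, t in grid]


# Hypothetical resolution: 20 distance nodes over the 9.49 km reach, monthly nodes over 6 years.
grid = regular_grid(x_max=9.49, t_max=6.0, nx=20, nt=73)
print(len(grid), grid[0], grid[-1])
```

The interpolator is passed in as a plain function so that the same grid can be filled by each method and the resulting maps compared side by side.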

2.2.1. Interpolation with Delaunay Triangulation

Interpolation with Delaunay triangulation has been used in digital cartography for the generation of digital terrain models [ ]. The starting point of this method is a cloud of three-dimensional (3D) points, usually irregularly distributed in space, which allows surfaces to be represented digitally. This triangulation approximates surfaces by irregular planar triangles that connect the 3D points. In this work, we do not use the 3D spatial coordinates of the points as the input space. Instead, we pursue a representation for a two-dimensional input space given by the time and the location (in terms of the distance along the river path) at which each measurement was taken.

The Delaunay interpolation method is based on Voronoi diagrams and Delaunay triangulation, which use the Euclidean distance as the interpolation criterion [ ]. Given two points in the spatio-temporal plane $(x, t)$, denoted as $\mathbf{p}_1 = (x_1, t_1)$ and $\mathbf{p}_2 = (x_2, t_2)$, the Euclidean distance between them is

$$\mathrm{dist}(\mathbf{p}_1,\mathbf{p}_2) := \sqrt{(x_1-x_2)^2 + (t_1-t_2)^2}. \tag{1}$$

Let $P = \{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n\}$ be a set of $n$ distinct points (or sites) in the spatio-temporal plane. The Voronoi diagram of $P$ is the subdivision of the plane into $n$ cells (Figure 2a), one for each site $\mathbf{p}_i$, such that a point $\mathbf{p}_e$ lies in the cell corresponding to a site $\mathbf{p}_i$ if and only if $\mathrm{dist}(\mathbf{p}_e,\mathbf{p}_i) < \mathrm{dist}(\mathbf{p}_e,\mathbf{p}_j)$ for each $\mathbf{p}_j \in P$ with $j \neq i$. The Voronoi diagram of $P$ is denoted by $\mathrm{Vor}(P)$, and it indicates only the edges and vertices of the subdivision [ ]. The graph $G$ has a node for every Voronoi cell (equivalently, for every site), and the union of the external edges of $G$ forms a polygon, $\mathrm{Pol}$.
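The Voronoi condition above is simply a nearest-site rule under Eq. (1). The minimal Python sketch below, assuming points are given as (x, t) tuples, returns the index of the cell that contains a query point and reports points on a shared cell boundary (ties) as None. Because Eq. (1) adds squared distances and squared times directly, the units chosen for x and t change which site is nearest, a dependence that the Mahalanobis distance of Section 2.2.2 avoids.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, t): distance along the river and time


def dist(p1: Point, p2: Point) -> float:
    """Euclidean distance in the spatio-temporal plane, Eq. (1)."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])


def voronoi_cell_of(pe: Point, sites: Sequence[Point]) -> Optional[int]:
    """Index i of the site whose Voronoi cell contains pe, i.e., the unique i with
    dist(pe, p_i) < dist(pe, p_j) for all j != i. Returns None on a tie, which
    means pe lies on the boundary shared by two or more cells."""
    d = [dist(pe, p) for p in sites]
    best = min(d)
    winners = [i for i, di in enumerate(d) if di == best]
    return winners[0] if len(winners) == 1 else None


sites = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
print(voronoi_cell_of((0.4, 0.3), sites))  # 0
print(voronoi_cell_of((1.0, 0.0), sites))  # None: equidistant from sites 0 and 1
```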

Figure 2b shows the measurements (on the $m$ axis) of a variable, which form a polyhedron of irregular triangles whose vertices are the measurements. Note that bold uppercase letters are used to represent points (vertices of the irregular triangles that form the polyhedron) defined in the coordinates $(x, t, m)$, while the projections of these vertices onto the $(x, t)$ plane are represented by bold lowercase letters. For example, three samples represented by bold uppercase points form a triangle of the polyhedron, and when this triangle is projected onto the $(x, t)$ plane, a new triangle with the corresponding bold lowercase vertices is formed. The estimated value of $m$ at a new point $\mathbf{E} = (x_E, t_E)$ is obtained in two steps: (1) the Delaunay triangle that encloses $\mathbf{E}$ is found; and (2) the estimate is computed by evaluating, at $x_E$ and $t_E$, the equation of the plane defined by the three vertices of that triangle, in the linear interpolation, or as the $m$ value of the nearest neighbor vertex, in the nearest interpolation.
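Step (2) can be sketched in Python as follows, assuming step (1) has already found the enclosing triangle (a triangulation library, for instance scipy.spatial.Delaunay with its find_simplex method, could do that part). For the linear variant, barycentric weights reproduce the plane through the three vertices; the nearest variant returns the m value of the closest vertex. The function names are hypothetical, and this is an illustration rather than the Matlab routine used in the paper.

```python
import math
from typing import Tuple

Vertex = Tuple[float, float, float]  # (x, t, m): a measured sample, i.e., a polyhedron vertex
Query = Tuple[float, float]          # (x_E, t_E): the point where m is estimated


def barycentric(a: Vertex, b: Vertex, c: Vertex, e: Query) -> Tuple[float, float, float]:
    """Barycentric coordinates of e with respect to the projected triangle abc."""
    (xa, ta, _), (xb, tb, _), (xc, tc, _) = a, b, c
    det = (tb - tc) * (xa - xc) + (xc - xb) * (ta - tc)
    if det == 0:
        raise ValueError("degenerate (collinear) triangle")
    la = ((tb - tc) * (e[0] - xc) + (xc - xb) * (e[1] - tc)) / det
    lb = ((tc - ta) * (e[0] - xc) + (xa - xc) * (e[1] - tc)) / det
    return la, lb, 1.0 - la - lb


def estimate_in_triangle(a: Vertex, b: Vertex, c: Vertex, e: Query,
                         method: str = "linear", tol: float = 1e-12) -> float:
    weights = barycentric(a, b, c, e)
    if min(weights) < -tol:
        raise ValueError("e is outside this triangle; step (1) must find the enclosing one")
    if method == "linear":
        # Weighted sum of vertex values = plane through a, b, c evaluated at (x_E, t_E).
        return sum(w * v[2] for w, v in zip(weights, (a, b, c)))
    if method == "nearest":
        # m value of the vertex closest to e in the (x, t) plane.
        closest = min((a, b, c), key=lambda v: math.hypot(v[0] - e[0], v[1] - e[1]))
        return closest[2]
    raise ValueError("method must be 'linear' or 'nearest'")


# Three vertices sampled from the plane m = 1 + 2x + 4t.
a, b, c = (0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (0.0, 1.0, 5.0)
print(estimate_in_triangle(a, b, c, (0.25, 0.25)))             # 2.5
print(estimate_in_triangle(a, b, c, (0.25, 0.25), "nearest"))  # 1.0
```

Because the three vertices lie on a known plane, the linear estimate must reproduce that plane exactly, which makes the example easy to check by hand.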


[Figure 2 axes: Distance, x; Time, t. Labels: Region, Vor(P).]

Figure 2. Representation and nomenclature of the elements in our Delaunay interpolation: (a) Delaunay triangulation and Voronoi diagram; and (b) obtaining a polyhedron from a set of sample points.

2.2.2. kNN Interpolation

The kNN rule is among the simplest statistical learning tools in density estimation, classification, and regression. Trivial to train and easy to code, this nonparametric algorithm is surprisingly competitive and fairly robust to errors when using cross-validation procedures [ ]. The fitting is made by using only those measurements close to the target point $\mathbf{p}_e$, and the weight assigned to each $\mathbf{p}_i$ is a function of its distance from $\mathbf{p}_e$.

Commonly used distances include the Euclidean, Manhattan, Minkowski, weighted Euclidean, Mahalanobis, and cosine distances, among others. The Mahalanobis distance between two points $\mathbf{p}_1$ and $\mathbf{p}_2$ is defined as

$$\mathrm{dist}_M(\mathbf{p}_1,\mathbf{p}_2) = \sqrt{(\mathbf{p}_1-\mathbf{p}_2)^{\top}\,\Sigma^{-1}\,(\mathbf{p}_1-\mathbf{p}_2)}, \tag{2}$$

where $\Sigma$ is the covariance matrix. The Mahalanobis distance has advantageous properties compared with the Euclidean distance, namely, it is invariant to changes in scale, and it does not depend on measurement units. By using the matrix $\Sigma^{-1}$, we account for correlations between variables and for redundancy effects. The estimate at $\mathbf{p}_e$ is denoted by $f(\mathbf{p}_e)$, and it is computed according to the Distance Weighted Nearest Neighbor algorithm as

$$f(\mathbf{p}_e) = \frac{\sum_{i=1}^{k} w_i\, f(\mathbf{p}_i)}{\sum_{i=1}^{k} w_i}, \tag{3}$$

where $f(\mathbf{p}_i)$ represents the values of the $k$ samples nearest to $\mathbf{p}_e$, and $w_i$ is the weight function, defined in terms of the Mahalanobis distance as

$$w_i = \frac{1}{\mathrm{dist}_M(\mathbf{p}_e,\mathbf{p}_i)^2}. \tag{4}$$
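A minimal Python sketch of Eqs. (2)–(4) in the two-dimensional (x, t) input space follows. Two choices here are our assumptions, not statements from the text: Σ is estimated as the sample covariance of the (x, t) coordinates of the measurements, and a query that coincides with a sample (zero distance, which would make Eq. (4) divide by zero) returns the mean of the coinciding values.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]        # (x, t): distance along the river and time
Cov2 = Tuple[float, float, float]  # (s_xx, s_xt, s_tt) of a symmetric 2x2 matrix


def covariance_2d(points: Sequence[Point]) -> Cov2:
    """Sample covariance (n - 1 denominator) of the (x, t) coordinates."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(p[0] for p in points) / n
    mt = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    stt = sum((p[1] - mt) ** 2 for p in points) / (n - 1)
    sxt = sum((p[0] - mx) * (p[1] - mt) for p in points) / (n - 1)
    return sxx, sxt, stt


def mahalanobis(p1: Point, p2: Point, cov: Cov2) -> float:
    """Eq. (2), using the closed-form inverse of a 2x2 covariance matrix."""
    sxx, sxt, stt = cov
    det = sxx * stt - sxt * sxt
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dt = p1[0] - p2[0], p1[1] - p2[1]
    quad = (stt * dx * dx - 2.0 * sxt * dx * dt + sxx * dt * dt) / det
    return math.sqrt(max(quad, 0.0))  # guard against tiny negative rounding


def knn_estimate(pe: Point, points: Sequence[Point], values: Sequence[float],
                 k: int = 15, cov: Optional[Cov2] = None) -> float:
    """Distance-weighted kNN, Eqs. (3)-(4), with w_i = 1 / dist_M(pe, p_i)^2."""
    if cov is None:
        cov = covariance_2d(points)
    nearest = sorted((mahalanobis(pe, p, cov), v) for p, v in zip(points, values))[:k]
    # Assumption (not in the text): if pe coincides with samples, Eq. (4) would
    # divide by zero, so we return the mean of the coinciding values instead.
    exact = [v for d, v in nearest if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / d ** 2 for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 2.0, 3.0, 4.0]
print(knn_estimate((0.5, 0.5), points, values, k=4))
```

Rescaling t (for example, multiplying all times by 100) and re-estimating Σ leaves the estimate unchanged up to rounding. This is the scale invariance of the Mahalanobis distance mentioned above.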

2.3. Performance Measures

The goal of any data-driven methodology is to estimate (learn) a useful model of the unknown system from the available data. One criterion of usefulness is prediction accuracy (generalization), that is, the capability of the model to provide accurate estimates for future data. In the learning problem, the goal is to estimate a function by using a finite number of training samples. Because only a finite number of training samples is available, any estimate of the unknown function is often inaccurate. In regression learning problems, we can measure performance in terms of the generalization capabilities of the model, with the goal of minimizing the empirical risk, as described below [19].

Given $D = \{(x_i, t_i, m_i)\}_{i=1}^{n}$ as the training set, the pairs $(x_i, t_i)$ are identified as inputs and $m_i$ as outputs, where $x$ represents the distance, $t$ is time, and $m$ is any water quality measurement. The basic goal of supervised learning is to use the training set $D$ to learn a function $\hat{f}$ (in a given hypothesis space) that can be evaluated at a new pair $(x, t)$ to estimate its associated value $m$.

In order to measure the quality of the function $\hat{f}$, we use a loss function denoted by $l(\hat{f}, D)$. The estimate for a given $(x, t)$ is $\hat{f}(x, t)$, and the true value is $m$. One of the loss functions used in this paper is the absolute error loss, which can be written as

$$l(\hat{f}, D) = \left|\hat{f}(x, t) - m\right|. \tag{5}$$

Given a function $\hat{f}$, a loss function $l$, and a probability distribution over the data, the generalization error (also called actual error) of $\hat{f}$ is defined as

$$R_{\mathrm{gen}}[\hat{f}] = \mathbb{E}_{D}\, l(\hat{f}, D), \tag{6}$$

which is the expected loss on a new example randomly drawn from that distribution.

In general, this distribution is unknown, so $R_{\mathrm{gen}}[\hat{f}]$ cannot be computed. Therefore, we use the empirical error (or risk) of $\hat{f}$,

$$R_{\mathrm{emp}}[\hat{f}] = \frac{1}{n}\sum_{i=1}^{n} l(\hat{f}, D_i), \tag{7}$$

where $D_i = (x_i, t_i, m_i)$ is the $i$-th training sample. When the loss function is the absolute error loss, the empirical error is

$$R_{\mathrm{emp}}[\hat{f}] = \frac{1}{n}\sum_{i=1}^{n} \left|\hat{f}(x_i, t_i) - m_i\right|, \tag{8}$$

which is the risk function used in this work; from now on, we refer to it as the MAE [20,21]:

$$\mathrm{MAE} = R_{\mathrm{emp}}[\hat{f}]. \tag{9}$$

Therefore, the predictive performance of regression models can be estimated by using standard metrics such as the regression MAE.
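Equations (8) and (9) reduce to a one-line computation. Here is a minimal Python sketch, with made-up numbers in the usage example:

```python
from typing import Sequence


def mae(estimates: Sequence[float], measured: Sequence[float]) -> float:
    """Empirical risk with absolute-error loss, Eqs. (8)-(9): mean of |f(x_i, t_i) - m_i|."""
    if not measured or len(estimates) != len(measured):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(f - m) for f, m in zip(estimates, measured)) / len(measured)


# Three made-up estimates f(x_i, t_i) against measurements m_i.
print(mae([1.0, 2.0, 4.0], [1.5, 2.0, 3.0]))  # (0.5 + 0.0 + 1.0) / 3 = 0.5
```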

The loss can be computed on validation data, but the result is sensitive to the choice of the validation set. This is a problem when the data set is small; in these cases, cross-validation allows a more efficient use of the available data [ ]. For statistical evaluation of the results, the k-fold cross-validation method was used here (in this context, k denotes the number of folds, not the number of neighbors in kNN). The data are partitioned into k subsets, or folds, which are generally of the same size; one fold serves for testing and the remaining ones for training. On the first iteration, the first fold is used for testing and the remaining k − 1 folds for training, and the process is repeated k times until every fold has been tested. Each sample is therefore used exactly once for testing and k − 1 times for training. Leave-one-out is a special case of k-fold cross-validation in which k is set to the number of available samples, that is, only one sample is "left out" at a time for the test set. In this work, we used leave-one-out to estimate the MAE of the two interpolation algorithms studied here, namely, Delaunay (with either the linear or the nearest criterion) and kNN (with Mahalanobis distance).
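The cross-validation loop can be sketched as follows. This is a minimal Python illustration under stated assumptions: folds are contiguous blocks of indices (a real k-fold split is usually shuffled first, while for leave-one-out the order does not matter), and nearest_sample is a hypothetical stand-in interpolator, not one of the methods of the paper. Any function with the same signature, such as a Delaunay or kNN estimator, can be passed instead.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Sample = Tuple[float, float, float]  # (x, t, m)
Interpolator = Callable[[Sequence[Sample], float, float], float]


def kfold_indices(n: int, n_folds: int) -> List[List[int]]:
    """Split indices 0..n-1 into n_folds contiguous folds of nearly equal size."""
    if not 2 <= n_folds <= n:
        raise ValueError("need 2 <= n_folds <= n")
    base, extra = divmod(n, n_folds)
    folds, start = [], 0
    for f in range(n_folds):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_mae(samples: Sequence[Sample], interpolate: Interpolator,
                        n_folds: Optional[int] = None) -> float:
    """MAE over held-out samples; n_folds=None gives leave-one-out (n_folds = n)."""
    n = len(samples)
    folds = kfold_indices(n, n if n_folds is None else n_folds)
    abs_errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [s for i, s in enumerate(samples) if i not in held_out]
        for i in test_idx:  # every sample is tested exactly once
            x, t, m = samples[i]
            abs_errors.append(abs(interpolate(train, x, t) - m))
    return sum(abs_errors) / n


def nearest_sample(train: Sequence[Sample], x: float, t: float) -> float:
    """Hypothetical baseline interpolator: value of the closest training sample."""
    return min(train, key=lambda s: (s[0] - x) ** 2 + (s[1] - t) ** 2)[2]


samples = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0)]
print(cross_validated_mae(samples, nearest_sample))  # leave-one-out MAE
```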

2.4. Behavior of Interpolation Errors

Given that the interpolation error depends on the analyzed variable, the number of measurements, and the interpolation method, it is advisable to use a relative error of the MAE value. Following [ ], in this work we use

$$\mathrm{MAE}_r = \frac{\mathrm{MAE}}{u}, \tag{10}$$

where $\mathrm{MAE}_r$ is the relative error of the MAE, and $u$ is the average value of each water quality variable.
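Equation (10) can also be read backwards as a consistency check on Table 3. The following worked example is our own arithmetic, not a value reported by the authors. Dividing the flow-rate (Q) MAE by its MAE_r gives the approximate mean flow implied by the table, and the three methods agree despite the two-decimal rounding of MAE_r:

$$u_Q \approx \frac{0.60}{0.23} \approx 2.61, \qquad \frac{0.71}{0.27} \approx 2.63, \qquad \frac{0.58}{0.22} \approx 2.64 \quad\Longrightarrow\quad u_Q \approx 2.6~\mathrm{m^3/s}.$$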

On the other hand, the MAE obtained by the kNN algorithm for the different water quality variables depends on the k parameter, whose best value differs according to the nature of each variable.

3. Results

In this section, the performance of the interpolation algorithms is analyzed based on data from monitoring campaigns conducted at irregular time intervals and with non-uniform distances between stations. This is a usual situation, which can be due to logistical problems or bad weather conditions, among other factors.

3.1. Free Parameter k and Algorithm Comparisons

In order to establish a comparison between deterministic and statistical interpolation, we started by scrutinizing the value of k to be used as a free parameter in the kNN algorithm. Figure 3 shows the changes of MAE_r with respect to k. It can be observed that relatively few neighbors are almost always enough to yield a value close to the minimum MAE. As errors decrease very slowly after some point, and for computational simplicity, we decided to use k = 15 for all the kNN variable models. On the other hand, Figure 4 shows the variability of MAE_r with respect to the normalized number of measurements. It can be observed that MAE_r is reduced by increasing the number of available samples, though an extremely small number of available samples can sometimes yield an apparently reduced error, probably due to the poor representation of the dynamics in these cases.
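The paper fixes k = 15 by inspecting Figure 3. One hypothetical way to formalize "errors decrease very slowly after some point" is to take the smallest k whose MAE_r lies within a relative tolerance of the best value found, as the Python sketch below does. The 2% tolerance and the error curve are made up for illustration and are not read from Figure 3.

```python
from typing import Dict


def smallest_adequate_k(mae_r_by_k: Dict[int, float], rel_tol: float = 0.02) -> int:
    """Smallest k whose MAE_r is within rel_tol (relative) of the best MAE_r observed."""
    best = min(mae_r_by_k.values())
    return min(k for k, err in mae_r_by_k.items() if err <= best * (1.0 + rel_tol))


# Made-up curve with the qualitative shape described for Figure 3 (NOT values read
# from it): a fast initial drop followed by a slow decrease.
curve = {1: 0.45, 5: 0.36, 10: 0.33, 15: 0.322, 20: 0.318, 30: 0.317}
print(smallest_adequate_k(curve))
```

Setting rel_tol to zero simply returns the k with the lowest error. A small positive tolerance trades a slightly higher error for fewer neighbors.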

Figure 3. Behavior of MAE_r for different k values.

Figure 4. MAE_r for different normalized numbers of measurements. DL-MAE_r is for Delaunay linear, DN-MAE_r is for Delaunay nearest, and kNN-MAE_r is for the kNN method.


Table 3 presents the MAE of each variable for each interpolation algorithm (with k = 15 for kNN). MAE values differ significantly among water quality variables, so we also included the relative MAE (MAE$_r$). The average MAE$_r$ was 0.31 for Delaunay-linear, 0.33 for Delaunay-nearest, and 0.32 for kNN. Roughly speaking, this indicates that the underlying dynamics jointly explain about two thirds of the variations.
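One rough reading of this statement averages the three relative errors and takes the complement. This is an interpretation, not a formal variance decomposition:

$$\overline{\mathrm{MAE}_r} = \frac{0.31 + 0.33 + 0.32}{3} = 0.32, \qquad 1 - 0.32 = 0.68 \approx \tfrac{2}{3}.$$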

We also analyzed which interpolation method best estimates the dynamics (i.e., trends or patterns) of the observed variables. Figure 5a shows the interpolation of NH with Delaunay-linear, which resembles the result of Delaunay-nearest shown in Figure 5b. Both techniques produce a typical step-like view of the interpolated variable. Figure 5c, in contrast, shows the interpolation of NH with the kNN method. Here the data dynamics are easier to observe because of the improved smoothing, and spatial and temporal trends become clearly visible. As another example, Figure 5d shows PO$_4$ interpolated with Delaunay-linear, while Figure 5e shows noticeable smoothing with the kNN method. Again, the kNN technique reveals some spatial trends of PO$_4$ more clearly. Figure 5f shows the ColiT interpolation with the kNN method, which displays some trends and consistent peaks. Table 3 details the interpolation errors of each method for each variable.
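The two Delaunay-based interpolants can be sketched with SciPy's `griddata` on a regular grid with 400 m spacing along the river and 1-day spacing in time. These are the grid resolutions reported in the Discussion. This is a sketch under stated assumptions, not the authors' pipeline. The function name is hypothetical. `method="linear"` interpolates piecewise-linearly over a Delaunay triangulation of the samples. `method="nearest"` returns the closest sample's value, which is what a "Delaunay nearest" surface is assumed to mean here. `rescale=True` is our choice, not something stated in the text.

```python
import numpy as np
from scipy.interpolate import griddata

def interpolate_grid(dist_km, day, values, method="linear", dx_km=0.4, dt_days=1.0):
    """Interpolate irregular (distance, time) samples onto a regular grid.

    method="linear": piecewise-linear interpolation over a Delaunay triangulation.
    method="nearest": value of the nearest sample (step-like surface).
    """
    points = np.column_stack([dist_km, day])
    xs = np.arange(np.min(dist_km), np.max(dist_km) + 1e-9, dx_km)  # 400 m spacing
    ts = np.arange(np.min(day), np.max(day) + 1e-9, dt_days)        # 1-day spacing
    grid_x, grid_t = np.meshgrid(xs, ts)
    # rescale=True maps km and days to the unit square before interpolating;
    # otherwise the much larger time axis would dominate the triangulation.
    z = griddata(points, values, (grid_x, grid_t), method=method, rescale=True)
    # z has shape (len(ts), len(xs)); "linear" leaves NaN outside the convex hull.
    return xs, ts, z

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d = rng.uniform(0.0, 9.5, 80)       # hypothetical sample positions, km
    t = rng.uniform(0.0, 2000.0, 80)    # hypothetical sampling days
    v = 5.0 + 0.3 * d + rng.normal(0.0, 0.2, 80)
    xs, ts, z_lin = interpolate_grid(d, t, v, method="linear")
    _, _, z_near = interpolate_grid(d, t, v, method="nearest")
    print(z_lin.shape, np.isnan(z_lin).mean(), np.isnan(z_near).mean())
```

Both interpolants reproduce the samples exactly and only fill the gaps between them. This is consistent with the step-like appearance described for Figure 5a,b.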

Figure 6a shows the rainfall in the study area during 2002–2007, recorded by a nearby weather station. We include it in the same figure for comparison with some of the water quality variables. The variables in Figure 6 are Q, T, DO, BOD/COD, and TNK, drawn in elevation view to make their spatio-temporal dynamics easier to see. Figures 5 and 6 show eight of the 15 water quality parameters of the Machángara River. The seven variables not shown are BOD, COD, TDS, TSS, NO$_3$, DET, and O&F. With the kNN method, these variables exhibit smoothing similar to that of the eight variables shown.


Figure 5. Results of Delaunay and kNN interpolation methods: (a) NH with Delaunay linear; (b) NH with Delaunay nearest; (c) NH with kNN; (d) PO$_4$ with Delaunay linear; (e) PO$_4$ with kNN; and (f) ColiT with kNN.

[Figure 6(a) axes: time (days) versus Rain (mm/m2).]

Figure 6. Spatio-temporal variation: (a) rainfall level in Quito from 2002 to 2007 at ‘La Tola’ monitoring station; (b) Q; (c) DO; (d) T; (e) BOD/COD; (f) TNK.


3.2. Analysis of the Spatio-Temporal Model Residuals

In the previous section, averaged error alone did not show which interpolation method performed better. For a fairer benchmark, we analyzed the spatio-temporal distribution of the model residuals. The leave-one-out residual was obtained for each method at each sample, and Figure 7 displays the difference in Absolute Error (AE) between the kNN and Delaunay methods for six variables. Blue markers represent $|\Delta AE|$, with $\Delta AE = AE_{\mathrm{Delaun}} - AE_{k\mathrm{NN}}$, where Delaunay performs worse than kNN (i.e., where $AE_{\mathrm{Delaun}} - AE_{k\mathrm{NN}} > 0$). Red markers are shown otherwise (i.e., for $\Delta AE = AE_{k\mathrm{NN}} - AE_{\mathrm{Delaun}}$).
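The marker convention for Figure 7 can be sketched as follows. The sketch assumes that the leave-one-out estimates of both methods are already available per sample. The function name is hypothetical. Ties (equal errors) fall in the "otherwise" case, so they are drawn red.

```python
import numpy as np

def delta_ae(y, y_delaunay, y_knn):
    """|ΔAE| per sample and marker colour, following the convention in the text."""
    y, y_delaunay, y_knn = (np.asarray(a, dtype=float) for a in (y, y_delaunay, y_knn))
    ae_delaun = np.abs(y - y_delaunay)  # leave-one-out absolute error, Delaunay
    ae_knn = np.abs(y - y_knn)          # leave-one-out absolute error, kNN
    diff = ae_delaun - ae_knn
    # Blue: Delaunay worse than kNN (diff > 0); red: otherwise.
    colours = np.where(diff > 0, "blue", "red")
    return np.abs(diff), colours

mag, col = delta_ae([1.0, 2.0, 3.0], [1.5, 2.0, 2.0], [1.0, 2.5, 3.0])
print(mag.tolist(), col.tolist())  # [0.5, 0.5, 1.0] ['blue', 'red', 'blue']
```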


Figure 7. Spatio-temporal distributions for $|\Delta AE| = |AE_{k\mathrm{NN}} - AE_{\mathrm{Delaun}}|$: (a) O&F; (b) DET;


Several observations follow from Figure 7. In some cases, both methods produce the largest differences (due to outliers or atypical measurements), as seen in (e,f). In most cases, however, kNN handles outliers better, as seen in (a–d). In addition, kNN works better for some variables, namely (a), (b), and (d). For others its performance can degrade relative to Delaunay, as in (e), or be similar, as in (f).

Comparing these results with the observations and estimations in Figure 5, we conclude that kNN works better for outliers and for slow-dynamics variables with smooth changes. Fast-dynamics variables, by contrast, can be over-smoothed by kNN, so that its model residuals do not improve on the simpler interpolation made by Delaunay.
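The over-smoothing effect can be illustrated with a one-dimensional toy example. The example uses synthetic sine signals, not the river data. A kNN average with k = 15 follows a slow signal closely. The same average blurs a signal whose period is shorter than the neighborhood. The function name and signals are hypothetical.

```python
import numpy as np

def knn_loo_1d(t, y, k):
    """Leave-one-out kNN average along a single (time) axis."""
    d = np.abs(t[:, None] - t[None, :])
    np.fill_diagonal(d, np.inf)              # exclude each sample from its own estimate
    nn = np.argsort(d, axis=1)[:, :k]
    return y[nn].mean(axis=1)

t = np.arange(200.0)
slow = np.sin(2 * np.pi * t / 200)   # one cycle over the record
fast = np.sin(2 * np.pi * t / 10)    # twenty cycles over the record

errors = {}
for name, y in (("slow", slow), ("fast", fast)):
    errors[name] = float(np.mean(np.abs(y - knn_loo_1d(t, y, k=15))))
    print(name, round(errors[name], 3))
```

With 15 neighbors spanning roughly one and a half periods of the fast signal, its estimate is pulled toward zero, which inflates the error. The slow signal is barely affected.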

3.3. Evolution of Water Quality Measurements

Flow rate (Q). Figure 6b shows the Machángara River flow rate, which depends on several factors: the tributaries formed by streams coming from the Pichincha volcano (Quito lies between the slopes of a volcano and the Machángara River), runoff from rainfall in the upper basin of Quito, and domestic and industrial wastewater discharged in the central and southern parts of the city. Quito also lacks separate pipes for domestic, industrial, and runoff water, so the wastewater is a mixture of all three. Figure 8 shows, for each year from 2002 to 2007, the average of the maximum flow rates of 19 major tributaries upstream of the Machángara River. It has two peaks ( m3/s ), at about 500 days (2003) and 1250 days (2005), and a decrease in flow rate at 800 days (2004), likely due to the scarcity of rain. Although Figure 8 shows annually averaged flow rates, together with the rainfall plot (Figure 6a) it could help explain the evolution of water discharges into the river shown in Figure 6b. That figure displays two flow rate peaks, at about 500 and 1500 days, which could be due to rainfall and to the flow of the Machángara River’s tributaries. Changes in the total flow could affect the behavior of other water quality characteristics.
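One reading of the Figure 8 statistic takes each tributary's maximum flow within a year and then averages these maxima across tributaries. The sketch below assumes that reading. The record layout `(year, tributary_id, flow_m3s)` and the function name are hypothetical.

```python
from collections import defaultdict

def mean_of_annual_maxima(records):
    """records: iterable of (year, tributary_id, flow_m3s).

    Returns {year: mean over tributaries of each tributary's maximum flow that year}.
    """
    annual_max = defaultdict(dict)  # year -> {tributary: maximum flow so far}
    for year, trib, q in records:
        prev = annual_max[year].get(trib)
        annual_max[year][trib] = q if prev is None else max(prev, q)
    return {year: sum(m.values()) / len(m) for year, m in sorted(annual_max.items())}

example = [(2003, "A", 2.0), (2003, "A", 5.0), (2003, "B", 3.0),
           (2004, "A", 1.0), (2004, "B", 2.0)]
print(mean_of_annual_maxima(example))  # {2003: 4.0, 2004: 1.5}
```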

Figure 8. Average of the maximum flow rates for each year (from 2002 to 2007), with time shown in days.

Dissolved Oxygen (DO). Figure 6c shows that DO increases in space after about 2 km and in time, especially after January 2006 (1460 days). Figure 6d shows a temperature increase in the river’s water at the last station (9.49 km). Because oxygen solubility decreases as temperature increases, dissolved oxygen should be lower there (see Figure 6c). DO also increases during the last 600 days between ST3 and ST5. This section of the Machángara River contains many stones and debris. The water may strike these materials intensely and produce a large number of small bubbles. For a given volume of air, smaller bubbles provide a larger liquid–gas interface area, so oxygen dissolves at a higher rate and DO should be higher. Temperatures at those stations also decreased slightly in the last 600 days, which could further contribute to the increase of DO.
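The text refers to time both as a day index and as calendar dates. A small helper makes the conversion explicit, assuming day 0 is 1 January 2002, the start of the database. This origin is an assumption, not stated in the text. With it, 1460 days falls on 31 December 2005 and 1750 days on 17 October 2006. Both dates agree with the "January 2006" and "October 2006" quoted here to within a day.

```python
from datetime import date, timedelta

# Assumption: day 0 corresponds to 1 January 2002.
ORIGIN = date(2002, 1, 1)

def day_to_date(day_index):
    """Convert a day index on the figures' time axis to a calendar date."""
    return ORIGIN + timedelta(days=day_index)

for d in (500, 800, 1250, 1460, 1750):
    print(d, day_to_date(d).isoformat())
```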

Temperature (T). Water temperature, shown in Figure 6d, is another relevant parameter for river water quality. It depends mainly on the temperature of domestic and industrial discharges, on rainfall, and on ambient temperature. The spatio-temporal distribution shows two main effects: an increase in space after 8 km, and an increase after 6 km that appears only after 1750 days (October 2006). This temperature change in the last 300 days could have two causes. First, ambient temperature has generally increased in recent years due to global warming, and the water of the Machángara River, a shallow river, has warmed accordingly. Second, the population close to the river also grew in that period. Quito’s population was 1,842,202 inhabitants in 2001 and 2,239,191 in 2010, a growth rate of 2.41% per year [ ]. As a result, more hot water from personal care, washing of kitchen utensils, and cleaning activities in hospitals and industries is discharged into the river.
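An annual growth rate computed from two census counts depends on the convention used. As a hedged arithmetic check on the counts quoted above, assume a nine-year span between the 2001 and 2010 censuses. The simple (linear) average then gives about 2.39% per year and the compound (geometric) rate about 2.19% per year. The 2.41% reported here may follow another convention or the exact census dates used in the cited source. The function name is illustrative.

```python
def annual_growth_rates(p_start, p_end, years):
    """Return (simple, compound) annual growth rates between two census counts."""
    total = p_end / p_start - 1.0
    simple = total / years                                # linear average per year
    compound = (p_end / p_start) ** (1.0 / years) - 1.0   # geometric (compound) rate
    return simple, compound

simple, compound = annual_growth_rates(1_842_202, 2_239_191, years=9)  # 2001 -> 2010
print(f"simple: {simple:.2%}, compound: {compound:.2%}")
```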

Biodegradability index (BOD/COD). The biodegradability of organic matter can be estimated by the ratio between BOD and COD [ ]. According to [ ], organic matter biodegradability is classified as follows:
• If BOD/COD ≥ 0.4, organic matter is very degradable.
• If BOD/COD ∈ (0.2, 0.4), organic matter is moderately degradable.
• If BOD/COD ≤ 0.2, organic matter is poorly degradable.
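The three thresholds above translate directly into a small classifier. The sketch assumes BOD and COD are given in the same units (e.g., mg/L) and that COD is positive. The function name is illustrative.

```python
def biodegradability_class(bod, cod):
    """Classify organic matter biodegradability from the BOD/COD ratio."""
    ratio = bod / cod
    if ratio >= 0.4:
        return "very degradable"
    if ratio > 0.2:               # 0.2 < ratio < 0.4
        return "moderately degradable"
    return "poorly degradable"    # ratio <= 0.2

print(biodegradability_class(120.0, 400.0))  # ratio 0.3 -> moderately degradable
```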

Figure 6e shows that the BOD/COD ratio is relatively stable with distance. Between 800 and 1200 days, several industries in the study area discharged large amounts of non-biodegradable liquid compounds. To assess the pollution impact of their direct discharges into the Machángara River from 2002 to 2007, we considered 54 representative industries upstream of the river. Two important industrial zones contained 15 of them (27.78%), mainly textiles (dyes) and food and beverage (dyes). The municipal authorities of Quito assessed which industries met wastewater treatment regulations before discharging into the river. The share of industries meeting water quality standards was 75% in 2005, 63% in 2006, and 69% in 2007. The industries that did not meet environmental regulations most likely contributed a high load of non-biodegradable compounds to the river. Unfortunately, no information is available for the other years.

Total Nitrogen Kjeldahl (TNK). This variable is the sum of organic nitrogen, ammonia (NH$_3$), and ammonium (NH$_4^+$). The maximum allowed value in Ecuador is 40 mg/L according to [ ]. Figure 6f shows the TNK variation, which in this case is restricted to about the last 400 days of measurements. For most of the monitored periods available, TNK stays near the limit, sometimes below and sometimes above it.
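To quantify how often TNK exceeds the 40 mg/L limit, one can count the share of interpolated grid cells above it. The sketch assumes the interpolated values are in a NumPy array in mg/L, with NaN marking cells without an estimate. The function name is hypothetical.

```python
import numpy as np

TNK_LIMIT_MG_L = 40.0  # maximum allowed TNK in Ecuador, as stated in the text

def exceedance_fraction(tnk_values, limit=TNK_LIMIT_MG_L):
    """Fraction of valid (non-NaN) values strictly above the limit."""
    v = np.asarray(tnk_values, dtype=float)
    v = v[~np.isnan(v)]  # ignore cells with no estimate
    return float(np.mean(v > limit))

print(exceedance_fraction([30.0, 45.0, np.nan, 50.0, 35.0]))  # 0.5
```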

4. Discussion

Since topography is stable over time, it can be handled with deterministic interpolation (such as the Delaunay algorithm). Water dynamics, however, cannot be determined accurately by deterministic interpolation alone, except for simple visualization. Our work shows that statistical interpolation can estimate the water dynamics with moderate model orders. It can also separate the dynamics, given by the spatio-temporal trends in the model, from perturbations of a very different nature, given by the model residuals (system perturbations, measurement errors, outliers and atypical values, and other uncertainty sources). System knowledge remains the main basis for improving water quality. Our motivation, however, has been that current system knowledge is partly guided by measurements. Spatio-temporal statistical interpolation can enhance the information extracted from these measurements and thus help improve knowledge of pollution sources.

Many previous works (e.g., [ ]) used databases covering no more than two years. In contrast, we used a five-year monitoring database, which provided a large number of water quality records, similar to the work described in [ ]. Our database consisted of 64 monitoring campaigns and 4867 water quality records. This allowed us to build interpolation grids with a spatial resolution of 400 m and a temporal resolution of one day. With the kNN algorithm, k was simple to adjust, because MAE$_r$ became stable and close to its minimum. This simplicity allowed us to construct a spatio-temporal grid from the measured water quality parameters and the data processed by nonuniform interpolation methods.

Analyzing the model residuals to compare kNN and Delaunay interpolation, we found that kNN estimates the variable dynamics acceptably in the presence of atypical samples and for slow-dynamics variables. It can, however, over-smooth fast-changing variables. Conventional interpolation algorithms can therefore provide acceptable estimates, but new algorithms should be designed to overcome their current limitations.

The MAE obtained for phosphates in [ ] was 0.466 using Bayesian methods, while in this work it is 1.08 using kNN. This difference could be due to the different water quality datasets, so the two values cannot be compared directly. We nevertheless consider that work comparable to ours in terms of estimation techniques. While [ ] presents only the nitrate dynamics of the Turia River (Valencia, Spain), this paper shows nitrogen and other variables with good spatio-temporal resolution.

5. Conclusions

The proposed spatio-temporal analysis applies interpolation algorithms to water quality measurements from monitoring campaigns. It can provide useful and relevant information on their dynamics, in terms of trends and structure. This can complement and help extend current knowledge based on experience and on physical models. New interpolation methods are encouraged to overcome the limitations of conventional methods in this scenario. As a secondary goal, visualizing these trends allows the data models to be inspected visually. Visualizing the residuals, in turn, can indicate the quality of the estimation model in use and its uncertainty.

Water quality values obtained with smoothing interpolation algorithms, especially for hard-to-reach places and irregular time periods, can also provide relevant information for designers of wastewater treatment plants. For example, the approach could be applied to other sections of the Machángara River or to studies of the interdependence between water quality variables (e.g., nitrates and phosphates).

The database used in this work covers 2002 to 2007, when few hydrological monitoring stations recorded rainfall in the city or near the study zone. Even today, there are no more water quality monitoring stations than those built in 2002–2007. The major sources of wastewater in the Machángara River are domestic and industrial discharges. Furthermore, our city had no separate pipes for rainfall and wastewater, and it still has none. For these reasons, our study especially lacked denser spatial sampling (stations) and the always desirable higher temporal sampling rate (measurement campaigns).

A limitation of this study is the lack of time records (hour of the day) for when the water samples were collected and analyzed. Variables such as water temperature and concentrations of detergents, phosphates, and oils and fats vary over the 24 h of the day, because they depend on domestic and industrial wastewater discharges and on meteorological conditions. An extended study with shorter intervals between samples over the full 24 h of each day could therefore provide useful information for studies of water uses that could be characterized by time and population type.

Acknowledgments:

This work was supported in part by the Universidad de las Fuerzas Armadas ESPE under

Grant 2015-PIC-004 and has also been partly supported by Research Grants PRINCIPIAS (TEC2013-48439-C4-1-R)

and FINALE (TEC2016-75161-C2-1-R) from the Spanish Government and PRICAM (S2013/ICE-2933) from


Comunidad de Madrid. The authors thank the Metropolitan Water Company of Quito, Ecuador, for providing the

Machángara River water quality data.

Author Contributions:

Iván P. Vizcaíno, Enrique V. Carrera, José Luis Rojo-Álvarez and Luis H. Cumbal conceived

and designed the experiments. Iván P. Vizcaíno, Sergio Muñoz-Romero and Margarita Sanromán-Junquera

performed the experiments. Enrique V. Carrera, Luis H. Cumbal and José Luis Rojo-Álvarez supervised the

experiments. Iván P. Vizcaíno wrote the paper, and all authors contributed with changes in all sections.

Conflicts of Interest: The authors declare no conflicts of interest.

References

1. Van der Perk, M. Soil and Water Contamination from Molecular to Catchment Scale, 1st ed.; Taylor and Francis/Balkema: Leiden, The Netherlands, 2006.

2. Duan, W.; Takara, K.; He, B.; Luo, P.; Nover, D.; Yamashiki, Y. Spatial and temporal trends in estimates of nutrient and suspended sediment loads in the Ishikari River, Japan, 1985 to 2010. Sci. Total Environ. 2013, 461–462, 499–508.

3. Duan, W.; He, B.; Takara, K.; Luo, P.; Nover, D.; Sahu, N.; Yamashiki, Y. Spatiotemporal evaluation of water quality incidents in Japan between 1996 and 2007. Chemosphere 2013, 93, 946–953.

4. Heinke, G.G. Ingeniería Ambiental, 2nd ed.; Prentice Hall Hispanoamericana, S.A.: Upper Saddle River, NJ, USA, 1999; pp. 421–424.

5. Tebbutt, T.H.Y. Principles of Water Quality Control, 5th ed.; Butterworth-Heinemann, an Imprint of Elsevier Science: Oxford, UK, 1998; pp. 21–22.

6. Corcoran, E.; Nellemann, C.; Baker, E.; Bos, R.; Osborn, D. Sick Water? The Central Role of Wastewater Management in Sustainable Development; Savelli, H., Ed.; Birkeland Trykkeri AS: Birkeland, Norway, 2010.

7. Meneses, M.; Concepción, H.; Vilanova, R. Joint Environmental and Economical Analysis of Wastewater Treatment Plants Control Strategies: A Benchmark Scenario Analysis. Sustainability 2016, 8, 360.

8. Thangarajan, M. Groundwater Resource Evaluation, Augmentation, Contamination, Restoration, Modeling and Management; Springer: Dordrecht, The Netherlands; Capital Publishing Company: New Delhi, India, 2007; pp. 12–17.

9. Taalohi, M.; Tabatabaee, H. Predicting Bar Dam Water Quality using Neural-Fuzzy Inference System. Indian J. Fundam. Appl. Life Sci. 2014, 4, 630–636.

10. Li, S.; Liu, W.; Gu, S.; Cheng, X.; Xu, Z.; Zhang, Q. Spatio temporal dynamic of nutrients in the upper Han River basin, China. J. Hazard. Mater. 2009, 162, 1340–1346.

11. Serre, M.; Carter, G.; Money, E. Geostatistical space/time estimation of water quality along the Raritan river basin in New Jersey. Dev. Water Sci. 2004, 55, 1839–1852.

12. Duan, W.; He, B.; Nover, D.; Yang, G.; Chen, W.; Meng, H.; Zou, S.; Liu, C. Water Quality Assessment and Pollution Source Identification of the Eastern Poyang Lake Basin Using Multivariate Statistical Methods. Sustainability 2016, 8, 133.

13. Gomez, M.; Herrera, S.; Solé, D.; García-Calvo, E.; Fernández-Alba, A. Spatio temporal evaluation of organic contaminants and their transformation products along a river basin affected by urban, agricultural and industrial pollution. Sci. Total Environ. 2012, 420, 134–145.

14. Empresa Pública Metropolitana de Agua Potable Quito. Estudios de Factibilidad y Diseños Definitivos del Plan de Descontaminación de los Ríos de Quito Informe No.1 “Revisión de la Información Existente y Diagnóstico”; Technical Report; Empresa Pública Metropolitana de Agua Potable Quito: Quito, Ecuador, 2009.

15. Municipio del Distrito Metropolitano de Quito. Plan de Desarrollo 2012–2022. Consejo Metropolitano de Planificación. Quito, Ecuador; Municipio del Distrito Metropolitano de Quito: Quito, Ecuador, 2011; pp. 14–26.

16. Priego de los Santos, J.; Porres de la Haza, M. La Triangulación Delaunay Aplicada a los Modelos Digitales del Terreno; Universidad Politécnica de Valencia: Valencia, Spain, 2002; pp. 1–8.

17. De-Berg, M.; Cheong, O.; Van-Kreveld, M.; Overmars, M. Computational Geometry, Algorithms and Applications, 3rd ed.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 196–198.

18. Karl, S.; Truong, Q. An Adaptable k-Nearest Neighbors Algorithm for MMSE Image Interpolation. IEEE Trans. Image Process. 2009, 18, 1976–1987.

19. Cherkassky, V.; Mulier, F. Learning From Data: Concepts, Theory, and Methods, 2nd ed.; Wiley-Interscience: Hoboken, NJ, USA, 2007; pp. 61–64.

20. Elisseeff, A.; Pontil, M. Leave-one-out error and stability of learning algorithms with applications. Mach. Learn. Res. 2002, 55, 71–97.

21. Mukherjee, S.; Niyogi, P.; Poggio, T.; Rifkin, R. Statistical Learning: Stability Is Sufficient For Generalization and Necessary and Sufficient for Consistency of Empirical Risk Minimization; Massachusetts Institute of Technology: Cambridge, MA, USA, 2004.

22. Rogers, S.; Girolami, M. A First Course in Machine Learning, 1st ed.; Chapman & Hall/CRC: New York, NY, USA, 2011; pp. 29–31.

23. Uriel-Jiménez, E.; Aldás-Manzano, J. Análisis Multivariante Aplicado; Thomson Editores Spain Paraninfo S.A.: Madrid, Spain, 2005; pp. 56–57.

24. Instituto Nacional de Estadísticas y Censos. Base de Datos Censo 2010; INEC: Quito, Ecuador, 2010.

25. Tien, M.; Lai, J.; Jin, H. Estimating the Biodegradability of Treated Sewage Samples Using Synchronous Fluorescence Spectra. Sensors 2011, 11, 7382–7394.

26. Martín, I.; Betancourt, J. Guía Sobre Tratamientos de Aguas Residuales Urbanas para Pequeños Núcleos de Población. Mejora de la Calidad de los Efluentes, 1st ed.; Daute Diseño, S.L.: Las Palmas, Spain, 2006.

27. Presidencia de la República del Ecuador. Norma de Calidad Ambiental y de Descarga de Efluentes: Recurso Agua; Technical Report; Presidencia de la República del Ecuador: Quito, Ecuador, 2012.

28. Clement, L.; Thas, O.; Vanrolleghem, P.A.; Ottoy, J.P. Spatio-temporal statistical models for river monitoring networks. Water Sci. Technol. 2006, 53, 9–15.

29. Capella, J.; Bonastre, A.; Ors, R.; Peris, M. In line river monitoring of nitrate concentration by means of a Wireless Sensor Network with energy harvesting. Sens. Actuators 2013, 177, 419–427.

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

Longitudinal Chemical Gradients and the Functional Responses of Nutrients, Organic Matter, and Other Parameters to the Land Use Pattern and Monsoon Intensity

Article

Full-text available

Jan 2022

River water quality degradation is one of the hottest environmental issues worldwide. Therefore, monitoring water quality longitudinally and temporally is crucial for effective water management and contamination control. The main aim of this study was to assess the longitudinal variations in water quality in the mainstream of the Han River, Korea, from 2015 to 2019. The trophic state classification (TSC), microbial pollution indicator (MPI), and river pollution index (RPI) were calculated to characterize river water quality and revealed more serious pollution toward the downstream zone (Dz) due to agricultural and urban-dominated areas. The biodegradability index (BI) indicated that non-biodegradable organic pollutants are increasing in the water body from the urban and animal wastewater treatment plants. Nutrients, organic matter contents, total suspended solids, ionic factors, and algal chlorophyll were higher in the Dz than in any other zones and were markedly influenced by the summer monsoon. Empirical analysis showed that nutrients and organic matter had positive linear functional relations with agricultural and urban coverage and negative linear relations with forest coverage. The pollutant-transport function suggested that suspended solids act as TP and TN carriers. Regression analysis indicated that TP (R2 = 0.47) has more positive functional relations with algal growth than TN (R2 = 0.22). Our findings suggest that a combination of empirical models and pollution indices might be utilized to assess river water quality and that the resulting information could aid policymakers in managing the Han Rive

Spatial and Temporal Analysis of Nitrate Dynamics along the Tigris River

Article

Full-text available

Jan 2023

Muwafaq H. Al-Lami

Given the wide dependency on surface water used to supply drinking water, agricultural irrigation, and industrial activities, nitrate pollution has posed a serious concern in the Tigris River in recent years. The main objective of this study was to develop an understanding of the spatiotemporal patterns of nitrate distribution in the Tigris River through an integrated approach using hydrological data, physicochemical parameters, and model-based analysis. Eighty-four monthly sampling campaigns from forty monitoring locations along the Tigris River were carried out from January 2011 to December 2018. Obtained results demonstrated that the NO3- dynamics were strongly correlated with the length of transport distance and flow rates along the river system (p < 0.05). High flow rates in the upper courses of the river system favored physical transport of NO3- and promoted a dilution effect. However, low flow rates in the lower sections favored the accumulation processes of NO3- and promoted a concentration effect. High concentration of 7.0±1.96 g NO3- m-3 was observed in February 2018 downstream in the river. No significant seasonal effect in NO3- concentrations were observed. These results were supported by the changes in dissolved oxygen concentration and pH in the river system and indicated high nitrification rates and elevated NO3- accumulation, particularly downstream in the river. This modeling approach has also confirmed field observations of NO3- dynamics with 65% of the variances in the river system being explained by the model.

Reconstructing missing data by comparing interpolation techniques: Applications for long-term water quality data

Article

Full-text available

May 2023
LIMNOL OCEANOGR-METH

Missing data are typical yet must be addressed for proper inferences or expanding datasets to guide our lim-nological understanding and management of aquatic systems. Interpolation methods (i.e., estimating missing values using known values within the dataset) can alleviate data gaps and common problems. We compared seven popular interpolation methods for predicting substantial missingness in a long-term water quality dataset from the Upper Mississippi River, U.S.A. The dataset included 80,000 sampling sites collected over 30 yr that had substantial missingness for total nitrogen (TN), total phosphorus (TP), and water velocity. For all three interpolated water quality variables, random forests had very high prediction accuracy and outperformed the methods of ordinary kriging, polynomial regressions, regression trees, and inverse distance weighting. TP had a mean absolute error (MAE) of 0.03 mg (L-TP) À1 , TN had a MAE of 0.39 mg (L-TN) À1 , and water velocity had a MAE of 0.10 m s À1. The random forests' error rates were mapped and showed low spatiotemporal variability across the riverscape, indicating high model performance across many habitat types and large spatial scales. In the current era of "big data," interpolation becomes an imperative step prior to ecological analyses yet remains unfamiliar and underutilized. Our research briefly describes the importance of addressing missingness and provides a roadmap to conduct model intercomparisons of other big datasets. We also share adaptable data analysis scripts, which allows others to readily conduct interpolation comparisons for many limnology applications and contexts.

Water quality modelling using principal component analysis and artificial neural network

Article

Full-text available

Dec 2022

The study investigates the latent pollution sources and most significant parameters that cause spatial variation and develops the best input for water quality modelling using principal component analysis (PCA) and artificial neural network (ANN). The dataset, 22 water quality parameters were obtained from Department of Environment Malaysia (DOE). The PCA generated six significant principal component scores (PCs) which explained 65.40 % of the total variance. Parameters for water quality variation are mainlyrelated to mineral components, anthropogenic activities, and natural processes. However, in ANN three input combination models (ANN A, B, and C) were developed to identify the best model that can predict water quality index (WQI) with very high precision. ANN A model appears to have the best prediction capacity with a coefficient of determination (R2) = 0.9999 and root mean square error (RMSE) = 0.0537. These results proved that the PCA and ANN methods can be applied as tools for decision-making and problem-solving for better managing of river quality.

Modelling the effects of urbanization on nutrients pollution for prospective management of a tropical watershed: A case study of Skudai River watershed

Article

Nov 2021
ECOL MODEL

Nutrient pollution is considered as a primary factor of water quality deterioration in urban-dominated watersheds in which an informed decision on the management strategies are required to improve the water quality condition. The Hydrological Simulation Program Fortran (HSPF) model is used to evaluate the impacts of pollution by these nutrients using the Skudai River watershed in Malaysia as a case study. A developed land-use/land-cover (LU/LC) scenarios were used to evaluate these impacts. Statistical methods were employed to assess the extent of these impacts and their significance in shifting the trophic state of the rivers in the watershed. The study shows that when urban development increases from 18.2 to 49.2%, the total nitrogen (TN) and total phosphorus (TP) loads increase from 3.08 to 4.56 × 10 ³ kg/yr and from 0.13 to 0.27 × 10³ kg/yr, respectively. Streamflow and stream concentrations (NH3N, NO3N, and PO4-P) produce varying responses as the watershed land-use changes (from 1989 to 2039). As the rivers in the watershed shift their trophic state with respect to the level of anthropogenic disturbance within their catchments, the TN and TP concentrations at the estuaries are likely to change from oligotrophic to eutrophic state. This is an indication that the Johor Strait and the coastal rivers will be exposed to eutrophication, subsequently resulting in harmful algal bloom. This condition can be prevented by integrating water quality management alongside urban development because it is observed that a control of non-point source (NPS) pollutants from 1% of the urban development will decrease TN and TP concentration in Skudai River by 0.023 mg/L and 0.004 mg/L respectively.

Evaluation of Physicochemical Parameters, Carbamazepine and Diclofenac as Emerging Pollutants in the Machángara River, Quito, Ecuador

Article

Full-text available

Apr 2024

This study evaluates the pollution of the Machángara River basin in Ecuador. For the assessment, water samples were pumped from the river for 1 to 4 h, with a representative water sample of 4 L collected. In the site and laboratory, the physicochemical parameters, carbamazepine (CBZ), and diclofenac (DIC) concentrations were measured using standardized analytical methods. On average, a temperature of 17.02 °C, pH of 7.06, electrical conductivity of 760.96 µS/cm, and turbidity of 83.43 NTU were found. Furthermore, the average solids content was 72.88, 495.47, and 568.35 mg/L for total suspended solids (TSS), total dissolved solids (TDS), and total solids (TS) in that order. The highest chloride concentration (Cl− = 87.97 mg/L) was below the maximum permissible limit (MPL) based on the Ecuadorian regulations for surface and underground water for human consumption and domestic use, which only require conventional treatment. In contrast, levels of nitrate (NO3− = 27.75–288.25 mg/L) and nitrite in five points (NO2− = 2.02–5.42 mg/L) were higher than the MPLs. Moreover, sulfate (SO42− = 34.75–110 mg/L) and phosphate (PO4−P = 4.15–16.58 mg/L) contents caused turbidity and eutrophication in the river water., Additionally, concentrations of copper (Cu2+ = 0.002–0.071 mg/L), zinc (Zn2+ = 0.001–0.011 mg/L) and iron (Fe3+ = 0.000–0.287 mg/L) were within the permissible limits. On the other hand, carbamazepine concentrations in the Machángara River basin were below the limit of detection (LOD) up to a value of 0.121 mg/L. At the same time, diclofenac levels ranged from 9.32 to 48.05 mg/L. The concentration discrepancy for both pharmaceuticals is linked with the trend of drug consumption by Quito’s inhabitants. As measured in this investigation, meaningful amounts of CBZ and DIC are released to the Machángara River. Accordingly, the two pharmaceuticals in the river water may be dangerous for aquatic species.

Bibliometric analysis of scientific production on phosphorus and nitrogen in Ecuadorian aquatic ecosystems in the period 2000-2019 indexed in SCOPUS

Article

Jun 2023

The presence of phosphorus and nitrogen in high proportions can negatively affect the quality of any ecosystem. To date, no bibliometric study of phosphorus and nitrogen in Ecuadorian aquatic ecosystems has been published. Our objective is to analyze and present bibliometric indicators of the scientific production on phosphorus and nitrogen in Ecuadorian aquatic ecosystems. The Scopus database was used for data collection. A total of 823 documents were found, of which only 49 were validated. Publications increased considerably from 2013 onward, reaching a maximum of 16 in 2018. Rivers were the most studied ecosystems throughout the period, most publications were in English, and the journal with the most articles was Water. Most articles were published in prestigious journals of biology, limnology, water, and hydrobiology. The leading universities on this topic are the Universidad de Cuenca and the Escuela Superior Politécnica del Litoral.

Investigating the Accuracy of Hybrid Models with Wavelet Transform in the Forecast of Watershed Runoff

Article

Jan 2023

In the hydrological cycle, rainfall-runoff is one of the most significant and complex phenomena. Different modeling perspectives have been proposed to develop and improve predictive models, and artificial intelligence techniques can model hydrological processes with confidence. In this study, the runoff of the Leilanchai watershed was simulated using artificial neural networks (ANNs), the M5 model tree, and hybrids of each with the wavelet transform. Data collected in this watershed from 2000 to 2021 were split, with 70% used for training and 30% for testing. Simulations were performed at daily and monthly scales, and simulated and observed results were compared within each scale. First, the wavelet transform was used to decompose the rainfall and runoff time series into multiple sub-series to address instability. The resulting sub-series were then used as inputs to the ANN and the M5 model tree. The results showed that hybridization with the wavelet transform improved the accuracy of the ANN model by 4% at the daily scale and 26% at the monthly scale, and that of the M5 model tree by 4% and 41%, respectively. As the forecast horizon lengthens, the accuracy of the wavelet-M5 model decreases less than that of the wavelet-ANN (WANN) model. Consequently, the Leilanchai watershed has a relatively stable behavior pattern. Overall, hybrid models coupled with the wavelet transform improve forecast accuracy.
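
The decomposition step can be illustrated with a minimal sketch, assuming a single-level Haar wavelet (the mother wavelet and number of levels used in the study are not stated here). Each component is reconstructed separately at full length so that the sub-series stay aligned in time with the original record and could serve as model inputs. The function names and the runoff values are hypothetical.

```python
import math
from typing import List, Tuple

SQRT_HALF = 1 / math.sqrt(2)  # orthonormal Haar filter coefficient


def haar_dwt(series: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the Haar discrete wavelet transform (even-length input)."""
    if len(series) % 2:
        raise ValueError("series length must be even")
    pairs = list(zip(series[0::2], series[1::2]))
    approx = [(a + b) * SQRT_HALF for a, b in pairs]  # low-pass: local trend
    detail = [(a - b) * SQRT_HALF for a, b in pairs]  # high-pass: local fluctuation
    return approx, detail


def haar_idwt(approx: List[float], detail: List[float]) -> List[float]:
    """Inverse of haar_dwt: rebuild a full-length series from coefficients."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SQRT_HALF, (a - d) * SQRT_HALF])
    return out


def haar_subseries(series: List[float]) -> Tuple[List[float], List[float]]:
    """Split a series into full-length approximation and detail sub-series.

    Each component is reconstructed on its own (the other coefficients set
    to zero), so both align with the input and sum back to it.
    """
    approx, detail = haar_dwt(series)
    zeros = [0.0] * len(approx)
    return haar_idwt(approx, zeros), haar_idwt(zeros, detail)


if __name__ == "__main__":
    runoff = [2.0, 4.0, 6.0, 8.0, 5.0, 3.0, 1.0, 1.0]  # hypothetical daily runoff
    trend, fluctuation = haar_subseries(runoff)
    print([round(v, 3) for v in trend])  # [3.0, 3.0, 7.0, 7.0, 4.0, 4.0, 1.0, 1.0]
```

Because the two sub-series add back to the original, a model fed with them loses no information; repeating the split on the approximation would give the coarser levels of a multilevel decomposition.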

Investigation of scarce input data augmentation for modelling nitrogenous compounds in South African rivers

Article

Nov 2022
Water Practice and Technology

In this study, basic interpolation and machine learning (ML) data augmentation were applied to the scarce input data of two models used for nitrogenous compound degradation modelling in a river reach: the Water Quality Analysis Simulation Programme (WASP) and a Continuous Stirred Tank Reactor (CSTR) model. Model outputs were assessed for statistically significant differences. Furthermore, artificial data gaps were introduced into the input data to study the limitations of each augmentation method. The Python Data Analysis Library (Pandas) was used to perform the deterministic interpolation. In addition, the effect of missing data at local maxima was investigated. The results showed little statistical difference between deterministic interpolation methods for data augmentation, but larger differences when the input data were infilled specifically at locations where extrema occurred.

HIGHLIGHTS
- Basic interpolation methods did not produce statistically significant differences in augmented datasets.
- Increasing the gaps yielded greater differences between augmented datasets.
- ML methods on real and artificial gaps produced acceptable results.
- No significant differences between the WASP and Basic Model on real and artificial input.
- Difference between the WASP and Basic Model on real and artificial input.
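
The gap experiment can be sketched in plain Python, assuming equally spaced samples and straight-line infilling of interior gaps (for interior gaps this matches what pandas' `Series.interpolate(method="linear")` does; pandas' handling of edge gaps is governed by its own arguments and is not reproduced). The helper names and nitrate values are hypothetical. Erasing the samples around a local maximum shows why infilling at extrema produces larger differences: a straight line cannot recover the missing peak.

```python
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing sample


def introduce_gap(values: Series, start: int, length: int) -> Series:
    """Blank out `length` samples from `start` to mimic an artificial data gap."""
    out = list(values)
    for i in range(start, min(start + length, len(out))):
        out[i] = None
    return out


def fill_interior_gaps(values: Series) -> Series:
    """Linearly interpolate runs of None that lie between two known samples.

    Samples are treated as equally spaced; gaps before the first or after
    the last known value are left unfilled.
    """
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        y0, y1 = values[left], values[right]
        for j in range(left + 1, right):  # empty range when samples are adjacent
            out[j] = y0 + (j - left) / span * (y1 - y0)
    return out


if __name__ == "__main__":
    nitrate = [1.0, 2.0, 4.0, 3.0, 2.5, 2.0]  # hypothetical readings, peak at index 2
    gapped = introduce_gap(nitrate, start=2, length=2)
    filled = fill_interior_gaps(gapped)
    peak_error = abs(filled[2] - nitrate[2])
    print(f"infilled peak: {filled[2]:.3f}, error: {peak_error:.3f}")
    # infilled peak: 2.167, error: 1.833
```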

Heavy Metal Pollution Monitoring of Yamuna from Dak Patthar to Agra

Article

Dec 2020

Joint Environmental and Economical Analysis of Wastewater Treatment Plants Control Strategies: A Benchmark Scenario Analysis

Article

Apr 2016

In this paper, a joint environmental and economic analysis of different Wastewater Treatment Plant (WWTP) control strategies is carried out. The assessment combines Life Cycle Assessment (LCA), used to evaluate environmental impact, with the Benchmark Simulation Model No. 1 (BSM1), which serves as the benchmark scenario for implementing the control strategies. The Effluent Quality Index (EQI) and the Overall Cost Index (OCI) are two indicators provided by BSM1 and used to evaluate plant performance from the effluent quality and economic points of view, respectively. This work conducts a combined analysis and assessment of ten different control strategies defined to operate a wastewater treatment plant. The analysis joins the usual economic and performance indexes provided by BSM1 with an LCA that determines the environmental impact linked to each control strategy. It is shown how to obtain an overall evaluation of environmental effects by using a normalized graphical representation that can easily be used to compare control strategies from the environmental impact point of view. Using only the BSM1 indexes leads to a clustering of control strategies according to the cost/quality tradeoff they show. With respect to that tradeoff, all strategies in the same group are almost equal and give no indication of how to select the appropriate one. It is therefore shown that adding a new, complementary, LCA-based evaluation makes it possible either to reinforce a decision that could be taken solely on the basis of the EQI/OCI tradeoff or to select one control strategy among the others.
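
One way to make the "almost equal on the cost/quality tradeoff" idea concrete is a Pareto (non-dominance) filter on the EQI/OCI pair, followed by a tie-break on an environmental score. This is a minimal sketch, not the paper's procedure: it assumes lower is better for EQI, OCI, and a single normalized LCA impact score. The strategy names and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, float]  # (EQI, OCI); lower is better for both


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is no worse than `b` on both indices and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Scores]) -> List[str]:
    """Strategies not dominated by any other strategy in the EQI/OCI plane."""
    return sorted(
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    )


def break_tie_with_lca(front: List[str], lca_impact: Dict[str, float]) -> str:
    """Among tradeoff-equivalent strategies, pick the lowest (hypothetical) LCA impact."""
    return min(front, key=lambda name: lca_impact[name])


if __name__ == "__main__":
    # Hypothetical index values, not results from the study.
    bsm1 = {"A": (6100.0, 16000.0), "B": (6000.0, 16500.0),
            "C": (6200.0, 16600.0), "D": (5900.0, 17000.0)}
    lca = {"A": 0.82, "B": 0.74, "C": 0.70, "D": 0.91}
    front = pareto_front(bsm1)
    print(front, break_tie_with_lca(front, lca))  # ['A', 'B', 'D'] B
```

Strategy C has the best hypothetical LCA score but is dominated on EQI/OCI, so it is excluded before the environmental criterion is applied; a different design could instead weigh all three criteria jointly.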

Water Quality Assessment and Pollution Source Identification of the Eastern Poyang Lake Basin Using Multivariate Statistical Methods

Article

Jan 2016

Multivariate statistical methods, including cluster analysis (CA), discriminant analysis (DA), and principal component analysis/factor analysis (PCA/FA), were applied to surface water quality datasets comprising 14 parameters at 28 sites in the Eastern Poyang Lake Basin, Jiangxi Province, China, from January 2012 to April 2015, in order to explore the data, characterize spatio-temporal variation in pollution, and identify potential pollution sources. Using hierarchical CA, the sampling times were grouped into two periods (wet season and dry season) and the 28 sampling stations into two regions (low pollution and high pollution). DA identified four parameters (temperature, pH, ammonia nitrogen (NH4-N), and total nitrogen (TN)) that distinguished the temporal groups with nearly 97.86% correct assignments. Again using DA, five parameters (pH, chemical oxygen demand (COD), TN, fluoride (F), and sulfide (S)) led to 93.75% correct assignments for distinguishing the spatial groups. PCA/FA identified five potential pollution sources for both the low pollution and the high pollution regions: nutrient pollution, oxygen-consuming organic pollution, fluorine chemical pollution, heavy metal pollution, and natural pollution. Heavy metals (copper (Cu), chromium (Cr), and zinc (Zn)), fluoride, and sulfide are of particular concern in the study region because of the many open-pit copper mines, such as the Dexing Copper Mine. The results offer a reasonable classification scheme for low-cost monitoring networks and inform understanding of spatio-temporal variation in water quality as it relates to water resources management.
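
The hierarchical CA step can be illustrated with a small agglomerative clustering sketch. The linkage rule and distance are assumptions made for brevity (average linkage on Euclidean distance between z-scored parameters); the study's own choices are not stated in this abstract. The station names and (COD, TN, F) values are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def zscore_columns(rows: List[Sequence[float]]) -> List[List[float]]:
    """Standardize each parameter (column) so units do not dominate distances."""
    n = len(rows)
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        scaled_cols.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return [list(r) for r in zip(*scaled_cols)]


def average_linkage(stations: Dict[str, Sequence[float]], n_clusters: int) -> List[List[str]]:
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    names = list(stations)
    vec = dict(zip(names, zscore_columns([stations[s] for s in names])))
    clusters = [[s] for s in names]

    def linkage(c1: List[str], c2: List[str]) -> float:
        # Mean of all pairwise Euclidean distances between the two clusters.
        total = sum(math.dist(vec[a], vec[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical (COD, TN, F) station means, not data from the study.
    sites = {"S1": (10.0, 1.0, 0.20), "S2": (11.0, 1.1, 0.25),
             "S3": (30.0, 3.5, 1.20), "S4": (28.0, 3.2, 1.00)}
    print(average_linkage(sites, n_clusters=2))  # [['S1', 'S2'], ['S3', 'S4']]
```

Standardizing first matters here because COD values are an order of magnitude larger than fluoride values and would otherwise dominate the distances.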

Spatiotemporal evaluation of water quality incidents in Japan between 1996 and 2007

Article

Jun 2013

We present a spatiotemporal evaluation of water quality incidents in Japan considering incident numbers, incident causes, pollutant categories, and pollution effects. Water pollution incidents in first-class river systems almost tripled, to about 1487, over the 12 years from 1996 to 2007. Oil accounts for the largest proportion of pollutants nationwide (76.61%) and is the major source of pollution in each region of Japan. Moreover, every pollutant category shows a growth trend, especially since 2005. The main cause of incidents was "Unknown" (43%), followed by "Poor working practice" (24%), and then by "Accident" (10%) and "Other" (10%). In Hokuriku, however, the main cause was "Poor working practice" (36%), which exceeded "Unknown" (30%). Finally, waterworks (approximately 60%) were the most affected of the four kinds of water supply infrastructure, followed by simplified waterworks. The population affected by offensive odors and tastes peaked in 1990 and has been decreasing since. Overall, the results characterize incidents from 1996 to 2007 and have significant implications for adaptation measures, strategies, and policies to reduce water quality incidents.

Learning from Data

Book

Aug 2007

A First Course in Machine Learning

Book

Oct 2011

A First Course in Machine Learning covers the core mathematical and statistical techniques needed to understand some of the most popular machine learning algorithms. The algorithms presented span the main problem areas within machine learning: classification, clustering and projection. The text gives detailed descriptions and derivations for a small number of algorithms rather than cover many algorithms in less detail. Referenced throughout the text and available on a supporting website (http://bit.ly/firstcourseml), an extensive collection of MATLAB®/Octave scripts enables students to recreate plots that appear in the book and investigate changing model specifications and parameter values. By experimenting with the various algorithms and concepts, students see how an abstract set of equations can be used to solve real problems. Requiring minimal mathematical prerequisites, the classroom-tested material in this text offers a concise, accessible introduction to machine learning. It provides students with the knowledge and confidence to explore the machine learning literature and research specific methods in more detail.

Learning from Data: Concepts, Theory, and Methods

Book

Jan 1998

Groundwater: Resource evaluation, augmentation, contamination, restoration, modeling and management

Book

Jan 2007

Mathanesh Thangarajan

An Adaptable k-Nearest Neighbors Algorithm for MMSE Image Interpolation

Article

Nov 2010

We propose an image interpolation algorithm that is nonparametric and learning-based, primarily using an adaptive k-nearest neighbor algorithm with global considerations through Markov random fields. The empirical nature of the proposed algorithm ensures image results that are data-driven and, hence, reflect “real-world” images well, given enough training data. The proposed algorithm operates on a local window using a dynamic k-nearest neighbor algorithm, where k differs from pixel to pixel: small for test points with highly relevant neighbors and large otherwise. Based on the neighbors that the adaptable k provides and their corresponding relevance measures, a weighted minimum mean squared error solution determines implicitly defined filters specific to low-resolution image content without yielding to the limitations of insufficient training. Additionally, global optimization via single-pass Markov approximations, similar to cited nearest neighbor algorithms, provides additional weighting for filter generation. The approach is justified by using a sufficient quantity of training data per test point and takes advantage of image properties. For in-depth analysis, we compare with existing methods and draw parallels between intuitive concepts, including classification, and ideas introduced by other nearest neighbor algorithms by explaining manifolds in low and high dimensions.
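
The "small k for highly relevant neighbors, large k otherwise" rule can be sketched as follows. This is a simplified stand-in, not the paper's method: relevance is assumed to be a Gaussian weight of distance, k grows nearest-first until the accumulated relevance reaches a hypothetical `target`, and the estimate is the relevance-weighted mean (the constant-model weighted least-squares solution) rather than the learned MMSE filters and Markov random field weighting described above. All names, parameters, and sample values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (coordinates, observed value)


def adaptive_knn_estimate(train: List[Sample], query: Sequence[float],
                          bandwidth: float, target: float = 2.0,
                          k_min: int = 1, k_max: int = 8) -> Tuple[float, int]:
    """Weighted kNN estimate whose k adapts to neighbor relevance.

    Neighbors are visited nearest-first; each gets a Gaussian relevance
    weight. Growth stops once the accumulated relevance reaches `target`,
    so k stays small when close, highly relevant neighbors exist.
    """
    if not train:
        raise ValueError("training set is empty")
    ranked = sorted(((math.dist(x, query), y) for x, y in train), key=lambda p: p[0])
    chosen: List[Tuple[float, float]] = []
    total = 0.0
    for d, y in ranked[:k_max]:
        w = math.exp(-((d / bandwidth) ** 2))
        chosen.append((w, y))
        total += w
        if len(chosen) >= k_min and total >= target:
            break
    if total == 0.0:  # every weight underflowed: fall back to a plain mean
        return sum(y for _, y in chosen) / len(chosen), len(chosen)
    return sum(w * y for w, y in chosen) / total, len(chosen)


if __name__ == "__main__":
    # Hypothetical 2-D samples: a tight cluster near the origin plus two far points.
    samples = [((0.0, 0.0), 10.0), ((0.1, 0.0), 12.0), ((0.0, 0.1), 11.0),
               ((2.0, 2.0), 50.0), ((3.0, 3.0), 60.0)]
    for q in [(0.05, 0.0), (1.5, 1.5)]:
        value, k = adaptive_knn_estimate(samples, q, bandwidth=1.0)
        print(q, round(value, 2), k)
```

With these values the query inside the cluster stops at k = 3, while the query between clusters finds no highly relevant neighbors and uses all five samples.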

Learning from Data: Concepts, Theory, and Methods (V. Cherkassky, F. Mulier)

Article

Jan 1998

An interdisciplinary framework for learning methodologies covering statistics, neural networks, and fuzzy logic, this book provides a unified treatment of the principles and methods for learning dependencies from data. It establishes a general conceptual framework in which various learning methods from statistics, neural networks, and fuzzy logic can be applied, showing that a few fundamental principles underlie most new methods being proposed today in statistics, engineering, and computer science. Complete with over one hundred illustrations, case studies, and examples, it is an invaluable text.

In line river monitoring of nitrate concentration by means of a Wireless Sensor Network with energy harvesting

Article

Feb 2013
Sensors and Actuators B: Chemical

Wireless Sensor Networks (WSNs) are highly promising tools for the advanced automation of chemical analysis processes. This work helps to show how such processes can take advantage of this technology. Two main drawbacks were addressed in order to test this applicability: the availability of suitable transducers and the energy supply. The development and deployment of a WSN for the continuous in-line monitoring of nitrate content in a real scenario (River Turia, Valencia, Eastern Spain) are then presented. In this application, an ion-selective electrode (ISE) transducer and an energy harvesting system based on a solar panel were used. The proposed system was deployed along a stretch of the river, and its operation was studied and validated. The results not only show the applicability of WSNs to analytical chemistry environments but also highlight the advantages offered by this technology: easy deployment, ease of use, and a large amount of data that opens new possibilities for both spatial and temporal analysis of chemical species.
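
The energy-supply constraint for a harvesting node is often reasoned about as an energy-neutral duty cycle: the node may be active only for the fraction of time that its average consumption does not exceed the average harvested power. The sketch below is a generic illustration under that assumption, not the system described in the study; the function names and all power and energy figures are hypothetical.

```python
def harvest_average_mw(daily_energy_j: float) -> float:
    """Convert energy harvested per day (J) into average power (mW)."""
    return daily_energy_j / 86_400 * 1000


def energy_neutral_duty_cycle(p_harvest_avg_mw: float, p_active_mw: float,
                              p_sleep_mw: float) -> float:
    """Largest duty cycle D with D*P_active + (1 - D)*P_sleep <= P_harvest.

    Solving the equality gives D = (P_harvest - P_sleep) / (P_active - P_sleep),
    clamped to [0, 1].
    """
    if p_active_mw <= p_sleep_mw:
        raise ValueError("active power must exceed sleep power")
    d = (p_harvest_avg_mw - p_sleep_mw) / (p_active_mw - p_sleep_mw)
    return min(1.0, max(0.0, d))


if __name__ == "__main__":
    # Hypothetical figures: 432 J/day from the panel, 50 mW active, 0.5 mW asleep.
    p_h = harvest_average_mw(432.0)
    duty = energy_neutral_duty_cycle(p_h, p_active_mw=50.0, p_sleep_mw=0.5)
    print(f"average harvest {p_h:.1f} mW, max duty cycle {duty:.1%}")
    # average harvest 5.0 mW, max duty cycle 9.1%
```

In practice, a battery or supercapacitor buffers night-time and cloudy periods, so the daily average is only a first-order sizing check.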
