Monitoring and prediction of high fluoride concentrations in groundwater in Pakistan

Abstract

Concentrations of naturally occurring fluoride in groundwater exceeding the WHO guideline of 1.5 mg/L have been detected in many parts of Pakistan. This may lead to dental or skeletal fluorosis and thereby poses a potential threat to public health. Utilizing a total of 5483 fluoride concentrations, comprising 2160 new measurements as well as those from other sources, we have applied machine learning techniques to predict the probability of fluoride in groundwater in Pakistan exceeding 1.5 mg/L at a 250 m spatial resolution. Climate, soil, lithology, topography, and land cover parameters were identified as effective predictors of high fluoride concentrations in groundwater. Excellent model performance was observed in a random forest model that achieved an Area Under the Curve (AUC) of 0.92 on test data that were not used in modeling. The highest probabilities of high fluoride concentrations in groundwater are predicted in the Thar Desert, Sargodha Division, and scattered along the Sulaiman Mountains. Applying the model predictions to the population density and accounting for groundwater usage in both rural and urban areas, we estimate that about 13 million people may be at risk of fluorosis due to consuming groundwater with fluoride concentrations >1.5 mg/L in Pakistan, which corresponds to ~6% of the total population. Both the fluoride prediction map and the health risk map can be used as important decision-making tools for authorities and water resource managers in the identification and mitigation of groundwater fluoride contamination.
Yuya Ling a, Joel Podgorski a,*, Muhammad Sadiq b, Hifza Rasheed c, Syed Ali Musstjab Akber Shah Eqani b, Michael Berg a
a Eawag, Swiss Federal Institute of Aquatic Science and Technology, Department Water Resources and Drinking Water, 8600 Dübendorf, Switzerland
b Public Health and Environment Division, Department of Biosciences, COMSATS University, Islamabad, Pakistan
c National Water Quality Laboratory, Pakistan Council of Research in Water Resources (PCRWR), Islamabad, Pakistan
HIGHLIGHTS
  • Groundwater fluoride risk maps (>1.5 mg/L) created with >5000 data and machine learning for all of Pakistan.
  • Arid climate and soil composition are statistically important predictors of geogenic fluoride contamination.
  • The high-resolution maps reveal the vulnerable areas and the number of people exposed.
  • An estimated 13 million people (6% of the population) are at risk of fluorosis.
  • Most affected areas are in the Thar Desert, the Thal Desert, and scattered along the Sulaiman Mountain Range.
ARTICLE INFO
Editor: Daniel Alessi
Keywords:
Aquifers
Geogenic groundwater pollution
Drinking water quality
Human health threat
Fluorosis
Random forest modeling
Science of the Total Environment 839 (2022) 156058
* Corresponding author.
E-mail address: joel.podgorski@eawag.ch (J. Podgorski).
http://dx.doi.org/10.1016/j.scitotenv.2022.156058
Received 9 March 2022; Received in revised form 14 May 2022; Accepted 15 May 2022
Available online 20 May 2022
0048-9697/© 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
1. Introduction
Fluorine is highly abundant in Earth's crust (Amini et al., 2008; Edmunds and Smedley, 2013). It is usually present in the ionic form in the natural environment as a consequence of having the highest electronegativity among the chemical elements. Fluorine is prevalent in a wide variety of minerals but mainly in fluorite, fluorapatite, topaz, and to a moderate degree in biotite and muscovite (Farooqi, 2015; Kumar et al., 2020). Fluorites are commonly found as cements in carbonate rocks (García and Borgnino, 2015). Aside from natural sources, anthropogenic activities can introduce fluoride to the environment, for example, through fluoride phosphate fertilizer effluents (Ali et al., 2019) or fossil fuel combustion (García and Borgnino, 2015). Fluoride from fluorine-bearing minerals gradually becomes enriched in groundwater primarily through dissolution and desorption from metal oxides (Kumar et al., 2020), processes that are promoted by high alkalinity, low calcium concentration, and the sodium bicarbonate water type in groundwater (Banerjee, 2015; Farooqi, 2015; Kumar et al., 2016; Kumar et al., 2020; Rafique et al., 2009; Singh and Mukherjee, 2015). Elevated fluoride concentrations are also tied to arid climatic conditions (Handa, 1975; Rasool et al., 2018), which are associated with processes such as a higher cation exchange rate, faster dissolution from fluoride-containing minerals and prolonged groundwater residence times and therefore water-rock interactions (Ali et al., 2019; Podgorski et al., 2018).
While lower concentrations of fluoride in drinking water (0.5–1.0 mg/L) are known to prevent tooth decay (World Health Organization, 1994), excessive exposure to fluoride (above 1.5 mg/L) can lead to dental and/or skeletal fluorosis (Wang et al., 2004). It has been estimated that more than 200 million people in the world are at risk of fluorosis (Ayoob and Gupta, 2006). Given that drinking water is one of the major sources of fluoride for humans, the World Health Organization (WHO) maintains a health-based guideline of 1.5 mg/L for fluoride in drinking water (World Health Organization, 2011), which is also the permissible limit in Pakistan (Khwaja and Aslam, 2018). However, some countries such as India (Podgorski et al., 2018; Shah and Bandekar, 1998) and China (Bo et al., 2003) have adopted the fluoride concentration limit of 1.0 mg/L to account for arid climates and other fluoride intake pathways, e.g. food (Ozsvath, 2009).
Groundwater is used extensively to serve the growing population of Pakistan, supplying around 39% of drinking water (World Health Organization and UNICEF, 2019) and 73% of irrigation water (Qureshi, 2020). Having a predominantly semi-arid to arid climate, the conditions in Pakistan are favorable for fluoride accumulation in groundwater. Furthermore, fluoride-bearing rocks such as granites are present in many parts of the country (Naseem et al., 2010), and many cities contain high levels of Na+ and K+ in groundwater (Raza et al., 2017), which promote calcite precipitation by cation exchange and reinforce fluoride release from minerals (Farooqi, 2015).
Elevated fluoride concentrations in groundwater (>1.5 mg/L) have been identified in many places in Pakistan, though mainly confined to the populous and flat-lying Punjab and Sindh Provinces (Ali et al., 2019; Brahman et al., 2013; Farooqi, 2015; Farooqi et al., 2007; Iwasaki, 2007; Khattak et al., 2022; Rafique, 2008; Rafique et al., 2008; Rafique et al., 2009). For instance, 27.2% of the samples in a study along the riverine systems in the Punjab and Sindh exceeded 1.5 mg/L (Ali et al., 2019). The aquifers of these two provinces are recharged primarily from rainwater along with infiltration from the five major rivers in Punjab: the Sutlej, Ravi, Jhelum, Chenab, and Indus Rivers, all of which come together as the Indus that flows further south through Sindh. Very high fluoride concentrations up to 30 mg/L have been measured in the Thar Desert (Iwasaki, 2007; Rafique et al., 2009), which is situated at the southeastern corner of the country where sand dunes and kaolin/granite abound. Widespread concerns have been raised about dental and skeletal fluorosis that has been detected, particularly among children, in the cities of Mianwali, Quetta, Lahore, Karachi, and Peshawar, and in the Thar Desert (Ahmad et al., 2020; Khan et al., 2004; Khan et al., 2015; Rafique et al., 2015; Sami et al., 2016).
To protect public health, it is essential to determine if wells and springs contain safe or hazardous levels of fluoride. As such, maps of affected areas can provide a key first step in determining the locations of safe and hazardous wells. The most comprehensive nationwide analysis previously conducted of the distribution of fluoride in groundwater in Pakistan consisted of some 1000 measurements that were summarized at the sub-tehsil scale (Khan et al., 2002). This corresponds to an average size of about 2100 km² per administrative unit, which provides only very limited spatial resolution of fluoride contamination. Furthermore, no attempt had been made to make predictions beyond the data collected.
Thanks to the availability of high-resolution data sets of environmental predictor variables, machine learning approaches have been used to create accurate geospatial prediction models of various groundwater as well as soil parameters (Chen et al., 2017; Erickson et al., 2021; Hengl et al., 2017; Podgorski and Berg, 2020; Podgorski et al., 2020; Reichstein et al., 2019; Winkel et al., 2008; Wu et al., 2021). In contrast to conventional geostatistical techniques based upon interpolation among observations, machine learning models have the potential to be applied with high accuracy to larger scales (regional to country scale) where sufficient data are available.
Machine-learning methods, such as random forest, have proved effective in modeling a binary target variable (e.g. fluoride concentrations above a threshold) that is effectively unchanging in time and predicting its occurrence on the basis of relevant predictors (e.g., climate, geology) (Podgorski et al., 2017; Podgorski et al., 2018; Winkel et al., 2008). Since machine learning models learn from data, relationships between predictors and the target variable can be inferred from modeling results, and thus more can be learned about how hydro-geochemical conditions regulate fluoride concentrations in groundwater.
In this paper, we analyzed fluoride in over 2100 groundwater samples from all over Pakistan, including in the as yet poorly examined Thal Desert, and combined these with previously published data to produce with machine learning a first-ever high-resolution prediction map of high fluoride concentrations in Pakistan. This allows us to generate a health-risk map, estimating the locations and number of people at risk of excessive fluoride exposure. Furthermore, the statistically important predictor variables are also discussed for their insight into the environmental conditions associated with high fluoride concentrations. The main purpose of these hazard and risk maps is to identify areas with greater and lesser chances of containing high fluoride concentrations in groundwater. They can thereby provide valuable guidance for authorities and water resource managers in testing for and ultimately mitigating hazardously high concentrations of fluoride for the protection of health.
2. Materials and methods
2.1. Study area
Pakistan is characterized by a variable geomorphology, with the flat-lying Indus plain that comprises the provinces of Punjab and Sindh in the east; the Hindu Kush, Karakoram, and Himalaya ranges in the north; and the vast Baluchistan Plateau in the west. The climatic conditions are primarily arid to semi-arid, with temperate conditions in the northwest and arctic conditions in the northern mountain ranges. Desert areas cover around one-third of the country, with mountain areas, grasslands and agricultural regions covering the other parts (Greenman et al., 1967). Pakistan's geology is dominated by young (Quaternary age) alluvial and deltaic deposits (Sanaullah et al., 2019) that outcrop across much of the Indus plain (Punjab, Sindh) and Baluchistan basin, while older formations (granites, metamorphic rocks) are mainly restricted to the Khyber Pakhtunkhwa region (WAPDA/EUAD, 1989). Most of Pakistan's population is concentrated in Punjab and Sindh due to fertile soil and an abundant water supply. Over 100 million inhabitants of Punjab and Sindh rely on groundwater replenished by the Indus River and its tributaries (Jhelum, Chenab, Ravi, and Sutlej) for drinking water and agricultural uses (Bhowmik et al., 2015). There are more than one million private tube wells in the country, of which 3.8% are in Khyber-Pakhtunkhwa, 6.4% in Sindh, 4.8% in Baluchistan and more than 80% in Punjab, with a total groundwater abstraction of about 60 billion m³ per year (Qureshi, 2020). Unregulated groundwater abstraction, unsustainable pumping, agricultural irrigation and increasing water demands in urban areas result in fluctuating groundwater levels and generally impact the water quality of the aquifers of the Indus plain (Rasheed et al., 2022; Ullah et al., 2022).
2.2. Groundwater samples
We sampled 2160 groundwater wells between 2013 and 2019 throughout Pakistan (Fig. S2) and measured fluoride with a portable photometer field test kit (FTK; HI96739, Hanna Instruments) and verified values by ion chromatography (IC). The majority of measurements were taken in the well-populated Punjab and Sindh Provinces, especially in the Thal Desert where a comprehensive study of fluoride contamination has been lacking.
To test the accuracy of the FTK, water samples with known fluoride concentrations ranging from 0 to 8 mg/L were prepared and subsequently measured by IC and the FTK. This comparison showed that the FTK is generally able to recover the fluoride concentrations with high accuracy (Table S3), in particular in the range of 0 to 2 mg/L. In addition, 54 of the field samples were also verified by IC. These mostly plot along the 1:1 line (Fig. S3), with deviations generally being due to higher FTK measurements. With respect to the threshold of 1.5 mg/L, 83% of these samples were classified the same by both measurement methods.
The original 2160 fluoride measurements were combined with data from other sources to form a dataset of 5543 geolocated concentrations of fluoride in groundwater throughout Pakistan (see Fig. 1, Fig. S5, and Table S1), which were later used in modeling. Of these, 2814 samples were collected between October 2013 and March 2014 by the Rural Water Quality Monitoring Program (RWQMP) of the Pakistan Council of Research in Water Resources (PCRWR, 2015). Approximately 573 of the 2865 villages in Punjab and Sindh Provinces were covered in this program, with 4 to 5 samples collected in each village. In addition, we were able to include a few hundred data points from the Indus plain in Punjab and Sindh (Ali et al., 2019) as well as the Thar Desert (Rafique, 2008; Rafique et al., 2008; Rafique et al., 2009), where 79% of the samples exceeded 1.5 mg/L.
Overall, 911 of the 5543 groundwater samples (16%) are greater than 1.5 mg/L, and 1451 samples (26%) are above 1 mg/L. The clusters of samples reported at the same location in a village (e.g. in the RWQMP project) were averaged after removing outliers, that is, points that are distinctly different from their neighboring data. This reduced the number of the total data points from 5543 to 5483, with 16% of points having a high fluoride level after outlier removal and averaging. The fluoride concentrations were then converted into the binary levels "high" (F > 1.5 mg/L) and "low" (F ≤ 1.5 mg/L).
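The averaging and binarization steps can be illustrated with a minimal Python sketch. It is not the authors' code (their workflow was in R); the function names and sample values are hypothetical, and the outlier-removal step is omitted because its criterion is not specified in the text.

```python
from collections import defaultdict

THRESHOLD_MG_L = 1.5  # WHO guideline used as the class boundary


def average_by_location(samples):
    """Average concentrations reported at the same (lon, lat) location."""
    grouped = defaultdict(list)
    for lon, lat, conc in samples:
        grouped[(lon, lat)].append(conc)
    return {loc: sum(vals) / len(vals) for loc, vals in grouped.items()}


def to_binary(conc, threshold=THRESHOLD_MG_L):
    """Return 'high' if conc > threshold, otherwise 'low' (conc <= threshold)."""
    return "high" if conc > threshold else "low"


# Hypothetical samples (lon, lat, mg/L); the first two share a location.
samples = [(70.1, 25.0, 2.0), (70.1, 25.0, 3.0), (72.5, 31.2, 0.4), (71.0, 32.0, 1.5)]
averaged = average_by_location(samples)
labels = {loc: to_binary(conc) for loc, conc in averaged.items()}
high_share = sum(1 for v in labels.values() if v == "high") / len(labels)
```

Note that a value of exactly 1.5 mg/L falls into the "low" class under this definition.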
2.3. Predictor data sets
A variety of environmental data sets dealing with climate, lithology, land cover and soil (n = 48, see Fig. S11 and Table S4) were considered for modeling based on known or possible links with fluoride contamination (García and Borgnino, 2015; Handa, 1975; Podgorski et al., 2017; Podgorski et al., 2018). All of the variables represent data at the surface, or up to 2 m depth in the case of soil parameters (Hengl et al., 2017). Each level of the categorical predictors of soil groups, geologic age, and lithology was converted into a binary data set to enable testing the significance of each category in relation to fluoride in subsequent variable selection procedures. Pearson correlations were determined between each continuous predictor variable and fluoride concentration, whereas the fraction of fluoride concentrations exceeding 1.5 mg/L was calculated for categorical variables.
Predictor values were extracted at the locations of the 5483 fluoride measurements and added to the fluoride data set. As most predictors were available in 250 m resolution, rasters with a coarser (1000 m) or finer (100 m) resolution were resampled to 250 m by the nearest neighbor and bilinear methods, respectively, for subsequent prediction procedures, allowing the resolution of the prediction maps to be 250 m.
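As a rough illustration of nearest-neighbor resampling from 1000 m to 250 m, the following Python sketch replicates each coarse cell into a 4 × 4 block of fine cells. It assumes the coarse and fine grids are perfectly aligned, which real rasters often are not; the function name and grid values are hypothetical, and the bilinear case is not shown.

```python
def upsample_nearest(grid, factor):
    """Replicate each coarse cell into a factor x factor block of fine cells."""
    out = []
    for row in grid:
        fine_row = [value for value in row for _ in range(factor)]
        out.extend([list(fine_row) for _ in range(factor)])
    return out


# Hypothetical 2 x 2 raster of 1000 m cells.
coarse = [[1, 2],
          [3, 4]]
fine = upsample_nearest(coarse, 4)  # 1000 m / 250 m = 4
```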
2.4. Training and testing data sets
Training and test data sets were split at the ratio of 80% to 20%. Tightly clustered data points selected by random sampling may result in data rows that do not contain much variance in the predictor variables. Therefore, a method was developed that ensures greater predictor variability as follows:
i). Continuous predictors with the highest correlations with fluoride were selected (Table 1, Table S4): coarse fragments fraction, nitrogen fraction, organic carbon density, potential evapotranspiration (PET), aridity index, compound topographic index, and slope.
ii). The quantile interval (0–25%, 25–50%, 50–75%, 75–100%) of each predictor in step i) was determined for each of the 5483 fluoride measurements.
iii). The data points were then divided into groups that share the same predictor quantile interval combinations (1714 unique groups).
iv). The training dataset (80% of data) was created by random sampling, ensuring that at least one data point is selected from each group. The remaining data points were assigned to the test dataset (20% of data).
Fig. 1. Groundwater fluoride concentrations (n = 5543) from original and existing measurements. a) High (F > 1.5) and low (F ≤ 1.5) fluoride levels are plotted with topography. About 16% of the wells exceed the WHO drinking-water guideline of 1.5 mg/L. b) Population density map of Pakistan with labels of provinces and rivers.
Applying this procedure, the share of the "high" fluoride class in both the training and test sets remains approximately 16% (Table S5).
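The grouped splitting procedure of Section 2.4 might be sketched as follows in Python. This is an illustrative reading of the described steps rather than the authors' implementation; the function name `grouped_split`, the predictor names and the randomly generated rows are assumptions.

```python
import random
import statistics
from collections import defaultdict


def grouped_split(rows, predictors, train_frac=0.8, seed=0):
    """Split rows (a list of dicts) so that every combination of predictor
    quartile intervals contributes at least one row to the training set."""
    rng = random.Random(seed)
    # Quartile cut points (25th, 50th, 75th percentiles) for each predictor.
    cuts = {p: statistics.quantiles([r[p] for r in rows], n=4) for p in predictors}
    groups = defaultdict(list)
    for i, r in enumerate(rows):
        # Quartile index 0-3 per predictor; the tuple identifies the group.
        key = tuple(sum(r[p] > c for c in cuts[p]) for p in predictors)
        groups[key].append(i)
    # One guaranteed training row per group.
    train = {rng.choice(members) for members in groups.values()}
    # Fill up to the target training size with randomly chosen remaining rows.
    n_train = max(len(train), round(train_frac * len(rows)))
    rest = [i for i in range(len(rows)) if i not in train]
    rng.shuffle(rest)
    train.update(rest[: n_train - len(train)])
    test = [i for i in range(len(rows)) if i not in train]
    return sorted(train), test, groups


# Hypothetical data: 200 rows with two made-up continuous predictors.
data_rng = random.Random(1)
rows = [{"pet": data_rng.uniform(800, 2000), "slope": data_rng.uniform(0, 30)}
        for _ in range(200)]
train_idx, test_idx, groups = grouped_split(rows, ["pet", "slope"])
```

If the number of groups exceeds the target training size, this sketch lets the training set grow beyond 80%; how the original procedure handles that case is not stated.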
2.5. Random forest modeling
Random forest is a machine-learning algorithm that constructs an ensemble of decision trees, which recursively partition predictor variables to predict a dependent variable (Breiman, 2001). Individual decision trees consider a random subset of candidate predictors at each split, which reduces correlation between decision trees and helps avoid overfitting (James et al., 2013). Increasing the number of trees can further reduce overfitting (Breiman, 2001). Randomness is also introduced by growing trees with bootstrapped samples (sampling with replacement) of the training set, which results in approximately one-third of sample data being left out of each tree. The unselected, out-of-bag samples can also be used to estimate the generalization error of the random forest model (Breiman, 2001).
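The out-of-bag share can be checked with a small simulation. For a standard bootstrap of n rows drawn n times with replacement, the expected fraction of rows never drawn is (1 − 1/n)^n, which approaches 1/e ≈ 0.37 for large n, in line with the "approximately one-third" stated above. The Python sketch below uses a hypothetical function name and an arbitrary sample size.

```python
import random


def oob_fraction(n_samples, seed=0):
    """Fraction of rows left out of one bootstrap sample of size n_samples."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1 - len(drawn) / n_samples


frac = oob_fraction(10_000)  # expected to be close to 1/e
```

With class-balanced or reduced sample sizes per tree, as used later in this study, the out-of-bag share is larger than this baseline.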
The R programming language (R Core Team, 2013) was used with the randomForest package (Liaw and Wiener, 2002) to create random forest classification models of groundwater fluoride. The output of the random forest is therefore the probability of the occurrence of high fluoride concentrations. As such, the WHO drinking water guideline of 1.5 mg/L was generally used to define the cut-off between high and low fluoride concentrations. Bootstrapped sampling was made with a balance between the two categories of high and low fluoride.
The performance of a model can be evaluated through the accuracy of the predictions for a given probability cut-off value. The Area Under the ROC (receiver operator characteristic) Curve (AUC) overcomes the subjectivity of choosing a threshold by summing up over true positive and false positive rates at all cut-off values (Huang and Ling, 2005). The AUC score is bounded between 0 and 1 (a perfect classifier); an uninformative classifier that uses random guessing may yield 0.5 (Tharwat, 2020).
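One way to make the AUC concrete is its rank-based formulation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This is equivalent to the area under the ROC curve. The Python sketch below is illustrative only; the function name and example values are assumptions.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = high fluoride) and predicted probabilities.
example = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration but slow for large test sets.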
Variable importance plots of a random forest model help evaluate the influence of individual variables. They are composed of two importance indices, namely, Mean Decrease Accuracy (MDA) and Mean Decrease Impurity (MDI). MDA is calculated by averaging the change in out-of-bag error estimates after permuting an individual variable's values in out-of-bag observations (Biau and Scornet, 2016). In general, permuting the values of an important variable will lead to deterioration of model performance, thus a decrease in accuracy. Likewise, MDI is defined as the average decrease in node Gini impurity from splitting on an individual variable over all grown decision trees (Biau and Scornet, 2016). Gini impurity measures how mixed the classes in a node are; it is lower after splitting on an important variable that divides observations largely into nodes of a single class.
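The Gini impurity of a node with class proportions p_k is 1 − Σ p_k², and the decrease from a split is the parent impurity minus the size-weighted impurities of the child nodes. The Python sketch below illustrates this for a single split; it is not the randomForest implementation, and the function names and toy labels are assumptions.

```python
def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def gini_decrease(parent, left, right):
    """Impurity decrease achieved by splitting parent into left and right."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)


# Hypothetical node with four high and four low fluoride samples.
parent = ["high"] * 4 + ["low"] * 4
perfect = gini_decrease(parent, ["high"] * 4, ["low"] * 4)
useless = gini_decrease(parent, ["high", "high", "low", "low"],
                        ["high", "high", "low", "low"])
```

A split that separates the classes completely removes all impurity, whereas a split that leaves both children as mixed as the parent yields no decrease.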
To simplify the model but retain its predictive power, variable selection by recursive feature elimination (RFE) was conducted with the caret package in R (Kuhn, 2009), which backwardly reduces the number of variables by removing the least important one at each step. Random forests were created starting with all 48 variables (Table S4) and working down to just one variable. The predictor subset ultimately chosen was the one for which the corresponding model was simplest while still obtaining a high test AUC score.
For all models, 1000 trees were grown, and the default number of variable candidates at each split (i.e. square root of the number of features) was used. Sampling for each tree was made with replacement and an even balance between high and low fluoride classes. The optimal number of samples from each class was determined by trying 70%, 80%, 90%, and 100% of the minority class (high fluoride) and selecting the sample size with the highest AUC score in 10-fold cross validation. The final random forest model was built with the RFE-selected variables and optimal sample size. A prediction map of the occurrence of fluoride contamination >1.5 mg/L was then created by applying the final random forest model to the predictor datasets using the raster package (Hijmans, 2021).
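The class-balanced sampling for each tree can be sketched as drawing the same number of rows, with replacement, from the high and low classes, where that number is a chosen fraction of the minority class size. The Python sketch below is a hypothetical illustration, not the randomForest package's internal code; the function name and class counts are assumptions.

```python
import random


def balanced_bootstrap(labels, fraction, seed=0):
    """Draw row indices with replacement, taking the same number from each
    class; that number is `fraction` of the minority class size."""
    rng = random.Random(seed)
    high = [i for i, y in enumerate(labels) if y == "high"]
    low = [i for i, y in enumerate(labels) if y == "low"]
    n_per_class = round(fraction * min(len(high), len(low)))
    sample = rng.choices(high, k=n_per_class) + rng.choices(low, k=n_per_class)
    return sample, n_per_class


# Hypothetical training labels: 100 high and 500 low fluoride rows.
labels = ["high"] * 100 + ["low"] * 500
sample, n_per_class = balanced_bootstrap(labels, 0.8)
```

Tuning then amounts to repeating the model fit for each candidate fraction and keeping the one with the best cross-validated AUC.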
We then estimated the population in Pakistan at risk of exposure to high fluoride concentrations from groundwater used as drinking water by multiplying the population density in 2020 (Gao, 2017; Jones and O'Neill, 2016) by the probability of high fluoride concentrations and accounting for the average rate of domestic groundwater usage (39.1%) (World Health Organization and UNICEF, 2019). Only areas above the probability cut-off value at which the accuracy rates for the two classes (sensitivity and specificity) are equal were taken into consideration.
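The exposure estimate described above can be written per grid cell as population = density × cell area × probability × groundwater usage rate, summed over cells whose probability exceeds the cut-off. The Python sketch below is a simplified illustration with a single usage rate and made-up grid values; the function name, the strict ">" comparison and the example numbers are assumptions.

```python
def population_at_risk(density, probability, cell_area_km2, cutoff, gw_rate):
    """Sum density x area x probability x groundwater-usage rate over the
    cells whose predicted probability exceeds the cut-off."""
    total = 0.0
    for dens_row, prob_row in zip(density, probability):
        for dens, prob in zip(dens_row, prob_row):
            if prob > cutoff:
                total += dens * cell_area_km2 * prob * gw_rate
    return total


# Hypothetical 2 x 2 grid of 250 m cells (0.0625 km2 each).
density = [[1000, 200], [50, 4000]]       # people per km2
probability = [[0.9, 0.3], [0.6, 0.2]]    # modeled probability of F > 1.5 mg/L
total = population_at_risk(density, probability, 0.0625, cutoff=0.47, gw_rate=0.391)
```

In this toy grid only the two cells with probabilities 0.9 and 0.6 contribute, so the densely populated cell with probability 0.2 adds nothing to the total.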
3. Results and discussion
3.1. Fluoride concentrations in groundwater
Summary statistics, box plots and spatial distributions of the new groundwater fluoride measurements (n = 2160) are shown in Table S2, Fig. S1, and Fig. S3, respectively. Whereas the data points from previously published studies (Ali et al., 2019; Brahman et al., 2013; Khattak et al., 2022; Naseem et al., 2010; Rafique, 2008; Rafique et al., 2008; Rafique et al., 2009) are confined almost exclusively to the Punjab and Sindh provinces, our data span the entire country (Fig. S5).

Table 1
Correlations and significance of environmental parameters used in the model with fluoride concentrations.
Type | Variable | Resolution | Correlation (p)
Climate (continuous) | Aridity index (Zomer et al., 2007; Zomer et al., 2008) | 1000 m | 0.1276 (3.09E-21)
Climate (continuous) | Actual evapotranspiration (AET) (mm/year) (Trabucco and Zomer, 2010) | 1000 m | 0.0276 (4.17E-02)
Climate (continuous) | Potential evapotranspiration (PET) (mm/year) (Zomer et al., 2007; Zomer et al., 2008) | 1000 m | 0.1550 (1.07E-30)
Climate (continuous) | Precipitation (mm/year) (Fick and Hijmans, 2017) | 1000 m | 0.0412 (2.31E-03)
Climate (continuous) | Temperature (°C) (Fick and Hijmans, 2017) | 1000 m | 0.0902 (2.53E-11)
Soil (continuous) | Silt fraction (g/kg) (Hengl et al., 2017) | 250 m | 0.0319 (1.84E-02)
Soil (continuous) | Nitrogen fraction (cg/kg) (Hengl et al., 2017) | 250 m | 0.1363 (4.83E-24)
Soil (continuous) | Coarse fragments fraction (cm³/dm³) (Hengl et al., 2017) | 250 m | 0.2515 (1.75E-79)
Soil (continuous) | Organic carbon density (g/dm³) (Hengl et al., 2017) | 250 m | 0.1111 (1.91E-16)
Soil (categorical) | Arenosols (Hengl et al., 2017) | 250 m | 0.6677
Soil (categorical) | Calcisols (Hengl et al., 2017) | 250 m | 0.3010
Soil (categorical) | Cambisols (Hengl et al., 2017) | 250 m | 0.0993
Soil (categorical) | Gypsisols (Hengl et al., 2017) | 250 m | 0
Soil (categorical) | Solonchaks (Hengl et al., 2017) | 250 m | 0.0376
Soil (categorical) | Solonetz (Hengl et al., 2017) | 250 m | 0.0500
Lithology (categorical) | Carbonate sedimentary rocks (Hengl, 2018) | 250 m | 0.2468
Lithology (categorical) | Metamorphic rocks (Hengl, 2018) | 250 m | 0.0566
Lithology (categorical) | Mixed sedimentary rocks (Hengl, 2018) | 250 m | 0.2025
Lithology (categorical) | Evaporite (Hengl, 2018) | 250 m | 0.1644
Lithology (categorical) | Siliciclastic sedimentary rocks (Hengl, 2018) | 250 m | 0.0222
Topography (continuous) | Elevation (m) (Verdin, 2017) | 100 m | 0.0275 (4.22E-02)
Land cover (categorical) | Shrubland (Buchhorn et al., 2020) | Polygon | 0.5664
Land cover (categorical) | Cropland (Buchhorn et al., 2020) | Polygon | 0.0885
Land cover (categorical) | Herbaceous vegetation (Buchhorn et al., 2020) | Polygon | 0.3839
High fluoride concentrations were detected in all regions, including maximum values of 27.5 mg/L in Punjab and 33.3 mg/L in Sindh. However, the average concentrations by province are all less than 1.5 mg/L (Table S2). As seen in the inset of Fig. S3, high-fluoride areas were discovered in the Sargodha Division in northwest Punjab, which had an average measured fluoride concentration of 1.8 mg/L. Particularly affected are the upper-Thal Desert districts of Bhakkar, Khushab and Mianwali.
Correlations between fluoride and other geochemical indicators are shown in Fig. S2. The alkaline environment and the presence of bicarbonates create a favorable condition for high fluoride waters, while calcium ions suppress fluoride in groundwater.
3.2. Prediction modeling
3.2.1. Random forest model
Of the 48 variables considered, eight were selected by the RFE process: actual evapotranspiration (AET), aridity index, coarse fragments fraction, elevation, nitrogen fraction, PET, precipitation, and temperature. The RFE algorithm selects variables based on their importance, which is generally lower in categorical variables than in continuous ones due to only a fixed (generally small) number of possible values in the former. The following binary predictors were therefore manually added: arenosols, calcisols, cambisols, carbonate sedimentary rocks, cropland, evaporite, gypsisols, herbaceous vegetation, mixed sedimentary rocks, shrubland, siliciclastic (noncarbonate) rocks, solonchaks, and solonetz, which are considered to be important from a geochemical standpoint, for example, by acting as a sink or source of fluoride (Ali et al., 2019; García and Borgnino, 2015; Podgorski et al., 2018). The optimal sample size determined by tuning was 562, which is 80% of the number of the minority class (high fluoride) in the training dataset.
The final random forest model attained an AUC score of 0.92 as determined with the test dataset (Fig. 2b). The cut-off value of 0.47 was found at the point at which sensitivity equals specificity, that is, where the accuracy rates for the two classes are evenly balanced. Using this cut-off, the overall accuracy with the test set is 0.83, which is comparable to the out-of-bag accuracy of 0.82 (accuracy calculated with out-of-bag samples during training), confirming that the distribution of data in the training and testing datasets is generally similar.
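Finding the cut-off at which sensitivity equals specificity can be sketched as a scan over candidate thresholds, keeping the one with the smallest gap between the two rates. The Python example below is illustrative; the function name, the convention that scores at or above the cut-off count as positive, and the toy data are assumptions rather than details from the study.

```python
def balanced_cutoff(labels, scores):
    """Return (cut-off, sensitivity, specificity) for the candidate cut-off
    at which |sensitivity - specificity| is smallest."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cut in sorted(set(scores)):
        # Scores at or above the cut-off are classified as positive.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cut)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cut)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or abs(sens - spec) < abs(best[1] - best[2]):
            best = (cut, sens, spec)
    return best


# Hypothetical test labels (1 = high fluoride) and predicted probabilities.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
cutoff, sensitivity, specificity = balanced_cutoff(labels, scores)
```

With finite data the two rates may never be exactly equal, so the sketch returns the closest available cut-off.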
Unsurprisingly, the measured importance of the continuous variables is higher than that of the binary ones (Fig. 2a). The climate predictors (temperature, precipitation, PET, aridity index, and AET) as well as nitrogen content, coarse fragments fraction, and elevation received the highest importance. Among the binary predictors, calcisols and cropland are the most important.
3.2.2. Prediction map
The prediction map derived from the random forest model (Fig. 3) indicates that approximately 30% of Pakistan is at risk of fluoride concentrations in groundwater exceeding 1.5 mg/L. Two particularly high-hazard areas include the Thar Desert (Sindh Province) and the Bhakkar, Mianwali, and Khushab districts in the upper Thal Desert (Sargodha Division, Punjab Province). Both locations have an arid climate and are situated within the flat-lying Indus Plain. Higher probabilities are also found in the Sulaiman Mountains in eastern Balochistan. While most areas of the prediction map clearly show higher or lower hazard, the determination is more ambiguous in southwestern Balochistan, where the probabilities are around 0.50. This may result from a relative lack of measurements to adequately sample the distribution of fluoride in this region, whereas most of the data used in the model stem from the Indus Plain.
The fluoride prediction map was compared with a similar fluoride map of India (Podgorski et al., 2018) (Fig. S8). High probabilities of fluoride exceeding 1.5 mg/L in northern Punjab align very well with those across the border in Indian Punjab. Similarly in the Thar Desert, which forms a natural boundary between Pakistan and India, the predictions on either side of the border are compatible. Whereas these sections are well constrained by fluoride data, border areas in between match less well, which may be due in part to a lack of measurements in southern Pakistani Punjab.
Since much of Pakistan has a warm, dry climate and people consequently drink more water, a prediction model was also created for fluoride concentrations exceeding 1.0 mg/L (Fig. S7). However, with only 543 of the 5483 measurements falling in the range of 1.0–1.5 mg/L, the high-hazard regions of the prediction maps for 1.5 mg/L (Fig. 3) and 1.0 mg/L (Fig. S7) are largely similar. For example, higher probabilities for 1.0 mg/L were found for central Pakistan near the borders of Punjab, Khyber Pakhtunkhwa, and Balochistan. The selection of variables by RFE was also similar for both models (1.0 mg/L and 1.5 mg/L).
3.2.3. Predictor importance
Some of the most important predictors are the climate parameters (Fig. 2a). The correlations in Table S4 also confirm that drier climate conditions favor fluoride release. High importance was also found for the elevation variable, which is strongly negatively correlated with temperature.
Fig. 2. Random forest modeling results of groundwater fluoride concentrations in Pakistan exceeding the WHO guideline of 1.5 mg/L. a) Variable importance plots of Mean
Decrease Accuracy and Mean Decrease Gini Impurity for each variable in the random forest model. Because they can take only two possible values, the binary variables
(bin.) show lower importance scores relative to the continuous variables. b) ROC curve (AUC score: 0.92) of the model response using the test dataset plotted with the
true positive rate (sensitivity) against the false positive rate (1 − specificity). The red dot indicates the cut-off value of 0.47. (For interpretation of the references to color in this
figure legend, the reader is referred to the web version of this article.)
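How an ROC curve, its AUC, and a probability cut-off relate to one another can be sketched as follows. This is a minimal illustration and not the authors' code: the labels and probabilities are hypothetical, and the criterion used here to pick the cut-off (the point where sensitivity and specificity are closest) is an assumption, since this section does not state how the value of 0.47 was selected.

```python
def roc_points(y_true, scores):
    """Return (threshold, TPR, FPR) for every distinct score used as a cut-off."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((t, tp / positives, fp / negatives))
    return points


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_cutoff(points):
    """Cut-off where sensitivity and specificity are closest (assumed criterion)."""
    return min(points, key=lambda p: abs(p[1] - (1 - p[2])))[0]


# Hypothetical test labels (1 = above 1.5 mg/L) and predicted probabilities.
y_test = [0, 0, 1, 0, 1, 1, 0, 1]
p_test = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.55, 0.90]

print(auc(y_test, p_test), balanced_cutoff(roc_points(y_test, p_test)))
```

Other criteria, such as maximizing the sum of sensitivity and specificity, can yield a different cut-off on the same curve.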
Y. Ling et al. Science of the Total Environment 839 (2022) 156058
The two continuous soil variables, fraction of coarse soil fragments and
nitrogen fraction, also have high importance measures. The fraction of
coarse fragments is high in the Sulaiman Mountains of eastern Balochistan
and western Punjab and Sindh, which are composed of mixed or carbonate
sedimentary rock that may contain fluoride-bearing minerals (Fig. 4). On
the other hand, nitrogen fraction is associated closely with the presence
of forested mountains in the north, which have lower temperatures and where
the sparse measurements generally show low fluoride concentrations. The
“calcisols” binary soil predictor, which is associated with substantial
accumulations of lime, is connected to the presence of high fluoride
concentrations, as the precipitation of calcite removes calcium from
solution and results in higher fluoride concentrations (Banerjee, 2015).
Carbonate sedimentary rocks can represent an important source of fluorite
(García and Borgnino, 2015), and silicate minerals may contain small
amounts of fluorine. Shrubland and herbaceous vegetation are the principal
vegetation cover types in the highly contaminated Thar Desert and are thus
assigned high importance in the model, though it is unclear whether this
relationship exists elsewhere.
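The idea behind the Mean Decrease Accuracy measure discussed above can be sketched as a permutation importance: one predictor is shuffled and the resulting loss of accuracy is recorded. The sketch below is a simplified illustration under stated assumptions, not the procedure of the random forest software used in the study; `toy_model`, the data rows, and the function names are hypothetical stand-ins.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, column, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling the values of one predictor column."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:column] + (v,) + r[column + 1:] for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats


# Hypothetical stand-in for a trained classifier: it only looks at column 0
# (an aridity-like predictor) and ignores column 1 entirely.
def toy_model(row):
    return 1 if row[0] > 0.5 else 0


rows = [(0.1, 3.0), (0.9, 1.0), (0.7, 2.0), (0.2, 5.0), (0.8, 4.0), (0.3, 0.5)]
labels = [toy_model(r) for r in rows]

importance_used = permutation_importance(toy_model, rows, labels, column=0)
importance_ignored = permutation_importance(toy_model, rows, labels, column=1)
print(importance_used, importance_ignored)
```

A predictor the model relies on loses accuracy when shuffled, whereas an ignored predictor scores zero. A high score therefore reflects the model's reliance on a variable, which, as noted for the vegetation cover, need not imply a causal relationship that holds elsewhere.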
In the Thal Desert (Punjab), calcisols likely control Ca²⁺ levels and
enhance the dissolution of F-bearing minerals. Moreover, alkaline pH
conditions can promote F⁻ leaching by ion exchange processes. This is
particularly the case along the Jhelum River, where water samples are
dominated by the Na-Cl type of water (Ali et al., 2019). A rapid change in
aquifer recharge in recent years has resulted in increased fluoride levels
(Younas et al., 2019). Highly alkaline waters in the Lahore and Kasur areas,
which experience excessive pumping, are also associated with high fluoride
leaching (Farooqi et al., 2007).
While calcisols are also the dominant soil type in the Sulaiman Mountains
(Fig. 4d), fluoride levels are not as high as in the Thal Desert, possibly
due to a stable water table. In the south, in the Thar Desert (Sindh), dolomite
dissolution and arid climatic conditions promote evaporation and
the dissolution of evaporites, contributing to the formation of saline
groundwater.
3.3. Health risk map
A health risk map was made using the optimal cut-off value of 0.47
(Fig. 5). It identifies numerous densely populated regions where people
rely on groundwater associated with high fluoride. In total, over 13 million
people are estimated to be potentially affected by fluoride contamination
in groundwater, which is 6.0% of the total population of
Pakistan. However, this number would be even larger if the population
in regions with probabilities under the cut-off of 0.47 were taken
into consideration. With the increasing population in Pakistan (population
growth rate of around 2% in 2020) (World Bank, 2020), the
problem may become more severe in the future if the reliance on
groundwater remains high. Large at-risk populations are found in northern
Punjab, Islamabad, and Khyber Pakhtunkhwa. Furthermore, the
fluoride risk map (Fig. 5) indicates that residents in the cities of Lahore,
Sargodha, Depalpur, Peshawar, Bannu, Karachi, Quetta, and others are
at high risk, which is confirmed by the high prevalence of fluorosis in
some of these cities (Ahmad et al., 2020; Mohsin et al., 2014; Rahman
et al., 2018; Sami et al., 2016). Conversely, the total number of people
at risk in the Thar Desert (SE Sindh), which has high probabilities of
fluoride contamination, is far smaller owing to its sparse population,
yet residents there are still at high risk.
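The way such an at-risk population estimate can be assembled from gridded data is sketched below. This is a minimal illustration under stated assumptions and not the authors' workflow: cells are treated as high hazard when their probability is strictly above the cut-off, the population of each such cell is scaled by a groundwater-use fraction for its setting, and all cell values and use fractions are hypothetical. Only the cut-off of 0.47 and the figures of 13 million and 6.0% come from the text.

```python
CUTOFF = 0.47  # cut-off value reported in the text


def population_at_risk(cells, groundwater_use, cutoff=CUTOFF):
    """Sum the groundwater-drinking population in cells classified as high hazard."""
    total = 0.0
    for cell in cells:
        if cell["probability"] > cutoff:
            total += cell["population"] * groundwater_use[cell["setting"]]
    return total


# Hypothetical grid cells and groundwater-use fractions (illustrative only).
cells = [
    {"probability": 0.80, "population": 1000, "setting": "rural"},
    {"probability": 0.30, "population": 2000, "setting": "urban"},
    {"probability": 0.55, "population": 500, "setting": "urban"},
    {"probability": 0.47, "population": 400, "setting": "rural"},
]
groundwater_use = {"rural": 0.75, "urban": 0.5}

at_risk = population_at_risk(cells, groundwater_use)

# Rough consistency check on the reported figures: 13 million at 6.0%
# implies a total population on the order of 217 million.
implied_total_millions = round(13e6 / 0.06 / 1e6)

print(at_risk, implied_total_millions)
```

Cells at or below the cut-off contribute nothing in this sketch, which mirrors the remark above that the estimate would grow if lower-probability regions were also counted.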
The presented probability and health risk maps (Figs. 3 and 5) raise
awareness about fluoride contamination and its adverse health impacts
in Pakistan. Furthermore, they can help authorities in prioritizing areas
for implementing mitigation measures. These could include monitoring
programs for drinking water wells or fluoride removal, e.g. adsorption
treatment (Bhatnagar et al., 2011) or membrane separation processes
(Waghmare and Arfin, 2015), and improving health management systems.
Compared to the previous nationwide representation of fluoride
at the sub-tehsil scale (Khan et al., 2002), the novel maps presented
here have a 3–4 orders of magnitude higher spatial resolution (250 m),
are based on much larger new datasets, and predict the probability of
high groundwater fluoride for areas where data are lacking. Also, in relation
to a recent study that contains clusters of many groundwater fluoride
measurements across much of Punjab (Khattak et al., 2021), the new maps
identify hotspots, e.g. in the Sargodha Division, that the earlier study
did not uncover.
Fig. 3. Prediction map at a resolution of approximately 250 m of groundwater fluoride in Pakistan exceeding the WHO guideline of 1.5 mg/L (AUC score is 0.92). a) Full map
with locations of insets. b) Fluoride hotspot of the Sargodha Division in the upper Thal Desert (Bhakkar, Mianwali, Khushab districts). c) Hotspots in the Kasur and densely
populated Lahore districts. d) Hotspot of the Thar Desert. The prediction map is also available for viewing at high resolution on the GIS-based Groundwater Assessment
Platform (GAP), www.gapmaps.org.
Fig. 4. Maps of selected predictor variables of a) coarse fragments fraction, b) AET, c) lithology, and d) soil groups.
Fig. 5. Population at risk of exposure to fluoride concentrations in groundwater exceeding 1.5 mg/L.
4. Conclusions
This study, which presents a large new dataset of fluoride in groundwater
across Pakistan, combined with geospatial modeling and risk
mapping using various environmental predictors, highlights several regions
where exposure to high fluoride levels poses a significant public
health risk. Hot spots include the Thal Desert in Punjab (Sargodha Division),
the Thar Desert in Sindh, and the Sulaiman Mountains in the western
part of the country. Analysis of the importance of the predictor
variables and their correlation with fluoride shows that high fluoride
concentrations in groundwater are favored by arid climatic conditions
with high temperatures and evapotranspiration, the presence of
fluoride-bearing minerals (e.g. carbonate sedimentary rock), and the
presence of calcisols.
Knowing the countrywide groundwater fluoride risk and affected populations
should help authorities and water resource managers identify
fluoride-contaminated wells and mitigate the risk for residents. All
groundwater wells in areas with a high probability (e.g., above the cut-off
value of 0.47) should be tested, for instance, in the Thar Desert and the
Sargodha Division (especially the Bhakkar, Mianwali, and Khushab districts
in the upper Thal Desert). Particular attention should also be paid to risk
areas with a high population density such as Lahore, Sargodha, Depalpur,
Peshawar, and Bannu. Mitigation measures include monitoring, provision
of alternative sources of drinking water, fluoride removal treatment, and
awareness-raising campaigns. These maps are not a replacement for actual
groundwater testing but indicate hazard and risk for drinking water use.
Future work could consider additional groundwater contaminants, e.g.
uranium, nitrate, pesticides, or salinity, in order to obtain a more
comprehensive understanding of the safety of groundwater. Model accuracy
could be further improved by incorporating additional data and other
predictor variables, such as hydrological parameters, if available.
CRediT authorship contribution statement
Yuya Ling: Data curation, Formal analysis, Investigation, Methodology,
Software, Validation, Visualization, Writing – original draft, Writing –
review & editing. Joel Podgorski: Conceptualization, Data curation,
Formal analysis, Funding acquisition, Investigation, Methodology, Project
administration, Software, Validation, Visualization, Writing – review &
editing. Muhammad Sadiq: Data curation, Formal analysis, Validation.
Hifza Rasheed: Data curation, Formal analysis, Validation. Syed Ali
Musstjab Akber Shah Eqani: Conceptualization, Funding acquisition,
Project administration, Supervision, Data curation, Validation, Writing –
review & editing. Michael Berg: Conceptualization, Funding acquisition,
Investigation, Methodology, Project administration, Supervision,
Visualization, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests
or personal relationships that could have appeared to influence the
work reported in this paper.
Acknowledgments
We are grateful to Peter Molnar for valuable input and feedback on this
work. This project benefitted from financial support of the Swiss Agency
for Development and Cooperation (project no. 7F-09963.01.01). M.S. and
S.A.M.A.S.E. acknowledge the Higher Education Commission of Pakistan
for project funding (project no. 20-14825/NRPU/R&D/HEC/2021 2021)
and the scholarship award (project no. 1-8/HEC/HRD/2021/10887).
Appendix A. Supplementary data
Supplementary data to this article can be found online at
https://doi.org/10.1016/j.scitotenv.2022.156058.
References
Ahmad, M., Jamal, A., Tang, X.W., Al-Sughaiyer, M.A., Al-Ahmadi, H.M., Ahmad, F., 2020. Assessing potable water quality and identifying areas of waterborne diarrheal and fluorosis health risks using spatial interpolation in Peshawar, Pakistan. Water 12 (8).
Ali, W., Aslam, M.W., Junaid, M., Ali, K., Guo, Y.K., Rasool, A., Zhang, H., 2019. Elucidating various geochemical mechanisms drive fluoride contamination in unconfined aquifers along the major rivers in Sindh and Punjab, Pakistan. Environ. Pollut. 249, 535–549.
Amini, M., Mueller, K., Abbaspour, K.C., Rosenberg, T., Afyuni, M., Moller, K.N., et al., 2008. Statistical modeling of global geogenic fluoride contamination in groundwaters. Environ. Sci. Technol. 42 (10), 3662–3668.
Ayoob, S., Gupta, A.K., 2006. Fluoride in drinking water: a review on the status and stress effects. Crit. Rev. Environ. Sci. Technol. 36 (6), 433–487.
Banerjee, A., 2015. Groundwater fluoride contamination: a reappraisal. Geosci. Front. 6 (2), 277–284.
Bhatnagar, A., Kumar, E., Sillanpää, M., 2011. Fluoride removal from water by adsorption – a review. Chem. Eng. J. 171 (3), 811–840.
Bhowmik, A.K., Alamdar, A., Katsoyiannis, I., Shen, H., Ali, N., Ali, S.M., Eqani, S.A.M.A.S., 2015. Mapping human health risks from exposure to trace metal contamination of drinking water sources in Pakistan. Sci. Total Environ. 538, 306–316.
Biau, G., Scornet, E., 2016. A random forest guided tour. TEST 25 (2), 197–227.
Bo, Z., Mei, H., Yongsheng, Z., Xueyu, L., Xuelin, Z., Jun, D., 2003. Distribution and risk assessment of fluoride in drinking water in the west plain region of Jilin province, China. Environ. Geochem. Health 25 (4), 421–431.
Brahman, K.D., Kazi, T.G., Afridi, H.I., Naseem, S., Arain, S.S., Ullah, N., 2013. Evaluation of high levels of fluoride, arsenic species and other physicochemical parameters in underground water of two sub districts of Tharparkar, Pakistan: a multivariate study. Water Res. 47 (3), 1005–1020.
Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32.
Buchhorn, M., Smets, B., Bertels, L., De Roo, B., Lesiv, M., Tsendbazar, N.-E., et al., 2020. Copernicus Global Land Service: Land Cover 100m: Collection 3 Epoch 2015, Globe (Version V3.0.1) [Data set].
Chen, W., Xie, X.S., Wang, J.L., Pradhan, B., Hong, H.Y., Bui, D.T., et al., 2017. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. Catena 151, 147–160.
Edmunds, W.M., Smedley, P.L., 2013. Fluoride in natural waters. Essentials of Medical Geology. Springer, pp. 311–336.
Erickson, M.L., Elliott, S.M., Brown, C.J., Stackelberg, P.E., Ransom, K.M., Reddy, J.E., Cravotta III, C.A., 2021. Machine-learning predictions of high arsenic and high manganese at drinking water depths of the glacial aquifer system, northern continental United States. Environ. Sci. Technol. 55 (9), 5791–5805.
Farooqi, A., 2015. Arsenic and Fluoride Contamination. A Pakistan Perspective.
Farooqi, A., Masuda, H., Firdous, N., 2007. Toxic fluoride and arsenic contaminated groundwater in the Lahore and Kasur districts, Punjab, Pakistan and possible contaminant sources. Environ. Pollut. 145 (3), 839–849.
Fick, S.E., Hijmans, R.J., 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. Int. J. Climatol. 37 (12), 4302–4315.
Gao, J., 2017. Downscaling Global Spatial Population Projections From 1/8-Degree to 1-km Grid Cells. National Center for Atmospheric Research, Boulder, CO, USA.
García, M.G., Borgnino, L., 2015. Fluoride in the Context of the Environment.
Greenman, D.W., Swarzenski, W.V., Bennett, G.D., 1967. Ground-water Hydrology of the Punjab, West Pakistan, With Emphasis on Problems Caused by Canal Irrigation. Government Printing Office.
Handa, B., 1975. Geochemistry and genesis of fluoride-containing ground waters in India. Groundwater 13 (3), 275–281.
Hengl, T., 2018. Global Landform and Lithology Class at 250 m Based on the USGS Global Ecosystem Map. Zenodo.
Hengl, T., de Jesus, J.M., Heuvelink, G.B.M., Gonzalez, M.R., Kilibarda, M., Blagotic, A., et al., 2017. SoilGrids250m: global gridded soil information based on machine learning. PLoS One 12 (2).
Hijmans, R.J., 2021. Geographic Data Analysis and Modeling [R Package Raster Version 3.4-10].
Huang, J., Ling, C.X., 2005. Using AUC and accuracy in evaluating learning algorithms. IEEE Trans. Knowl. Data Eng. 17 (3), 299–310.
Iwasaki, A., 2007. Mucosal dendritic cells. Annu. Rev. Immunol. 25, 381–418.
James, G., Witten, D., Hastie, T., Tibshirani, R., 2013. An Introduction to Statistical Learning. 112. Springer.
Jones, B., O'Neill, B.C., 2016. Spatially explicit global population scenarios consistent with the shared socioeconomic pathways. Environ. Res. Lett. 11 (8), 084003.
Khan, A., Whelton, H., O'Mullane, D., 2002. A map of natural fluoride in drinking water in Pakistan. Int. Dent. J. 52 (4), 291–297.
Khan, A., Whelton, H., O'Mullane, D., 2004. Determining the optimal concentration of fluoride in drinking water in Pakistan. Community Dent. Oral Epidemiol. 32 (3), 166–172.
Khan, S., Moheet, I.A., Farooq, I., Farooqi, F.A., ArRejaie, A.S., Al Abbad, M.H.A., Khabeer, A., 2015. Prevalence of dental fluorosis in school going children of Dammam, Saudi Arabia. J. Dent. Allied Sci. 4 (2), 69.
Khattak, J.A., Farooqi, A., Hussain, I., Kumar, A., Singh, C.K., Mailloux, B.J., et al., 2021. Groundwater fluoride across the Punjab plains of Pakistan and India: distribution and underlying mechanisms. Sci. Total Environ. 151353.
Khattak, J.A., Farooqi, A., Hussain, I., Kumar, A., Singh, C.K., Mailloux, B.J., et al., 2022. Groundwater fluoride across the Punjab plains of Pakistan and India: distribution and underlying mechanisms. Sci. Total Environ. 806, 151353.
Khwaja, M.A., Aslam, A., 2018. Comparative Assessment of Pakistan National Drinking Water Quality Standards With Selected Asian Countries and World Health Organization.
Kuhn, M., 2009. The caret package. J. Stat. Softw. 28 (5).
Kumar, M., Das, N., Goswami, R., Sarma, K.P., Bhattacharya, P., Ramanathan, A.L., 2016. Coupling fractionation and batch desorption to understand arsenic and fluoride co-contamination in the aquifer system. Chemosphere 164, 657–667.
Kumar, M., Goswami, R., Patel, A.K., Srivastava, M., Das, N., 2020. Scenario, perspectives and mechanism of arsenic and fluoride co-occurrence in the groundwater: a review. Chemosphere 249.
Liaw, A., Wiener, M., 2002. Classification and regression by randomForest. R News 2 (3), 18–22.
Mohsin, A., Hakeem, S., Arain, A.H., Ali, T., Mirza, D., 2014. Frequency and severity of dental fluorosis among school children in Gadap Town, Karachi. Pakistan Oral & Dental Journal 34 (4).
Naseem, S., Rafique, T., Bashir, E., Bhanger, M.I., Laghari, A., Usmani, T.H., 2010. Lithological influences on occurrence of high-fluoride groundwater in Nagar Parkar area, Thar Desert, Pakistan. Chemosphere 78 (11), 1313–1321.
Ozsvath, D.L., 2009. Fluoride and Environmental Health: A Review. 8 (1), pp. 59–79.
PCRWR, 2015. Pakistan Council of Research in Water Resources PCRWR, Pakistan.
Podgorski, J., Berg, M., 2020. Global threat of arsenic in groundwater. Science 368 (6493), 845.
Podgorski, J., Eqani, S.A.M.A.S., Khanam, T., Ullah, R., Shen, H.Q., Berg, M., 2017. Extensive arsenic contamination in high-pH unconfined aquifers in the Indus Valley. Sci. Adv. 3 (8).
Podgorski, J., Labhasetwar, P., Saha, D., Berg, M., 2018. Prediction modeling and mapping of groundwater fluoride contamination throughout India. Environ. Sci. Technol. 52 (17), 9889–9898.
Podgorski, J., Wu, R., Chakravorty, B., Polya, D.A., 2020. Groundwater arsenic distribution in India by machine learning geospatial modeling. Int. J. Environ. Res. Public Health 17 (19), 7119.
Qureshi, A.S., 2020. Groundwater governance in Pakistan: from colossal development to neglected management. Water 12 (11).
R Core Team, 2013. R: A Language and Environment for Statistical Computing.
Rafique, T., 2008. Occurrence, Distribution and Origin of Fluoride-rich Groundwater in the Thar Desert. Sindh University Jamshoro, Pakistan.
Rafique, T., Naseem, S., Bhanger, M.I., Usmani, T.H., 2008. Fluoride ion contamination in the groundwater of Mithi sub-district, the Thar Desert, Pakistan. Environ. Geol. 56 (2), 317–326.
Rafique, T., Naseem, S., Usmani, T.H., Bashir, E., Khan, F.A., Bhanger, M.I., 2009. Geochemical factors controlling the occurrence of high fluoride groundwater in the Nagar Parkar area, Sindh, Pakistan. J. Hazard. Mater. 171 (1–3), 424–430.
Rafique, T., Ahmed, I., Soomro, F., Khan, M.H., Shirin, K., 2015. Fluoride levels in urine, blood plasma and serum of people living in an endemic fluorosis area in the Thar Desert, Pakistan. J. Chem. Soc. Pak. 37 (6), 1212–1219.
Rahman, Z.U., Khan, B., Ahmada, I., Mian, I.A., Saeed, A., Afaq, A., et al., 2018. A review of groundwater fluoride contamination in Pakistan and an assessment of the risk of fluorosis. Fluoride 51 (2), 171–181.
Rasheed, H., Iqbal, N., Ashraf, M., ul Hasan, F., 2022. Groundwater quality and availability assessment: a case study of District Jhelum in the Upper Indus, Pakistan. Environ. Adv. 7, 100148.
Rasool, A., Farooqi, A., Xiao, T., Ali, W., Noor, S., Abiola, O., et al., 2018. A review of global outlook on fluoride contamination in groundwater with prominence on the Pakistan current situation. Environ. Geochem. Health 40 (4), 1265–1281.
Raza, M., Hussain, F., Lee, J.Y., Shakoor, M.B., Kwon, K.D., 2017. Groundwater status in Pakistan: a review of contamination, health risks, and potential needs. Crit. Rev. Environ. Sci. Technol. 47 (18), 1713–1762.
Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N., Prabhat, 2019. Deep learning and process understanding for data-driven earth system science. Nature 566 (7743), 195–204.
Sami, E., Vichayanrat, T., Satitvipawee, P., 2016. Caries with dental fluorosis and oral health behaviour among 12-year school children in moderate-fluoride drinking water community in Quetta, Pakistan. J. Coll. Phys. Surg. Pakistan 26 (9), 744–747.
Sanaullah, M., Mehmood, Q., Ahmad, S.R., Rehman, H.U., 2019. Arsenic contamination trends of abandoned river banks: a case study at the left bank of river Ravi, Lahore. Int. J. Econ. Environ. Geol. 21–24.
Shah, S., Bandekar, K., 1998. Drinking water compared to WHO guidelines (1993). Indian Waterworks Assoc. 30, 179–184.
Singh, C.K., Mukherjee, S., 2015. Aqueous geochemistry of fluoride enriched groundwater in arid part of Western India. Environ. Sci. Pollut. Res. 22 (4), 2668–2678.
Tharwat, A., 2020. Classification assessment methods. Appl. Comput. Informatics 17 (1).
Trabucco, A., Zomer, R.J., 2010. Global soil water balance geospatial database. CGIAR Consortium for Spatial Information. https://cgiarcsi.community.
Ullah, Zahid, et al., 2022. Arsenic contamination, water toxicity, source apportionment, and potential health risk in groundwater of Jhelum Basin, Punjab, Pakistan. Biol. Trace Elem. Res. 1–11.
Verdin, K., 2017. Hydrologic Derivatives for Modeling and Applications (HDMA) Database: US Geological Survey Data Release.
Waghmare, S.S., Arfin, T., 2015. Fluoride removal from water by various techniques. Int. J. Innov. Sci. Eng. Technol. 2 (9), 560–571.
Wang, B.B., Zheng, B.S., Zhai, C., Yu, G.Q., Liu, X.J., 2004. Relationship between fluorine in drinking water and dental health of residents in some large cities in China. Environ. Int. 30 (8), 1067–1073.
WAPDA/EUAD, 1989. Booklet on Hydrogeological Map of Pakistan, 1:2,000,000 Scale. Water & Power Development Authority, Lahore and Environment & Urban Affairs Division, Govt. of Pakistan, Islamabad.
Winkel, L., Berg, M., Amini, M., Hug, S.J., Johnson, C.A., 2008. Predicting groundwater arsenic contamination in Southeast Asia from surface parameters. Nat. Geosci. 1 (8), 536–542.
World Bank, 2020. Population growth (annual), Pakistan. https://data.worldbank.org/indicator/SP.POP.GROW?locations=PK.
World Health Organization, 1994. Expert committee on oral health status and fluoride use. Fluorides and Oral Health. WHO Technical Report Series. 846.
World Health Organization, 2011. Guidelines for drinking-water quality. WHO Chronicle 38 (4), 104–108.
World Health Organization–UNICEF, 2019. Joint Monitoring Programme for Water Supply, Sanitation and Hygiene: Estimates on the Use of Water, Sanitation and Hygiene in Pakistan.
Wu, R., Podgorski, J., Berg, M., Polya, D.A., 2021. Geostatistical model of the spatial distribution of arsenic in groundwaters in Gujarat State, India. Environ. Geochem. Health 43 (7), 2649–2664.
Younas, A., Mushtaq, N., Khattak, J.A., Javed, T., Rehman, H.U., Farooqi, A., 2019. High levels of fluoride contamination in groundwater of the semi-arid alluvial aquifers, Pakistan: evaluating the recharge sources and geochemical identification via stable isotopes and other major elemental data. Environ. Sci. Pollut. Res. 26 (35), 35728–35741.
Zomer, R.J., Bossio, D.A., Trabucco, A., Yuanjie, L., Gupta, D.C., Singh, V.P., 2007. Trees and Water: Smallholder Agroforestry on Irrigated Lands in Northern India. 122. IWMI.
Zomer, R.J., Trabucco, A., Bossio, D.A., Verchot, L.V., 2008. Climate change mitigation: a spatial analysis of global land suitability for clean development mechanism afforestation and reforestation. Agric. Ecosyst. Environ. 126 (1–2), 67–80.
Y. Ling et al. Science of the Total Environment 839 (2022) 156058
9
... Machine learning methods can effectively solve these problems [23][24][25]; they can establish models between driving factors and measurements of elements, such as fluoride, selenium, and other potentially toxic elements, to predict their distribution [26][27][28]. Moreover, the random forest machine learning method is effective in modelling binary target variables (for example, fluoride concentrations above a threshold) and predicting their occurrence based on relevant geospatial predictors [29,30]. ...
... Precipitation replenishes surface water and groundwater, diluting fluoride concentrations, and evapotranspiration concentrates it. Therefore, a dry climate is conducive to fluoride accumulation in natural waters, which is consistent with our results and those in the literature [11,29]. However, climatic conditions can connect to other factors, indirectly driving fluoride concentrations in natural water. ...
... In Ethiopia, India, Tanzania, Mexico, China and other countries, the average fluoride concentration in drinking water is approximately 2 mg/L (7)(8)(9)(10)(11)(12). Therefore, the World Health Organization has set the maximum fluoride concentration in water to 1.5 mg/L (13). Long-term ingestion of fluoride causes fluorine to accumulate in the body, which can damage bone tissues and the cardiovascular, nervous, reproductive and digestive systems, as well as result in a range of adverse effects on immune system (14,15). ...
Article
Full-text available
Excessive fluoride intake from residential environments may affect multiple tissues and organs; however, the specific pathogenic mechanisms are unclear. Researchers have recently focused on the damaging effects of fluoride on the immune system. Damage to immune function seriously affects the quality of life of fluoride-exposed populations and increases the incidence of infections and malignant tumors. Probing the mechanism of damage to immune function caused by fluoride helps identify effective drugs and methods to prevent and treat fluorosis and improve people’s living standards in fluorosis-affected areas. Here, the recent literature on the effects of fluoride on the immune system is reviewed, and research on fluoride damage to the immune system is summarized in terms of three perspectives: immune organs, immune cells, and immune-active substances. We reviewed that excessive fluoride can damage immune organs, lead to immune cells dysfunction and interfere with the expression of immune-active substances. This review aimed to provide a potential direction for future fluorosis research from the perspective of fluoride-induced immune function impairment. In order to seek the key regulatory indicators of fluoride on immune homeostasis in the future.
... To assess the suitability of water for agricultural and drinking purposes, a variety of water quality indices were also studied by researchers (Ganguli et al., 2022;Kouser et al., 2022). Further, use of machine learning algorithms has given a new direction to these studies worldwide (Jaydhar et al., 2022;Ling et al., 2022). In the study by Podgorski et al. (2022), random forest algorithm was utilized to predict the probability of F − concentrations in the groundwaters of Pakistan. ...
Article
The natural and anthropogenic alterations in groundwater are exposing human health to nitrates and fluorides ingestion, especially in arid and semi-arid regions. The study focused on the depth-wise assessment of groundwater aquifers (n = 150) and non-carcinogenic health risks to vulnerable population in semi-arid regions of South, North and Central Punjab, Pakistan. The water quality index delineated the shallow and moderate aquifers unfit for human consumption with high values in North region (88%) as “very poor” to “unsuitable for drinking” as compared to the South and Central regions. The North region (56%) was categorized as high risk zone for nitrate pollution (2.48 times), while fluorides affected aquifers of the South region. The piper diagram elucidated bicarbonates prominence in the North and South, while bicarbonates and chlorides type in the Central region aquifers. The Gibbs plot attributed geogenic exchanges like silicate-weathering and ion-exchange mechanisms as dominant groundwater features. The non-carcinogenic health risks for nitrates were higher in the North region (Children: 1.36E+00 > Adult: 1.34E+00 > Infants: 1.08E+00) while for fluorides, South region (Children: 1.18E+00 > Adults: 1.17E+00) was at risk. The overall health impact was higher in South > North > Central for Children > Adults > Infants, respectively. Nitrates and fluorides made children (age 1–12 years) vulnerable to non-carcinogenic health risks in these regions. It is crucial to prioritize efforts to treat and reduce groundwater pollution to protect human health and the environment.
Article
Full-text available
The present study found that ∼80 million people in India, ∼60 million people in Pakistan, ∼70 million people in Bangladesh, and ∼3 million people in Nepal are exposed to arsenic groundwater contamination above 10 μg/L, while Sri Lanka remains moderately affected. In the case of fluoride contamination, ∼120 million in India, >2 million in Pakistan, and ∼0.5 million in Sri Lanka are exposed to the risk of fluoride above 1.5 mg/L, while Bangladesh and Nepal are mildly affected. The hazard quotient (HQ) for arsenic varied from 0 to 822 in India, 0 to 33 in Pakistan, 0 to 1,051 in Bangladesh, 0 to 582 in Nepal, and 0 to 89 in Sri Lanka. The cancer risk of arsenic varied from 0 to 1.64 × 1−1 in India, 0 to 1.07 × 10−1 in Pakistan, 0 to 2.10 × 10−1 in Bangladesh, 0 to 1.16 × 10−1 in Nepal, and 0 to 1.78 × 10−2 in Sri Lanka. In the case of fluoride, the HQ ranged from 0 to 21 in India, 0 to 33 in Pakistan, 0 to 18 in Bangladesh, 0 to 10 in Nepal, and 0 to 10 in Sri Lanka. Arsenic and fluoride have adverse effects on animals, resulting in chemical poisoning and skeletal fluorosis. Adsorption and membrane filtration have demonstrated outstanding treatment outcomes.
Article
Fluoride is one of the abundant elements found in the Earth's crust and is a global environmental issue. The present work aimed to find the impact of chronic consumption of fluoride contained groundwater on human subjects. Five hundred and twelve volunteers from different areas of Pakistan were recruited. Cholinergic status, acetylcholinesterase and butyrylcholinesterase gene SNPs and pro-inflammatory cytokines were examined. Association analysis, regression and other standard statistical analyses were performed. Physical examination of the fluoride endemic areas' participants revealed the symptoms of dental and skeletal fluorosis. Cholinergic enzymes (AChE and BChE) were significantly increased among different exposure groups. ACHE gene 3'-UTR variant and BCHE K-variant showed a significant association with risk of fluorosis. Pro-inflammatory cytokines (TNF-α, IL-1β and IL-6) were found to be increased and have a significant correlation in response to fluoride exposure and cholinergic enzymes. The study concludes that chronic consumption of high fluoride-contained water is a risk factor for developing low-grade systemic inflammation through the cholinergic pathway and the studied cholinergic gene SNPs were identified to be associated with the risk of flurosis.
Article
Full-text available
Potable groundwater (GW) contamination through arsenic (As) is a commonly reported environmental issue in Pakistan. In order to examine the groundwater quality for As contamination, its geochemical behavior, and other physicochemical parameters, 69 samples from various groundwater sources were collected from the mining area of Pind Dadan Khan, Punjab, Pakistan. The results showed the concentration of elevated As, its source of mobilization, and linked public health risk. Arsenic detected in the groundwater samples varied from 0.5 to 100 µg/L, with an average value of 21.38 µg/L. Forty-two samples were beyond the acceptable limit of 10 µg/L of the WHO for drinking purposes. The statistical summary showed that the groundwater cation concentration was in decreasing order such as Na⁺ > Ca²⁺ > Mg²⁺ > K⁺, while anions were as follows: HCO3⁻ > SO4²⁻ > Cl⁻ > NO3⁻. Hydrochemical facies results depicted that groundwater samples belong to CaHCO3 type. Rock-water interactions control the hydrochemistry of groundwater. Saturation indices’ results indicated the saturation of the groundwater sources for CO3 minerals due to their positive SI values. Such minerals include aragonite, calcite, dolomite, and fluorite. The principal component analysis (PCA) findings possess a total variability of 77.36% suggesting the anthropogenic and geogenic contributing sources of contaminant. The results of the Exposure-health-risk-assessment model for measuring As reveal significant potential carcinogenic risk exceeding the threshold level (value > 10⁻⁴) and HQ level (value > 1.0).
Article
Globally, over 200 million people are chronically exposed to arsenic (As) and/or manganese (Mn) from drinking water. We used machine-learning (ML) boosted regression tree (BRT) models to predict high As (>10 μg/L) and Mn (>300 μg/L) in groundwater from the glacial aquifer system (GLAC), which spans 25 states in the northern United States and provides drinking water to 30 million people. Our BRT models’ predictor variables (PVs) included recently developed three-dimensional estimates of a suite of groundwater age metrics, redox condition, and pH. We also demonstrated a successful approach to significantly improve ML prediction sensitivity for imbalanced data sets (small percentage of high values). We present predictions of the probability of high As and high Mn concentrations in groundwater, and uncertainty, at two nonuniform depth surfaces that represent moving median depths of GLAC domestic and public supply wells within the three-dimensional model domain. Predicted high likelihood of anoxic condition (high iron or low dissolved oxygen), predicted pH, relative well depth, several modeled groundwater age metrics, and hydrologic position were all PVs retained in both models; however, PV importance and influence differed between the models. High-As and high-Mn groundwater was predicted with high likelihood over large portions of the central part of the GLAC.
Article
Groundwater is playing an essential role in expanding irrigated agriculture in many parts of the world. Pakistan is the third-largest user of groundwater for irrigation in the world. The surface water supplies are sufficient to irrigate 27% of the area, whereas the remaining 73% is directly or indirectly irrigated using groundwater. The Punjab province uses more than 90% of the total groundwater abstraction. Currently, 1.2 million private tubewells are working in the country, out of which 85% are in Punjab, 6.4% are in Sindh, 3.8% are in Khyber-Pakhtunkhwa, and 4.8% are in Baluchistan. The total groundwater extraction in Pakistan is about 60 billion m³. The access to groundwater has helped farmers in securing food for the increasing population. However, unchecked groundwater exploitation has created severe environmental problems. These include rapidly falling groundwater levels in the irrigated areas and increased soil salinization problems. The groundwater levels in more than 50% of the irrigated areas of Punjab have dropped below 6 m, resulting in increased pumping cost and degraded groundwater quality. Despite hectic efforts, about 21% of the irrigated area is affected by different levels of salinity. The country has introduced numerous laws and regulations for the sustainable use and management of groundwater resources, but the success has so far been limited. Besides less respect for the law, unavailability of needed data and information, lack of political will and institutional arrangements are the primary reasons for poor groundwater management. Pakistan needs to revisit its strategies to make them adaptable to local conditions. An integrated water resource management approach that brings together relevant government departments, political leadership, knowledge institutions, and other stakeholders could be an attractive option.
Article
Groundwater is a critical resource in India for the supply of drinking water and for irrigation. Its usage is limited not only by its quantity but also by its quality. Among the most important contaminants of groundwater in India is arsenic, which naturally accumulates in some aquifers. In this study we create a random forest model with over 145,000 arsenic concentration measurements and over two dozen predictor variables of surface environmental parameters to produce hazard and exposure maps of the areas and populations potentially exposed to high arsenic concentrations (>10 µg/L) in groundwater. Statistical relationships found between the predictor variables and arsenic measurements are broadly consistent with major geochemical processes known to mobilize arsenic in aquifers. In addition to known high arsenic areas, such as along the Ganges and Brahmaputra rivers, we have identified several other areas around the country that have hitherto not been identified as potential arsenic hotspots. Based on recent reported rates of household groundwater use for rural and urban areas, we estimate that between about 18-30 million people in India are currently at risk of high exposure to arsenic through their drinking water supply. The hazard models here can be used to inform prioritization of groundwater quality testing and environmental public health tracking programs.
Article
Waterborne diseases have become one of the major public health concerns worldwide. This study aimed to investigate and map the spatial distribution of potable water quality parameters in the city of Peshawar, Pakistan. A total of 108 water samples collected across the entire study area were subjected to physicochemical and biological analyses. Tested parameters included pH, turbidity, temperature, fluoride concentration levels, and bacterial counts (faecal coliforms). Inverse distance weighting (IDW) interpolation in geographic information systems (GIS) was used for spatial analysis. Test results revealed that 48% of water samples had faecal coliform counts (per 100 mL) greater than World Health Organization (WHO) minimum limits, while 31% of samples had fluoride concentrations in excess of the WHO maximum guide values. Spatial distribution maps were developed for faecal coliform count and fluoride ion concentration using ArcGIS to highlight the high-risk settlements in the study area. Results showed that around 20% of the area based on faecal coliforms and approximately 33% of the area based on fluoride concentrations fall under the need-for-treatment category. The pH and turbidity were found to comply with WHO desirable limits. The sanitary inspection scores indicated that ineffective multi-barrier approaches had deteriorated the water quality at the consumer's end. Findings from the present study should be useful to policymakers for adopting necessary remedial measures before public health is severely affected.
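As a rough illustration of the inverse distance weighting mentioned in the abstract above, the following minimal sketch estimates a value at an unsampled location as a distance-weighted average of nearby measurements. It is not the study's ArcGIS workflow: the function name `idw_interpolate`, the power parameter of 2, and the sample coordinates and fluoride values are all assumptions chosen for illustration.

```python
import math

def idw_interpolate(samples, x, y, power=2.0):
    """Estimate a value at (x, y) by inverse distance weighting.

    samples: list of (sx, sy, value) tuples (hypothetical measurements).
    power:   distance exponent; 2.0 is a common default, assumed here.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # The query point coincides with a sample: return it directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical fluoride measurements (mg/L) at two locations on a line.
example_samples = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
midpoint_estimate = idw_interpolate(example_samples, 1.0, 0.0)
```

At a point equidistant from both samples the weights are equal, so the estimate is simply the mean of the two values; closer to one sample, that sample dominates.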
Article
Geogenic arsenic contamination in groundwaters poses a severe health risk to hundreds of millions of people globally. Notwithstanding the particular risks to exposed populations in the Indian sub-continent, at the time of writing, there was a paucity of geostatistically based models of the spatial distribution of groundwater hazard in India. In this study, we used logistic regression models of secondary groundwater arsenic data with research-informed secondary soil, climate and topographic variables as principal predictors to generate hazard and risk maps of groundwater arsenic at a resolution of 1 km across Gujarat State. By combining models based on different arsenic concentrations, we have generated a pseudo-contour map of groundwater arsenic concentrations, which indicates greater arsenic hazard (> 10 μg/L) in the northwest, northeast and south-east parts of Kachchh District as well as northwest and southwest Banas Kantha District. The total number of people living in areas in Gujarat with groundwater arsenic concentration exceeding 10 μg/L is estimated to be around 122,000, of which we estimate approximately 49,000 people consume groundwater exceeding 10 µg/L. Using simple previously published dose–response relationships, this is estimated to have given rise to 700 (prevalence) cases of skin cancer and around 10 cases of premature avoidable mortality/annum from internal (lung, liver, bladder) cancers; the latter value is on the order of just 0.001% of internal cancers in Gujarat, reflecting the relatively low groundwater arsenic hazard in Gujarat State.
Article
People's well-being and their economic development are linked to the availability and accessibility of water. The Pind Dadan Khan tehsil, located on the right bank of the River Jhelum, is a classic example of a water-stressed area confronting water quality and quantity issues. To evaluate usable potential and qualitative variations of groundwater, an integrated approach involving geophysical, water quality and risk assessment techniques was used. Accordingly, groundwater potential zones were categorized. A small shallow fresh groundwater pocket with acceptable water quality (<1.5 dS/m) for a depth between 15 m and 50 m exists in the eastern part of the study area. The groundwater of the remaining tehsil was highly saline (TDS: 3852.23±5091.54 mg/L with maximum level up to 23164.03 mg/L). The quality of domestic wells at these 82 sites was unsafe (90%) due to salts, bacteriological contamination (71%), fluoride (45%), arsenic (5%), and nitrate (4%). Compared to these, public water supply schemes show comparatively lower salts (total dissolved solids of 144-2690 mg/L). However, arsenic was found beyond the WHO drinking water guideline (10 µg/L) in 65% of sources, which may pose serious cancer risks for 2 to 5 persons (maximum 12 persons) per 10,000 population. The study reveals that the freshwater in the study area is scarce and of vulnerable quality and requires integrated water quantity and quality management. Our results also suggest that in arid to semi-arid regions, scoring factors based on salinity levels and relative size of the saline zone should be incorporated into indicators of water access and availability.
Article
Chronic exposure from drinking well-water with naturally high concentrations of fluoride (F⁻) has serious health consequences in several regions across the world including South Asia, where the rural population is particularly dependent on untreated groundwater pumped from private wells. An extensive campaign to test 28,648 wells was conducted across the Punjab plains of Pakistan and India by relying primarily on field kits to document the scale of the problem and shed light on the underlying mechanisms. Groundwater samples were collected from a subset of 712 wells for laboratory analysis of F⁻ and other constituents. A handful of sites showing contrasting levels of F⁻ in groundwater were also drilled to determine if the composition of aquifer sediment differed between these sites. The laboratory data show that the field kits correctly classified 91% of the samples relative to the World Health Organization guideline for drinking water of 1.5 mg/L F⁻. The kit data indicate that 9% of wells across a region extending from the Indus to the Sutlej rivers were elevated in F⁻ relative to this guideline. Field data indicate an association between the proportion of well-water samples with F⁻ > 1.5 mg/L and electric conductivity (EC) > 1.5 mS/cm across six floodplains and six intervening doabs. Low Ca²⁺ concentrations and elevated bicarbonate (HCO3⁻ > 500 mg/L) and sodium (Na⁺ > 200 mg/L) in high F⁻ groundwater suggest regulation by fluorite. This could be through either the lack of precipitation or the dissolution of fluorite regulated by the loss of Ca²⁺ from groundwater due to precipitation of calcite and/or ion exchange with clay minerals. Widespread salinization of Punjab aquifers attributed to irrigation may have contributed to higher F⁻ levels in groundwater of the region. Historical conductivity data suggest salinization has yet to be reversed in spite of changes in water resources management.
Article
Dowsing for danger: Arsenic is a metabolic poison that is present in minute quantities in most rock materials and, under certain natural conditions, can accumulate in aquifers and cause adverse health effects. Podgorski and Berg used measurements of arsenic in groundwater from ∼80 previous studies to train a machine-learning model with globally continuous predictor variables, including climate, soil, and topography (see the Perspective by Zheng). The output global map reveals the potential for hazard from arsenic contamination in groundwater, even in many places where there are sparse or no reported measurements. The highest-risk regions include areas of southern and central Asia and South America. Understanding arsenic hazard is especially essential in areas facing current or future water insecurity. Science, this issue p. 845; see also p. 818
Article
Arsenic (As) and fluoride (F-) are the two most conspicuous contaminants, in terms of distribution and menace, in aquifers around the world. While the majority of studies focus on the individual accounts of their hydro-geochemistry, the current work is an effort to bring together the past and contemporary works on As and F- co-occurrence. Co-occurrence in the context of As and F- is a broad umbrella term and does not necessarily imply a positive correlation between the two contaminants. In arid oxidized aquifers, a healthy relationship between As and F- is reported owing to desorption-based release from the positively charged (hydr)oxides of metals like iron (Fe) under alkaline pH. In many instances, multiple pathways of release led to little or no correlation between the two, yet there were high concentrations of both at the same time. The key influencers of the strength of the co-occurrence are seasonality, environment, and climatic conditions. Besides, the existing primary ion and dissolved organic matter also affect the release and enrichment of As-F- in the aquifer system. Anthropogenic forcing in the form of mining, irrigation return flow, extraction, recharge, and agrochemicals remains the most significant contributing factor in the co-occurrence. The epidemiological studies indicate that the interface of these two interacting elements concerning public health is considerably complicated and can be affected by some uncertain factors. The existing explanations of interactions between As and F- are inconclusive, especially their antagonistic interactions, which need further investigation. “Multi-contamination perspectives of groundwater” is an essential consideration for the overarching question of freshwater sustainability.