Quantifying Climate and Catchment Control on Hydrological Drought in the Continental United States

Wiley
Water Resources Research
Goutam Konapala(1,2) and Ashok Mishra(1)
(1) Glenn Department of Civil Engineering, Clemson University, Clemson, SC, USA, (2) Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
Abstract The evolution of hydrological drought events is a result of complex (nonlinear) interactions between climate and catchment processes. To investigate this nonlinear relationship, we integrated a machine learning modeling framework based on the random forest (RF) algorithm with an interpretation framework to quantify the role of climate and catchment controls on hydrological drought. More specifically, our framework interprets a built RF machine-learning model to identify dominant variables and to visualize their functional dependence and interaction effects on hydrological drought characteristics using the concepts of minimal depth, interactive depth, and partial dependence. We test the proposed modeling framework on a set of 652 continental United States catchments with minimal human interference for the period 1979–2010. Application of this framework indicated the presence of three distinct drought regimes: Regime 1, droughts with longer duration, lower frequency, and lesser intensity; Regime 2, droughts with moderate duration, moderate frequency, and moderate intensity; and Regime 3, droughts with shorter duration, higher frequency, and greater intensity. The RF algorithm was able to accurately model the drought characteristics (intensity, duration, and number of events) for all three drought regimes as a function of the selected variables. The type of dominant variables, as well as their nonlinear functional relationship with hydrological drought characteristics, can vary among the three regimes. Our interpretation framework indicated that catchment characteristics have a significant role in controlling hydrological drought in regime 1, whereas both climate and catchment characteristics control hydrological drought in regimes 2 and 3.
RESEARCH ARTICLE
10.1029/2018WR024620
Special Section: Big Data & Machine Learning in Water Sciences: Recent Progress and Their Use in Advancing Science
Key Points:
  • An integrated random forest algorithm interpretation framework was applied to investigate hydrological drought characteristics in CONUS
  • This framework indicated the presence of three drought regimes, each of which exhibits dominant climate and catchment controls
  • The dominant climate and catchment controls exhibit varied functional relationships with hydrological droughts
Supporting Information:
  • Supporting Information S1
  • Table S1
Correspondence to: A. Mishra, ashokm@g.clemson.edu
Citation: Konapala, G., & Mishra, A. (2020). Quantifying climate and catchment control on hydrological drought in the continental United States. Water Resources Research, 56, e2018WR024620. https://doi.org/10.1029/2018WR024620
Received 18 DEC 2018
Accepted 27 NOV 2019
Accepted article online 11 DEC 2019
©2019. American Geophysical Union. All Rights Reserved.
1. Introduction
Hydrologic drought events, defined as periods with inadequate surface and subsurface water resources, are a result of multifaceted interactions between climate and catchment processes (Mishra & Singh, 2010; Van Lanen et al., 2013; Van Loon et al., 2014; Wang et al., 2011). Therefore, hydrologic drought depends not only on a decrease in precipitation or an increase in temperature but also on the interaction of various climate and terrestrial components (e.g., soil characteristics, elevation, and stream order). An inadequate understanding of this complexity can be a major challenge for accurate prediction as well as efficient drought management (Cayan et al., 2010; Mishra & Singh, 2011; Narasimhan & Srinivasan, 2005; Sheffield et al., 2012). To address these complex hydrological drought processes, many studies have investigated the potential influence of terrestrial catchment characteristics on hydrological droughts using physically based models (Apurv et al., 2017; Tallaksen et al., 2009; Van Loon et al., 2014; Van Loon & Laaha, 2015; Van Loon & Van Lanen, 2012). However, the application of physically based models to catchments is often plagued by differences in spatial scale, over/underparameterization, and model structural error, including model calibration uncertainties.
A few studies have used a linear regression-based framework (Saft et al., 2016, 2015; Van Loon et al., 2014; Van Loon & Laaha, 2015; Van Loon & Van Lanen, 2012) to understand the role of climate and terrestrial components in the development of hydrological drought. On the other hand, many studies have suggested that the response of streamflow to meteorological conditions is predominantly nonlinear (Konapala & Mishra, 2016; Latt et al., 2015; Stahl et al., 2008). Therefore, we expect hydrological drought characteristics derived from streamflow to have a nonlinear dependence, owing to the complex interaction between climate and catchment processes within a watershed. In addition, the evolution of hydrological drought is often clustered among neighboring catchments because of the similarity in climate and catchment characteristics (Rajsekhar et al., 2014; Zhang et al., 2012). Hence, it is important to gain a deeper understanding of the dominant linear and nonlinear controls resulting in distinct drought regimes using robust nonparametric techniques. There is therefore great potential to further quantify the nonlinear association between climate (catchment) variables and the evolution of clustered drought events using nonparametric techniques, machine learning algorithms, and an interpretive framework.
Machine learning algorithms are a class of nonparametric techniques that can capture subtle functional relationships between the input (e.g., precipitation, evaporation, and base flow) and output variables (e.g., streamflow) of a hydrologic system (e.g., a watershed), even if the underlying mechanism producing the data is not known (Elshorbagy et al., 2010a, 2010b; Nourani et al., 2014; Raghavendra & Deka, 2014). In addition, these methods make no distributional or functional assumptions about the relation of the covariates to the response. Hence, the majority of hydrological studies have used machine learning algorithms for prediction purposes (Shen, 2018). However, the formulation of machine learning algorithms may not make it straightforward to quantify the underlying mechanisms responsible for model behavior in hydrologic processes (Gupta & Nearing, 2014; Karpatne et al., 2017). Recognizing these issues, several recent studies have introduced interpretation frameworks (see Guidotti et al., 2018, for a review) to address such limitations. Work on interpreting these black-box models has focused on understanding how a fixed machine-learning model leads to particular predictions. These interpretation frameworks can provide a deeper understanding of the functioning of machine learning models such as artificial neural networks, random forests (RF), and support vector machines (Bastani et al., 2017; Bibal & Frénay, 2016; Doshi-Velez & Kim, 2017). Although machine learning approaches are widely used in hydroclimatology (Veettil et al., 2018; Fahimi et al., 2017; Shortridge et al., 2016; Raghavendra & Deka, 2014), the use of interpretation frameworks to quantify the causal relationship between inputs and modeled outputs is still emerging (Fienen et al., 2018; Koch et al., 2019; Schwalm et al., 2017) and, in particular, has not been applied to extreme events.
The above discussion suggests that limited research has been conducted to investigate the dominant nonlinear influence of climate and catchment characteristics on the evolution of clustered hydrological drought regimes. Therefore, we followed a two-step approach to improve our understanding of the heterogeneous nature of drought characteristics over CONUS: (i) a classification algorithm was applied to identify the optimal number of clusters associated with drought regimes, and (ii) an interpretative modeling framework was applied within individual drought regimes to identify key climate and catchment characteristics that potentially influence the hydrological drought characteristics. For this purpose, we selected 652 watersheds located in the CONUS because of the availability of abundant hydrologic, physical, soil, and geomorphic information for catchments with the least human interference from the Geospatial Attributes of Gages for Evaluating Streamflow Version 2 (GAGES II) database (Falcone, 2011). Overall, we aim to address the following questions:
1. How are the hydrological drought regimes clustered in the CONUS? What are the key climate and catchment characteristics that control hydrological drought regimes?
2. What are the functional relationships and interactions among the dominant variables influencing the hydrological drought characteristics, as identified and extracted with interpretive machine learning techniques (i.e., minimal depth, interactive depth, and partial dependence plots)?
The remainder of the manuscript is organized as follows: section 2 provides an overview of the data and study area, section 3 presents the methods designed for this study, section 4 presents the results, section 5 discusses the findings and the outlook, and section 6 concludes the manuscript.
2. Data and Study Area Description
We selected catchments located in CONUS because of the availability of extensive, open-source data on various catchment characteristics. In addition, to understand the dominant variables associated with different drought regimes, it is important to use data from catchments with minimal human interference. Therefore, we first identified catchments with minimal human interference based on the GAGES II database (Falcone, 2011), which provides geospatial data for 9,322 stream gages maintained by USGS. This data set provides users with a comprehensive set of geospatial characteristics for many gaged catchments with long flow records. It also identifies the catchments that are least disturbed by human influences. In this database, 2,057 catchments are identified as having minimal human interference based on three criteria: (1) a quantitative index of anthropogenic modification within the catchment based on Geographical Information System-derived variables, (2) visual inspection of every stream gage and drainage basin from recent high-resolution imagery and topographic maps, and (3) information about human influences from USGS Annual Water Data Reports (Falcone, 2011). We selected water years 1980–2011 as our study period to represent the U.S. climate normal period and to reflect current climate conditions. Overall, we identified 652 catchments with no missing data during 1980–2011. The spatial locations of catchments with minimal human interference and continuous streamflow data are shown in Figure 1.
2.1. Overview of Selected Climate and Catchment Variables
A lack of precipitation and an increase in evapotranspiration (i.e., meteorological drought) cause low soil moisture content (i.e., agricultural drought), which further reduces surface and subsurface water resources (i.e., hydrological drought) (Mishra & Singh, 2010; Mukherjee et al., 2018). The propagation of meteorological to hydrological drought is influenced by the interaction between climate and catchment variables (Apurv et al., 2017; Haslinger et al., 2014; Mishra & Singh, 2010; Tallaksen et al., 2009; Van Loon et al., 2014). Hydrological drought is directly related to the streamflow generated in a watershed, and it is influenced (controlled) by the climate and catchment characteristics of that watershed. In our analysis, we selected 60 variables related to climate, catchment, and morphological aspects of catchments documented by the GAGES II data set (Table S1 in the supporting information). Among them, 12 climate variables describe the annual magnitude and intra-annual variability of precipitation, temperature, and potential evapotranspiration; these data are obtained from the high-resolution PRISM database (Daly et al., 2000). Fifteen hydrologic catchment variables related to stream order, baseflow index, and overland flow are derived from the U.S. National Hydrography Data Set (NHD). Four land cover variables describing the percentage of different land cover types are derived from the 2006 land cover product of the National Land Cover Database. Twenty-three soil characteristics are derived from the State Soil Geographic database for the CONUS. Finally, six topographic variables related to the elevation, slope, and geographical aspect of the catchments are included in the analysis. A brief discussion and the data sources of the selected variables are provided in Table S1. The interplay among these catchment characteristics is assumed to shape catchment behavior by influencing how catchments store and transfer water. The variables selected and provided in this database are considered to significantly affect hydrologic processes. Some of these catchment attributes have previously been used to predict mean streamflow (Rice et al., 2015), other streamflow signatures (Addor et al., 2018), and drought (Stoelzle et al., 2014). In addition to these previously used variables, we selected multiple attributes to cover a wide range of features, such as catchment climate, hydrology, land cover, soil, geology, topography, and river network.
3. Methodology
3.1. Hydrological Drought Characterization
Hydrological drought is often expressed as a time period with inadequate surface and subsurface water resources with respect to the normal condition of a given water resources management system (Mishra & Singh, 2010). Therefore, we applied the Standardized Streamflow Index (SSI) (Shukla & Wood, 2008; Vicente-Serrano et al., 2011) to characterize hydrological drought at a monthly time scale for selected watersheds across the USA. SSI can be computed for multiple time scales and is flexible enough to determine drought conditions at seasonal (3 to 6 months), annual (12 months), and longer (>12-month SSI) time scales. However, in this study, we restrict our analysis to the seasonal scale because droughts usually take 3 or more months to develop. We therefore calculated the 3-month SSI by aggregating streamflow over 3 months and fitting these accumulated values to a parametric statistical distribution. The probabilities from the fitted distributions are then transformed to the standard normal distribution to create the hydrological drought index (Vicente-Serrano et al., 2011; Modarres, 2008; Shukla & Wood, 2008). SSI thus determines streamflow drought conditions relative to the long-term monthly streamflow. Positive SSI values indicate a surplus relative to long-term streamflow conditions, whereas negative values indicate a deficit (i.e., hydrologic drought). Hydrological drought indices similar to SSI have previously been applied to understand U.S. hydrological drought characteristics (Shukla & Wood, 2008; Veettil et al., 2018). Shukla and Wood (2008) reported that the two-parameter gamma and lognormal distributions generally performed well for deriving hydrological drought in the USA. In this study, the lognormal distribution was selected for deriving the SSI-based hydrological drought index from streamflow time series. The formulation of SSI is presented in the supporting information Text S1.
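The SSI computation described above can be sketched in a few lines of Python. This is a minimal illustration and not the formulation of Text S1: it assumes that a two-parameter lognormal distribution is fitted separately for each calendar month from the mean and standard deviation of the log-transformed 3-month totals, and the function name `ssi_3month` and its parameters are hypothetical.

```python
import math
from statistics import mean, stdev


def ssi_3month(monthly_flow, start_month=1, window=3):
    """Return SSI values (None where the accumulation window is incomplete).

    monthly_flow: positive monthly streamflow values in chronological order.
    start_month:  calendar month (1-12) of the first value (assumed input).
    """
    # Step 1: aggregate streamflow over a moving `window`-month period.
    acc = [None] * len(monthly_flow)
    for t in range(window - 1, len(monthly_flow)):
        acc[t] = sum(monthly_flow[t - window + 1 : t + 1])

    # Step 2: group the log-transformed totals by ending calendar month.
    logs_by_month = {}
    for t, total in enumerate(acc):
        if total is not None:
            month = (start_month - 1 + t) % 12
            logs_by_month.setdefault(month, []).append(math.log(total))

    # Step 3: fit a lognormal per month (mean and standard deviation of the
    # logs; needs at least two differing values per month) and map each total
    # to a standard normal deviate. For a lognormal fit, the probability
    # transform Phi^{-1}(F(x)) reduces to (ln x - mu) / sigma.
    params = {m: (mean(v), stdev(v)) for m, v in logs_by_month.items()}
    ssi = [None] * len(monthly_flow)
    for t, total in enumerate(acc):
        if total is not None:
            mu, sigma = params[(start_month - 1 + t) % 12]
            ssi[t] = (math.log(total) - mu) / sigma
    return ssi
```

With this convention, negative values mark 3-month periods that are drier than usual for that time of year, and positive values mark wetter ones.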
3.2. Classification of Hydrological Drought Regimes
The SSI time series were constructed for all 652 catchments to investigate hydrological droughts. In this study, we identified a hydrological drought when SSI < −0.5 for a period of more than 3 months. By extracting drought events based on these two conditions, we can differentiate hydrological droughts from seasonal streamflow fluctuations (Konapala & Mishra, 2017; Mishra & Singh, 2010, 2011). Once all drought events meeting these conditions were extracted, we computed the average drought duration, severity, and number of events based on the well-established theory of runs (Yevjevich, 1967). The number of drought events is the total number of times a drought occurred based on the above-defined thresholds. The average duration of drought per event is determined by dividing the total number of drought months by the number of drought events, as shown in equation (1):
\[ DD = \frac{\sum_{i=1}^{ND} D_i}{ND} \qquad (1) \]
where D_i is the duration of a single drought event. Similarly, the average drought intensity is calculated by first estimating the intensity (S) of each drought event as
\[ S = \frac{\sum_{\mathrm{SSI}<0}^{D>3} \mathrm{SSI}}{D} \qquad (2) \]
Its average over the period is calculated based on equation (3),
\[ DI = \frac{\sum_{i=1}^{ND} S_i}{ND} \qquad (3) \]
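The run-based extraction behind equations (1) to (3) can be illustrated with the following sketch. It is written under stated assumptions rather than taken from the study: the function name and parameters are hypothetical, the threshold is left as an argument (the default of −0.5 assumes a negative sign on the threshold), and the event intensity is taken as the mean SSI over the event, so it is negative under this sign convention.

```python
def drought_characteristics(ssi, threshold=-0.5, min_duration=3):
    """Return (ND, DD, DI) for one catchment's SSI series.

    An event is a run of consecutive months with SSI below `threshold`
    that lasts more than `min_duration` months.
    """
    events = []   # each event is a list of consecutive SSI values
    run = []
    for value in list(ssi) + [float("inf")]:   # sentinel closes a final run
        if value is not None and value < threshold:
            run.append(value)
        else:
            if len(run) > min_duration:        # keep runs longer than 3 months
                events.append(run)
            run = []

    nd = len(events)                            # number of drought events
    if nd == 0:
        return 0, 0.0, 0.0
    durations = [len(e) for e in events]                 # D_i
    intensities = [sum(e) / len(e) for e in events]      # S_i, equation (2)
    dd = sum(durations) / nd                             # equation (1)
    di = sum(intensities) / nd                           # equation (3)
    return nd, dd, di
```

For example, a series with one 4-month run, one 2-month run, and one 6-month run below the threshold yields two events, because the 2-month run is too short to count.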
By applying multivariate clustering, we can distinguish the hydrological drought regimes exhibited by catchments based on the long-term (~30 year) statistics of drought characteristics (Rajsekhar et al., 2014; Gocic & Trajkovic, 2014; Yoo et al., 2012). The clustering-based approaches typically used in hydrological studies are hierarchical clustering, k-means/medoids clustering, and fuzzy partition clustering (Carrillo et al., 2011; Ley et al., 2011; Olden et al., 2012; Sawicz et al., 2011; Yadav et al., 2007). Because we aim to investigate the dominant controls of catchment characteristics within the identified drought regimes, we applied a fuzzy partitioning algorithm that accounts for uncertainty in the classification process.
Figure 1. Spatial locations of the catchments selected in this study based on the minimal human interference and nonmissing data criteria.
We identified drought regimes based on the three drought characteristics (i.e., intensity, duration, and number of events) using a fuzzy medoid clustering algorithm. Fuzzy clustering assigns membership values, so each point is described, more generally and usefully, by its degree of membership in every cluster. The method chosen for this study is the fuzzy k-medoids clustering algorithm introduced by Krishnapuram et al. (2001), which is usually more robust because the effect of outliers is significantly reduced compared with clustering algorithms that use mean values for classification. Data objects closer to the medoid of a cluster, as determined by Euclidean distance, are therefore likely to have higher degrees of membership than objects scattered around the limits of the cluster. Similar to other clustering algorithms, fuzzy k-medoids follows a heuristic approach to minimize the within-cluster variance. The formulation of this approach is presented in the supporting information Text S2.
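A compact sketch of a fuzzy k-medoids iteration in the spirit of Krishnapuram et al. (2001) is given below. It is an illustration only and not the formulation of Text S2: squared Euclidean distance is assumed as the dissimilarity, the fuzzifier `m = 2` and the optional `init` argument are assumptions, and all function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(points, medoids, m=2.0):
    """Fuzzy membership of every point in every cluster (each row sums to 1)."""
    u = []
    for p in points:
        d = [sq_dist(p, points[j]) for j in medoids]
        if 0.0 in d:                      # the point coincides with a medoid
            row = [1.0 if dist == 0.0 else 0.0 for dist in d]
            total = sum(row)
            row = [r / total for r in row]
        else:
            w = [(1.0 / dist) ** (1.0 / (m - 1.0)) for dist in d]
            total = sum(w)
            row = [x / total for x in w]
        u.append(row)
    return u


def fuzzy_k_medoids(points, k, m=2.0, max_iter=100, seed=0, init=None):
    """Return (medoid indices, membership matrix)."""
    rng = random.Random(seed)
    medoids = list(init) if init is not None else rng.sample(range(len(points)), k)
    for _ in range(max_iter):
        u = memberships(points, medoids, m)
        new_medoids = []
        for g in range(k):
            # The new medoid is the data point that minimizes the
            # membership-weighted sum of squared distances to all points.
            costs = [
                sum(u[i][g] ** m * sq_dist(points[i], q) for i in range(len(points)))
                for q in points
            ]
            new_medoids.append(costs.index(min(costs)))
        if new_medoids == medoids:        # converged
            break
        medoids = new_medoids
    return medoids, memberships(points, medoids, m)
```

Because the medoids are always actual data points, a single extreme catchment cannot drag a cluster center away from the bulk of the data, which is the robustness property referred to above. The result can depend on the initial medoids, so several starting points are usually worth trying.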
The Xie and Beni (XB) index (Xie & Beni, 1991) is a widely used criterion for quantifying the quality of fuzzy clustering. To complement the XB index, we included the fuzzy silhouette (FS) index, the fuzzy extension of the silhouette index (Campello & Hruschka, 2006), as the second criterion for evaluating the optimal number of clusters in this study. These two indices complement each other by capturing the similarity of an object to its own cluster (cohesion) compared with other clusters (separation) (FS index) and the compactness of the clusters (XB index). The higher the value of the FS index, the better the resulting clustering. The FS index is given by equation (4):
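\[ FS(k) = \frac{\sum_{i=1}^{n} \left(u_{ig} - u_{ig'}\right) s_i(k)}{\sum_{i=1}^{n} \left(u_{ig} - u_{ig'}\right)} \qquad (4) \]
where s_i is the silhouette index for object i, and u_{ig} and u_{ig'} are the largest and second-largest membership degrees of object i. The XB index, in turn, measures the compactness of the clusters and is specifically formulated for evaluating fuzzy clustering performance. It is given by equation (5) as
\[ XB(k) = \frac{\sum_{i=1}^{n} \sum_{g=1}^{k} u_{ig}^{m}\, d^2(x_i, m_g)}{n \times \min_{g \neq g'} d^2(m_g, m_{g'})} \qquad (5) \]
where
X = [x_{ij}]: drought characteristics, of order (n × t)
U = [u_{ig}]: membership degrees, of order (k × t)
d^2(x_i, m_g) = \lVert x_i - m_g \rVert^2: Euclidean distance
\{m_g, g = 1, \ldots, k\} \subseteq \{x_i, i = 1, \ldots, n\}: medoids of the drought characteristics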
The smaller the Xie-Beni index, the more compact the clusters. Each catchment is therefore assigned to a specific class with a certain probability, and the class with the highest probability is considered the catchment's primary cluster for subsequent analysis. The resulting clusters based on these trivariate drought characteristics (intensity, duration, and number of events) are a consequence of natural partitions identified by the clustering algorithm. The drought characteristics in each cluster indicate a distinct drought regime that can provide valuable information on the climate and catchment controls on hydrological droughts.
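The two validity indices can be computed from a membership matrix as sketched below. This is a hedged illustration: the function names are hypothetical, the crisp silhouette s_i is computed from Euclidean distances after assigning each object to its highest-membership cluster, and the weighting exponent `alpha` is an assumed generalization (`alpha = 1` corresponds to equation (4) as written).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(points, medoid_points, u, m=2.0):
    """Equation (5): fuzzy compactness divided by n times the minimum separation."""
    n, k = len(points), len(medoid_points)
    compactness = sum(
        u[i][g] ** m * sq_dist(points[i], medoid_points[g])
        for i in range(n) for g in range(k)
    )
    separation = min(
        sq_dist(medoid_points[g], medoid_points[h])
        for g in range(k) for h in range(k) if g != h
    )
    return compactness / (n * separation)


def fuzzy_silhouette(points, u, alpha=1.0):
    """Equation (4): silhouettes weighted by the gap between the two largest memberships."""
    n, k = len(points), len(u[0])
    labels = [row.index(max(row)) for row in u]     # crisp assignment
    num = den = 0.0
    for i in range(n):
        # Mean distance from object i to the members of each cluster.
        mean_d = []
        for g in range(k):
            members = [j for j in range(n) if labels[j] == g and j != i]
            mean_d.append(
                sum(sq_dist(points[i], points[j]) ** 0.5 for j in members) / len(members)
                if members else None
            )
        a = mean_d[labels[i]]
        others = [d for g, d in enumerate(mean_d) if g != labels[i] and d is not None]
        if a is None or not others:
            s = 0.0                                  # singleton-cluster convention
        else:
            b = min(others)
            s = (b - a) / max(a, b)
        top, second = sorted(u[i], reverse=True)[:2]
        weight = (top - second) ** alpha
        num += weight * s
        den += weight
    return num / den
```

Evaluating both functions for a range of candidate cluster numbers and choosing the number with a high FS value and a low XB value mirrors the selection procedure described in the text.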
3.3. RF Model
In this study, we use the RF algorithm (Breiman, 2001) to investigate the dominant catchment and climate variables that play an important role in the evolution of clustered drought characteristics. It is important to acknowledge that the selection of an algorithm depends on the objectives and the types of data to be analyzed (Caruana & Niculescu-Mizil, 2006; Huang et al., 2015; Kotsiantis et al., 2007). RF algorithms differ from linear regression methods: the nonlinear RF model used in this study has the major advantage of being (mostly) unaffected by multicollinearity (Ishwaran et al., 2010; Zhang & Ma, 2012; Díaz-Uriarte & De Andres, 2006). The multicollinearity problem is alleviated because a random subset of features is chosen for each tree in an RF (Hsilch et al., 2014; Ishwaran et al., 2010; Zhang & Ma, 2012; Díaz-Uriarte & De Andres, 2006). The ability of the RF algorithm to deal with overfitting issues makes it suitable for our application.
The RF algorithm draws a set of bootstrap samples (Efron & Tibshirani, 1994) and grows an independent tree model on each bootstrapped sample of the population. Each tree is grown by recursively partitioning the population with the objective of minimizing the mean square error. At each split, a subset of candidate variables is tested for split optimization, and each node is divided into two successor nodes. Each successor node is then split again until the process reaches the stopping criterion of either maximum node purity or minimum node member size, which defines the set of terminal (unsplit) nodes for the tree. The RF algorithm then assigns each training set observation to one unique terminal node per tree. The RF estimate for each observation is calculated by averaging the terminal node results across the collection of trees. A basic pseudo-algorithm explaining the RF procedure is presented in Table 1 and Figure 2. The resampling and averaging procedure circumvents the problems of overfitting and multicollinearity, making this approach suitable for our study (Cutler et al., 2007; Díaz-Uriarte & De Andres, 2006; Prasad et al., 2006; Zhang & Ma, 2012). The RF algorithm can be tuned to reduce the prediction error (Boulesteix et al., 2012; Breiman, 2001; Strobl et al., 2009). The accuracy of the RF output mainly depends on three parameters: (1) the number of trees (ntrees) to grow in the forest, (2) the number of randomly selected predictor variables (mtry) at each node, and (3) the minimal number of observations at the terminal nodes (nodesize) of the trees. We set the number of trees (ntrees) to 1,000, as suggested by Hengl et al. (2018) and Probst and Boulesteix (2017), and we randomly sampled different combinations of parameter sets, with mtry ranging from one to the total number of variables considered (60) and nodesize ranging from one to the total number of catchments in each regime. The combination of mtry and nodesize with the least out-of-bag error is considered the optimal parameter set.
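The procedure described above (bootstrap sampling, recursive partitioning on a random subset of variables, averaging across trees, and tuning by out-of-bag error) can be sketched as a small, self-contained regression forest. This is a didactic sketch and not the software used in the study: all function names are hypothetical, `nodesize` is simplified to "stop splitting nodes with at most this many observations," and the default tree counts are kept small so the example runs quickly.

```python
import random


def sse(vals):
    """Sum of squared errors around the mean."""
    mu = sum(vals) / len(vals)
    return sum((v - mu) ** 2 for v in vals)


def grow_tree(X, y, idx, mtry, nodesize, rng):
    """Recursively partition the sample indices `idx`; a leaf stores the node mean."""
    node_mean = sum(y[i] for i in idx) / len(idx)
    if len(idx) <= nodesize or sse([y[i] for i in idx]) == 0.0:
        return node_mean
    best = None
    for f in rng.sample(range(len(X[0])), mtry):      # random candidate variables
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [i for i in idx if X[i][f] <= thr]
            right = [i for i in idx if X[i][f] > thr]
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, thr, left, right)
    if best is None:                                  # no valid split exists
        return node_mean
    _, f, thr, left, right = best
    return {"var": f, "thr": thr,
            "left": grow_tree(X, y, left, mtry, nodesize, rng),
            "right": grow_tree(X, y, right, mtry, nodesize, rng)}


def predict_tree(tree, x):
    while isinstance(tree, dict):
        tree = tree["left"] if x[tree["var"]] <= tree["thr"] else tree["right"]
    return tree


def random_forest(X, y, ntrees=100, mtry=1, nodesize=5, seed=0):
    """Return (trees, out-of-bag mean squared error)."""
    rng = random.Random(seed)
    n = len(X)
    trees, oob_sum, oob_cnt = [], [0.0] * n, [0] * n
    for _ in range(ntrees):
        boot = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        tree = grow_tree(X, y, boot, mtry, nodesize, rng)
        trees.append(tree)
        for i in set(range(n)) - set(boot):           # rows left out of this tree
            oob_sum[i] += predict_tree(tree, X[i])
            oob_cnt[i] += 1
    errs = [(oob_sum[i] / oob_cnt[i] - y[i]) ** 2 for i in range(n) if oob_cnt[i]]
    return trees, sum(errs) / len(errs)


def predict_forest(trees, x):
    """RF estimate: average of the terminal-node means across trees."""
    return sum(predict_tree(t, x) for t in trees) / len(trees)


def tune(X, y, n_draws=10, ntrees=50, seed=0):
    """Random search over (mtry, nodesize); keep the pair with the least OOB error."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_draws):
        mtry = rng.randint(1, len(X[0]))
        nodesize = rng.randint(1, len(X))
        _, oob = random_forest(X, y, ntrees, mtry, nodesize, seed)
        if best is None or oob < best[0]:
            best = (oob, mtry, nodesize)
    return best
```

The out-of-bag error uses, for each observation, only the trees whose bootstrap sample did not contain it, so it acts as a built-in validation estimate without a separate hold-out set.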
3.4. Framework for Interpreting RF Algorithm
We interpreted the RF model by examining three features: variable importance, variable interaction, and partial dependence. Variable importance and interaction are based on the maximal subtree and minimal depth concepts (Ishwaran et al., 2010), whereas partial dependence is estimated by integrating out the effects of all variables other than the covariate of interest (Breiman, 2001). The concept of minimal depth allows us to identify the dominant variables, whereas partial dependence quantifies the approximate relationship between each dominant variable and the drought characteristic. The concept of interactive depth allows us to understand the interactions among the dominant climate, catchment, and morphological variables related to a particular drought characteristic.
3.4.1. Minimal Depth
The concept of minimal depth (Diller et al., 2012; Hsisch et al., 2011; Ishwaran et al., 2010) is useful for assessing variable importance and variable interactions within an RF modeling framework. The minimal depth of an RF can be formulated precisely in terms of a maximal subtree. The maximal subtree for a variable v is the largest subtree whose root node is split on v. The shortest distance from the root of the tree to the root of the closest maximal subtree of v is the minimal depth of v. The minimal depth for any variable v can be expressed as
\[ MD(v) = \frac{\sum_{i=1}^{N_{RF}} \lVert T(v) \rVert}{N_{RF}} \qquad (6) \]
where N_RF is the total number of trees (i.e., ntrees = 1,000) and T(v) represents the distance of variable v from the root of any tree T. To illustrate the concepts of maximal subtree and minimal depth of a variable v, we show three separate randomized trees (Figure 3) that mimic the behavior of RFs. In this way, the depth of variable v is identified for all maximal subtrees and averaged across all randomized trees to calculate the minimal depth MD(v). A smaller MD(v) value indicates that the corresponding variable v is more influential. Variables whose averaged minimal depth exceeds the average minimal depth threshold are treated as noisy and are therefore removed from the final model.
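Equation (6) can be illustrated on toy trees represented as nested dictionaries. The trees below are hypothetical and only loosely modeled on the three cases of Figure 3; the function names are likewise hypothetical, and assigning the full tree depth to a variable that never appears in a tree is an assumed convention.

```python
def minimal_depth(tree, v, depth=0):
    """Depth of the shallowest node that splits on v, or None if v is never used."""
    if not isinstance(tree, dict):            # terminal node
        return None
    if tree["var"] == v:                      # root of a maximal subtree for v
        return depth
    found = [d for d in (minimal_depth(tree["left"], v, depth + 1),
                         minimal_depth(tree["right"], v, depth + 1))
             if d is not None]
    return min(found) if found else None


def tree_depth(tree):
    """Number of split levels between the root and the deepest terminal node."""
    if not isinstance(tree, dict):
        return 0
    return 1 + max(tree_depth(tree["left"]), tree_depth(tree["right"]))


def forest_minimal_depth(trees, v):
    """Equation (6): minimal depth of v averaged over all trees."""
    depths = []
    for t in trees:
        d = minimal_depth(t, v)
        depths.append(tree_depth(t) if d is None else d)   # assumed convention
    return sum(depths) / len(depths)


LEAF = 0.0
# (a) v splits the root node.
tree_a = {"var": "v", "left": {"var": "w", "left": LEAF, "right": LEAF}, "right": LEAF}
# (b) v first appears one level below the root.
tree_b = {"var": "w", "left": {"var": "v", "left": LEAF, "right": LEAF}, "right": LEAF}
# (c) two maximal subtrees for v: depth two on the left, depth one on the right.
tree_c = {"var": "w",
          "left": {"var": "u",
                   "left": {"var": "v", "left": LEAF, "right": LEAF},
                   "right": LEAF},
          "right": {"var": "v", "left": LEAF, "right": LEAF}}

md_v = forest_minimal_depth([tree_a, tree_b, tree_c], "v")   # (0 + 1 + 1) / 3
md_w = forest_minimal_depth([tree_a, tree_b, tree_c], "w")   # (1 + 0 + 0) / 3
```

In this toy forest w has the smaller averaged minimal depth, so it would be ranked as more influential than v.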
Figure 2. An illustration of the random forest algorithm used to model the influence of dominant variables on hydrologic droughts for CONUS.
Figure 3. A pictorial illustration of the concepts of maximal subtrees, minimal depth, and interactive depth. The maximal subtrees are indicated in red, and the depth is indicated by an integer located in the center of the tree, with the root node as zero. In the first tree (Figure 3a), the variable v splits the root node; therefore, the entire tree can be considered a maximal subtree for the variable v. In the second tree (Figure 3b), the maximal subtree for variable v is not the entire tree, because the variable v does not split the root node, unlike in the previous case. Figure 3c presents another scenario with two maximal subtrees for variable v. The maximal subtree on the left side has a depth of two, whereas the one on the right side has a depth of one.
3.4.2. Interactive Depth
Dominant controls identified with the concept of minimal depth capture the effect of each independent variable on the drought characteristics; however, they ignore interaction effects with other variables. For instance, drought propagation might be influenced by two or more interacting variables in a specific regime. Therefore, an interactive minimal depth metric that measures the interactions between any two variables v and w is needed (Diller et al., 2012; Hsisch et al., 2011; Ishwaran et al., 2010). For this purpose, we first define the variable interactive distance MT(v,w), which represents the distance between variables v and w from the root of any maximal tree (MT). Because the maximal tree depths vary significantly across the randomized trees (Figure 3), a standardization procedure needs to be applied. The interactive depth can be formulated as
\[ ID(v,w) = \frac{\sum_{i=1}^{N_{RF}} \left(1 - \frac{\lVert MT(v,w) \rVert}{MTD(v)}\right)}{N_{RF}} \qquad (7) \]
where N_RF is the total number of trees (i.e., ntrees = 1,000), MT(v,w) represents the distance between variables v and w from the root of any maximal tree MT, and MTD(v) is the depth of the maximal subtree MT(v). Based on this formulation, ID(v,w) ranges between 0 and 1. Interactive depth (ID) values closer to zero indicate higher interaction between the two variables considered. Figure 3c illustrates such an interaction between variables v and w, where the right maximal subtree of variable v contains a further split on w inside the subtree. If this pattern is observed over all the randomized trees, then there is a significant interaction between variables v and w, and they collectively influence the prediction outcome.
3.4.3. Partial Dependence
The concept of partial dependence can quantify the functional relationship between dominant variables and drought characteristics. Partial dependence is assessed by integrating out the effects of all variables other than the covariate of interest (Breiman, 2001; Friedman & Meulman, 2003). The partial dependence of a variable x_k can be estimated by averaging over the input variables {X_i, i = 1, …, n} with x_k held fixed, as
\[ \tilde{f}_k(x_k) = \frac{1}{n} \sum_{i=1}^{n} \hat{f}\left(X_{i,C_k}, x_k\right) \qquad (8) \]
where f̂ represents the outputs based on the RF models. This partial dependence estimate can be visualized to
understand the functional relationship between the variables (x_k) and their potential influence on
hydrological droughts. As the RF algorithm randomly resamples the variables for bagging the trees, we run each
model 1,000 times and then average the minimal and interactive depths to interpret and identify
the dominant variables.
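The averaging in Equation 8 can be illustrated with a minimal Python sketch. It assumes only that the fitted model is available as a callable that maps one row of predictors to a prediction; the names `partial_dependence`, `toy_model`, and `X_demo` are hypothetical and are not part of the original analysis, and the toy model is a simple stand-in rather than an RF.

```python
from statistics import mean

def partial_dependence(predict, X, k, grid):
    """Estimate the partial dependence of predictor k (Equation 8).

    predict : callable mapping one row (list of floats) to a prediction;
              a hypothetical stand-in for the fitted RF model.
    X       : list of rows (the n observed samples).
    k       : column index of the predictor of interest.
    grid    : values of x_k at which the curve is evaluated.
    """
    curve = []
    for value in grid:
        preds = []
        for row in X:
            modified = list(row)   # copy so the data are not altered
            modified[k] = value    # fix x_k, keep the other predictors as observed
            preds.append(predict(modified))
        curve.append(mean(preds))  # average over the n samples
    return curve

# Toy stand-in for a fitted model (an assumption, not the RF used in the study)
toy_model = lambda row: 2.0 * row[0] + row[1]
X_demo = [[0.0, 1.0], [0.0, 3.0]]
pd_curve = partial_dependence(toy_model, X_demo, k=0, grid=[0.0, 1.0])
```

Plotting `pd_curve` against the grid values would give the kind of curve that is inspected for each dominant variable.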
4. Results and Discussions
4.1. Classification of Drought Regimes
The fuzzy k-medoids clustering approach was applied to 652 catchments to classify the drought regimes
based on drought intensity (DI), drought duration (DD), and number of drought events (ND). First, we identified the
optimal number of regimes based on the fuzzy silhouette (FS) index and the Xie and Beni (XB) index. Figure 4a shows the
behavior of the FS and XB indices with respect to the number of regimes. The optimal number of clusters
appears to be three, where FS reaches its maximum and XB its minimum. Therefore, we
consider three clusters for further analysis.
The drought characteristics (i.e., DI, DD, and ND) for the three selected drought regimes are shown in
Figures 4b and 4d. Since the units of DI, DD, and number of droughts (ND) are different, we applied the
concept of the Z score to standardize and compare the drought characteristics across the three regimes. The Z score
measures the distance of a sample data point from the population average in units of standard deviation. The boxplots of
Z scores are plotted in Figure 4b, so that the drought characteristics can be compared among the identified
regimes. The number of catchments representing each regime is shown in Figure 4c. The absolute values
of the drought characteristics for each drought regime are plotted as probability distributions in
Figure 4d. Regime 1 is represented by 142 catchments with longer droughts (median DDzs ~ 1), lower
drought intensities (median DIzs ~ −0.75), and fewer occurrences (median NDzs ~ −1). The magnitude of DD
10.1029/2018WR024620
Water Resources Research
KONAPALA AND MISHRA 8of25
for regime 1 varies between 7 and 20 months, whereas the magnitudes of DI and ND vary from 0.4 to 0.8 and
from 5 to 20, respectively. Regime 2 is represented by 242 catchments that exhibit relatively moderate drought
characteristics with median Z scores close to 0 (Figure 4b). The magnitude of DD for the catchments
located in regime 2 varies between 7 and 12 months, DI between 0.5 and 0.9, and ND between
15 and 25. The largest number of catchments (268) is located in regime 3, which represents short
drought durations (median DDzs ~ −0.8) occurring frequently (median NDzs ~ 0.75) with higher intensity
(median DIzs ~ 1). The catchments located in regime 3 witness droughts with durations between 5 and
8 months, intensities between 0.7 and 1.2, and frequencies between 20 and 30 events (Figure 4d).
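The Z score standardization used to compare DI, DD, and ND on a common scale can be sketched as follows. This is a minimal illustration that assumes the population standard deviation is used; the function name `z_scores` and the sample durations are hypothetical and are not values from the study.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values as (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population (not sample) standard deviation
    if sigma == 0:
        raise ValueError("Z scores are undefined when all values are identical")
    return [(v - mu) / sigma for v in values]

# Hypothetical drought durations in months (illustrative only)
dd_months = [6.0, 9.0, 12.0]
dd_z = z_scores(dd_months)
```

A negative Z score marks a catchment below the all-catchment average for that characteristic, and a positive one marks a catchment above it.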
The spatial locations of catchments for the three drought regimes are shown in Figure 5. The catchments
located in the Pacific Northwest and parts of the northeastern and central USA represent regime 1, with
droughts that last longer but are less intense and occur less frequently. The catchments
representing regime 2, with moderate drought characteristics, are spread across different parts of CONUS, and the
catchments representing regime 3 are mostly located in the north central and eastern USA, including
watersheds in the Pacific Northwest region. Overall, it was observed that the spatial proximity between
the catchments plays a considerable role in the clustering of regimes, which is probably due to similar
climatological variability and catchment response characteristics (Brutsaert & Nieber, 1977; Knapp et al.,
2002; Serrano, 2006).
4.2. RF Model Performance
The effect of multicollinearity in data analysis can make it difficult to get appropriate linear coefficient
estimates with small standard errors (Achen, 1982). Our analysis is different due to the application of RF
Figure 4. Characteristics of the fuzzy clustered drought regimes. (a) The variation of fuzzy silhouette index (FSI) and
Xie and Beni (XB) index values according to the number of clusters, with 3 being the optimal number of clusters. (b) The
box plots of Z scores of the considered drought characteristics in each cluster. (c) The number of catchments belonging
to each cluster. (d) The kernel density estimates of the probability distribution of the drought characteristics
specific to each regime.
algorithm, which is nonlinear in nature, and we do not rely on any regression coefficients in our analysis.
Therefore, even though there is a linear correlation between the predictors, it does not interfere with our
analysis. In addition, this multicollinearity problem is alleviated since a random subset of features is
chosen for each tree in an RF (Hsilch et al., 2014; Ishwaran et al., 2010; Zhang & Ma, 2012; Díaz-Uriarte &
De Andres, 2006).
As highlighted before, the primary purpose of our study is to identify the key climate, catchment, and
morphological variables using a machine learning interpretation framework. Although our machine learning
application is not focused on prediction, we performed a preliminary analysis to evaluate the performance
of the optimized RF model by splitting the data into training (75%) and testing (25%) sets.
The model performed well based on the root-mean-square error, and the corresponding plots are presented
in the supplementary text.
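The 75%/25% split and the root-mean-square error can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the split is a simple random shuffle with a fixed seed, and the helper names `train_test_split` and `rmse` are hypothetical (the study does not describe how the split was drawn).

```python
import math
import random

def train_test_split(rows, train_fraction=0.75, seed=0):
    """Randomly split rows into training and testing subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Catchment indices stand in for the 652 catchments
train, test = train_test_split(range(652))
```

With 652 catchments, this split yields 489 training and 163 testing catchments.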
We evaluated the performance of the RF algorithm in modeling the variations in drought characteristics (DI, DD,
and ND) in each regime with respect to the selected catchment and climate variables by applying it to the
entire data set. As highlighted in section 3, the optimal parameters of the RFs (i.e., mtry and nodesize), which
were derived based on the least out-of-bag error, are listed in Table 2. In addition, the R²,
percentage bias (PBIAS), and Nash-Sutcliffe efficiency (NSE) metrics for the corresponding optimal model
configurations are listed in Table 2. The coefficient of determination (R²) of each RF model is more than
0.9, which indicates that the adopted RF algorithm can explain more than 90% of the variance found
in the drought characteristics. The PBIAS values, which are expressed in percent, remain close to 0,
indicating comparatively little bias among all the RF models. Finally, the NSE values are in the range of 0.77 to
0.85. An NSE value of 1 corresponds to a perfect match between the modeled and observed data points,
and NSE values greater than 0 indicate that the model predicts better than the mean of the observations.
Hence, the NSE values also point toward an efficient model. Overall, all the models have a high
coefficient of determination (R² > 0.9), low PBIAS values, and NSE values close to 1.
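For reference, the three goodness-of-fit metrics can be written as the short Python sketch below. It rests on assumptions that are not stated in the text: R² is taken as the squared Pearson correlation, and PBIAS uses the convention 100 × Σ(sim − obs) / Σ obs (some references use the opposite sign). The function names and the demonstration values are hypothetical.

```python
from statistics import mean

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs_mean = mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - num / den

def pbias(obs, sim):
    """Percentage bias (assumed convention: positive means overestimation)."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)

def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation (assumption)."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov * cov / (var_o * var_s)

# Illustrative values only (not results from the study)
obs_demo = [1.0, 2.0, 3.0, 4.0]
sim_demo = [1.5, 2.0, 2.5, 4.0]
```

Predicting the observed mean everywhere gives an NSE of exactly 0, which is why positive NSE values are read as an improvement over that baseline.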
Figure 5. Spatial distribution of the three drought regimes located in CONUS.
4.3. Application of Interpretation Framework to Understand Drought Characteristics
4.3.1. Application to Drought Duration
Figure 6 shows the ranking of climate and catchment variables that have a potential influence on the
hydrological DD for each drought regime. As discussed earlier, variables with the least minimal depth are likely to
exert the most dominant control, whereas variables with greater depth have less influence on drought duration
within each regime. The dashed line (Figure 6) indicates the average minimal depth of all the variables,
which can be used as a threshold to determine the significant variables of interest (Ishwaran et al., 2011).
Based on this threshold, the significant influencing variables are highlighted in green, and the
noninfluential variables are highlighted in orange (Figure 6).
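The thresholding rule described above (rank by minimal depth, then keep the variables below the average minimal depth) can be sketched as follows. The function name `select_dominant`, the variable name `ELEV_MEAN`, and all depth values are hypothetical and serve only to illustrate the rule; they are not results from the study.

```python
from statistics import mean

def select_dominant(minimal_depth):
    """Rank variables by minimal depth and keep those below the average depth.

    minimal_depth : dict mapping variable name -> (averaged) minimal depth.
    Returns the threshold and the dominant variables in ascending depth order.
    """
    threshold = mean(minimal_depth.values())
    ranked = sorted(minimal_depth.items(), key=lambda item: item[1])
    dominant = [name for name, depth in ranked if depth < threshold]
    return threshold, dominant

# Hypothetical minimal depths (illustrative numbers only)
depths_demo = {"BFI_AVE": 1.0, "ELEV_MEAN": 2.0, "HGC": 3.0, "PRCP_CV": 6.0}
threshold, dominant = select_dominant(depths_demo)
```

In this toy case the threshold is 3.0, so only the two variables with depths strictly below it are retained.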
Overall, 20 variables have more than average minimal depth for regime 1, which represents catchments with
higher drought duration (median DDzs~1). In case of regime 2, which represents catchments with average
Figure 6. Rank plots in ascending order, with variables exhibiting the least minimal depth at the top, for clusters (a) 1, (b) 2, and (c) 3 in the case of
drought duration (DD). The dotted line represents the average minimal depth. Variables with below-average minimal depths are color coded green, and
variables with above-average minimal depths are color coded red.
drought duration, 14 variables have more than average minimal depth. Finally, in the case of regime 3, which
consists of catchments with lower drought duration (median DDzs ~ −0.8), a total of 11 variables have more
than average minimal depth. The number of climate and catchment variables with a potential influence on
drought characteristics varies among the three drought regimes. For instance, catchment variables
dominate in controlling the drought duration for catchments that witness low drought
durations, whereas soil and climate variables dominate for catchments witnessing high and medium
drought durations, respectively. In catchments with high drought durations
(regime 1), the base flow index (BFI_AVE) has a significantly lower minimal depth than the other variables,
suggesting its dominant role in that regime. The base flow index is a key variable as it captures the interaction
between climate and catchment variables that generates streamflow in a given watershed.
To further understand how these dominant variables interact with each other to potentially influence the
drought duration, the normalized interactive minimal depth was plotted for the top five variables
(Figure 7). As highlighted before, the normalized interactive minimal depth varies from 0 to 1, where 0
indicates high interaction and 1 indicates no interaction between the selected variables. In regimes
1 and 2, the interactive minimal depth between the variables is close to 1, indicating little
interaction between the dominant variables. However, the base flow index (BFI_AVE) seems to interact with
the other variables, especially with the mean relief ratio (RR_MEAN) and the aspect with respect to
geographical north (ASPECT_NORTH), in regime 1. In regime 3, the maximum number of days
in a month with nonzero precipitation (WDMAX_BASIN) interacts with other variables, particularly
with the length of streams per square kilometer (STREAMS_KM_SQ_KM) within the catchments. Overall,
these results suggest no significant interaction between the dominant variables, although they have a direct
influence on the drought duration.
We further assessed the partial dependence of the top five dominant variables on drought duration (Figure 8). In
regime 1 (Figure 8a), base flow (BFI_AVE) controls the drought duration following a power law
behavior. The relation between the base flow index and drought is often complicated. A higher base flow index can
result in low-duration drought events, and as the magnitude of the base flow index increases, it follows a power
law function with the drought duration. In addition, the power law behavior extends over the entire
range of drought duration, which suggests a greater control of base flow on higher drought durations. Mean
elevation and percentage of soils with low infiltration rate (HGC) exhibit nonlinear relationships; however,
unlike the base flow index, they explain only part of the variability of drought duration, ranging from 12
to 13 months. In regime 2 (Figure 8b), the base flow index predominantly controls the drought duration
through a nonlinear relationship. However, it is interesting to see that the underlying functional
relationship does not obey a power law, as it does in regime 1. Other variables, such as basin compactness
(BASIN_COMPACTNESS), percentage of soils with low infiltration rate (HGC), aspect with respect to
geographical north (ASPECT_NORTHNESS), and temperature variability (TMEAN_SD), also exhibit a
nonlinear and inversely proportional functional dependence on drought duration. In regime 3, the
maximum number of days in a month with nonzero precipitation (WDMAX_BASIN) plays a key role
compared to the base flow index. A left-truncated parabolic relationship can be observed, which indicates
a nonlinear control of precipitation intensity on drought duration.
4.3.2. Application to Number of Drought Events
Figure 9 shows the ranking of catchment and climate variables that have a potential influence on ND
within each regime. Overall, 23 variables have more than average minimal depth for regime 1, which
represents catchments with fewer drought occurrences (median NDzs ~ −1). A total of 13 variables with
more than average minimal depth are selected for regime 2, which represents catchments with a moderate
number of drought events, whereas 14 variables have more than average minimal depth for regime 3.
Although different dominant variables control drought duration in each regime,
similar variables within each regime dominate both drought duration and drought
event occurrences.
The interaction between the five most dominant variables within each regime based on the number of
drought events is illustrated in Figure 10. Similar to the case of drought duration, the lowest interactive depth
was observed for the base flow index (BFI_AVE), which has some interaction with the mean relief ratio
(RR_MEAN) in regime 1. However, no such significant interactions are observed in regime 2, owing
to the relatively high ID values.
Figure 11a illustrates the partial dependence between variables specific to regime 1. The variables
that have a potential influence on drought duration also influence drought event occurrences.
BFI_AVE is inversely proportional to drought occurrences, following an exponential relationship. An
increase in base flow is likely to increase the groundwater contribution to streamflow, resulting in fewer
droughts. Elevation exhibits an inverse relationship up to 2,000 m and then a directly
proportional relationship up to 3,000 m, whereas HGC exhibits a semiparabolic relationship. The
other variables do not explain much of the variability of drought event occurrences.
Overall, it was observed that the RF modeling framework is flexible enough to accommodate different functional
relationships between the dominant variables and the number of drought events. In regime 2, variability
of temperature (TMEAN_SD) exhibits more dominant behavior in controlling the drought event
occurrences, whereas BFI_AVE shares an inversely proportional relationship for the same regime. The other three
selected variables exhibit dominant and different functional relationships, as shown in Figure 11b. Bulk
density of the soil (BD_AVE) is a key variable that has a potential influence on the drought event occurrences in
regime 3 (Figure 11c). However, it does not explain the variability of the entire range of drought event
occurrences, as in the case of the other two regimes. WDMAX_BASIN was able to explain the variability of drought
Figure 7. Interactive depth of the top five dominant variables controlling drought duration (DD) for clusters (a) 1, (b) 2,
and (c) 3. (Note: In each figure, the x-axis represents the same variables that are provided as headings in each figure
facet. Reference variables are marked with a blue cross in each panel. Higher values indicate lower interactivity with the
reference variable.)
occurrences at the higher end, which was not captured by BD_AVE. This highlights the
complementary behavior of the climate and catchment characteristics in controlling the drought
event occurrences.
4.3.3. Application to Drought Intensity
Climate and catchment variables are ranked based on their potential influence on DI (Figure 12). A total
of 24 variables have more than average minimal depth for regime 1, which represents catchments
with lower drought intensity (median DIzs ~ −1). In comparison to regime 1, fewer influencing
variables were observed for regimes 2 and 3. A total of 10 and 11 variables have more than average
minimal depth for regimes 2 and 3, which represent catchments with moderate and higher drought
intensity, respectively. The types of variables that dominate drought intensity are mostly similar to those for
drought duration and number of drought events. Overall, it was observed that the majority of the variables
that have a potential influence on drought intensity are related to soil, climate, and catchment characteristics in
regimes 1–3, respectively.
The interaction between the top five dominant variables within each regime in the case of drought intensity is
illustrated in Figure 13. In regime 1, none of the dominant variables show any significant
interactions, as the ID values are close to 1. However, in regime 2, temperature variability (TMEAN_SD)
exhibits potential interaction with other dominant variables, as the ID value is around 0.7. In the case of regime 3,
Figure 8. Partial dependence plots of the top five dominant variables controlling drought duration (DD) for clusters (a) 1, (b) 2,
and (c) 3.
the average clay content (CLAYAVE) in the basin exhibits significant interactive effects. Among them, the
percentage of soil with high infiltration (HGA) exhibits significant interaction with CLAYAVE. As
in the case of the other drought properties, the interacting effects are significant but weaker, as indicated
by the relatively higher ID values in the case of drought intensity.
Figure 14a illustrates the partial dependence specific to regime 1. The percentage of soils with high
infiltration capacity (HGA) has a dominant role and shares a directly proportional relationship with drought
intensity. The presence of soils with high infiltration is likely to create a competition between groundwater
recharge and streamflow. Hence, drier antecedent conditions may result in more intense droughts. On the
other hand, forest cover (FOREST) has an inversely proportional relationship with drought intensity. The base
flow index also influences the drought intensity, but not as significantly as in other cases. Mean aspect degree
(Aspect_Degrees) shares a directly proportional relationship with drought intensity.
Figure 9. Same as Figure 7 but in case of number of drought events (NE).
In regime 2, the temperature variability (TMEAN_SD) exhibits the most dominant control on
drought intensity, similar to the case of number of drought events. However, the functional relationship is
opposite in nature. The percentage of streams of Strahler's fourth order (PCT_4th_ORDER) exhibits an
inverse exponential relationship with the drought intensity. Additional variables, such as rainfall factor
(R_FACT), silt content (SILT_AVE), and precipitation variability (PRCP_CV), also influence the intensity
of drought. In regime 3, basin compactness (BAS_COMPACTNESS) and base flow index (BFI_AVE) exhibit
dominant control on drought intensity. However, the remaining variables are not as dominant as in the case
of the other clusters.
5. Discussion and Outlook
Hydrological drought in a catchment is controlled by the climate characteristics (recharge) and catchment
characteristics (storage). Based on our interpretation framework using MD, ID, and partial dependence
metrics, the important climate and catchment characteristics that control hydrological drought
characteristics (number of events, duration, and intensity) are provided in Table 3. It was observed that the
catchments in regime 1 are mostly located at higher elevations or in mountainous regions characterized by
steeply sloping terrain (Figure 3). The hydrological drought characteristics for regime 1 are mostly
influenced by the catchment characteristics, which include base flow index, elevation, and soil characteristics
Figure 10. Same as Figure 8 but in case of number of drought events (NE).
(infiltration rates). Baseflow is influenced by natural factors such as climate, geology, relief, soils, and
vegetation. Factors that promote infiltration and recharge of subsurface storage will increase baseflows,
while factors associated with higher evapotranspiration will reduce baseflow. Therefore, in these
catchments, groundwater drainage moves slowly, which results in prolonged baseflow following rainfall
events and thus makes it more influential in generating hydrological droughts. Interestingly, the elevation is
a standalone catchment characteristic that plays an important role for drought characteristics in regime 1.
Most of these catchments receive snow during winter months; therefore, the hydrological drought can be
influenced by a combination of rain and snow, and depending on the difference between elevation ranges,
the timing and intensity of drought can vary among the watersheds.
In addition to elevation, the soil characteristic (e.g., infiltration and hydraulic conductivity) is a key variable
for hydrological droughts in regime 1. In these catchments, subsurface flow generation is directly
proportional to the hydraulic conductivity of soils, which thus controls the discharge rate specific to soil types
(e.g., Armbruster, 1976; Musiake et al., 1984; Smith, 1981). In addition, soil properties are known to affect
infiltration, rooting depth/restrictions, available water capacity, soil porosity, and soil microorganism
activity, which influence the streamflow discharge rate (Bennie et al., 2008; Moeslund et al., 2013; Strachan &
Daly, 2017). The moisture storage capacity in soil decreases due to reduced precipitation and high
evapotranspiration, which further reduces baseflow, leading to the evolution of hydrological droughts in different segments of
the hydrological system. Hence, streamflow generation in base flow dominant streams is strongly influenced
Figure 11. Same as Figure 9 but in case of number of drought events (NE).
by the subsurface hydrogeologic configuration, the saturated permeabilities of the component formations,
and the unsaturated soil characteristics of the soil types (Freeze, 1972). Hence, in addition to the
topography, the soil features may control the hydrologic drought properties through these physical
processes. Further, the forest cover influences drought intensity compared to duration and number
of events.
Some of the catchment characteristics that control hydrologic drought in regime 1 also influence drought
characteristics in regimes 2 and 3. However, there is a clear difference between regime 1 and regimes 2
and 3 in terms of climate control on hydrological droughts. Climate factors such as precipitation may
have a lesser direct influence on hydrological droughts in regime 1, which can be attributed to the limited time
available to store water in comparatively higher gradient watersheds as well as the possible contribution of snow
for the watersheds located in snowy regions (e.g., northeast and central north watersheds). The climate
Figure 12. Same as Figure 7 but in case of drought intensity (DI).
variables that have a potential influence on hydrological drought in regimes 2 and 3 include temperature
and precipitation. For example, temperature can have a direct influence on the development of
hydrological drought in snow-dominated regions. The combined effect of elevation and temperature in
triggering hydrological droughts can vary because snow-dominated regions are located in mountain regions.
The role of precipitation characteristics in the propagation of hydrological drought is well recognized (Mishra
& Singh, 2010; Mukherjee et al., 2018; Wan et al., 2017). The amount of rainwater held in storage is
different for the three regimes; for example, higher elevation areas can hold less rainwater compared to
low-lying forested areas. The rainfall pattern in semiarid regions (typically the western USA) is very irregular,
leading to very low storage and an increase in hydrological drought.
However, for the humid catchments located in regimes 2 and 3, the soils are mostly saturated due to the
antecedent climate conditions, which results in a more direct relationship of precipitation, potential
evapotranspiration, and temperature with hydrologic drought characterization. In addition to precipitation and
temperature, lower relative humidity can influence rainfall patterns, leading to the evolution of
hydrological drought in regime 3. The role of relative humidity in the evolution of drought is complex in nature. During
dry hydrologic conditions, moisture is depleted from the upper soil layers, leading to a decrease in
evapotranspiration and atmospheric relative humidity (Mishra & Singh, 2010). Further, the reduced relative humidity
reduces the probability of rainfall, which further triggers hydrological drought (Mishra & Singh, 2010).
Figure 13. Same as Figure 8 but in case of drought intensity (DI).
In regimes 2 and 3, the stream networks defined by Strahler number have a potential influence on the
hydrological drought. An increase in Strahler order can be related to a decline in the catchment's general slope
(Haidary et al., 2015) and a potential increase in the storage capacity, groundwater recharge, and baseflow of
the watershed. The first-order streams that represent the outermost tributaries are therefore typically
located on steeper slopes than fourth-order streams. This suggests that the higher storage capacity
likely to be observed in fourth-order streams will exert better control on hydrological drought compared
with first-order streams. First-order streams, which are usually located at the upper end of channel networks
(Strahler, 1952), have comparatively larger slopes and are likely to drain excess water immediately following a
precipitation event (McMahon & Finlayson, 2003). As a result, if there is a deficit in precipitation or an increase in
evapotranspiration in the catchment, the first-order streams are likely to facilitate a more direct propagation of
meteorological drought to hydrological drought with no buffer (Godsey & Kirchner, 2014; McMahon &
Finlayson, 2003; Pinna et al., 2004). As a result, the presence of first- and fourth-order streams might
influence the drought properties in contrasting ways.
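For readers unfamiliar with stream ordering, the Strahler number can be computed recursively on a channel network, as in the minimal Python sketch below. The network representation (a dictionary that maps each reach to its upstream tributaries) and all reach names are hypothetical; this is a generic illustration of the ordering rule, not the procedure used to derive PCT_4th_ORDER in the data set.

```python
def strahler_order(node, children):
    """Return the Strahler order of a stream reach.

    node     : identifier of the reach of interest.
    children : dict mapping a reach to the list of its upstream tributaries;
               reaches without an entry are headwater (first-order) streams.
    """
    tributaries = children.get(node, [])
    if not tributaries:
        return 1  # outermost tributary: first order
    orders = sorted((strahler_order(t, children) for t in tributaries), reverse=True)
    highest = orders[0]
    # The order increases only when two or more tributaries share the highest order
    if len(orders) >= 2 and orders[1] == highest:
        return highest + 1
    return highest

# Hypothetical network: reach "a" is fed by two headwater streams,
# reach "b" is itself a headwater stream, and both drain to "outlet"
network_demo = {"outlet": ["a", "b"], "a": ["a1", "a2"]}
outlet_order = strahler_order("outlet", network_demo)
```

Here the outlet remains second order because a second-order reach is joined by a first-order one; a fourth-order reach therefore requires repeated confluences of equal-order streams further up the network.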
In regimes 2 and 3, the mean aspect degree was found to be an important variable in controlling the hydrologic
drought properties. Mean aspect degree is often associated with variability in microclimate, including
near-surface temperatures, evaporative demand, soil moisture content, and vegetation (Strachan & Daly, 2017;
Srinivasan et al., 2015; Moeslund et al., 2013). As a result, the mean aspect degree, which is a topographic
metric, controls the microclimatic and vegetation features that are likely to control drought characteristics. It was
Figure 14. Same as Figure 9 but in case of drought intensity (DI).
observed that in addition to the common processes that control the hydrologic drought characteristics, there
are additional distinct processes specific to each regime. This highlights the differential nature of
climate and catchment control on hydrological droughts. Under humid conditions, the evolution of
hydrological droughts in small catchments can be attributed more to climate characteristics, whereas
for the larger watersheds the storage capacity and baseflow associated with catchment characteristics can
play a dominant role. For the catchments under severe dry conditions, the climate signal can have
less predictive power compared to the storage properties of the watershed.
The application of the RF algorithm can provide a better understanding of how the climate and catchment
controls differ for a specific hydrological drought characteristic. For instance, previous studies highlighted
that the BFI, which represents the storage characteristics, plays a key role in controlling the drought
duration (Van Lanen et al., 2013; Van Loon & Laaha, 2015). Our framework does reconfirm the role of BFI in
regimes 1 and 2; however, we further identified two important distinctions. First, the relationship
between BFI and drought characteristics can be nonlinear, and as a result, an increase in drought duration
with BFI cannot be generalized. Second, the base flow acts as a dominant process mostly for the
catchments that witness medium- and high-duration drought events, whereas it has less influence for the
catchments with lower drought durations. The linear regression approach may not capture such
phenomena, as the model parameters are more biased toward high-magnitude variables (Hastie et al., 2009).
Our empirical analysis suggests a lack of prominent interactions between dominant variables in hydrological
drought propagation. This highlights the fact that dominant drivers of drought characteristics are more
additive and independent in nature. For example, the percentage of soils with high infiltration rate (HGA) and
percentage of fourth-order streams (PCT_4th_ORDER) both have dominant control on drought intensity but
have minimal interaction effect. As a result, even though both of these variables control the propagation of
drought independently, the underlying processes for drought propagation have minimal interaction with
each other.
Our results also indicated that even though similar dominant controls exist across different regimes, their
functional relationship with drought characteristics might be different, as highlighted in Jencso and
McGlynn (2011) and Knapp et al. (2015). For instance, we identified that the base flow index controls
drought duration for both regimes 1 and 2. However, the functional relationship of base flow with respect
to drought duration is different in regimes 1 and 2. In regime 1, the base flow index exhibits an exponential
relationship with drought duration, whereas in regime 2, a different form of nonlinear
relationship exists, which does not fit into the traditional exponential functions. Therefore, even though the
catchment and climate characteristics exhibit a nonlinear relationship with respect to drought characteristics, the
relationships across regions with different hydrologic characteristics should not be generalized.
Our interpretative modeling framework also highlighted that the influence of dominant variables can vary over the
range of a drought characteristic. In other words, an individual climate (catchment) characteristic can have a
higher (lower) influence on the variability in the upper (lower) range of a drought characteristic. For instance,
the bulk density may affect the soil features which control the drought intensity in regime 1. However, it
does not explain the variability of the entire range of drought event occurrences as in the case of the other
two regimes. WDMAX_BASIN, in contrast, is able to explain the variability of drought occurrences at the
higher end, which was not captured by BD_AVE. This highlights the complementary behavior of
the climate and catchment characteristics in controlling the drought event occurrences.
The framework presented in this study introduces valuable interpretability components of the RF algorithm in
the context of understanding hydrologic processes. Even though we have applied this framework for understanding
drought characteristics, there are other frameworks for interpreting black box models.
Among them are individual conditional expectations (Goldstein et al., 2015; Guidotti et al., 2018),
local interpretable model-agnostic explanations (Ribeiro et al., 2016), and influence functions (Koh &
Liang, 2017), which were recently introduced in the machine learning literature. There is potential scope to
compare these interpretability frameworks and the quality of machine learning algorithms. We believe
our approach can serve as a preliminary avenue to delve deeper into the application of interpretative
machine learning frameworks for understanding not only droughts but also other hydrologic processes.
Therefore, further work along these directions might improve our understanding of hydrologic processes
using interpretative machine learning algorithms.
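To illustrate how individual conditional expectation (ICE) differs from partial dependence, the sketch below computes one response curve per observation instead of a single averaged curve. It is an illustrative Python example under stated assumptions: toy_model is a hypothetical model with an interaction between two invented variables, x1 and x2, and none of the numbers come from this study.

```python
def ice_curves(predict, rows, feature, grid):
    """One curve per row: predictions as `feature` sweeps the grid, other values fixed."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def toy_model(row):
    """Hypothetical model in which the effect of x1 depends on the sign of x2."""
    return row["x1"] * row["x2"]


# Three invented observations with opposite-signed and zero x2.
rows = [
    {"x1": 0.2, "x2": -1.0},
    {"x1": 0.5, "x2": 0.0},
    {"x1": 0.8, "x2": 1.0},
]
grid = [0.0, 0.5, 1.0]

curves = ice_curves(toy_model, rows, "x1", grid)
# Averaging the ICE curves pointwise gives the partial dependence curve.
average = [sum(column) / len(column) for column in zip(*curves)]

for row, curve in zip(rows, curves):
    print(f"x2 = {row['x2']:+.1f}: {curve}")
print("average:", average)
```

Here the individual curves slope in opposite directions and cancel in the average, which is the kind of interaction effect that a partial dependence curve alone can hide.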
10.1029/2018WR024620
Water Resources Research
KONAPALA AND MISHRA 21 of 25
6. Conclusions
In this study, we applied machine learning methods by integrating fuzzy clustering and the RF algorithm to
develop an interpretation framework (i.e., minimal depth (MD), interactive depth (ID), and partial dependence) to
quantify the role of climate and catchment controls on hydrological drought for 652 catchments located in
CONUS. The RF algorithm can adequately capture the functional relationship between climate and catchment
characteristics and hydrological droughts. The proposed framework based on MD, ID, and partial
dependence metrics can identify the important climate and catchment characteristics, which can further
improve our understanding of the dominant role of climate and catchment characteristics in the propagation
of hydrological droughts.
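As a reminder of what the MD metric measures, the sketch below computes the minimal depth of each variable, i.e., the depth of the split on that variable closest to the root, averaged over the trees of a forest. It is a simplified Python illustration and not the software used in this study. The three hand-written trees are hypothetical, the variable SLOPE is invented for the example, and assigning the tree depth to a variable that never splits is an assumption made here for simplicity.

```python
def tree_depth(tree):
    """Depth of the deepest leaf below this node (a leaf, written None, has depth 0)."""
    if tree is None:
        return 0
    _, left, right = tree
    return 1 + max(tree_depth(left), tree_depth(right))


def first_split_depths(tree, depth=0, found=None):
    """Map each variable to the depth of its split closest to the root (root = 0)."""
    if found is None:
        found = {}
    if tree is not None:
        variable, left, right = tree
        found[variable] = min(depth, found.get(variable, depth))
        first_split_depths(left, depth + 1, found)
        first_split_depths(right, depth + 1, found)
    return found


def forest_minimal_depth(forest, variables):
    """Average minimal depth per variable; unused variables receive the tree depth."""
    totals = {v: 0.0 for v in variables}
    for tree in forest:
        found = first_split_depths(tree)
        penalty = tree_depth(tree)
        for v in variables:
            totals[v] += found.get(v, penalty)
    return {v: totals[v] / len(forest) for v in variables}


# Hypothetical trees written as (split variable, left subtree, right subtree).
forest = [
    ("BD_AVE", ("WDMAX_BASIN", None, None), None),
    ("BD_AVE", None, ("SLOPE", ("WDMAX_BASIN", None, None), None)),
    ("WDMAX_BASIN", ("BD_AVE", None, None), None),
]
variables = ["BD_AVE", "WDMAX_BASIN", "SLOPE"]

average_depth = forest_minimal_depth(forest, variables)
ranking = sorted(variables, key=average_depth.get)
print(ranking)
```

A smaller average minimal depth means that a variable tends to split closer to the root and is therefore ranked as more dominant. The ranking printed here reflects only the toy trees.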
Using a large number of catchments under different climatic regimes enabled us to explore the dominant
landscape controls on CONUS hydrological droughts. We conclude that the RF-based interpretative
approach is a simple, robust, and yet powerful way to gain insights into the drivers of hydrological
droughts. The applied framework can provide useful information to understand the different combinations of
climate and catchment characteristics that can either attenuate or intensify hydrological droughts.
The following conclusions can be drawn from this study: (i) three drought regimes are identified based
on their duration, frequency, and intensity, namely, Regime 1: droughts with longer duration, less frequent,
and lesser intensity; Regime 2: droughts with moderate duration, moderate frequency, and moderate
intensity; and Regime 3: droughts with shorter duration, more frequent, and more intense; (ii) among the
identified regimes, even though some common hydrologic processes control the drought characteristics,
there are some distinct processes specific to each regime; (iii) similar climate, catchment, and morphological
characteristics may exhibit varied functional relationships (i.e., exponential, hyperbolic, and linear) with
drought characteristics in different regimes; and (iv) the dominant variables may not explain the
variability of the entire range of drought characteristics. From the above insights, we propose that these
issues deserve more attention by integrating the knowledge obtained from the application of machine
learning algorithms to hydroclimatic processes (e.g., hydrological drought) with the hydrological models used
for such analysis. Although hydrologic models are able to capture streamflow with reasonable accuracy,
they often overestimate (underestimate) extreme events such as extreme drought events. This implies that a
better understanding of the role of climate and catchment characteristics in the evolution and propagation
of hydrological drought events is essential. The results obtained from our proposed machine learning
framework can complement ongoing research on hydrological droughts by better exploiting
the value of nonclimatic attributes (such as soil, land cover, and geology). In addition, a more systematic
characterization of the uncertainties in catchment attributes needs to be performed.
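For readers unfamiliar with soft clustering, the sketch below shows how a basic fuzzy c-means procedure assigns each catchment a degree of membership in every regime rather than a single hard label. It is a generic Python illustration and not necessarily the fuzzy clustering variant used in this study. The standardized (duration, intensity) values, the initial centers, and the fuzzifier m = 2 are all invented for the example.

```python
def memberships(points, centers, m=2.0):
    """Fuzzy membership of every point in every cluster (each row sums to 1)."""
    power = 2.0 / (m - 1.0)
    table = []
    for p in points:
        # Euclidean distance to each center, floored to avoid division by zero.
        dists = [max(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5, 1e-12)
                 for c in centers]
        table.append([1.0 / sum((d_i / d_k) ** power for d_k in dists)
                      for d_i in dists])
    return table


def fuzzy_c_means(points, centers, m=2.0, iterations=100):
    """Alternate membership and center updates for a fixed number of iterations."""
    dims = range(len(points[0]))
    for _ in range(iterations):
        u = memberships(points, centers, m)
        new_centers = []
        for k in range(len(centers)):
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * p[j] for w, p in zip(weights, points)) / total
                for j in dims))
        centers = new_centers
    return centers, memberships(points, centers, m)


# Invented standardized (duration, intensity) pairs for nine catchments.
catchments = [
    (2.0, 0.2), (2.1, 0.3), (1.9, 0.25),   # long and mild
    (1.0, 1.0), (1.1, 0.9), (0.9, 1.1),    # moderate
    (0.2, 2.0), (0.3, 2.1), (0.25, 1.9),   # short and intense
]
initial_centers = [(1.8, 0.4), (1.0, 1.0), (0.4, 1.8)]

centers, u = fuzzy_c_means(catchments, initial_centers)
# Hard label = cluster with the largest membership.
labels = [row.index(max(row)) for row in u]
print(labels)
```

The membership rows, rather than the hard labels, are what distinguish fuzzy clustering: a catchment lying between two regimes would receive comparable memberships in both.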
References
Achen, C. H. (1982). Interpreting and using regression (Vol. 29). Sage.
Addor, N., Nearing, G., Prieto, C., Newman, A. J., Le Vine, N., & Clark, M. P. (2018). A ranking of hydrological signatures based on their
predictability in space. Water Resources Research, 54(11), 8792–8812.
Apurv, T., Sivapalan, M., & Cai, X. (2017). Understanding the role of climate characteristics in drought propagation. Water Resources
Research, 53, 9304–9329. https://doi.org/10.1002/2017WR021445
Armbruster, J. T. (1976). An infiltration index useful in estimating low-flow characteristics of drainage basins. Journal of Research of the
U. S. Geological Survey, 4(5), 533–538.
Bastani, O., Kim, C., & Bastani, H. (2017). Interpreting blackbox models via model extraction. arXiv preprint arXiv:1705.08504.
Bennie, J., Huntley, B., Wiltshire, A., Hill, M. O., & Baxter, R. (2008). Slope, aspect and climate: Spatially explicit and implicit models of
topographic microclimate in chalk grassland. Ecological modelling, 216(1), 47–59.
Bibal, A., & Frénay, B. (2016). Interpretability of machine learning models and representations: An introduction. In Proceedings on
ESANN (pp. 77–82).
Boulesteix, A. L., Janitza, S., Kruppa, J., & König, I. R. (2012). Overview of random forest methodology and practical guidance with
emphasis on computational biology and bioinformatics. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(6),
493–507.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32. https://doi.org/10.1023/A:1010933404324
Brutsaert, W., & Nieber, J. L. (1977). Regionalized drought flow hydrographs from a mature glaciated plateau. Water Resources Research,
13(3), 637–643.
Campello, R. J., & Hruschka, E. R. (2006). A fuzzy extension of the silhouette width criterion for cluster analysis. Fuzzy Sets and Systems,
157(21), 2858–2875.
Carrillo, G., Troch, P. A., Sivapalan, M., Wagener, T., Harman, C., & Sawicz, K. (2011). Catchment classification: Hydrological
analysis of catchment behavior through process-based modeling along a climate gradient. Hydrology and Earth System Sciences, 15(11),
3411–3430.
Caruana, R., & Niculescu-Mizil, A. (2006, June). An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd
international conference on Machine learning (pp. 161–168). ACM.
Acknowledgments
We very much appreciate the valuable comments of the Associate Editor and three reviewers, which helped us
improve our manuscript. This study was supported by NSF award 1653841. Any opinions, findings, and
conclusions or recommendations expressed in this material are those of the authors and do not necessarily
reflect the views of the NSF. The authors used the GAGES II data set in this study, and these data sets are
publicly available at https://water.usgs.gov/GIS/metadata/usgswrd/XML/gagesII_Sept2011.xml.
Cayan, D. R., Das, T., Pierce, D. W., Barnett, T. P., Tyree, M., & Gershunov, A. (2010). Future dryness in the southwest US and the
hydrology of the early 21st century drought. Proceedings of the National Academy of Sciences, 107(50), 21,271–21,276.
Cutler, D. R., Edwards, T. C. Jr., Beard, K. H., Cutler, A., Hess, K. T., Gibson, J., & Lawler, J. J. (2007). Random forests for classification in
ecology. Ecology, 88(11), 2783–2792.
Daly, C., Taylor, G. H., Gibson, W. P., Parzybok, T. W., Johnson, G. L., & Pasteris, P. A. (2000). High-quality spatial climate data sets for the
United States and beyond. Transactions of the ASAE, 43(6), 1957.
Díaz-Uriarte, R., & De Andres, S. A. (2006). Gene selection and classification of microarray data using random forest. BMC Bioinformatics,
7(1), 3.
Diller, G. P., Alonso-Gonzalez, R., Kempny, A., Dimopoulos, K., Inuzuka, R., Giannakoulas, G., & Swan, L. (2012). B-type natriuretic
peptide concentrations in contemporary Eisenmenger syndrome patients: Predictive value and response to disease targeting therapy.
Heart, heartjnl2011.
Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.
Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap. Washington, DC: CRC press.
Elshorbagy, A., Corzo, G., Srinivasulu, S., & Solomatine, D. P. (2010a). Experimental investigation of the predictive capabilities of
data-driven modeling techniques in hydrology – Part 1: Concepts and methodology. Hydrology and Earth System Sciences, 14(10), 1931–1941.
Elshorbagy, A., Corzo, G., Srinivasulu, S., & Solomatine, D. P. (2010b). Experimental investigation of the predictive capabilities of
data-driven modeling techniques in hydrology – Part 2: Application. Hydrology and Earth System Sciences, 14(10), 1943–1961.
Fahimi, F., Yaseen, Z. M., & El-shafie, A. (2017). Application of soft computing based hybrid models in hydrological variables modeling: A
comprehensive review. Theoretical and applied climatology, 128(3–4), 875–903.
Falcone, J. A. (2011). GAGES-II: Geospatial attributes of gages for evaluating streamflow. US Geological Survey.
Fienen, M. N., Nolan, B. T., Kauffman, L. J., & Feinstein, D. T. (2018). Metamodeling for groundwater age forecasting in the Lake Michigan
Basin. Water Resources Research, 54, 4750–4766. https://doi.org/10.1029/2017WR022387
Freeze, R. A. (1972). Role of subsurface flow in generating surface runoff: 1. Base flow contributions to channel flow. Water Resources
Research, 8(3), 609–623.
Friedman, J. H., & Meulman, J. J. (2003). Multiple additive regression trees with application in epidemiology. Statistics in medicine, 22(9),
1365–1381. https://doi.org/10.1002/sim.1501
Gocic, M., & Trajkovic, S. (2014). Spatiotemporal characteristics of drought in Serbia. Journal of Hydrology, 510, 110–123.
Godsey, S. E., & Kirchner, J. W. (2014). Dynamic, discontinuous stream networks: Hydrologically driven variations in active drainage
density, flowing channels and stream order. Hydrological Processes, 28(23), 5791–5803.
Goldstein, A., Kapelner, A., Bleich, J., & Pitkin, E. (2015). Peeking inside the black box: Visualizing statistical learning with plots of
individual conditional expectation. Journal of Computational and Graphical Statistics, 24(1), 44–65.
https://doi.org/10.1080/10618600.2014.907095
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., & Pedreschi, D. (2018). A survey of methods for explaining black box
models. ACM Computing Surveys (CSUR), 51(5), 93.
Gupta, H. V., & Nearing, G. S. (2014). Debates – The future of hydrological sciences: A (common) path forward? Using models and data to
learn: A systems theoretic perspective on the future of hydrological science. Water Resources Research, 50, 5351–5359.
https://doi.org/10.1002/2013WR015096
Haidary, A., Amiri, B. J., Adamowski, J., Fohrer, N., & Nakane, K. (2015). Modelling the relationship between catchment attributes and
wetland water quality in Japan. Ecohydrology, 8, 726–737. https://doi.org/10.1002/eco.1539
Haslinger, K., Koffler, D., Schöner, W., & Laaha, G. (2014). Exploring the link between meteorological drought and streamflow: Effects of
climate-catchment interaction. Water Resources Research, 50, 2468–2487. https://doi.org/10.1002/2013WR015051
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer
Science & Business Media.
Hengl, T., Nussbaum, M., Wright, M. N., Heuvelink, G. B., & Gräler, B. (2018). Random forest as a generic framework for predictive
modeling of spatial and spatiotemporal variables. PeerJ, 6, e5518.
Huang, G., Huang, G. B., Song, S., & You, K. (2015). Trends in extreme learning machines: A review. Neural Networks, 61, 32–48.
https://doi.org/10.1016/j.neunet.2014.10.001
Ishwaran, H., Kogalur, U. B., Chen, X., & Minn, A. J. (2011). Random survival forests for high-dimensional data. Statistical Analysis and
Data Mining: The ASA Data Science Journal, 4(1), 115–132.
Ishwaran, H., Kogalur, U. B., Gorodeski, E. Z., Minn, A. J., & Lauer, M. S. (2010). High-dimensional variable selection for survival data.
Journal of the American Statistical Association, 105, 205–217.
Jencso, K. G., & McGlynn, B. L. (2011). Hierarchical controls on runoff generation: Topographically driven hydrologic connectivity,
geology, and vegetation. Water Resources Research, 47, W11527. https://doi.org/10.1029/2011WR010666
Karpatne, A., Atluri, G., Faghmous, J. H., Steinbach, M., Banerjee, A., Ganguly, A., & Kumar, V. (2017). Theory-guided data science: A new
paradigm for scientific discovery from data. IEEE Transactions on Knowledge and Data Engineering, 29(10), 2318–2331.
Knapp, A. K., Carroll, C. J., Denton, E. M., La Pierre, K. J., Collins, S. L., & Smith, M. D. (2015). Differential sensitivity to regional-scale
drought in six central US grasslands. Oecologia, 177(4), 949–957. https://doi.org/10.1007/s00442-015-3233-6
Knapp, A. K., Fay, P. A., Blair, J. M., Collins, S. L., Smith, M. D., Carlisle, J. D., et al. (2002). Rainfall variability, carbon cycling, and plant
species diversity in a mesic grassland. Science, 298(5601), 2202–2205.
Koch, J., Stisen, S., Refsgaard, J. C., Ernstsen, V., Jakobsen, P. R., & Højberg, A. L. (2019). Modeling depth of the redox interface at high
resolution at national scale using random forest and residual Gaussian simulation. Water Resources Research, 55, 1451–1469.
https://doi.org/10.1029/2018WR023939
Koh, P. W., & Liang, P. (2017, July). Understanding black-box predictions via influence functions. In International Conference on Machine
Learning (pp. 1885–1894).
Konapala, G., & Mishra, A. (2017). Review of complex networks application in hydroclimatic extremes with an implementation to
characterize spatiotemporal drought propagation in continental USA. Journal of Hydrology, 555, 600–620.
Konapala, G., & Mishra, A. K. (2016). Three-parameter-based streamflow elasticity model: Application to MOPEX basins in the USA at
annual and seasonal scales. Hydrology and Earth System Sciences, 20(6), 2545–2556.
Kotsiantis, S. B., Zaharakis, I., & Pintelas, P. (2007). Supervised machine learning: A review of classification techniques. Emerging Artificial
Intelligence Applications in Computer Engineering, 160, 3–24.
Krishnapuram, R., Joshi, A., Nasraoui, O., & Yi, L. (2001). Low-complexity fuzzy relational clustering algorithms for web mining. IEEE
transactions on Fuzzy Systems, 9(4), 595–607.
Latt, Z. Z., Wittenberg, H., & Urban, B. (2015). Clustering hydrological homogeneous regions and neural network based index flood
estimation for ungauged catchments: an example of the Chindwin River in Myanmar. Water Resources Management, 29(3), 913–928.
Ley, R., Casper, M. C., Hellebrand, H., & Merz, R. (2011). Catchment classification by runoff behaviour with self-organizing maps (SOM).
Hydrology and Earth System Sciences, 15(9), 2947–2962.
McMahon, T. A., & Finlayson, B. L. (2003). Droughts and anti-droughts: The low flow hydrology of Australian rivers. Freshwater Biology,
48(7), 1147–1160.
Mishra, A. K., & Singh, V. P. (2010). A review of drought concepts. Journal of hydrology, 391(1–2), 202–216.
Mishra, A. K., & Singh, V. P. (2011). Drought modeling – A review. Journal of Hydrology, 403(1–2), 157–175.
Moeslund, J. E., Arge, L., Bøcher, P. K., Dalgaard, T., & Svenning, J. C. (2013). Topography as a driver of local terrestrial vascular plant
diversity patterns. Nordic Journal of Botany, 31(2), 129–144.
Mukherjee, S., Mishra, A., & Trenberth, K. E. (2018). Climate change and drought: A perspective on drought indices. Current Climate
Change Reports, 4(2), 145–163. https://doi.org/10.1007/s40641-018-0098-x
Musiake, K., Takahasi, Y., & Ando, Y. (1984). Statistical analysis on effects of basin geology on river flow regime in mountainous areas of
Japan. Proc. Fourth Cong. Asian & Pacific Reg. Div. Int. Assoc. Hydraul. Res., Bangkok, APD-IAHR/Asian Institute Technology, vol. 2,
pp. 1141–1150.
Narasimhan, B., & Srinivasan, R. (2005). Development and evaluation of Soil Moisture Deficit Index (SMDI) and Evapotranspiration
Deficit Index (ETDI) for agricultural drought monitoring. Agricultural and Forest Meteorology, 133(1–4), 69–88.
Nourani, V., Baghanam, A. H., Adamowski, J., & Kisi, O. (2014). Applications of hybrid wavelet-Artificial Intelligence models in
hydrology: A review. Journal of Hydrology, 514, 358–377.
Olden, J. D., Kennard, M. J., & Pusey, B. J. (2012). A framework for hydrologic classification with a review of methodologies and
applications in ecohydrology. Ecohydrology, 5(4), 503–518.
Pinna, M., Fonnesu, A., Sangiorgio, F., & Basset, A. (2004). Influence of summer drought on spatial patterns of resource availability and
detritus processing in Mediterranean stream subbasins (Sardinia, Italy). International Review of Hydrobiology: A Journal Covering all
Aspects of Limnology and Marine Biology, 89(5–6), 484–499.
Prasad, A. M., Iverson, L. R., & Liaw, A. (2006). Newer classification and regression tree techniques: bagging and random forests for
ecological prediction. Ecosystems, 9(2), 181–199.
Probst, P., & Boulesteix, A. L. (2017). To tune or not to tune the number of trees in random forest? arXiv preprint arXiv:1705.05654.
Raghavendra, S., & Deka, P. C. (2014). Support vector machine applications in the field of hydrology: A review. Applied soft computing, 19,
372–386.
Rajsekhar, D., Singh, V. P., & Mishra, A. K. (2014). Hydrologic drought atlas for Texas. Journal of Hydrologic Engineering, 20(7), 05014023.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). Why should I trust you?: Explaining the predictions of any classifier. In Proceedings
of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135–1144). ACM.
Rice, J. S., Emanuel, R. E., Vose, J. M., & Nelson, S. A. (2015). Continental US streamflow trends from 1940 to 2009 and their relationships
with watershed spatial characteristics. Water Resources Research, 51, 6262–6275.
Saft, M., Peel, M. C., Western, A. W., & Zhang, L. (2016). Predicting shifts in rainfall-runoff partitioning during multiyear drought: Roles of
dry period and catchment characteristics. Water Resources Research, 52, 9290–9305. https://doi.org/10.1002/2016WR019525
Saft, M., Western, A. W., Zhang, L., Peel, M. C., & Potter, N. J. (2015). The influence of multiyear drought on the annual rainfall-runoff
relationship: An Australian perspective. Water Resources Research, 51, 2444–2463. https://doi.org/10.1002/2014WR015348
Sawicz, K., Wagener, T., Sivapalan, M., Troch, P. A., & Carrillo, G. (2011). Catchment classification: Empirical analysis of hydrologic
similarity based on catchment function in the eastern USA. Hydrology and Earth System Sciences, 15(9), 2895–2911.
Schwalm, C. R., Anderegg, W. R., Michalak, A. M., Fisher, J. B., Biondi, F., Koch, G., & Huntzinger, D. N. (2017). Global patterns of drought
recovery. Nature, 548(7666), 202–205. https://doi.org/10.1038/nature23021
Scornet, E. (2017). Tuning parameters in random forests. ESAIM: Proceedings and Surveys, 60, 144–162.
Sheffield, J., Wood, E. F., & Roderick, M. L. (2012). Little change in global drought over the past 60 years. Nature, 491(7424), 435–438.
https://doi.org/10.1038/nature11575
Shen, C. (2018). A transdisciplinary review of deep learning research and its relevance for water resources scientists. Water Resources
Research, 54(11), 8558–8593.
Shortridge, J. E., Guikema, S. D., & Zaitchik, B. F. (2016). Machine learning methods for empirical streamflow simulation: A comparison of
model accuracy, interpretability, and uncertainty in seasonal watersheds. Hydrology and Earth System Sciences, 20(7), 2611–2628.
Shukla, S., & Wood, A. W. (2008). Use of a standardized runoff index for characterizing hydrologic drought. Geophysical Research Letters,
35, L02405. https://doi.org/10.1029/2007GL032487
Smith, R. W. (1981). Rock type and minimum 7-day/10-year flow in Virginia streams. Virginia Water Resource Research Center, Virginia
Polytechnology Institute and State University, Blacksburg, Bulletin, vol. 116, 43 pp.
Stahl, K., Moore, R. D., Shea, J. M., Hutchinson, D., & Cannon, A. J. (2008). Coupled modelling of glacier and streamflow response to future
climate scenarios. Water Resources Research, 44, W02422. https://doi.org/10.1029/2007WR005956
Stoelzle, M., Stahl, K., Morhard, A., & Weiler, M. (2014). Streamflow sensitivity to drought scenarios in catchments with different geology.
Geophysical Research Letters, 41, 6174–6183. https://doi.org/10.1002/2014GL061344
Strachan, S., & Daly, C. (2017). Testing the daily PRISM air temperature model on semiarid mountain slopes. Journal of Geophysical
Research: Atmospheres, 122, 5697–5715. https://doi.org/10.1002/2016JD025920
Strahler, A. N. (1952). Hypsometric (area-altitude) analysis of erosional topography. Geological Society of America Bulletin, 63(11),
1117–1142.
Strobl, C., Malley, J., & Tutz, G. (2009). An introduction to recursive partitioning: Rationale, application, and characteristics of
classification and regression trees, bagging, and random forests. Psychological methods, 14(4), 323–348. https://doi.org/10.1037/a0016973
Tallaksen, L. M., Hisdal, H., & Van Lanen, H. A. (2009). Space-time modelling of catchment scale drought characteristics. Journal of
Hydrology, 375(3–4), 363–372.
Van Lanen, H. A. J., Wanders, N., Tallaksen, L. M., & Van Loon, A. F. (2013). Hydrological drought across the world: Impact of climate and
physical catchment structure. Hydrology and Earth System Sciences, 17, 1715–1732.
Van Loon, A. F., & Laaha, G. (2015). Hydrological drought severity explained by climate and catchment characteristics. Journal of
Hydrology, 526, 3–14.
Van Loon, A. F., Tijdeman, E., Wanders, N., Van Lanen, H. J., Teuling, A. J., & Uijlenhoet, R. (2014). How climate seasonality
modifies drought duration and deficit. Journal of Geophysical Research: Atmospheres, 119, 4640–4656.
https://doi.org/10.1002/2013JD020383
Van Loon, A. F., & Van Lanen, H. A. J. (2012). A process-based typology of hydrological drought. Hydrology and Earth System Sciences,
16(7), 1915–1946.
Veettil, A. V., Konapala, G., Mishra, A. K., & Li, H. Y. (2018). Sensitivity of drought resilience-vulnerability-exposure to hydrologic ratios in
contiguous United States. Journal of Hydrology, 564, 294–306.
Vicente-Serrano, S. M. (2006). Spatial and temporal analysis of droughts in the Iberian Peninsula (1910–2000). Hydrological Sciences
Journal, 51(1), 83–97.
Vicente-Serrano, S. M., López-Moreno, J. I., Beguería, S., Lorenzo-Lacruz, J., Azorin-Molina, C., & Morán-Tejeda, E. (2011). Accurate
computation of a streamflow drought index. Journal of Hydrologic Engineering, 17(2), 318–332.
Wan, W., Zhao, J., Li, H. Y., Mishra, A., Ruby Leung, L., Hejazi, M., et al. (2017). Hydrological drought in the Anthropocene: Impacts of
local water extraction and reservoir regulation in the US. Journal of Geophysical Research: Atmospheres, 122, 11,313–11,328.
https://doi.org/10.1002/2017JD026899
Wang, D., Hejazi, M., Cai, X., & Valocchi, A. J. (2011). Climate change impact on meteorological, agricultural, and hydrological drought in
central Illinois. Water Resources Research, 47, W09527. https://doi.org/10.1029/2010WR009845
Xie, X. L., & Beni, G. (1991). A validity measure for fuzzy clustering. IEEE Transactions on Pattern Analysis & Machine Intelligence, 8,
841–847.
Yadav, M., Wagener, T., & Gupta, H. (2007). Regionalization of constraints on expected watershed response behavior for improved
predictions in ungauged basins. Advances in Water Resources, 30(8), 1756–1774.
Yevjevich, V. (1967). An objective approach to definitions and investigations of continental hydrologic droughts. Hydrol. Papers 23,
Colorado State University Publication, Colorado State University, Fort Collins, Colorado, USA.
Yoo, J., Kwon, H. H., Kim, T. W., & Ahn, J. H. (2012). Drought frequency analysis using cluster analysis and bivariate probability
distribution. Journal of Hydrology, 420, 102–111.
Zhang, C., & Ma, Y. (2012). Ensemble machine learning: Methods and applications. Springer Science & Business Media.
Zhang, Q., Xiao, M., Singh, V. P., & Li, J. (2012). Regionalization and spatial changing properties of droughts across the Pearl River basin,
China. Journal of Hydrology, 472, 355–366.
10.1029/2018WR024620
Water Resources Research
KONAPALA AND MISHRA 25 of 25
... While many studies have explored the impact of climate drivers on FDs at a global scale providing valuable insights into global trends, only a limited number (Konapala & Mishra, 2020;Van Loon & Laaha, 2015) have delved into the specifics of FDs at watersheds scales. Watersheds are essential hydrologic units for understanding water budgets, enabling drought risk assessment, and managing water resources. ...
... These characteristics significantly impact various aspects of the hydrological cycle, influencing root zone soil moisture, subsurface recharge, and flow patterns. The variability in watershed attributes contributes to the intricate interactions affecting FD propagation, duration, and their implications for the overall water balance (Konapala & Mishra, 2020;Van Loon & Laaha, 2015). It's important to note that while this study doesn't explicitly consider specific watershed characteristics, the analysis of FD at the watershed level inherently encompasses their influence as a significant driving factor in shaping distinct FD attributes. ...
Article
Full-text available
Plain Language Summary Flash droughts (FDs), which are sudden and severe dry periods, are causing problems for our water and food systems and making it harder to prepare for disasters. To address these challenges effectively, it is crucial to gain a thorough understanding of the underlying mechanisms and factors driving FDs at the watershed level. In this study, we looked at climatic patterns alongside the lengths of dry and wet periods spanning from 1980 to 2019. Our primary focus was on three key aspects: the extent of FDs, when they begin, and how long they persist. Our research findings demonstrate considerable variations in FDs occurrences across different regions. Notably, in the Southern Hemisphere, FDs are expanding rapidly, developing more swiftly, and enduring for extended periods, closely mirroring shifts in precipitation and temperature patterns. Interestingly, the onset and duration of FDs seem to depend more on the intensity of climatic factors than on how long it's been dry or wet. The expansion of FDs in a region is linked to both the climatic and dry/wet periods, emphasizing the geophysical connectivity within a watershed.
... Water quality can be influenced by various factors at different spatial and temporal scales (Mosley, 2015;Lintern et al., 2018;Shoda et al., 2019). For example, recent findings show that an increase in extreme precipitation (Prein et al., 2017;Vu and Mishra, 2020), droughts and flooding with high intensity and duration (Change, 2014;Konapala and Mishra, 2020;Abegaz et al., 2023) impact surface water quality as a result of climate change that brings drought and more flood on a global scale (Konapala and Mishra, 2020). ...
... Water quality can be influenced by various factors at different spatial and temporal scales (Mosley, 2015;Lintern et al., 2018;Shoda et al., 2019). For example, recent findings show that an increase in extreme precipitation (Prein et al., 2017;Vu and Mishra, 2020), droughts and flooding with high intensity and duration (Change, 2014;Konapala and Mishra, 2020;Abegaz et al., 2023) impact surface water quality as a result of climate change that brings drought and more flood on a global scale (Konapala and Mishra, 2020). ...
Article
Full-text available
Lakes, the most widespread inland water bodies in the globe, are highly susceptible to change in trophic state due to external factors. Changing hydro-climatic conditions and land cover changes (LCC) can cause lake water quality deterioration. This study establishes the quantitative relationship between variability in the water quality index and changes in hydro-climatic and LCC variables. Water quality is represented by the Forel-ule index (FUI) whereas the hydro-climatic variables considered in this study are lake bottom layer temperature (lblt), lake total layer temperature (ltlt), precipitation, runoff, evaporation, lake skin temperature (lskt), surface wind speed and air temperature. The LCC is quantified by lower and higher level leaf area index (Lv-lai and Hv-lai). FUI has a positive relationship with surface wind speed, precipitation, runoff, ltlt, lblt, and LCC and a negative relationship with evaporation, lskt, and air temperature with 95% confidence level over most parts of the Lake. The temporal correlation is also apparent from the long-term trend pattern. A significant decreasing trend is observed in FUI and lake bottom layer temperature (lblt). In contrast, an insignificant increasing trend is observed in air tem�perature and lake skin temperature (lskt). The changes in LCC, runoff, precipitation, and surface wind speed is insignificant between 2000 and 2020. Moreover, the phase composites of FUI and hydro-climatic and LCC variables derived from multichannel singular spectrum analysis (MSSA) show strong seasonal modulation of water quality by hydro-climatic and LCC variables. The annual cycle represented by the first two eigenmodes (except wind speed which is represented by the second and third eigenmodes) accounts for between 27.41% (wind speed) to 52.32% (precipitation) of the total joint spatiotemporal variability of FUI and the driving var�iables. 
The convergent cross-mapping (CCM) analysis shows that cross-map skill (ρ2 ) is increased with increasing library length (L) and time delay (τ), which suggests significant causal effects of hydro-climatic and LCC variables on FUI and the lagged causation is consistent with maximum values of ρ2 . The significant feedback of FUI to changes in hydro-climatic and LCC variables shows the possibility of hindcast/forecast of the historical/future status of water quality from hydro-climatic and LCC variables. As a result, a multivariate nonlinear regression model (MNWQFM) is developed to forecast the lake water quality index from the hydro-climatic and LCC var�iables. The model has high performance with R2 of 83.6% and root means square error (RMSE) of 0.15 in FUI.
... There are two types of connections that drought parameters can show: linear and nonlinear [24,25]. Drought frequency and intensity have been effectively proved by the probability density functions (PDFs) of drought indices [26,27,28]. ...
Article
Drought is a natural phenomenon that damages agricultural land severely. The severity of drought must be reduced to decrease its impact on agricultural productivity. The study of drought was carried out for the state Odisha which experienced drought 8 times during the last 20 years due to failure of monsoon. Analysis for the data was explored by explorative analysis.The drought forecasting was carried out using machine learning techniques like the Auto-regressive model (AR), Long Short-Term Memory (LSTM), and Auto-regressive Integrated Moving Average (ARIMA) using daily rainfall data collected for 28 years (1993-2020). Further using this data each district was categorised into four different categories namely Flood (FL), No Drought (ND), Moderate Drought (MD), and Severe Drought (SD). To classify the districts after forecasting, classification models were used like Support Vector Classifier (SVC) and Naïve Bayes. The results of the forecasting model as well as the classification model were compared. It becomes important to forecast drought for proper planning and management of the water resource system to decrease the damage due to such calamities. This study is valuable for the government, farmers, and other stakeholders to understand the pattern and reason behind the severity of drought to take relevant precautionary measures and improve decisions and facilities to tackle such natural calamities.
... Various methods have been employed to assess water quality, including the water pollution index [16,29], Mann-Kendall test, Spearman's rho test, and cross-wavelet transform [16,30,31]. Other techniques, such as linear regression and Mann-Whitney U statistic [21], lake sediments [32,33], diatom analysis [34], machine-learning models [35], and hydrodynamic modeling [36], have also been used to study the effects of drought on water systems. ...
Article
Full-text available
Drought stress has a significant impact on the quality and quantity of lake water. Understanding this impact is crucial for preventing water security risks and for pollution recovery. However, there is a lack of systematic understanding of how drought affects water quality and quantity, and how they change in multiple dimensions. This manuscript established a synthesized methodology, with principles for judging its applicability and three steps of application, to detect the change in water quality and water level under severe drought in Xingyun Lake, China. Results show that (1) The water level and water quality of Xingyun Lake have a synchronous and evident response to drought during 2009–2014. The rainfall during 2008–2015 declined by 22.9% relative to normal, and the inundated area and lake water depth in 2012 decreased by 10.50% relative to 2002 and by 1.38 m relative to the average depth, respectively. The pollution index climbed above 1.21 after 2008, fluctuating around 1.42. (2) Under drought, the water quality indicators changed significantly in terms of the overall feature, trend, eigenvalue, and morphological characteristics. The water quality indicators of set2008-2015 are significantly different from those of set2000-2007 and do not fall in the groups of set1994-2000. The morphological characteristics of water quality indicators in set2008-2015 differ significantly from those in set2000-2007, as shown by the minimum, maximum, median, quartiles, and extreme values. (3) Although NH3–N showed no significant change, the water quality deteriorated in the physical, chemical, and biological aspects. The TP, IMN, and BOD5 changed more evidently than DO and NH3–N. (4) Water quality grade and indicator concentration deteriorated significantly and sharply under severe drought and are threatened deeply by TP and TN. The synthesized methodology is scientifically constructed and can be employed to characterize the response of water quality and water level to severe drought, both in this study and beyond it.
The intervention time and various regulating measures for pollution degradation and water quality recovery can be constructed based on the multi-dimensional analysis of water quality change under drought evolution.
... Soil characteristics are another vital aspect, influencing infiltration, water capacity, and rooting depth, which in turn affect subsurface water flow, discharge rates, and soil moisture in mountain regions (Bennie et al., 2008; Moeslund et al., 2013; Strachan & Daly, 2017). Baseflow, shaped by region-specific natural factors, also exerts a direct and indirect influence on drought dynamics in these areas (Konapala & Mishra, 2020). The slow movement of groundwater in mountainous areas results in extended baseflow periods, subsequently impacting soil moisture. ...
Article
Full-text available
Droughts are among the most devastating natural hazards, occurring in all regions with different climate conditions. The impacts of droughts result in significant damages annually around the world. While drought is generally described as a slow‐developing hazardous event, a rapidly developing type of drought, the so‐called flash drought, has been revealed by recent studies. The rapid onset and strong intensity of flash droughts require accurate real‐time monitoring. Addressing this issue, a Generative Adversarial Network (GAN) is developed in this study to monitor flash droughts over the Contiguous United States (CONUS). A GAN contains two models: (a) a discriminator and (b) a generator. The architecture developed in this study employs a Markovian discriminator, which emphasizes the spatial dependencies, with a modified U‐Net generator, tuned for optimal performance. To determine the best loss function for the generator, four different networks are developed with different loss functions, including Mean Absolute Error (MAE), adversarial loss, a combination of adversarial loss with Mean Square Error (MSE), and a combination of adversarial loss with MAE. Utilizing daily datasets collected from NLDAS‐2 and Standardized Soil Moisture Index (SSI) maps, the network is trained for real‐time daily SSI monitoring. Comparative assessments reveal the proposed GAN's superior ability to replicate SSI values over U‐Net and Naïve models. Evaluation metrics further underscore that the developed GAN successfully identifies both fine‐ and coarse‐scale spatial drought patterns and abrupt changes in the SSI temporal patterns, which are important for flash drought identification.
Article
Drought mechanisms vary markedly within different ecogeographical regions. Existing drought indices do not reflect the impact of climate anomalies on drought. In this study, the climate anomaly index was incorporated into the drought model based on the study of the effects of the El Niño-Southern Oscillation (ENSO) and Madden-Julian Oscillation (MJO) on drought in different eco-geographic zones of China. The model uses climate anomaly indices, meteorological drought indices, vegetation growth condition data, surface temperature, and biophysical attributes as characteristic variables and the Palmer Drought Severity Index (PDSI) as the dependent variable for model construction based on the Random Forest (RF) method. The results showed that the model has high accuracy for drought monitoring. The correlation coefficients between model results and observed drought condition values for all four seasons were above 0.95. The model was applied to drought monitoring in North China and the Huang-Huai-Hai region from 2006 to 2018. The statistical interpolation results of the meteorological drought indices and precipitation data were used to verify the application effect of the model. It was found that the model can accurately monitor drought caused by precipitation scarcity and reflect local variations in drought. This study provides a new model (Climate Anomaly Considering Integrated Surface Drought Index, CAC-ISDI) for drought monitoring that aims at quantitative and detailed monitoring of drought conditions and regional differences. It provides a robust method for accurate drought monitoring and evaluation in China and the rest of the world.
Article
Full-text available
A comprehensive understanding of water quality is essential for assessing the complex relationship between surface water and sources of pollution. Primarily, surface water pollution is linked to human and animal waste discharges. This study aimed to investigate the physico-chemical characteristics of drinking water under both dry and wet conditions, assess the extent of bacterial contamination in samples collected from various locations in District Shangla, and evaluate potential health risks associated with consuming contaminated water within local communities. For this purpose, 120 groundwater and surface water samples were randomly collected from various sources such as storage tanks, user sites, streams, ponds and rivers in the study area. The results revealed that in Bisham, lakes had the highest fecal coliform levels among seven tested sources, followed by protected wells, reservoirs, downstream sources, springs, rivers, and ditches; while in Alpuri, nearly 80% of samples from five sources contained fecal coliform bacteria. Similarly, it was observed that the turbidity level, total dissolved solids, electrical conductivity, biological oxygen demand, and dissolved oxygen in the surface drinking water sources of Bisham were significantly higher than those in the surface drinking water sources of Alpuri. Furthermore, the results showed that in the Alpuri region, 14% of the population suffers from dysentery, 27% from diarrhea, 22% from cholera, 13% from hepatitis A, and 16% and 8% from typhoid and kidney problems, respectively, while in the Bisham area, 24% of residents are affected by diarrhea, 17% by cholera and typhoid, 15% by hepatitis A, 14% by dysentery, and 13% by kidney problems. These findings underscore the urgent need for improved water quality management practices and public health interventions to mitigate the risks associated with contaminated drinking water. 
It is recommended to implement regular water quality monitoring programs, enhance sanitation infrastructure, and raise awareness among local communities about the importance of safe drinking water practices to safeguard public health.
Preprint
Full-text available
Machine learning (ML) is increasingly considered the solution to environmental problems where only limited or no physico-chemical process understanding is available. But when there is a need to provide support for high-stakes decisions, where the ability to explain possible solutions is key to their acceptability and legitimacy, ML can fall short. Here, we develop a method, rooted in formal sensitivity analysis (SA), that can detect the primary controls on the outputs of ML models. Unlike many common methods for explainable artificial intelligence (XAI), this method can account for complex multi-variate distributional properties of the input-output data, commonly observed with environmental systems. We apply this approach to a suite of ML models that are developed to predict various water quality variables in a pilot-scale experimental pit lake. A critical finding is that subtle alterations in the design of an ML model (such as variations in random seed for initialization, functional class, hyperparameters, or data splitting) can lead to entirely different representational interpretations of the dependence of the outputs on explanatory inputs. Further, models based on different ML families (decision trees, connectionists, or kernels) seem to focus on different aspects of the information provided by data, although displaying similar levels of predictive power. Overall, this underscores the importance of employing ensembles of ML models when explanatory power is sought. Not doing so may compromise the ability of the analysis to deliver robust and reliable predictions, especially when generalizing to conditions beyond the training data.
Article
Full-text available
The process of burial and exhumation of bedload particles within a certain depth of the riverbed leads to vertical exchange of particles, which significantly affects the characteristics of streamwise bedload transport. In this paper, we revisit the classic active layer formulation and extend it by incorporating the burial and exhumation through conceptualizing the fluctuations of bed surface as the relative vertical movement of buried tracer particles in the substrate layer (i.e. we change the static reference system to the fluctuating riverbed surface). We theoretically demonstrate, for the first time, the emergence of the transient anomalous (both super- and sub-) diffusion and power-law advective slowdown at the intermediate timescales, which are induced by the non-equilibrium transport as characterized by the inhomogeneous vertical mixing of tracers due to particle burial and exhumation. Neglecting the ballistic regime at extremely short times, the transport regimes show normal diffusion at both small and large timescales. This result further implies that for the most typical fluvial riverbed with finite vertical exchange depth (i.e. non-aggrading or -degrading bed), the sub-diffusion of bedload tracers for large timescale transport may still be transient, and will eventually converge to normal diffusion as time increases. Comparing the obtained analytical solutions with available numerical results as well as field observations, we show that the proposed formulation can capture well the anomalous diffusion and the power-law slowdown of the advective velocity of bedload tracers at intermediate timescales, and more importantly the transition from anomalous to normal diffusion at large timescales.
Article
Full-text available
The management of water resources needs robust methods to efficiently reduce nitrate loads. Knowledge on where natural denitrification takes place in the subsurface is thereby essential. Nitrate is naturally reduced in anoxic environments and high-resolution information of the redox interface, that is, the depth of the uppermost reduced zone, is crucial to understand the variability of the denitrification potential. In this study we explore the opportunity to use random forest (RF) regression to model redox depth across Denmark at 100-m resolution based on ~13,000 boreholes as training data. We highlight the importance of expert knowledge to guide the RF model in areas where our conceptual understanding is not represented correctly in the training data set by addition of artificial observations. We apply random forest regression kriging in which sequential Gaussian simulation models the RF residuals. The RF model reaches an R² score of 0.48 for an independent validation test. Including sequential Gaussian simulation honors observations through local conditioning, and the spread of 800 realizations can be utilized to map uncertainty. Emphasis is put on adequate handling of nonstationarities in variance and spatial correlation of the RF residuals. The RF residuals show no spatial correlation for large parts of the modeling domain, and a local variance scaling method is applied to account for the nonstationary variance. Moreover, we present and exemplify a framework where newly acquired field data can easily be integrated into random forest regression kriging to quickly update local models.
Article
Full-text available
Knowledge of the statistical distributions of particle hop properties (distances, travel, and rest times) enables a deeper understanding of bed load sediment transport. However, the measurement of particle hops is prone to censorship: Since many hops cross the boundaries of a spatial‐temporal observation window, one knows that they exist but does not know how long they are. An option is to build particle hop samples considering only the hops that are completely observed and excluding (censoring) those observed only partially. Such a choice, however, biases the frequency distributions of the hop properties. Moreover, censorship acts in both space and time, and a hop censored in time will also not contribute to a sample of hop lengths, and vice versa. Time censorship similarly applies to particle rest times. This paper presents a theoretical formulation of censorship that leads to nonparametric bias corrections recovering estimates of values of the underlying distributions of hop distance, travel time, and rest time up to sampling window dimensions. We illustrate the occurrence and consequences of experimental censorship, and the benefit of applying the bias corrections, for both synthetic and laboratory samples of particle hops. The corrections reasonably recover the relative proportions of frequency distributions represented by the data up to the sampling dimensions and improve the estimates of the first two moments of particle hop properties. Recommendations are given regarding how the size of an observation window may be chosen to reduce the bias to below some prescribed value, if the forms of the underlying distributions are known.
Article
Full-text available
The findings of hydrological modeling studies depend on which model was used. Although hydrological model selection is a crucial step, experience suggests that hydrologists tend to stick to the model they have experience with, and rarely switch to competing models, although these models might be more adequate given the study objectives. To gain quantitative insights into model selection, we explored the use of seven rainfall-runoff models based on the abstracts of 1,529 peer-reviewed papers published between 1991 and 2018. The models selected were the Hydrologiska Byråns Vattenbalansavdelning model (HBV), the Variable Infiltration Capacity model (VIC), the mesoscale Hydrological model (mHM), the TOPography-based hydrologic model (TOPMODEL), the Precipitation Runoff Modelling System (PRMS), the Génie Rural model à 4 paramètres Journaliers (GR4J), and the Sacramento soil moisture accounting model. We provide quantitative evidence of regional preferences in model use across the world and demonstrate that specific models are consistently preferred by certain institutes. Model attachment is particularly strong. In ~74% of the studies, the model selected can be predicted solely based on the affiliation of the first author. The influence of adequacy on the model selection process is less clear. Our data reveal that each model is used across a wide range of purposes, landscapes, and temporal and spatial scales (i.e., as a model of everything and everywhere). Model intercomparisons can provide guidance for model selection and improve model adequacy, but they are still rare (because each model must usually be set up individually) and the insights they provide are currently limited (because they are rarely controlled experiments). 
We suggest that moving from fixed-structure models to modular modeling frameworks (master templates for model generation) can overcome these issues, enable a more collaborative and responsive model development environment, and result in improved model adequacy.
Article
Full-text available
Hydrological signatures are now used for a wide range of purposes, including catchment classification, process exploration and hydrological model calibration. The recent boost in the popularity and number of signatures has however not been accompanied by the development of clear guidance on signature selection, meaning that signature selection is often arbitrary. Here we use three complementary approaches to compare and rank 15 commonly-used signatures, which we evaluate in 671 US catchments from the CAMELS data set (Catchment Attributes and MEteorology for Large-sample Studies). Firstly, we employ machine learning (random forests) to explore how attributes characterizing the climatic conditions, topography, land cover, soil and geology influence (or not) the signatures. Secondly, we use a conceptual hydrological model (Sacramento) to critically assess which signatures are well captured by the simulations. Thirdly, we take advantage of the large sample of CAMELS catchments to characterize the spatial smoothness (using Moran's I) of the signature field. These three approaches lead to remarkably similar rankings of the signatures. We show that signatures with the noisiest spatial pattern tend to be poorly captured by hydrological simulations, that their relationships to catchment attributes are elusive (in particular they are not correlated to climatic indices like aridity) and that they are particularly sensitive to discharge uncertainties. We question the utility and reliability of those signatures in experimental and modeling hydrological studies, and we underscore the general importance of accounting for uncertainties in hydrological signatures.
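As a rough illustration of the spatial-smoothness measure named in this abstract, the sketch below computes the standard global Moran's I for a set of values and a spatial weight matrix. The chain-neighbour weights and the toy values are assumptions made only for the example; the abstract does not say how the study defined its weights.

```python
from typing import List, Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Global Moran's I: (n / W) * sum_ij w_ij * d_i * d_j / sum_i d_i^2,
    where d_i is the deviation of value i from the mean and W is the sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    variance_sum = sum(d * d for d in dev)
    if total_weight == 0 or variance_sum == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / total_weight) * (cross / variance_sum)


def chain_weights(n: int) -> List[List[float]]:
    """Hypothetical weights: sites lie on a line and only adjacent sites are neighbours."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


# A smoothly varying field gives a positive index; an alternating field gives a negative one.
smooth = morans_i([1.0, 2.0, 3.0, 4.0], chain_weights(4))
noisy = morans_i([1.0, -1.0, 1.0, -1.0], chain_weights(4))
print(smooth, noisy)
```

Values near zero or below indicate a spatially noisy field, which is the property the abstract associates with poorly captured signatures.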
Article
Full-text available
The calibration of hydrological models without streamflow observations is problematic, and the simultaneous, combined use of remotely sensed products for this purpose has not been exhaustively tested thus far. Our hypothesis is that the combined use of products can 1) reduce the parameter search space and 2) improve the representation of internal model dynamics and hydrological signatures. Five different conceptual hydrological models were applied to 27 catchments across Europe. A parameter selection process, similar to a likelihood weighting procedure, was applied for 1023 possible combinations of ten different data sources, ranging from using 1 to all 10 of these products. Distances between the two empirical distributions of model performance metrics with and without using a specific product, were determined to assess the added value of a specific product. In a similar way, the performance of the models to reproduce 27 hydrological signatures was evaluated relative to the unconstrained model. Significant reductions in the parameter space were obtained when combinations included AMSR‐E and ASCAT soil moisture, GRACE total water storage anomalies, as well as, in snow dominated catchments, the MODIS snow cover products. The evaporation products of LSA‐SAF and MOD16 were less effective for deriving meaningful, well constrained posterior parameter distributions. The hydrological signature analysis indicated that most models profited from constraining with an increasing number of data sources. Concluding, constraining models with multiple data sources simultaneously was shown to be valuable for at least four of the five hydrological models to determine model parameters in absence of streamflow.
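The figure of 1023 combinations follows from counting every non-empty subset of ten data sources, 2^10 − 1. A minimal sketch that enumerates them, using placeholder product names that are not taken from the study:

```python
from itertools import combinations

# Hypothetical placeholder names for the ten data sources.
sources = [f"product_{k}" for k in range(1, 11)]

# Every combination that uses between 1 and all 10 of the sources.
subsets = [
    combo
    for size in range(1, len(sources) + 1)
    for combo in combinations(sources, size)
]

# The number of non-empty subsets of 10 items is 2**10 - 1.
print(len(subsets), 2 ** len(sources) - 1)
```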
Article
Full-text available
Random forest and similar Machine Learning techniques are already used to generate spatial predictions, but spatial location of points (geography) is often ignored in the modeling process. Spatial auto-correlation, especially if still existent in the cross-validation residuals, indicates that the predictions may be biased, and this is suboptimal. This paper presents a random forest for spatial predictions framework (RFsp) where buffer distances from observation points are used as explanatory variables, thus incorporating geographical proximity effects into the prediction process. The RFsp framework is illustrated with examples that use textbook datasets and apply spatial and spatio-temporal prediction to numeric, binary, categorical, multivariate and spatiotemporal variables. Performance of the RFsp framework is compared with the state-of-the-art kriging techniques using fivefold cross-validation with refitting. The results show that RFsp can obtain equally accurate and unbiased predictions as different versions of kriging. Advantages of using RFsp over kriging are that it needs no rigid statistical assumptions about the distribution and stationarity of the target variable, it is more flexible towards incorporating, combining and extending covariates of different types, and it possibly yields more informative maps characterizing the prediction error. RFsp appears to be especially attractive for building multivariate spatial prediction models that can be used as “knowledge engines” in various geoscience fields. Some disadvantages of RFsp are the exponentially growing computational intensity with increase of calibration data and covariates and the high sensitivity of predictions to input data quality. 
The key to the success of the RFsp framework might be the training data quality, especially quality of spatial sampling (to minimize extrapolation problems and any type of bias in data), and quality of model validation (to ensure that accuracy is not affected by overfitting). For many data sets, especially those with lower number of points and covariates and close-to-linear relationships, model-based geostatistics can still lead to more accurate predictions than RFsp.
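The core idea described in this abstract, using distances to observation points as explanatory variables, can be sketched as a feature-construction step. The example below assumes planar coordinates and Euclidean distance; the function name and the toy points are hypothetical, and the random forest fitting itself is left to whichever library is used.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def distance_features(locations: Sequence[Point],
                      observation_points: Sequence[Point]) -> List[List[float]]:
    """One row per location, one column per observation point:
    the Euclidean distance from the location to that observation point."""
    return [[math.dist(loc, obs) for obs in observation_points] for loc in locations]


# Two observation points plus one new location where a prediction is wanted.
obs_points = [(0.0, 0.0), (3.0, 4.0)]
new_location = (3.0, 0.0)

# Rows 0 and 1 would train the model; row 2 is the covariate vector for prediction.
features = distance_features(obs_points + [new_location], obs_points)
print(features)
```

Each column acts as a covariate alongside any other predictors, so the number of distance columns grows with the number of observation points, which is consistent with the computational cost noted in the abstract.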
Conference Paper
Full-text available
Interpretability is often a major concern in machine learning. Although many authors agree with this statement, interpretability is often tackled with intuitive arguments, distinct (yet related) terms and heuristic quantifications. This short survey aims to clarify the concepts related to interpretability and emphasises the distinction between interpreting models and representations, as well as heuristic-based and user-based approaches.