Quantifying Climate and Catchment Control on
Hydrological Drought in the Continental
United States
Goutam Konapala¹,² and Ashok Mishra¹
¹ Glenn Department of Civil Engineering, Clemson University, Clemson, SC, USA
² Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
Abstract The evolution of hydrological drought events is a result of complex (nonlinear) interactions
between climate and catchment processes. To investigate such nonlinear relationships, we integrated a
machine learning modeling framework based on the random forest (RF) algorithm with an interpretation
framework to quantify the role of climate and catchment controls on hydrological drought. More
specifically, our framework interprets a fitted RF machine‐learning model to identify dominant variables
and visualize their functional dependence and interaction effects on hydrological drought characteristics
using the concepts of minimal depth, interactive depth, and partial dependence. We tested the proposed
modeling framework on a set of 652 continental United States catchments with minimal human
interference for the period 1979–2010. Application of this framework indicated the presence of three distinct
drought regimes: Regime 1, with longer, less frequent, and less intense droughts; Regime 2, with droughts of
moderate duration, frequency, and intensity; and Regime 3, with shorter, more frequent, and more intense
droughts. The RF algorithm was able to accurately model the drought characteristics (intensity, duration,
and number of events) for all three drought regimes as a function of the selected variables. Both the type of
dominant variables and their nonlinear functional relationships with hydrological drought characteristics
can vary among the three regimes. Our interpretation framework indicated that catchment characteristics have a
significant role in controlling hydrological drought for catchments in Regime 1, whereas both climate and
catchment characteristics control hydrological drought in Regimes 2 and 3.
1. Introduction
Hydrologic drought events, defined as periods with inadequate surface and subsurface water resources, are a
result of multifaceted interactions between climate and catchment processes (Mishra & Singh, 2010; Van
Lanen et al., 2013; Van Loon et al., 2014; Wang et al., 2011). Therefore, hydrologic drought depends not only
on a decrease in precipitation or an increase in temperature but also on the interaction of
various climate and terrestrial components (e.g., soil characteristics, elevation, and stream order). An
inadequate understanding of this complexity can be a major challenge for accurate prediction as well as
efficient drought management (Cayan et al., 2010; Mishra & Singh, 2011; Narasimhan & Srinivasan, 2005;
Sheffield et al., 2012). To address these complex hydrological drought processes, many studies have
investigated the potential influence of terrestrial catchment characteristics on hydrological droughts by utilizing
physically based models (Apurv et al., 2017; Tallaksen et al., 2009; Van Loon et al., 2014; Van Loon &
Laaha, 2015; Van Loon & Van Lanen, 2012). However, the application of physically based models to
catchments is often plagued by differences in spatial scale, over/underparameterization, and model
structural error, including model calibration uncertainties.
A few studies have utilized a linear regression‐based framework (Saft et al., 2015, 2016; Van Loon et al., 2014;
Van Loon & Laaha, 2015; Van Loon & Van Lanen, 2012) to understand the role of climate and terrestrial
components in the development of hydrological drought. On the other hand, many studies have suggested that the
response of streamflow to meteorological conditions is predominantly nonlinear (Konapala &
Mishra, 2016; Latt et al., 2015; Stahl et al., 2008). Therefore, we expect hydrological drought characteristics
derived from streamflow to have a nonlinear dependence arising from the complex interaction
between climate and catchment processes within a watershed. In addition, the evolution of
hydrological drought is often clustered among neighboring catchments due to the similarity in climate and
catchment characteristics (Rajsekhar et al., 2014; Zhang et al., 2012). Hence, it is important to gain a deeper
understanding of the dominant linear and nonlinear controls resulting in distinct drought regimes using
robust nonparametric techniques. There is therefore great potential to further quantify the nonlinear
association between climate (catchment) variables and the evolution of clustered drought events using
nonparametric techniques, machine learning algorithms, and an interpretive framework.
RESEARCH ARTICLE
10.1029/2018WR024620
Special Section: Big Data & Machine Learning in Water Sciences: Recent Progress and Their Use in Advancing Science
Key Points:
• An integrated random forest algorithm interpretation framework was applied to investigate hydrological drought characteristics in CONUS
• This framework indicated the presence of three drought regimes, which witness dominant climate and catchment controls
• The dominant climate and catchment controls exhibit varied functional relationships with hydrological droughts
Supporting Information:
• Supporting Information S1
• Table S1
Correspondence to: A. Mishra, ashokm@g.clemson.edu
Citation: Konapala, G., & Mishra, A. (2020). Quantifying climate and catchment control on hydrological drought in the continental United States. Water Resources Research, 56, e2018WR024620. https://doi.org/10.1029/2018WR024620
Received 18 DEC 2018
Accepted 27 NOV 2019
Accepted article online 11 DEC 2019
©2019. American Geophysical Union. All Rights Reserved.
Machine learning algorithms are a class of nonparametric techniques that can capture subtle
functional relationships between the input (e.g., precipitation, evaporation, and base flow) and output
variables (e.g., streamflow) of a hydrologic system (e.g., a watershed), even if the underlying mechanism
producing the data is not known (Elshorbagy et al., 2010a, 2010b; Nourani et al., 2014; Raghavendra & Deka, 2014).
In addition, these methods make no distributional or functional assumptions about the relation between the
covariates and the response. Hence, the majority of machine learning studies in hydrology have focused on
prediction (Shen, 2018). However, the formulation of machine learning
algorithms does not make it straightforward to quantify the underlying mechanisms responsible for model behavior
in hydrologic processes (Gupta & Nearing, 2014; Karpatne et al., 2017). Recognizing this limitation,
several recent studies have introduced interpretation frameworks (see
Guidotti et al., 2018, for a review). Work on interpreting these black‐box models
has focused on understanding how a fixed machine‐learning model leads to particular predictions. These
interpretation frameworks can provide a deeper understanding of the functioning of machine learning
models such as artificial neural networks, random forests (RF), and support vector machines (Bastani et al.,
2017; Bibal & Frénay, 2016; Doshi‐Velez & Kim, 2017). Although machine learning approaches are
widely used in hydroclimatology (Veettil et al., 2018; Fahimi et al., 2017; Shortridge et al., 2016;
Raghavendra & Deka, 2014), interpretation frameworks that quantify the causal relationship between inputs
and modeled outputs are only emerging (Fienen et al., 2018; Koch et al., 2019; Schwalm et al., 2017) and, in
particular, have not been applied to extreme events.
The above discussion suggests that limited research has been conducted on the dominant nonlinear
influence of climate and catchment characteristics on the evolution of clustered hydrological drought
regimes. Therefore, we followed a two‐step approach to improve our understanding of the heterogeneous nature
of drought characteristics over the continental United States (CONUS): (i) a classification algorithm was
applied to identify the optimal number of clusters associated with drought regimes, and (ii) an interpretative
modeling framework was applied within individual drought regimes to identify key climate and catchment
characteristics that potentially influence the hydrological drought characteristics. For this purpose, we selected
652 watersheds located in the CONUS because of the availability of abundant hydrologic, physical, soil, and
geomorphic information for catchments with the least human interference from the Geospatial Attributes of
Gages for Evaluating Streamflow Version 2 (GAGES II) database (Falcone, 2011). Overall, we aim to address the
following questions:
1. How are the hydrological drought regimes clustered in the CONUS? What are the key climate and
catchment characteristics that control hydrological drought regimes?
2. What are the functional relationships and interactions among the dominant variables influencing
the hydrological drought characteristics, as extracted by interpretive machine learning techniques (i.e.,
minimal depth, interactive depth, and partial dependence plots)?
The remainder of the manuscript is organized as follows: Section 2 provides an overview of the data and study area,
section 3 presents the methods designed for this study, section 4 presents the results, section 5 discusses the
findings and the outlook, and finally, the manuscript is concluded with section 6.
2. Data and Study Area Description
We selected the catchments located in CONUS due to the availability of extensive and open source data
associated with various characteristics of catchments. In addition, to understand the dominant variables
associated with different drought regimes, it is important to utilize data from catchments with minimal
human interference. Therefore, we first identified catchments with minimal human interference based on
the GAGES II database (Falcone, 2011), which provides geospatial data for 9,322 stream gages maintained
by USGS. This data set serves the purpose of providing users with a comprehensive set of geospatial charac-
teristics for many gaged catchments with long flow record. In addition to that, it also provides information
on catchments which are least disturbed by human influences. In this database, 2,057 catchments are iden-
tified to have minimal human interference based on three criteria: (1) a quantitative index of anthropogenic
modification within the catchment based on Geographical Information system derived variables, (2) visual
inspection of every stream gage and drainage basin from recent high‐resolution imagery and topographic
maps, and (3) information about human influences from USGS Annual Water Data Reports (Falcone,
2011). We selected water years 1980–2011, representing the U.S. climate normal period, as our study
period to reflect current climate conditions. Overall, we identified 652 catchments with no missing data
during 1980–2011. The spatial locations of the catchments with minimal human interference and
continuous streamflow data are shown in Figure 1.
2.1. Overview of Selected Climate and Catchment Variables
A lack of precipitation and increase in evapotranspiration (i.e., meteorological drought) causes low soil
moisture content (i.e., agricultural drought), which further reduces surface and subsurface water resources
(i.e., hydrological drought) (Mishra & Singh, 2010; Mukherjee et al., 2018). The propagation of meteorologi-
cal to hydrological drought is influenced by interaction between climate and catchment variables (Apurv
et al., 2017; Haslinger et al., 2014; Mishra & Singh, 2010; Tallaksen et al., 2009; Van Loon et al., 2014).
The hydrological drought is directly related to the streamflow generated in a watershed, and it is influenced
(controlled) by climate and catchment characteristics of the selected watershed. In our analysis, we selected
60 variables related to climate, catchment, and morphological aspects of catchments documented by the
GAGES II data set (Table S1 in the supporting information). Among them, 12 climate variables describe
the annual magnitude and intraannual variability of precipitation, temperature, and potential
evapotranspiration; these data are obtained from the high‐resolution PRISM database (Daly
et al., 2000). Fifteen hydrologic catchment variables related to stream order, base‐flow index, and overland
flow are derived from the U.S. National Hydrography Data Set (NHD). Four land cover variables describing
the percentage of different land cover types are derived from the 2006 land cover product of the
National Land Cover Database. Twenty‐three soil characteristics are derived from the State Soil Geographic
database for the CONUS. Finally, six topographic variables related to the elevation, slope, and aspect
of the catchments are included in the analysis. A brief description and the data sources of the selected
variables are provided in Table S1. The interplay among these catchment characteristics is assumed to
shape catchment behavior by influencing how catchments store and transfer water. The variables selected
and provided in this database are considered to significantly affect hydrologic processes. Some of these
catchment attributes have previously been used for predicting mean streamflow (Rice et al., 2015), other
streamflow signatures (Addor et al., 2018), and drought (Stoelzle et al., 2014). In addition to these previously
used variables, we selected multiple attributes to cover a wide range of features, such as the catchment
climate, hydrology, land cover, soil, geology, topography, and river network.
3. Methodology
3.1. Hydrological Drought Characterization
Hydrological drought is often expressed as a time period with inadequate surface and subsurface water resources
with respect to the normal condition of a given water resources management system (Mishra & Singh, 2010).
Therefore, we applied the concept of Standardized Streamflow Index (SSI) (Shukla & Wood, 2008; Vicente‐
Serrano et al., 2011) to characterize hydrological drought at monthly time scale for selected watersheds
across USA. SSI can be computed for multiple timescales and is flexible to determine the drought conditions
at seasonal (3 to 6 months), annual (12 months), and longer (>12‐month SSI) time scales. However, in this
study, we restrict our analysis to the seasonal scale, as droughts usually take 3 or more months to develop.
Therefore, we calculated the 3‐month SSI by aggregating streamflow over 3 months and fitting these
accumulated values to a parametric statistical distribution. The probabilities from these fitted distributions are
then transformed to the standard normal distribution to create the hydrological drought index (Vicente‐Serrano
et al., 2011; Modarres, 2008; Shukla & Wood, 2008). Therefore, SSI determines the conditions of
streamflow drought relative to the long‐term monthly streamflow. Positive SSI values indicate a surplus
relative to the long‐term streamflow conditions whereas the negative values indicate a deficit (i.e., hydrolo-
gic drought). Hydrological drought indices similar to SSI have been previously applied to understand the
U.S. hydrological drought characteristics (Shukla & Wood, 2008; Veettil et al., 2018). Shukla and Wood
(2008) reported that the two‐parameter gamma and lognormal distributions generally performed well for
deriving hydrological drought in USA. In this study, lognormal distribution was selected for deriving
hydrological drought index based on SSI by using streamflow time series. The formulation of SSI is
presented in the supporting information Text S1.
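As a minimal sketch of the 3‐month SSI calculation described above, the following Python function aggregates monthly streamflow over a rolling 3‐month window, fits a lognormal distribution to the accumulated values, and maps them to standard normal scores. The function name `ssi_lognormal`, the choice to fit a separate distribution for each calendar month, and the moment-based fit on log-transformed values are assumptions made for illustration only; the formulation actually used is the one in supporting information Text S1.

```python
import math
from statistics import fmean, stdev


def ssi_lognormal(monthly_flow, window=3):
    """Illustrative SSI: rolling sum, lognormal fit per calendar month, z score.

    monthly_flow: positive monthly streamflow values in chronological order.
    Returns a list of the same length; entries that cannot be computed are None.
    """
    n = len(monthly_flow)

    # Accumulate streamflow over the trailing `window` months.
    agg = [None] * n
    for i in range(window - 1, n):
        agg[i] = sum(monthly_flow[i - window + 1:i + 1])

    ssi = [None] * n
    for month in range(12):  # assumption: one fitted distribution per calendar month
        idx = [i for i in range(n) if i % 12 == month and agg[i] is not None]
        if len(idx) < 2:
            continue
        logs = [math.log(agg[i]) for i in idx]
        mu, sigma = fmean(logs), stdev(logs)
        if sigma == 0:
            continue
        # For a lognormal fit, converting the fitted probability to a standard
        # normal quantile reduces to standardizing the log-transformed value.
        for i, log_value in zip(idx, logs):
            ssi[i] = (log_value - mu) / sigma
    return ssi
```

Negative values returned by this sketch correspond to a streamflow deficit relative to the long‐term conditions for that calendar month, and positive values to a surplus.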
3.2. Classification of Hydrological Drought Regimes
The SSI time series were constructed for all 652 catchments to investigate the hydrological droughts. In
this study, we defined a hydrological drought event as a period of more than 3 months with SSI < −0.5. By
extracting drought events based on these two conditions, we can differentiate hydrological droughts from
seasonal streamflow fluctuations (Konapala & Mishra, 2017; Mishra & Singh, 2010, 2011). Once all the
drought events meeting the above conditions were extracted, we computed the average drought duration,
intensity, and number of events based on the well‐established theory of runs (Yevjevich, 1967). The number
of drought events (ND) is the total number of times a drought occurred based on the above‐defined
thresholds. The average duration of drought per event (DD) is determined by dividing the total number of
drought months by the number of drought events, as shown in equation (1)
$$DD = \frac{\sum_{i=1}^{ND} D_i}{ND} \tag{1}$$
where $D_i$ is the duration of a single drought event. Similarly, the average drought intensity is calculated by first
estimating the intensity (S) of each drought event as
$$S = \frac{\sum_{\substack{D>3 \\ SRI<0}} SSI}{D} \tag{2}$$
Its average over the period is calculated based on equation (3),
$$DI = \frac{\sum_{i=1}^{ND} S_i}{ND} \tag{3}$$
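The run-based extraction of drought events and equations (1) to (3) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the function name `drought_stats` is hypothetical, the threshold (SSI < −0.5) and the minimum length (more than 3 months) follow the definition given in the text, and the event intensity is reported as a magnitude (absolute value of the mean SSI over the event), which is an assumption.

```python
def drought_stats(ssi, threshold=-0.5, min_months=3):
    """Return ND, DD, and DI for one SSI series using the theory of runs.

    An event is a run of consecutive months with SSI < threshold that lasts
    more than `min_months` months. None entries break a run.
    """
    events, run = [], []
    for value in list(ssi) + [None]:  # trailing sentinel closes the last run
        if value is not None and value < threshold:
            run.append(value)
        else:
            if len(run) > min_months:
                events.append(run)
            run = []

    nd = len(events)
    if nd == 0:
        return {"ND": 0, "DD": 0.0, "DI": 0.0}

    durations = [len(event) for event in events]                  # D_i
    # Assumption: intensity is expressed as a positive magnitude.
    intensities = [abs(sum(event)) / len(event) for event in events]  # S_i
    return {
        "ND": nd,
        "DD": sum(durations) / nd,    # equation (1)
        "DI": sum(intensities) / nd,  # equation (3)
    }
```

Applied to each catchment's SSI series, the three returned values form the trivariate description of drought characteristics used for clustering.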
By applying multivariate clustering, we can distinguish the hydrological drought regimes exhibited by
catchments based on the long‐term (~30 year) statistics of drought
characteristics (Rajsekhar et al., 2014; Gocic & Trajkovic, 2014; Yoo et al., 2012). The clustering‐based
approaches typically used in hydrological studies are based on hierarchical clustering, k‐means/medoids
clustering, and fuzzy partition clustering (Carrillo et al., 2011; Ley et al., 2011; Olden et al., 2012; Sawicz
et al., 2011; Yadav et al., 2007). Since we aim to investigate dominant controls of catchment characteristics
defined by drought regimes, we applied a fuzzy partitioning algorithm that accounts for uncertainty in the
classification process.
Figure 1. Spatial locations of the catchments selected in this study based on the minimal human interference and
nonmissing data criteria.
We identified drought regimes based on the three drought characteristics (i.e., intensity, duration, and
number of events) using a fuzzy medoid clustering algorithm. Fuzzy clustering assigns membership values, so
each point is described more generally by its degree of membership in every cluster. The method
chosen for this study is the fuzzy k‐medoids clustering algorithm introduced by Krishnapuram et al. (2001),
which is usually more robust because the effect of outliers is significantly reduced compared with
clustering algorithms that use mean values for classification. Hence, data objects closer to the medoid
of a cluster, as determined by Euclidean distance, are likely to have higher degrees of membership than
objects scattered around the limits of clusters. Similar to other clustering algorithms, fuzzy k‐medoids
follows a heuristic approach to minimize the within‐cluster variance. The formulation of this approach is
presented in the supporting information Text S2.
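To make the idea of membership degrees concrete, the sketch below computes fuzzy memberships of each catchment for a fixed set of medoids. It assumes the commonly used membership update with squared Euclidean dissimilarity and a fuzzifier m; the function name `memberships` and the default m = 2 are illustrative assumptions, and the formulation used in this study is the one in Text S2.

```python
def memberships(points, medoids, m=2.0):
    """Fuzzy membership of every point in every cluster for given medoids.

    Assumed update rule: u_ig is proportional to (1 / d^2(x_i, m_g))^(1/(m-1)),
    normalized so that the memberships of each point sum to one.
    """
    U = []
    for x in points:
        d2 = [sum((a - b) ** 2 for a, b in zip(x, med)) for med in medoids]
        if any(d == 0.0 for d in d2):
            # The point coincides with a medoid: full membership there.
            row = [1.0 if d == 0.0 else 0.0 for d in d2]
        else:
            row = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in d2]
        total = sum(row)
        U.append([value / total for value in row])
    return U
```

In this sketch a catchment lying on a medoid receives a membership of one in that cluster, while a catchment between two medoids shares its membership in inverse proportion to its squared distance from each.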
The Xie and Beni (XB) index (Xie & Beni, 1991) is a widely used criterion for quantifying the quality of fuzzy
clustering. To complement the XB index, we included the fuzzy silhouette (FS) index, the fuzzy extension of the
silhouette index (Campello & Hruschka, 2006), as the second criterion to evaluate the optimal number of
clusters in this study. The two indices complement each other: the FS index captures the similarity of an
object to its own cluster (cohesion) compared with other clusters (separation), and the XB index captures the
compactness of the clusters. The higher the value of the FS index, the better the resulting clustering. The FS
index is given by equation (4)
$$FS(k) = \frac{\sum_{i=1}^{n} \left(u_{ig} - u_{ig'}\right) s_i(k)}{\sum_{i=1}^{n} \left(u_{ig} - u_{ig'}\right)} \tag{4}$$
where $s_i$ is the silhouette index for object i. Whereas, the XB index measures the compactness of the clusters
and it is especially formulated for evaluation of fuzzy clustering performance. This is given by equation (5) as
$$XB(k) = \frac{\sum_{i=1}^{n} \sum_{g=1}^{k} u_{ig}^{m}\, d^2(x_i, m_g)}{n \times \min_{g, g';\, g \neq g'} d^2(m_g, m_{g'})} \tag{5}$$
where
$X = \{x_{ij}\}$: drought characteristics of order $(n \times t)$
$U = \{u_{ig}\}$: membership degree of order $(k \times t)$
$d^2(x_i, m_g) = \lVert x_i - m_g \rVert$: Euclidean distance
$\{m_g,\ g = 1, \ldots, k\} \subset \{x_i,\ i = 1, \ldots, n\}$: medoids of the drought characteristics
The smaller the Xie‐Beni index, the more compact the clusters. Therefore, each catchment is assigned to a specific
class with a certain probability, and the catchments with highest probability are considered as primary clus-
ters for subsequent analysis. The resulting clusters based on these trivariate drought characteristics (inten-
sity, duration, and number of events) are a consequence of natural partitions identified by the clustering
algorithm. The drought characteristics in each cluster would indicate a distinct drought regime that can pro-
vide valuable information on the controls of climate and catchment characteristics on hydrological droughts.
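The two cluster validity criteria can be sketched as follows. This is a hedged illustration, not the implementation used in the study: the function names are hypothetical, d² is taken as the squared Euclidean distance, the crisp silhouette s_i is computed after assigning each object to its highest-membership cluster, the weight of each object is the difference between its largest and second-largest membership, and no exponent is applied to that weight because equation (4) shows none.

```python
import math


def xie_beni(points, medoids, U, m=2.0):
    """Equation (5): fuzzy within-cluster scatter over minimum medoid separation."""
    n, k = len(points), len(medoids)
    scatter = sum((U[i][g] ** m) * math.dist(points[i], medoids[g]) ** 2
                  for i in range(n) for g in range(k))
    separation = min(math.dist(medoids[g], medoids[h]) ** 2
                     for g in range(k) for h in range(k) if g != h)
    return scatter / (n * separation)


def fuzzy_silhouette(points, U):
    """Equation (4): membership-weighted average of crisp silhouette values."""
    n = len(points)
    labels = [max(range(len(row)), key=row.__getitem__) for row in U]
    clusters = sorted(set(labels))

    def mean_distance(i, cluster):
        d = [math.dist(points[i], points[j])
             for j in range(n) if j != i and labels[j] == cluster]
        return sum(d) / len(d) if d else None

    numerator = denominator = 0.0
    for i in range(n):
        a = mean_distance(i, labels[i])                      # cohesion
        others = [mean_distance(i, c) for c in clusters if c != labels[i]]
        others = [b for b in others if b is not None]
        if a is None or not others or max(a, min(others)) == 0:
            s = 0.0                                          # singleton or degenerate case
        else:
            b = min(others)                                  # separation
            s = (b - a) / max(a, b)
        first, second = sorted(U[i], reverse=True)[:2]
        numerator += (first - second) * s
        denominator += first - second
    return numerator / denominator
```

Evaluating both functions for a range of candidate cluster numbers and choosing the number that maximizes FS and minimizes XB mirrors the selection logic described in the text.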
3.3. RF Model
In this study, we utilize the RF algorithm (Breiman, 2001) to investigate the dominant catchment and climate
variables that play an important role in the evolution of clustered drought characteristics. It is important to
acknowledge that the selection of an algorithm depends on the objectives and the types of data to be analyzed
(Caruana & Niculescu‐Mizil, 2006; Huang et al., 2015; Kotsiantis et al., 2007). Unlike linear regression
methods, the nonlinear RF model has the major advantage of being (mostly) unaffected by
multicollinearity (Ishwaran et al., 2010; Zhang & Ma, 2012; Díaz‐Uriarte & De
Andres, 2006). The multicollinearity problem is alleviated because a random subset of features is considered
at each split of each tree in an RF (Hsilch et al., 2014; Ishwaran et al., 2010; Zhang & Ma, 2012; Díaz‐Uriarte & De
Andres, 2006). The ability of the RF algorithm to deal with overfitting also makes it suitable for
our application.
The RF algorithm uses a set of bootstrap samples (Efron & Tibshirani, 1994) and grows an independent tree
model on each bootstrapped sample of the population. Each tree is grown by recursively partitioning the
population with the objective of minimizing the mean square error. At each split, a subset of candidate
variables is tested for the split optimization, and each node is divided into two successor nodes. Each successor
node is then split again until the process reaches a stopping criterion of either maximum node purity or
minimum node member size, which defines the set of terminal (unsplit) nodes for the tree. The RF algorithm then
assigns each training set observation to one unique terminal node per tree. The RF estimate for each observation
is then calculated by averaging the terminal node results across the collection of trees. A basic
pseudo‐algorithm explaining the RF procedure is presented in Table 1 and Figure 2. The resampling and averaging
procedure mitigates the problems of overfitting and multicollinearity, making this approach suitable for
our study (Cutler et al., 2007; Díaz‐Uriarte & De Andres, 2006; Prasad et al., 2006; Zhang & Ma, 2012). The RF
algorithm can be tuned to reduce the prediction error (Boulesteix et al., 2012; Breiman, 2001; Strobl et al.,
2009). The accuracy of the RF output mainly depends on three parameters: (1) the number of trees
(ntrees) to grow in the forest, (2) the number of randomly selected predictor variables (mtry) at each node,
and (3) the minimal number of observations at the terminal nodes (nodesize) of the trees. We set the number
of trees (ntrees) to 1,000 as suggested by Hengl et al. (2018) and Probst and Boulesteix (2017), and we
randomly sampled different combinations of parameter sets with “mtry” ranging from one to the total number of
variables considered (60) and “nodesize” ranging from one to the total number of catchments in each
regime. The combination of “mtry” and “nodesize” with the least out‐of‐bag error is
considered the optimal parameter set.
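The random sampling of “mtry” and “nodesize” described above can be sketched as a simple random search. The function `random_search` and the callable `oob_error` are hypothetical names: `oob_error(mtry, nodesize)` stands in for fitting a 1,000‐tree forest with those settings and returning its out‐of‐bag error, which is not implemented here, and the number of draws and the seed are illustrative assumptions.

```python
import random


def random_search(oob_error, n_vars, n_catchments, n_draws=50, seed=0):
    """Randomly sample (mtry, nodesize) pairs and keep the lowest OOB error.

    oob_error: hypothetical callable returning the out-of-bag error of a
               forest fitted with the given mtry and nodesize.
    n_vars: total number of predictor variables (upper bound for mtry).
    n_catchments: number of catchments in the regime (upper bound for nodesize).
    Returns a tuple (best_error, best_mtry, best_nodesize).
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_draws):
        mtry = rng.randint(1, n_vars)            # inclusive bounds
        nodesize = rng.randint(1, n_catchments)  # inclusive bounds
        error = oob_error(mtry, nodesize)
        if best is None or error < best[0]:
            best = (error, mtry, nodesize)
    return best
```

For regime 1, for example, the search would be called with `n_vars=60` and `n_catchments=142`, once per drought characteristic being modeled.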
3.4. Framework for Interpreting RF Algorithm
We interpreted the RF model by examining three features: variable importance, variable
interaction, and partial dependence. Variable importance and interaction are based on the maximal subtree and
minimal depth concepts (Ishwaran et al., 2010), whereas partial dependence is estimated by integrating the
effects of all the variables besides the covariate of interest (Breiman, 2001). The concept of minimal depth
allows us to identify the dominant variables, whereas partial dependence quantifies the approximate
relationship between each dominant variable and the drought characteristic. The concept of interactive
depth allows us to understand the interactions among the dominant climate, catchment,
and morphological variables related to a particular drought characteristic.
3.4.1. Minimal Depth
The concept of minimal depth (Diller et al., 2012; Hsisch et al., 2011; Ishwaran et al., 2010) is useful for
assessing variable importance and variable interactions within an RF modeling framework. The
minimal depth of an RF can be formulated precisely in terms of a maximal subtree. The maximal subtree
for a variable v is the largest subtree whose root node is split on v. The shortest
distance from the root of the tree to the root of the closest maximal subtree of v is the minimal depth of v.
The minimal depth for any variable v can be expressed as
$$MD(v) = \frac{\sum_{i=1}^{N_{RF}} \lVert T(v) \rVert}{N_{RF}} \tag{6}$$
where $N_{RF}$ is the total number of trees (i.e., ntrees = 1,000) and ‖T(v)‖ represents the distance of variable v
from the root of any tree T. To illustrate the concepts of maximal subtree and minimal depth of variable v, we
show three separate trees (Figure 3) representing three randomized trees to mimic the behavior of RFs. In
this way, the depths of variable v for all the maximal subtrees are identified and averaged across all the
randomized trees to calculate the minimal depth MD(v). A smaller MD(v) value indicates that the corresponding
variable v is more influential. Variables with averaged minimal depth exceeding the average minimal
depth threshold are treated as noisy and therefore removed from the final model.
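Equation (6) can be illustrated on toy trees. In the sketch below, each tree is a nested dictionary with a `"var"` key naming the split variable and optional `"left"` and `"right"` children; this representation, the function names, and the choice to skip trees in which the variable never splits are assumptions made for illustration, not the data structures of any particular RF library.

```python
def minimal_depth(node, var, depth=0):
    """Depth of the split on `var` closest to the root, or None if absent."""
    if node is None:
        return None
    if node["var"] == var:
        return depth  # root of a maximal subtree for `var`
    found = [minimal_depth(child, var, depth + 1)
             for child in (node.get("left"), node.get("right"))]
    found = [d for d in found if d is not None]
    return min(found) if found else None


def forest_minimal_depth(trees, var):
    """Average minimal depth of `var` over the trees, as in equation (6).

    Assumption: trees in which `var` never splits are left out of the average.
    """
    depths = [minimal_depth(tree, var) for tree in trees]
    depths = [d for d in depths if d is not None]
    return sum(depths) / len(depths) if depths else None


# Three toy trees loosely mirroring the scenarios of Figure 3.
tree_a = {"var": "v"}
tree_b = {"var": "w", "left": {"var": "v"}}
tree_c = {"var": "w",
          "left": {"var": "u", "left": {"var": "v"}},
          "right": {"var": "v"}}
```

With these toy trees, the per-tree minimal depths of v are 0, 1, and 1, so a variable that tends to split near the root receives a small averaged value.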
Figure 2. An illustration of the random forest algorithm used to model the influence of dominant variables on hydrologic droughts for CONUS.
Figure 3. A pictorial illustration of the concepts of maximal subtrees, minimal depth, and interactive depth. The maximal subtrees are indicated in red, and the depth is indicated
by an integer located in the center of the tree, with the root node as zero. In the first tree (Figure 3a), the variable v splits the root node; therefore, the entire tree
can be considered a maximal subtree for the variable v. In the second tree (Figure 3b), the maximal subtree for variable v is not the entire tree,
because the variable v does not split the root node. Figure 3c presents another scenario with
two maximal subtrees for variable v. The maximal subtree on the left side has a depth of two, whereas the one on the right side has a depth of one.
3.4.2. Interactive Depth
Dominant controls identified by the concept of minimal depth can reveal the effect of each
independent variable on the drought characteristics; however, minimal depth ignores interaction effects with
other variables. For instance, drought propagation might be influenced by two or more interacting
variables in a specific regime. Therefore, an interactive minimal depth metric that measures the interactions
between any two variables v and w is needed (Diller et al., 2012; Hsisch et al., 2011; Ishwaran et al., 2010).
For this purpose, we first define the variable interactive distance ‖MT(v,w)‖, which represents the distance
between variables v and w from the root of any maximal tree (MT). Since the maximal tree depths vary
significantly across randomized trees (Figure 3), a standardization procedure needs to be applied. The
interactive depth can be formulated as
$$ID(v, w) = \frac{\sum_{i=1}^{N_{RF}} \left(1 - \frac{\lVert MT(v, w) \rVert}{MTD(v)}\right)}{N_{RF}} \tag{7}$$
where $N_{RF}$ is the total number of trees (i.e., ntrees = 1,000), ‖MT(v,w)‖ represents the distance between
variables v and w from the root of any maximal tree MT, and MTD(v) is the depth of maximal subtree
MT(v). Based on this formulation, ID(v,w) ranges between 0 and 1. Interactive depth
(ID) values closer to zero indicate higher interaction between the two considered variables. Figure 3c
illustrates these interactions between variables v and w, where the right maximal subtree of variable v and
w splits further inside the subtree. If this pattern is observed across all the randomized trees, then there is a
significant interaction between variables v and w, and they collectively influence a prediction outcome.
3.4.3. Partial Dependence
The concept of partial dependence can quantify the functional relationship between dominant variables and
drought characteristics. Partial dependence is assessed by integrating the effects of all the variables besides
the covariate of interest (Breiman, 2001; Friedman & Meulman, 2003). The partial dependence of a variable $x_k$
can be estimated by averaging over the input variables {$X_i$, i = 1, …, n} with fixed $x_k$ as
$$\tilde{f}_k(x_k) = \frac{1}{n} \sum_{i=1}^{n} \hat{f}\left(X_{i, C_k}, x_k\right) \tag{8}$$
where $\hat{f}$ represents the outputs based on the RF models. This partial dependence estimate can be visualized to
understand the functional relationship between the variables ($x_k$) and their potential influence on
hydrological droughts. As the RF algorithm randomly resamples the variables for bagging the trees, we ran each
model 1,000 times and then averaged the minimal and interactive depth values to interpret and identify
the dominant variables.
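Equation (8) amounts to fixing the variable of interest at a chosen value in every observation, predicting, and averaging. A minimal sketch follows; the function name `partial_dependence` and the generic `predict` callable are assumptions for illustration, with `predict` standing in for the fitted RF model.

```python
def partial_dependence(predict, X, k, grid):
    """Estimate the partial dependence of the model on variable k.

    predict: callable mapping one row of inputs to a prediction
             (a stand-in for the fitted RF model).
    X: list of observed input rows.
    k: index of the variable of interest.
    grid: values of x_k at which to evaluate the partial dependence.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)
            modified[k] = value      # fix x_k, keep all other variables as observed
            total += predict(modified)
        curve.append(total / len(X))  # average over the n observations
    return curve
```

Plotting the returned curve against `grid` gives the kind of partial dependence plot used to read off the functional relationship between a dominant variable and a drought characteristic.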
4. Results and Discussions
4.1. Classification of Drought Regimes
The fuzzy k‐medoids clustering approach was applied to the 652 catchments to classify the drought regimes
based on drought intensity (DI), drought duration (DD), and number of drought events (ND). First, we identified
the optimal number of regimes based on the fuzzy silhouette (FS) and XB indices. Figure 4a shows the behavior of
the FS and XB indices with respect to the number of regimes. The optimal number of clusters
is three, where FS reaches its maximum and XB its minimum. Therefore, we
use three clusters for further analysis.
The drought characteristics (i.e., DI, DD, and ND) for the three drought regimes are shown in
Figures 4b and 4d. Since the units of DI, DD, and ND differ, we applied the
Z score to standardize and compare the drought characteristics across the three regimes. The Z score
measures how many standard deviations a data point lies from the population average. The boxplots of
Z scores are plotted in Figure 4b, so that the drought characteristics can be compared among the identified
regimes. The number of catchments representing each regime is shown in Figure 4c. The absolute values
of the drought characteristics for each drought regime are plotted as probability distributions in
Figure 4d. Regime 1 is represented by 142 catchments with longer droughts (median DDzs ~ 1), lower
drought intensities (median DIzs ~ −0.75), and fewer occurrences (median NDzs ~ −1). The magnitude of DD
for regime 1 varies between 7 and 20 months, whereas the magnitudes of DI and ND vary from 0.4 to 0.8 and
from 5 to 20, respectively. Regime 2 is represented by 242 catchments that exhibit relatively moderate drought
characteristics, with median Z scores close to 0 (Figure 4b). For the catchments
located in regime 2, DD varies between 7 and 12 months, DI between 0.5 and 0.9, and ND
between 15 and 25. The largest number of catchments (268) is located in regime 3, which represents low
drought duration (median DDzs~−0.8) occurring frequently (median NDzs~0.75) with higher intensity
(median DIzs~1). The catchments located in regime 3 witness droughts with durations between 5 and
8 months, intensities between 0.7 and 1.2, and frequencies between 20 and 30 events (Figure 4d).
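The Z score standardization used to compare characteristics with different units can be sketched as follows. The duration values and regime labels are hypothetical and serve only to show the computation of per-regime medians of the standardized values; they are not data from the study.

```python
from statistics import mean, median, pstdev


def z_scores(values):
    """Standardize values by the population mean and standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical drought durations (months) and regime labels for five catchments.
dd_months = [6.0, 8.0, 10.0, 12.0, 14.0]
regime = [3, 3, 2, 1, 1]

dd_z = z_scores(dd_months)

# Median standardized duration per regime, analogous to the medians in a box plot.
median_by_regime = {
    r: median(z for z, lab in zip(dd_z, regime) if lab == r)
    for r in sorted(set(regime))
}
```

Because every characteristic is expressed in standard deviations after this step, DD, DI, and ND can be placed on a common axis.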
The spatial locations of the catchments for the three drought regimes are shown in Figure 5. The catchments
located in the Pacific Northwest and in parts of the northeastern and central USA represent regime 1, with
droughts that last longer but are less intense and occur less frequently. The catchments
representing regime 2, with moderate drought characteristics, are spread across different parts of CONUS, and the
catchments representing regime 3 are mostly located in the north central and eastern USA, including
some watersheds in the Pacific Northwest region. Overall, it was observed that the spatial proximity between
the catchments plays a considerable role in the clustering of regimes, which is probably due to similar
climatological variability and catchment response characteristics (Brutsaert & Nieber, 1977; Knapp et al.,
2002; Serrano, 2006).
4.2. RF Model Performance
Multicollinearity can make it difficult to obtain appropriate linear coefficient
estimates with small standard errors (Achen, 1982). Our analysis is different because it applies the RF
Figure 4. Characteristics of the fuzzy clustered drought regimes. (a) The variation of the fuzzy silhouette index (FSI) and
Xie and Beni (XB) index values with the number of clusters, with three being the optimal number of clusters. (b) The
box plots of Z scores of the considered drought characteristics in each cluster. (c) The number of catchments belonging
to each cluster. (d) The kernel density estimates of the probability distribution of the drought characteristics
specific to each regime.
algorithm, which is nonlinear in nature, and we do not rely on any regression coefficients in our analysis.
Therefore, even though there is a linear correlation between the predictors, it does not interfere with our
analysis. In addition, the multicollinearity problem is alleviated because a random subset of features is
chosen for each tree in an RF (Hsilch et al., 2014; Ishwaran et al., 2010; Zhang & Ma, 2012; Díaz‐Uriarte &
De Andres, 2006).
As highlighted before, the primary purpose of our study is to identify the key climate, catchment, and
morphological variables using a machine learning interpretation framework. Although our machine learning
application is not focused on prediction, we performed a preliminary analysis to evaluate the performance
of the optimized RF model by splitting the data into training (75%) and testing (25%) sets.
The model performed well based on the root‐mean‐square error, and the corresponding plots are presented
in the supplementary text.
We evaluated the performance of the RF algorithm in modeling the variations in drought characteristics (DI, DD,
and ND) in each regime with respect to the selected catchment and climate variables by applying it to the
entire data set. As highlighted in section 3, the optimal parameters of the RFs (i.e., mtry and nodesize), which
were derived based on the least out‐of‐bag error, are listed in Table 2. In addition, the R²,
percentage bias (PBIAS), and Nash‐Sutcliffe efficiency (NSE) metrics for the corresponding optimal model
configurations are listed in Table 2. The coefficient of determination (R²) of each RF model is more than
0.9, which indicates that the adopted RF algorithm can explain more than 90% of the variance found
in the drought characteristics. The PBIAS values, which are expressed in percent, remain close to 0,
indicating comparatively little bias in all the RF models. Finally, the NSE values are in the range of 0.77 to
0.85. An NSE value of 1 corresponds to a perfect match between the modeled and observed data points,
and NSE values greater than 0 indicate that the model predicts better than the mean of the observations.
Hence, the NSE values also point toward an efficient model. In summary, all the models have a high
coefficient of determination (R² > 0.9), low PBIAS values, and NSE values close to 1.
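For readers unfamiliar with these goodness-of-fit measures, a minimal sketch of how they can be computed is given below. It assumes that R² is taken as the squared Pearson correlation between observed and modeled values and that PBIAS uses the sign convention 100 × Σ(obs − sim) / Σ(obs); the text does not state which conventions were used, and the sample values are hypothetical.

```python
from statistics import mean


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


def pbias(obs, sim):
    """Percentage bias; 0 means no systematic over- or underestimation."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency; 1 is a perfect fit, 0 matches the observed mean."""
    mo = mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mo) ** 2 for o in obs)
    return 1.0 - residual / spread


# Hypothetical observed and modeled drought durations (months).
observed = [1.0, 2.0, 3.0, 4.0]
modeled = [1.1, 1.9, 3.2, 3.8]

scores = {
    "R2": r_squared(observed, modeled),
    "PBIAS": pbias(observed, modeled),
    "NSE": nse(observed, modeled),
}
```

Predicting the observed mean for every catchment gives an NSE of exactly 0, which is why positive NSE values are read as an improvement over that baseline.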
Figure 5. Spatial distribution of the three drought regimes located in CONUS.
4.3. Application of Interpretation Framework to Understand Drought Characteristics
4.3.1. Application to Drought Duration
Figure 6 shows the ranking of climate and catchment variables that have a potential influence on the
hydrological DD for each drought regime. As discussed earlier, variables with the least minimal depth are likely to have
a more dominant control, whereas variables with greater depth have a lower influence on drought duration
within each regime. The dashed line (Figure 6) indicates the average minimal depth of all the variables,
which can be used as a threshold to determine the significant variables of interest (Ishwaran et al., 2011).
Based on this threshold, the significant influencing variables are highlighted in green, and the
noninfluential variables are highlighted in orange (Figure 6).
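The thresholding rule described above can be sketched as follows, assuming that minimal depths from repeated RF runs are first averaged per variable and that variables whose averaged minimal depth falls below the mean over all variables are flagged as dominant. The depth values are hypothetical; only the variable names are taken from the text.

```python
from statistics import mean

# Hypothetical minimal depths from two repeated RF runs (the study averages many runs).
runs = [
    {"BFI_AVE": 1.0, "HGC": 2.5, "TMEAN_SD": 3.5, "PRCP_CV": 4.0},
    {"BFI_AVE": 1.4, "HGC": 2.3, "TMEAN_SD": 3.7, "PRCP_CV": 4.4},
]

# Average the minimal depth of each variable across runs.
avg_depth = {name: mean([run[name] for run in runs]) for name in runs[0]}

# The mean minimal depth over all variables acts as the selection threshold.
threshold = mean(list(avg_depth.values()))

# Variables below the threshold are treated as dominant, ranked by increasing depth.
dominant = sorted(
    (name for name, depth in avg_depth.items() if depth < threshold),
    key=avg_depth.get,
)
```

With these illustrative numbers the threshold is 2.85, so BFI_AVE and HGC are retained while TMEAN_SD and PRCP_CV are not.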
Overall, 20 variables have more than average minimal depth for regime 1, which represents catchments with
higher drought duration (median DDzs~1). In case of regime 2, which represents catchments with average
Figure 6. Rank plots are provided in ascending order, with variables exhibiting the least minimal depth at the top, across clusters (a) 1, (b) 2, and (c) 3 in the case of
drought duration (DD). The dotted line represents the average minimal depth. The variables having below average minimal depths are color coded as green, and
variables with above average minimal depths are color coded as red.
drought duration, 14 variables have more than average minimal depth. Finally, in case of regime 3, which
consists of catchments with lower drought duration (median DDzs~−0.8), a total of 11 variables have more
than average minimal depth. The number of climate and catchment variables with a potential influence on
drought characteristics varies among the three drought regimes. For instance, catchment variables make up
the largest share of the dominant variables controlling the drought duration for catchments that witness low drought
durations, whereas soil and climate variables dominate for catchments witnessing high and medium
drought durations, respectively. In the case of catchments with high drought durations
(regime 1), the base flow index (BFI_AVE) has a significantly lower minimal depth than the other variables,
suggesting its dominant role in that regime. The base flow index is a key variable because it captures the interaction
between climate and catchment variables that generates streamflow in a given watershed.
To further understand how these dominant variables interact with each other to potentially influence the
drought duration, the normalized interactive minimal depth was plotted for the top five variables
(Figure 7). As highlighted before, the normalized interactive minimal depth varies from 0 to 1, where 0
indicates a high degree of interaction and 1 indicates no interaction between the selected variables. In the case of regimes
1 and 2, the interactive minimal depth between the variables is close to 1, indicating that there is little
interaction between the dominant variables. However, the base flow index (BFI_AVE) seems to interact with
the other variables, especially with the mean relief ratio (RR_MEAN) and the aspect with respect to
geographical north (ASPECT_NORTH), in regime 1. In the case of regime 3, the maximum number of days
in a month with nonzero precipitation (WDMAX_BASIN) interacts with other variables, particularly
with the length of streams per square kilometer (STREAMS_KM_SQ_KM) within the catchments. Overall,
these results suggest no significant interaction between the dominant variables, although they have a direct
influence on the drought duration.
We further assessed the partial dependence of drought duration on the top five dominant variables (Figure 8). In
the case of regime 1 (Figure 8a), the base flow index (BFI_AVE) controls the drought duration following a power law
behavior. The relation between the base flow index and drought is often complicated. A higher base flow index can
result in low duration drought events, and as the magnitude of the base flow index increases, it shares a power
law function with the drought duration. In addition, the power law behavior extends over the entire
range of drought duration, which suggests a greater control of base flow on higher drought durations. Mean
elevation and percentage of soils with low infiltration rate (HGC) exhibit nonlinear relationships; however,
unlike the base flow index, they explain the variability of drought duration only partially, over a range of 12
to 13 months. In the case of regime 2 (Figure 8b), the base flow index predominantly controls the drought duration
through a nonlinear relationship. However, it is interesting to see that the underlying functional
relationship does not obey a power law, as it does in regime 1. Other variables, such as basin compactness
(BASIN_COMPACTNESS), percentage of soils with low infiltration rate (HGC), aspect with respect to
geographical north (ASPECT_NORTHNESS), and temperature variability (TMEAN_SD), also exhibit a
nonlinear and inversely proportional functional dependence with drought duration. In the case of regime 3, the
maximum number of days in a month with nonzero precipitation (WDMAX_BASIN) plays a key role
compared to the base flow index. A left truncated parabolic relationship can be observed, which indicates
a nonlinear control of precipitation intensity on drought duration.
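The idea behind a partial dependence curve can be sketched in a few lines: the variable of interest is fixed at each grid value for every catchment, the model is evaluated, and the predictions are averaged. The prediction function below is a hypothetical stand-in for a trained RF, and the lowercase feature names and values are illustrative assumptions rather than the study's data.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a fitted RF: duration falls as base flow index rises."""
    return 5.0 + 2.0 / row["bfi"] + 0.01 * row["elev"]


# Hypothetical catchments described by base flow index and mean elevation (m).
catchments = [
    {"bfi": 0.3, "elev": 100.0},
    {"bfi": 0.7, "elev": 300.0},
]

bfi_grid = [0.25, 0.5, 1.0]
pd_curve = partial_dependence(toy_model, catchments, "bfi", bfi_grid)
```

Plotting `pd_curve` against `bfi_grid` gives the marginal shape of the response to base flow index with the other variables averaged out, which is how the functional forms discussed above are read from the plots.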
4.3.2. Application to Number of Drought Events
Figure 9 shows the ranking of catchment and climate variables that have a potential influence on ND
within each regime. Overall, 23 variables have more than average minimal depth for regime 1, which
represents catchments with fewer drought occurrences (median NDzs~−1). A total of 13 variables have
more than average minimal depth for regime 2, which represents catchments with a moderate
number of drought events, whereas 14 variables have more than average minimal depth for regime 3.
Although different dominant variables control drought duration in each regime,
similar variables within each regime dominate both drought duration and drought
event occurrences.
The interaction between the five most dominant variables within each regime based on the number of
drought events is illustrated in Figure 10. Similar to the case of drought duration, the lowest interactive depth
was observed for the base flow index (BFI_AVE), which has some interaction with the mean relief ratio
(RR_MEAN) in regime 1. However, no such significant interactions are observed in regime 2, given
the relatively high interactive depth (ID) values.
Figure 11a illustrates the partial dependence between variables specific to regime 1. It can be observed that
the variables that have a potential influence on drought duration also influence drought event occurrences.
BFI_AVE is inversely related to drought occurrences, following an exponential relationship. An
increase in base flow is likely to increase the groundwater contribution to streamflow, resulting in fewer
droughts. Elevation exhibits an inverse relationship up to 2,000 m and then a directly
proportional relationship up to 3,000 m, whereas HGC exhibits a semiparabolic relationship. The
other variables do not explain much of the variability of drought event occurrences.
Overall, it was observed that the RF modeling framework is flexible enough to accommodate different functional
relationships between the dominant variables and the number of drought events. In the case of regime 2, variability
of temperature (TMEAN_SD) exhibits more dominant behavior in controlling the drought event
occurrences, whereas BFI_AVE shares an inversely proportional relationship for the same regime. The other three
selected variables exhibit dominant and different functional relationships, as shown in Figure 11b. Bulk
density of the soil (BD_AVE) is a key variable that has a potential influence on the drought event occurrences in
regime 3 (Figure 11c). However, it does not explain the variability over the entire range of drought event
occurrences, as in the case of the other two regimes. WDMAX_BASIN was able to explain the variability of drought
Figure 7. Interactive depth of the top five dominant variables controlling drought duration (DD) for clusters (a) 1, (b) 2,
and (c) 3. (Note: In each figure, the x axis represents the same variables which are provided as heading in each figure
facet. Reference variables are marked with blue crossing each panel. Higher values indicate lower interactivity with
reference variable.)
occurrences at the higher end, which was not captured by BD_AVE. This highlights the
complementary behavior of the climate and catchment characteristics in controlling the drought
event occurrences.
4.3.3. Application to Drought Intensity
Climate and catchment variables are ranked based on their potential influence on DI (Figure 12). A total
of 24 variables have more than average minimal depth for regime 1, which represents catchments
with lower drought intensity (median DIzs~−1). In comparison to regime 1, fewer influencing
variables were observed for regimes 2 and 3: a total of 10 and 11 variables have more than average
minimal depth for regimes 2 and 3, which represent catchments with moderate and higher drought
intensity, respectively. The types of variables that dominate drought intensity are mostly similar to those for
drought duration and number of drought events. Overall, it was observed that the majority of the variables
that have a potential influence on drought intensity are related to soil, climate, and catchment characteristics in
regimes 1–3, respectively.
The interaction between the top five dominant variables within each regime in the case of drought intensity is
illustrated in Figure 13. In the case of regime 1, none of the dominant variables show any significant
interactions, as the ID values are close to 1. However, in the case of regime 2, temperature variability (TMEAN_SD)
exhibits potential interaction with other dominant variables, as its ID value is around 0.7. In case of regime 3,
Figure 8. Partial dependence plots of top five dominant variables controlling drought duration (DD) for clusters (a) 1, (b) 2,
and (c) 3.
the average clay content (CLAYAVE) in the basin exhibits significant interactive effects. Among the other variables, the
percentage of soil with high infiltration (HGA) exhibits significant interaction with CLAYAVE. As
in the case of the other drought properties, the interaction effects are present but weak, as indicated
by the relatively high ID values in the case of drought intensity.
Figure 14a illustrates the partial dependence specific to regime 1. The percentage of soils with high infiltration
capacity (HGA) has a dominant role and shares a directly proportional relationship with drought
intensity. The presence of soils with high infiltration is likely to create a competition between groundwater
recharge and streamflow. Hence, drier antecedent conditions may result in more intense droughts. On the
other hand, forest cover (FOREST) has an inversely proportional relationship with drought intensity. The base
flow index also influences the drought intensity, but not as significantly as in the other cases. Mean aspect degree
(Aspect_Degrees) shares a directly proportional relationship with drought intensity.
Figure 9. Same as Figure 7 but in case of number of drought events (NE).
In the case of regime 2, the temperature variability (TMEAN_SD) exhibits the most dominant control on
drought intensity, similar to the case of number of drought events. However, the functional relationship is
opposite in nature. The percentage of streams of Strahler's fourth order (PCT_4th_ORDER) exhibits an
inverse exponential relationship with the drought intensity. Additional variables, such as rainfall factor
(R_FACT), silt content (SILT_AVE), and precipitation variability (PRCP_CV), also influence the intensity
of drought. In regime 3, basin compactness (BAS_COMPACTNESS) and base flow index (BFI_AVE) exhibit
dominant control on drought intensity. However, the remaining variables are not as dominant as in the case
of the other clusters.
5. Discussion and Outlook
Hydrological drought in a catchment is controlled by the climate characteristics (recharge) and catchment
characteristics (storage). Based on our interpretation framework using the MD, ID, and partial dependence
metrics, the important climate and catchment characteristics that control hydrological drought
characteristics (number of events, duration, and intensity) are provided in Table 3. It was observed that the
catchments in regime 1 are mostly located at higher elevations or in mountainous regions characterized by
steeply sloping terrain (Figure 3). The hydrological drought characteristics for regime 1 are mostly
influenced by the catchment characteristics, which include base flow index, elevation, and soil characteristics
Figure 10. Same as Figure 8 but in case of number of drought events (NE).
(infiltration rates). Baseflow is influenced by natural factors such as climate, geology, relief, soils, and
vegetation. Factors that promote infiltration and recharge of subsurface storage will increase baseflows,
while factors associated with higher evapotranspiration will reduce baseflow. Therefore, in these
catchments, groundwater drainage moves slowly, which results in prolonged baseflow following rainfall
events and makes baseflow more influential in generating hydrological droughts. Interestingly, elevation is
a standalone catchment characteristic that plays an important role for drought characteristics in regime 1.
Most of these catchments receive snow during winter months; therefore, the hydrological drought can be
influenced by a combination of rain and snow, and depending on the difference between elevation ranges,
the timing and intensity of drought can vary among the watersheds.
In addition to elevation, soil characteristics (e.g., infiltration and hydraulic conductivity) are key variables
for hydrological droughts in regime 1. In these catchments, subsurface flow generation is directly
proportional to the hydraulic conductivity of soils, which thus controls the discharge rate specific to soil types
(e.g., Armbruster, 1976; Musiake et al., 1984; Smith, 1981). In addition, soil properties are known to affect
infiltration, rooting depth/restrictions, available water capacity, soil porosity, and soil microorganism
activity, which influence the streamflow discharge rate (Bennie et al., 2008; Moeslund et al., 2013; Strachan &
Daly, 2017). The moisture storage capacity in soil decreases due to reduced precipitation and high
evapotranspiration, which further reduces baseflow and leads to the evolution of hydrological droughts in different segments of
the hydrological system. Hence, streamflow generation in base flow dominant streams is strongly influenced
Figure 11. Same as Figure 9 but in case of number of drought events (NE).
by the subsurface hydrogeologic configuration, the saturated permeabilities of the component formations,
and the unsaturated characteristics of the soil types (Freeze, 1972). Hence, in addition to the
topography, the soil features may control the hydrologic drought properties through these physical
processes. Further, the forest cover influences drought intensity more than duration and number
of events.
Some of the catchment characteristics that control hydrologic drought in regime 1 also influence drought
characteristics in regimes 2 and 3. However, there is a clear difference between regime 1 and regimes 2
and 3 in terms of climate control on hydrological droughts. Climate factors such as precipitation may
have a lesser direct influence on hydrological droughts in regime 1, which can be attributed to the limited time
available to store water in comparatively higher gradient watersheds as well as the possible contribution of snow
for the watersheds located in snowy regions (e.g., northeast and central north watersheds). The climate
Figure 12. Same as Figure 7 but in case of drought intensity (DI).
variables that have a potential influence on hydrological drought in regimes 2 and 3 include temperature
and precipitation. For example, temperature can have a direct influence on the development of
hydrological drought in snow dominated regions. The combined effect of elevation and temperature on
triggering hydrological droughts can vary because snow dominated regions are located in mountain regions.
The role of precipitation characteristics in the propagation of hydrological drought is well recognized (Mishra
& Singh, 2010; Mukherjee et al., 2018; Wan et al., 2017). The amount of rainwater held in storage is
different for the three regimes; for example, higher elevation areas can hold less rainwater compared to low
lying forested areas. The rainfall pattern in semiarid regions (typically the western USA) is very irregular,
leading to very low storage and an increase in hydrological drought.
However, for the humid catchments located in regimes 2 and 3, the soils are mostly saturated due to the
antecedent climate conditions, which results in a more direct relationship of precipitation, potential
evapotranspiration, and temperature with hydrologic drought characterization. In addition to precipitation and
temperature, lower relative humidity can influence rainfall patterns, leading to the evolution of
hydrological drought in regime 3. The role of relative humidity in the evolution of drought is complex in nature. During
dry hydrologic conditions, the moisture depletes from the upper soil layers, leading to a decrease in
evapotranspiration and atmospheric relative humidity (Mishra & Singh, 2010). Further, the reduced relative humidity
reduces the probability of rainfall, which further triggers hydrological drought (Mishra & Singh, 2010).
Figure 13. Same as Figure 8 but in case of drought intensity (DI).
In regimes 2 and 3, the stream network, as defined by the Strahler number, has a potential influence on the
hydrological drought. An increase in Strahler order can be related to a decline in the catchment's general slope
(Haidary et al., 2015) and can potentially increase the storage capacity, groundwater recharge, and baseflow of
the watershed. The first order streams, which represent the outermost tributaries, are typically
located on steeper slopes than fourth order streams. This suggests that the higher storage capacity
likely to be observed in fourth order streams will exert a better control on hydrological drought compared
with first order streams. First order streams, which are usually located at the upper end of channel networks
(Strahler, 1952), have comparatively larger slopes and are likely to drain excess water immediately following a
precipitation event (McMahon & Finlayson, 2003). As a result, if there is a deficit in precipitation or an increase in
evapotranspiration in the catchment, the first order streams are likely to facilitate a more direct propagation of
meteorological drought to hydrological drought with no buffer (Godsey & Kirchner, 2014; McMahon &
Finlayson, 2003; Pinna et al., 2004). Consequently, the presence of first and fourth order streams might
influence the drought properties in contrasting ways.
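For readers less familiar with stream ordering, the sketch below shows the standard Strahler rule on a small, hypothetical channel network: headwater reaches are order 1, and the order increases by one only where two tributaries of equal highest order join. The reach names and network layout are illustrative assumptions, not catchments from the study.

```python
def strahler_order(upstream, reach):
    """Return the Strahler order of `reach`.

    upstream maps each reach to the list of tributaries that flow into it.
    """
    tributaries = upstream.get(reach, [])
    if not tributaries:
        return 1  # headwater reach with no tributaries
    orders = sorted(
        (strahler_order(upstream, t) for t in tributaries), reverse=True
    )
    # Order increases only when the two highest-order tributaries are equal.
    if len(orders) > 1 and orders[0] == orders[1]:
        return orders[0] + 1
    return orders[0]


# Hypothetical network: two order-2 branches meet at the outlet.
network = {
    "outlet": ["west", "east"],
    "west": ["w1", "w2"],
    "east": ["e1", "e2"],
}

# Hypothetical network: an order-2 branch joined by a single headwater stream.
uneven = {
    "outlet": ["west", "e1"],
    "west": ["w1", "w2"],
}

outlet_order = strahler_order(network, "outlet")
uneven_order = strahler_order(uneven, "outlet")
```

A variable such as the percentage of fourth order streams would then follow from counting the reaches (or their lengths) of each order within a catchment.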
In regimes 2 and 3, the mean aspect degree was found to be an important variable in controlling the hydrologic
drought properties. Mean aspect degree is often associated with variability in microclimate, including
near‐surface temperatures, evaporative demand, soil moisture content, and vegetation (Strachan & Daly, 2017;
Srinivasan et al., 2015; Moeslund et al., 2013). As a result, the mean aspect degree, which is a topographic
metric, controls the microclimatic and vegetation features that are likely to control drought characteristics. It was
Figure 14. Same as Figure 9 but in case of drought intensity (DI).
observed that in addition to the common processes that control the hydrologic drought characteristics, there
are additional distinct processes specific to each regime. This highlights the differential nature of
climate and catchment control on hydrological droughts. Under humid conditions, the evolution of
hydrological droughts in small catchments can be attributed more to climate characteristics, whereas
for larger watersheds the storage capacity and baseflow associated with catchment characteristics can
play a dominant role. For catchments under severe dry conditions, the climate signal can have
less predictive power than the storage properties of the watershed.
The application of the RF algorithm can provide a better understanding of how the climate and catchment
controls differ for a specific hydrological drought characteristic. For instance, previous studies highlighted
that the BFI, which represents the storage characteristics, plays a key role in controlling the drought
duration (Van Lanen et al., 2013; Van Loon & Laaha, 2015). Our framework reconfirms the role of BFI in
the case of regimes 1 and 2; however, we further identified two important distinctions. First, the relationship
between BFI and drought characteristics can be nonlinear, and as a result, an increase in drought duration
with BFI cannot be generalized. Second, base flow acts as a dominant process mostly for the
catchments that witness medium and high duration drought events, whereas it has a lesser influence for the
catchments with lower drought durations. A linear regression approach may not capture such
phenomena, as the model parameters are more biased toward high magnitude variables (Hastie et al., 2009).
Our empirical analysis suggests a lack of prominent interactions between the dominant variables in hydrological
drought propagation. This indicates that the dominant drivers of drought characteristics are largely
additive and independent in nature. For example, the percentage of soils with high infiltration rate (HGA) and
the percentage of fourth order streams (PCT_4th_ORDER) both have dominant control on drought intensity but
have minimal interaction effect. As a result, even though both of these variables control the propagation of
drought independently, the underlying processes for drought propagation have minimal interaction with
each other.
Our results also indicated that even though similar dominant controls exist across different regimes, their
functional relationships with drought characteristics might be different, as highlighted in Jencso and
McGlynn (2011) and Knapp et al. (2015). For instance, we identified that the base flow index controls
drought duration for both regimes 1 and 2. However, the functional relationship of base flow with
drought duration is different in regimes 1 and 2. In regime 1, the base flow index exhibits an exponential
relationship with drought duration, whereas in regime 2, a different form of nonlinear
relationship exists, which does not fit the traditional exponential functions. Therefore, even though the
catchment and climate characteristics exhibit a nonlinear relationship with drought characteristics, the
relationships should not be generalized across regions with different hydrologic characteristics.
Our interpretative modeling framework also highlighted that the influence of dominant variables can vary over the
range of a drought characteristic. In other words, an individual climate (catchment) characteristic can have a
higher (lower) influence on the variability in the upper (lower) range of a drought characteristic. For instance,
the bulk density may affect the soil features that control the drought intensity in regime 1. However, it
does not explain the variability over the entire range of drought event occurrences, as in the case of the other
two regimes. In contrast, WDMAX_BASIN is able to explain the variability of drought occurrences at the
higher end, which is not captured by BD_AVE. This highlights the complementary behavior of
the climate and catchment characteristics in controlling the drought event occurrences.
The framework presented in this study introduces valuable interpretability components of the RF algorithm in
the context of understanding hydrologic processes. Even though we have applied this framework for
understanding drought characteristics, there are other frameworks for interpreting black box models.
Among them are individual conditional expectations (Goldstein et al., 2015; Guidotti et al., 2018),
local interpretable model‐agnostic explanations (Ribeiro et al., 2016), and influence functions (Koh &
Liang, 2017), which were recently introduced in the machine learning literature. There is potential scope to
compare these interpretability frameworks and the quality of machine learning algorithms. We believe
our approach can serve as a preliminary avenue to delve deeper into the application of interpretative
machine learning frameworks for understanding not only droughts but also other hydrologic processes.
Therefore, further work along these directions might improve our understanding of hydrologic processes
using interpretative machine learning algorithms.
6. Conclusions
In this study, we applied machine learning methods by integrating fuzzy clustering and RF algorithm to
develop an interpretation framework (i.e., minimal and interactive depth and partial dependence) to
quantify the role of climate and catchment controls on hydrological drought for 652 catchments located in
CONUS. RF algorithm can adequately capture the functional relationship between climate and catchment
characteristics and hydrological droughts. The proposed framework based on MD, ID, and partial
dependence metrics can identify the important climate and catchment characteristics that can further
improve our understanding of the dominant role of climate and catchment characteristics in propagation
of hydrological droughts.
Using a large number of catchments under different climatic regimes enabled us to explore the dominant
landscape controls on CONUS hydrological droughts. We conclude that the RF‐based
interpretative approach is a simple, robust, and yet powerful way to gain insights into the drivers of hydrological
droughts. The applied framework can provide useful information to understand different combinations of
climate and catchment characteristics that can either attenuate or intensify the hydrological droughts.
The following conclusions can be drawn from this study: (i) three drought regimes were identified based
on their duration, frequency, and intensity, namely, Regime 1 (longer duration, less frequent, and less
intense), Regime 2 (moderate duration, moderate frequency, and moderate intensity), and Regime 3
(shorter duration, more frequent, and more intense); (ii) although some common hydrologic processes
control the drought characteristics across the identified regimes, there are distinct processes specific
to each regime; (iii) similar climate, catchment, and morphological characteristics may exhibit varied
functional relationships (i.e., exponential, hyperbolic, and linear) with drought characteristics in
different regimes; and (iv) the dominant variables may not explain the variability over the entire range
of drought characteristics. Based on these insights, we propose that these issues deserve more attention,
which can be achieved by integrating the knowledge obtained from applying machine learning algorithms
to hydroclimatic processes (e.g., hydrological drought) with the hydrological models used for such
analysis. Although hydrologic models can capture streamflow with reasonable accuracy, they often
overestimate or underestimate extreme events such as extreme droughts. This implies that a better
understanding of the role of climate and catchment characteristics in the evolution and propagation
of hydrological drought events is essential. The results obtained from our proposed machine learning
framework can complement ongoing research on hydrological droughts by better exploiting the value
of nonclimatic attributes (such as soil, land cover, and geology). In addition, a more systematic
characterization of the uncertainties in catchment attributes needs to be performed.
References
Achen, C. H. (1982). Interpreting and using regression (Vol. 29). Sage.
Addor, N., Nearing, G., Prieto, C., Newman, A. J., Le Vine, N., & Clark, M. P. (2018). A ranking of hydrological signatures based on their
predictability in space. Water Resources Research,54(11), 8792–8812.
Apurv, T., Sivapalan, M., & Cai, X. (2017). Understanding the role of climate characteristics in drought propagation. Water Resources
Research,53, 9304–9329. https://doi.org/10.1002/2017WR021445
Armbruster, J. T. (1976). An infiltration index useful in estimating low‐flow characteristics of drainage basins. Journal of Research of the
U. S. Geological Survey,4(5), 533–538.
Bastani, O., Kim, C., & Bastani, H. (2017). Interpreting blackbox models via model extraction. arXiv preprint arXiv:1705.08504.
Bennie, J., Huntley, B., Wiltshire, A., Hill, M. O., & Baxter, R. (2008). Slope, aspect and climate: Spatially explicit and implicit models of
topographic microclimate in chalk grassland. Ecological modelling,216(1), 47–59.
Bibal, A., & Frénay, B. (2016). Interpretability of machine learning models and representations: An introduction. In Proceedings on
ESANN (pp. 77‐82).
Boulesteix, A. L., Janitza, S., Kruppa, J., & König, I. R. (2012). Overview of random forest methodology and practical guidance with
emphasis on computational biology and bioinformatics. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery,2(6),
493–507.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32. https://doi.org/10.1023/A:1010933404324
Brutsaert, W., & Nieber, J. L. (1977). Regionalized drought flow hydrographs from a mature glaciated plateau. Water Resources Research,
13(3), 637–643.
Campello, R. J., & Hruschka, E. R. (2006). A fuzzy extension of the silhouette width criterion for cluster analysis. Fuzzy Sets and Systems,
157(21), 2858–2875.
Carrillo, G., Troch, P. A., Sivapalan, M., Wagener, T., Harman, C., & Sawicz, K. (2011). Catchment classification: Hydrological
analysis of catchment behavior through process‐based modeling along a climate gradient. Hydrology and Earth System Sciences,15(11),
3411–3430.
Caruana, R., & Niculescu‐Mizil, A. (2006, June). An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd
international conference on Machine learning (pp. 161‐168). ACM.
Acknowledgments
We greatly appreciate the valuable comments of the Associate Editor and three reviewers, which helped us improve our
manuscript. This study was supported by NSF award 1653841. Any opinions, findings, and conclusions or
recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the
NSF. The authors used the GAGES II data set in this study; these data are publicly available at
https://water.usgs.gov/GIS/metadata/usgswrd/XML/gagesII_Sept2011.xml.
Cayan, D. R., Das, T., Pierce, D. W., Barnett, T. P., Tyree, M., & Gershunov, A. (2010). Future dryness in the southwest US and the
hydrology of the early 21st century drought. Proceedings of the National Academy of Sciences,107(50), 21,271–21,276.
Cutler, D. R., Edwards, T. C. Jr., Beard, K. H., Cutler, A., Hess, K. T., Gibson, J., & Lawler, J. J. (2007). Random forests for classification in
ecology. Ecology,88(11), 2783–2792.
Daly, C., Taylor, G. H., Gibson, W. P., Parzybok, T. W., Johnson, G. L., & Pasteris, P. A. (2000). High‐quality spatial climate data sets for the
United States and beyond. Transactions of the ASAE, 43(6), 1957.
Díaz‐Uriarte, R., & De Andres, S. A. (2006). Gene selection and classification of microarray data using random forest. BMC Bioinformatics,
7(1), 3.
Diller, G. P., Alonso‐Gonzalez, R., Kempny, A., Dimopoulos, K., Inuzuka, R., Giannakoulas, G., & Swan, L. (2012). B‐type natriuretic
peptide concentrations in contemporary Eisenmenger syndrome patients: Predictive value and response to disease targeting therapy.
Heart, heartjnl‐2011.
Doshi‐Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.
Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap. Washington, DC: CRC press.
Elshorbagy, A., Corzo, G., Srinivasulu, S., & Solomatine, D. P. (2010a). Experimental investigation of the predictive capabilities of data
driven modeling techniques in hydrology—Part 1: Concepts and methodology. Hydrology and Earth System Sciences,14(10), 1931–1941.
Elshorbagy, A., Corzo, G., Srinivasulu, S., & Solomatine, D. P. (2010b). Experimental investigation of the predictive capabilities of data
driven modeling techniques in hydrology‐Part 2: Application. Hydrology and Earth System Sciences,14(10), 1943–1961.
Fahimi, F., Yaseen, Z. M., & El‐shafie, A. (2017). Application of soft computing based hybrid models in hydrological variables modeling: A
comprehensive review. Theoretical and applied climatology,128(3‐4), 875–903.
Falcone, J. A. (2011). GAGES‐II: Geospatial attributes of gages for evaluating streamflow. US Geological Survey.
Fienen, M. N., Nolan, B. T., Kauffman, L. J., & Feinstein, D. T. (2018). Metamodeling for groundwater age forecasting in the Lake Michigan
Basin. Water Resources Research,54, 4750–4766. https://doi.org/10.1029/2017WR022387
Freeze, R. A. (1972). Role of subsurface flow in generating surface runoff: 1. Base flow contributions to channel flow. Water Resources
Research,8(3), 609–623.
Friedman, J. H., & Meulman, J. J. (2003). Multiple additive regression trees with application in epidemiology. Statistics in medicine,22(9),
1365–1381. https://doi.org/10.1002/sim.1501
Gocic, M., & Trajkovic, S. (2014). Spatiotemporal characteristics of drought in Serbia. Journal of Hydrology,510, 110–123.
Godsey, S. E., & Kirchner, J. W. (2014). Dynamic, discontinuous stream networks: Hydrologically driven variations in active drainage
density, flowing channels and stream order. Hydrological Processes,28(23), 5791–5803.
Goldstein, A., Kapelner, A., Bleich, J., & Pitkin, E. (2015). Peeking inside the black box: Visualizing statistical learning with plots of indi-
vidual conditional expectation. Journal of Computational and Graphical Statistics,24(1), 44–65. https://doi.org/10.1080/
10618600.2014.907095
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., & Pedreschi, D. (2018). A survey of methods for explaining black box
models. ACM Computing Surveys (CSUR),51(5), 93.
Gupta, H. V., & Nearing, G. S. (2014). Debates—The future of hydrological sciences: A (common) path forward? Using models and data to
learn: A systems theoretic perspective on the future of hydrological science. Water Resources Research,50, 5351–5359. https://doi.org/
10.1002/2013WR015096
Haidary, A., Amiri, B. J., Adamowski, J., Fohrer, N., & Nakane, K. (2015). Modelling the relationship between catchment attributes and
wetland water quality in Japan. Ecohydrology,8, 726–737. https://doi.org/10.1002/eco.1539
Haslinger, K., Koffler, D., Schöner, W., & Laaha, G. (2014). Exploring the link between meteorological drought and streamflow: Effects of
climate‐catchment interaction. Water Resources Research,50, 2468–2487. https://doi.org/10.1002/2013WR015051
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer
Science & Business Media.
Hengl, T., Nussbaum, M., Wright, M. N., Heuvelink, G. B., & Gräler, B. (2018). Random forest as a generic framework for predictive
modeling of spatial and spatio‐temporal variables. PeerJ,6, e5518.
Huang, G., Huang, G. B., Song, S., & You, K. (2015). Trends in extreme learning machines: A review. Neural Networks,61,32–48.
https://doi.org/10.1016/j.neunet.2014.10.001
Ishwaran, H., Kogalur, U. B., Chen, X., & Minn, A. J. (2011). Random survival forests for high‐dimensional data. Statistical Analysis and
Data Mining: The ASA Data Science Journal,4(1), 115–132.
Ishwaran, H., Kogalur, U. B., Gorodeski, E. Z., Minn, A. J., & Lauer, M. S. (2010). High‐dimensional variable selection for survival data.
Journal of the American Statistical Association,105, 205–217.
Jencso, K. G., & McGlynn, B. L. (2011). Hierarchical controls on runoff generation: Topographically driven hydrologic connectivity,
geology, and vegetation. Water Resources Research,47, W11527. https://doi.org/10.1029/2011WR010666
Karpatne, A., Atluri, G., Faghmous, J. H., Steinbach, M., Banerjee, A., Ganguly, A., & Kumar, V. (2017). Theory‐guided data science: A new
paradigm for scientific discovery from data. IEEE Transactions on Knowledge and Data Engineering,29(10), 2318–2331.
Knapp, A. K., Carroll, C. J., Denton, E. M., La Pierre, K. J., Collins, S. L., & Smith, M. D. (2015). Differential sensitivity to regional‐scale
drought in six central US grasslands. Oecologia,177(4), 949–957. https://doi.org/10.1007/s00442‐015‐3233‐6
Knapp, A. K., Fay, P. A., Blair, J. M., Collins, S. L., Smith, M. D., Carlisle, J. D., et al. (2002). Rainfall variability, carbon cycling, and plant
species diversity in a mesic grassland. Science,298(5601), 2202–2205.
Koch, J., Stisen, S., Refsgaard, J. C., Ernstsen, V., Jakobsen, P. R., & Højberg, A. L. (2019). Modeling depth of the redox interface at high
resolution at national scale using random forest and residual Gaussian simulation. Water Resources Research,55, 1451–1469.
https://doi.org/10.1029/2018WR023939
Koh, P. W., & Liang, P. (2017, July). Understanding black‐box predictions via influence functions. In International Conference on Machine
Learning (pp. 1885‐1894).
Konapala, G., & Mishra, A. (2017). Review of complex networks application in hydroclimatic extremes with an implementation to char-
acterize spatio‐temporal drought propagation in continental USA. Journal of Hydrology,555, 600–620.
Konapala, G., & Mishra, A. K. (2016). Three‐parameter‐based streamflow elasticity model: Application to MOPEX basins in the USA at
annual and seasonal scales. Hydrology and Earth System Sciences,20(6), 2545–2556.
Kotsiantis, S. B., Zaharakis, I., & Pintelas, P. (2007). Supervised machine learning: A review of classification techniques. Emerging Artificial
Intelligence Applications in Computer Engineering, 160, 3–24.
Krishnapuram, R., Joshi, A., Nasraoui, O., & Yi, L. (2001). Low‐complexity fuzzy relational clustering algorithms for web mining. IEEE
transactions on Fuzzy Systems,9(4), 595–607.
Latt, Z. Z., Wittenberg, H., & Urban, B. (2015). Clustering hydrological homogeneous regions and neural network based index flood esti-
mation for ungauged catchments: an example of the Chindwin River in Myanmar. Water Resources Management,29(3), 913–928.
Ley, R., Casper, M. C., Hellebrand, H., & Merz, R. (2011). Catchment classification by runoff behaviour with self‐organizing maps (SOM).
Hydrology and Earth System Sciences,15(9), 2947–2962.
McMahon, T. A., & Finlayson, B. L. (2003). Droughts and anti‐droughts: The low flow hydrology of Australian rivers. Freshwater Biology,
48(7), 1147–1160.
Mishra, A. K., & Singh, V. P. (2010). A review of drought concepts. Journal of hydrology,391(1‐2), 202–216.
Mishra, A. K., & Singh, V. P. (2011). Drought modeling—A review. Journal of Hydrology,403(1‐2), 157–175.
Moeslund, J. E., Arge, L., Bøcher, P. K., Dalgaard, T., & Svenning, J. C. (2013). Topography as a driver of local terrestrial vascular plant
diversity patterns. Nordic Journal of Botany,31(2), 129–144.
Mukherjee, S., Mishra, A., & Trenberth, K. E. (2018). Climate change and drought: A perspective on drought indices. Current Climate
Change Reports, 4(2), 145–163. https://doi.org/10.1007/s40641‐018‐0098‐x
Musiake, K., Takahasi, Y., & Ando, Y. (1984). Statistical analysis on effects of basin geology on river flow regime in mountainous areas of
Japan. Proc. Fourth Cong. Asian & Pacific Reg. Div. Int. Assoc. Hydraul. Res., Bangkok, APD‐IAHR/Asian Institute Technology, vol. 2,
pp. 1141–1150.
Narasimhan, B., & Srinivasan, R. (2005). Development and evaluation of Soil Moisture Deficit Index (SMDI) and Evapotranspiration
Deficit Index (ETDI) for agricultural drought monitoring. Agricultural and Forest Meteorology,133(1‐4), 69–88.
Nourani, V., Baghanam, A. H., Adamowski, J., & Kisi, O. (2014). Applications of hybrid wavelet—Artificial Intelligence models in
hydrology: A review. Journal of Hydrology,514,358–377.
Olden, J. D., Kennard, M. J., & Pusey, B. J. (2012). A framework for hydrologic classification with a review of methodologies and appli-
cations in ecohydrology. Ecohydrology,5(4), 503–518.
Pinna, M., Fonnesu, A., Sangiorgio, F., & Basset, A. (2004). Influence of summer drought on spatial patterns of resource availability and
detritus processing in Mediterranean stream sub‐basins (Sardinia, Italy). International Review of Hydrobiology: A Journal Covering all
Aspects of Limnology and Marine Biology,89(5‐6), 484–499.
Prasad, A. M., Iverson, L. R., & Liaw, A. (2006). Newer classification and regression tree techniques: bagging and random forests for
ecological prediction. Ecosystems,9(2), 181–199.
Probst, P., & Boulesteix, A. L. (2017). To tune or not to tune the number of trees in random forest? arXiv preprint arXiv:1705.05654.
Raghavendra, S., & Deka, P. C. (2014). Support vector machine applications in the field of hydrology: A review. Applied soft computing,19,
372–386.
Rajsekhar, D., Singh, V. P., & Mishra, A. K. (2014). Hydrologic drought atlas for Texas. Journal of Hydrologic Engineering,20(7), 05014023.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). Why should I trust you?: Explaining the predictions of any classifier. In Proceedings
of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135‐1144). ACM.
Rice, J. S., Emanuel, R. E., Vose, J. M., & Nelson, S. A. (2015). Continental US streamflow trends from 1940 to 2009 and their relationships
with watershed spatial characteristics. Water Resources Research, 51, 6262–6275.
Saft, M., Peel, M. C., Western, A. W., & Zhang, L. (2016). Predicting shifts in rainfall‐runoff partitioning during multiyear drought: Roles of
dry period and catchment characteristics. Water Resources Research,52, 9290–9305. https://doi.org/10.1002/2016WR019525
Saft, M., Western, A. W., Zhang, L., Peel, M. C., & Potter, N. J. (2015). The influence of multiyear drought on the annual rainfall‐runoff
relationship: An Australian perspective. Water Resources Research,51, 2444–2463. https://doi.org/10.1002/2014WR015348
Sawicz, K., Wagener, T., Sivapalan, M., Troch, P. A., & Carrillo, G. (2011). Catchment classification: Empirical analysis of hydrologic
similarity based on catchment function in the eastern USA. Hydrology and Earth System Sciences,15(9), 2895–2911.
Schwalm, C. R., Anderegg, W. R., Michalak, A. M., Fisher, J. B., Biondi, F., Koch, G., & Huntzinger, D. N. (2017). Global patterns of drought
recovery. Nature,548(7666), 202–205. https://doi.org/10.1038/nature23021
Scornet, E. (2017). Tuning parameters in random forests. ESAIM: Proceedings and Surveys 60: 144‐162.
Sheffield, J., Wood, E. F., & Roderick, M. L. (2012). Little change in global drought over the past 60 years. Nature,491(7424), 435–438.
https://doi.org/10.1038/nature11575
Shen, C. (2018). A transdisciplinary review of deep learning research and its relevance for water resources scientists. Water Resources
Research,54(11), 8558–8593.
Shortridge, J. E., Guikema, S. D., & Zaitchik, B. F. (2016). Machine learning methods for empirical streamflow simulation: A comparison of
model accuracy, interpretability, and uncertainty in seasonal watersheds. Hydrology and Earth System Sciences,20(7), 2611–2628.
Shukla, S., & Wood, A. W. (2008). Use of a standardized runoff index for characterizing hydrologic drought. Geophysical Research Letters,
35, L02405. https://doi.org/10.1029/2007GL032487
Smith, R. W. (1981). Rock type and minimum 7‐day/10‐year flow in Virginia streams. Virginia Water Resource Research Center, Virginia
Polytechnic Institute and State University, Blacksburg, Bulletin, vol. 116, 43 pp.
Stahl, K., Moore, R. D., Shea, J. M., Hutchinson, D., & Cannon, A. J. (2008). Coupled modelling of glacier and streamflow response to future
climate scenarios. Water Resources Research,44, W02422. https://doi.org/10.1029/2007WR005956
Stoelzle, M., Stahl, K., Morhard, A., & Weiler, M. (2014). Streamflow sensitivity to drought scenarios in catchments with different geology.
Geophysical Research Letters,41, 6174–6183. https://doi.org/10.1002/2014GL061344
Strachan, S., & Daly, C. (2017). Testing the daily PRISM air temperature model on semiarid mountain slopes. Journal of Geophysical
Research: Atmospheres,122, 5697–5715. https://doi.org/10.1002/2016JD025920
Strahler, A. N. (1952). Hypsometric (area‐altitude) analysis of erosional topography. Geological Society of America Bulletin,63(11),
1117–1142.
Strobl, C., Malley, J., & Tutz, G. (2009). An introduction to recursive partitioning: Rationale, application, and characteristics of classifica-
tion and regression trees, bagging, and random forests. Psychological methods,14(4), 323–348. https://doi.org/10.1037/a0016973
Tallaksen, L. M., Hisdal, H., & Van Lanen, H. A. (2009). Space‐time modelling of catchment scale drought characteristics. Journal of
Hydrology,375(3‐4), 363–372.
Van Lanen, H. A. J., Wanders, N., Tallaksen, L. M., & Van Loon, A. F. (2013). Hydrological drought across the world: Impact of climate and
physical catchment structure. Hydrology and Earth System Sciences,17, 1715–1732.
Van Loon, A. F., & Laaha, G. (2015). Hydrological drought severity explained by climate and catchment characteristics. Journal of
Hydrology,526,3–14.
Van Loon, A. F., Tijdeman, E., Wanders, N., Van Lanen, H. J., Teuling, A. J., & Uijlenhoet, R. (2014). How climate seasonality
modifies drought duration and deficit. Journal of Geophysical Research: Atmospheres,119, 4640–4656. https://doi.org/10.1002/
2013JD020383
Van Loon, A. F., & Van Lanen, H. A. J. (2012). A process‐based typology of hydrological drought. Hydrology and Earth System Sciences,
16(7), 1915–1946.
Veettil, A. V., Konapala, G., Mishra, A. K., & Li, H. Y. (2018). Sensitivity of drought resilience‐vulnerability‐exposure to hydrologic ratios in
contiguous United States. Journal of Hydrology,564, 294–306.
Vicente‐Serrano, S. M. (2006). Spatial and temporal analysis of droughts in the Iberian Peninsula (1910–2000). Hydrological Sciences
Journal,51(1), 83–97.
Vicente‐Serrano, S. M., López‐Moreno, J. I., Beguería, S., Lorenzo‐Lacruz, J., Azorin‐Molina, C., & Morán‐Tejeda, E. (2011). Accurate
computation of a streamflow drought index. Journal of Hydrologic Engineering,17(2), 318–332.
Wan, W., Zhao, J., Li, H. Y., Mishra, A., Ruby Leung, L., Hejazi, M., et al. (2017). Hydrological drought in the Anthropocene: Impacts of
local water extraction and reservoir regulation in the US. Journal of Geophysical Research: Atmospheres,122, 11,313–11,328. https://doi.
org/10.1002/2017JD026899
Wang, D., Hejazi, M., Cai, X., & Valocchi, A. J. (2011). Climate change impact on meteorological, agricultural, and hydrological drought in
central Illinois. Water Resources Research,47, W09527. https://doi.org/10.1029/2010WR009845
Xie, X. L., & Beni, G. (1991). A validity measure for fuzzy clustering. IEEE Transactions on Pattern Analysis & Machine Intelligence,8,
841–847.
Yadav, M., Wagener, T., & Gupta, H. (2007). Regionalization of constraints on expected watershed response behavior for improved pre-
dictions in ungauged basins. Advances in Water Resources,30(8), 1756–1774.
Yevjevich, V. (1967). An objective approach to definitions and investigations of continental hydrologic droughts. Hydrol. Papers 23,
Colorado State University Publication, Colorado State University, Fort Collins, Colorado, USA.
Yoo, J., Kwon, H. H., Kim, T. W., & Ahn, J. H. (2012). Drought frequency analysis using cluster analysis and bivariate probability
distribution. Journal of Hydrology,420, 102–111.
Zhang, C., & Ma, Y. (2012). Ensemble machine learning: Methods and applications. Springer Science & Business Media.
Zhang, Q., Xiao, M., Singh, V. P., & Li, J. (2012). Regionalization and spatial changing properties of droughts across the Pearl River basin,
China. Journal of Hydrology,472, 355–366.