
Quantifying Climate and Catchment Control on Hydrological Drought in the Continental United States

Water Resources Research

January 2020
56(1)

DOI:10.1029/2018WR024620

Authors:

Goutam Kumar

Goddard Space Flight Center-NASA

Ashok Mishra

Texas A&M University

The evolution of hydrological drought events is a result of complex (nonlinear) interactions between climate and catchment processes. To investigate these nonlinear relationships, we integrated a machine learning modeling framework based on the random forest (RF) algorithm with an interpretation framework to quantify the role of climate and catchment controls on hydrological drought. More specifically, our framework interprets a fitted RF model to identify dominant variables and to visualize their functional dependence and interaction effects on hydrological drought characteristics using the concepts of minimal depth, interactive depth, and partial dependence. We tested the proposed framework on a set of 652 continental United States catchments with minimal human interference for the period 1979–2010. Application of this framework indicated the presence of three distinct drought regimes: Regime 1, droughts of longer duration, lower frequency, and lesser intensity; Regime 2, droughts of moderate duration, moderate frequency, and moderate intensity; and Regime 3, droughts of shorter duration, higher frequency, and greater intensity. The RF algorithm was able to accurately model the drought characteristics (intensity, duration, and number of events) for all three drought regimes as a function of the selected variables. Both the type of dominant variables and their nonlinear functional relationships with hydrological drought characteristics can vary among the three regimes. Our interpretation framework indicated that catchment characteristics have a significant role in controlling hydrological drought for catchments in Regime 1, whereas both climate and catchment characteristics control hydrological drought in Regimes 2 and 3.

Spatial locations of catchments which were selected in this study based on minimal human interference and nonmissing data criterion.

An illustration of random forest algorithm used to model the influence of dominant variables on hydrologic droughts for CONUS.

A pictorial illustration of concept of maximal subtrees, minimal depth, and interactive. The maximal trees are indicated by red, and the depth is indicated by an integer located in the center of tree with the root node as zero. In the first tree (Figure 3a), the variable v splits the root node; therefore, the entire tree can be considered as a maximal subtree for the variable v, whereas, in the second tree (Figure 3b), the maximal subtree for variable v is not the entire tree as exhibited in the previous scenario. This is because the variable v does not split on the root node unlike the previous case. Figure 3c presents another scenario with two maximal subtrees for variable v. The maximal subtree on the left side has a depth of two, whereas on the right side it has a depth of one.

Characteristics of the fuzzy clustered drought regimes. (a) The variation of fuzzy silhouette index (FSI) and Xie and Beni (XB) index values according to the number of clusters with 3 being the optimal number of clusters. (b) The box plots of Z scores of considered drought characteristics in each cluster. (c) The number of catchments belonging to each cluster (d) provides the kernel density estimates of the probability distribution of the drought characteristics specific to each regime.

Spatial distribution of the three drought regimes located in CONUS.



Goutam Konapala (1,2) and Ashok Mishra (1)

(1) Glenn Department of Civil Engineering, Clemson University, Clemson, SC, USA, (2) Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA


1. Introduction

Hydrologic drought events, defined as periods with inadequate surface and subsurface water resources, result from multifaceted interactions between climate and catchment processes (Mishra & Singh, 2010; Van Lanen et al., 2013; Van Loon et al., 2014; Wang et al., 2011). Hydrologic drought therefore depends not only on a decrease in precipitation or an increase in temperature but also on the interaction of various climate and terrestrial components (e.g., soil characteristics, elevation, and stream order). An inadequate understanding of this complexity can be a major challenge for accurate prediction as well as efficient drought management (Cayan et al., 2010; Mishra & Singh, 2011; Narasimhan & Srinivasan, 2005; Sheffield et al., 2012). To address these complex hydrological drought processes, many studies have investigated the potential influence of terrestrial catchment characteristics on hydrological droughts using physically based models (Apurv et al., 2017; Tallaksen et al., 2009; Van Loon et al., 2014; Van Loon & Laaha, 2015; Van Loon & Van Lanen, 2012). However, the application of physically based models to catchments is often plagued by differences in spatial scale, over/underparameterization, and model structural error, including model calibration uncertainties.

A few studies have used a linear regression‐based framework (Saft et al., 2016, 2015; Van Loon et al., 2014; Van Loon & Laaha, 2015; Van Loon & Van Lanen, 2012) to understand the role of climate and terrestrial components in the development of hydrological drought. On the other hand, many studies have suggested that the response of streamflow to meteorological conditions is predominantly nonlinear (Konapala & Mishra, 2016; Latt et al., 2015; Stahl et al., 2008). We therefore expect hydrological drought characteristics derived from streamflow to have a nonlinear dependence on the complex interaction between climate and catchment processes within a watershed. In addition, the evolution of hydrological drought is often clustered across neighboring catchments because of the similarity in their climate and catchment characteristics (Rajsekhar et al., 2014; Zhang et al., 2012). Hence, it is important to gain a deeper understanding of the dominant linear and nonlinear controls that result in distinct drought regimes using robust nonparametric techniques. There is therefore great potential to further quantify the nonlinear association between climate (catchment) variables and the evolution of clustered drought events using nonparametric techniques, machine learning algorithms, and an interpretive framework.

RESEARCH ARTICLE

10.1029/2018WR024620

Special Section: Big Data & Machine Learning in Water Sciences: Recent Progress and Their Use in Advancing Science

Key Points:

• An integrated random forest algorithm interpretation framework was applied to investigate hydrological drought characteristics in CONUS

• This framework indicated the presence of three drought regimes which witness dominant climate and catchment controls

• The dominant climate and catchment controls exhibit varied functional relationships with hydrological droughts

Supporting Information:

• Supporting Information S1

• Table S1

Correspondence to: A. Mishra, ashokm@g.clemson.edu

Citation: Konapala, G., & Mishra, A. (2020). Quantifying climate and catchment control on hydrological drought in the continental United States. Water Resources Research, 56, e2018WR024620. https://doi.org/10.1029/2018WR024620

Received 18 DEC 2018

Accepted 27 NOV 2019

Accepted article online 11 DEC 2019

Machine learning algorithms are a class of nonparametric techniques that can capture subtle functional relationships between the input variables (e.g., precipitation, evaporation, and base flow) and the output variables (e.g., streamflow) of a hydrologic system (e.g., a watershed), even if the underlying mechanism producing the data is not known (Elshorbagy et al., 2010a, 2010b; Nourani et al., 2014; Raghavendra & Deka, 2014). In addition, these methods make no distributional or functional assumptions about the relation of the covariates to the response. Hence, the majority of hydrological studies have used machine learning algorithms for prediction purposes (Shen, 2018). However, the formulation of machine learning algorithms may not make it straightforward to quantify the underlying mechanisms responsible for model behavior in the case of hydrologic processes (Gupta & Nearing, 2014; Karpatne et al., 2017). Recognizing these issues, several recent studies have introduced interpretation frameworks (see Guidotti et al., 2018, for a review) to address such limitations. Work on interpreting these black‐box models has focused on understanding how a fixed machine learning model leads to particular predictions. These interpretation frameworks can provide a deeper understanding of the functioning of machine learning models such as artificial neural networks, random forests (RF), and support vector machines (Bastani et al., 2017; Bibal & Frénay, 2016; Doshi‐Velez & Kim, 2017). Although machine learning approaches are widely used in hydroclimatology (Veettil et al., 2018; Fahimi et al., 2017; Shortridge et al., 2016; Raghavendra & Deka, 2014), interpretation frameworks that quantify the causal relationship between inputs and modeled outputs are only emerging (Fienen et al., 2018; Koch et al., 2019; Schwalm et al., 2017) and, in particular, have not been applied to extreme events.

The above discussion suggests that limited research has been conducted on the dominant nonlinear influence of climate and catchment characteristics on the evolution of clustered hydrological drought regimes. We therefore followed a two‐step approach to improve our understanding of the heterogeneous nature of drought characteristics over the continental United States (CONUS): (i) a classification algorithm was applied to identify the optimal number of clusters associated with drought regimes, and (ii) an interpretative modeling framework was applied within individual drought regimes to identify the key climate and catchment characteristics that potentially influence hydrological drought characteristics. For this purpose, we selected 652 watersheds located in the CONUS because abundant hydrologic, physical, soil, and geomorphic information is available for catchments with the least human interference from the Geospatial Attributes of Gages for Evaluating Streamflow Version 2 (GAGES II) database (Falcone, 2011). Overall, we aim to address the following questions:

1. How are the hydrological drought regimes clustered in the CONUS? What are the key climate and catchment characteristics that control hydrological drought regimes?

2. What are the functional relationships and interactions among the dominant variables influencing hydrological drought characteristics, as identified and extracted with interpretive machine learning techniques (i.e., minimal depth, interactive depth, and partial dependence plots)?

The remainder of the manuscript is organized as follows: Section 2 provides an overview of the data and study area, section 3 presents the methods designed for this study, section 4 presents the results, section 5 discusses the findings and the outlook, and section 6 concludes the manuscript.

2. Data and Study Area Description

We selected catchments located in the CONUS because extensive, open‐source data on various catchment characteristics are available. In addition, to understand the dominant variables associated with different drought regimes, it is important to use data from catchments with minimal human interference. We therefore first identified such catchments from the GAGES II database (Falcone, 2011), which provides geospatial data for 9,322 stream gages maintained by the USGS. This data set provides users with a comprehensive set of geospatial characteristics for many gaged catchments with long flow records. It also provides information on catchments that are least disturbed by human influences. In this database, 2,057 catchments are identified as having minimal human interference based on three criteria: (1) a quantitative index of anthropogenic modification within the catchment based on variables derived from a geographic information system, (2) visual inspection of every stream gage and drainage basin from recent high‐resolution imagery and topographic maps, and (3) information about human influences from USGS Annual Water Data Reports (Falcone, 2011). We selected water years 1980–2011 as our study period to represent the U.S. climate normal period and to reflect current climate conditions. Overall, we identified 652 catchments with no missing data during 1980–2011. The spatial locations of the catchments with minimal human interference and continuous streamflow data are shown in Figure 1.

2.1. Overview of Selected Climate and Catchment Variables

A lack of precipitation and an increase in evapotranspiration (i.e., meteorological drought) cause low soil moisture content (i.e., agricultural drought), which further reduces surface and subsurface water resources (i.e., hydrological drought) (Mishra & Singh, 2010; Mukherjee et al., 2018). The propagation of meteorological drought to hydrological drought is influenced by the interaction between climate and catchment variables (Apurv et al., 2017; Haslinger et al., 2014; Mishra & Singh, 2010; Tallaksen et al., 2009; Van Loon et al., 2014). Hydrological drought is directly related to the streamflow generated in a watershed, and it is influenced (controlled) by the climate and catchment characteristics of that watershed. In our analysis, we selected 60 variables related to the climate, catchment, and morphological aspects of catchments documented in the GAGES II data set (Table S1 in the supporting information). Among them, 12 climate variables describe the annual magnitude and intra‐annual variability of precipitation, temperature, and potential evapotranspiration; these are obtained from the high‐resolution PRISM database (Daly et al., 2000). Fifteen hydrologic catchment variables related to stream order, base‐flow index, and overland flow are derived from the U.S. National Hydrography Data Set (NHD). Four land cover variables describing the percentage of different land cover types are derived from the 2006 land cover product of the National Land Cover Database. Twenty‐three soil characteristics are derived from the State Soil Geographic database for the CONUS. Finally, six topographic variables related to the elevation, slope, and aspect of the catchments are included in the analysis. A brief description and the data sources of the selected variables are provided in Table S1. The interplay among these catchment characteristics is assumed to shape catchment behavior by influencing how catchments store and transfer water. The variables selected from this database are considered to significantly affect hydrologic processes. Some of these catchment attributes have previously been used to predict mean streamflow (Rice et al., 2015), other streamflow signatures (Addor et al., 2018), and drought (Stoelzle et al., 2014). In addition to these previously used variables, we selected multiple attributes to cover a wide range of features, such as catchment climate, hydrology, land cover, soil, geology, topography, and river network.

3. Methodology

3.1. Hydrological Drought Characterization

Hydrological drought is often expressed as a time period with inadequate surface and subsurface water resources with respect to the normal condition of a given water resources management system (Mishra & Singh, 2010). We therefore applied the Standardized Streamflow Index (SSI) (Shukla & Wood, 2008; Vicente‐Serrano et al., 2011) to characterize hydrological drought at a monthly time scale for the selected watersheds across the USA. SSI can be computed for multiple time scales and is flexible enough to determine drought conditions at seasonal (3 to 6 months), annual (12 months), and longer (>12 months) time scales. In this study, however, we restrict our analysis to the seasonal scale because droughts usually take 3 or more months to develop. We calculated the 3‐month SSI by aggregating streamflow over 3 months and fitting these accumulated values to a parametric statistical distribution. The probabilities from the fitted distribution are then transformed to the standard normal distribution to create the hydrological drought index (Vicente‐Serrano et al., 2011; Modarres, 2008; Shukla & Wood, 2008). SSI thus expresses streamflow drought conditions relative to the long‐term monthly streamflow. Positive SSI values indicate a surplus relative to long‐term streamflow conditions, whereas negative values indicate a deficit (i.e., hydrologic drought). Hydrological drought indices similar to SSI have previously been applied to understand U.S. hydrological drought characteristics (Shukla & Wood, 2008; Veettil et al., 2018). Shukla and Wood (2008) reported that the two‐parameter gamma and lognormal distributions generally performed well for deriving hydrological drought in the USA. In this study, the lognormal distribution was selected for deriving the SSI‐based hydrological drought index from streamflow time series. The formulation of SSI is presented in Text S1 of the supporting information.
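The 3‐month SSI calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the formulation of Text S1: the function name `ssi_3month`, the `start_month` argument, and the choice to fit a separate lognormal distribution for each calendar month are assumptions made here. For a lognormal fit, transforming the fitted probability to the standard normal distribution reduces to standardizing the logarithm of the accumulated flow, which is what the code does.

```python
import math
from statistics import mean, stdev


def ssi_3month(monthly_flow, start_month=1):
    """Return a list of 3-month SSI values (None for the first two months).

    monthly_flow: positive monthly streamflow values in time order.
    start_month:  calendar month (1-12) of the first value (assumed input).
    """
    n = len(monthly_flow)
    # 3-month accumulated streamflow ending at month t (undefined for t < 2).
    acc = {t: sum(monthly_flow[t - 2:t + 1]) for t in range(2, n)}

    # Assumption: one lognormal fit per calendar month, estimated from the
    # mean and standard deviation of the log-accumulated flows.
    groups = {}
    for t, a in acc.items():
        groups.setdefault((start_month - 1 + t) % 12, []).append(math.log(a))
    params = {m: (mean(v), stdev(v)) for m, v in groups.items()}

    ssi = [None] * n
    for t, a in acc.items():
        mu, sigma = params[(start_month - 1 + t) % 12]
        # Lognormal CDF followed by the inverse standard normal CDF
        # simplifies to the z-score of log(a).
        ssi[t] = (math.log(a) - mu) / sigma
    return ssi
```

Negative values in the returned list mark months with a streamflow deficit relative to the fitted long‐term conditions; each calendar month needs at least two years of data with differing flows for the fit to be defined.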

3.2. Classification of Hydrological Drought Regimes

The SSI time series were constructed for all 652 catchments to investigate hydrological droughts. In this study, we identified a hydrological drought when SSI < −0.5 for a period of more than 3 months. Extracting drought events based on these two conditions lets us differentiate hydrological droughts from seasonal streamflow fluctuations (Konapala & Mishra, 2017; Mishra & Singh, 2010, 2011). Once all the drought events meeting these conditions were extracted, we computed the average drought duration, severity, and number of events based on the well‐established theory of runs (Yevjevich, 1967). The number of drought events (ND) is the total number of times a drought occurred based on the above‐defined thresholds. The average duration of drought per event (DD) is determined by dividing the total number of drought months by the number of drought events, as shown in equation (1):

$$DD = \frac{\sum_{i=1}^{ND} D_i}{ND} \tag{1}$$

where $D_i$ is the duration of a single drought event. Similarly, the average drought intensity is calculated by first estimating the intensity ($S$) of each drought event as

$$S = \frac{\sum_{D>3,\ \mathrm{SSI}<0} \mathrm{SSI}}{D} \tag{2}$$

Its average over the period is calculated based on equation (3),

$$DI = \frac{\sum_{i=1}^{ND} S_i}{ND} \tag{3}$$
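The run‐based extraction of drought events and equations (1) to (3) can be illustrated with the following minimal sketch. The function names and the tuple returned are hypothetical. One assumption is made explicit: the event intensity is taken as the mean SSI over the months of the event, all of which lie below the −0.5 threshold.

```python
def drought_events(ssi, threshold=-0.5, min_months=3):
    """Split an SSI series into drought events.

    An event is a run of consecutive months with SSI < threshold that
    lasts more than min_months months. None entries end a run.
    """
    events, run = [], []
    for value in list(ssi) + [None]:  # the sentinel closes a trailing run
        if value is not None and value < threshold:
            run.append(value)
        else:
            if len(run) > min_months:
                events.append(run)
            run = []
    return events


def drought_characteristics(ssi, threshold=-0.5, min_months=3):
    """Return (ND, DD, DI): number of events, mean duration, mean intensity."""
    events = drought_events(ssi, threshold, min_months)
    nd = len(events)
    if nd == 0:
        return 0, 0.0, 0.0
    dd = sum(len(e) for e in events) / nd            # equation (1)
    intensities = [sum(e) / len(e) for e in events]  # equation (2), per event
    di = sum(intensities) / nd                       # equation (3)
    return nd, dd, di
```

Applied to each catchment's SSI series, the three returned values would form the trivariate input to the clustering step.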

By applying multivariate clustering, we can distinguish the hydrological drought regimes exhibited by catchments based on the long‐term (~30 year) statistics of drought characteristics (Rajsekhar et al., 2014; Gocic & Trajkovic, 2014; Yoo et al., 2012). The clustering approaches typically used in hydrological studies are hierarchical clustering, k‐means/medoids clustering, and fuzzy partition clustering (Carrillo et al., 2011; Ley et al., 2011; Olden et al., 2012; Sawicz et al., 2011; Yadav et al., 2007). Since we aim to investigate the dominant controls exerted by catchment characteristics within each drought regime, we applied a fuzzy partitioning algorithm that accounts for uncertainty in the classification process.

Figure 1. Spatial locations of the catchments selected in this study based on the minimal human interference and nonmissing data criteria.

We identified drought regimes based on the three drought characteristics (i.e., intensity, duration, and number of events) using a fuzzy medoid clustering algorithm. Fuzzy clustering is more general than hard clustering because it describes each point by its membership values in all the clusters. The method chosen for this study is the fuzzy k‐medoids clustering algorithm introduced by Krishnapuram et al. (2001), which is usually more robust to outliers than clustering algorithms that use mean values for classification. Data objects closer to the medoid of a cluster, as determined by Euclidean distance, are likely to have higher degrees of membership than objects scattered around the limits of the cluster. Like other clustering algorithms, fuzzy k‐medoids follows a heuristic approach to minimize the within‐cluster variance. The formulation of this approach is presented in Text S2 of the supporting information.

The Xie and Beni (XB) index (Xie & Beni, 1991) is a widely used criterion for quantifying the quality of fuzzy clustering. To complement the XB index, we used the fuzzy extension of the silhouette index (Campello & Hruschka, 2006), the fuzzy silhouette (FS) index, as a second criterion for evaluating the optimal number of clusters in this study. The two indices complement each other: the FS index captures the similarity of an object to its own cluster (cohesion) compared to other clusters (separation), and the XB index captures the compactness of the clusters. The higher the FS index, the better the resulting clustering. The FS index is given by equation (4):

$$FS(k) = \frac{\sum_{i=1}^{n} \left(u_{ig} - u_{ig'}\right) s_i(k)}{\sum_{i=1}^{n} \left(u_{ig} - u_{ig'}\right)} \tag{4}$$

where $s_i(k)$ is the silhouette index for object $i$. The XB index, in contrast, measures the compactness of the clusters and is formulated specifically for evaluating fuzzy clustering performance. It is given by equation (5) as

$$XB(k) = \frac{\sum_{i=1}^{n} \sum_{g=1}^{k} u_{ig}^{m}\, d^2(x_i, m_g)}{n \times \min_{g, g'\,(g \neq g')} d^2(m_g, m_{g'})} \tag{5}$$

where

$X = \{x_{ij}\}$: drought characteristics of order $(n \times t)$

$U = \{u_{ig}\}$: membership degree of order $(k \times t)$

$d^2_M(x_i, m_g) = \lVert x_i - m_g \rVert$: Euclidean distance

$\{m_g,\ g = 1, \ldots, k\} \subset \{x_i,\ i = 1, \ldots, n\}$: medoids of the drought characteristics

The smaller the XB index, the more compact the clusters. Each catchment is assigned to each cluster with a certain membership probability, and the cluster with the highest probability is taken as the catchment's primary cluster for subsequent analysis. The resulting clusters, based on the trivariate drought characteristics (intensity, duration, and number of events), are natural partitions identified by the clustering algorithm. The drought characteristics in each cluster indicate a distinct drought regime that can provide valuable information on the climate and catchment controls on hydrological droughts.
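To make the cluster‐validity criterion concrete, the XB index of equation (5) can be computed as in the sketch below. The function names and the list‐based data layout are hypothetical, and $d^2$ is assumed here to be the squared Euclidean distance, as in the usual form of the Xie and Beni index; the fuzzifier `m` defaults to 2 purely for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def xie_beni(points, medoids, memberships, m=2.0):
    """Xie-Beni index for a fuzzy partition (smaller means more compact).

    points:      n vectors of drought characteristics
    medoids:     k vectors, each one of the points
    memberships: n rows of k membership degrees u_ig
    m:           fuzzifier exponent (assumed value)
    """
    n, k = len(points), len(medoids)
    # Numerator: membership-weighted squared distances to the medoids.
    compactness = sum(
        memberships[i][g] ** m * sq_dist(points[i], medoids[g])
        for i in range(n)
        for g in range(k)
    )
    # Denominator: n times the smallest squared distance between medoids.
    separation = min(
        sq_dist(medoids[g], medoids[h])
        for g in range(k)
        for h in range(g + 1, k)
    )
    return compactness / (n * separation)
```

Evaluating this function for partitions with different numbers of clusters and choosing the number with the smallest value mirrors the selection rule stated above.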

3.3. RF Model

In this study, we use the RF algorithm (Breiman, 2001) to investigate the dominant catchment and climate variables that play an important role in the evolution of clustered drought characteristics. It is important to acknowledge that the selection of an algorithm depends on the objectives and the types of data to be analyzed (Caruana & Niculescu‐Mizil, 2006; Huang et al., 2015; Kotsiantis et al., 2007). RF algorithms differ from linear regression methods: the RF model used here is nonlinear, and it has the major advantage of being (mostly) unaffected by multicollinearity (Ishwaran et al., 2010; Zhang & Ma, 2012; Díaz‐Uriarte & De Andres, 2006). The multicollinearity problem is alleviated because a random subset of features is chosen for each tree in an RF (Hsilch et al., 2014; Ishwaran et al., 2010; Zhang & Ma, 2012; Díaz‐Uriarte & De Andres, 2006). The ability of the RF algorithm to deal with overfitting also makes it suitable for our application.

The RF algorithm draws a set of bootstrap samples (Efron & Tibshirani, 1994) and grows an independent tree model on each bootstrapped sample of the population. Each tree is grown by recursively partitioning the population with the objective of minimizing the mean square error. At each split, a subset of candidate variables is tested for split optimization, and each node is divided into two successor nodes. Each successor node is then split again until the process reaches a stopping criterion of either maximum node purity or minimum node member size, which defines the set of terminal (unsplit) nodes of the tree. The RF algorithm then assigns each training set observation to one unique terminal node per tree. The RF estimate for each observation is calculated by averaging the terminal node results across the collection of trees. A basic pseudo‐algorithm explaining the RF procedure is presented in Table 1 and Figure 2. The resampling and averaging procedure circumvents the problems of overfitting and multicollinearity, making this approach suitable for our study (Cutler et al., 2007; Díaz‐Uriarte & De Andres, 2006; Prasad et al., 2006; Zhang & Ma, 2012). The RF algorithm can be tuned to reduce the prediction error (Boulesteix et al., 2012; Breiman, 2001; Strobl et al., 2009). Its accuracy depends mainly on three parameters: (1) the number of trees (ntrees) to grow in the forest, (2) the number of randomly selected predictor variables (mtry) tested at each node, and (3) the minimal number of observations at the terminal nodes (nodesize) of the trees. We set the number of trees (ntrees) to 1,000, as suggested by Hengl et al. (2018) and Probst and Boulesteix (2017), and we randomly sampled different combinations of "mtry," ranging from one to the total number of variables considered (60), and "nodesize," ranging from one to the total number of catchments in each regime. The combination of "mtry" and "nodesize" with the least out‐of‐bag error is taken as the optimal parameter set.
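The random sampling of "mtry" and "nodesize" combinations can be sketched as follows. This is a schematic outline only: `oob_error` is a hypothetical callable, supplied by the user, that would fit a 1,000‐tree forest with the given parameters and return its out‐of‐bag error; the number of draws and the seed are likewise assumed values and are not taken from the study.

```python
import random


def random_search(oob_error, n_variables, n_catchments, n_draws=50, seed=0):
    """Randomly sample (mtry, nodesize) pairs and keep the lowest OOB error.

    oob_error:    hypothetical callable, oob_error(mtry=..., nodesize=...),
                  returning the out-of-bag error of a fitted forest.
    n_variables:  upper bound for mtry (60 in the text).
    n_catchments: upper bound for nodesize (catchments in the regime).
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_draws):
        mtry = rng.randint(1, n_variables)       # inclusive bounds
        nodesize = rng.randint(1, n_catchments)  # inclusive bounds
        err = oob_error(mtry=mtry, nodesize=nodesize)
        if best is None or err < best[0]:
            best = (err, mtry, nodesize)
    return {"oob_error": best[0], "mtry": best[1], "nodesize": best[2]}
```

Because the out‐of‐bag error is computed from observations left out of each bootstrap sample, this search needs no separate validation set.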

3.4. Framework for Interpreting RF Algorithm

We interpreted the RF model by examining three features: variable importance, variable interaction, and partial dependence. Variable importance and interaction are based on the concepts of maximal subtrees and minimal depth (Ishwaran et al., 2010), whereas partial dependence is estimated by integrating out the effects of all the variables other than the covariate of interest (Breiman, 2001). Minimal depth allows us to identify the dominant variables, whereas partial dependence quantifies the approximate relationship between each dominant variable and the drought characteristic. Interactive depth allows us to understand the interactions among the dominant climate, catchment, and morphological controls on a particular drought characteristic.

3.4.1. Minimal Depth

The concept of minimal depth (Diller et al., 2012; Hsisch et al., 2011; Ishwaran et al., 2010) is useful for assessing variable importance and variable interactions within an RF modeling framework. The minimal depth of a variable in an RF can be formulated precisely in terms of maximal subtrees. A maximal subtree for a variable v is the largest subtree whose root node is split on v. The shortest distance from the root of the tree to the root of the closest maximal subtree of v is the minimal depth of v. The minimal depth for any variable v can be expressed as

$$MD(v) = \frac{\sum_{i=1}^{N_{RF}} \lVert T(v) \rVert}{N_{RF}} \tag{6}$$

where $N_{RF}$ is the total number of trees (i.e., ntrees = 1,000) and ‖T(v)‖ represents the distance of variable v from the root of any tree T. To illustrate the concepts of maximal subtree and minimal depth of a variable v, we show three separate randomized trees (Figure 3) that mimic the behavior of RFs. The depth of variable v is identified for all the maximal subtrees and averaged across all the randomized trees to calculate the minimal depth MD(v). A smaller MD(v) value indicates that the corresponding variable v is more influential. Variables whose averaged minimal depth exceeds the average minimal depth threshold are treated as noisy and are therefore removed from the final model.
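The minimal depth of equation (6) can be illustrated on toy trees with the sketch below. The nested‐dictionary tree layout and the helper names are hypothetical and are not tied to any RF library. Trees in which v never splits a node are skipped in the average, which is an assumption of this sketch rather than something stated in the text.

```python
from collections import deque


def node(var, left=None, right=None):
    """Internal tree node that splits on `var`; None marks a terminal node."""
    return {"var": var, "left": left, "right": right}


def minimal_depth(tree, v):
    """Depth of the split on v closest to the root (root = 0), else None."""
    queue = deque([(tree, 0)])
    while queue:  # breadth-first search finds the shallowest split first
        current, depth = queue.popleft()
        if current is None:
            continue
        if current["var"] == v:
            return depth
        queue.append((current["left"], depth + 1))
        queue.append((current["right"], depth + 1))
    return None


def forest_minimal_depth(trees, v):
    """Average minimal depth of v over the trees that split on v."""
    depths = [d for d in (minimal_depth(t, v) for t in trees) if d is not None]
    return sum(depths) / len(depths) if depths else None
```

For example, a variable that splits the root of one tree (depth 0), first appears at depth 2 in a second tree, and has its shallowest split at depth 1 in a third tree has an average minimal depth of 1.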

Figure 2. An illustration of the random forest algorithm used to model the influence of dominant variables on hydrologic droughts for CONUS.

Figure 3. A pictorial illustration of the concepts of maximal subtrees, minimal depth, and interactive depth. The maximal subtrees are indicated in red, and the depth is indicated by an integer located in the center of the tree, with the root node as zero. In the first tree (Figure 3a), the variable v splits the root node; therefore, the entire tree can be considered a maximal subtree for the variable v. In the second tree (Figure 3b), the maximal subtree for variable v is not the entire tree, because v does not split the root node. Figure 3c presents another scenario with two maximal subtrees for variable v. The maximal subtree on the left side has a depth of two, whereas the one on the right side has a depth of one.

3.4.2. Interactive Depth

The dominant controls identified by minimal depth capture the effect of each independent variable on the drought characteristics; however, minimal depth ignores interaction effects with other variables. For instance, drought propagation might be influenced by two or more interacting variables in a specific regime. Therefore, an interactive minimal depth metric that measures the interaction between any two variables v and w is needed (Diller et al., 2012; Hsisch et al., 2011; Ishwaran et al., 2010). For this purpose, we first define the variable interactive distance ‖MT(v,w)‖, which represents the distance between variables v and w from the root of any maximal tree (MT). Since the maximal tree depths vary significantly across the randomized trees (Figure 3), a standardization procedure needs to be applied. The interactive depth can be formulated as

$$ID(v,w) = \frac{1}{N_{RF}} \sum_{i=1}^{N_{RF}} \left(1 - \frac{\lVert MT(v,w) \rVert}{MTD(v)}\right) \qquad (7)$$

where $N_{RF}$ is the total number of trees (i.e., ntrees = 1,000), ‖MT(v,w)‖ represents the distance between variables v and w from the root of any maximal tree MT, and MTD(v) is the depth of the maximal subtree MT(v). Based on this formulation, ID(v,w) ranges from 0 to 1, and interactive depth (ID) values closer to zero indicate higher interaction between the two variables considered. Figure 3c illustrates such an interaction between variables v and w, where w splits further inside the right maximal subtree of variable v. If this pattern is observed across all the randomized trees, then there is a significant interaction between variables v and w, and they collectively influence the prediction outcome.
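As a purely arithmetic sketch of Equation 7, the per-tree standardization and averaging can be written as follows. The input format (one pair of interactive distance and maximal subtree depth per tree), the function name, and the toy numbers are assumptions for illustration; how the distance ‖MT(v,w)‖ is extracted from a fitted forest is not shown here.

```python
def interactive_depth(per_tree):
    """Average of 1 - distance / depth over trees, following Equation 7 as written.

    per_tree: list of (distance_vw, mtd_v) pairs, one per tree, where
    distance_vw stands for ||MT(v, w)|| and mtd_v for MTD(v) (hypothetical inputs).
    Assumes 0 <= distance_vw <= mtd_v and mtd_v > 0, so each term lies in [0, 1].
    """
    terms = [1.0 - distance / depth for distance, depth in per_tree]
    return sum(terms) / len(terms)

# Toy values for three trees (not taken from the paper).
toy_pairs = [(1, 4), (2, 4), (3, 3)]
id_vw = interactive_depth(toy_pairs)
```

Dividing by MTD(v) inside each term is the standardization step: it puts trees of different depth on a common 0 to 1 scale before averaging.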

3.4.3. Partial Dependence

The concept of partial dependence can quantify the functional relationship between dominant variables and drought characteristics. Partial dependence is assessed by integrating out the effects of all the variables besides the covariate of interest (Breiman, 2001; Friedman & Meulman, 2003). The partial dependence of a variable $x_k$ can be estimated by averaging over the input variables $\{X_i, i = 1, \ldots, n\}$ with $x_k$ held fixed:

$$f_k(x_k) = \frac{1}{n} \sum_{i=1}^{n} \hat{f}\left(x_{i,C_k}, x_k\right) \qquad (8)$$

where $\hat{f}$ represents the outputs based on the RF models. This partial dependence estimate can be visualized to understand the functional relationship between the variables ($x_k$) and their potential influence on hydrological droughts. As the RF algorithm randomly resamples the variables for bagging the trees, we run each model 1,000 times and then average the minimal and interactive depth values to interpret and identify the dominant variables.
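The averaging in Equation 8 can be sketched in a few lines. This is an illustrative sketch only: `toy_model` is a stand-in for a fitted RF prediction function, and the function names, data, and grid are hypothetical. Here the remaining columns of each row play the role of $x_{i,C_k}$, which is our reading of the notation.

```python
def partial_dependence(predict, rows, k, grid):
    """For each grid value, fix column k to that value in every row and
    average the model predictions over all rows (Equation 8)."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)   # copy so the original data stay untouched
            modified[k] = value    # hold the variable of interest fixed
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

def toy_model(row):
    """Hypothetical stand-in for a fitted model's prediction function."""
    return 2.0 * row[0] + row[1] ** 2

rows = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
pd_curve = partial_dependence(toy_model, rows, k=0, grid=[0.0, 1.0])
```

Plotting `pd_curve` against the grid gives the kind of partial dependence curve used to read off a functional form such as a power law or a parabola.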

4. Results and Discussions

4.1. Classification of Drought Regimes

The fuzzy k-medoids clustering approach was applied to the 652 catchments to classify the drought regimes based on drought intensity (DI), drought duration (DD), and number of drought events (ND). First, we identified the optimal number of regimes based on the fuzzy silhouette (FS) index and the Xie and Beni (XB) index. Figure 4a shows the behavior of the FS and XB indices with respect to the number of regimes. The optimal number of clusters appears to be three, where FS reaches its maximum and XB its minimum. Therefore, we consider the optimal number of clusters as three for further analysis.

The drought characteristics (i.e., DI, DD, and ND) for the three selected drought regimes are shown in Figures 4b and 4d. Since the units of DI, DD, and number of droughts (ND) are different, we applied the concept of Z score to standardize and compare the drought characteristics across the three regimes. The Z score measures how many standard deviations a sample data point lies from the population average. The boxplots of Z scores are shown in Figure 4b, so that the drought characteristics can be compared among the identified regimes. The number of catchments representing each regime is shown in Figure 4c. The absolute values of the drought characteristics for each drought regime are plotted as probability distributions in Figure 4d. Regime 1 is represented by 142 catchments with longer droughts (median DDzs~1), lower drought intensities (median DIzs~−0.75), and fewer occurrences (median NDzs~−1). The magnitude of DD


for regime 1 varies between 7 and 20 months, whereas the magnitudes of DI and ND vary from 0.4 to 0.8 and from 5 to 20, respectively. Regime 2 is represented by 242 catchments that exhibit relatively moderate drought characteristics, with median Z scores close to 0 (Figure 4b). For the catchments located in regime 2, DD varies between 7 and 12 months, DI lies within the range of 0.5 to 0.9, and ND lies within the range of 15 to 25. The largest number of catchments (268) is located in regime 3, which represents short droughts (median DDzs~−0.8) that occur frequently (median NDzs~0.75) with higher intensity (median DIzs~1). The catchments located in regime 3 witness droughts with duration between 5 and 8 months, intensity between 0.7 and 1.2, and frequency between 20 and 30 events (Figure 4d).
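The Z score standardization used to compare DI, DD, and ND on a common scale can be sketched as follows. The drought durations below are made-up values, and the use of the population standard deviation (rather than the sample one) is an assumption.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [(x - mu) / sigma for x in values]

# Hypothetical drought durations (months) for five catchments.
dd_months = [6.0, 8.0, 10.0, 12.0, 14.0]
dd_z = z_scores(dd_months)
```

After this transformation, a median Z score near 1 for a regime means its typical catchment lies about one standard deviation above the all-catchment mean for that characteristic.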

The spatial locations of the catchments for the three drought regimes are shown in Figure 5. The catchments located in the Pacific Northwest, parts of the northeastern USA, and the central USA represent regime 1, with droughts that last longer but are less intense and occur less frequently. The catchments representing regime 2, with moderate drought characteristics, are found in different parts of CONUS, and the catchments representing regime 3 are mostly located in the north central and eastern USA, including watersheds in the Pacific Northwest region. Overall, spatial proximity between the catchments plays a considerable role in the clustering of regimes, which is probably due to similar climatological variability and catchment response characteristics (Brutsaert & Nieber, 1977; Knapp et al., 2002; Serrano, 2006).

4.2. RF Model Performance

The effect of multicollinearity in data analysis can make it difficult to obtain appropriate linear coefficient estimates with small standard errors (Achen, 1982). Our analysis is different due to the application of the RF

Figure 4. Characteristics of the fuzzy clustered drought regimes. (a) The variation of fuzzy silhouette index (FSI) and Xie and Beni (XB) index values according to the number of clusters, with three being the optimal number of clusters. (b) The box plots of Z scores of the considered drought characteristics in each cluster. (c) The number of catchments belonging to each cluster. (d) The kernel density estimates of the probability distribution of the drought characteristics specific to each regime.


algorithm, which is nonlinear in nature, and we do not rely on any regression coefficients in our analysis. Therefore, even though there is a linear correlation between the predictors, it does not interfere with our analysis. In addition, the multicollinearity problem is alleviated because a random subset of features is chosen for each tree in an RF (Hsilch et al., 2014; Ishwaran et al., 2010; Zhang & Ma, 2012; Díaz‐Uriarte & De Andres, 2006).

As highlighted before, the primary purpose of our study is to identify the key climate, catchment, and morphological variables using a machine learning interpretation framework. Although our machine learning application is not focused on prediction, we performed a preliminary analysis to evaluate the performance of the optimized RF model by splitting the data into training (75%) and testing (25%) sets. The model performed well based on the root‐mean‐square error, and the corresponding plots are presented in the supplementary text.
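A 75%/25% split and a root-mean-square error check of the kind described above could look like the sketch below. The random shuffling, the seed, and the function names are assumptions; the source does not state how the catchments were assigned to the two sets.

```python
import math
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split off `test_fraction` of them for testing."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def rmse(observed, predicted):
    """Root-mean-square error between paired observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

catchment_ids = list(range(652))       # placeholder identifiers for 652 catchments
train_ids, test_ids = train_test_split(catchment_ids)
```

With 652 catchments this yields 489 training and 163 testing catchments; the model would be fitted on the former and `rmse` evaluated on the latter.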

We evaluated the performance of the RF algorithm in modeling the variations in drought characteristics (DI, DD, and ND) in each regime with respect to the selected catchment and climate variables by applying it to the entire data set. As highlighted in section 3, the optimal parameters of the RFs (i.e., mtry and nodesize), which were derived based on the least out‐of‐bag error, are listed in Table 2. In addition, the coefficient of determination (R²), percentage bias (PBIAS), and Nash‐Sutcliffe efficiency (NSE) for the corresponding optimal model configurations are listed in Table 2. R² is greater than 0.9 for every RF model, which indicates that the adopted RF algorithm can explain more than 90% of the variance found in the drought characteristics. The PBIAS values, which are expressed in percent, remain close to 0, indicating comparatively little bias among all the RF models. Finally, the NSE values are in the range of 0.77 to 0.85. NSE values closer to 1 correspond to a closer match between the modeled and observed data points, with 1 being a perfect match. Also, NSE values greater than 0 indicate an unbiased model. Hence, the NSE values also point toward an unbiased and efficient model. In summary, all the models have a high coefficient of determination (R² > 0.9), low PBIAS values, and NSE values close to 1.
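The three goodness-of-fit metrics can be computed as in the sketch below. The formulas are commonly used definitions and are assumptions here, since the source does not spell them out: R² is taken as the squared Pearson correlation, and PBIAS uses the observed-minus-simulated sign convention (other conventions flip the sign). The data are made up.

```python
from statistics import mean

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mu = mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mu) ** 2 for o in obs)
    return 1.0 - residual / spread

def pbias(obs, sim):
    """Percentage bias (assumed convention: observed minus simulated)."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)

def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation (assumption)."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)

# Hypothetical observed and simulated drought durations (months).
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]
scores = (r_squared(observed, simulated), pbias(observed, simulated), nse(observed, simulated))
```

Under these definitions R² can exceed NSE for the same model, because the squared correlation ignores systematic offsets that NSE penalizes.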

Figure 5. Spatial distribution of the three drought regimes located in CONUS.


4.3. Application of Interpretation Framework to Understand Drought Characteristics

4.3.1. Application to Drought Duration

Figure 6 shows the ranking of climate and catchment variables that have potential influence on the hydrological DD for each drought regime. As discussed earlier, variables with the least minimal depth are likely to exert the most dominant control, whereas variables with greater depth have lower influence on drought duration within each regime. The dashed line (Figure 6) indicates the average minimal depth of all the variables, which can be used as a threshold to determine the significant variables of interest (Ishwaran et al., 2011). Based on this threshold, the significant influencing variables are highlighted in green color, and the noninfluential variables are highlighted in orange color (Figure 6).

Overall, 20 variables have more than average minimal depth for regime 1, which represents catchments with

higher drought duration (median DDzs~1). In case of regime 2, which represents catchments with average

Figure 6. Rank plots are provided in ascending order, with variables exhibiting the least minimal depth on the top, across clusters (a) 1, (b) 2, and (c) 3 in case of drought duration (DD). The dotted line represents the average minimal depth. The variables having below average minimal depths are color coded as green, and variables with above average minimal depths are color coded as red.


drought duration, 14 variables have more than average minimal depth. Finally, in case of regime 3, which consists of catchments with lower drought duration (median DDzs~−0.8), a total of 11 variables have more than average minimal depth. The number of climate and catchment variables with potential influence on drought characteristics thus varies among the three drought regimes. For instance, catchment variables mostly dominate in controlling the drought duration for catchments that witness low drought durations, whereas soil and climate variables dominate for catchments witnessing high and medium drought durations, respectively. In the case of catchments with high drought durations (regime 1), base flow index (BFI_AVE) has a significantly smaller minimal depth than the other variables, suggesting its dominant role in that regime. Base flow index is a key variable because it captures the interaction between the climate and catchment variables that generate streamflow in a given watershed.

To further understand how these dominant variables interact with each other to potentially influence the drought duration, the normalized interactive minimal depth was plotted between the top five variables (Figure 7). As highlighted before, normalized interactive minimal depth varies from 0 to 1, where 0 indicates high interaction and 1 indicates no interaction between the selected variables. In case of regimes 1 and 2, the interactive minimal depth between the variables is closer to 1, indicating that there is less interaction between the dominant variables. However, base flow index (BFI_AVE) seems to interact with the other variables, especially with the mean relief ratio (RR_MEAN) and aspect with respect to geographical north (ASPECT_NORTH), in regime 1. In case of regime 3, the maximum number of days in a month with nonzero precipitation (WDMAX_BASIN) interacts with other variables, particularly with the length of streams per square kilometer (STREAMS_KM_SQ_KM) within the catchments. Overall, these results suggest no significant interaction between the dominant variables, although they have direct influence on the drought duration.

We further assessed the partial dependence of drought duration on the top five dominant variables (Figure 8). In case of regime 1 (Figure 8a), base flow index (BFI_AVE) controls the drought duration following a power law behavior. The relation between base flow index and drought is often complicated. A higher base flow index can result in low duration drought events, and as the magnitude of base flow index increases, it follows a power law function with the drought duration. In addition, the power law behavior extends over the entire range of drought duration, which suggests a greater control of base flow on higher drought durations. Mean elevation and percentage of soils with low infiltration rate (HGC) exhibit nonlinear relationships; however, unlike base flow index, they explain the variability of drought duration only partially, over the range from 12 to 13 months. In case of regime 2 (Figure 8b), base flow index predominantly controls the drought duration based on a nonlinear relationship. However, it is interesting to see that the underlying functional relationship does not obey a power law, as it does in regime 1. Other variables, such as basin compactness (BASIN_COMPACTNESS), percentage of soils with low infiltration rate (HGC), aspect with respect to geographical north (ASPECT_NORTHNESS), and temperature variability (TMEAN_SD), also exhibit a nonlinear and inversely proportional functional dependence on drought duration. In case of regime 3, the maximum number of days in a month with nonzero precipitation (WDMAX_BASIN) plays a key role compared to the base flow index. A left truncated parabolic relationship can be observed, which indicates a nonlinear control of precipitation intensity on drought duration.

4.3.2. Application to Number of Drought Events

Figure 9 shows the ranking of catchment and climate variables that have potential influence on ND events within each regime. Overall, 23 variables have more than average minimal depth for regime 1, which represents catchments with fewer drought occurrences (median NDzs~−1). A total of 13 variables have more than average minimal depth for regime 2, which represents catchments with a moderate number of drought events, whereas 14 variables have more than average minimal depth for regime 3. Although the dominant variables differ from one regime to another, within each regime similar variables dominate both drought duration and drought event occurrences.

The interaction between the five most dominant variables within each regime based on the number of drought events is illustrated in Figure 10. Similar to the case of drought duration, the lowest interactive depth was observed for base flow index (BFI_AVE), which has some interaction with the mean relief ratio (RR_MEAN) in case of regime 1. However, no such significant interactions are observed in case of regime 2, given the relatively high ID values.


Figure 11a illustrates the partial dependence between variables specific to regime 1. It can be observed that the variables that have potential influence on drought duration also influence drought event occurrences. BFI_AVE is inversely proportional to drought occurrences, following an exponential relationship. An increase in base flow is likely to increase the groundwater contribution to streamflow, resulting in fewer droughts. Elevation exhibits an inverse relationship up to 2,000 m and then a directly proportional relationship up to 3,000 m, whereas HGC exhibits a semiparabolic relationship; the other variables do not explain much of the variability of drought event occurrences.

Overall, it was observed that the RF modeling framework is flexible enough to accommodate different functional relationships between the dominant variables and the number of drought events. In case of regime 2, variability of temperature (TMEAN_SD) exhibits more dominant behavior in controlling the drought event occurrences, whereas BFI_AVE shares an inversely proportional relationship for the same regime. The other three selected variables exhibit dominant and different functional relationships, as shown in Figure 11b. Bulk density of the soil (BD_AVE) is a key variable that has potential influence on the drought event occurrences in regime 3 (Figure 11c). However, it does not explain the variability of the entire range of drought event occurrences as in the case of the other two regimes. WDMAX_BASIN was able to explain the variability of drought

Figure 7. Interactive depth of the top five dominant variables controlling drought duration (DD) for clusters (a) 1, (b) 2, and (c) 3. (Note: In each figure, the x axis represents the same variables that are provided as headings in each figure facet. Reference variables are marked with a blue cross in each panel. Higher values indicate lower interactivity with the reference variable.)


occurrences on the higher end which was previously ignored by the BD_AVE. This highlights the

complementary behavior of the climate and catchment characteristics for controlling the drought

event occurrences.

4.3.3. Application to Drought Intensity

Climate and catchment variables are ranked based on their potential influence on DI (Figure 12). A total of 24 variables have more than average minimal depth for regime 1, which represents catchments with lower drought intensity (median DIzs~−1). In comparison to regime 1, fewer influencing variables were observed for regimes 2 and 3. A total of 10 and 11 variables have more than average minimal depth for regimes 2 and 3, which represent catchments with moderate and higher drought intensity, respectively. The types of variables that dominate drought intensity are mostly similar to those for drought duration and number of drought events. Overall, it was observed that the majority of the variables that have potential influence on drought intensity are related to soil, climate, and catchment characteristics in regimes 1–3, respectively.

The interaction between the top five dominant variables within each regime in case of drought intensity is illustrated in Figure 13. In case of regime 1, none of the dominant variables show any significant interactions, as the ID values are closer to 1. However, in case of regime 2, temperature variability (TMEAN_SD) exhibits potential interaction with other dominant variables, as the ID value is around 0.7. In case of regime 3,

Figure 8. Partial dependence plots of the top five dominant variables controlling drought duration (DD) for clusters (a) 1, (b) 2, and (c) 3.


the average clay content (CLAYAVE) in the basin exhibits significant interactive effects. In particular, the percentage of soil with high infiltration (HGA) exhibits significant interaction with CLAYAVE. As in the case of the other drought properties, the interaction effects are significant but to a lesser extent, as indicated by the relatively high ID values in case of drought intensity.

Figure 14a illustrates the partial dependence specific to regime 1. The percentage of soils with high infiltration capacity (HGA) has a dominant role, and it shares a directly proportional relationship with drought intensity. The presence of soils with high infiltration is likely to create a competition between groundwater recharge and streamflow. Hence, drier antecedent conditions may result in more intense droughts. On the other hand, forest cover (FOREST) has an inversely proportional relationship with drought intensity. Base flow index also influences the drought intensity, but not as significantly as in other cases. Mean aspect degree (Aspect_Degrees) shares a directly proportional relationship with drought intensity.

Figure 9. Same as Figure 6 but in case of number of drought events (ND).


In case of regime 2, the temperature variability (TMEAN_SD) exhibits the most dominant control on drought intensity, similar to the case of number of drought events. However, the functional relationship is opposite in nature. The percentage of streams in Strahler's fourth order (PCT_4th_ORDER) exhibits an inverse exponential relationship with the drought intensity. Additional variables, such as rainfall factor (R_FACT), silt content (SILT_AVE), and precipitation variability (PRCP_CV), also influence the intensity of drought. In regime 3, basin compactness (BAS_COMPACTNESS) and base flow index (BFI_AVE) exhibit dominant control on drought intensity. However, the remaining variables are not as dominant as in the case of the other clusters.

5. Discussion and Outlook

Hydrological drought in a catchment is controlled by the climate characteristics (recharge) and catchment characteristics (storage). Based on our interpretation framework using the MD, ID, and partial dependence metrics, the important climate and catchment characteristics that control hydrological drought characteristics (number of events, duration, and intensity) are provided in Table 3. It was observed that the catchments for regime 1 are mostly located in the higher elevations or mountainous regions characterized by steep sloping terrain (Figure 5). The hydrological drought characteristics for regime 1 are mostly influenced by the catchment characteristics, which include base flow index, elevation, and soil characteristics

Figure 10. Same as Figure 7 but in case of number of drought events (ND).


(infiltration rates). Baseflow is influenced by natural factors such as climate, geology, relief, soils, and vegetation. Factors that promote infiltration and recharge of subsurface storage will increase baseflow, while factors associated with higher evapotranspiration will reduce baseflow. In these catchments, groundwater drainage moves slowly, which results in prolonged baseflow following rainfall events, and baseflow is thus more influential in generating hydrological droughts. Interestingly, elevation is a standalone catchment characteristic that plays an important role for drought characteristics in regime 1. Most of these catchments receive snow during winter months; therefore, the hydrological drought can be influenced by a combination of rain and snow, and depending on the difference between elevation ranges, the timing and intensity of drought can vary among the watersheds.

In addition to elevation, the soil characteristics (e.g., infiltration and hydraulic conductivity) are key variables for hydrological droughts in regime 1. In these catchments, subsurface flow generation is directly proportional to the hydraulic conductivity of soils, which thus controls the discharge rate specific to soil types (e.g., Armbruster, 1976; Musiake et al., 1984; Smith, 1981). In addition, soil properties are known to affect infiltration, rooting depth/restrictions, available water capacity, soil porosity, and soil microorganism activity, which influence the streamflow discharge rate (Bennie et al., 2008; Moeslund et al., 2013; Strachan & Daly, 2017). The moisture storage capacity in soil decreases due to reduced precipitation and high evapotranspiration, which further reduces baseflow, leading to the evolution of hydrological droughts in different segments of the hydrological system. Hence, streamflow generation in base flow dominant streams is strongly influenced

Figure 11. Same as Figure 8 but in case of number of drought events (ND).


by the subsurface hydrogeologic configuration, the saturated permeabilities of the component formations, and the unsaturated soil characteristics of the soil types (Freeze, 1972). Hence, in addition to the topography, the soil features may control the hydrologic drought properties through these physical processes. Further, the forest cover influences drought intensity more than duration and number of events.

Some of the catchment characteristics that control hydrologic drought in regime 1 also influence drought characteristics in regimes 2 and 3. However, there is a clear difference between regime 1 and regimes 2 and 3 in terms of climate control on hydrological droughts. Climate factors such as precipitation may have a lesser direct influence on hydrological droughts in regime 1, which can be attributed to the limited time available to store water in comparatively higher gradient watersheds as well as the possible contribution of snow for the watersheds located in snowy regions (e.g., northeast and central north watersheds). The climate

Figure 12. Same as Figure 6 but in case of drought intensity (DI).


variables that have a potential influence on hydrological drought in regimes 2 and 3 include temperature and precipitation. For example, temperature can have a direct influence on the development of hydrological drought in snow dominated regions. The combined effect of elevation and temperature in triggering hydrological droughts can vary because snow dominated regions are located in mountainous areas. The role of precipitation characteristics in the propagation of hydrological drought is well recognized (Mishra & Singh, 2010; Mukherjee et al., 2018; Wan et al., 2017). The amount of rainwater held in storage is different for the three regimes; for example, higher elevation areas can hold less rainwater than low lying forested areas. The rainfall pattern in semiarid regions (typically the western USA) is very irregular, leading to very low storage and an increase in hydrological drought.

However, for the humid catchments located in regimes 2 and 3, the soils are mostly saturated due to the antecedent climate conditions, which results in a more direct relationship of precipitation, potential evapotranspiration, and temperature with hydrologic drought characteristics. In addition to precipitation and temperature, lower relative humidity can influence rainfall patterns, leading to the evolution of hydrological drought in regime 3. The role of relative humidity in the evolution of drought is complex in nature. During dry hydrologic conditions, moisture is depleted from the upper soil layers, leading to a decrease in evapotranspiration and atmospheric relative humidity (Mishra & Singh, 2010). Further, the reduced relative humidity reduces the probability of rainfall, which further triggers hydrological drought (Mishra & Singh, 2010).

Figure 13. Same as Figure 7 but in case of drought intensity (DI).


In regimes 2 and 3, the stream networks defined by Strahler number have a potential influence on the hydrological drought. An increase in Strahler order can be related to a decline in the catchment's general slope (Haidary et al., 2015) and can potentially increase the storage capacity, groundwater recharge, and baseflow of the watershed. The first order streams, which represent the outermost tributaries, are typically located on steeper slopes than fourth order streams. This suggests that the higher storage capacity likely to be observed in fourth order streams will provide better control on hydrological drought than in first order streams. A first order stream, which is usually located at the upper end of channel networks (Strahler, 1952), has a comparatively larger slope and is likely to drain excess water immediately following a precipitation event (McMahon & Finlayson, 2003). As a result, if there is a deficit in precipitation or an increase in evapotranspiration in the catchment, the first order streams are likely to facilitate a more direct propagation of meteorological drought to hydrological drought with no buffer (Godsey & Kirchner, 2014; McMahon & Finlayson, 2003; Pinna et al., 2004). Consequently, the presence of first and fourth order streams might influence the drought properties in contrasting ways.

In regimes 2 and 3, the mean aspect degree was found to be an important variable in controlling the hydrologic drought properties. Mean aspect degree is often associated with variability in microclimate, including near-surface temperatures, evaporative demand, soil moisture content, and vegetation (Strachan & Daly, 2017; Srinivasan et al., 2015; Moeslund et al., 2013). As a result, the mean aspect degree, which is a topographic metric, controls the microclimatic and vegetation features that are likely to control drought characteristics. It was

Figure 14. Same as Figure 8 but in case of drought intensity (DI).


observed that in addition to the common processes that control the hydrologic drought characteristics, there are additional distinct processes specific to each regime. This highlights the differential nature of climate and catchment control on hydrological droughts. Under humid conditions, the evolution of hydrological droughts in small catchments can be attributed more to climate characteristics, whereas for the larger watersheds the storage capacity and baseflow associated with catchment characteristics can play a dominant role. For the catchments under severe dry conditions, the climate signal can have less predictive power than the storage properties of the watershed.

The application of the RF algorithm can provide a better understanding of how the climate and catchment controls differ for a specific hydrological drought characteristic. For instance, previous studies highlighted that the BFI, which represents the storage characteristics, plays a key role in controlling the drought duration (Van Lanen et al., 2013; Van Loon & Laaha, 2015). Our framework reconfirms the role of BFI in case of regimes 1 and 2; however, we further identified two important distinctions. First, the relationship between BFI and drought characteristics can be nonlinear, and as a result, an increase in drought duration with BFI cannot be generalized. Second, base flow acts as a dominant process mostly for the catchments that witness medium and high duration drought events, whereas it has less influence for the catchments with lower drought durations. A linear regression approach may not capture such phenomena because the model parameters are more biased toward high magnitude variables (Hastie et al., 2009).

Our empirical analysis suggests a lack of prominent interactions between dominant variables in hydrological drought propagation. This indicates that the dominant drivers of drought characteristics are largely additive and independent in nature. For example, the percentage of soils with high infiltration rate (HGA) and the percentage of fourth-order streams (PCT_4th_ORDER) both exert dominant control on drought intensity but have minimal interaction effect. As a result, even though both of these variables control the propagation of drought independently, the underlying processes for drought propagation have minimal interaction with each other.
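One way to make the notion of additive, non-interacting drivers concrete is to compare a two-variable partial dependence surface with the sum of the two one-variable partial dependence curves. The sketch below is a generic check of that kind, similar in spirit to Friedman's H-statistic; it is not the interactive depth measure used in this study. The two toy models, the feature names `hga` and `pct4`, and the grid values are assumptions introduced only for illustration.

```python
import itertools
import math
import random

def partial_dependence(model, rows, fixed):
    """Average prediction when the features in `fixed` are overridden in every row."""
    total = 0.0
    for row in rows:
        modified = dict(row)
        modified.update(fixed)
        total += model(modified)
    return total / len(rows)

def interaction_strength(model, rows, f1, f2, grid1, grid2):
    """Mean squared gap between the joint PD and the sum of the one-way PDs (all centered)."""
    mean_pred = sum(model(r) for r in rows) / len(rows)
    pd1 = {v: partial_dependence(model, rows, {f1: v}) - mean_pred for v in grid1}
    pd2 = {v: partial_dependence(model, rows, {f2: v}) - mean_pred for v in grid2}
    gaps = []
    for v1, v2 in itertools.product(grid1, grid2):
        joint = partial_dependence(model, rows, {f1: v1, f2: v2}) - mean_pred
        gaps.append((joint - pd1[v1] - pd2[v2]) ** 2)
    return sum(gaps) / len(gaps)

# Hypothetical fitted models: one purely additive, one with a product (interaction) term.
def additive_model(r):
    return 2.0 * math.sqrt(r["hga"]) - 1.5 * r["pct4"]

def interacting_model(r):
    return 2.0 * r["hga"] * r["pct4"]

rng = random.Random(1)
rows = [{"hga": rng.random(), "pct4": rng.random()} for _ in range(100)]
grid = [0.1, 0.3, 0.5, 0.7, 0.9]

additive_strength = interaction_strength(additive_model, rows, "hga", "pct4", grid, grid)
interacting_strength = interaction_strength(interacting_model, rows, "hga", "pct4", grid, grid)
print(f"additive model: {additive_strength:.2e}, interacting model: {interacting_strength:.2e}")
```

For the additive model the gap vanishes up to floating-point error, whereas the product term leaves a clearly nonzero gap. A near-zero value for a pair of dominant variables would be consistent with the additive behavior described above.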

Our results also indicated that even though similar dominant controls exist across different regimes, their functional relationships with drought characteristics might differ, as highlighted in Jencso and McGlynn (2011) and Knapp et al. (2015). For instance, we identified that the base flow index controls drought duration for both regimes 1 and 2. However, the functional relationship between base flow and drought duration differs between the two regimes. In regime 1, the base flow index exhibits an exponential relationship with drought duration, whereas in regime 2 a different form of nonlinear relationship exists that does not fit the traditional exponential functions. Therefore, even though the catchment and climate characteristics exhibit a nonlinear relationship with drought characteristics, the relationships should not be generalized across regions with different hydrologic characteristics.
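Whether a partial dependence curve is well described by an exponential form can be checked with a simple log-linear fit. The sketch below is only an illustration under stated assumptions: the two curves are synthetic stand-ins (one exactly exponential, one saturating), not the curves obtained in this study, and `fit_exponential` and `exponential_r2` are hypothetical helper names.

```python
import math

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by least squares on log(y); requires y > 0."""
    logs = [math.log(v) for v in y]
    n = len(x)
    mx, ml = sum(x) / n, sum(logs) / n
    b = sum((xi - mx) * (li - ml) for xi, li in zip(x, logs)) / sum((xi - mx) ** 2 for xi in x)
    a = math.exp(ml - b * mx)
    return a, b

def r_squared(y, pred):
    """Coefficient of determination in the original (untransformed) space."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def exponential_r2(x, y):
    """R^2 of the fitted exponential curve against the supplied curve."""
    a, b = fit_exponential(x, y)
    return r_squared(y, [a * math.exp(b * xi) for xi in x])

# Synthetic partial dependence curves over a BFI grid (assumed shapes, for illustration only).
bfi_grid = [0.05 * k for k in range(1, 21)]
exp_like_curve = [1.5 * math.exp(2.0 * v) for v in bfi_grid]
saturating_curve = [1.0 + 10.0 * v / (0.1 + v) for v in bfi_grid]

r2_exp_like = exponential_r2(bfi_grid, exp_like_curve)
r2_saturating = exponential_r2(bfi_grid, saturating_curve)
print(f"exponential-like curve R^2 = {r2_exp_like:.3f}, saturating curve R^2 = {r2_saturating:.3f}")
```

A high R^2 supports an exponential description, while a clearly lower value indicates a different nonlinear form. The same check could be repeated per regime before any functional form is transferred from one regime to another.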

Our interpretative modeling framework also highlighted that the influence of dominant variables can vary over the range of a drought characteristic. In other words, an individual climate (catchment) characteristic can have a higher (lower) influence on the variability in the upper (lower) range of a drought characteristic. For instance, the bulk density may affect the soil features that control the drought intensity in regime 1. However, it does not explain the variability of the entire range of drought event occurrences, as in the case of the other two regimes. In contrast, WDMAX_BASIN is able to explain the variability of drought occurrences at the higher end, which BD_AVE does not capture. This highlights the complementary behavior of the climate and catchment characteristics in controlling drought event occurrences.

The framework presented in this study introduces valuable interpretability components of the RF algorithm in the context of understanding hydrologic processes. Even though we have applied this framework for understanding drought characteristics, there are other frameworks for interpreting black box models. Among them are individual conditional expectations (Goldstein et al., 2015; Guidotti et al., 2018), local interpretable model‐agnostic explanations (Ribeiro et al., 2016), and influence functions (Koh & Liang, 2017), which were recently introduced in the machine learning literature. There is scope to compare these interpretability frameworks and the quality of machine learning algorithms. We believe our approach can serve as a preliminary avenue for delving deeper into the application of interpretative machine learning frameworks for understanding not only droughts but also other hydrologic processes. Further work along these directions might therefore improve our understanding of hydrologic processes using interpretative machine learning algorithms.

10.1029/2018WR024620

Water Resources Research

KONAPALA AND MISHRA 21 of 25

6. Conclusions

In this study, we applied machine learning methods by integrating fuzzy clustering and the RF algorithm with an interpretation framework (i.e., minimal depth, interactive depth, and partial dependence) to quantify the role of climate and catchment controls on hydrological drought for 652 catchments located in CONUS. The RF algorithm can adequately capture the functional relationship between climate and catchment characteristics and hydrological droughts. The proposed framework based on MD, ID, and partial dependence metrics can identify the important climate and catchment characteristics, which can further improve our understanding of their dominant role in the propagation of hydrological droughts.
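As a reminder of what the MD metric measures, the sketch below computes minimal depth, taken here as the depth of the split on a variable that lies closest to the root of a tree (root = 0), averaged over trees. It is a simplified illustration rather than the implementation used in this study: the three trees are hand-built and hypothetical, only the variable names (BFI, HGA, BD_AVE) come from the text, and a variable is averaged only over the trees in which it appears, which is one of several possible conventions.

```python
def node(feature, left=None, right=None):
    """Internal tree node; a child of None represents a leaf."""
    return {"feature": feature, "left": left, "right": right}

def minimal_depths(tree, depth=0, found=None):
    """Return {feature: depth of its split closest to the root} for one tree."""
    if found is None:
        found = {}
    if tree is None:  # leaf: nothing to record
        return found
    feat = tree["feature"]
    if feat not in found or depth < found[feat]:
        found[feat] = depth
    minimal_depths(tree["left"], depth + 1, found)
    minimal_depths(tree["right"], depth + 1, found)
    return found

def forest_minimal_depth(trees):
    """Average minimal depth per feature over the trees in which the feature appears."""
    sums, counts = {}, {}
    for tree in trees:
        for feat, d in minimal_depths(tree).items():
            sums[feat] = sums.get(feat, 0) + d
            counts[feat] = counts.get(feat, 0) + 1
    return {feat: sums[feat] / counts[feat] for feat in sums}

# Three hypothetical trees from a small forest.
trees = [
    node("BFI", node("HGA", node("BD_AVE"), None), node("HGA")),
    node("BFI", node("BD_AVE"), node("HGA", None, node("BFI"))),
    node("HGA", node("BFI"), node("BFI", node("BD_AVE"), None)),
]

md = forest_minimal_depth(trees)
ranking = sorted(md, key=md.get)  # smaller minimal depth = more dominant variable
print(ranking, md)
```

Variables that split close to the root partition the largest share of the catchments, so a smaller average minimal depth is read as stronger control on the drought characteristic being modeled.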

Using a large number of catchments under different climatic regimes enabled us to explore the dominant landscape controls on CONUS hydrological droughts. We conclude that the RF‐based interpretative approach is a simple, robust, and yet powerful way to gain insights into the drivers of hydrological droughts. The applied framework can provide useful information for understanding the different combinations of climate and catchment characteristics that can either attenuate or intensify hydrological droughts. The following conclusions can be drawn from this study: (i) three drought regimes are identified based on their duration, frequency, and intensity, namely, Regime 1: droughts with longer duration, less frequent, and lesser intensity; Regime 2: droughts with moderate duration, moderate frequency, and moderate intensity; and Regime 3: droughts with shorter duration, more frequent, and more intense; (ii) among the identified regimes, even though some common hydrologic processes control the drought characteristics, there are some distinct processes specific to each regime; (iii) similar climate, catchment, and morphological characteristics may exhibit varied functional relationships (i.e., exponential, hyperbolic, and linear) with drought characteristics in different regimes; and (iv) the dominant variables may not explain the variability of the entire range of drought characteristics. From the above insights, we propose that these issues deserve more attention by integrating the knowledge obtained from the application of machine learning algorithms to hydroclimatic processes (e.g., hydrological drought) with the hydrological models used for such analysis. Although hydrologic models are able to capture streamflow with reasonable accuracy, they often overestimate (underestimate) extreme events such as extreme droughts. This implies that a better understanding of the role of climate and catchment characteristics in the evolution and propagation of hydrological drought events is essential. The results obtained from our proposed machine learning framework can complement ongoing research on hydrological droughts through better exploitation of the value of nonclimatic attributes (such as soil, land cover, and geology); a more systematic characterization of the uncertainties in catchment attributes also needs to be performed.

References

Achen, C. H. (1982). Interpreting and using regression (Vol. 29). Sage.

Addor, N., Nearing, G., Prieto, C., Newman, A. J., Le Vine, N., & Clark, M. P. (2018). A ranking of hydrological signatures based on their predictability in space. Water Resources Research, 54(11), 8792–8812.

Apurv, T., Sivapalan, M., & Cai, X. (2017). Understanding the role of climate characteristics in drought propagation. Water Resources Research, 53, 9304–9329. https://doi.org/10.1002/2017WR021445

Armbruster, J. T. (1976). An infiltration index useful in estimating low‐flow characteristics of drainage basins. Journal of Research of the U.S. Geological Survey, 4(5), 533–538.

Bastani, O., Kim, C., & Bastani, H. (2017). Interpreting blackbox models via model extraction. arXiv preprint arXiv:1705.08504.

Bennie, J., Huntley, B., Wiltshire, A., Hill, M. O., & Baxter, R. (2008). Slope, aspect and climate: Spatially explicit and implicit models of topographic microclimate in chalk grassland. Ecological Modelling, 216(1), 47–59.

Bibal, A., & Frénay, B. (2016). Interpretability of machine learning models and representations: An introduction. In Proceedings on ESANN (pp. 77–82).

Boulesteix, A. L., Janitza, S., Kruppa, J., & König, I. R. (2012). Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(6), 493–507.

Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32. https://doi.org/10.1023/A:1010933404324

Brutsaert, W., & Nieber, J. L. (1977). Regionalized drought flow hydrographs from a mature glaciated plateau. Water Resources Research, 13(3), 637–643.

Campello, R. J., & Hruschka, E. R. (2006). A fuzzy extension of the silhouette width criterion for cluster analysis. Fuzzy Sets and Systems, 157(21), 2858–2875.

Carrillo, G., Troch, P. A., Sivapalan, M., Wagener, T., Harman, C., & Sawicz, K. (2011). Catchment classification: Hydrological analysis of catchment behavior through process‐based modeling along a climate gradient. Hydrology and Earth System Sciences, 15(11), 3411–3430.

Caruana, R., & Niculescu‐Mizil, A. (2006, June). An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd international conference on Machine learning (pp. 161–168). ACM.


Acknowledgments

We very much appreciate the valuable comments of the Associate Editor and three reviewers that helped us improve our manuscript. This study was supported by the NSF award 1653841. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF. The authors used the GAGES II data set in this study, and these data sets are publicly available at https://water.usgs.gov/GIS/metadata/usgswrd/XML/gagesII_Sept2011.xml.

Cayan, D. R., Das, T., Pierce, D. W., Barnett, T. P., Tyree, M., & Gershunov, A. (2010). Future dryness in the southwest US and the hydrology of the early 21st century drought. Proceedings of the National Academy of Sciences, 107(50), 21,271–21,276.

Cutler, D. R., Edwards, T. C. Jr., Beard, K. H., Cutler, A., Hess, K. T., Gibson, J., & Lawler, J. J. (2007). Random forests for classification in ecology. Ecology, 88(11), 2783–2792.

Daly, C., Taylor, G. H., Gibson, W. P., Parzybok, T. W., Johnson, G. L., & Pasteris, P. A. (2000). High‐quality spatial climate data sets for the United States and beyond. Transactions of the ASAE, 43(6), 1957.

Díaz‐Uriarte, R., & De Andres, S. A. (2006). Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7(1), 3.

Diller, G. P., Alonso‐Gonzalez, R., Kempny, A., Dimopoulos, K., Inuzuka, R., Giannakoulas, G., & Swan, L. (2012). B‐type natriuretic peptide concentrations in contemporary Eisenmenger syndrome patients: Predictive value and response to disease targeting therapy. Heart, heartjnl‐2011.

Doshi‐Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.

Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap. Washington, DC: CRC press.

Elshorbagy, A., Corzo, G., Srinivasulu, S., & Solomatine, D. P. (2010a). Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 1: Concepts and methodology. Hydrology and Earth System Sciences, 14(10), 1931–1941.

Elshorbagy, A., Corzo, G., Srinivasulu, S., & Solomatine, D. P. (2010b). Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 2: Application. Hydrology and Earth System Sciences, 14(10), 1943–1961.

Fahimi, F., Yaseen, Z. M., & El‐shafie, A. (2017). Application of soft computing based hybrid models in hydrological variables modeling: A comprehensive review. Theoretical and Applied Climatology, 128(3–4), 875–903.

Falcone, J. A. (2011). GAGES‐II: Geospatial attributes of gages for evaluating streamflow. US Geological Survey.

Fienen, M. N., Nolan, B. T., Kauffman, L. J., & Feinstein, D. T. (2018). Metamodeling for groundwater age forecasting in the Lake Michigan Basin. Water Resources Research, 54, 4750–4766. https://doi.org/10.1029/2017WR022387

Freeze, R. A. (1972). Role of subsurface flow in generating surface runoff: 1. Base flow contributions to channel flow. Water Resources Research, 8(3), 609–623.

Friedman, J. H., & Meulman, J. J. (2003). Multiple additive regression trees with application in epidemiology. Statistics in Medicine, 22(9), 1365–1381. https://doi.org/10.1002/sim.1501

Gocic, M., & Trajkovic, S. (2014). Spatiotemporal characteristics of drought in Serbia. Journal of Hydrology, 510, 110–123.

Godsey, S. E., & Kirchner, J. W. (2014). Dynamic, discontinuous stream networks: Hydrologically driven variations in active drainage density, flowing channels and stream order. Hydrological Processes, 28(23), 5791–5803.

Goldstein, A., Kapelner, A., Bleich, J., & Pitkin, E. (2015). Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. Journal of Computational and Graphical Statistics, 24(1), 44–65. https://doi.org/10.1080/10618600.2014.907095

Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., & Pedreschi, D. (2018). A survey of methods for explaining black box models. ACM Computing Surveys (CSUR), 51(5), 93.

Gupta, H. V., & Nearing, G. S. (2014). Debates–The future of hydrological sciences: A (common) path forward? Using models and data to learn: A systems theoretic perspective on the future of hydrological science. Water Resources Research, 50, 5351–5359. https://doi.org/10.1002/2013WR015096

Haidary, A., Amiri, B. J., Adamowski, J., Fohrer, N., & Nakane, K. (2015). Modelling the relationship between catchment attributes and wetland water quality in Japan. Ecohydrology, 8, 726–737. https://doi.org/10.1002/eco.1539

Haslinger, K., Koffler, D., Schöner, W., & Laaha, G. (2014). Exploring the link between meteorological drought and streamflow: Effects of climate‐catchment interaction. Water Resources Research, 50, 2468–2487. https://doi.org/10.1002/2013WR015051

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer Science & Business Media.

Hengl, T., Nussbaum, M., Wright, M. N., Heuvelink, G. B., & Gräler, B. (2018). Random forest as a generic framework for predictive modeling of spatial and spatio‐temporal variables. PeerJ, 6, e5518.

Huang, G., Huang, G. B., Song, S., & You, K. (2015). Trends in extreme learning machines: A review. Neural Networks, 61, 32–48. https://doi.org/10.1016/j.neunet.2014.10.001

Ishwaran, H., Kogalur, U. B., Chen, X., & Minn, A. J. (2011). Random survival forests for high‐dimensional data. Statistical Analysis and Data Mining: The ASA Data Science Journal, 4(1), 115–132.

Ishwaran, H., Kogalur, U. B., Gorodeski, E. Z., Minn, A. J., & Lauer, M. S. (2010). High‐dimensional variable selection for survival data. Journal of the American Statistical Association, 105, 205–217.

Jencso, K. G., & McGlynn, B. L. (2011). Hierarchical controls on runoff generation: Topographically driven hydrologic connectivity, geology, and vegetation. Water Resources Research, 47, W11527. https://doi.org/10.1029/2011WR010666

Karpatne, A., Atluri, G., Faghmous, J. H., Steinbach, M., Banerjee, A., Ganguly, A., & Kumar, V. (2017). Theory‐guided data science: A new paradigm for scientific discovery from data. IEEE Transactions on Knowledge and Data Engineering, 29(10), 2318–2331.

Knapp, A. K., Carroll, C. J., Denton, E. M., La Pierre, K. J., Collins, S. L., & Smith, M. D. (2015). Differential sensitivity to regional‐scale drought in six central US grasslands. Oecologia, 177(4), 949–957. https://doi.org/10.1007/s00442-015-3233-6

Knapp, A. K., Fay, P. A., Blair, J. M., Collins, S. L., Smith, M. D., Carlisle, J. D., et al. (2002). Rainfall variability, carbon cycling, and plant species diversity in a mesic grassland. Science, 298(5601), 2202–2205.

Koch, J., Stisen, S., Refsgaard, J. C., Ernstsen, V., Jakobsen, P. R., & Højberg, A. L. (2019). Modeling depth of the redox interface at high resolution at national scale using random forest and residual Gaussian simulation. Water Resources Research, 55, 1451–1469. https://doi.org/10.1029/2018WR023939

Koh, P. W., & Liang, P. (2017, July). Understanding black‐box predictions via influence functions. In International Conference on Machine Learning (pp. 1885–1894).

Konapala, G., & Mishra, A. (2017). Review of complex networks application in hydroclimatic extremes with an implementation to characterize spatio‐temporal drought propagation in continental USA. Journal of Hydrology, 555, 600–620.

Konapala, G., & Mishra, A. K. (2016). Three‐parameter‐based streamflow elasticity model: Application to MOPEX basins in the USA at annual and seasonal scales. Hydrology and Earth System Sciences, 20(6), 2545–2556.

Kotsiantis, S. B., Zaharakis, I., & Pintelas, P. (2007). Supervised machine learning: A review of classification techniques. Emerging Artificial Intelligence Applications in Computer Engineering, 160, 3–24.

Krishnapuram, R., Joshi, A., Nasraoui, O., & Yi, L. (2001). Low‐complexity fuzzy relational clustering algorithms for web mining. IEEE Transactions on Fuzzy Systems, 9(4), 595–607.


Latt, Z. Z., Wittenberg, H., & Urban, B. (2015). Clustering hydrological homogeneous regions and neural network based index flood estimation for ungauged catchments: An example of the Chindwin River in Myanmar. Water Resources Management, 29(3), 913–928.

Ley, R., Casper, M. C., Hellebrand, H., & Merz, R. (2011). Catchment classification by runoff behaviour with self‐organizing maps (SOM). Hydrology and Earth System Sciences, 15(9), 2947–2962.

McMahon, T. A., & Finlayson, B. L. (2003). Droughts and anti‐droughts: The low flow hydrology of Australian rivers. Freshwater Biology, 48(7), 1147–1160.

Mishra, A. K., & Singh, V. P. (2010). A review of drought concepts. Journal of Hydrology, 391(1–2), 202–216.

Mishra, A. K., & Singh, V. P. (2011). Drought modeling – A review. Journal of Hydrology, 403(1–2), 157–175.

Moeslund, J. E., Arge, L., Bøcher, P. K., Dalgaard, T., & Svenning, J. C. (2013). Topography as a driver of local terrestrial vascular plant diversity patterns. Nordic Journal of Botany, 31(2), 129–144.

Mukherjee, S., Mishra, A., & Trenberth, K. E. (2018). Climate change and drought: A perspective on drought indices. Current Climate Change Reports, 4(2), 145–163. https://doi.org/10.1007/s40641-018-0098-x

Musiake, K., Takahasi, Y., & Ando, Y. (1984). Statistical analysis on effects of basin geology on river flow regime in mountainous areas of Japan. Proc. Fourth Cong. Asian & Pacific Reg. Div. Int. Assoc. Hydraul. Res., Bangkok, APD‐IAHR/Asian Institute Technology, vol. 2, pp. 1141–1150.

Narasimhan, B., & Srinivasan, R. (2005). Development and evaluation of Soil Moisture Deficit Index (SMDI) and Evapotranspiration Deficit Index (ETDI) for agricultural drought monitoring. Agricultural and Forest Meteorology, 133(1–4), 69–88.

Nourani, V., Baghanam, A. H., Adamowski, J., & Kisi, O. (2014). Applications of hybrid wavelet–Artificial Intelligence models in hydrology: A review. Journal of Hydrology, 514, 358–377.

Olden, J. D., Kennard, M. J., & Pusey, B. J. (2012). A framework for hydrologic classification with a review of methodologies and applications in ecohydrology. Ecohydrology, 5(4), 503–518.

Pinna, M., Fonnesu, A., Sangiorgio, F., & Basset, A. (2004). Influence of summer drought on spatial patterns of resource availability and detritus processing in Mediterranean stream sub‐basins (Sardinia, Italy). International Review of Hydrobiology: A Journal Covering all Aspects of Limnology and Marine Biology, 89(5–6), 484–499.

Prasad, A. M., Iverson, L. R., & Liaw, A. (2006). Newer classification and regression tree techniques: Bagging and random forests for ecological prediction. Ecosystems, 9(2), 181–199.

Probst, P., & Boulesteix, A. L. (2017). To tune or not to tune the number of trees in random forest? arXiv preprint arXiv:1705.05654.

Raghavendra, S., & Deka, P. C. (2014). Support vector machine applications in the field of hydrology: A review. Applied Soft Computing, 19, 372–386.

Rajsekhar, D., Singh, V. P., & Mishra, A. K. (2014). Hydrologic drought atlas for Texas. Journal of Hydrologic Engineering, 20(7), 05014023.

Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135–1144). ACM.

Rice, J. S., Emanuel, R. E., Vose, J. M., & Nelson, S. A. (2015). Continental US streamflow trends from 1940 to 2009 and their relationships with watershed spatial characteristics. Water Resources Research, 51, 6262–6275.

Saft, M., Peel, M. C., Western, A. W., & Zhang, L. (2016). Predicting shifts in rainfall‐runoff partitioning during multiyear drought: Roles of dry period and catchment characteristics. Water Resources Research, 52, 9290–9305. https://doi.org/10.1002/2016WR019525

Saft, M., Western, A. W., Zhang, L., Peel, M. C., & Potter, N. J. (2015). The influence of multiyear drought on the annual rainfall‐runoff relationship: An Australian perspective. Water Resources Research, 51, 2444–2463. https://doi.org/10.1002/2014WR015348

Sawicz, K., Wagener, T., Sivapalan, M., Troch, P. A., & Carrillo, G. (2011). Catchment classification: Empirical analysis of hydrologic similarity based on catchment function in the eastern USA. Hydrology and Earth System Sciences, 15(9), 2895–2911.

Schwalm, C. R., Anderegg, W. R., Michalak, A. M., Fisher, J. B., Biondi, F., Koch, G., & Huntzinger, D. N. (2017). Global patterns of drought recovery. Nature, 548(7666), 202–205. https://doi.org/10.1038/nature23021

Scornet, E. (2017). Tuning parameters in random forests. ESAIM: Proceedings and Surveys, 60, 144–162.

Sheffield, J., Wood, E. F., & Roderick, M. L. (2012). Little change in global drought over the past 60 years. Nature, 491(7424), 435–438. https://doi.org/10.1038/nature11575

Shen, C. (2018). A transdisciplinary review of deep learning research and its relevance for water resources scientists. Water Resources Research, 54(11), 8558–8593.

Shortridge, J. E., Guikema, S. D., & Zaitchik, B. F. (2016). Machine learning methods for empirical streamflow simulation: A comparison of model accuracy, interpretability, and uncertainty in seasonal watersheds. Hydrology and Earth System Sciences, 20(7), 2611–2628.

Shukla, S., & Wood, A. W. (2008). Use of a standardized runoff index for characterizing hydrologic drought. Geophysical Research Letters, 35, L02405. https://doi.org/10.1029/2007GL032487

Smith, R. W. (1981). Rock type and minimum 7‐day/10‐year flow in Virginia streams. Virginia Water Resource Research Center, Virginia Polytechnology Institute and State University, Blacksburg, Bulletin, vol. 116, 43 pp.

Stahl, K., Moore, R. D., Shea, J. M., Hutchinson, D., & Cannon, A. J. (2008). Coupled modelling of glacier and streamflow response to future climate scenarios. Water Resources Research, 44, W02422. https://doi.org/10.1029/2007WR005956

Stoelzle, M., Stahl, K., Morhard, A., & Weiler, M. (2014). Streamflow sensitivity to drought scenarios in catchments with different geology. Geophysical Research Letters, 41, 6174–6183. https://doi.org/10.1002/2014GL061344

Strachan, S., & Daly, C. (2017). Testing the daily PRISM air temperature model on semiarid mountain slopes. Journal of Geophysical Research: Atmospheres, 122, 5697–5715. https://doi.org/10.1002/2016JD025920

Strahler, A. N. (1952). Hypsometric (area‐altitude) analysis of erosional topography. Geological Society of America Bulletin, 63(11), 1117–1142.

Strobl, C., Malley, J., & Tutz, G. (2009). An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods, 14(4), 323–348. https://doi.org/10.1037/a0016973

Tallaksen, L. M., Hisdal, H., & Van Lanen, H. A. (2009). Space‐time modelling of catchment scale drought characteristics. Journal of Hydrology, 375(3–4), 363–372.

Van Lanen, H. A. J., Wanders, N., Tallaksen, L. M., & Van Loon, A. F. (2013). Hydrological drought across the world: Impact of climate and physical catchment structure. Hydrology and Earth System Sciences, 17, 1715–1732.

Van Loon, A. F., & Laaha, G. (2015). Hydrological drought severity explained by climate and catchment characteristics. Journal of Hydrology, 526, 3–14.

Van Loon, A. F., Tijdeman, E., Wanders, N., Van Lanen, H. J., Teuling, A. J., & Uijlenhoet, R. (2014). How climate seasonality modifies drought duration and deficit. Journal of Geophysical Research: Atmospheres, 119, 4640–4656. https://doi.org/10.1002/2013JD020383


Van Loon, A. F., & Van Lanen, H. A. J. (2012). A process‐based typology of hydrological drought. Hydrology and Earth System Sciences, 16(7), 1915–1946.

Veettil, A. V., Konapala, G., Mishra, A. K., & Li, H. Y. (2018). Sensitivity of drought resilience‐vulnerability‐exposure to hydrologic ratios in contiguous United States. Journal of Hydrology, 564, 294–306.

Vicente‐Serrano, S. M. (2006). Spatial and temporal analysis of droughts in the Iberian Peninsula (1910–2000). Hydrological Sciences Journal, 51(1), 83–97.

Vicente‐Serrano, S. M., López‐Moreno, J. I., Beguería, S., Lorenzo‐Lacruz, J., Azorin‐Molina, C., & Morán‐Tejeda, E. (2011). Accurate computation of a streamflow drought index. Journal of Hydrologic Engineering, 17(2), 318–332.

Wan, W., Zhao, J., Li, H. Y., Mishra, A., Ruby Leung, L., Hejazi, M., et al. (2017). Hydrological drought in the Anthropocene: Impacts of local water extraction and reservoir regulation in the US. Journal of Geophysical Research: Atmospheres, 122, 11,313–11,328. https://doi.org/10.1002/2017JD026899

Wang, D., Hejazi, M., Cai, X., & Valocchi, A. J. (2011). Climate change impact on meteorological, agricultural, and hydrological drought in central Illinois. Water Resources Research, 47, W09527. https://doi.org/10.1029/2010WR009845

Xie, X. L., & Beni, G. (1991). A validity measure for fuzzy clustering. IEEE Transactions on Pattern Analysis & Machine Intelligence, 8, 841–847.

Yadav, M., Wagener, T., & Gupta, H. (2007). Regionalization of constraints on expected watershed response behavior for improved predictions in ungauged basins. Advances in Water Resources, 30(8), 1756–1774.

Yevjevich, V. (1967). An objective approach to definitions and investigations of continental hydrologic droughts. Hydrol. Papers 23, Colorado State University Publication, Colorado State University, Fort Collins, Colorado, USA.

Yoo, J., Kwon, H. H., Kim, T. W., & Ahn, J. H. (2012). Drought frequency analysis using cluster analysis and bivariate probability distribution. Journal of Hydrology, 420, 102–111.

Zhang, C., & Ma, Y. (2012). Ensemble machine learning: Methods and applications. Springer Science & Business Media.

Zhang, Q., Xiao, M., Singh, V. P., & Li, J. (2012). Regionalization and spatial changing properties of droughts across the Pearl River basin, China. Journal of Hydrology, 472, 355–366.

10.1029/2018WR024620

Water Resources Research

KONAPALA AND MISHRA 25 of 25

A preview of this full-text is provided by Wiley.

Learn more

Content available from Water Resources Research

This content is subject to copyright. Terms and conditions apply.

Global Flash Droughts Characteristics: Onset, Duration, and Extent at Watershed Scales

Article

Full-text available

May 2024
GEOPHYS RES LETT

Plain Language Summary Flash droughts (FDs), which are sudden and severe dry periods, are causing problems for our water and food systems and making it harder to prepare for disasters. To address these challenges effectively, it is crucial to gain a thorough understanding of the underlying mechanisms and factors driving FDs at the watershed level. In this study, we looked at climatic patterns alongside the lengths of dry and wet periods spanning from 1980 to 2019. Our primary focus was on three key aspects: the extent of FDs, when they begin, and how long they persist. Our research findings demonstrate considerable variations in FDs occurrences across different regions. Notably, in the Southern Hemisphere, FDs are expanding rapidly, developing more swiftly, and enduring for extended periods, closely mirroring shifts in precipitation and temperature patterns. Interestingly, the onset and duration of FDs seem to depend more on the intensity of climatic factors than on how long it's been dry or wet. The expansion of FDs in a region is linked to both the climatic and dry/wet periods, emphasizing the geophysical connectivity within a watershed.

Establishing and modeling the causality relationship of hydro-climatic and land cover change variables with water quality over Lake Tana, Ethiopia

Article

Full-text available

Mar 2024

Lakes, the most widespread inland water bodies in the globe, are highly susceptible to change in trophic state due to external factors. Changing hydro-climatic conditions and land cover changes (LCC) can cause lake water quality deterioration. This study establishes the quantitative relationship between variability in the water quality index and changes in hydro-climatic and LCC variables. Water quality is represented by the Forel-ule index (FUI) whereas the hydro-climatic variables considered in this study are lake bottom layer temperature (lblt), lake total layer temperature (ltlt), precipitation, runoff, evaporation, lake skin temperature (lskt), surface wind speed and air temperature. The LCC is quantified by lower and higher level leaf area index (Lv-lai and Hv-lai). FUI has a positive relationship with surface wind speed, precipitation, runoff, ltlt, lblt, and LCC and a negative relationship with evaporation, lskt, and air temperature with 95% confidence level over most parts of the Lake. The temporal correlation is also apparent from the long-term trend pattern. A significant decreasing trend is observed in FUI and lake bottom layer temperature (lblt). In contrast, an insignificant increasing trend is observed in air tem�perature and lake skin temperature (lskt). The changes in LCC, runoff, precipitation, and surface wind speed is insignificant between 2000 and 2020. Moreover, the phase composites of FUI and hydro-climatic and LCC variables derived from multichannel singular spectrum analysis (MSSA) show strong seasonal modulation of water quality by hydro-climatic and LCC variables. The annual cycle represented by the first two eigenmodes (except wind speed which is represented by the second and third eigenmodes) accounts for between 27.41% (wind speed) to 52.32% (precipitation) of the total joint spatiotemporal variability of FUI and the driving var�iables. 
The convergent cross-mapping (CCM) analysis shows that cross-map skill (ρ2 ) is increased with increasing library length (L) and time delay (τ), which suggests significant causal effects of hydro-climatic and LCC variables on FUI and the lagged causation is consistent with maximum values of ρ2 . The significant feedback of FUI to changes in hydro-climatic and LCC variables shows the possibility of hindcast/forecast of the historical/future status of water quality from hydro-climatic and LCC variables. As a result, a multivariate nonlinear regression model (MNWQFM) is developed to forecast the lake water quality index from the hydro-climatic and LCC var�iables. The model has high performance with R2 of 83.6% and root means square error (RMSE) of 0.15 in FUI.

Drought Analysis and Forecasting in Odisha using Machine Learning Techniques

Article

Jun 2024

Neena Uthaman

Drought is a natural phenomenon that damages agricultural land severely. The severity of drought must be reduced to decrease its impact on agricultural productivity. The study of drought was carried out for the state Odisha which experienced drought 8 times during the last 20 years due to failure of monsoon. Analysis for the data was explored by explorative analysis.The drought forecasting was carried out using machine learning techniques like the Auto-regressive model (AR), Long Short-Term Memory (LSTM), and Auto-regressive Integrated Moving Average (ARIMA) using daily rainfall data collected for 28 years (1993-2020). Further using this data each district was categorised into four different categories namely Flood (FL), No Drought (ND), Moderate Drought (MD), and Severe Drought (SD). To classify the districts after forecasting, classification models were used like Support Vector Classifier (SVC) and Naïve Bayes. The results of the forecasting model as well as the classification model were compared. It becomes important to forecast drought for proper planning and management of the water resource system to decrease the damage due to such calamities. This study is valuable for the government, farmers, and other stakeholders to understand the pattern and reason behind the severity of drought to take relevant precautionary measures and improve decisions and facilities to tackle such natural calamities.

Multidimensional Water Level and Water Quality Response to Severe Drought in Xingyun Lake

Article

Jun 2024

Drought stress has a significant impact on the quality and quantity of lake water. Understanding this impact is crucial for preventing water security risks and pollution recovery. However, there is a lack of systemic understanding of how drought affects water quality and quantity, and how they change in multiple dimensions. This manuscript established a synthesized methodology, with principles to judge its applicability and three steps of application, to detect the change in water quality and water level under severe drought in Xingyun Lake, China. Results show that (1) The water level and water quality of Xingyun Lake have a synchronous and evident response to drought during 2009–2014. The rainfall during 2008–2015 declined by 22.9% relative to normal, and the inundated area and lake water depth in 2012 decreased by 10.50% relative to 2002 and by 1.38 m relative to the average depth, respectively. The pollution index climbed above 1.21 after 2008, fluctuating around 1.42. (2) Under drought, the water quality indicators significantly changed in terms of the overall feature, trend, eigenvalue, and morphological characteristics. The water quality indicators of set2008-2015 are significantly different from set2000-2007 and not in the groups of set1994-2000. The morphological characteristics of water quality indicators in set2008-2015 differ significantly from those in set2000-2007, as shown by the minimum, maximum, median, quartiles, and extreme values. (3) Although NH3–N showed no significant change, the water quality deteriorated in the physical, chemical, and biological aspects. The TP, IMN, and BOD5 changed more evidently than DO and NH3–N. (4) Water quality grade and indicator concentration deteriorated significantly and sharply under severe drought and are threatened deeply by TP and TN. The synthesized methodology is scientifically constructed and can be employed to characterize the response of water quality and water level to severe drought in and out of this research. The intervention time and various regulating measures for pollution degradation and water quality recovery can be constructed based on the multi-dimensional analysis of water quality change under drought evolution.

Generative Adversarial Network for Real‐Time Flash Drought Monitoring: A Deep Learning Study

Article

May 2024
WATER RESOUR RES

Droughts are among the most devastating natural hazards, occurring in all regions with different climate conditions. The impacts of droughts result in significant damages annually around the world. While drought is generally described as a slow‐developing hazardous event, recent studies have revealed a rapidly developing type of drought, the so‐called flash drought. The rapid onset and strong intensity of flash droughts require accurate real‐time monitoring. Addressing this issue, a Generative Adversarial Network (GAN) is developed in this study to monitor flash droughts over the Contiguous United States (CONUS). GAN contains two models: (a) discriminator and (b) generator. The developed architecture in this study employs a Markovian discriminator, which emphasizes the spatial dependencies, with a modified U‐Net generator, tuned for optimal performance. To determine the best loss function for the generator, four different networks are developed with different loss functions, including Mean Absolute Error (MAE), adversarial loss, a combination of adversarial loss with Mean Square Error (MSE), and a combination of adversarial loss with MAE. Utilizing daily datasets collected from NLDAS‐2 and Standardized Soil Moisture Index (SSI) maps, the network is trained for real‐time daily SSI monitoring. Comparative assessments reveal the proposed GAN's superior ability to replicate SSI values over U‐Net and Naïve models. Evaluation metrics further underscore that the developed GAN successfully identifies both fine‐ and coarse‐scale spatial drought patterns and abrupt changes in the SSI temporal patterns that are important for flash drought identification.

Investigation of New Integrated Drought Monitoring Model Taking into Account the Effects of Climate Anomalies

Article

Jun 2024

Drought mechanisms vary markedly within different ecogeographical regions. Existing drought indices do not reflect the impact of climate anomalies on drought. In this study, the climate anomaly index was incorporated into the drought model based on the study of the effects of the El Niño-Southern Oscillation (ENSO) and Madden-Julian Oscillation (MJO) on drought in different eco-geographic zones of China. The model uses climate anomaly indices, meteorological drought indices, vegetation growth condition data, surface temperature, and biophysical attributes as characteristic variables and the Palmer Drought Severity Index (PDSI) as the dependent variable for model construction based on the Random Forest (RF) method. The results showed that the model has high accuracy for drought monitoring. The correlation coefficients between model results and observed drought condition values for all four seasons were above 0.95. The model was applied to drought monitoring in North China and the Huang-Huai-Hai region from 2006 to 2018. The statistical interpolation results of the meteorological drought indices and precipitation data were used to verify the application effect of the model. It was found that the model can accurately monitor drought caused by precipitation scarcity and reflect local variations in drought. This study provides a new model (Climate Anomaly Considering Integrated Surface Drought Index, CAC-ISDI) for drought monitoring that aims at quantitative and detailed monitoring of drought conditions and regional differences. It provides a robust method for accurate drought monitoring and evaluation in China and the rest of the world.

Three-dimensional perspective on the characterization of the spatiotemporal propagation from meteorological to agricultural drought

Article

Jun 2024
AGR FOREST METEOROL

Future projections of meteorological, agricultural and hydrological droughts in China using the emergent constraint

Article

Jun 2024

Examining drinking water quality: analysis of physico-chemical properties and bacterial contamination with health implications for Shangla district, Khyber Pakhtunkhwa, Pakistan

Article

May 2024
ENVIRON GEOCHEM HLTH

A comprehensive understanding of water quality is essential for assessing the complex relationship between surface water and sources of pollution. Primarily, surface water pollution is linked to human and animal waste discharges. This study aimed to investigate the physico-chemical characteristics of drinking water under both dry and wet conditions, assess the extent of bacterial contamination in samples collected from various locations in District Shangla, and evaluate potential health risks associated with consuming contaminated water within local communities. For this purpose, 120 groundwater and surface water samples were randomly collected from various sources such as storage tanks, user sites, streams, ponds and rivers in the study area. The results revealed that in Bisham, lakes had the highest fecal coliform levels among seven tested sources, followed by protected wells, reservoirs, downstream sources, springs, rivers, and ditches; while in Alpuri, nearly 80% of samples from five sources contained fecal coliform bacteria. Similarly, it was observed that the turbidity level, total dissolved solids, electrical conductivity, biological oxygen demand, and dissolved oxygen in the surface drinking water sources of Bisham were significantly higher than those in the surface drinking water sources of Alpuri. Furthermore, the results showed that in the Alpuri region, 14% of the population suffers from dysentery, 27% from diarrhea, 22% from cholera, 13% from hepatitis A, and 16% and 8% from typhoid and kidney problems, respectively, while in the Bisham area, 24% of residents are affected by diarrhea, 17% by cholera and typhoid, 15% by hepatitis A, 14% by dysentery, and 13% by kidney problems. These findings underscore the urgent need for improved water quality management practices and public health interventions to mitigate the risks associated with contaminated drinking water. It is recommended to implement regular water quality monitoring programs, enhance sanitation infrastructure, and raise awareness among local communities about the importance of safe drinking water practices to safeguard public health.

On Robustness of the Explanatory Power of Machine Learning Models

Preprint

Mar 2024

Machine learning (ML) is increasingly considered the solution to environmental problems where only limited or no physico-chemical process understanding is available. But when there is a need to provide support for high-stakes decisions, where the ability to explain possible solutions is key to their acceptability and legitimacy, ML can fall short. Here, we develop a method, rooted in formal sensitivity analysis (SA), that can detect the primary controls on the outputs of ML models. Unlike many common methods for explainable artificial intelligence (XAI), this method can account for complex multi-variate distributional properties of the input-output data, commonly observed with environmental systems. We apply this approach to a suite of ML models that are developed to predict various water quality variables in a pilot-scale experimental pit lake. A critical finding is that subtle alterations in the design of an ML model (such as variations in random seed for initialization, functional class, hyperparameters, or data splitting) can lead to entirely different representational interpretations of the dependence of the outputs on explanatory inputs. Further, models based on different ML families (decision trees, connectionists, or kernels) seem to focus on different aspects of the information provided by data, although displaying similar levels of predictive power. Overall, this underscores the importance of employing ensembles of ML models when explanatory power is sought. Not doing so may compromise the ability of the analysis to deliver robust and reliable predictions, especially when generalizing to conditions beyond the training data.

Transient Anomalous Diffusion and Advective Slowdown of Bedload Tracers by Particle Burial and Exhumation

Article

Oct 2019
WATER RESOUR RES

The process of burial and exhumation of bedload particles within a certain depth of the riverbed leads to vertical exchange of particles, which significantly affects the characteristics of streamwise bedload transport. In this paper, we revisit the classic active layer formulation and extend it by incorporating the burial and exhumation through conceptualizing the fluctuations of bed surface as the relative vertical movement of buried tracer particles in the substrate layer (i.e., we change the static reference system to the fluctuating riverbed surface). We theoretically demonstrate, for the first time, the emergence of the transient anomalous (both super- and sub-) diffusion and power-law advective slowdown at the intermediate timescales, which are induced by the non-equilibrium transport as characterized by the inhomogeneous vertical mixing of tracers due to particle burial and exhumation. Neglecting the ballistic regime at extremely short times, the transport regimes show normal diffusion at small and large timescales. This result further implies that for the most typical fluvial riverbed with finite vertical exchange depth (i.e., non-aggrading or -degrading bed), the sub-diffusion of bedload tracers for large timescale transport may still be transient, which will eventually converge to the normal diffusion as time increases. Comparing the obtained analytical solutions with available numerical results as well as field observations, we show that the proposed formulation can capture well anomalous diffusion and the power-law slowdown of the advective velocity of bedload tracers at intermediate timescales, and more importantly the transition from anomalous to normal diffusion at large timescales.

Modeling Depth of the Redox Interface at High Resolution at National Scale Using Random Forest and Residual Gaussian Simulation

Article

Feb 2019
WATER RESOUR RES

The management of water resources needs robust methods to efficiently reduce nitrate loads. Knowledge on where natural denitrification takes place in the subsurface is thereby essential. Nitrate is naturally reduced in anoxic environments and high-resolution information of the redox interface, that is, the depth of the uppermost reduced zone is crucial to understand the variability of the denitrification potential. In this study we explore the opportunity to use random forest (RF) regression to model redox depth across Denmark at 100-m resolution based on ~13,000 boreholes as training data. We highlight the importance of expert knowledge to guide the RF model in areas where our conceptual understanding is not represented correctly in the training data set by addition of artificial observations. We apply random forest regression kriging in which sequential Gaussian simulation models the RF residuals. The RF model reaches an R² score of 0.48 for an independent validation test. Including sequential Gaussian simulation honors observations through local conditioning, and the spread of 800 realizations can be utilized to map uncertainty. Emphasis is put on adequate handling of nonstationarities in variance and spatial correlation of the RF residuals. The RF residuals show no spatial correlation for large parts of the modeling domain, and a local variance scaling method is applied to account for the nonstationary variance. Moreover, we present and exemplify a framework where newly acquired field data can easily be integrated into random forest regression kriging to quickly update local models.

Experimental Censorship of Bed Load Particle Motions and Bias Correction of the Associated Frequency Distributions

Article

Jan 2019

Knowledge of the statistical distributions of particle hop properties (distances, travel, and rest times) enables a deeper understanding of bed load sediment transport. However, the measurement of particle hops is prone to censorship: Since many hops cross the boundaries of a spatial‐temporal observation window, one knows that they exist but does not know how long they are. An option is to build particle hop samples considering only the hops that are completely observed and excluding (censoring) those observed only partially. Such a choice, however, biases the frequency distributions of the hop properties. Moreover, censorship acts in both space and time, and a hop censored in time will also not contribute to a sample of hop lengths, and vice versa. Time censorship similarly applies to particle rest times. This paper presents a theoretical formulation of censorship that leads to nonparametric bias corrections recovering estimates of values of the underlying distributions of hop distance, travel time, and rest time up to sampling window dimensions. We illustrate the occurrence and consequences of experimental censorship, and the benefit of applying the bias corrections, for both synthetic and laboratory samples of particle hops. The corrections reasonably recover the relative proportions of frequency distributions represented by the data up to the sampling dimensions and improve the estimates of the first two moments of particle hop properties. Recommendations are given regarding how the size of an observation window may be chosen to reduce the bias to below some prescribed value, if the forms of the underlying distributions are known.

Legacy, Rather Than Adequacy, Drives the Selection of Hydrological Models

Article

Jan 2019
WATER RESOUR RES

The findings of hydrological modeling studies depend on which model was used. Although hydrological model selection is a crucial step, experience suggests that hydrologists tend to stick to the model they have experience with, and rarely switch to competing models, although these models might be more adequate given the study objectives. To gain quantitative insights into model selection, we explored the use of seven rainfall-runoff models based on the abstracts of 1,529 peer-reviewed papers published between 1991 and 2018. The models selected were the Hydrologiska Byråns Vattenbalansavdelning model (HBV), the Variable Infiltration Capacity model (VIC), the mesoscale Hydrological model (mHM), the TOPography-based hydrologic model (TOPMODEL), the Precipitation Runoff Modelling System (PRMS), the Génie Rural model à 4 paramètres Journaliers (GR4J), and the Sacramento soil moisture accounting model. We provide quantitative evidence of regional preferences in model use across the world and demonstrate that specific models are consistently preferred by certain institutes. Model attachment is particularly strong. In ~74% of the studies, the model selected can be predicted solely based on the affiliation of the first author. The influence of adequacy on the model selection process is less clear. Our data reveal that each model is used across a wide range of purposes, landscapes, and temporal and spatial scales (i.e., as a model of everything and everywhere). Model intercomparisons can provide guidance for model selection and improve model adequacy, but they are still rare (because each model must usually be set up individually) and the insights they provide are currently limited (because they are rarely controlled experiments). We suggest that moving from fixed-structure models to modular modeling frameworks (master templates for model generation) can overcome these issues, enable a more collaborative and responsive model development environment, and result in improved model adequacy.

A Ranking of Hydrological Signatures Based on Their Predictability in Space

Article

Nov 2018

Hydrological signatures are now used for a wide range of purposes, including catchment classification, process exploration and hydrological model calibration. The recent boost in the popularity and number of signatures has however not been accompanied by the development of clear guidance on signature selection, meaning that signature selection is often arbitrary. Here we use three complementary approaches to compare and rank 15 commonly-used signatures, which we evaluate in 671 US catchments from the CAMELS data set (Catchment Attributes and MEteorology for Large-sample Studies). Firstly, we employ machine learning (random forests) to explore how attributes characterizing the climatic conditions, topography, land cover, soil and geology influence (or not) the signatures. Secondly, we use a conceptual hydrological model (Sacramento) to critically assess which signatures are well captured by the simulations. Thirdly, we take advantage of the large sample of CAMELS catchments to characterize the spatial smoothness (using Moran's I) of the signature field. These three approaches lead to remarkably similar rankings of the signatures. We show that signatures with the noisiest spatial pattern tend to be poorly captured by hydrological simulations, that their relationships to catchment attributes are elusive (in particular they are not correlated to climatic indices like aridity) and that they are particularly sensitive to discharge uncertainties. We question the utility and reliability of those signatures in experimental and modeling hydrological studies, and we underscore the general importance of accounting for uncertainties in hydrological signatures.
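The spatial smoothness measure named in this abstract, Moran's I, can be illustrated with a minimal sketch. The weighting scheme below (binary adjacency between neighbouring sites on a line) and the toy values are assumptions chosen for illustration; the abstract does not state which spatial weights the authors used.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of all spatial weights (normalising constant).
    w_total = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between every pair of sites.
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / w_total) * cross / variance_sum


# Hypothetical layout: four sites in a row, each linked to its direct neighbours.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth_field = [1.0, 2.0, 3.0, 4.0]  # values change gradually along the row
noisy_field = [1.0, 4.0, 1.0, 4.0]   # values alternate between neighbours

i_smooth = morans_i(smooth_field, adjacency)
i_noisy = morans_i(noisy_field, adjacency)
print(i_smooth, i_noisy)
```

Under these assumptions the gradually varying field gives a positive index and the alternating field a negative one, which is the sense in which a "noisy" spatial pattern scores lower.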

Constraining Conceptual Hydrological Models With Multiple Information Sources

Article

Sep 2018

The calibration of hydrological models without streamflow observations is problematic, and the simultaneous, combined use of remotely sensed products for this purpose has not been exhaustively tested thus far. Our hypothesis is that the combined use of products can 1) reduce the parameter search space and 2) improve the representation of internal model dynamics and hydrological signatures. Five different conceptual hydrological models were applied to 27 catchments across Europe. A parameter selection process, similar to a likelihood weighting procedure, was applied for 1023 possible combinations of ten different data sources, ranging from using 1 to all 10 of these products. Distances between the two empirical distributions of model performance metrics with and without using a specific product, were determined to assess the added value of a specific product. In a similar way, the performance of the models to reproduce 27 hydrological signatures was evaluated relative to the unconstrained model. Significant reductions in the parameter space were obtained when combinations included AMSR‐E and ASCAT soil moisture, GRACE total water storage anomalies, as well as, in snow dominated catchments, the MODIS snow cover products. The evaporation products of LSA‐SAF and MOD16 were less effective for deriving meaningful, well constrained posterior parameter distributions. The hydrological signature analysis indicated that most models profited from constraining with an increasing number of data sources. Concluding, constraining models with multiple data sources simultaneously was shown to be valuable for at least four of the five hydrological models to determine model parameters in absence of streamflow.
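The figure of 1023 combinations follows from counting every non-empty subset of ten data sources (2^10 − 1). A minimal sketch of that enumeration, using placeholder labels because the abstract does not list all ten products by name:

```python
from itertools import combinations

# Hypothetical placeholder labels; the abstract names only some of the ten products.
sources = [f"product_{i}" for i in range(1, 11)]

# Every non-empty subset, from a single product up to all ten together.
subsets = [
    combo
    for size in range(1, len(sources) + 1)
    for combo in combinations(sources, size)
]

n_subsets = len(subsets)
print(n_subsets)  # 2**10 - 1 = 1023
```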

Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables

Article

Aug 2018

Random forest and similar Machine Learning techniques are already used to generate spatial predictions, but spatial location of points (geography) is often ignored in the modeling process. Spatial auto-correlation, especially if still existent in the cross-validation residuals, indicates that the predictions may be biased, and this is suboptimal. This paper presents a random forest for spatial predictions framework (RFsp) where buffer distances from observation points are used as explanatory variables, thus incorporating geographical proximity effects into the prediction process. The RFsp framework is illustrated with examples that use textbook datasets and apply spatial and spatio-temporal prediction to numeric, binary, categorical, multivariate and spatiotemporal variables. Performance of the RFsp framework is compared with the state-of-the-art kriging techniques using fivefold cross-validation with refitting. The results show that RFsp can obtain equally accurate and unbiased predictions as different versions of kriging. Advantages of using RFsp over kriging are that it needs no rigid statistical assumptions about the distribution and stationarity of the target variable, it is more flexible towards incorporating, combining and extending covariates of different types, and it possibly yields more informative maps characterizing the prediction error. RFsp appears to be especially attractive for building multivariate spatial prediction models that can be used as “knowledge engines” in various geoscience fields. Some disadvantages of RFsp are the exponentially growing computational intensity with increase of calibration data and covariates and the high sensitivity of predictions to input data quality. The key to the success of the RFsp framework might be the training data quality, especially quality of spatial sampling (to minimize extrapolation problems and any type of bias in data), and quality of model validation (to ensure that accuracy is not affected by overfitting). For many data sets, especially those with lower number of points and covariates and close-to-linear relationships, model-based geostatistics can still lead to more accurate predictions than RFsp.

Interpretability of Machine Learning Models and Representations: an Introduction

Conference Paper

Apr 2016

Interpretability is often a major concern in machine learning. Although many authors agree with this statement, interpretability is often tackled with intuitive arguments, distinct (yet related) terms and heuristic quantifications. This short survey aims to clarify the concepts related to interpretability and emphasises the distinction between interpreting models and representations, as well as heuristic-based and user-based approaches.

Interpreting and Using Regression

Book

Jan 1982

Christopher Achen

An Introduction to the Bootstrap

Book

May 1994
