Chapter

Application of a Machine Learning Technique for Developing Short-Term Flood and Drought Forecasting Models in Tropical Mountainous Catchments


Abstract

Floods and droughts are among the most common natural hazards worldwide. They produce major impacts on society, the economy, and ecosystems. Worse still, the frequency and severity of hydrological extremes are expected to increase with climate change and land-use alteration. As a countermeasure, the implementation of flood and drought forecasting models has, over the last decades, become an emerging field of research for water management and risk assessment worldwide. In mountainous areas, forecasting hydrological extremes is unfortunately more challenging, since information other than precipitation and runoff is rarely available owing to budget constraints, the remoteness of the study areas, and the extreme spatio-temporal variability of additional driving forces. This is especially true for the tropical Andes in South America, the longest and widest cool region in the tropics. Recent advances in computational science, coupled with long-term data availability, have boosted Machine Learning (ML) applications. Among the variety of ML techniques, the Random Forest (RF) algorithm is a promising choice due to its simplicity, robustness, and capacity to deal with complex data structures. We used a step-wise methodology to develop short-term flood and drought forecasting models for several lead times (4, 8, 12 and 24 h) for two catchments representative of the Ecuadorian Andes. We found that the derived models can reach maximum validation performances (Nash–Sutcliffe efficiency, NSE) from 0.860 (4-h) to 0.545 (24-h) for optimal inputs composed only of features accounting for 80% of the variance of the model's outcome. Moreover, we found that a set of RF hyper-parameters can be transferred to a comparable catchment with a maximum model performance reduction of 0.10 (NSE). Overall, the forecasting of hydrological extremes (especially floods) remains challenging, mainly due to the lack of relevant data (driving forces) and of sufficient extreme events from which RF models can learn. This study can assist flood and drought management authorities in evaluating hazard risks, and lays the basis for developing integrated action plans from a local and regional perspective.
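In outline, the setup the abstract describes lends itself to a short sketch: lagged precipitation and runoff features, one RF model per lead time, and NSE as the validation score. The code below is a hedged reconstruction, not the authors' implementation; the column names, synthetic data, and 70/30 split are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def make_lagged_features(df, lags, lead):
    """Build X from lagged predictors and y as runoff `lead` hours ahead."""
    X = pd.concat(
        {f"{col}_t-{k}": df[col].shift(k) for col in df.columns for k in lags},
        axis=1,
    )
    y = df["runoff"].shift(-lead)
    data = pd.concat([X, y.rename("target")], axis=1).dropna()
    return data.drop(columns="target"), data["target"]

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of the observations."""
    obs, sim = np.asarray(obs), np.asarray(sim)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Hypothetical hourly series; a real application would load station data here.
rng = np.random.default_rng(0)
df = pd.DataFrame({"precip": rng.gamma(0.3, 2.0, 2000)})
df["runoff"] = df["precip"].rolling(24, min_periods=1).mean()

X, y = make_lagged_features(df, lags=range(1, 25), lead=4)  # 4-h lead time
split = int(0.7 * len(X))                                   # chronological split
rf = RandomForestRegressor(n_estimators=500, random_state=42)
rf.fit(X.iloc[:split], y.iloc[:split])
print("validation NSE:", nse(y.iloc[split:], rf.predict(X.iloc[split:])))
```

Repeating the fit with `lead` set to 8, 12 and 24 reproduces the multi-horizon design; performance should degrade with lead time, as the reported NSE values do.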


... Approaches vary depending on the number of variables considered, the type of learning model used, and the response variable expected to determine risks. In terms of lead times, the shortest forecast horizon is 4 hours (Muñoz et al. 2021), and predictions of 1 to 3 months are the most popular (Tadesse et al. 2014; Adede et al. 2019). ...
... Short-term multivariate predictive modelling hours to days in advance can now be established (Anshuka et al. 2019; Dikshit et al. 2022). The shortest forecast horizon is 4 hours (Muñoz et al. 2021), and predictions of 1–3 months are the most popular (Tadesse et al. 2014; Adede et al. 2019). However, despite the significant results achieved in recent years in predictive drought modelling, no approach has yet been proposed for predictive modelling of the post-drought recovery phase. ...
Article
Full-text available
This article reviews the main recent applications of multi-sensor remote sensing and Artificial Intelligence techniques in the multivariate modelling of agricultural drought. The study focused mainly on three fundamental aspects, namely descriptive modelling, predictive modelling, and spatial modelling of expected risks and vulnerability to drought. Out of 417 articles across all studies on drought, 226 articles published from 2010 to 2022 met the inclusion criteria and were analyzed to provide a global overview of the current state of knowledge on multivariate drought modelling. The main objective is to review the recent available scientific evidence regarding multivariate drought modelling based on the joint use of geospatial technologies and artificial intelligence. The analysis focused on the different methods used, the choice of algorithms, and the most relevant variables, depending on whether the models are descriptive or predictive. Criteria such as the skill score, the complexity of the dataset used, and the nature of the validation data were considered to draw the main conclusions. The results highlight the very heterogeneous and original nature of studies on multivariate modelling of agricultural drought in the recent literature. For future studies, in addition to scientific advances in prospects, case studies and comparative studies appear necessary for an in-depth analysis of the reproducibility and operational applicability of the different approaches proposed for spatial and temporal modelling of agricultural drought. HIGHLIGHTS The components and fundamentals of multivariate modelling of agricultural drought are discussed. The importance of hybrid artificial intelligence models in improving the performance of traditional machine learning models is widely discussed. Quantum machine learning algorithms are weakly explored in multivariate drought modelling; future studies should explore this approach. The major challenge of multivariate modelling of drought frequency is mainly related to the difference in the return periods of the different variables (time-shifted and spatial effects).
... min_samples_leaf sets the minimum number of samples required at a leaf node, so a higher min_samples_leaf reduces the depth of the tree and helps control overfitting. [34] On the other hand, the max_features parameter controls how many features are considered when deciding the best split, so setting a smaller value for max_features can restrict overfitting. [35] For XGBoost, gamma, learning_rate, and min_child_weight were added to the n_estimators and max_depth parameters discussed above. ...
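A toy illustration of the two regularizing parameters, using scikit-learn on synthetic data (this is not the cited study's code; the dataset and parameter values are arbitrary):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Larger min_samples_leaf -> shallower trees; smaller max_features -> fewer
# candidate features per split. Both act as regularizers that shrink the
# train/test performance gap.
for leaf, feats in [(1, None), (10, "sqrt")]:
    rf = RandomForestRegressor(
        n_estimators=200, min_samples_leaf=leaf, max_features=feats, random_state=0
    ).fit(X_tr, y_tr)
    print(leaf, feats, round(rf.score(X_tr, y_tr), 3), round(rf.score(X_te, y_te), 3))
```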
Preprint
Full-text available
Obesity is a growing health problem in the United States. Studies suggest that the prevalence of obesity differs significantly across regions and socioeconomic factors. In this study, 3,220 counties in the U.S. were categorized into three groups based on the prevalence of adult obesity. Further, Random Forest and Extreme Gradient Boosting (XGBoost) models were fitted to the dataset to predict the obesity rates of the respective counties. XGBoost achieved the higher scores, with an accuracy of 55.84% and an F1 score of 56.23%. The proportion of physically inactive people was the most important feature in predicting adult obesity rates.
Article
Climate change, coupled with adaptive human actions, will affect water resource access and availability for environmental requirements. Hydrological modelling is an effective strategy for forecasting climate and human impacts on water resources. Modelling tools are advised for sparse-data mountainous basins intended to supply densely populated urban areas in the future. In this context, the SWAT model is used to evaluate the impact of climate change on renewable groundwater resources in the Pita River basin (PRB), a representative area of the andosol-dominated páramo ecosystem in the Andean highlands projected to meet future water demand in the metropolitan district of Quito (MDQ), Ecuador. Based on data availability, a SWAT model is configured for the PRB over the period 2006–2015, and five regional climate models (RCMs) based on two Representative Concentration Pathway (RCP) emission scenarios (4.5 and 8.5) for the mid- (2040–2069) and long-term (2070–2099) future horizons are implemented. The climate scenarios indicate increases in average temperature and precipitation of +2 °C and +3 % in the mid-term and +4.5 °C and +20 % in the long term, respectively. All the RCMs indicate less aquifer recharge in the mid-term. However, this pattern is softened, and even reversed, in several scenarios in the long term. Seasonal differences in streamflow and aquifer recharge relative to the baseline scenario are predicted to increase to +23 %. The natural hydrological regime, determined by thick porous allophane-rich andosols over virtually full moderate-permeability volcanic and volcano-sedimentary aquifers, induces high streamflow and low aquifer recharge rates. The future hydrological regime could place the highly sensitive soil–vegetation dynamics of the páramo ecosystem at risk of degradation, with negative consequences for habitat preservation in general and stream water provision in particular. Hence, groundwater is the safest option for water provision in the future.
Preprint
Full-text available
Extreme runoff modeling is hindered by the lack of sufficient and relevant ground information and the low reliability of physically-based models. The authors propose to combine precipitation Remote Sensing (RS) products, Machine Learning (ML) modeling, and hydrometeorological knowledge to improve extreme runoff modeling. The approach applied to improve the representation of precipitation is object-based Connected Component Analysis (CCA), a method that enables classifying and associating precipitation with extreme runoff events. Random Forest (RF) is employed as the ML model. We used 2.5 years of near-real-time hourly RS precipitation from the PERSIANN-CCS and IMERG-early run databases (spatial resolutions of 0.04° and 0.1°, respectively), and runoff at the outlet of a 3391 km² basin located in the tropical Andes of Ecuador. The developed models show the ability to simulate extreme runoff for long-duration precipitation events regardless of their spatial extent, obtaining Nash-Sutcliffe efficiencies (NSE) above 0.72. On the contrary, we found unacceptable model performance for the combination of short-duration and spatially extensive precipitation events. The strengths/weaknesses of the developed ML models are attributed to their ability/difficulty to represent complex precipitation-runoff responses.
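The object-based CCA step can be sketched with standard image-labelling tools. Below, scipy.ndimage.label extracts contiguous rain objects from a synthetic precipitation scene; the rain/no-rain threshold and the field itself are assumptions, not the preprint's configuration.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)
precip = rng.gamma(0.4, 3.0, size=(100, 100))   # one gridded precipitation scene (mm/h)
wet = precip > 1.0                               # rain / no-rain threshold (assumed)

labels, n_objects = ndimage.label(wet)           # connected rain "objects"
idx = np.arange(1, n_objects + 1)
sizes = ndimage.sum_labels(wet, labels, index=idx)   # object extent (pixels)
mean_rate = ndimage.mean(precip, labels, index=idx)  # object mean intensity

# Such object descriptors (extent, intensity, persistence across scenes) can
# then feed the RF model as event-level features.
print(n_objects, int(sizes.max()), round(float(mean_rate.max()), 2))
```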
Article
Full-text available
Floods are among the most destructive natural disasters, and they are highly complex to model. Research on the advancement of flood prediction models has contributed to risk reduction, policy suggestion, minimization of the loss of human life, and reduction of the property damage associated with floods. To mimic the complex mathematical expressions of the physical processes of floods, machine learning (ML) methods have, during the past two decades, contributed greatly to the advancement of prediction systems, providing better performance and cost-effective solutions. Due to the vast benefits and potential of ML, its popularity has dramatically increased among hydrologists. Researchers, by introducing novel ML methods and hybridizing existing ones, aim at discovering more accurate and efficient prediction models. The main contribution of this paper is to demonstrate the state of the art of ML models in flood prediction and to give insight into the most suitable models. In this paper, the literature where ML models were benchmarked through a qualitative analysis of robustness, accuracy, effectiveness, and speed is investigated in particular, to provide an extensive overview of the various ML algorithms used in the field. The performance comparison of ML models presents an in-depth understanding of the different techniques within the framework of a comprehensive evaluation and discussion. As a result, this paper introduces the most promising prediction methods for both long-term and short-term floods. Furthermore, the major trends in improving the quality of flood prediction models are investigated. Among them, hybridization, data decomposition, algorithm ensembles, and model optimization are reported as the most effective strategies for the improvement of ML methods. This survey can be used as a guideline for hydrologists as well as climate scientists in choosing the proper ML method according to the prediction task.
Article
Full-text available
Flash-flood forecasting has emerged worldwide due to the catastrophic socio-economic impacts this hazard might cause and the expected increase of its frequency in the future. In mountain catchments, precipitation-runoff forecasts are limited by the intrinsic complexity of the processes involved, particularly the high rainfall variability. While process-based models are hard to implement, there is a potential to use the random forest algorithm due to its simplicity, robustness and capacity to deal with complex data structures. Here a step-wise methodology is proposed to derive parsimonious models accounting for both the hydrological functioning of the catchment (e.g., input data, representation of antecedent moisture conditions) and random forest procedures (e.g., sensitivity analyses, dimension reduction, optimal input composition). The methodology was applied to develop short-term prediction models of varying time duration (4, 8, 12, 18 and 24 h) for a catchment representative of the Ecuadorian Andes. Results show that the derived parsimonious models can reach validation efficiencies (Nash-Sutcliffe coefficient) from 0.761 (4-h) to 0.384 (24-h) for optimal inputs composed only of features accounting for 80% of the variance of the model's outcome. Improvement in the prediction of extreme peak flows was demonstrated (extreme value analysis) by including precipitation information, in contrast to the use of purely autoregressive models.
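The "optimal input composition" step invites a short sketch: rank inputs by random-forest importance and keep the smallest set covering 80% of the importance mass. This is a plausible reading of the procedure, not the authors' code, and the data below are synthetic.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=30, n_informative=5, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

# Sort features by decreasing importance and keep those covering 80% of it.
order = np.argsort(rf.feature_importances_)[::-1]
cum = np.cumsum(rf.feature_importances_[order])
keep = order[: np.searchsorted(cum, 0.80) + 1]
print(f"kept {len(keep)} of {X.shape[1]} features")

# Refit the parsimonious model on the reduced input set.
rf_parsimonious = RandomForestRegressor(n_estimators=300, random_state=0)
rf_parsimonious.fit(X[:, keep], y)
```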
Article
Full-text available
The random forest (RF) algorithm has several hyperparameters that have to be set by the user, for example, the number of observations drawn randomly for each tree and whether they are drawn with or without replacement, the number of variables drawn randomly for each split, the splitting rule, the minimum number of samples that a node must contain, and the number of trees. In this paper, we first provide a literature review on the parameters' influence on the prediction performance and on variable importance measures. It is well known that in most cases RF works reasonably well with the default values of the hyperparameters specified in software packages. Nevertheless, tuning the hyperparameters can improve the performance of RF. In the second part of this paper, after presenting a brief overview of tuning strategies, we demonstrate the application of one of the most established tuning strategies, model-based optimization (MBO). To make it easier to use, we provide the tuneRanger R package that tunes RF with MBO automatically. In a benchmark study on several datasets, we compare the prediction performance and runtime of tuneRanger with other tuning implementations in R and with RF using default hyperparameters.
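tuneRanger itself is an R package built on model-based optimization. As a rough Python stand-in (plain cross-validated random search rather than MBO, on synthetic data), tuning the same hyperparameter targets looks like this:

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=500, n_features=15, noise=5.0, random_state=0)

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={
        "n_estimators": randint(100, 1000),   # number of trees
        "max_features": uniform(0.1, 0.9),    # fraction of variables per split
        "min_samples_leaf": randint(1, 20),   # minimum node size
        "bootstrap": [True, False],           # draw with / without replacement
    },
    n_iter=25, cv=5, random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```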
Article
Full-text available
The Mediterranean area is prone to intense rainfall events triggering flash floods, characterized by very short response times that sometimes lead to dramatic consequences in terms of casualties and damages. These events can affect large territories, but their impact may be very local in catchments that are generally ungauged. They remain difficult to predict, and the processes leading to their generation still need to be clarified. The HyMeX initiative (Hydrological Cycle in the Mediterranean Experiment, 2010–2020) aims at increasing our understanding of the water cycle in the Mediterranean basin, in particular in terms of extreme events. In order to better understand the processes leading to flash floods, a four-year experiment (2012–2015) was conducted in the Cévennes region (south-east France) as part of the FloodScale project. Both continuous and opportunistic measurements during floods were conducted in two large catchments (Ardèche and Gard rivers), with nested instrumentation from the hillslopes to catchments of about 1, 10, 100 and 1000 km², covering contrasting geology and land use. Continuous measurements include distributed rainfall, stream water level, discharge, water temperature and conductivity, and soil moisture. Opportunistic measurements include surface soil moisture and geochemistry sampling during events, and gauging of floods using non-contact methods: portable radars to measure surface water velocity, or image sequence analysis using LS-PIV (Large Scale Particle Image Velocimetry). During the period 2012–2014, and in particular during autumn 2014, several intense events affected the catchments and provided very rich data sets. Data collection was complemented by a modelling activity aimed at simulating the observed processes. The modelling strategy was set up across a wide range of scales, in order to test hypotheses about physical processes at the smallest scales, and aggregated functioning hypotheses at the largest scales. During the project, a focus was also put on the improvement of rainfall field characterization, both in terms of spatial and temporal variability and in terms of uncertainty quantification. Rainfall reanalyses combining radar and rain gauges were developed. Rainfall simulation using a stochastic generator was also performed. Another effort was dedicated to the improvement of discharge estimation during floods and the quantification of streamflow uncertainties using Bayesian techniques. The paper summarizes the main results gained from the observations and the subsequent modelling activity in terms of flash flood process understanding at the various scales. It concludes on how the newly acquired knowledge can be used for the prevention and management of flash floods.
Article
Full-text available
Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and (iii) allows one to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analysed on two real-world case studies – Marina catchment (Singapore) and Canning River (Western Australia) – representing two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirements when adopted on large datasets. In addition, the ranking of the input variables provided can be given a physically meaningful interpretation.
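For concreteness, a minimal sketch of Extra-Trees with the variable-importance ranking the paper highlights, using the scikit-learn implementation on synthetic data (the cited study is not tied to this exact API):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

X, y = make_regression(n_samples=800, n_features=10, n_informative=3, random_state=0)
et = ExtraTreesRegressor(n_estimators=300, random_state=0).fit(X, y)

# Rank input variables by importance; in a streamflow setting these ranks are
# what might be given a physical interpretation.
for rank, i in enumerate(et.feature_importances_.argsort()[::-1][:5], start=1):
    print(f"{rank}. feature {i}: importance {et.feature_importances_[i]:.3f}")
```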
Article
Full-text available
The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their predictions by averaging, has shown excellent performance in settings where the number of variables is much larger than the number of observations. Moreover, it is versatile enough to be applied to large-scale problems, is easily adapted to various ad-hoc learning tasks, and returns measures of variable importance. The present article reviews the most recent theoretical and methodological developments for random forests. Emphasis is placed on the mathematical forces driving the algorithm, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures. This review is intended to provide non-experts easy access to the main ideas.
Article
Full-text available
The present study aims to investigate the potential of the random forests ensemble classification and regression technique to improve rainfall rate assignment during day, night and twilight (resulting in 24-hour precipitation estimates) based on cloud physical properties retrieved from Meteosat Second Generation (MSG) Spinning Enhanced Visible and InfraRed Imager (SEVIRI) data. Random forests (RF) models contain a combination of characteristics that make them well suited for application in precipitation remote sensing. One of the key advantages is the ability to capture non-linear associations between predictors and response, which becomes important when dealing with complex non-linear events like precipitation. Due to the deficiencies of existing optical rainfall retrievals, the focus of this study is on assigning rainfall rates to precipitating cloud areas in connection with extra-tropical cyclones in mid-latitudes, including both convective and advective-stratiform precipitating cloud areas. Hence, the rainfall rates are assigned to rain areas previously identified and classified according to the precipitation formation processes. Water vapour-IR differences and IR cloud top temperature are used as predictor variables to incorporate information on cloud top height. ΔT8.7–10.8 and ΔT10.8–12.1 are considered to supply information about the cloud phase. Furthermore, spectral SEVIRI channels (VIS0.6, VIS0.8, NIR1.6) and cloud properties (cloud effective radius, cloud optical thickness) are used to include information about the cloud water path during daytime, while suitable combinations of temperature differences (ΔT3.9–10.8, ΔT3.9–7.3) are considered during night-time. The development of the rainfall rate retrieval technique is realised in three steps. First, an extensive tuning study is carried out to customise each of the RF models. The daytime, night-time and twilight precipitation events have to be treated separately due to the differing information content about the cloud properties between the different times of day. Secondly, the RF models are trained using the optimum values for the number of trees and the number of randomly chosen predictor variables found in the tuning study. Finally, the final RF models are used to predict rainfall rates on an independent validation data set, and the results are validated against co-located rainfall rates observed by a ground radar network. To train and validate the model, the radar-based RADOLAN RW product from the German Weather Service (DWD) is used, which provides area-wide gauge-adjusted hourly precipitation information. Regarding the overall performance, as indicated by the coefficient of determination (Rsq), hourly rainfall rates already show a good correlation, with Rsq = 0.5 (day and night) and Rsq = 0.48 (twilight) between the satellite- and radar-based observations. Higher temporal aggregation leads to better agreement: Rsq rises to 0.78 (day), 0.77 (night) and 0.75 (twilight) for 8-h intervals. Comparing day, night and twilight performance, it becomes evident that daytime precipitation is generally predicted best by the model. Twilight and night-time predictions are generally less accurate, but only by a small margin. This may be due to the smaller number of predictor variables during twilight and night-time conditions, as well as less favourable radiative transfer conditions for obtaining the cloud parameters during these periods.
However, the results show that with the newly developed method it is possible to assign rainfall rates with good accuracy even on an hourly basis. Furthermore, rainfall rates can be assigned during day, night and twilight conditions, which enables their estimation 24 h a day. Keywords: Rainfall rate; Rainfall retrieval; Random forests; Machine learning; MSG SEVIRI; Geostationary satellites; Optical sensors
Article
Full-text available
This review considers the application of artificial neural networks (ANNs) to rainfall-runoff modelling and flood forecasting. This is an emerging field of research, characterized by a wide variety of techniques, a diversity of geographical contexts, a general absence of intermodel comparisons, and inconsistent reporting of model skill. This article begins by outlining the basic principles of ANN modelling, common network architectures and training algorithms. The discussion then addresses related themes of the division and preprocessing of data for model calibration/validation; data standardization techniques; and methods of evaluating ANN model performance. A literature survey underlines the need for clear guidance in current modelling practice, as well as the comparison of ANN methods with more conventional statistical models. Accordingly, a template is proposed in order to assist the construction of future ANN rainfall-runoff models. Finally, it is suggested that research might focus on the extraction of hydrological 'rules' from ANN weights, and on the development of standard performance measures that penalize unnecessary model complexity.
Chapter
Full-text available
The increasing availability of large amounts of historical data and the need of performing accurate forecasting of future behavior in several scientific and applied domains demands the definition of robust and efficient techniques able to infer from observations the stochastic dependency between past and future. The forecasting domain has been influenced, from the 1960s on, by linear statistical methods such as ARIMA models. More recently, machine learning models have drawn attention and have established themselves as serious contenders to classical statistical models in the forecasting community. This chapter presents an overview of machine learning techniques in time series forecasting by focusing on three aspects: the formalization of one-step forecasting problems as supervised learning tasks, the discussion of local learning techniques as an effective tool for dealing with temporal data and the role of the forecasting strategy when we move from one-step to multiple-step forecasting.
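The chapter's two central ideas, recasting one-step forecasting as a supervised learning task and choosing a multiple-step strategy, can be sketched as follows. The series is synthetic, and the random forest learner is an arbitrary choice for illustration; the embedding and the recursive vs. direct strategies are what the code shows.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def embed(series, d):
    """Supervised set: rows [y_{t-d+1}, ..., y_t], targets y_{t+1}."""
    X = np.column_stack([series[i : len(series) - d + i] for i in range(d)])
    return X, series[d:]

rng = np.random.default_rng(0)
y = np.sin(np.arange(600) / 10) + 0.1 * rng.standard_normal(600)
X, t = embed(y, d=12)

# Recursive strategy: one one-step model, fed its own predictions h times.
one_step = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, t)
window = list(y[-12:])
for _ in range(4):                                  # h = 4 steps ahead
    window.append(one_step.predict([window[-12:]])[0])
print("recursive h=4:", window[-1])

# Direct strategy: a separate model trained to predict y_{t+h} directly.
h = 4
direct = RandomForestRegressor(n_estimators=100, random_state=0)
direct.fit(X[: -(h - 1)], y[12 + h - 1 :])          # align windows with y_{t+h}
print("direct h=4:", direct.predict([y[-12:]])[0])
```

Recursive forecasting reuses one model but accumulates its own errors; the direct strategy trains one model per horizon at a higher training cost, which is the trade-off the chapter discusses.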
Article
Full-text available
This study provides a step-wise analysis of a conceptual grid-based distributed rainfall-runoff model, the United States National Weather Service (US NWS) Hydrology Laboratory Research Distributed Hydrologic Model (HL-RDHM). It evaluates model parameter sensitivities for annual, monthly, and event time periods with the intent of elucidating the key parameters impacting the distributed model's forecasts. This study demonstrates a methodology that balances the computational constraints posed by global sensitivity analysis with the need to fully characterize the HL-RDHM's sensitivities. The HL-RDHM's sensitivities were assessed for annual and monthly periods using distributed forcing and identical model parameters for all grid cells at 24-hour and 1-hour model time steps respectively for two case study watersheds within the Juniata River Basin in central Pennsylvania. This study also provides detailed spatial analysis of the HL-RDHM's sensitivities for two flood events based on 1-hour model time steps selected to demonstrate how strongly the spatial heterogeneity of forcing influences the model's spatial sensitivities. Our verification analysis of the sensitivity analysis method demonstrates that the method provides robust sensitivity rankings and that these rankings could be used to significantly reduce the number of parameters that should be considered when calibrating the HL-RDHM. Overall, the sensitivity analysis results reveal that storage variation, spatial trends in forcing, and cell proximity to the gauged watershed outlet are the three primary factors that control the HL-RDHM's behavior.
Article
Full-text available
Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
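A minimal end-to-end usage sketch of the module on a bundled dataset (note the download URL quoted in the abstract is historical; the project is now hosted at scikit-learn.org):

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Fit-and-score through the uniform estimator API the paper emphasizes.
X, y = load_diabetes(return_X_y=True)
scores = cross_val_score(RandomForestRegressor(random_state=0), X, y, cv=5, scoring="r2")
print(scores.mean())
```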
Article
Full-text available
Watershed models are powerful tools for simulating the effect of watershed processes and management on soil and water resources. However, no comprehensive guidance is available to facilitate model evaluation in terms of the accuracy of simulated data compared to measured flow and constituent values. Thus, the objectives of this research were to: (1) determine recommended model evaluation techniques (statistical and graphical), (2) review reported ranges of values and corresponding performance ratings for the recommended statistics, and (3) establish guidelines for model evaluation based on the review results and project-specific considerations; all of these objectives focus on simulation of streamflow and transport of sediment and nutrients. These objectives were achieved with a thorough review of relevant literature on model application and recommended model evaluation methods. Based on this analysis, we recommend that three quantitative statistics, Nash-Sutcliffe efficiency (NSE), percent bias (PBIAS), and ratio of the root mean square error to the standard deviation of measured data (RSR), in addition to the graphical techniques, be used in model evaluation. The following model evaluation performance ratings were established for each recommended statistic. In general, model simulation can be judged as satisfactory if NSE > 0.50 and RSR < 0.70, and if PBIAS is within ±25% for streamflow, ±55% for sediment, and ±70% for N and P. For PBIAS, constituent-specific performance ratings were determined based on the uncertainty of measured data. Additional considerations related to model evaluation guidelines are also discussed. These considerations include: single-event simulation, quality and quantity of measured data, model calibration procedure, evaluation time step, and project scope and magnitude. A case study illustrating the application of the model evaluation guidelines is also provided.
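The three recommended statistics are simple to write out; the sketch below follows the sign convention stated by the authors (positive PBIAS indicating model underestimation), with illustrative arrays.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum((obs-sim)^2) / sum((obs-mean(obs))^2)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def pbias(obs, sim):
    """Percent bias: 100 * sum(obs - sim) / sum(obs)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * np.sum(obs - sim) / np.sum(obs)

def rsr(obs, sim):
    """RMSE divided by the standard deviation of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return np.sqrt(np.sum((obs - sim) ** 2)) / np.sqrt(np.sum((obs - obs.mean()) ** 2))

obs = np.array([10.0, 12.0, 9.0, 14.0, 11.0])
sim = np.array([11.0, 11.5, 8.0, 13.0, 12.0])
print(nse(obs, sim), pbias(obs, sim), rsr(obs, sim))
# Satisfactory per the guidelines: NSE > 0.50, RSR < 0.70, |PBIAS| <= 25% (streamflow).
```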
Article
Full-text available
Flood risk assessment is an essential part of flood risk management. As part of the new EU flood directive it is becoming increasingly more popular in European flood policy. Particularly cities with a high concentration of people and goods are vulnerable to floods. This paper introduces the adaptation of a novel method of multicriteria flood risk assessment, that was recently developed for the more rural Mulde river basin, to a city. The study site is Leipzig, Germany. The "urban" approach includes a specific urban-type set of economic, social and ecological flood risk criteria, which focus on urban issues: population and vulnerable groups, differentiated residential land use classes, areas with social and health care but also ecological indicators such as recreational urban green spaces. These criteria are integrated using a "multicriteria decision rule" based on an additive weighting procedure which is implemented into the software tool FloodCalc urban. Based on different weighting sets we provide evidence of where the most flood-prone areas are located in a city. Furthermore, we can show that with an increasing inundation extent it is both the social and the economic risks that strongly increase.
Book
The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
Article
Hydrometeorological monitoring is essential for the proper management and conservation of water resources, for understanding the functioning of drainage systems, and for establishing early warning systems. In the city of Cuenca there are several hydrometeorological monitoring networks; that of the Municipal Public Company of Telecommunications, Water, Sewerage and Sanitation of Cuenca (ETAPA EP) is the most important, as it has stations in the basins of the four major rivers crossing the city. In recent years, the modernization and improvement of the ETAPA EP hydrometeorological network has allowed most stations to transmit information in near real time, so that hydrometeorological information can be displayed and analyzed instantly to generate alerts and take precautions against possible floods.
Article
There is rising interest in the interaction between human and climate drivers as a means to understand the past and current development of floods in urbanised landscapes. This study presents a regional screening of land-use, rainfall regime and flood dynamics in north-eastern Italy, covering the timeframe 1900–2010. The analysis suggests that, statistically, both climate and land-use have contributed to a significant increase in the contribution of short-duration floods to the number of flooded locations. It also suggests that an interaction arises, with land-use dynamics coupling with climatic changes that simultaneously influence flood aggressiveness. Given that it is not possible to control the climatic trend, effective disaster management clearly needs an integrated approach to land planning and supervision. This research shows that land management and planning should include the investigation of the location of past and future social and economic drivers of development, as well as past and current climatic trends.
Article
Common problems faced by rainfall–runoff modellers are data limitation, model overparameterization and related problems of parameter identifiability. Depending on the application, possible solutions to overcome these problems include the use of parsimonious conceptual models and, rather than a fixed pre-defined model conceptualization, the application of a "top-down" or "downward" method that allows the model structure to be adjusted or inferred from available data and field evidence. This paper presents a top-down procedure that starts from a generalized model structure framework that is adjusted in a case-specific, parsimonious way. The model-structure building is done in a transparent, step-wise way, where separate parts of the model structure are identified and calibrated based on multiple and non-commensurable pieces of information derived from river flow series by means of a number of sequential time series processing tasks. These include separation of the high-frequency (e.g., hourly, daily) river flow series into subflows, splitting of the series into approximately independent quick- and slow-flow hydrograph periods, and the extraction of independent peak and low flows. The model building and calibration account for the statistical assumptions and requirements on independence and homoscedasticity of the model residuals. Next to identification of the subflow recessions and related routing submodels, equations describing quick and slow runoff sub-responses and soil water storage are derived from the time series data. The method includes testing of the model performance for peak and low flow extremes.
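The subflow-separation step is often implemented with a one-parameter recursive digital filter (Lyne and Hollick). The sketch below is that generic filter, not the paper's exact procedure; the example series and the alpha value are illustrative.

```python
import numpy as np

def baseflow_filter(q, alpha=0.925):
    """Split streamflow q into (baseflow, quickflow) with one forward filter pass."""
    q = np.asarray(q, float)
    quick = np.zeros_like(q)
    for t in range(1, len(q)):
        quick[t] = alpha * quick[t - 1] + 0.5 * (1 + alpha) * (q[t] - q[t - 1])
        quick[t] = min(max(quick[t], 0.0), q[t])   # keep both subflows non-negative
    return q - quick, quick

q = np.array([5, 5, 20, 60, 35, 18, 10, 7, 6, 5.5], float)   # toy hydrograph
base, quick = baseflow_filter(q)
print(base.round(1), quick.round(1))
```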
Article
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
Article
Perched springs in nature emerge from aquifers lying on aquitards within the unsaturated zone, some of which emerge one above the other. A finite element model was introduced, using the FEFLOW code, for simulating the groundwater flow regime in each of these aquifers, for quantifying the fraction of rain that recharges the aquifers, and for estimating the hydrogeological parameters of the aquifers and aquitards. Many of the perched springs in Israel are found in the Judea Group aquifer, a stratified carbonate rock unit characterised by a well-developed karst system. The Batir and Jamia springs exemplify such a system, where Batir is the upper spring, discharging at the contact between the Aminadav and Moza Formations, and Jamia is the lower one, discharging at the contact between the Kesalon and Sorek Formations. The 25-year-long measured spring hydrographs were used to calibrate the springs' coefficients, the hydraulic conductivities of the different layers, the karst features and the yearly amount of rain recharging the springs.
Article
A new approach for designing the network structure in an artificial neural network (ANN)-based rainfall-runoff model is presented. The method utilizes the statistical properties such as cross-, auto- and partial-auto-correlation of the data series in identifying a unique input vector that best represents the process for the basin, and a standard algorithm for training. The methodology has been validated using the data for a river basin in India. The results of the study are highly promising and indicate that it could significantly reduce the effort and computational time required in developing an ANN model. Copyright © 2002 John Wiley & Sons, Ltd.
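A minimal sketch of this statistical input-selection idea, using statsmodels correlation utilities on synthetic stand-in series (the statistics, not the basin data, are the point):

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf, ccf

rng = np.random.default_rng(0)
rain = rng.gamma(0.5, 2.0, 1000)
# Runoff as a smoothed, noisy response to rainfall (toy relationship).
runoff = 0.6 * np.convolve(rain, np.ones(6) / 6, mode="same") \
         + 0.1 * rng.standard_normal(1000)

print("runoff ACF  (lags 1-5):", acf(runoff, nlags=5)[1:].round(2))
print("runoff PACF (lags 1-5):", pacf(runoff, nlags=5)[1:].round(2))
print("rain-runoff CCF (lags 0-5):", ccf(runoff, rain)[:6].round(2))
# Lags whose correlations stand clear of the confidence band become candidate
# inputs for the ANN, which is the essence of the proposed design.
```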
Article
The group method of data handling (GMDH) algorithm presented by A. G. Ivakhnenko and colleagues is a heuristic self-organization method. It establishes the input–output relationship of a complex system using a multilayered perceptron-type structure that is similar to a feed-forward multilayer neural network. This study provides a step towards understanding and evaluating a role for GMDH in the investigation of the complex rainfall–runoff processes in a heterogeneous watershed in Taiwan. Two versions of the revised GMDH model are implemented: a stepwise regression procedure and a recursive formula. Eleven typhoon events in the Shen-cei Creek watershed, Taiwan, are used to build the model and verify its usefulness. The prediction results of the revised GMDH models and the instantaneous unit hydrograph (IUH) model are compared. Based on the criteria of forecasting precision and the rate and time of peak error, a much better performance is obtained with the revised GMDH models. Copyright © 1999 John Wiley & Sons, Ltd.
Article
Aim Humid tropical alpine environments are crucial ecosystems that sustain biodiversity, biological processes, carbon storage and surface water provision. They are identified as one of the terrestrial ecosystems most vulnerable to global environmental change. Despite their vulnerability, and the importance for regional biodiversity conservation and socio-economic development, they are among the least studied and described ecosystems in the world. This paper reviews the state of knowledge about tropical alpine environments, and provides an integrated assessment of the potential threats of global climate change on the major ecosystem processes. Location Humid tropical alpine regions occur between the upper forest line and the perennial snow border in the upper regions of the Andes, the Afroalpine belt and Indonesia and Papua New Guinea. Results and main conclusions Climate change will displace ecosystem boundaries and strongly reduce the total area of tropical alpine regions. Displacement and increased isolation of the remaining patches will induce species extinction and biodiversity loss. Drier and warmer soil conditions will cause a faster organic carbon turnover, decreasing the below-ground organic carbon storage. Since most of the organic carbon is currently stored in the soils, it is unlikely that an increase in above-ground biomass will be able to offset soil carbon loss at an ecosystem level. Therefore a net release of carbon to the atmosphere is expected. Changes in precipitation patterns, increased evapotranspiration and alterations of the soil properties will have a major impact on water supply. Many regions are in danger of a significantly reduced or less reliable stream flow. The magnitude and even the trend of most of these effects depend strongly on local climatic, hydrological and ecological conditions. The extreme spatial gradients in these conditions put the sustainability of ecosystem management at risk.
Article
This paper discusses the need for a well-considered approach to reconciling environmental theory with observations that has clear and compelling diagnostic power. This need is well recognized by the scientific community in the context of the ‘Predictions in Ungaged Basins’ initiative and the National Science Foundation sponsored ‘Environmental Observatories’ initiative, among others. It is suggested that many current strategies for confronting environmental process models with observational data are inadequate in the face of the highly complex and high order models becoming central to modern environmental science, and steps are proposed towards the development of a robust and powerful ‘Theory of Evaluation’. This paper presents the concept of a diagnostic evaluation approach rooted in information theory and employing the notion of signature indices that measure theoretically relevant system process behaviours. The signature-based approach addresses the issue of degree of system complexity resolvable by a model. Further, it can be placed in the context of Bayesian inference to facilitate uncertainty analysis, and can be readily applied to the problem of process evaluation leading to improved predictions in ungaged basins. Copyright © 2008 John Wiley & Sons, Ltd.
Article
The aim of this paper is to investigate the detailed hydrometeorological circumstances that led to accidental casualties, and to better understand the prominent physical risk factors. Based on an event that affected the Gard region (southern France) in September 2002, it is a first attempt to combine analysis of the physical and human response to Mediterranean storms. After details concerning the methodology (for the meteorological, hydrological and casualty analysis), the local context and the event, the authors examine two points: on the one hand, the dynamics of the event (flash-flood and riverine-flood response to the storm) together with human exposure, and on the other, scale as a critical problem affecting flood risk. This investigation stresses the specificity of small catchments, which are more dangerous in both hydrological and human terms. Moreover, this contribution linking the social sciences and geophysics constitutes an important step in what Morss, R.E., Wilhelmi, O.V., Downton, M.W., Gruntfest, E. (2005; Flood risk, uncertainty, and scientific information for decision making. Bull. Am. Meteor. Soc. 86 (11), 1593–1601) call the "end to end to end" process.
Article
This paper analyses the problems involved in the conservation and management of the hydrological system of the South American páramo. The páramo consists of a collection of neotropical alpine grassland ecosystems covering the upper region of the northern Andes. They play a key role in the hydrology of the continent. Many of the largest tributaries of the Amazon basin have their headwaters in the páramo. It is also the major water source for the Andean highlands and a vast area of arid and semi-arid lowlands, where páramo water is used for domestic, agricultural and industrial consumption, and the generation of hydropower. Recently, the páramo is increasingly used for intensive cattle grazing, cultivation, and pine planting, among others. These activities, as well as global phenomena such as climate change, severely alter the hydrological regime. A review on the state of knowledge of its hydrology is given in a first part. In a second part, the impact of human activities and climate change on the hydrology of the páramo is discussed.
Article
In the last decade, the increasing accessibility of computing means has made the application of spatially-distributed hydrological models an attractive perspective for both researchers and practising hydrologists. The availability of physically-based approaches does not generally remove the need to calibrate at least a part of the model parameters, and the complexity of distributed models, which makes the computations highly intensive, has often prevented an extensive analysis of calibration issues. The purpose of this study is an evaluation of a series of automatic calibration experiments (using the Shuffled Complex Evolution method) performed with a highly conceptualised, continuously simulating, distributed hydrologic model. The calibration and validation data consist of real precipitation and discharge observations referring to a mid-sized (1050 km²), highly vegetated watershed located in the Apennine Mountains in Italy. Major flood events that occurred in the 1990–2000 decade are simulated with the parameters obtained by calibrating the rainfall-runoff model under different scenarios of historical data availability. A first set of experiments investigates the length of the calibration period required for an efficient parameterisation. The second analysis focuses on the influence of the spatial resolution of the rainfall input on model calibration, and is carried out by varying the size and distribution of the raingauge network. A third aspect regards the analysis of the reliability of model parameters in simulating the discharge in ungauged river sections. The aim of the study is to provide the user with indications for appropriately selecting the historical data base to be used for model calibration. The results indicate that reducing the length of the calibration period below three months significantly deteriorates the rainfall-runoff model performance. The model simulations are satisfactory even under the hypothesis of spatially uniform rainfall, provided that the mean areal rainfall intensity is estimated on the basis of a sufficiently large number of raingauges, whereas there is a strong worsening with an excessive reduction of the raingauge network density. Finally, the distributed model has proven able to provide reliable simulations for ungauged internal river sections.
Article
In this paper, the classic ‘divide and conquer (DAC)’ paradigm is applied as a top-down black-box technique for the forecasting of daily streamflows from the streamflow records alone, i.e. without employing exogenous variables of the runoff generating process such as rainfall. To this end, three forms of hybrid artificial neural networks (ANNs) are used as univariate time series models, namely, the threshold-based ANN (TANN), the cluster-based ANN (CANN), and the periodic ANN (PANN). For the purpose of comparison of forecasting efficiency, the normal multi-layer perceptron form of ANN (MLP–ANN) is selected as the baseline ANN model. Having first applied the MLP–ANN models without any data-grouping procedure, the influence of various data preprocessing procedures on the MLP–ANN model forecasting performance is then investigated. The preprocessing procedures considered are: standardization, log-transformation, rescaling, deseasonalization, and combinations of these. In the context of the single streamflow series considered, deseasonalization without rescaling was found to be the most effective preprocessing procedure. Some discussions are presented (i) on data preprocessing and (ii) on selection of the best ANN model. Overall, among the three variations of hybrid ANNs tested, the PANN model performed best. Compared with the MLP–ANN fitted to the deseasonalized data, the PANN based on the soft seasonal partitioning performed better for short lead times (≤3 days), but the advantage vanishes for longer lead times.
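The deseasonalization step found most effective above is easy to sketch with pandas: remove the day-of-year climatological mean and standard deviation before modelling. The series and its seasonality below are synthetic stand-ins.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2000-01-01", periods=6 * 365, freq="D")
flow = pd.Series(
    50 + 30 * np.sin(2 * np.pi * idx.dayofyear / 365)
    + 5 * np.random.default_rng(0).standard_normal(len(idx)),
    index=idx,
)

# Fold day 366 into day 365 so leap days fall in a populated group.
doy = np.minimum(flow.index.dayofyear, 365)
clim_mean = flow.groupby(doy).transform("mean")
clim_std = flow.groupby(doy).transform("std")
deseasonalized = (flow - clim_mean) / clim_std   # input series for the ANN
```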
Article
A multi-criteria model evaluation protocol is presented to check the performance of rainfall-runoff models during model calibration and validation phases based on a high frequency (e.g. hourly, daily) river flow series. The multiple criteria or objectives are based on multiple and non-commensurable measures of information derived from river flow series by means of a number of sequential time series processing tasks. These include separation of the river flow series in subflows, split of the series in nearly independent quick and slow flow hydrograph periods, and the extraction of nearly independent peak and low flows. The protocol accounts for the statistical assumptions and requirements on independency and homoscedasticity of the model residuals, significantly advanced through the use of nearly independent flow values extracted from the flow series. Next to the separate evaluation of the subflow recessions, the quick and slow runoff peak and low values and event volumes, also the performance of the model in predicting extreme high and low flow statistics is validated. To support the time series processing tasks as well as the application of the multi-criteria model evaluation protocol, a Microsoft Excel-based tool (WETSPRO: Water Engineering Time Series PROcessing tool) has been developed. It is based on the assessment of graphical displays, which complement traditional goodness-of-fit statistics.
Article
The main objective of this paper is to combine and integrate environmental, economic and social impact assessment procedures in order to support decision-making in the context of flood control policy in the Netherlands. The hydraulic, hydrological, ecological, economic and social effects of alternative flood control policies, such as land use change and floodplain restoration, are evaluated using a combination of advanced quantitative modelling techniques and qualitative expert judgement. The results from the ecological, economic and social impact assessment are evaluated in an integrated way through cost–benefit analysis (CBA) and multi-criteria analysis (MCA). As expected, these methods produce different outcomes. Although traditional flood control policy (building higher and stronger dikes) is a cost-effective option, investment in alternative flood control policy (land use changes and floodplain restoration) can be justified on the basis of both CBA and MCA when the additional ecological and socio-economic benefits in the long run are included. The outcome of the MCA appears to be especially sensitive to the inclusion of the qualitative scores for the expected social impacts of land use change and floodplain restoration. An important research question remains how to assess, integrate and trade off (1) significantly different types of impacts in a methodologically sound way in both cost–benefit and multi-criteria analysis, and (2) significantly different types and qualities of available knowledge and information about these impacts.
Conference Paper
Multi-step ahead forecasting is an important issue for organizations, often used to assist in tactical decisions. Such forecasting can be achieved by adopting time series forecasting methods, such as the classical Holt-Winters (HW) that is quite popular for seasonal series. An alternative forecasting approach comes from the use of more flexible learning algorithms, such as Neural Networks (NN) and Support Vector Machines (SVM). This paper presents a simultaneous variable (i.e. time lag) and model selection algorithm for multi-step ahead forecasting using NN and SVM. Variable selection is based on a backward algorithm that is guided by a sensitivity analysis procedure, while model selection is achieved using a grid-search. Several experiments were devised by considering eight seasonal series and the forecasts were analyzed using two error criteria (i.e. SMAPE and MSE). Overall, competitive results were achieved when comparing the SVM and NN algorithms with HW.
Article
DOI: 10.1016/j.jhydrol.2011.01.017 Accurately modeling the rainfall–runoff (R–R) transform remains a challenging task, despite the wide range of modeling techniques, either knowledge-driven or data-driven, developed in the past several decades. Amongst data-driven models, artificial neural network (ANN)-based R–R models have received great attention in the hydrology community owing to their capability to reproduce the highly nonlinear nature of the relationship between hydrological variables. However, a lagged prediction effect often appears in the ANN modeling process. This paper attempts to eliminate the lag effect from two aspects: modular artificial neural networks (MANN) and data preprocessing by singular spectrum analysis (SSA). Two watersheds from China are explored with daily collected data. Results show that MANN does not exhibit significant advantages over ANN. However, it is demonstrated that SSA can considerably improve the performance of the prediction model and eliminate the lag effect. Moreover, an ANN (or MANN) with antecedent runoff only as model input is also developed and compared with the ANN (or MANN) R–R model. At all three prediction horizons, the latter outperforms the former, regardless of being coupled with or without SSA. It is recommended from the present study that the ANN R–R model coupled with SSA is the more promising option.
Article
Extremes of weather and climate can have devastating effects on human society and the environment. Understanding past changes in the characteristics of such events, including recent increases in the intensity of heavy precipitation events over a large part of the Northern Hemisphere land area, is critical for reliable projections of future changes. Given that atmospheric water-holding capacity is expected to increase roughly exponentially with temperature--and that atmospheric water content is increasing in accord with this theoretical expectation--it has been suggested that human-influenced global warming may be partly responsible for increases in heavy precipitation. Because of the limited availability of daily observations, however, most previous studies have examined only the potential detectability of changes in extreme precipitation through model-model comparisons. Here we show that human-induced increases in greenhouse gases have contributed to the observed intensification of heavy precipitation events found over approximately two-thirds of data-covered parts of Northern Hemisphere land areas. These results are based on a comparison of observed and multi-model simulated changes in extreme precipitation over the latter half of the twentieth century analysed with an optimal fingerprinting technique. Changes in extreme precipitation projected by models, and thus the impacts of future changes in extreme precipitation, may be underestimated because models seem to underestimate the observed increase in heavy precipitation with warming.
Article
Ant colony optimization (ACO) can be applied to the data mining field to extract rule-based classifiers. The aim of this paper is twofold. On the one hand, we provide an overview of previous ant-based approaches to the classification task and compare them with state-of-the-art classification techniques, such as C4.5, RIPPER, and support vector machines in a benchmark study. On the other hand, a new ant-based classification technique is proposed, named AntMiner+. The key differences between the proposed AntMiner+ and previous AntMiner versions are the usage of the better performing MAX-MIN ant system, a clearly defined and augmented environment for the ants to walk through, with the inclusion of the class variable to handle multiclass problems, and the ability to include interval rules in the rule list. Furthermore, the commonly encountered problem in ACO of setting system parameters is dealt with in an automated, dynamic manner. Our benchmarking experiments show an AntMiner+ accuracy that is superior to that obtained by the other AntMiner versions, and competitive or better than the results achieved by the compared classification techniques.
Jin, L., Kuang, X., Huang, H., Qin, Z., Wang, Y.: Study on the overfitting of the artificial neural network forecasting model.
Vos, R., Velasco, M., Labastida, E.: Economic and social effects of "El Niño" in Ecuador, 1997–8.