Article

Comparison of machine learning and dynamic models for predicting actual vapour pressure when psychrometric data are unavailable


Abstract

Information on actual vapour pressure (ea) is frequently required in many disciplines. However, the psychrometric data required to calculate ea are often not readily available. Hence, it is important to develop models that estimate ea when psychrometric data are unavailable. Here, five machine learning models were developed for estimating ea, viz. extreme gradient boosting (XGBoost), extreme learning machine (ELM), kernel-based nonlinear extension of Arps decline (KNEA), multivariate adaptive regression splines (MARS), and support vector machine (SVM) models. Their performance was also compared with that of a recently proposed dynamic model, which estimates ea by adjusting dew point temperature from minimum temperature (Tmin) with a dynamic correction factor. Three input combinations using only temperature data (i.e. Tmin and mean temperature (Tmean)) were considered in the machine learning models. Meteorological data collected from 1,188 stations across six climate zones were used to develop and assess the models. The overall results revealed that the dynamic and machine learning models offered satisfactory ea estimates from hyper-arid to humid climates. However, the accuracy of the dynamic model was lower than that of all machine learning algorithms using either only Tmin or combinations of Tmean and Tmin in all climate zones. The machine learning models using Tmean and Tmin were superior to those using only Tmean or Tmin. The ELM, KNEA, MARS, and SVM models performed comparably with the various input variables; however, the XGBoost model incorporating Tmean and Tmin produced the best accuracy. The computational demand was lowest for the ELM model, followed by the XGBoost model. Considering accuracy and computational demand, the XGBoost model is recommended for predicting daily and monthly ea from hyper-arid to humid climates when historical data are available.
When there are no historical data, we recommend using the global XGBoost model incorporating Tmean, Tmin, and aridity index for estimating daily and monthly ea from arid to humid regions, and using the dynamic model in hyper-arid regions.
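
The idea of estimating ea from Tmin through dew point temperature can be illustrated with a minimal sketch. It assumes the widely used FAO-56 saturation vapour pressure formula and a constant offset between Tmin and dew point temperature. The function names and the `correction_c` parameter are hypothetical and only mark where a correction could enter; the dynamic correction factor of the model discussed above is not specified in the abstract and is not reproduced here.

```python
import math


def saturation_vapour_pressure(temp_c: float) -> float:
    """Saturation vapour pressure (kPa) at air temperature temp_c (deg C), FAO-56 form."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def actual_vapour_pressure_from_tmin(tmin_c: float, correction_c: float = 0.0) -> float:
    """Estimate ea (kPa) assuming Tdew is approximately Tmin - correction_c.

    correction_c is a hypothetical constant offset (deg C); it is NOT the
    dynamic correction factor of the model described in the abstract.
    """
    tdew_c = tmin_c - correction_c
    # ea equals the saturation vapour pressure evaluated at the dew point
    return saturation_vapour_pressure(tdew_c)


if __name__ == "__main__":
    # Humid-type assumption: Tdew taken equal to Tmin
    print(round(actual_vapour_pressure_from_tmin(12.0), 3))
    # Arid-type assumption: Tdew taken 2 deg C below Tmin
    print(round(actual_vapour_pressure_from_tmin(12.0, correction_c=2.0), 3))
```

A positive offset lowers the estimated ea, which is the direction of adjustment usually needed in dry climates where the air does not reach saturation at night.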


... Fortunately, a large number of studies have been performed for near-surface Td estimation based on in-situ meteorological observations. Apart from those conducted on the basis of machine learning models such as Baghban et al. (2016), Park et al. (2021), Dong et al. (2022) and Qiu et al. (2022), a major assumption of these studies is that near-surface Td can be approximately estimated from daily minimum air temperature (Tamin). For example, in the early studies of Allen et al. (1998) and Mu et al. (2009), near-surface Td was assumed to be equivalent to Tamin for the development of evapotranspiration models. ...
... As a result, a variety of correction methods have been proposed subsequently to estimate Td accurately from Tamin. A review of these methods is available in Paredes et al. (2020) and Qiu et al. (2022). These Tamin-based correction methods at site scale have enlightening significance for the retrieval of near-surface Td from MODIS products, because numerous studies confirm that Tamin estimates with high accuracy can be achieved by adopting MODIS nighttime land surface temperature (Ts) as proxy (Chen et al. 2021; Lin et al. 2012; Oyler et al. 2016; Shiff, Helman, and Lensky 2021; Vancutsem et al. 2010; Zhu, Lű, and Jia 2013). ...
Article
Full-text available
MODIS atmospheric profile products (MOD07_L2 and MYD07_L2) have been widely used for near-surface dew point temperature ($T_d$) estimation. However, their accuracy over large scale has seldom been evaluated. In this study, we validated these two products comprehensively against 2153 stations over mainland China. MOD07_L2 was suggested by our study because it achieved higher accuracy in either of two frequently-used methods. To be specific, the root-mean-square error (RMSE) achieved by MOD07_L2 and MYD07_L2 was 5.82 and 7.42 °C, respectively. On this basis, a recent ground-based correction method was modified to further improve their accuracy. Our focus is to investigate whether this ground-based approach is applicable to large-scale remote sensing applications. The results show that this new method showed great potential for $T_d$ estimation independently from ground observations. Through the introduction of MODIS land surface products, the RMSE it achieved for MOD07_L2 and MYD07_L2 was 5.23 and 5.59 °C, respectively. Further analysis shows that it was particularly useful in capturing the annual average $T_d$ patterns. The R², RMSE, and bias of annual average daily mean $T_d$ estimates were 0.95, 1.84 °C, and 0.53 °C, and those achieved for annual average instantaneous $T_d$ estimates were 0.94, 2.09 °C, and 0.75 °C, respectively.
... Qiu et al. compared five machine learning models, namely XGBoost, extreme learning machine (ELM), kernel-based nonlinear extension of Arps decline (KNEA), MARS and SVM, with a dynamic empirical model to predict the vapor pressure using maximum, minimum and mean temperature data. The XGBoost model was the most successful [50]. Literature reviews conducted by the authors indicate that there is no study on the prediction of vapor pressure using hydro-meteorological data. ...
... Vapor pressure is generally estimated with psychrometric data. In this study, vapor pressure estimation was performed with hydro-meteorological data, ignoring the psychrometric data [50]. ...
Article
This study investigated how machine learning (ML) methods perform on problems with different characteristics. Six ML approaches, namely artificial neural networks (ANN), Gaussian process regression (GPR), support vector machine regression (SVMR), long short-term memory (LSTM), multi-gene genetic programming (MGGP) and M5 model tree (M5Tree), were used to analyze three independent civil engineering problems from the construction management, geotechnical engineering, and hydrological engineering sub-disciplines. Mean absolute percentage error (MAPE), root mean square error (RMSE), coefficient of determination (R²), relative root mean square error (RRMSE), Nash–Sutcliffe efficiency (NSE), Kling-Gupta efficiency (KGE), and overall index of model performance (OI) were used to evaluate model performance. Besides these criteria, the relative performance of the six ML models was assessed using Taylor diagrams, violin diagrams and the one-tailed Wilcoxon signed-rank test. For each problem considered, the influence of the input parameters on the output parameter was determined using the Relief method and the correlation coefficient. The results show that the ANN and MGGP models yielded the most successful estimates for the three problems. The best prediction for the hydrological engineering problem was achieved by the MGGP model. For the construction management and geotechnical engineering problems, the best results were obtained using the ANN model. All models were reliable for the geotechnical and hydrological engineering problems, while the LSTM and SVMR models were not reliable for the construction management problem. The most and least effective input parameters for the managerial data set were contract cost (CC) and work definition number (WDN), respectively.
For the experimental data set, the most and least effective input parameters were width of the pile (B) and rotation degree (R); for the natural data set, they were minimum temperature (Tmin) and streamflow (Q), respectively. The number of data and data selection have a significant effect on the homogeneity of the data set and its representativeness of the problem. The error values obtained in the test stage are affected by this. Equations for calculating the outputs of each problem were obtained using the MGGP and M5Tree models.
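
Several of the performance criteria named in this abstract have standard textbook definitions, sketched below in plain Python. This is an illustration under stated assumptions, not the implementation used in the study: KGE is written in its common 2009 form (correlation, variability ratio, bias ratio), population standard deviations are used, and the function names are chosen here for convenience.

```python
import math


def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mape(obs, sim):
    """Mean absolute percentage error (%); observations must be non-zero."""
    return 100.0 * sum(abs((o - s) / o) for o, s in zip(obs, sim)) / len(obs)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def kge(obs, sim):
    """Kling-Gupta efficiency (assumed 2009 form); sim must not be constant."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim)) / n
    r = cov / (std_o * std_s)      # linear correlation
    alpha = std_s / std_o          # variability ratio
    beta = mean_s / mean_o         # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    simulated = [1.1, 1.9, 3.2, 3.8]
    print(rmse(observed, simulated), nse(observed, simulated), kge(observed, simulated))
```

NSE and KGE both equal 1 for a perfect fit, while RMSE and MAPE equal 0, so the criteria should be read together rather than ranked on a single scale.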
... Machine learning techniques such as support vector regression, neural networks, and extreme learning machines have been widely applied in streamflow prediction in recent decades (Gharib & Davies 2021; Ibrahim et al. 2022). These machine learning models often outperform physical models in modeling the non-linear streamflow process without requiring knowledge of the local hydrological system (Wu et al. 2021; Qiu et al. 2022). However, machine learning models are often too simple to perform deep feature extraction (Han et al. 2021b; Mei et al. 2022). ...
Article
Full-text available
Accurate streamflow prediction is crucial for effective water resource management. However, reliable prediction remains a considerable challenge because of the highly complex, non-stationary, and non-linear processes that contribute to streamflow at various spatial and temporal scales. In this study, we utilized a convolutional neural network (CNN)–Transformer–Long short-term memory (LSTM) (CTL) model for streamflow prediction, which replaced the embedding layer with a CNN layer to extract partial hidden features, and added an LSTM layer to extract correlations on a temporal scale. The CTL model incorporated Transformer's ability to extract global information, CNN's ability to extract hidden features, and LSTM's ability to capture temporal correlations. To validate its effectiveness, we applied it for streamflow prediction in the Shule River basin in northwest China across 1-, 3-, and 6-month horizons and compared its performance with Transformer, CNN, LSTM, CNN–Transformer, and Transformer–LSTM. The results demonstrated that CTL outperformed all other models in terms of predictive accuracy, with Nash–Sutcliffe efficiency (NSE) values of 0.964, 0.912, and 0.856 for 1-, 3-, and 6-month ahead prediction. The best results among the five comparative models were 0.908, 0.824, and 0.778, respectively. This indicated that CTL is an outstanding alternative technique for streamflow prediction where surface data are limited.
... In this study, six machine learning models were developed to simulate the long-term daily minimum temperature inside typical plastic greenhouses by using meteorological station observations. The models included support vector machine (SVM) (Vapnik, 2013), extreme gradient boosting (XGBoost) (Chen and Guestrin, 2016; Qiu et al., 2022a), random forest (RF) (Feng et al., 2017), extreme learning machine (ELM) (Huang et al., 2006; Qiu et al., 2022b), back-propagation (BP) neural network (Guo et al., 2011), and multiple linear regression (MLR). These models are commonly adopted in the field of meteorology. ...
Article
Full-text available
Environmental stress, pests and diseases occur frequently in facility agriculture production in China. Among them, adverse meteorological conditions, such as low temperature and limited light, often co-occur in greenhouses and cause great losses in southern China. Nevertheless, little is known about agrometeorological disasters in facility agriculture, especially co-occurring climate extremes. Here, we applied machine learning methods to simulate long-term daily minimum temperature in plastic greenhouses, so as to assess the spatio-temporal characteristics of compound low-temperature and limited-light (LTLL) events in southern China. We took strawberry as the representative horticultural plant to quantitatively investigate the potential effects of LTLL stress based on experimental data. It was found that when LTLL stress occurred, strawberry was more sensitive to low temperature than to limited light and duration. The losses of fruit soluble solids content caused by LTLL stress were relatively lower than those of yield. The LTLL events mainly occurred from November to March of the following year in southern China. The occurrence frequency showed a decreasing trend during 1990–2019 of 3.4 d/10 a, which mainly resulted from its reduction in spring. Assuming that all the LTLL events occurred at the strawberry flowering stage, ~11.71% of them could result in strawberry fruit yield losses of over 70%, and the most serious LL events mainly occurred in December and January. The northern part of southern China had a higher LTLL risk. The results have the potential to provide guidance for plastic greenhouse layout and strawberry production.
Article
Full-text available
Having sufficient, high-quality datasets is of paramount importance for understanding the internal dynamics of nature-related phenomena. Given the necessity of maintaining the completeness of datasets, this study introduced a novel technique that combines machine learning algorithms with a meta-heuristic optimization algorithm to impute gaps in measurements of solar radiation, one of the crucial meteorological variables for both climate dynamics and energy technologies. To this end, four different gap sizes, i.e., 5 %, 10 %, 20 %, and 30 %, were created synthetically, and the applicability of extreme gradient boosting (XGBoost) configured by differential evolution (DE) was examined for each gap size. The resulting model was benchmarked against conventional interpolation techniques (i.e., linear and spline optimizations) and other widely applied ML algorithms (i.e., random forest and multivariate adaptive regression splines). A multi-perspective input selection strategy based on correlation coefficients was used to model the missing values under three scenarios encompassing a total of 14 different models. The results revealed that the XGBoost-DE model built from the solar radiation measurements of neighboring stations was the best-performing model for gap sizes of 5 % (NSE: 0.950; KGE: 0.967), 10 % (NSE: 0.934; KGE: 0.962), and 30 % (NSE: 0.939; KGE: 0.957); for the 20 % gap size, the highest accuracy was obtained with the RF (NSE: 0.944; KGE: 0.966). On the other hand, the interpolation techniques had the lowest accuracies among their counterparts in imputation attempts for all gap sizes.
Article
Full-text available
Seven artificial neural network (ANN) models were developed to predict daytime actual evapotranspiration (ET) for Nissouri Creek in Oxford County, Canada, from April to July 2018, using the Bowen ratio energy balance method as target output for the first time. In total, 12 variations of each model were deployed using different combinations of model parameters, including the sigmoid and rectified linear unit (ReLU) activation functions, stochastic gradient descent (SGD), and root-mean-square-propagation (RMSprop) learning algorithms, three different network architectures, and 100 and 500 training epochs. This is the first time that ReLU has been used in ANNs that predict ET and it outperformed sigmoid in six of the seven models. This is particularly significant because until now the sigmoid activation function (or variations therein) had been exclusively employed in the ET literature. RMSprop was also used for the first time and typically demonstrated equivalent performance to that of SGD. The optimal model employs the ReLU activation function, consists of a 4-4-1 network architecture, includes the input parameters of net radiation, air temperature, soil heat flux, and wind speed, and is trained by the SGD learning algorithm for 500 training epochs. This model boasts a coefficient of determination (R²) of 0.997, root-mean-square error (RMSE) of 0.39 mm/day, and mean absolute error (MAE) of 0.18 mm/day. Furthermore, all seven models developed adequately model the ET process, with R² ranging from 0.988 to 0.997, RMSE from 0.39 to 0.78 mm/day, and MAE from 0.18 to 0.58 mm/day.
Article
Full-text available
Recent decades have been characterized by increasing temperatures worldwide, resulting in an exponential climb in vapor pressure deficit (VPD). VPD has been identified as an increasingly important driver of plant functioning in terrestrial biomes and has been established as a major contributor in recent drought‐induced plant mortality independent of other drivers associated with climate change. Despite this, few studies have isolated the physiological response of plant functioning to high VPD, thus limiting our understanding and ability to predict future impacts on terrestrial ecosystems. An abundance of evidence suggests that stomatal conductance declines under high VPD and transpiration increases in most species up until a given VPD threshold, leading to a cascade of subsequent impacts including reduced photosynthesis and growth, and higher risks of carbon starvation and hydraulic failure. Incorporation of photosynthetic and hydraulic traits in ‘next‐generation’ land‐surface models has the greatest potential for improved prediction of VPD responses at the plant‐ and global‐scale, and will yield more mechanistic simulations of plant responses to a changing climate. By providing a fully integrated framework and evaluation of the impacts of high VPD on plant function, improvements in forecasting and long‐term projections of climate impacts can be made.
Article
Full-text available
Temperature changes have widespread impacts on the environment, economy, and municipal planning. Generating accurate climate prediction at finer spatial resolution through downscaling could help better assess the future effects of climate change on a local scale. Ensembles of multiple climate models have been proven to improve the accuracy of temperature prediction. Meanwhile, machine learning techniques have shown high performance in solving various predictive modeling problems, which makes them a promising tool for temperature downscaling. This study investigated the performance of machine learning (long short-term memory (LSTM) networks and support vector machine (SVM)) and statistical (arithmetic ensemble mean (EM) and multiple linear regression (MLR)) methods in developing multi-model ensembles for downscaling long-term daily temperature. A case study of twelve meteorological stations across Ontario, Canada, was conducted to evaluate the performance of the proposed ensembles. The results showed that both machine learning and statistical techniques performed well at downscaling daily temperature with multi-model ensembles and had similar performance with relatively high accuracy. The R² of 12 stations ranged between 0.756 and 0.820 and RMSE ranged between 4.318 and 7.063 °C. Both machine learning and statistical ensembles for downscaling had difficulty in predicting extreme values for temperature below −10 °C and above 20 °C. The results provided technical support for using statistical and machine learning methods to generate high-resolution daily temperature prediction.
Article
Full-text available
Crop evapotranspiration (ETc) is a complex and non-linear process that is difficult to measure and estimate accurately. This complexity can be addressed by applying machine learning techniques with different meteorological input variables. This study investigated the performance of k-Nearest Neighbour (kNN), Artificial Neural Networks (ANN) and Adaptive Boosting (AdaBoost) models to predict daily potato ETc using four scenarios of available meteorological data: air temperature (scenario 1), air temperature and solar radiation (scenario 2), air temperature, solar radiation and wind speed (scenario 3), and air temperature, solar radiation, wind speed and relative humidity (scenario 4). The analysis was based on the results of experimental trials carried out in Southern Italy in 2009 and 2010 and focussed on potato crop cultivation under optimal water supply. The results of ETc estimation with different machine learning techniques were compared with ETc obtained from the soil water balance model, based on the FAO Penman Monteith approach, and gravimetric measurements of soil water content in the crop root zone. The best performances were observed with the kNN model with R² of 0.813, 0.968 and 0.965, slope of regression 0.947, 0.980 and 0.991, modelling efficiency (EF) of 0.848, 0.970 and 0.972, root mean square error (RMSE) of 0.790, 0.351 and 0.355 mm day⁻¹, mean absolute error (MAE) of 0.563, 0.263 and 0.274 mm day⁻¹ and mean squared error (MSE) of 0.623, 0.123 and 0.126 mm day⁻¹ for scenarios 1, 2 and 3, respectively. When all meteorological variables were available (scenario 4), the ANN model produced slightly better statistical indicators. Therefore, the kNN model could be recommended for the estimation of ETc when limited meteorological data are available. Otherwise, the ANN model should be applied.
Chapter
Full-text available
This paper presents a novel machine learning approach based on ensemble machine learning algorithms for building landslide susceptibility maps. The results reveal that this approach outperforms prior machine learning-based approaches in terms of precision, recall, and F-score for landslide susceptibility modeling. In this research, three ensemble machine learning algorithms were tested for their applicability in the landslide prediction domain, namely random forest, rotation forest, and XGBoost. A comparison between these ensemble models and the machine learning algorithms used in previous research was also performed. In order to evaluate the model's ability to generalize results, two different study areas were used in this study: Ratnapura district in Sri Lanka and Glenmalure in Ireland. Several landslide conditioning features, including land use, landform, vegetation index, elevation, overburden, aspect, curvature, catchment area, drainage density, distance to water streams, soil, bedrock condition, lithology and rainfall, prepared by surveying, remote sensing, and derivation from a Digital Elevation Model (DEM), were used in building the spatial database. Importantly, this study introduces new landslide conditioning factors, such as overburden and water catchment areas, which have good importance values. Further, the research applies dynamic factors such as rainfall and vegetation index to susceptibility map building by making use of remote sensing data that are updated periodically. The study emphasizes the capability of ensemble approaches to generalize well across both study areas, which have completely different environmental properties, and their ability to provide a scalable map building mechanism. Useful insights and guidelines are also provided for fellow researchers who are interested in building susceptibility maps using machine learning approaches.
Article
Full-text available
Atmospheric vapor pressure deficit (VPD) is a critical variable in determining plant photosynthesis. Synthesis of four global climate datasets reveals a sharp increase of VPD after the late 1990s. In response, the vegetation greening trend indicated by a satellite-derived vegetation index (GIMMS3g), which was evident before the late 1990s, was subsequently stalled or reversed. Terrestrial gross primary production derived from two satellite-based models (revised EC-LUE and MODIS) exhibits persistent and widespread decreases after the late 1990s due to increased VPD, which offset the positive CO₂ fertilization effect. Six Earth system models have consistently projected continuous increases of VPD throughout the current century. Our results highlight that the impacts of VPD on vegetation growth should be adequately considered to assess ecosystem responses to future climate conditions.
Article
Full-text available
Accurately predicting reference evapotranspiration (ET0) with limited climatic data is crucial for irrigation scheduling design and agricultural water management. This study evaluated eight machine learning models in four categories, i.e. neuron-based (MLP, GRNN and ANFIS), kernel-based (SVM, KNEA), tree-based (M5Tree, XGBoost) and curve-based (MARS) models, for predicting daily ET0 with maximum/minimum temperature and precipitation data during 2001–2015 from 14 stations in various climatic regions of China, i.e., arid desert of northwest China (NWC), semi-arid steppe of Inner Mongolia (IM), Qinghai-Tibetan Plateau (QTP), (semi-)humid cold-temperate northeast China (NEC), semi-humid warm-temperate north China (NC), humid subtropical central China (CC) and humid tropical south China (SC). The results showed that machine learning models using only temperature data obtained satisfactory daily ET0 estimates (on average R² = 0.829, RMSE = 0.718 mm day⁻¹, NRMSE = 0.250 and MAE = 0.508 mm day⁻¹). The prediction accuracy was improved by 7.6% across China when precipitation information was further considered, particularly in (sub)tropical humid regions (by 9.7% in CC and 12.4% in SC). The kernel-based SVM, KNEA and curve-based MARS models generally outperformed the others in terms of prediction accuracy, with the best performance by KNEA in NWC and IM, by SVM in QTP, CC and SC, and very similar performance by them in NEC and NC. SVM (1.9%), MLP (2.0%), MARS (2.6%) and KNEA (6.4%) showed relatively small average increases in RMSE during testing compared with training RMSE. SVM is highly recommended for predicting daily ET0 across China in light of its best accuracy and stability, while KNEA and MARS are also promising models.
Article
Full-text available
The establishment of an accurate computational model for predicting reference evapotranspiration (ET0) process is highly essential for several agricultural and hydrological applications, especially for the rural water resource systems, water use allocations, utilization and demand assessments, and the management of irrigation systems. In this research, six artificial intelligence (AI) models were investigated for modeling ET0 using a small number of climatic data generated from the minimum and maximum temperatures of the air and extraterrestrial radiation. The investigated models were multilayer perceptron (MLP), generalized regression neural networks (GRNN), radial basis neural networks (RBNN), integrated adaptive neuro-fuzzy inference systems with grid partitioning and subtractive clustering (ANFIS-GP and ANFIS-SC), and gene expression programming (GEP). The implemented monthly time scale data set was collected at the Antalya and Isparta stations which are located in the Mediterranean Region of Turkey. The Hargreaves–Samani (HS) equation and its calibrated version (CHS) were used to perform a verification analysis of the established AI models. The accuracy of validation was focused on multiple quantitative metrics, including root mean squared error (RMSE), mean absolute error (MAE), correlation coefficient (R2), coefficient of residual mass (CRM), and Nash–Sutcliffe efficiency coefficient (NS). The results of the conducted models were highly practical and reliable for the investigated case studies. At the Antalya station, the performance of the GEP and GRNN models was better than the other investigated models, while the performance of the RBNN and ANFIS-SC models was best compared to the other models at the Isparta station. Except for the MLP model, all the other investigated models presented a better performance accuracy compared to the HS and CHS empirical models when applied in a cross-station scenario. 
A cross-station scenario examination implies the prediction of the ET0 of any station using the input data of the nearby station. The performance of the CHS model in modeling ET0 was better in all cases than that of the original HS.
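
For reference, the Hargreaves–Samani (HS) equation used as the empirical benchmark in this abstract can be sketched as follows. The sketch assumes the standard form ET0 = 0.0023 · Ra · (Tmean + 17.8) · sqrt(Tmax − Tmin), with Ra converted from MJ m⁻² day⁻¹ to mm day⁻¹ by the factor 0.408. The function name and the `coefficient` argument are illustrative; the argument only marks where a locally calibrated value could be substituted, and the calibration procedure of the CHS version is not described in the abstract.

```python
import math


def hargreaves_samani_et0(tmax_c: float, tmin_c: float, ra_mj_m2_day: float,
                          coefficient: float = 0.0023) -> float:
    """Reference evapotranspiration (mm/day) from temperature and extraterrestrial radiation.

    coefficient defaults to the standard HS value; passing another value is a
    hypothetical stand-in for a calibrated variant.
    """
    tmean_c = (tmax_c + tmin_c) / 2.0
    ra_mm_day = 0.408 * ra_mj_m2_day              # radiation as equivalent evaporation
    temp_range = max(tmax_c - tmin_c, 0.0)        # guard against inverted inputs
    return coefficient * ra_mm_day * (tmean_c + 17.8) * math.sqrt(temp_range)


if __name__ == "__main__":
    # Illustrative summer day: Tmax 30 deg C, Tmin 14 deg C, Ra 40 MJ m-2 day-1
    print(round(hargreaves_samani_et0(30.0, 14.0, 40.0), 2))
```

Because the equation needs only Tmax, Tmin and Ra (which depends on latitude and day of year), it uses the same limited inputs as the AI models it is compared against.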
Article
Full-text available
Reference crop evapotranspiration (ETo) estimations using the FAO Penman-Monteith equation (PM-ETo) require a set of weather data including maximum and minimum air temperatures (Tmax, Tmin), actual vapor pressure (ea), solar radiation (Rs), and wind speed (u2). However, those data are often not available, or data sets are incomplete due to missing values. A set of procedures was proposed in FAO56 (Allen et al. 1998) to overcome these limitations, and their accuracy for estimating daily ETo in the humid climate of the Azores islands is assessed in this study. Results show that after locally and seasonally calibrating the temperature adjustment factor ad used for dew point temperature (Tdew) computation from mean temperature, ETo estimations showed small bias and small RMSE ranging from 0.15 to 0.53 mm day⁻¹. When Rs data are missing, their estimation from the temperature difference (Tmax − Tmin), using a locally and seasonally calibrated radiation adjustment coefficient (kRs), yielded highly accurate ETo estimates, with RMSE averaging 0.41 mm day⁻¹ and ranging from 0.33 to 0.58 mm day⁻¹. If wind speed observations are missing, the use of the default u2 = 2 m s⁻¹, or 3 m s⁻¹ in the case of weather measurements over clipped grass at airports, proved appropriate even for windy locations (u2 > 4 m s⁻¹), with RMSE < 0.36 mm day⁻¹. The appropriateness of the procedures for estimating the missing values of ea, Rs, and u2 was confirmed.
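
The temperature-difference estimate of Rs mentioned in this abstract follows the FAO56 relation Rs = kRs · sqrt(Tmax − Tmin) · Ra. A minimal sketch is given below, assuming the commonly quoted default kRs of about 0.16 for interior locations; the function name is illustrative, and the locally and seasonally calibrated kRs values of the study are not given in the abstract.

```python
import math


def solar_radiation_from_temperature(tmax_c: float, tmin_c: float,
                                     ra_mj_m2_day: float, k_rs: float = 0.16) -> float:
    """Estimate solar radiation Rs (MJ m-2 day-1) from the daily temperature range.

    k_rs is the radiation adjustment coefficient; 0.16 is an assumed default,
    not a calibrated value from the study.
    """
    temp_range = max(tmax_c - tmin_c, 0.0)
    return k_rs * math.sqrt(temp_range) * ra_mj_m2_day


if __name__ == "__main__":
    # Illustrative day: Tmax 25 deg C, Tmin 16 deg C, Ra 30 MJ m-2 day-1
    print(round(solar_radiation_from_temperature(25.0, 16.0, 30.0), 2))
```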
Article
Full-text available
Prediction of petroleum production plays a key role in petroleum engineering, but an accurate prediction is difficult to achieve due to the complex underground conditions. In this paper, we employ the kernel method to extend the Arps decline model into a nonlinear multivariate prediction model, which is called the nonlinear extension of Arps decline model (NEA). The basic structure of the NEA is developed from the Arps exponential decline equation, and the kernel method is employed to build a nonlinear combination of the input series. Thus, the NEA can efficiently handle the nonlinear relationship between the input series and the petroleum production with a one-step linear recursion, which combines the merits of commonly used decline curve methods and intelligent methods. Case studies are carried out with production data from two real-world oil fields in China and India to assess the efficiency of the NEA model. The results show that the NEA is able to describe the nonlinear relationship between the influence factors and the oil production, and that it can make accurate forecasts of oil production in real applications.
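
The Arps exponential decline equation that the NEA starts from can be sketched as below, together with its one-step recursive form. Only the classical decline curve q(t) = q_i · exp(−D · t) is shown; the kernel-based nonlinear extension itself is not specified in the abstract and is not attempted here. Function and parameter names are illustrative.

```python
import math


def arps_exponential_rate(q_initial: float, decline_rate: float, t: float) -> float:
    """Production rate at time t under Arps exponential decline."""
    return q_initial * math.exp(-decline_rate * t)


def arps_exponential_series(q_initial: float, decline_rate: float,
                            n_steps: int, dt: float = 1.0) -> list:
    """Same curve built by a one-step linear recursion q[k+1] = q[k] * exp(-D * dt)."""
    factor = math.exp(-decline_rate * dt)
    rates = [q_initial]
    for _ in range(n_steps):
        rates.append(rates[-1] * factor)
    return rates


if __name__ == "__main__":
    series = arps_exponential_series(q_initial=100.0, decline_rate=0.1, n_steps=5)
    print([round(q, 2) for q in series])
```

The recursive form shows why a model built on this equation needs only the previous value to step forward.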
Article
Full-text available
Monthly stream-flow forecasting can yield important information for hydrological applications including sustainable design of rural and urban water management systems, optimization of water resource allocations, water use, pricing and water quality assessment, and agriculture and irrigation operations. The motivation for exploring and developing expert predictive models is an ongoing endeavor for hydrological applications. In this study, the potential of a relatively new data-driven method, namely the extreme learning machine (ELM) method, was explored for forecasting monthly stream-flow discharge rates in the Tigris River, Iraq. The ELM algorithm is a single-layer feedforward neural network (SLFN) which randomly selects the input weights and hidden layer biases and analytically determines the output weights of the SLFN. Based on partial autocorrelation functions on historical stream-flow data, a set of five input combinations with lagged stream-flow values is employed to establish the best forecasting model. A comparative investigation is conducted to evaluate the performance of the ELM compared to other data-driven models: support vector regression (SVR) and generalized regression neural network (GRNN). The forecasting metrics defined as the correlation coefficient (r), Nash-Sutcliffe efficiency (ENS), Willmott's Index (WI), root-mean-square error (RMSE) and mean absolute error (MAE) computed between the observed and forecasted stream-flow data are employed to assess the ELM model's effectiveness. The results revealed that the ELM model outperformed the SVR and the GRNN models across a number of statistical measures. In quantitative terms, superiority of ELM over SVR and GRNN models was exhibited by ENS = 0.578, 0.378 and 0.144, r = 0.799, 0.761 and 0.468 and WI = 0.853, 0.802 and 0.689, respectively, and the ELM model attained lower RMSE value by about 21.3% (relative to SVR) and by about 44.7% (relative to GRNN).
Based on the findings of this study, several recommendations were suggested for further exploration of the ELM model in hydrological forecasting problems.
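The five forecasting metrics named in this abstract can be sketched as follows. This is a minimal illustration using the standard textbook definitions of r, ENS, WI, RMSE and MAE, not the authors' own code; the function name `forecast_metrics` and the sample values are hypothetical.

```python
import math


def forecast_metrics(obs, sim):
    """Return r, ENS, WI, RMSE and MAE for observed vs. forecasted series.

    Standard definitions are assumed; the function name is hypothetical.
    """
    if len(obs) != len(sim) or len(obs) < 2:
        raise ValueError("obs and sim must have the same length (>= 2)")
    n = len(obs)
    obs_mean = sum(obs) / n
    sim_mean = sum(sim) / n

    sq_err = sum((s - o) ** 2 for o, s in zip(obs, sim))
    abs_err = sum(abs(s - o) for o, s in zip(obs, sim))
    obs_var = sum((o - obs_mean) ** 2 for o in obs)
    sim_var = sum((s - sim_mean) ** 2 for s in sim)
    cov = sum((o - obs_mean) * (s - sim_mean) for o, s in zip(obs, sim))

    # Pearson correlation coefficient
    r = cov / math.sqrt(obs_var * sim_var)
    # Nash-Sutcliffe efficiency: 1 minus error variance over observed variance
    ens = 1.0 - sq_err / obs_var
    # Willmott's index of agreement
    wi_den = sum((abs(s - obs_mean) + abs(o - obs_mean)) ** 2
                 for o, s in zip(obs, sim))
    wi = 1.0 - sq_err / wi_den
    rmse = math.sqrt(sq_err / n)
    mae = abs_err / n
    return {"r": r, "ENS": ens, "WI": wi, "RMSE": rmse, "MAE": mae}


# Hypothetical example: a forecast with a constant bias of +1
metrics = forecast_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

A constant bias leaves r at 1 while lowering ENS and WI, which is one reason such studies report several metrics together.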
Article
Information on global solar radiation (Rs) is indispensable in many fields. However, reliable measurements of Rs are challenging worldwide because of high costs and technical complexities. Here, temperature- and sunshine-based generalized Extreme Gradient Boosting (XGBoost) models were proposed to estimate daily Rs for locations where historical Rs data are unknown. Four combinations of input variables were assessed. The first two included: (1) maximum, minimum, mean, and diurnal temperature, and extra-terrestrial radiation (Ra); and (2) sunshine duration, maximum possible sunshine duration, and Ra. In the first two inputs, the latter two further included geographical variables, i.e., latitude, longitude, and altitude. The developed models were also compared with temperature- and sunshine-based generalized empirical models. Daily data of Rs, maximum and minimum temperature, and actual sunshine duration during the period of 2007–2016 from 96 radiation stations of China were collected to develop and evaluate the models. The results showed that accuracy of the generalized XGBoost models was improved when geographical variables were further included in various climate zones. The generalized XGBoost model using temperature and geographical data as inputs slightly reduced accuracy compared to the temperature-based local-trained XGBoost model but is still superior to the temperature-based generalized empirical model. Somewhat surprisingly, there was comparable performance between the generalized XGBoost model using sunshine and geographical data as inputs and the local-trained sunshine-based XGBoost model. Therefore, the generalized XGBoost model was highly recommended to estimate daily Rs incorporating sunshine/temperature data and routinely available geographical information for locations where historical data are prior unknown.
Article
Understanding the process of crop evapotranspiration (ETc) and developing models for estimating ETc are crucial to efficiently schedule irrigation and enhance efficient water use. Here, we investigated variations of ETc and local crop coefficient (Kc = ETc / ETo, where ETo is the reference evapotranspiration) in a rotated flooded rice-winter wheat system using ETc data based on the Bowen-ratio energy balance method from 2016 to 2020. We propose a modified Kc model for estimating daily ETc, which includes a density coefficient (a function of fraction of canopy cover) and incorporates the effect of plant temperature constraint, leaf senescence, and water stress on ETc. Results indicated that the total ETc over whole growth stage for flooded rice and winter wheat field was 500.2 ± 62.5 and 298.6 ± 28.3 mm (means ± standard deviation), respectively. The values of local Kc at the initial, middle, and late stages were 0.83 ± 0.14, 1.11 ± 0.06, and 0.99 ± 0.15, respectively, for flooded rice and 0.71 ± 0.08, 0.86 ± 0.06, and 0.76 ± 0.08, respectively, for winter wheat. There was no water stress over the entire season of the flooded rice-winter wheat rotation system except for some days with water draining in paddy rice field. Heat stress in summer adversely affected the ETc of rice. The modified Kc model can well reproduce the values of daily ETc for both flooded rice and winter wheat, and improved the accuracy by 6–9% compared to the FAO 56 Kc model using tabulated values after adjustment. The regression coefficient, coefficient of determination, root mean squared error and modeling efficiency between measured ETc and estimated by the modified Kc model were 0.99, 0.89, 0.55 mm d⁻¹ and 0.89, respectively, for flooded rice, and 1.03, 0.85, 0.55 mm d⁻¹ and 0.82, respectively, for winter wheat. Therefore, the modified Kc model could reasonably predict ETc for flooded rice and winter wheat and can serve as a useful tool to improve water use.
Article
Estimating actual vapor pressure (ea) without relative humidity (RH) data continues to draw research attention. One of the accurate ways to estimate ea is through estimation of dew point temperature (Tdew) from minimum (Tmin) or mean temperature (Tmean). Two existing methods have been largely used to estimate ea. The first method (method I) assumes that Tdew is close to Tmin. The other one (method II) adjusts Tdew from Tmin with piecewise correction factors (aT) from sub-humid to hyper arid regions, and from Tmean with a fixed correction factor, aD, in humid regions. Here, two methods are proposed to estimate ea. The first method (method III) adjusts Tdew from Tmin with dynamic aT based on the correlation function between aT and aridity index (AI) regardless of climate zones. The second method (method IV) adjusts Tdew from Tmin with dynamic aT when AI < 1.00 and from Tmean with aD when AI ≥ 1.00. The performance of four methods was evaluated based on data from 886 meteorological stations distributed from hyper-arid to humid regions. Results showed that there was a significant logarithmic correlation function between aT and AI, but no significant correlation between aD and AI. Daily values of ea estimated by method I were greatly overestimated in semi-arid to hyper-arid regions, but were reasonably estimated in humid regions. The accuracy of method II was improved in hyper-arid to dry sub-humid regions but decreased in humid regions, compared to method I. The proposed methods (III and IV) further improved the accuracy and produced reasonable estimation of daily ea in hyper-arid to humid regions, and method III produced a slightly better performance than method IV. Similar results were also observed for estimation of monthly ea. Therefore, the proposed method III is highly recommended to estimate daily and monthly ea when RH data are unavailable.
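The idea behind methods I and II can be sketched in a few lines. The sketch assumes the FAO-56 saturation vapour pressure equation and a subtractive correction (Tdew = Tmin − aT); both are assumptions for illustration, since the abstract does not state the exact formulas. The function names are hypothetical, and the fitted aT–AI relationship of methods III and IV is not reproduced because its coefficients are not given here.

```python
import math


def saturation_vapour_pressure(temp_c):
    """Saturation vapour pressure (kPa) at temp_c (deg C), FAO-56 form."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def ea_from_tmin(tmin_c, a_t=0.0):
    """Estimate actual vapour pressure (kPa) from minimum temperature.

    a_t is the correction factor (deg C), assumed here to be subtracted
    from Tmin to approximate Tdew. With a_t = 0 this reduces to
    method I (Tdew assumed equal to Tmin).
    """
    tdew_c = tmin_c - a_t
    return saturation_vapour_pressure(tdew_c)


# Hypothetical values: Tmin = 10 deg C, with and without a 2 deg C correction
ea_method_1 = ea_from_tmin(10.0)
ea_corrected = ea_from_tmin(10.0, a_t=2.0)
```

Because saturation vapour pressure rises with temperature, any positive correction lowers the estimated ea, which is consistent with the reported overestimation by method I in arid regions.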
Article
Rice is a staple food crop that provides more calories to the global population than any other crop. Rice production is also a major consumer of freshwater resources. Hence, assessing changes in rice evapotranspiration (ETc) due to projected warming patterns is becoming necessary in any management of water resources and food security assessments. Here, air temperature (Ta) measurements from 1003 meteorological stations covering the period from 1967 to 2016 in China, Japan and the Philippines are first used to assess warming trends. Energy fluxes were then assembled so as to evaluate the responses of rice ETc to various warming trends. A modified Priestley-Taylor formulation was used to interpret ETc under differing warming scenarios. Results showed that the average values of daily mean Ta from 1997-2016 increased by 4.6% relative to the period from 1967-1996, where 85% of all stations marked an increase of 0.5-1.5 °C. Greater increment in average daily minima in Ta (5.1%) was noted in the past 20 years compared to the average daily maximum in Ta (3.7%), showing asymmetric warming. The changed growth duration linearly decreased as ambient seasonal mean Ta increased, and higher temperature sensitivity of altered growth duration occurred at greater warming level. Overall, the proposed modified Priestley-Taylor model can be used for estimating ETc of rice for both half-hourly and daily scales provided the growth duration is a priori known. Changes in seasonal ETc of rice under varying types of warming patterns are largely explained by both ambient seasonal mean Ta and changes in growth duration.
Article
Groundwater (GW) resources provide a large share of the world’s water demand for various sections such as agriculture, industry, and drinking water. Particularly in the arid and semi-arid regions, with surface water scarcity and high evaporation, GW is a valuable commodity. Yet, GW data are often incomplete or nonexistent. Therefore, it is a challenge to achieve a GW potential assessment. In this study, we developed methods to produce reliable GW potential maps (GWPM) with only digital elevation model (DEM)-derived data as inputs. To achieve this objective, a case study area in Iran was selected and 13 factors were extracted from the DEM. A spring location dataset was obtained from the water sector organizations and, along with the non-spring locations, fed into machine learning algorithms for training and validation. For delineating reliable GW potential, algorithms including random forest (RF) and its developed version, parallel RF (PRF), as well as extreme gradient boosting (XGB) with different boosters were used. The area under the receiver operating characteristics curve indicated that the PRF and XGB with linear booster give similar high accuracy (about 86%) for GWPM. The most important factors for accurate GWPM in the modeling procedure were convergence, topographic wetness index, river density, and altitude. Overall, we conclude that high-accuracy GWPMs can be produced with only DEM-derived factors with acceptable accuracy. The developed methodology can be employed to produce initial information for GW exploitation in areas facing a lack of data.
Article
The stochastic and intermittent nature of wind speed brings rigorous challenges to the safe and stable operation of power system. Wind speed forecasting is crucial for availably dispatching the wind power resource. In this paper the proposed model based on secondary decomposition (SD) and bidirectional gated recurrent unit (BiGRU) can accommodate long-range dependency and extract the semantic information of raw data. In the model, the GRU method is improved in directional nature. A second layer is added in GRU network to connect the two reverse and separate hidden layers to the same output layer. The PSR-BiGRU model of each subsequence is established and chicken swarm optimization (CSO) algorithm is employed to jointly optimize the parameters. The proposed method focuses on deterministic and probabilistic forecasting and does not involve any distribution assumption of the prediction errors needed in most existing forecasting methods. The effectiveness and advancement of the proposed model is tested by using data from two different wind farms. Comparing with other hybrid models, the proposed hybrid model is suitable for wind speed forecasting and could obtain better forecasting performance.
Article
It is clear that the learning speed of feedforward neural networks is in general far slower than required and it has been a major bottleneck in their applications for past decades. Two key reasons may be: (1) the slow gradient-based learning algorithms are extensively used to train neural networks, and (2) all the parameters of the networks are tuned iteratively by using such learning algorithms. Unlike these conventional implementations, this paper proposes a new learning algorithm called extreme learning machine (ELM) for single-hidden layer feedforward neural networks (SLFNs) which randomly chooses hidden nodes and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide good generalization performance at extremely fast learning speed. The experimental results based on a few artificial and real benchmark function approximation and classification problems including very large complex applications show that the new algorithm can produce good generalization performance in most cases and can learn thousands of times faster than conventional popular learning algorithms for feedforward neural networks.
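The two steps described here, random hidden nodes followed by an analytic solution for the output weights, can be sketched with NumPy. This is a minimal illustration, not the authors' implementation: the class name `SimpleELM`, the tanh activation, the normal initialisation and the toy data are all assumptions, and the output weights are obtained with the Moore-Penrose pseudo-inverse via `numpy.linalg.pinv`.

```python
import numpy as np


class SimpleELM:
    """Minimal single-hidden-layer ELM regressor (illustrative sketch)."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer output matrix H; tanh is an assumed activation
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Step 1: input weights and biases are drawn at random, never tuned
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Step 2: output weights are the least-squares solution of H beta = y
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta


# Toy regression problem (hypothetical data): fit y = sin(3x) on [-1, 1]
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y = np.sin(3.0 * X[:, 0])
model = SimpleELM(n_hidden=30, seed=0).fit(X, y)
train_rmse = float(np.sqrt(np.mean((model.predict(X) - y) ** 2)))
```

Training is a single linear solve with no iterative gradient updates, which is the source of the speed advantage the abstract claims.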
Article
The intermittent nature of wind can represent an obstacle to get reliable wind speed forecasting, thus many methods were developed to improve the accuracy, due to unstable behavior patterns and the presence of noise signal. In order to overcome this issue, a preprocessing step is desirable to provide more reliable data. Decomposition strategy is reported as the crucial component of this improving task of the wind speed forecasting. It can be applied as the first step or as a recurrent process, and normally the raw wind speed data is decomposed in several signal patterns. Based on this understanding, this paper proposed a combination of two signal decomposition strategies, known as variational mode decomposition (VMD) and singular spectral analysis (SSA), with modulation signal theory. The proposed decomposition approach is further coupled with a long short-term memory neural network (LSTM), the adaptive neuro-fuzzy system (ANFIS), echo state network (ESN), support vector regression (SVR) and Gaussian regression process (GRP) models resulting in new ensemble learning approaches. All results obtained through these ensembles are compared between them and demonstrated an error stabilization behavior, ability decomposing the wind speed into uncorrelated components, reducing the errors from one up to twelve steps-ahead forecasting. In general terms, the results indicate that ensembles learning framework are robust and reliable to applications in wind speed forecasting task.
Article
In recent years, clean energies, such as wind power have been developed rapidly. Especially, wind power generation becomes a significant source of energy in some power grids. On the other hand, based on the uncertain and non-convex behavior of wind speed, wind power generation forecasting and scheduling may be very difficult. In this paper, to improve the accuracy of forecasting the short-term wind speed, a hybrid wind speed forecasting model has been proposed based on four modules: crow search algorithm (CSA), wavelet transform (WT), Feature selection (FS) based on entropy and mutual information (MI), and deep learning time series prediction based on Long Short Term Memory neural networks (LSTM). The proposed wind speed forecasting strategy is applied to real-life data from Sotavento that is located in the south-west of Europe, in Galicia, Spain, and Kerman that is located in the Middle East, in the southeast of Iran. The presented numerical results demonstrate the efficiency of the proposed method, compared to some other existing wind speed forecasting methods.
Article
Information on crop evapotranspiration (ETc) and crop coefficient (Kc) is essential for improving water use and optimizing irrigation scheduling. Here, a three-year (2015, 2016 and 2019) experiment was conducted for tomato grown in a solar greenhouse under full and deficit drip irrigation to investigate the variation of ETc measured by sap flow system plus micro-lysimeter in 2015 and by weighing lysimeter in 2016 and 2019. The controlling meteorological factors on ETc in two irrigation treatments were analyzed by using path analysis method. A single crop coefficient model considering leaf senescence, soil water stress and fraction of canopy cover was proposed under two irrigation levels. The results showed that total seasonal ETc over the whole growth stage under full irrigation was 310–350 mm, which was 16–23% higher than that under deficit irrigation. The maximum hourly ETc rate in each month during the three-year experiment varied from 0.15 to 0.89, and from 0.15 to 1.88 mm h⁻¹, respectively, under deficit and full irrigation. Path analysis showed that the net radiation was the dominant meteorological factor affecting ETc through the direct effect, followed by the vapor pressure deficit, mainly through an indirect action on ETc. The Kc values at different growth stages estimated by the proposed single crop coefficient model under full and deficit irrigation agreed well with the measured ones, and the water stress coefficient, Ks, under deficit irrigation varied from 0.5 to 1.0. The proposed single crop coefficient model also estimated daily ETc of drip irrigated tomato reasonably in the solar greenhouse, with regression coefficient of 0.93–0.99, coefficient of determination of 0.78–0.95 and root mean square error of 0.35–0.52 mm d⁻¹.
Article
Natural gas (NG) is a vital energy in the energy structure transition, and its consumption prediction is a significant issue in energy structure management and energy security. As the second largest energy consumer and producer in the world, the status of NG in the United States (US) energy system has been increasing since the “An America First Energy Plan” was proposed in 2017. Accurate prediction of natural gas consumption (NGC) can provide an effective reference for decision-makers, policymakers, and energy companies. This paper proposes an improved kernel-based nonlinear extension of the Arps decline model (KNEA) to forecast NGC in the US. The grey wolf optimization (GWO) algorithm is used to optimize the regularization parameter and kernel width in the KNEA model, and applies the hybrid model to the NGC datasets of different sectors (including lease and plant fuel usage, pipeline and distribution usage, residential users, commercial users, industrial users, vehicle fuels users, and power generation users) in the US. Compared with the prediction results of five benchmark models, it is shown that the GWO-KNEA model has the best performance in each dataset, and the range of mean absolute percentage error is less than 5%. By comparing the computational time and memory occupancy of the model, it can be concluded that the time and space complexity of the GWO-KNEA model is greater than that of the original KNEA model, but lower than that of other benchmark models. Moreover, this paper uses the newly proposed model to predict the NGC and consumption mix of the US from 2019 to 2025. The main conclusions are drawn: (1) NGC in the US will show a slow growth trend (the average annual growth rate is only 1.2%); (2) The proportion of NGC in power generation will increase significantly, reaching about 39% in 2025; (3) The proportion of residential, commercial and industrial NGC will decline slightly.
Article
Capabilities of the bat algorithm optimized extreme learning machine (Bat-ELM) model for dew point temperature (Tdew) estimation were evaluated in this study, in comparison with the kernel-based nonlinear extension of Arps decline model (KNEA), the genetic algorithm optimized ELM (GA-ELM), the particle swarm optimization ELM (PSO-ELM), and six other non-hybrid machine learning models. Daily meteorological data [including mean temperature (Tmean), maximum temperature (Tmax), minimum temperature (Tmin), mean relative humidity (RHmean), maximum relative humidity (RHmax), minimum relative humidity (RHmin) and atmospheric pressure (Pa)] during 2014–2017 at the Yangling station of China were collected for model evaluation, by using six different input combinations and a 10-fold cross-validation. Results showed that all models exhibited a poor accuracy with Tmean as the only input, but had a relatively good accuracy under the combination of three meteorological parameters (i.e., Tmax, Tmin and Pa) that can be easily acquired. Under the combination of Tmax, Tmin, RHmax, RHmin and Pa, model performances were similar or even slightly worse when compared with the combination of Tmax, Tmin, RHmax and RHmin. Overall, for estimating daily Tdew, our results suggest that Bat-ELM would be the optimal model while Tmax, Tmin, RHmax and RHmin would be the best input combinations.
Article
While the adverse effects of elevated salinity levels on leaf gas exchange in many crops are not in dispute, representing such effects on leaf photosynthetic rates (A) continues to draw research attention. Here, an optimization model for stomatal conductance (gc) that maximizes A while accounting for mesophyll conductance (gm) was used to interpret new leaf gas exchange measurements collected for five irrigation water salinity levels. A function between chloroplastic CO2 concentration (cc) and intercellular CO2 concentration (ci) modified by salinity stress to estimate gm was proposed. Results showed that with increased salinity, the estimated gm and maximum photosynthetic capacity were both reduced, whereas the marginal water use efficiency λ increased linearly. Adjustments of gm, λ and photosynthetic capacity were shown to be consistent with a large corpus of drought-stress experiments. The inferred model parameters were then used to evaluate the combined effects of elevated salinity and atmospheric CO2 concentration (ca) on leaf gas exchange. For a given salinity level, increasing ca increased A linearly, but these increases were accompanied by mild reductions in gc and transpiration. The ca level needed to ameliorate A reductions due to increased salinity is also discussed using the aforementioned model calculations.
Article
The rice-wheat rotation system is one of the largest agricultural production systems worldwide. Accurate estimation of evapotranspiration (ET) in the rice-wheat rotation system is critical to enhance efficient irrigation management and water use. The variation of ET for a rice-wheat rotation system during 2015-2018 and its controlling meteorological factors was investigated using the Bowen ratio energy balance and path analysis methods. A modified Priestley-Taylor (PT) model was developed that considers soil water stress for soil evaporation (E) (fsw), and plant temperature constraint (deviation of air temperature from optimum for the crops used, ft) and leaf senescence for transpiration. The results showed that the diurnal variation of ET rate in different months exhibited a single peak curve with the maximum ET rates of 0.90 and 0.42 mm h⁻¹ for rice and winter wheat, respectively. The total ET of the rice-wheat rotation system over the whole growing season was 765-841 mm, of which 63-67% was consumed by the rice field. The average daily ET rate over the whole growing season was 3.27-4.13 and 1.50-1.65 mm d⁻¹ for rice and winter wheat, respectively. The results of ET partitioning showed that E accounted for 23-32% of the seasonal ET for rice and 48-51% for winter wheat. The ET partitioning of rice and winter wheat was closely linked to leaf area index (LAI). The ratio of E/ET reduced exponentially for rice with the increase of LAI, while it reduced linearly for winter wheat. The path analysis showed that the net radiation (Rn) was the dominant meteorological factor affecting short-term ET of the rice-wheat rotation system through the direct effect. The water vapour pressure deficit (VPD), another important factor influencing ET, showed mainly an indirect effect on ET through the path of Rn and had a greater impact on ET for rice than for wheat. 
The modified PT model could estimate ET for rice and winter wheat reasonably, with linear regression coefficient of 0.93-1.09 and coefficient of determination of 0.92-0.96. The model was sensitive to fsw and ft.
Article
Accurate estimation of daily reference evapotranspiration (ET0) is vital for water resource management and irrigation decision-making. Based on public weather forecasts, numerous models have been successfully used for daily ET0 estimation, but the large number of available models causes confusion regarding model selection for specific climate regions. In this paper, the estimation performances of six ET0 equations using public weather forecasts for a lead time of 1–7 days were compared for four main climatic regions across China, and the most accurate equation was then recommended for each climate region. Meanwhile, the applicability of every equation was assessed in relation to four climates, including subtropical monsoon climate (Cwa), temperate continental climate (Dfc), temperate monsoon climate (Dwa) and mountain plateau climate (HG). The Penman-Monteith Forecast (PMF) equation, which is an adaptation of the FAO56-PM equation using temperature and weather type forecasts as inputs, provided the best ET0 estimation performance in Cwa and HG climates, and the Temperature Penman-Monteith (PMT) equation, using only temperature data, obtained the most accurate average results for the Dwa and Dfc climates. The best and second-best estimation performances for each climate were usually provided by the PMF and PMT equations, since both follow the conceptual approach of the FAO56-PM equation. The third and fourth choices would be the Hargreaves-Samani (HS) and the Blaney-Criddle (BC) equations, respectively, while the Thornthwaite (TH) and the McCloud (MC) equations yielded high errors and may not be applicable for ET0 estimation in most climatic regions. As a whole, the PMF and PMT equations were better than the other equations, and these two equations were therefore recommended for near-future daily ET0 estimation in all climate regions across China.
Article
Accurate global solar radiation data are fundamental information for the allocation and design of solar energy systems. The current study compared different machine learning and empirical models for global solar radiation prediction only using air temperature as inputs. Four machine learning models, e.g., hybrid mind evolutionary algorithm and artificial neural network model, original artificial neural network, random forests and wavelet neural network, as well as four empirical temperature-based models (Hargreaves-Samani model, Bristow-Campbell model, Jahani model, and Fan model) were applied for prediction of daily global solar radiation in temperate continental regions of China. The results indicated the hybrid mind evolutionary algorithm and artificial neural network model provided better estimations, compared with the existing machine learning and empirical models. Thus, the temperature-based hybrid model is highly recommended to predict global solar radiation in temperate continental regions of China when only air temperature data are available. Combining the hybrid model with future air temperature forecasts, we can get the accurate information of future solar radiation, which is of great importance to management and operation of solar energy systems.
Article
Reliable and accurate prediction of reference evapotranspiration (ETo) is a precondition for the efficient management and planning of agricultural water resources as well as the optimal design of irrigation scheduling. This study evaluated the performances of four bio-inspired algorithm optimized extreme learning machine (ELM) models, i.e. ELM with genetic algorithm (ELM-GA), ELM with ant colony optimization (ELM-ACO), ELM with cuckoo search algorithm (ELM-CSA) and ELM with flower pollination algorithm (ELM-FPA), for predicting daily ETo across China by using a five-fold cross-validation approach. These models were further compared with the classical ELM model parameterized by the grid search method to demonstrate their capability and efficiency. Daily maximum and minimum ambient temperatures, wind speed, relative humidity and global solar radiation data during 2001-2015 collected from eight meteorological stations in contrasting climates of China were utilized to train, validate and test the models. The results showed that ETo values predicted by all ELM models agreed well with the corresponding FAO-56 Penman–Monteith values, with R2, RMSE, NRMSE and MAE ranging 0.9766-0.9967, 0.0896-0.2883 mm day−1, 3.2910%-11.7653% and 0.0708-0.1998 mm day−1, respectively. The ELM-FPA model (R2 = 0.9930, RMSE = 0.1589 mm day−1, NRMSE = 5.5406% and MAE = 0.1188 mm day−1) slightly outperformed the ELM-CSA model (R2 = 0.9922, RMSE = 0.1619 mm day−1, NRMSE = 5.6864% and MAE = 0.1200 mm day−1) during testing, both of which were superior to the ELM-ACO (R2 = 0.9912, RMSE = 0.1730 mm day−1, NRMSE = 6.0816% and MAE = 0.1254 mm day−1) and ELM-GA (R2 = 0.9889, RMSE = 0.1895 mm day−1, NRMSE = 6.7197% and MAE = 0.1310 mm day−1) models, followed by the standalone ELM model (R2 = 0.9856, RMSE = 0.2104 mm day−1, NRMSE = 7.1693% and MAE = 0.1408 mm day−1). 
The four hybrid ELM models exhibited higher improvements in daily ETo prediction in the temperate monsoon and (sub)tropical monsoon climates (with average decrease in RMSE of 14.0%, 25.1%, 31.4% and 33.1%, respectively), compared with those in the temperate continental and mountain plateau climates (with average decrease in RMSE of 5.2%, 9.8%, 12.9% and 14.1%, respectively). The results advocated the capability of bio-inspired optimization algorithms, especially the FPA and CSA algorithms, for improving the performance of the conventional ELM model in daily ETo prediction in contrasting climates of China.
Article
Accurate estimation of reference evapotranspiration (ET0) is critical for water resource management and irrigation scheduling. This study evaluated the potential of a new machine learning algorithm using gradient boosting on decision trees with categorical features support (i.e., CatBoost) for accurately estimating daily ET0 with limited meteorological data in humid regions of China. Two other commonly used machine learning algorithms, Random Forests (RF) and Support Vector Machine (SVM), were also assessed for comparison. Eight input combinations of daily meteorological data [including both complete and incomplete combinations of solar radiation (Rs), maximum and minimum temperatures (Tmax and Tmin), relative humidity (Hr) and wind speed (U)] from five weather stations during 2001–2015 in South China were applied for model training and testing. The results showed that all the three algorithms could achieve satisfactory accuracy for ET0 estimation in subtropical China using Rs, Tmax and Tmin, or U, Hr, Tmax and Tmin as inputs, under the circumstances of lacking complete meteorological parameters. The increases in testing RMSE and MAPE over training RMSE and MAPE showed positive correlations with the number of input parameters to the machine learning models. For the local models, among the three algorithms, SVM offered the best prediction accuracy and stability with incomplete combinations of meteorological parameters as inputs, while CatBoost performed best with the complete combination of parameters. Patterns of the generalized models were almost the same as the local models, but the former ones showed less than 10% decreases in RMSE or MAPE in comparison with the latter ones. In addition, the computing time and memory usage for data processing of CatBoost were much less than those of RF and SVM. Overall, as a tree-based algorithm, CatBoost made significant improvements in accuracy, stability and computational cost when compared to RF. 
Therefore, the CatBoost algorithm has a very high potential for ET0 estimation in humid regions of China, and even possibly in other parts of the world with similar humid climates.
Article
The computation of the reference crop evapotranspiration (ETo) using the FAO Penman-Monteith equation (PM-ETo) requires data on maximum and minimum air temperatures (Tmax, Tmin), vapour pressure deficit (VPD), solar radiation (Rs) and wind speed at 2 m height (u2). However, those data are often not available, or data sets may be incomplete or have questionable quality. Various procedures were proposed in FAO56 to overcome these limitations and an abundant literature has been and is being produced relative to alternative computational methods. Studies applied to a variety of climates, from hyper-arid to humid, have demonstrated that improved methods to compute PM-ETo from temperature only (PMT approach) have appropriate accuracy. These methods refer to estimating: (i) the dew point temperature (Tdew) from Tmin or, in case of humid climates, from the mean temperature, Tmean; (ii) Rs from the temperature difference (TD = Tmax-Tmin); and (iii) u2 using default global or regional values. Greater difficulties refer to the need for locally calibrating the radiation adjustment coefficient (kRs) used with the Rs equation. Therefore, considering that calibrated kRs values were made available by past studies for a large number of locations and diverse climates, the current study developed and tested simple computational approaches relating locally calibrated kRs with various observed weather variables – TD, relative humidity (RH) and average u2. The equations were developed using CLIMWAT monthly full-data relative to all the Mediterranean countries. The equations refer to all available data, or to data grouped as hyper-arid and arid, semi-arid, dry and moist sub-humid, and humid climates. To test those kRs equations, ETo computed from temperature and using the predicted kRs values were compared with ETo computed with full data sets of the same Mediterranean locations and of Iran, Inner Mongolia, Portugal and Bolivia. 
Average RMSE values were small, ranging from 0.34 to 0.54 mm day⁻¹, and therefore not far from the values obtained when a trial-and-error procedure was used for the same locations, 0.27 to 0.46 mm day⁻¹. These indicators support using kRs obtained from the predictive equations instead of locally calibrated kRs values, which greatly eases computations and may favour wider use of the PMT approach.
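The temperature-only estimates described in this abstract can be illustrated with a minimal Python sketch. It assumes the standard FAO-56 expressions (saturation vapour pressure evaluated at Tdew, and the Hargreaves-type radiation formula Rs = kRs · sqrt(Tmax − Tmin) · Ra); the abstract does not quote these equations itself, and the function names and the `offset_c` parameter are hypothetical.

```python
import math


def saturation_vapour_pressure(temp_c: float) -> float:
    """FAO-56 saturation vapour pressure (kPa) at air temperature temp_c (deg C)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def actual_vapour_pressure_from_tmin(tmin_c: float, offset_c: float = 0.0) -> float:
    """Estimate ea (kPa) assuming Tdew is approximately Tmin - offset_c.

    offset_c is a hypothetical correction; 0.0 means Tdew = Tmin.
    """
    return saturation_vapour_pressure(tmin_c - offset_c)


def solar_radiation_from_td(tmax_c: float, tmin_c: float, ra: float, k_rs: float) -> float:
    """Estimate Rs from the temperature difference TD = Tmax - Tmin.

    ra is extraterrestrial radiation and k_rs the radiation adjustment
    coefficient; Rs is returned in the same units as ra.
    """
    td = tmax_c - tmin_c
    return k_rs * math.sqrt(td) * ra


if __name__ == "__main__":
    # Illustrative inputs only, not data from the study.
    print(actual_vapour_pressure_from_tmin(14.0))
    print(solar_radiation_from_td(30.0, 14.0, ra=40.0, k_rs=0.16))
```

In this sketch `k_rs` is passed in explicitly, so a value from a predictive equation or from local calibration can be used interchangeably.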
Article
The knowledge of global solar radiation is of vital importance for the design and use of solar energy systems. This study evaluated the potential of two new powerful machine learning models, i.e., kernel-based nonlinear extension of Arps decline model and gradient boosting with categorical features support, for accurately estimating daily global solar radiation in humid regions. These two models were also compared with the multilayer perceptron, M5 model tree, random forest and multivariate adaptive regression spline models, using five input combinations of daily meteorological data during 2001–2015 from four weather stations in the (sub)tropical humid regions of South China. The results showed that, when complete meteorological data were lacking, machine learning models using the ratio of actual to theoretical sunshine duration together with maximum and minimum temperatures obtained satisfactory daily global solar radiation estimates. Generally, the kernel-based nonlinear extension of Arps decline model offered the best prediction accuracy among the studied models, followed by the gradient boosting with categorical features support. The multilayer perceptron model exhibited the smallest average percentage increase in the root mean square error during testing over the training values, followed by the kernel-based nonlinear extension of Arps decline model. Both the kernel-based nonlinear extension of Arps decline model and gradient boosting with categorical features support were successfully applied to develop general models for daily global solar radiation prediction (differences in root mean square error <5% compared with local models). The gradient boosting with categorical features support and multilayer perceptron exhibited much smaller computational time for both local and general models. 
Comprehensively considering the accuracy, stability and computational time, both the kernel-based nonlinear extension of Arps decline model and gradient boosting with categorical features support are recommended for predicting daily global solar radiation in the humid regions of China.
Article
Over the last decade, the combination of big data and machine learning research has received considerable attention and has expedited the prospects of the agricultural industry. This research aims to gain insights into a state-of-the-art big data application in smart farming. An essential issue for agricultural planning is to estimate evapotranspiration accurately, because it plays a pivotal role in scheduling irrigation water efficiently. This article presents an H2O model framework to determine the daily ETo for the Hoshiarpur and Patiala districts of Punjab. The effects of four supervised learning algorithms, namely Deep Learning-Multilayer Perceptrons (DL), Generalized Linear Model (GLM), Random Forest (RF), and Gradient-Boosting Machine (GBM), are examined, and their overall ability to predict future ETo is evaluated. The analysis of these four models is performed in the H2O framework. This framework presents a new criterion to train, validate, test and improve the classification efficiency using machine learning algorithms. The performance of the DL model is compared with other state-of-the-art models such as RF, GLM and GBM. The analysis shows that the models perform well in modeling daily ETo (e.g. NSE = 0.95–0.98, r2 = 0.95–0.99, ACC = 85–95, MSE = 0.0369–0.1215, RMSE = 0.1921–0.2691).
Article
Accurate estimation of pan evaporation (Ep) is required for many applications, e.g., water resources management, irrigation system design and hydrological modeling. However, the estimation of Ep for a target station can be difficult as a result of partial or complete lack of local meteorological data under many conditions. In this study, daily Ep was estimated from local (target-station) and cross-station data in the Poyang Lake Watershed of China using four empirical models and three tree-based machine learning models, including M5 model tree (M5Tree), random forests (RFs) and gradient boosting decision tree (GBDT). Daily meteorological data during 2001–2010 from 16 weather stations were used to train the models, while the data from 2011 to 2015 were used for testing. Two cross-station applications were considered between each of the 16 stations and the other 15 stations. The results showed that the radiation-based Priestley-Taylor model (on average RMSE = 1.13 mm d⁻¹, NSE = 0.53, R² = 0.57, MBE = 0.21 mm d⁻¹) gave the most accurate daily Ep estimates among the four empirical models during testing, while the mass transfer-based Trabert model (on average RMSE = 1.38 mm d⁻¹, NSE = 0.25, R² = 0.46, MBE = 0.65 mm d⁻¹) performed worst. The GBDT model outperformed the RFs model, M5Tree model and the empirical models under the same input combinations in terms of prediction accuracy (on average RMSE = 0.86 mm d⁻¹, NSE = 0.68, R² = 0.73, MBE = 0.07 mm d⁻¹) and model stability (average percentage increase in testing RMSE = 16.3%). The RMSE values generally increased with the increase in the distance of two cross stations. A distance of less than 100 km between two cross stations is highly recommended for cross-station applications with satisfactory prediction accuracy (median percentage increase in RMSE <5% for cross-station application #1 and <20% for application #2) in the Poyang Lake Watershed of China and maybe elsewhere with similar climates.
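The accuracy and stability indicators quoted in this abstract (RMSE, NSE, MBE and the percentage increase in testing RMSE) can be computed as in the following Python sketch. The definitions are the conventional ones; the sign convention for MBE (predicted minus observed) and the function names are assumptions, since the abstract does not spell them out.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mbe(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean bias error, taken here as mean(predicted - observed)."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)


def nse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def rmse_increase_percent(train_rmse: float, test_rmse: float) -> float:
    """Percentage increase in testing RMSE relative to training RMSE (stability)."""
    return 100.0 * (test_rmse - train_rmse) / train_rmse


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [2.0, 3.0, 4.0, 5.0]
    print(rmse(observed, predicted), mbe(observed, predicted), nse(observed, predicted))
```

A smaller percentage increase from training to testing RMSE indicates a more stable model, which is how the abstract ranks model stability.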
Article
Although many studies have demonstrated the good performances of artificial intelligence (AI) approaches for reference evapotranspiration modeling, the applicability of AI approaches for actual crop evapotranspiration (ET) modeling still remains uncertain, especially in plastic mulched croplands. The objective of the present study was to evaluate the applicability of two different artificial intelligence approaches, including support vector machine (SVM) and artificial neural network optimized by genetic algorithm (GANN), in modeling actual ET in a rainfed maize field under non-mulching (CK) and partial plastic film mulching (MFR). A field experiment was conducted for continuous measurements of ET, meteorological variables, leaf area index (LAI) and plant heights (hc) under both CK and MFR during maize seasons of 2011–2013. The meteorological data containing minimum, maximum, mean air temperature, minimum, maximum, mean relative humidity, solar radiation, wind speed and crop data including LAI and hc during maize growing seasons of 2011–2012 were used to train the SVM and GANN models with two different input combinations, and data of 2013 were used to validate the performances of the models. The results indicated that SVM1 and GANN1 models with meteorological and crop data as input could accurately estimate maize ET, which confirmed the good performances of SVM and GANN models for maize ET estimation. The performances of SVM2 and GANN2 models with only meteorological data as input were poorer than those of SVM1 and GANN1 models, but the estimated results were acceptable when only meteorological data were available. Owing to optimization by the genetic algorithm, the GANN models performed slightly better than the SVM models under both CK and MFR, and can be highly recommended to model ET.
Article
The knowledge of global solar radiation (H) is a prerequisite for the use of renewable solar energy, but H measurements are not always available due to high costs and technical complexities. The present study proposes two machine learning algorithms, i.e. Support Vector Machine (SVM) and a novel simple tree-based ensemble method named Extreme Gradient Boosting (XGBoost), for accurate prediction of daily H using limited meteorological data. Daily H, maximum and minimum air temperatures (Tmax and Tmin), transformed precipitation (Pt, 1 for rainfall > 0 and 0 for rainfall = 0) and extra-terrestrial solar radiation (H0) during 1966–2000 and 2001–2015 from three radiation stations in humid subtropical China were used to train and test the models, respectively. Two combinations of input parameters, i.e. (i) only Tmax, Tmin and Ra, and (ii) complete data were considered for simulations. The proposed machine learning models were also compared with four well-known empirical models to evaluate their performances. The results suggest that the SVM and XGBoost models outperformed the selected empirical models. The performance of the machine learning models was improved by 5.9–12.2% for the training phase and by 8.0–11.5% for the testing phase in terms of RMSE when precipitation information was further included. Compared with the SVM model, the XGBoost model generally showed better performance for the training phase, and slightly weaker but comparable performance for the testing phase in terms of accuracy. However, the XGBoost model was more stable, with an average increase of 6.3% in RMSE, compared to 10.5% for the SVM algorithm. Also, the XGBoost model (3.02 s and 0.05 s for the training and testing phases, respectively) showed much higher computation speed than the SVM model (27.48 s and 4.13 s for the training and testing phases, respectively). 
By jointly considering the prediction accuracy, model stability and computational efficiency, the XGBoost model is highly recommended to estimate daily H using commonly available temperature and precipitation data with excellent performance in humid subtropical climates.
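The transformed precipitation input defined in this abstract (Pt = 1 for rainfall > 0, otherwise 0) and the two input combinations can be sketched in Python as follows. The function names and the record layout are hypothetical, and the sketch assumes that "Ra" in combination (i) denotes the same extra-terrestrial radiation as H0.

```python
from typing import List, Sequence, Tuple


def transform_precipitation(rainfall_mm: Sequence[float]) -> List[int]:
    """Map daily rainfall to the binary indicator Pt: 1 if rainfall > 0, else 0."""
    return [1 if r > 0 else 0 for r in rainfall_mm]


def build_inputs(
    tmax: Sequence[float],
    tmin: Sequence[float],
    h0: Sequence[float],
    rainfall_mm: Sequence[float],
) -> Tuple[List[Tuple[float, float, float]], List[Tuple[float, float, float, int]]]:
    """Return the two feature sets: (i) Tmax, Tmin, H0 and (ii) the same plus Pt."""
    pt = transform_precipitation(rainfall_mm)
    temperature_only = list(zip(tmax, tmin, h0))
    complete = list(zip(tmax, tmin, h0, pt))
    return temperature_only, complete


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    combo_i, combo_ii = build_inputs(
        tmax=[30.1, 28.4], tmin=[22.0, 21.5], h0=[38.2, 38.0], rainfall_mm=[0.0, 4.6]
    )
    print(combo_i)
    print(combo_ii)
```

Either feature set could then be passed to a regression model of choice; the model itself is left out of this sketch.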
Article
This article provides the first comprehensive study to explore the potential of tree-based ensemble methods in modeling solar radiation. Gradient boosting, bagging and random forest (RF) models have been developed for estimating global, diffuse and normal radiation components in daily and hourly time-scales. The developed ensemble models have been compared to their corresponding multi-layer perceptron (MLP), support vector regression (SVR) and decision tree (DT) models. The results show that the suggested techniques are very reliable and accurate, despite being relatively simple. The average validation coefficients of determination (R2) for boosting, bagging and RF algorithms are (0.957, 0.971, 0.967) for the global irradiation model, (0.768, 0.786, 0.791) for the diffuse irradiation model, (0.769, 0.785, 0.792) for the normal irradiation model, (0.852, 0.890, 0.883) for the hourly global irradiance model, (0.778, 0.869, 0.853) for the diffuse irradiance model, and (0.797, 0.897, 0.880) for the normal irradiance model. In general, the bagging and RF algorithms showed better estimates than gradient boosting. However, the gradient boosting algorithm was the most stable with maximum increase of 10.32% in the test root mean square error, compared to 41.3% for the MLP algorithm. The SVR algorithm offers the best combination of stability and prediction accuracy. Nevertheless, its computational costs are up to 39 times the computational costs of ensemble methods. The new ensemble methods have been recommended for generating synthetic radiation data to be used for simulating and evaluating the performance of different solar energy systems.
Article
Evapotranspiration is one of the most important components of the hydrologic cycle for optimal management of water resources, especially in arid and semi-arid regions such as Iran. The main objective of the present research is to investigate the performance of empirical equations and soft computing approaches including gene expression programming (GEP), two types of support vector machine (SVM) namely SVM-polynomial (SVM-Poly) and SVM-radial basis function (SVM-RBF), as well as multivariate adaptive regression splines (MARS) in estimating monthly mean reference evapotranspiration (ETo) in Iran. In the present study, 16 empirical equations from temperature-based, mass transfer-based, radiation-based and meteorological parameters-based categories were utilized. Monthly mean data from 44 stations in the study region were used to estimate the monthly mean ETo. Half of the data (22 stations) were used for the calibration/training step and the remaining half (22 stations) for the validation/testing stage of the empirical equations/soft computing methods. First, the 16 empirical equations were locally calibrated against the FAO-56 Penman-Monteith method (as the standard method). The results revealed that the calibration process improved the performance of the equations compared with their original forms. Then, the capability of the GEP, SVM-Poly, SVM-RBF and MARS models was evaluated for estimation of the monthly mean ETo. The models' inputs were selected based on the parameters used in the empirical equations. It was found that the MARS and SVM-RBF methods generally performed better than GEP and SVM-Poly. Finally, the accuracy of the empirical equations and soft computing methods was compared. Overall, the performance of MARS and SVM-RBF was better than that of the empirical equations used.
Article
Terrestrial evapotranspiration (ET) for each plant functional type (PFT) is a key variable for linking the energy, water and carbon cycles of the atmosphere, hydrosphere and biosphere. Process-based algorithms have been widely used to estimate global terrestrial ET, yet each individual ET algorithm has exhibited large uncertainties. In this study, the support vector machine (SVM) method was introduced to improve global terrestrial ET estimation by integrating three process-based ET algorithms: MOD16, PT-JPL and SEMI-PM. At 200 FLUXNET flux tower sites, we evaluated the performance of the SVM method and others, including the Bayesian model averaging (BMA) method and the general regression neural networks (GRNNs) method together with three process-based ET algorithms. We found that the SVM method was superior to all other methods we evaluated. The validation results showed that compared with the individual algorithms, the SVM method driven by tower-specific (Modern Era Retrospective Analysis for Research and Applications, MERRA) meteorological data reduced the root mean square error (RMSE) by approximately 0.20 (0.15) mm/day for most forest sites and 0.30 (0.20) mm/day for most crop and grass sites and improved the squared correlation coefficient (R²) by approximately 0.10 (0.08) (95% confidence) for most flux tower sites. The water balance of basins and the global terrestrial ET calculation analysis also demonstrated that the regional and global estimates of the SVM-merged ET were reliable. The SVM method provides a powerful tool for improving global ET estimation to characterize the long-term spatiotemporal variations of the global terrestrial water budget.
Article
Predictions regarding the solar greenhouse temperature and humidity are important because they play a critical role in greenhouse cultivation. On account of this, it is important to set up a predictive model of temperature and humidity that would precisely predict the temperature and humidity, reducing potential financial losses. This paper presents a novel temperature and humidity prediction model based on convex bidirectional extreme learning machine (CB-ELM). Simulation results show that the convergence rate of the bidirectional extreme learning machine (B-ELM) can further be improved while retaining the same simplicity, by simply recalculating the output weights of the existing nodes based on a convex optimization method when a new hidden node is randomly added. The performance of the CB-ELM model is compared with other modeling approaches by applying it to predict solar greenhouse temperature and humidity. The experiment results show that the CB-ELM model predictions are more accurate than those of the B-ELM, Back Propagation Neural Network (BPNN), Support Vector Machine (SVM), and Radial Basis Function (RBF). Therefore, it can be considered as a suitable and effective method for predicting the solar greenhouse temperature and humidity.
Article
Reliable knowledge of solar radiation is an essential requirement for designing and planning solar energy systems. Thus, this paper presents a novel hybrid model for predicting hourly global solar radiation using random forests technique and firefly algorithm. Hourly meteorological data are used to develop the proposed model. The firefly algorithm is utilized to optimize the random forests technique by finding the best number of trees and leaves per tree in the forest. According to the results, the best number of trees and leaves per tree is 493 trees and one leaf per tree in the forest. Three statistical error values, namely, root mean square error, mean bias error, and mean absolute percentage error are used to evaluate the proposed model for the internal and external validation. Moreover, the results of the proposed model are compared with conventional random forests model, conventional artificial neural network and optimized artificial neural network model by firefly algorithm to show the superiority of the proposed hybrid model. Results show that the root mean square error, mean absolute percentage error, and mean bias error values of the proposed model are 18.98%, 6.38% and 2.86%, respectively. Moreover, the proposed random forests model shows better performance as compared to the aforementioned models in terms of prediction accuracy and prediction speed.
Article
Artificial neural networks (ANN) and the empirical methods of Priestley-Taylor, Makkink, Hargreaves and mass transfer were used to estimate the reference evapotranspiration with daily meteorological data. These datasets consisted of daily meteorological measurements from a station in northern Greece, covering a period of five years (2009–2013). The daily values of the reference evapotranspiration were calculated using the Penman-Monteith equation. Those datasets were used for training and testing the ANN. The algorithm used was a multi-layer feed-forward artificial neural network trained with back-propagation. The architecture that was finally chosen has the 4-6-1 structure, with 4 neurons in the input layer, 6 neurons in the hidden layer and 1 neuron in the output layer which corresponds to the reference evapotranspiration, using a sigmoid transfer function. The ANN models estimated ETo with a root mean square error (RMSE) ranging from 0.574 to 1.33 mm d⁻¹ and a correlation coefficient (r) from 0.955 to 0.986. Using limited input variables (3 or 2) for training the ANNs resulted in ETo values with slightly lower accuracy. The RMSE ranged from 0.598 to 0.954 mm d⁻¹ and r ranged from 0.952 to 0.978 when 3 input variables were used, and RMSE ranged from 0.846 to 1.326 mm d⁻¹ and r from 0.910 to 0.956 when 2 input variables were used. The Priestley-Taylor and Makkink methods correlated very well with the Penman-Monteith method, followed by the Hargreaves method, which overestimates the higher values of ETo. The mass transfer method also correlated satisfactorily but underestimated the ETo values.
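The 4-6-1 architecture described in this abstract can be illustrated by a forward pass in plain Python. This is only a sketch: the weights are random placeholders rather than trained values, back-propagation training is omitted, and applying the sigmoid to the hidden layer with a linear output neuron is an assumption, since the abstract does not say which layers use the sigmoid transfer function. All names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Network = Tuple[List[List[float]], List[float], List[List[float]], List[float]]


def sigmoid(x: float) -> float:
    """Logistic sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_in: int = 4, n_hidden: int = 6, n_out: int = 1, seed: int = 0) -> Network:
    """Create placeholder (untrained) weights and zero biases for an n_in-n_hidden-n_out net."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    b_out = [0.0] * n_out
    return w_hidden, b_hidden, w_out, b_out


def forward(inputs: Sequence[float], network: Network) -> List[float]:
    """Forward pass: sigmoid hidden layer followed by a linear output layer."""
    w_hidden, b_hidden, w_out, b_out = network
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w_out, b_out)]


if __name__ == "__main__":
    net = init_network()
    # Four scaled meteorological inputs (illustrative values only).
    print(forward([0.2, 0.5, 0.7, 0.1], net))
```

Reducing `n_in` to 3 or 2 gives the limited-input variants mentioned in the abstract.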
Conference Paper
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
Article
In the pharmaceutical industry it is common to generate many QSAR models from training sets containing a large number of molecules and a large number of descriptors. The best QSAR methods are those that can generate the most accurate predictions but that are not overly expensive computationally. In this paper we compare extreme gradient boosting (XGBoost) to random forest and single-task deep neural nets on 30 in-house data sets. While XGBoost has many adjustable parameters, we can define a set of standard parameters at which XGBoost makes predictions, on the average, better than those of random forest and almost as good as those of deep neural nets. The biggest strength of XGBoost is its speed. Whereas efficient use of random forest requires generating each tree in parallel on a cluster, and deep neural nets are usually run on GPUs, XGBoost can be run on a single cluster CPU in less than a third of the wall-clock time of either of the other methods.