General flowchart of GA-SVM algorithm.

Source publication
Article
Full-text available
Transverse mixing coefficient (TMC) is known as one of the most effective parameters in the two-dimensional simulation of water pollution, and increasing the accuracy of estimating this coefficient will improve the modeling process. In the present study, genetic algorithm (GA)-based support vector machine (SVM) was used to estimate TMC in streams....

Context in source publication

Context 1
... each instance of the whole training set is estimated once so the cross-validation accuracy is the percentage of correctly classified data. The general flowchart of GA-SVM is illustrated in Figure 4. In the present study, SVM and GA-SVM were applied by using RBF kernel function and input variables. ...
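
The cross-validation accuracy described in the context above can be illustrated with a minimal sketch. It assumes a toy nearest-centroid classifier as a stand-in for the SVM (the source does not give its implementation), and the function names and the synthetic data are hypothetical. The point is only that every instance is predicted exactly once by a model that did not see it, and the accuracy is the percentage predicted correctly.

```python
import random

def nearest_centroid_fit(X, y):
    """Return {label: centroid}; a toy stand-in for training an SVM."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def cross_val_accuracy(X, y, k=5, seed=0):
    """k-fold CV accuracy in percent; each instance is held out exactly once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all indices
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(nearest_centroid_predict(model, X[i]) == y[i] for i in fold)
    return 100.0 * correct / len(X)

# Hypothetical, well-separated two-class data (not from the source study).
X = [[0.1 * i, 0.0] for i in range(10)] + [[5.0 + 0.1 * i, 5.0] for i in range(10)]
y = [0] * 10 + [1] * 10
accuracy = cross_val_accuracy(X, y, k=5)
print(accuracy)  # 100.0 for this trivially separable toy set
```

In a GA-SVM setting, a score of this kind would typically serve as the fitness of one candidate hyperparameter set.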

Similar publications

Chapter
Full-text available
Aiming at the problem of water pollution classification, a water pollutant classification method based on multi-classification support vector machine is proposed. By constructing and optimizing the coding matrix, a classification coding table and decoding table are formed, and SVM sub-classifiers are used for data classification. The classification...

Citations

... These researchers presented results of superior accuracy compared to the proposed empirical formulas derived by multiple linear regression. For intermediate mixing analysis, D_T has also been estimated using a machine learning model [19, 29–33]. In these studies, 165–420 datasets, a significant portion of which included lab-scale results, were adopted to develop machine learning models, and the estimated D_T showed enhanced performance compared to the empirical formulae. ...
Article
Full-text available
The advection–dispersion equation has been widely used to analyze the intermediate field mixing of pollutants in natural streams. The dispersion coefficient, manipulating the dispersion term of the advection–dispersion equation, is a crucial parameter in predicting the transport distance and contaminated area in the water body. In this study, the transverse dispersion coefficient was estimated using machine learning regression methods applied to oversampled datasets. Previous research datasets used for this estimation were biased toward width-to-depth ratio (W/H) values ≤ 50, potentially leading to inaccuracies in estimating the transverse dispersion coefficient for datasets with W/H > 50. To address this issue, four oversampling techniques were employed to augment the dataset with W/H > 50, thereby mitigating the dataset’s imbalance. The estimation results obtained from data resampling with nonlinear regression methods demonstrated improved prediction accuracy compared to the pre-oversampling results. Notably, the combination of adaptive synthetic sampling (ADASYN) and eXtreme Gradient Boosting regression (XGBoost) exhibited improved accuracy compared to other combinations of oversampling techniques and nonlinear regression methods. Through the combined ADASYN–XGBoost approach, it is possible to enhance the transverse dispersion coefficient estimation performance using only two variables, W/H and bed friction effects (U/U*), without adding channel sinuosity, which represents the effects of secondary currents.
... The dispersion coefficients in longitudinal and transverse directions can be predicted using empirical and ML-based models, and several ML approaches have been used to estimate dispersion coefficients in natural streams. For instance, Azamathulla and Wu (2011) applied a support vector regression (SVR) model, Kargar et al. (2020) used random forest regression (RFR), Tayfur and Singh (2005) and Noori et al. (2016) employed artificial neural network (ANN) algorithms to predict the longitudinal dispersion coefficient, while Nezaratian et al. (2021) used SVR and ANFIS models to predict the transverse dispersion coefficient in streams. ...
Article
Full-text available
Numerous empirical equations and machine learning (ML) techniques have emerged to forecast dispersion coefficients in open channels. However, the efficacy of certain learning-based models in predicting these coefficients remains unstudied. Also, the direct application of machine learning-derived dispersion coefficients to Lagrangian sediment transport models has not been investigated. The present study utilizes data from prior research to assess the performance of ensemble ML-based models, specifically, random forest regression (RFR) and gradient boosting regression (GBR), in estimating longitudinal and transverse dispersion in natural streams. The optimal hyper-parameters of these ensemble models were fine-tuned using grid-search cross-validation. The ML-based dispersion models were then integrated into a Lagrangian particle tracking model (PTM) to simulate suspended sediment concentration in natural streams. Suspended sediment concentration distribution maps generated from the developed PTM with ML-based dispersion coefficients were compared with field data. The findings indicated that the GBR model, with a coefficient of determination (R²) of 0.95, outperformed the RFR model, which had an R² of 0.9, in predicting longitudinal dispersion coefficients in a natural stream across both training and testing stages. However, during the testing phase, the RFR model with an R² of 0.94 performed better than the GBR model with an R² of 0.91 in predicting transverse dispersion. Both models consistently underestimated dispersion coefficients in both training and testing stages. Comparisons between the PTM with ensemble dispersion coefficients and empirical-based dispersion relationships revealed the superior performance of the GBR model compared to the other two methods.
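
The grid-search cross-validation mentioned in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's procedure: a one-parameter ridge regression (a swapped-in toy model, not RFR or GBR) is tuned over a hypothetical grid, and the function names, grid values, and data are all invented for the example.

```python
import itertools

def ridge_fit_1d(xs, ys, lam):
    """Closed-form slope of y ~ w*x (no intercept) with L2 penalty lam."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_mse(xs, ys, lam, k=4):
    """Mean squared error over k folds; each point is held out exactly once."""
    n = len(xs)
    total = 0.0
    for f in range(k):
        test = range(f, n, k)
        held = set(test)
        train = [i for i in range(n) if i not in held]
        w = ridge_fit_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        total += sum((ys[i] - w * xs[i]) ** 2 for i in test)
    return total / n

def grid_search(xs, ys, grid, k=4):
    """Evaluate every combination in the grid; return (best_params, best_score)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_mse(xs, ys, k=k, **params)
        if best is None or score < best[1]:  # lower MSE is better
            best = (params, score)
    return best

# Hypothetical noise-free data y = 2x, so the unpenalized fit is exact.
xs = [float(i) for i in range(1, 13)]
ys = [2.0 * x for x in xs]
best_params, best_score = grid_search(xs, ys, {"lam": [0.0, 1.0, 10.0, 100.0]})
print(best_params, best_score)  # {'lam': 0.0} 0.0
```

The exhaustive loop is what distinguishes grid search from the genetic-algorithm and Bayesian-optimization tuning cited elsewhere on this page: its cost grows with the product of the grid sizes.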
... For this purpose, a variety of methods were applied to do more comprehensive and efficient hyperparameter tuning. Nezaratian et al. (2021) implemented a genetic algorithm (GA) coupled with the support vector machine (SVM) for the estimation of the transverse mixing coefficient (TMC). Mansour-Bahmani et al. (2021) applied multilayered perceptron neural network (MLPNN) and genetic programming (GP) to estimate urban wastewater discharge. ...
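
As a rough illustration of how a genetic algorithm can drive hyperparameter tuning of the kind cited above, the sketch below evolves two real-valued genes. Everything here is an assumption made for the example: the genes are imagined as log2(C) and log2(gamma) of an RBF-kernel SVM, the bounds are arbitrary, and `toy_fitness` is a synthetic stand-in for a cross-validated model score. It is not the cited authors' implementation.

```python
import random

def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Maximize fitness over box-bounded real genes with a simple elitist GA."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def mutate(ind):
        # Gaussian perturbation of some genes, clipped back into the bounds.
        out = []
        for gene, (lo, hi) in zip(ind, bounds):
            if rng.random() < mutation_rate:
                gene += rng.gauss(0.0, 0.1 * (hi - lo))
            out.append(min(hi, max(lo, gene)))
        return out

    def crossover(a, b):
        # Uniform crossover: each gene comes from either parent.
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

    def tournament(scored):
        # Binary tournament selection on (score, individual) pairs.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [random_individual() for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda t: t[0], reverse=True)
        history.append(scored[0][0])
        children = [scored[0][1]]  # elitism: carry the best individual over
        while len(children) < pop_size:
            child = crossover(tournament(scored), tournament(scored))
            children.append(mutate(child))
        population = children
    best = max(population, key=fitness)
    return best, fitness(best), history

def toy_fitness(ind):
    """Hypothetical score surface peaking at log2(C) = 3, log2(gamma) = -5."""
    log2_c, log2_gamma = ind
    return -((log2_c - 3.0) ** 2 + (log2_gamma + 5.0) ** 2)

bounds = [(-5.0, 15.0), (-15.0, 3.0)]  # assumed search ranges
best, best_score, history = genetic_search(toy_fitness, bounds)
print(best, best_score)
```

In a real GA-SVM pipeline the fitness call would train and cross-validate a model for each individual, which is why population size and generation count dominate the run time.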
Article
Full-text available
Physically based models (PBMs), including stormwater management model (SWMM), require a significant amount of in situ data and expertise to predict water quality in urban watersheds. In recent years, data-driven models have been increasingly used as an alternative for the prediction of pollutant concentrations. Supervised machine learning (ML) models have been used for estimating stormwater quality parameters. However, optimizing the structure of such ML models has rarely been considered. This study aims to comprehensively evaluate the optimization of the supervised ensemble bagging ML model for forecasting stormwater quality using an ML-based optimization method called Bayesian optimization (BO). To that end, a bagging ensemble model, namely random forest (RF), was first developed for estimating total suspended solids (TSS) concentration in urban watersheds. Eleven factors, including drainage area, land-use types, impervious area, rainfall depth, the volume of runoff, and antecedent dry days, were implemented as predictive features in the model, and their data were acquired from the National Stormwater Quality Database (NSQD). Values for the number of basic estimators, the number of basic selected features for developing basic estimators, subsamples, and the maximum depth of basic learners were optimized using BO. A sensitivity analysis was done on the ML model and the BO parameters, including acquisition function, number of initial points, and realizations. Results indicated that the accuracy of the RF model depends on all mentioned RF parameters. The performance of the best-developed RF model was satisfactory in both the training and the testing steps. This model obtained the R² values of 0.955 and 0.915 for the training and testing step, respectively. The study demonstrated the potential of a combination of the RF models and BO for accurately predicting stormwater quality parameters.
... In many types of research, LSSVR has been used to solve hydraulic problems [17,18]. According to Olyaie, Banejad and Heydari [19], LSSVR was the best ML method for predicting the Cd of piano-key weirs. ...
... Notation: ET0, potential evapotranspiration (mm/day); Im, moisture index (mm); n/N, ratio of actual to maximum possible duration of sunshine hours; R², coefficient of determination; Rn, net radiation at the crop surface (MJ m⁻² day⁻¹); s, the water surplus (mm); d, the water deficit (mm); Tmean, average temperature (°C); Tmax, maximum temperature (°C); Tmin, minimum temperature (°C); U, wind velocity (km day⁻¹).
... revealed that all models had a tolerable level of accuracy in predicting the piezometric heads, although the MARS model performed the best and the M5 method performed the worst (Parsaie et al. 2021). Nezaratian et al. (2021) approximated the transverse mixing coefficient (TMC) in streams using SVM based on genetic algorithm (GA) and found that efficient TMC estimate by GA-SVM can reduce the complexity by minimizing the number of input parameters. ...
Article
Full-text available
The Penman–Monteith evapotranspiration (ET) model has superior predictive ability to other methods, but it is challenging to apply in several Indian stations, owing to the need for a large number of climatic variables. The study investigated an artificial neural network (ANN) model for calculating ET for various agro-climatic regions of India. Sensitivity analysis showed that the overall average changes in ET0 values for 25% change in the climatic variables were 18, 16, 14, 7, 5, and 4%, respectively, for Tmax, RHmean, Rn, wind speed, Tmin, and sunshine hours. The dominant climatic variables were identified from the principal component analysis (PCA) and ET0 was computed using an ANN with dominant climatic variables. The ANN architecture with backpropagation technique had one hidden layer and neurons ranging from 10 to 30 for all climatic variables and from 5 to 10 for PCA variables. The new ET models were statistically compared with the Penman–Monteith ET estimate and found to be reliable. PCA variables guaranteed an estimate of ET0 accounting for 98% of the variability. The average values of coefficient of determination, standard error of estimate, and percentage efficiency were observed as 0.96, 0.24, and 94%, respectively. Highlights: The Penman–Monteith ET model is the standard but data-intensive, so its applicability is limited. The crucial climatic variables influencing ET are identified for various agro-climatic regions using principal component analysis and sensitivity analysis. New ET models are developed and compared with the standard Penman–Monteith ET estimate. Less data-intensive ANN models are proven to be acceptable in estimating ET0.
... In [18], the study proposed a convolution neural network model to reduce the deaths caused by Pneumonia. The proposed model achieved 85.6% and 92.31% accuracy, respectively, which is an improvement over previous models. ...
Article
In this research, combined wavelet-neural network (WMLPNN) and wavelet-support vector machine (WSVM) models were developed to predict the piezometric head inside the core of earthen dams, and the obtained performance was compared with the conventional SVM and MLPNN models. For this purpose, monthly data of the water surface elevation in the reservoir (up to 5 months) and the piezometers installed in the core of the Bam earthen dam, which is located in the southwest of Kerman province in Iran, were used. In the development of the hybrid WMLPNN and WSVM models, various wavelet transforms including haar, db, and sym were utilized, and up to five levels of decomposition of each of the signals were tested. The tangent sigmoid and the radial basis functions were used as the transfer and kernel functions in the WMLPNN and WSVM models, respectively. The results showed that the WMLPNN model with a three-layer structure (one input layer, one hidden layer, and one output layer) in the testing stage with RMSE = 1.340 and R² = 0.974, and the WSVM with four kernel functions in the hidden layer with statistical indices of RMSE = 0.774 and R² = 0.987, can predict the piezometric head in the core of the earthen dam. The results showed that adding more than three units of time delays in the input information does not increase the modeling accuracy. A comparison of the performance of both hybrid models shows that the WSVM model is slightly more accurate. The comparison of the performance of the developed combined models with their conventional counterparts shows that the use of the wavelet algorithm can increase the accuracy of the mentioned models by up to 12%.
Article
Fuzzy inference system, Sardasht city, Water quality. Assessing water quality is an important step toward the optimal and appropriate use of drinking water resources. Therefore, the study of water quality characteristics has received considerable attention in water resource management programs. The ambiguity and inherent uncertainty governing water resources in the evaluation of goals, criteria, and decision-making units, and the inconsistency and imprecision in the opinions and judgments of decision-makers, have led to the adoption of fuzzy set theory and, as a result, fuzzy logic as an efficient and useful tool for planning and decision-making. In the present study, water quality was first classified by international standard methods (deterministic evaluation method) for drinking purposes. Then the classification was modeled and compared using Mamdani fuzzy inference. For this purpose, the four-year averages of the quality parameters of 33 groundwater sources in operation in Sardasht, including 10 wells, 22 springs, and one aqueduct, were used as inputs in two cases. In the deterministic evaluation method (Schoeller diagram), the characteristics and the water quality determination diagram were determined. In the four-year average fuzzy inference model, eight water quality parameters were classified into three groups: Na+, Ca2+, and Mg2+ in the first group; HCO3−, SO4 2−, and Cl− in the second group; and TH and TDS in the third group. For each group with two input parameters, each input parameter was assigned three membership functions, giving nine rules (3×3). The results based on the deterministic method showed that all the studied samples were in the good to acceptable group. However, the Mamdani fuzzy results showed that two samples were in the acceptable category with a confidence level of 50%, and the other samples were placed in the desirable category for drinking with a confidence level of 83–87%.
Preprint
Full-text available
Several empirical equations and machine learning approaches have been developed to predict dispersion coefficients in open channels; however, the ability of some learning-based models to predict these coefficients has not yet been evaluated, and the direct application of machine learning-based dispersion coefficients to Lagrangian sediment transport models has not been studied. In this research, data from previous studies is used to evaluate the ability of ensemble machine learning models, i.e., random forest regression (RFR) and gradient boosting regression (GBR), to predict longitudinal and transverse dispersion in natural streams. The optimal principal parameters of ensemble models were adjusted using the grid-search cross-validation technique, and the machine learning-based dispersion models were integrated with a Lagrangian particle tracking model to simulate suspended sediment concentration in natural streams. The resulting suspended sediment concentration distribution was compared with the field data. The results showed that the GBR model, with a coefficient of determination (R²) of 0.95, performed better than the RFR model, with R² = 0.9, in predicting the longitudinal dispersion coefficients in a natural stream in both training and testing stages. However, the RFR model with R² = 0.94 performed better than the GBR (R² = 0.91) in predicting the transverse dispersion in the testing stage. Both models underestimated the dispersion coefficients in the training and testing stages. Comparison between the PTM with ensemble dispersion coefficients and empirical-based dispersion relationships revealed the better performance of the GBR model compared to the other two methods.