General flowchart of GA-SVM algorithm.

Source publication

Figure 1 | General steps of pollution dispersion in a stream (Fischer...

Figure 2 | Correlations between all input and output parameters.

Figure 3 | Nonlinear SVM with Vapnik's ε-insensitive loss function.

Figure 4 | General flowchart of GA-SVM algorithm.

Figure 5 | Boxplots of all parameters with outliers (*).

A genetic algorithm-based support vector machine to estimate the transverse mixing coefficient in streams

Article

Full-text available

Aug 2021

Transverse mixing coefficient (TMC) is known as one of the most effective parameters in the two-dimensional simulation of water pollution, and increasing the accuracy of estimating this coefficient will improve the modeling process. In the present study, genetic algorithm (GA)-based support vector machine (SVM) was used to estimate TMC in streams....

Context 1

... each instance of the whole training set is estimated once so the cross-validation accuracy is the percentage of correctly classified data. The general flowchart of GA-SVM is illustrated in Figure 4. In the present study, SVM and GA-SVM were applied by using RBF kernel function and input variables. ...
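
The flowchart itself is not reproduced on this page, so the following is only a minimal sketch of the general idea behind a GA-tuned SVM: a genetic algorithm searches over hyperparameters and keeps the candidates with the best fitness. Everything here is an assumption for illustration. The genes are taken to be log10(C) and log10(gamma) of an RBF-kernel SVM, the operators (tournament selection, uniform crossover, Gaussian mutation, two elites) are generic choices, and `stand_in_fitness` is a hypothetical placeholder for what would in practice be a cross-validated SVM score.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Maximise `fitness` over real-valued genes kept inside `bounds`."""
    rng = random.Random(seed)

    def clip(value, lo, hi):
        return max(lo, min(hi, value))

    # Random initial population, one gene per hyperparameter.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        next_population = ranked[:2]  # elitism: carry the two best over unchanged
        while len(next_population) < pop_size:
            # Tournament selection (size 3) for each parent.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            # Uniform crossover: each gene comes from either parent.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Gaussian mutation, clipped back into the search bounds.
            child = [
                clip(g + rng.gauss(0, 0.1 * (hi - lo)), lo, hi)
                if rng.random() < mutation_rate else g
                for g, (lo, hi) in zip(child, bounds)
            ]
            next_population.append(child)
        population = next_population

    best = max(population, key=fitness)
    return best, fitness(best), history


def stand_in_fitness(genes):
    """Hypothetical placeholder for a cross-validated SVM score.

    Peaks at log10(C) = 1 and log10(gamma) = -2; these values are arbitrary.
    """
    log_c, log_gamma = genes
    return -((log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2)


# Assumed search ranges for log10(C) and log10(gamma).
search_bounds = [(-2.0, 3.0), (-4.0, 1.0)]
best, score, history = genetic_search(stand_in_fitness, search_bounds)
```

To approximate the setting described in the context snippet, `stand_in_fitness` would be replaced by a function that trains an RBF-kernel SVM with the decoded C and gamma and returns its k-fold cross-validation score, so that each training instance is predicted exactly once.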

Classification accuracy of different methods

Water Pollutant Classification Method Based on Multi-Class Support Vector Machine

Chapter

Full-text available

Nov 2023

Aiming at the problem of water pollution classification, a water pollutant classification method based on multi-classification support vector machine is proposed. By constructing and optimizing the coding matrix, a classification coding table and decoding table are formed, and SVM sub-classifiers are used for data classification. The classification...

Application of Oversampling Techniques for Enhanced Transverse Dispersion Coefficient Estimation Performance Using Machine Learning Regression

Article

Full-text available

May 2024

The advection–dispersion equation has been widely used to analyze the intermediate field mixing of pollutants in natural streams. The dispersion coefficient, manipulating the dispersion term of the advection–dispersion equation, is a crucial parameter in predicting the transport distance and contaminated area in the water body. In this study, the transverse dispersion coefficient was estimated using machine learning regression methods applied to oversampled datasets. Previous research datasets used for this estimation were biased toward width-to-depth ratio (W/H) values ≤ 50, potentially leading to inaccuracies in estimating the transverse dispersion coefficient for datasets with W/H > 50. To address this issue, four oversampling techniques were employed to augment the dataset with W/H > 50, thereby mitigating the dataset’s imbalance. The estimation results obtained from data resampling with nonlinear regression method demonstrated improved prediction accuracy compared to the pre-oversampling results. Notably, the combination of adaptive synthetic sampling (ADASYN) and eXtreme Gradient Boosting regression (XGBoost) exhibited improved accuracy compared to other combinations of oversampling techniques and nonlinear regression methods. Through the combined ADASYN–XGBoost approach, it is possible to enhance the transverse dispersion coefficient estimation performance using only two variables, W/H and bed friction effects (U/U*), without adding channel sinuosity; this represents the effects of secondary currents.
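
The imbalance problem described above can be illustrated with a much simpler stand-in than ADASYN: synthetic rows are created by linear interpolation between randomly chosen pairs of under-represented samples (a SMOTE-like idea without the nearest-neighbour step). This is a sketch under stated assumptions, not the method used in the study. The row layout `(W/H, U/U*, target)` and all numeric values are made up for illustration.

```python
import random


def oversample_minority(rows, is_minority, n_new, seed=0):
    """Append `n_new` synthetic rows interpolated between minority rows."""
    rng = random.Random(seed)
    minority = [row for row in rows if is_minority(row)]
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)   # two distinct minority rows
        t = rng.random()                 # interpolation weight in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return rows + synthetic


# Hypothetical toy rows: (W/H, U/U*, transverse dispersion target).
toy_rows = [
    (12.0, 9.0, 0.20),
    (25.0, 11.0, 0.30),
    (40.0, 14.0, 0.45),
    (60.0, 10.0, 0.50),
    (80.0, 16.0, 0.70),
]

# Rows with W/H > 50 are treated as the under-represented group.
balanced = oversample_minority(toy_rows, lambda row: row[0] > 50, n_new=3)
```

Because each synthetic row is a convex combination of two rows with W/H > 50, it also has W/H > 50. Any regression model can then be fitted on `balanced` instead of `toy_rows`.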

Application of Machine Learning Approaches in Particle Tracking Model to Estimate Sediment Transport in Natural Streams

Article

Full-text available

Mar 2024
WATER RESOUR MANAG

Numerous empirical equations and machine learning (ML) techniques have emerged to forecast dispersion coefficients in open channels. However, the efficacy of certain learning-based models in predicting these coefficients remains unstudied. Also, the direct application of machine learning-derived dispersion coefficients to Lagrangian sediment transport models has not been investigated. The present study utilizes data from prior research to assess the performance of ensemble ML-based models, specifically random forest regression (RFR) and gradient boosting regression (GBR), in estimating longitudinal and transverse dispersion in natural streams. The optimal hyper-parameters of these ensemble models were fine-tuned using grid-search cross-validation. The ML-based dispersion models were then integrated into a Lagrangian particle tracking model (PTM) to simulate suspended sediment concentration in natural streams. Suspended sediment concentration distribution maps generated from the developed PTM with ML-based dispersion coefficients were compared with field data. The findings indicated that the GBR model, with a coefficient of determination (R²) of 0.95, outperformed the RFR model, which had an R² of 0.9, in predicting longitudinal dispersion coefficients in a natural stream across both training and testing stages. However, during the testing phase, the RFR model with an R² of 0.94 performed better than the GBR model with an R² of 0.91 in predicting transverse dispersion. Both models consistently underestimated dispersion coefficients in both training and testing stages. Comparisons between the PTM with ensemble dispersion coefficients and empirical-based dispersion relationships revealed the superior performance of the GBR model compared to the other two methods.
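
As a reading aid for the R² values and the underestimation noted above, the sketch below computes the coefficient of determination as 1 - SS_res / SS_tot together with the mean bias (predicted minus observed). Whether the study used this definition of R² or the squared correlation coefficient is not stated here, so this is an assumption. The numbers are made up for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_bias(observed, predicted):
    """Average of (predicted - observed); negative means underestimation."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical values: every prediction is 0.5 below the observation.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [0.5, 1.5, 2.5, 3.5]

r2 = r_squared(observed, predicted)
bias = mean_bias(observed, predicted)
```

A model can score a high R² and still show a negative mean bias, which is why the two are worth reporting side by side.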

Hyperparameter tuning of supervised bagging ensemble machine learning model using Bayesian optimization for estimating stormwater quality

Article

Full-text available

Mar 2024

Mohammadreza Moeini

Physically based models (PBMs), including stormwater management model (SWMM), require a significant amount of in situ data and expertise to predict water quality in urban watersheds. In recent years, data-driven models have been increasingly used as an alternative for the prediction of pollutant concentrations. Supervised machine learning (ML) models have been used for estimating stormwater quality parameters. However, optimizing the structure of such ML models has rarely been considered. This study aims to comprehensively evaluate the optimization of the supervised ensemble bagging ML model for forecasting stormwater quality using an ML-based optimization method called Bayesian optimization (BO). To that end, a bagging ensemble model, namely random forest (RF), was first developed for estimating total suspended solids (TSS) concentration in urban watersheds. Eleven factors, including drainage area, land-use types, impervious area, rainfall depth, the volume of runoff, and antecedent dry days, were implemented as predictive features in the model, and their data were acquired from the National Stormwater Quality Database (NSQD). Values for the number of basic estimators, the number of basic selected features for developing basic estimators, subsamples, and the maximum depth of basic learners were optimized using BO. A sensitivity analysis was done on the ML model and the BO parameters, including acquisition function, number of initial points, and realizations. Results indicated that the accuracy of the RF model depends on all mentioned RF parameters. The performance of the best-developed RF model was satisfactory in both the training and the testing steps. This model obtained the R² values of 0.955 and 0.915 for the training and testing step, respectively. The study demonstrated the potential of a combination of the RF models and BO for accurately predicting stormwater quality parameters.

Reliable prediction of the discharge coefficient of triangular labyrinth weir based on soft computing techniques

Article

Jun 2023
FLOW MEAS INSTRUM

ANN-based PCA to predict evapotranspiration: a case study in India

Article

Full-text available

May 2023

The Penman–Monteith evapotranspiration (ET) model has superior predictive ability to other methods, but it is challenging to apply in several Indian stations, owing to the need for a large number of climatic variables. The study investigated an artificial neural network (ANN) model for calculating ET for various agro-climatic regions of India. Sensitivity analysis showed that the overall average changes in ET0 values for 25% change in the climatic variables were 18, 16, 14, 7, 5, and 4%, respectively, for Tmax, RHmean, Rn, wind speed, Tmin, and sunshine hours. The dominant climatic variables were identified from the principal component analysis (PCA) and ET0 was computed using an ANN with dominant climatic variables. The ANN architecture with backpropagation technique had one hidden layer and neurons ranging from 10 to 30 for all climatic variables and from 5 to 10 for PCA variables. The new ET models were statistically compared with Penman–Monteith ET estimate, and found reliable. PCA variables guaranteed an estimate of ET0 accounting for 98% of the variability. The average values of coefficient of determination, standard error of estimate, and percentage efficiency were observed as 0.96, 0.24, and 94%, respectively. Highlights: The Penman–Monteith ET model is the standard but data-intensive, so its applicability is limited. The crucial climatic variables influencing ET are identified for various agro-climatic regions using principal component analysis and sensitivity analysis. New ET models are developed and compared with the standard Penman–Monteith ET estimate. Less data-intensive ANN models are proven to be acceptable in estimating ET0.

Enhanced feature selection algorithm for pneumonia detection

Article

Jan 2023

Development of Soft Computing Models Based on Wavelet Analysis for Estimating Piezometric Heads in Earth Dams

Article

Jul 2023

In this research, combined wavelet-neural network (WMLPNN) and wavelet-support vector machine (WSVM) models were developed to predict the piezometric head inside the core of earthen dams, and their performance was compared with that of the conventional SVM and MLPNN models. For this purpose, monthly data of the water surface elevation in the reservoir (up to 5 months) and the piezometers installed in the core of the Bam earthen dam, which is located in the southwest of Kerman province in Iran, were used. In the development of the hybrid WMLPNN and WSVM models, various wavelet transforms including haar, db, and sym were utilized, and up to five levels of decomposition of each of the signals were tested. The sigmoid tangent and the radial functions were used as transfer and kernel neuron functions in the WMLPNN and WSVM models. The results showed that the WMLPNN model with a three-layer structure (one input layer, one hidden layer, and one output layer), with RMSE = 1.340 and R² = 0.974 in the testing stage, and the WSVM with four kernel functions in the hidden layer, with RMSE = 0.774 and R² = 0.987, can predict the piezometric head in the core of the earthen dam. The results showed that adding more than three units of time delays in the input information does not increase the modeling accuracy. A comparison of the performance of both hybrid models shows that the WSVM model is slightly more accurate. The comparison of the performance of the developed combined models with their conventional counterparts shows that the use of the wavelet algorithm can increase the accuracy of the mentioned models by up to 12%.

Groundwater quality assessment using fuzzy inference system for drinking purposes (Case study: Sardasht city, West Azerbaijan province, Iran)

Article

Jun 2023

Keywords: fuzzy inference system, Sardasht city, water quality. Assessing water quality is an important step toward the optimal and appropriate use of drinking water resources, so the study of water quality characteristics has received considerable attention in water resource management programs. Ambiguity and inherent uncertainty in evaluating goals, criteria, and decision-making units, together with inconsistency and imprecision in the opinions and judgments of decision-makers, have led to the adoption of fuzzy set theory and, in turn, fuzzy logic as an efficient and useful tool for planning and decision-making. In the present study, underground water quality was first classified for drinking purposes by international standard methods (the deterministic evaluation method). The classification was then modeled with Mamdani fuzzy inference and the two were compared. For this purpose, the four-year averages of the quality parameters of 33 underground water sources in operation in Sardasht (10 wells, 22 springs, and one aqueduct) were used as inputs in two cases. In the deterministic evaluation method (Schoeller diagram), the characteristics and the water quality determination diagram were determined. In the four-year-average fuzzy inference model, eight water quality parameters were classified into three groups: Na⁺, Ca²⁺, and Mg²⁺ in the first group; HCO₃⁻, SO₄²⁻, and Cl⁻ in the second; and TH and TDS in the third. For each group with two input parameters, each input was assigned three membership functions, giving nine rules (3 × 3). The results based on the deterministic method showed that all the studied samples were in the good to acceptable group. However, the Mamdani fuzzy results showed that two samples, with a confidence level of 50%, fell in the acceptable category, while the other samples, with confidence levels of 83-87%, were placed in the desirable category for drinking.
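
The 3 × 3 rule base mentioned above can be sketched as follows: two inputs, three triangular membership functions each, and one rule per combination, with the rule firing strength taken as the minimum of the two memberships and rules with the same output combined by maximum (the usual Mamdani choices). The membership breakpoints, the normalisation of inputs to [0, 1], the rule consequents, and the "unsuitable" class label are all assumptions for illustration, and defuzzification is omitted.

```python
from itertools import product


def triangular(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets on an input normalised to [0, 1].
LEVELS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}
LEVEL_NAMES = list(LEVELS)
CLASSES = ["desirable", "acceptable", "unsuitable"]

# Nine rules, one per (level of input 1, level of input 2) pair.
# Assumed consequent: the output class follows the worse of the two levels.
RULES = [
    ((l1, l2), CLASSES[max(LEVEL_NAMES.index(l1), LEVEL_NAMES.index(l2))])
    for l1, l2 in product(LEVEL_NAMES, repeat=2)
]


def fuzzify(x):
    """Membership degree of x in each fuzzy set."""
    return {name: triangular(x, *abc) for name, abc in LEVELS.items()}


def infer(x1, x2):
    """Return the aggregated firing strength of each output class."""
    mu1, mu2 = fuzzify(x1), fuzzify(x2)
    strength = {name: 0.0 for name in CLASSES}
    for (l1, l2), out in RULES:
        firing = min(mu1[l1], mu2[l2])                # AND as minimum
        strength[out] = max(strength[out], firing)    # aggregate by maximum
    return strength


result = infer(0.25, 0.0)
```

For the inputs 0.25 and 0.0, only the (low, low) and (medium, low) rules fire, each with strength 0.5, so the sample sits halfway between the first two classes.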

Application of Machine Learning Approaches in Particle Tracking Model to Estimate Sediment Transport in Natural Streams

Preprint

Full-text available

Jun 2023

Several empirical equations and machine learning approaches have been developed to predict dispersion coefficients in open channels; however, the ability of some learning-based models to predict these coefficients has not yet been evaluated, and the direct application of machine learning-based dispersion coefficients to Lagrangian sediment transport models has not been studied. In this research, data from previous studies is used to evaluate the ability of ensemble machine learning models, i.e., random forest regression (RFR) and gradient boosting regression (GBR), to predict longitudinal and transverse dispersion in natural streams. The optimal principal parameters of ensemble models were adjusted using the grid-search cross-validation technique, and the machine learning-based dispersion models were integrated with a Lagrangian particle tracking model to simulate suspended sediment concentration in natural streams. The resulting suspended sediment concentration distribution was compared with the field data. The results showed that the GBR model, with a coefficient of determination (R²) of 0.95, performed better than the RFR model, with R² = 0.9, in predicting the longitudinal dispersion coefficients in a natural stream in both training and testing stages. However, the RFR model with R² = 0.94 performed better than the GBR (R² = 0.91) in predicting the transverse dispersion in the testing stage. Both models underestimated the dispersion coefficients in the training and testing stages. Comparison between the PTM with ensemble dispersion coefficients and empirical-based dispersion relationships revealed the better performance of the GBR model compared to the other two methods.

Gene expression models

Chapter

Jan 2023
