Fig 7 - uploaded by Md Galal Uddin
Results of confusion matrices obtained from the tested four prediction models for the multi-class classification of water quality in Cork Harbour.

Source publication
Article
Full-text available
Existing water quality index (WQI) models assess water quality using a range of classification schemes. Consequently, different methods provide a number of interpretations for the same water properties that contribute to a considerable amount of uncertainty in the correct classification of water quality. The aims of this study were to evaluate the...

Contexts in source publication

Context 1
... A common comparison matrix was computed in order to assess the overall discrimination capacity of multiple random variables for classification into four groups. Fig. 7 provides the confusion matrix for four predictive classifiers. (ii) Once the comparison matrix was obtained, it was used for the direct comparison of the abilities of different variables for the determination of how many classes were classified correctly (e.g., water quality as "good," "fair," "marginal" or "poor"). (iii) Finally, the ...
Context 2
... compare the performance of the four machine learning classifiers in order to identify the best algorithms in terms of correct classification. The results of the classifiers were evaluated on the imbalanced dataset using five validation metrics (accuracy, precision, sensitivity, specificity, and F1 score) together with the confusion matrices. Fig. 7 shows the confusion matrices for the four predictive classifier models. In this analysis, 10,000 observations belonging to four classes ("good", "fair", "marginal", and "poor") were used to predict the classification. ...
Context 3
... shown in Fig. 7(a), good water quality is classified 99.2% correctly, whereas 0.8% is classified incorrectly. In contrast, the fair water quality class is correctly classified at 92.4% and wrongly classified at 7.6%. Marginal water quality is correctly classified at 89.6% and wrongly classified at 10.4%. Similarly, for poor ...
Context 4
... In contrast, the fair water quality class is correctly classified at 92.4% and wrongly classified at 7.6%. Marginal water quality is correctly classified at 89.6% and wrongly classified at 10.4%. Similarly, for poor water quality, 52.0% of observations are correctly classified, whereas 48.0% are incorrectly classified (Fig. ...
Context 5
... results of KNN show that four water quality classes are 100% correctly classified. There was no prediction error in the classification (Fig. 7b). That means the KNN model had an overfitting problem, which may be due to the imbalanced dataset (Japkowicz, ...
Context 6
... the SVM classifier, an average of 95% of the observations are classified correctly for all water quality classes except for poor water quality (Fig. 7c). Only 78.5% of the observations were correctly classified into the poor class, whereas the remaining observations were classified ...
Context 7
... XGBoost classified water quality 99.5% correctly for all water quality classes in Cork Harbour (Fig. ...

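The contexts above report per-class percentages of correct classification and name five validation metrics (accuracy, precision, sensitivity, specificity, and F1 score). As a minimal sketch of how such figures can be read off a multi-class confusion matrix, the Python below treats each class one-vs-rest. The matrix counts, the names `LABELS`, `CM`, `per_class_metrics`, and `overall_accuracy`, and the one-vs-rest convention itself are assumptions made for illustration; they are not the values in Fig. 7 and not necessarily the procedure the authors followed.

```python
# Hypothetical 4-class confusion matrix (rows = true class, columns = predicted class).
# The counts are invented for illustration and are NOT taken from Fig. 7.
LABELS = ["good", "fair", "marginal", "poor"]
CM = [
    [50, 0, 0, 0],   # true "good"
    [2, 45, 3, 0],   # true "fair"
    [0, 4, 40, 1],   # true "marginal"
    [0, 0, 3, 2],    # true "poor" (deliberately the minority class)
]


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def overall_accuracy(cm):
    """Share of all observations that fall on the diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return safe_div(correct, total)


def per_class_metrics(cm, labels):
    """One-vs-rest precision, sensitivity, specificity and F1 for each class."""
    total = sum(sum(row) for row in cm)
    results = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]                          # predicted k and truly k
        fn = sum(cm[k]) - tp                   # truly k, predicted something else
        fp = sum(row[k] for row in cm) - tp    # predicted k, truly something else
        tn = total - tp - fn - fp              # everything not involving k
        precision = safe_div(tp, tp + fp)
        sensitivity = safe_div(tp, tp + fn)    # per-class "correctly classified" rate
        specificity = safe_div(tn, tn + fp)
        f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
        results[name] = {
            "precision": precision,
            "sensitivity": sensitivity,
            "specificity": specificity,
            "f1": f1,
        }
    return results


if __name__ == "__main__":
    print(f"overall accuracy: {overall_accuracy(CM):.3f}")
    for name, m in per_class_metrics(CM, LABELS).items():
        print(name, {key: round(value, 3) for key, value in m.items()})
```

In this made-up matrix the minority "poor" class has a much lower sensitivity than the other classes even though overall accuracy stays high, which is one reason per-class figures are usually inspected alongside accuracy when a dataset is imbalanced.
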
Similar publications

Preprint
Full-text available
Flow cytometry is a commonly used diagnostic technique for haematological malignancies. The gold standard method for analysis of flow cytometry data is manual gating, which is time consuming and requires a highly skilled operator, generating a bottleneck in the workflow and potentially increasing time to diagnose malignancy. For nearly 20 years att...

Citations

... Another study recommended using the deep Bidirectional Stacked Simple Recurrent Unit [14] to create a precise WQ prediction strategy in intelligent mariculture. In the current study by Uddin et al. [15], four machine-learning classifier algorithms, namely support vector machines (SVM), Naive Bayes (NB), random forest (RF), k-nearest neighbor (KNN), and gradient boosting (XGBoost), were used to determine the best classifier for predicting water quality classes using the seven WQI models that are frequently used. Solanki et al. [16] have used a deep learning network model to assess and forecast the chemical eigenvalues of water, particularly oxygen levels, and pH. ...
Preprint
Full-text available
    Availability of water is one of the most important aspects of Earth’s status as the only planet capable of supporting life. Although water makes up 70% of the earth’s surface, the availability of drinkable water is extremely limited. Water makes up about 70% of the human body and aids in the healthy functioning of the human body. Contaminated water can have a pernicious effect on the human body, thus it’s important to find a safe drinking water source. Five machine learning algorithms were explored to estimate the potability of water in this study. Three regression algorithms are applied to estimate the missing values in this study. Among the implemented, a Deep Neural Network (DNN) model achieves a better accuracy of 66.1%, with precision, recall, and AUC scores of 61.2%, 35.8%, and 67%, respectively which is comparable with the present state-of-the-art. The Support Vector Machine (SVM) applied has achieved the highest precision and the lowest recall, despite having the second-highest accuracy of 65.1% in this study. AdaBoost (ADB) achieves the highest recall of 44.1%, as well as the highest AUC score of 74.5%. In addition, a local explanation artificial algorithm called LIME is applied to explain why a certain sample of water is potable.
    ... In summary, the discussion effectively contextualizes the research within the broader water resource management landscape, and highlights the importance of AI and ML models in addressing water-related challenges. The validity and reliability of the applied methodologies are enhanced by aligning the study's findings with existing research (Kushwaha et al. 2023;Mohammed et al. 2023;Sharma and Machiwal 2021;Uddin et al. 2023). ...
    Article
    Full-text available
    Today, humanity has managed to overcome many challenges related to water. One of the most significant challenges concerning surface water resources is their preservation and maintenance. The investigation of the variations of Miqan Lake, which is located in the Markazi province of Iran, is the focus of this study. Given its vicinity to the city of Arak and considering the consecutive years of droughts in Iran, this study examines the fluctuations in Miqan Lake water level. We utilized four artificial intelligence models to study the trend of lake changes. These models include an essential machine learning model, namely a single-layer feed-forward neural network, which is combined with three evolutionary algorithms: particle swarm optimization, genetic algorithm, and imperialist competitive algorithm. Subsequently, in addition to the SLFFN model, three new hybrid evolutionary models were developed to address the modeling of changes and fluctuations related to water in the Miqan lake. This research's initial data corresponds to monthly samples collected from Miqan Lake for over 15 years. To examine the mutual effects of modeling and potential errors, besides regression receiver operation characteristic (RROC) analysis, a probability density function (PDF) analysis was also conducted. The results showed that PSELM had the lowest error in estimating lake water level fluctuations, obtaining the best performance in the evaluation indicators: RMSE, MAPE, and SI of 0.59, 0.028, and 0.018, respectively. The best match in actual and estimated values also belonged to ICELM. In the analysis of RROC, ICELM was superior, and the results of PDF analysis confirmed the ICELM superiority. Ultimately, these results indicate a noticeable reduction in the water level of Miqan Lake, leading to various risks such as pollution of freshwater resources and the onslaught of dust storms in the city of Arak.
    ... It directly affects people's health and primary living conditions, as well as the maintenance of ecosystems and food production in farming. Providing that sanitation and drinking water are accessible and sustainably managed for everyone is one of the UN's Sustainable Development policy objectives [1,2]. Rivers County, lakes, a transitional waterway, and waters off the coast up to one nautical kilometre from shore are examples of surface and groundwater bodies that have been committed to achieving good qualitative and quantitative status under the EU's Water Framework Directive [3], which was established to establish coordinated efforts in water policy. ...
    ... The current research utilised information from the 2019 [1] water quality monitoring program for its objectives. The Irish Environmental Protection Agency (EPA) usually keeps an eye on 32 observation points to check the water condition of the harbour. ...
    Article
    Full-text available
    With the introduction of 6G connections and developments in the field of drones, there is an increasing chance to change monitoring of the surroundings, especially in the position of water quality evaluation. Previous water quality index algorithms evaluate water quality utilising a variety of methods of categorisation. Various methodologies offer different views of water attributes, leading to difficulty in determining an accurate evaluation of water quality. This examination specifies a novel technique for dynamic underwater identification based on the capability of drones working in a 6G network. The suggested method applies the you only look once (YOLO) object detection algorithm to high-resolution drone photos, allowing for real-time recognition and evaluation of water quality parameters. Initially, the drone with high-resolution cameras captured high-quality images of the water surrounding. The drone transmits the data with the help of 6G networks and stores it in cloud environments. Next, the YOLO with CNN method is used to recognise and monitor the different water qualities, such as pollutants, algae blooms, and debris, dynamically. The deployment of drones with YOLO-CNN efficiently monitors the water environment. The results of the study demonstrated that, when it comes to accurate categorisation, the YOLO method may be a valuable and trustworthy tool for evaluating the quality of coastal waters. As a result, the YOLO with CNN model achieves 91% accuracy in the prediction of water quality.
    ... The study [15] employed four machine learning classifier algorithms (support vector machines, Naïve Bayes, random forest, k-nearest neighbor, and gradient boosting) to identify the best classifier for predicting water quality classes using seven widely used WQI models and three new models developed by the authors. The results revealed that the k-nearest neighbor (KNN) and XGBoost algorithms excelled, achieving 100% and 99.9% correct classifications, respectively. ...
    Article
    Full-text available
    This study aims to forecast water quality in the Tumkur district, Karnataka state, India, to increase pollution levels. Various machine learning techniques, including support vector machines, regression trees, linear regression, and neural networks, are employed. The Water Quality Index (WQI) is determined using parameters such as total hardness, pH, alkalinity, turbidity, chloride, dissolved solids, and conductivity. The dataset is split into training and testing sets (80:20) to assess model performance. Support Vector Machines and Linear Regression outperform other models, achieving R2 values of 0.96 and 0.99 for training and testing, respectively. This research underscores the importance of advanced machine learning techniques for accurate water quality prediction, crucial for effective pollution reduction strategies in the region.
    ... Researchers are continuously investigating novel algorithms and techniques to improve the accuracy and resilience of machine learning models (Ahmed et al., 2024;Khoi et al., 2022). Furthermore, they are actively developing approaches to transfer this new technique across various regions and water bodies, enabling the utilization of these advanced techniques for better outcomes (Uddin et al., 2023). ...
    Article
    Full-text available
    The Water Quality Index (WQI) is a primary metric used to evaluate and categorize surface water quality which plays a crucial role in the management of fresh water resources. Machine Learning (ML) modeling offers potential insights into water quality index prediction. This study employed advanced ML models to get potential insights into the prediction of water quality index for the Aik-Stream, an industrially polluted natural water resource in Pakistan with 19 input water quality variables aligning them with surrounding land use and anthropogenic activities. Six machine learning algorithms, i.e. Adaptive Boosting (AdaBoost), K-Nearest Neighbors (K-NN), Gradient Boosting (GB), Random Forests (RF), Support Vector Regression (SVR), and Bayesian Regression (BR) were employed as benchmark models to predict the Water Quality Index (WQI) values of the polluted stream to achieve our objectives. For model calibration, 80% of the dataset was reserved for training, while 20% was set aside for testing. In our comparative analyses of predictive models for water quality index, the Gradient Boost (GB) model stood out the fittest for its precision, utilizing a combination of just seven parameters (chemical oxygen demand, total organic carbon, oil & grease, Ammonia- nitrogen, arsenic, nickel and zinc), surpassing other models by achieving better results in both training (R2 = 0.88, RMSE = 7.24) and testing (R2 = 0.85, RMSE = 8.67). Analyzing feature importance showed that all the selected variables, except for NO3 N, TDS and temperature had an impact on the accuracy of the models predictions. It is concluded that the application of machine learning to assess water quality in polluted environments enhances accuracy and facilitates real-time tracking, enabling proactive risk mitigations.
    ... Other than NSF-WQI and CCME-WQI, there exist several WQI models that are briefly discussed in the supplementary file (S1). The process of evaluating WQI sometimes becomes cumbersome, as it requires the determination of several WQPs by laboratory testing and analysis of bulky datasets [25]. Therefore, the current advancement in WQI modeling includes machine learning to improve and optimize WQI models in terms of parameter selection, weight evaluation and selecting a better aggregation function (Uddin et al. [26]; Uddin et al. [27,28]). Several researchers focused on assessing the water quality with statistical analysis, especially Principal Component Analysis (PCA), to reduce the dataset and to find critical WQPs. ...
    ... The improved WQI model, applied to Cork Harbour, Ireland, selects indicators based on importance to water quality, employs objective ranking and weighting methods, and suggests optimal aggregation functions. In Uddin et al.'s (2023b) study, the aims were to evaluate WQI model performance, correct classification using machine learning, and introduce a new coastal water quality assessment scheme. The study's goals were achieved by archiving WQI scores for coastal water quality, employing four predictive classifier models, identifying the best model, and evaluating performance using machine learning metrics. ...
    Article
    Full-text available
    A common technique for assessing the overall water quality state of surface water and groundwater systems globally is the water quality index (WQI) method. The aim of the research is to use four machine learning classifier algorithms: Gradient boosting, Naive Bayes, Random Forest, and K-Nearest Neighbour to determine which model was most effective at forecasting the various water quality index and classes of the Albanian Shkumbini River. The analysis was performed on the data collected during a 4-year period, in six monitoring points, for nine parameters. The predictive accuracy of the models, XGBoost, Random Forest, K-Nearest Neighbour, and Naive Bayes, was determined to be 98.61%, 94.44%, 91.22%, and 94.45%, respectively. Notably, the XGBoost algorithm demonstrated superior performance in terms of F1 score, sensitivity, and prediction accuracy, the lowest errors during both learning (RMSE = 2.1, MSE = 9.8, MAE = 1.13) and evaluating (RMSE = 0.0, MSE = 0.01, MAE = 0.01) stages. The findings highlighted that Biochemical oxygen demand (BOD), Bicarbonate (HCO3), and Total Phosphor had the most positive impact on the Shkumbini River’s water quality. Additionally, a statistically significant, strong positive correlation (r = 0.85) was identified between BOD and WQI, emphasizing its crucial role in influencing water quality in the Shkumbini River.
    ... Quality of beaches is mainly assessed by indices that describe and evaluate the beach. Indices developed to assess beach quality often emphasize water quality (El-Sorogy et al. 2023;Uddin et al. 2023) or address specific management issues, such beach litter (Schattschneider et al. 2020;Scarrica et al. 2022), sea level rise (Revell et al. 2021) or beach erosion (Andreadis et al. 2021). Nevertheless, some indices propose an integrative approach to assess beach quality holistically, considering the physical characteristics, the environmental quality and human aspects, such as socioeconomic and recreation (Ariza et al. 2010;Gine et al. 2018;Bombana and Ariza 2019;Diniz et al. 2022;Li et al. 2023). ...
    Article
    Full-text available
    Beach rankings are very frequent on the internet; however, the information provided on how these rankings are made is often unclear and their content is mostly subjective. In addition, the vast majority of these rankings do not take into account the fact that beaches are coastal eco-systems. The aim of the research was to develop an objective framework to rank the quality of beaches worldwide. The framework integrates indicators to assess the socioecological system quality and can be used as a basis for effective beach management. The methodology involved the collection, evaluation and grouping of indicators into domains and categories. Moreover, a measurement technique and a 5-point rating score for each indicator was used. Weights were calculated for different beach types using an analytical hierarchical process and the methodology was validated by a focus group of beach management experts. The quality value of each beach was calculated through equations and the results were presented in graphs inspired by the Circles of Sustainability and the Ocean Health Index. The theoretical application was tested on Portuguese beaches. The framework presents a holistic assessment of four domains: Recreation, Protection, Conservation and Sanitary. The resulting Beach Ranking Framework (BRF) is an objective, holistic framework designed to communicate with society, unlike the existing beach quality assessments.
    ... However, the existing WQI model produces considerable uncertainty when converting a large number of water quality index data into numerical form (Gupta and Gupta 2021;Uddin et al. 2023c). To address model uncertainties, several recent studies have employed cutting-edge machine learning (ML) and artificial intelligence (AI) techniques (Ding et al. 2023;Uddin et al. 2023d), but the approach is suitable for assessing coastal and transitional water quality (Uddin et al. 2022a). And WQI models are often developed according to site-specific guidelines for specific regions and are not generic (Uddin et al. 2021). ...
    Article
    Full-text available
    The more water quality evaluation indicators, the greater the amount of water quality evaluation calculation. Under the requirements of evaluation accuracy, the index screening method is usually used to optimize water quality evaluation index to reduce the calculation amount of water quality evaluation. Taking Mengcheng Gate of Guo River as an example, the information sensitivity index screening method was used to simplify the water quality evaluation index system from 17 to 12 indicators. The variable fuzzy comprehensive evaluation method was used to compare and evaluate the original index system and the optimal index system. The results showed that the water quality results of the optimal index system are consistent with the original index evaluation results. And the water quality of Mengcheng Gate is basically stable at class I level. The information sensitivity method reduced the number of indicators by 29.41%. The error between the water quality evaluation results based on the optimal indicators and the original indicators is less than 10%. The maximum error, minimum error, and average error are 8.92%, 0.08%, and 2.46%, respectively. It revealed that the information sensitivity method can eliminate the information redundancy while retaining the information of the original index system. It can also reduce the calculation amount of water quality evaluation in the future. The variable fuzzy comprehensive evaluation method can reasonably reflect the complex nonlinear relationship between water quality index and water quality category. This accuracy has practical significance for guiding the optimization of water quality evaluation index system, improving the efficiency of water quality evaluation. This model provides a scientific basis for indicator selection methods in similar river water quality evaluations.
    ... Through the systematic selection of hyperparameter configurations using an acquisition function, Bayesian Optimization refines model hyperparameters in a principled manner, resulting in improved predictive capabilities and broader generalization. Recently several water research studies used this technique for enhancing the predictive capabilities of model (Candelieri et al., 2018; Moeini et al., 2023; Uddin et al., 2023c; Yan et al., 2023). The research utilized this technique for the optimization of the best-set of hyperparameters in approaching the methodology of Uddin et al. (2023c). Machine learning algorithms. For the development of the ML model(s), the study encompasses various ML algorithms including Ensembles of trees (EN), Gaussian process regression (GPR), Kernel approximation linear regression (KAR), Linear regression (LR), Neural Networks (NN), Regression trees (RT), ...
    Article
    Full-text available
    The Rooppur Nuclear Power Plant (RNPP) at Ishwardi, Bangladesh is planning to go into operation within 2024 and therefore, adjacent areas of RNPP is gaining adequate attention from the scientific community for environmental monitoring purposes especially for water resources management. However, there is a substantial lack of literature as well as environmental datasets for earlier years since very little was done at the beginning of the RNPP's construction phase. Therefore, this study was conducted to assess the potential toxic elements (PTEs) contamination in the groundwater and its associated health risk for residents at the adjacent part of the RNPP during the year of 2014–2015. For the purposes of achieving the aim of the study, groundwater samples were collected seasonally (dry and wet season) from nine sampling sites and afterwards analyzed for water quality indicators such as temperature (Temp.), pH, electrical conductivity (EC), total dissolved solid (TDS), total hardness (TH) and for PTEs including Iron (Fe), Manganese (Mn), Copper (Cu), Lead (Pb), Chromium (Cr), Cadmium (Cd) and Arsenic (As). This study adopted the newly developed Root Mean Square water quality index (RMS-WQI) model to assess the scenario of contamination from PTEs in groundwater whereas the human health risk assessment model was utilized to quantify the risk of toxicity from PTEs. In most of the sampling sites, PTEs concentration was found higher during the wet season than the dry season and Fe, Mn, Cd and As exceeded the guideline limit for drinking water. The RMS score mostly classified the groundwater in terms of PTEs contamination into “Fair” condition. The non-carcinogenic risks (expressed as Hazard Index-HI) revealed that around 44% and 89% of samples for adults and 67% and 100% of samples for children exceeded the threshold limit set by USEPA (HI > 1) and possessed risks through the oral pathway during dry and wet season, respectively. 
Furthermore, the calculated cumulative HI score was found to be higher for children than for adults throughout the study period. In terms of carcinogenic risk (CR) from PTEs, the magnitude of risk decreased following the pattern of Cr > As > Cd. Although the current study is based on an old dataset, the findings might serve as a baseline for monitoring purposes to reduce future hazardous impact from the power plant.