
Integrating artificial intelligence, machine learning, and deep learning approaches into remediation of contaminated sites: A review


Abstract

The growing number of contaminated sites across the world poses a considerable threat to the environment and human health. Remediating such sites is a cumbersome process, with the complexity originating from the need for extensive sampling and testing during site characterization. Selection and design of remediation technology are further complicated by uncertainties surrounding contaminant attributes and concentrations, as well as soil and groundwater properties, which influence remediation efficiency. Additionally, challenges emerge in identifying contamination sources and monitoring the affected area. Often, these problems are oversimplified, and the data gathered are underutilized, rendering the remediation process inefficient. The potential of artificial intelligence (AI), machine learning (ML), and deep learning (DL) to address these issues is noteworthy, as their emergence has revolutionized data management and analysis. Researchers across the world are increasingly leveraging AI/ML/DL to address remediation challenges. The current study aims to perform a comprehensive literature review on the integration of AI/ML/DL tools into contaminated site remediation. A brief introduction to various emerging and existing AI/ML/DL technologies is presented, followed by a comprehensive literature review. In essence, ML/DL-based predictive models can facilitate a thorough understanding of contamination patterns, reducing the need for extensive soil and groundwater sampling. Additionally, AI/ML/DL algorithms can play a pivotal role in identifying optimal remediation strategies by analyzing historical data, simulating scenarios through surrogate models, optimizing parameters using nature-inspired algorithms, and enhancing decision-making with AI-based tools. Overall, with supportive measures such as open-data policies and data integration, AI/ML/DL have the potential to revolutionize the practice of contaminated site remediation.
Chemosphere 345 (2023) 140476
Available online 20 October 2023
0045-6535/© 2023 Elsevier Ltd. All rights reserved.
Jagadeesh Kumar Janga (a), Krishna R. Reddy (a, *), K.V.N.S. Raviteja (b)
(a) University of Illinois Chicago, Department of Civil, Materials, and Environmental Engineering, 842 West Taylor Street, Chicago, IL 60607, USA
(b) SRM University AP, Department of Civil Engineering, Guntur, Andhra Pradesh 522503, India
HIGHLIGHTS
• A comprehensive review of AI/ML/DL techniques in site remediation is performed.
• Bibliometric analysis showed increasing interest in this topic across the world.
• ML-based predictive models can be used for spatial contamination prediction.
• Predictive data-driven models can serve as surrogates for complicated physical models.
• AI enables effective parameter optimization for efficient remediation design.
ARTICLE INFO
Handling Editor: Dr Y Yeomin Yoon
Keywords:
Environmental remediation
Big data
Surrogate models
Data-driven approach
Optimization
Decision-making
* Corresponding author.
E-mail addresses: jreddy3@uic.edu (J.K. Janga), kreddy@uic.edu (K.R. Reddy), raviteja.k@srmap.edu.in (K.V.N.S. Raviteja).
Contents lists available at ScienceDirect
Chemosphere
journal homepage: www.elsevier.com/locate/chemosphere
https://doi.org/10.1016/j.chemosphere.2023.140476
Received 21 August 2023; Received in revised form 15 October 2023; Accepted 16 October 2023
1. Introduction
Articial Intelligence/Machine Learning/Deep Learning (AI/ML/
DL) technologies have emerged as powerful tools with the potential to
revolutionize various elds, including environmental sciences and en-
gineering (Zhong et al., 2021). This study focuses on their application in
the specialized domain of contaminated site remediation. Contaminated
sites are a growing concern in developing countries, while developed
nations have been grappling with this issue for years. Sources such as
waste dumps, chemical spills, and agricultural practices contribute to
soil and groundwater contamination, leading to ecological and health
risks (Sharma and Reddy, 2004). Over the past few decades, many
contaminated sites have been identied in numerous locations across
the world that are posing a problem to the earth and the environment
ever since (Singh and Naidu, 2012). The presence of these contaminated
sites gives rise to several issues, including the contamination of drinking
water sources, the pollution of groundwater that can further contami-
nate surface waters, and the overall exposure of humans and ecosystems
to associated risks (Khan et al., 2004). Soil and groundwater pollution
poses a signicant challenge, that is comparable to air and surface water
pollution (Sharma and Reddy, 2004). To tackle this problem, various
public bodies such as the United States Environmental Protection
Agency (USEPA), Central Pollution Control Board of India (CPCB),
among many others are collaborating with researchers to develop
effective solutions to remediate such sites.
The process of cleaning up contaminated sites involves several key steps: site characterization, risk assessment, devising remedial goals and comparing alternatives, remediation design, and monitoring (Sharma and Reddy, 2004). Initially, site characterization is conducted to understand the nature and extent of contamination, followed by a thorough risk assessment to evaluate the potential harm to human health and the environment. Subsequently, if the contaminated site poses an unacceptable risk to human health or the surrounding environment, suitable remediation techniques need to be identified and employed to address the contamination effectively.
Contaminated site remediation projects pose several challenges. Firstly, these projects are inherently costly and complicated due to the variability of sites and contaminant types (Lehr et al., 2002). Because of this inherent complexity, a significant amount of data is generated throughout the remediation process. One of the most challenging tasks is site characterization, which involves collecting extensive data on site geology, soil stratigraphy, hydraulic properties, groundwater levels, contaminant types and concentrations, and dynamic biogeochemical parameters (Laha et al., 2000; Tao et al., 2022). Groundwater flow and contaminant transport modeling further contribute to the generation of large datasets, which can be time-consuming to interpret.
Contaminated site remediation requires numerous laboratory and field tests to assess the behavior of materials and to understand subsurface conditions. These tests are typically descriptive, time-consuming, expensive, and labor-intensive. Further, a large number of variables, such as contaminant chemistry, fate and transport, geology, and hydrogeology, must be studied. These properties are associated with a wide range of variability (Baecher, 2023), which demands numerous tests over large areas as well as repeated testing to ensure accuracy. Another challenge lies in the risk assessment phase, where multiple receptors, exposure pathways, and risks associated with various contaminants must be considered. The selection of the most suitable technology and the optimization of system variables are additional hurdles. Once the remediation is implemented, continuous monitoring through soil and groundwater sampling generates vast amounts of data. Unfortunately, these data are often underutilized, and problems are oversimplified with numerous assumptions. Moreover, processing and analyzing these large datasets requires substantial human resources, making the process time-consuming and expensive. As a result, failing to consider all necessary factors can lead to ineffective remediation efforts.
AI/ML/DL technologies can be harnessed to address the crucial challenges listed above, as they can assist in different aspects of the remediation process. For instance, they can be used to analyze and interpret the large amounts of data collected during site characterization and modeling, enabling a more comprehensive understanding of contamination patterns and sources (Yaseen, 2021; Zhang et al., 2023). These technologies can also reduce the need for soil and groundwater sampling and subsequent laboratory characterization (Hanoon et al., 2021). Additionally, AI/ML/DL algorithms can support the risk assessment process by integrating multiple data sources and predicting the potential impacts of contaminants on human health and ecosystems (Li et al., 2022b). Furthermore, these technologies can assist in identifying the most suitable and efficient remediation techniques by analyzing historical data, conducting simulations through surrogate models, and optimizing the decision-making process (Li et al., 2022a). Proper utilization of these technologies thus enables more accurate site characterization, more comprehensive risk assessment, identification of the best remediation strategies, and optimization of the remediation process. This can lead to more effective cleanup efforts, minimize the risks associated with exposure to contaminants, and improve the overall efficiency of remediation projects. Ultimately, AI/ML/DL technologies can play an increasingly important role in addressing the challenges posed by contaminated site remediation, leading to more effective cleanup efforts and improved environmental outcomes.
A significant amount of research has been carried out on integrating these technologies into various phases of contaminated site remediation. Zhang et al. (2023) performed an extensive literature review on the use of ML-based models for spatial prediction of contamination patterns. Yaseen (2021) provided a brief account of the literature on the use of ML models to simulate the adsorption of heavy metals in soil and water bodies. In their review of advances in the control and abatement of soil heavy metal pollution, Gautam et al. (2023) emphasized the benefits of using AI in this process. In addition, researchers have extensively studied the use of AI in groundwater quality and flow modeling (Asher et al., 2015; Hanoon et al., 2021). However, to the authors' knowledge, no review has been performed to account for and summarize the advances in integrating AI into the contaminated site remediation process as a whole. Hence, the current study aims to perform a comprehensive review of the literature on utilizing AI/ML/DL-based technologies during various stages of contaminated site remediation. First, various emerging and existing AI/ML/DL-based technologies are briefly explained. This is followed by a comprehensive review of published studies that apply these techniques in site remediation, and recommendations are made to direct future research and enhance remediation efforts.
2. Articial intelligence
Emerging technologies such as AI, ML, and DL have the potential to make site remediation less expensive by reducing human effort and the need for rigorous sampling and monitoring, while also increasing remediation efficiency (Raviteja and Reddy, 2023). These technologies can be applied in engineering optimization, which otherwise requires a large number of field and laboratory tests, numerical and physical modeling, and analysis of the corresponding data to determine optimal parameters. Fig. 1 shows the evolution of the overall domain of AI, in which ML is a subset of AI and DL is a subset of ML. AI emerged as an academic discipline in the early 1950s, focusing mostly on rule-based systems built on knowledge representation for decision-making (Lu, 2019). ML was later developed in the late 1980s, specifically for optimization tasks in which algorithms learn from data to improve prediction accuracy and decision-making capacity. DL, which uses neural networks with multiple hidden layers, was developed in the early 2010s as a newer subset of ML to enhance the ability of neural networks to understand and process complex datasets (Lu, 2019).
AI imitates, or even surpasses, human intelligence through specialized hardware and software, enabling the development and utilization of computational systems capable of discovery, inference, and prediction. These systems find applications in various domains, including computer vision, natural language processing, and data science. AI extends beyond mimicking human intellect. As shown in Fig. 2, AI encompasses a wide variety of systems, including but not limited to various ML algorithms, nature-inspired optimization algorithms (NIO), knowledge-based systems, symbolic AI, computer vision, motion capture, and natural language processing. Although there are numerous techniques beyond NIO and ML algorithms, from the perspective of contaminated site remediation most research has focused on the following aspects: optimization of remediation design using different optimization algorithms, use of rule-based systems for decision-making based on historical data, and predictive modeling of contaminated subsurface conditions using different types of ML and DL algorithms. Hence, these components of AI are briefly described in the subsequent sections.
2.1. Nature-inspired optimization algorithms
Optimization of design parameters is a key step in any engineering practice. Similarly, in contaminated site remediation, the goal of designing a remedial strategy is to optimize the design parameters in a way that reduces environmental and economic costs while providing maximum remediation efficiency. Because most site remediation problems involve non-linear and heterogeneous datasets, conventional simplex or gradient-based optimization algorithms may not be adequate to solve such complex optimization problems. Over the years, several optimization algorithms inspired by nature (Fig. 2), drawing on the principles of physical and biological patterns observed in various natural systems, have been created to tackle complex real-life optimization challenges (Yang, 2014).
One such algorithm is the genetic algorithm (GA), an optimization technique rooted in natural selection that mirrors the biological evolution process as described by Darwin's theory of evolution. It can effectively solve both constrained and unconstrained optimization problems (Gen and Cheng, 1999). The algorithm iteratively refines a population of individual solutions. In each cycle, certain individuals from the current population are chosen as parents to generate the offspring for the next iteration. Over successive generations, the population gradually advances toward an optimal solution.
Inspired by natural selection and genetic inheritance in living organisms, genetic algorithms find optimal solutions to complex problems by mimicking the process of evolution through genetic operators such as selection, crossover, and mutation (Gen and Cheng, 1999). A population of possible solutions is randomly generated, and each solution is evaluated for its fitness based on a predefined objective function. The fittest individuals are then chosen to reproduce and generate a new generation of possible solutions, which undergo crossover and mutation to introduce genetic diversity. This process continues until a satisfactory solution is attained or a predefined stopping criterion is met. Similarly, other evolutionary algorithms, such as differential evolution, are also based on Darwin's theory of evolution and offer further flexibility in operator selection and strategy design (Yang, 2014).
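As a rough illustration of the selection, crossover, and mutation loop described above, the following Python sketch implements a minimal real-coded GA with tournament selection, blend crossover, Gaussian mutation, and elitism. These operator choices, the parameter values, and the hypothetical two-variable quadratic cost (standing in for, say, treatment cost as a function of two design variables) are assumptions made for illustration and are not taken from the cited studies.

```python
import random


def cost(x):
    # Hypothetical objective to minimize: a quadratic bowl whose optimum is at (3, -1).
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


def genetic_algorithm(obj, bounds, pop_size=40, generations=100,
                      crossover_rate=0.9, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Randomly generate the initial population inside the variable bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]

    def tournament():
        # Selection: the fitter (lower-cost) of two random individuals becomes a parent.
        a, b = rng.sample(pop, 2)
        return a if obj(a) < obj(b) else b

    for _ in range(generations):
        # Elitism: the best individual survives unchanged into the next generation.
        children = [min(pop, key=obj)[:]]
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                # Crossover: blend the parents' genes with a random weight.
                w = rng.random()
                child = [w * g1 + (1 - w) * g2 for g1, g2 in zip(p1, p2)]
            else:
                child = p1[:]
            # Mutation: occasional Gaussian perturbation, clipped to the bounds.
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] = min(hi, max(lo, child[i] + rng.gauss(0, 0.1 * (hi - lo))))
            children.append(child)
        pop = children
    return min(pop, key=obj)


best = genetic_algorithm(cost, bounds=[(-10, 10), (-10, 10)])
print("best design:", best, "cost:", cost(best))
```

In a real application, the objective would come from a site-specific cost or performance model, and the population size, number of generations, and mutation rate would need tuning.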
Other nature-inspired metaheuristics, as shown in Fig. 2, include simulated annealing based on the metal annealing process (Kirkpatrick et al., 1983), artificial immune systems (AIS) based on the vertebrate immune system (Farmer et al., 1986), wind driven optimization (WDO) based on the movement of air parcels in Earth's atmosphere (Bayraktar et al., 2010), and harmony search, a music-inspired algorithm (Geem et al., 2001). Some commonly used swarm intelligence-based optimization algorithms, which are inspired by the social interactions of natural swarms of species, include particle swarm optimization (PSO) (Kennedy and Eberhart, 1995), ant colony optimization (ACO) based on the behavior of social ants (Dorigo et al., 2006), artificial bee colony (ABC) based on the behavior of bees in a colony (Karaboga, 2005), and grey wolf optimization (GWO) based on the leadership hierarchy and hunting mechanism of grey wolves (Mirjalili et al., 2014).
Fig. 1. Evolution of artificial intelligence (AI), machine learning (ML), and deep learning (DL) technologies.
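To show how a swarm-based method differs from a GA, the sketch below implements a basic global-best PSO in Python with NumPy: each particle's velocity combines its previous velocity (inertia, w), attraction to its own best position (c1), and attraction to the swarm's best position (c2). The inertia and acceleration values are common textbook settings, assumed here, and Himmelblau's function, a standard multimodal test function, stands in for a non-linear remediation design objective.

```python
import numpy as np


def himmelblau(p):
    # Standard multimodal test function; its four global minima all have value 0.
    x, y = p
    return (x ** 2 + y - 11) ** 2 + (x + y ** 2 - 7) ** 2


def pso(obj, bounds, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    x = rng.uniform(lo, hi, size=(n_particles, len(bounds)))    # particle positions
    v = np.zeros_like(x)                                        # particle velocities
    pbest, pbest_f = x.copy(), np.apply_along_axis(obj, 1, x)   # personal bests
    g = pbest[pbest_f.argmin()].copy()                          # swarm (global) best
    for _ in range(iters):
        r1, r2 = rng.random((2,) + x.shape)
        # Velocity = inertia + attraction to personal best + attraction to swarm best.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.apply_along_axis(obj, 1, x)
        # Update personal bests where the new position is better, then the swarm best.
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[pbest_f.argmin()].copy()
    return g, pbest_f.min()


best_x, best_f = pso(himmelblau, bounds=[(-5, 5), (-5, 5)])
print("best position:", best_x, "objective:", best_f)
```

Unlike the GA, PSO keeps no explicit crossover or mutation; information is shared only through the personal and swarm best positions.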
These nature-inspired algorithms have been used extensively in diverse fields such as engineering, computer science, economics, and biology to solve complex problems that are difficult to tackle using traditional optimization methods (Yang, 2014; Yang and He, 2016; Lambora et al., 2019). They are especially beneficial for problems that involve a large search space, inherent non-linearity, and multiple conflicting objectives, which is often the case in contaminated site remediation.
3. Machine learning
ML, a subset of AI, involves creating mathematical models that make predictions and decisions without being explicitly programmed (Sarker, 2021). ML models are generally trained on training data, and their performance is highly influenced by the completeness of such data. ML involves various approaches, such as supervised, semi-supervised, unsupervised, and reinforcement learning, among others. Each of these approaches has specific applications. Supervised learning approaches are used to solve regression and classification problems (James et al., 2023), whereas unsupervised learning approaches are used for dimensionality reduction, clustering, and association problems (Alloghani et al., 2020). Reinforcement learning is mostly used for problems that require real-time learning (Perera and Kamalaruban, 2021). Numerous algorithms have been developed over the past few decades to implement these techniques. Some of the widely used ML algorithms are presented in Fig. 3.
3.1. Supervised learning
In supervised learning, the machine learns from labeled data provided by the user, which show how inputs are mapped to outcomes. This helps the machine build a model that predicts the outcomes of new inputs based on the trends identified from past examples (training data) (James et al., 2023). For instance, consider three fruits, each with a different color, where the objective is to sort them into groups by type and color. Because the user has the previous experience and memory needed to recognize them, the sorting can be done quickly without any iterations. The variables in this case are known as labeled variables because their features are known. The learning is termed supervised learning because the user can recognize the features and identify the group to which each variable belongs, without iterations.
One of the most basic forms of supervised ML is linear regression, which is a rather simple model. However, using such a simple model for complex datasets introduces considerable bias into the predictions (James et al., 2023). Hence, numerous parametric and non-parametric models were gradually developed to handle complex non-linear data while balancing the bias-variance trade-off, making predictions more accurate and closer to reality. Commonly employed supervised learning models, as presented in Fig. 3, include decision-tree based models such as gradient boosting and random forest, and support vector machines/regressors, which are used for both classification and regression tasks. In contrast, k-nearest neighbors (kNN), Bayesian models such as naïve Bayes, and discriminant analysis models are particularly used for classification tasks. Many of the supervised learning models mentioned above can be utilized for predictive modeling of contaminated subsurface systems when ample labeled data are available, whether obtained through field sampling or generated via numerical modeling (process-based reactive transport models). These models are suitable for both classification and regression tasks in this context.
Fig. 2. An overview of commonly used artificial intelligence (AI) techniques.
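The bias introduced by a simple linear model on non-linear data, and the way a non-parametric model reduces it, can be seen in the short sketch below. It fits ordinary least-squares linear regression and kNN regression (kNN can also be used for regression, not only classification) to synthetic, hypothetical data describing a concentration that decays exponentially with distance from a source, and compares their errors on held-out points. The data-generating function, noise level, and k = 5 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)


def make_data(n):
    # Hypothetical synthetic data: concentration decaying non-linearly with distance
    # from a source (x, arbitrary units), plus Gaussian measurement noise.
    x = rng.uniform(0, 10, n)
    return x, 100 * np.exp(-0.5 * x) + rng.normal(0, 2, n)


def fit_linear(x, y):
    # Ordinary least squares for y = b0 + b1 * x (a simple, high-bias parametric model).
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda xq: coef[0] + coef[1] * np.asarray(xq)


def fit_knn(x, y, k=5):
    # k-nearest-neighbors regression (non-parametric): average the k closest targets.
    def predict(xq):
        xq = np.atleast_1d(np.asarray(xq, dtype=float))
        idx = np.argsort(np.abs(xq[:, None] - x[None, :]), axis=1)[:, :k]
        return y[idx].mean(axis=1)
    return predict


def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))


x_train, y_train = make_data(200)
x_test, y_test = make_data(100)
lin, knn = fit_linear(x_train, y_train), fit_knn(x_train, y_train)
print("linear RMSE:", rmse(lin(x_test), y_test))
print("kNN RMSE:   ", rmse(knn(x_test), y_test))
```

The straight line cannot follow the curvature of the decay, so its error stays large however much data is added, whereas the kNN error is limited mainly by the noise and the neighborhood size.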
3.2. Unsupervised learning
In unsupervised learning, in contrast to supervised learning, the user cannot identify the features of the variables; therefore, the variables are said to be unlabeled. In this case, the sorting is done based on the initial expectation of the user, and the objective can be achieved only after several iterations. Unsupervised learning employs ML algorithms to examine and group unlabeled datasets (Alloghani et al., 2020). These algorithms discover hidden patterns or data clusters without user guidance. This ability to identify similarities and differences in data makes unsupervised learning well suited for tasks such as exploring data patterns, contaminant clustering, and image recognition, which can be used for 3-D delineation of contaminant distribution (Chen et al., 2023). The differences between supervised and unsupervised learning can be understood by comparing them across various criteria (Alloghani et al., 2020). Notably, the accuracy of supervised learning is generally higher than that of unsupervised learning, which indicates that initial user involvement can improve the efficiency of learning algorithms. Unsupervised learning techniques, such as clustering, can be useful in situations where detailed sampling and testing for some contaminants may be expensive, and also for identifying contaminant sources (Tariq et al., 2008). In such cases, areas can be clustered based on similarities in other subsurface conditions and in the relative abundance of other contaminants that can easily be measured. Smarter sampling strategies can then be formulated for the chemicals that are expensive to test, minimizing expenses, and the sources of such contaminants can be identified based on the contaminant clusters. Another unsupervised learning technique, dimensionality reduction, can be applied in situations where the data contain too many variables to comprehend directly.
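The clustering idea described above can be sketched with k-means, one common clustering algorithm (chosen here as an example; the cited studies may use other methods). In the hypothetical example below, each sampling location is described by two inexpensive indicator measurements, and k-means groups the locations by alternating between assigning each location to its nearest cluster center and moving each center to the mean of its members. The synthetic data and the choice of k = 2 are assumptions.

```python
import numpy as np


def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize the cluster centers with k distinct, randomly chosen samples.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: label each sample with its nearest center (Euclidean distance).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Update step: move each center to the mean of its members (keep it if empty).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical data: two cheaply measured indicators at 100 sampling locations,
# drawn from two synthetic groups (e.g. near and far from a suspected source).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([5.0, 1.0], 0.5, (50, 2)),
               rng.normal([1.0, 4.0], 0.5, (50, 2))])
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize so both indicators weigh equally
labels, centers = kmeans(X, k=2)
print("cluster sizes:", np.bincount(labels))
```

The resulting clusters could then guide where the more expensive analyses are targeted; choosing k and the indicator variables would be site-specific decisions.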
3.3. Semi-supervised learning
Semi-supervised learning is a type of ML technique in which an algorithm learns from both labeled and unlabeled data. In this approach, a small amount of labeled data is used to guide the learning process, while a larger amount of unlabeled data is used to improve the accuracy of the model (Zhou, 2021). It is particularly useful in situations where labeled data are difficult or expensive to obtain but unlabeled data are abundant. This approach can be used in different types of applications, including image classification, anomaly detection, and natural language processing. However, the effectiveness of semi-supervised learning depends on the quality and quantity of both the labeled and the unlabeled data. Past studies have indicated the effectiveness of semi-supervised learning for contaminant source identification (Vesselinov et al., 2018).
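One simple semi-supervised technique is graph-based label propagation: labels spread from the few labeled samples to their unlabeled neighbors through a similarity graph, while the known labels are held fixed. The sketch below is a minimal NumPy version with a Gaussian similarity kernel. It is a generic illustration, not the method of Vesselinov et al. (2018), and the kernel width, iteration count, and synthetic data with one labeled sample per group are assumptions.

```python
import numpy as np


def label_propagation(X, y, sigma=1.0, iters=200):
    # y holds a class id for each labeled sample and -1 for each unlabeled sample.
    classes = np.unique(y[y >= 0])
    labeled = y >= 0
    # Gaussian (RBF) similarity between every pair of samples; no self-loops.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    T = W / W.sum(axis=1, keepdims=True)                         # row-normalized propagation matrix
    Y0 = (y[labeled][:, None] == classes[None, :]).astype(float)  # one-hot known labels
    F = np.zeros((len(X), len(classes)))
    F[labeled] = Y0
    for _ in range(iters):
        F = T @ F          # each sample takes a similarity-weighted average of its neighbors
        F[labeled] = Y0    # clamp the known labels after every step
    return classes[F.argmax(axis=1)]


# Hypothetical data: two groups of 50 samples, with only one labeled sample per group.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0.0, 0.0], 0.5, (50, 2)),
               rng.normal([4.0, 4.0], 0.5, (50, 2))])
y = np.full(100, -1)
y[0], y[50] = 0, 1
pred = label_propagation(X, y)
print("label counts per group:", np.bincount(pred[:50]), np.bincount(pred[50:]))
```

Here two labels are enough to classify all 100 samples because the unlabeled data reveal the group structure; with overlapping or poorly sampled groups, the result depends strongly on the quality of the labeled and unlabeled data, as noted above.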
3.4. Reinforcement learning
Reinforcement learning is a subfield of ML that deals with training algorithms to make decisions in dynamic environments (Sutton and Barto, 2018). It involves a user/agent interacting with an environment, receiving feedback in the form of rewards based on its actions, and adjusting its behavior to maximize long-term rewards. Reinforcement learning aims to develop an optimal policy that can guide the agent's actions towards achieving a specific objective. In the context of site remediation, reinforcement learning can be especially useful in cases where operational decisions have to be made based on real-time monitoring data.
Fig. 3. Various commonly used machine learning (ML) algorithms.
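The agent, reward, and policy concepts can be illustrated with tabular Q-learning, one of the basic reinforcement learning algorithms described by Sutton and Barto (2018). The environment below is entirely hypothetical: the state is a discrete contamination level, the agent chooses between idling and treating, treatment carries a cost, and residual contamination is penalized. After training, the greedy policy is read from the learned action values.

```python
import numpy as np

N_LEVELS = 6          # hypothetical contamination levels 0 (clean) to 5
IDLE, TREAT = 0, 1    # hypothetical operational actions


def step(level, action):
    # Hypothetical dynamics: treating lowers the level by one; idling lets it rise by one.
    nxt = max(level - 1, 0) if action == TREAT else min(level + 1, N_LEVELS - 1)
    reward = -1.0 * (action == TREAT) - 2.0 * nxt   # treatment cost + residual-contamination penalty
    return nxt, reward, nxt == 0                      # the episode ends when the site is clean


def q_learning(episodes=2000, alpha=0.1, gamma=0.9, eps=0.1, max_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_LEVELS, 2))                       # Q[state, action] value estimates
    for _ in range(episodes):
        s = int(rng.integers(1, N_LEVELS))            # start from a random contaminated state
        for _ in range(max_steps):
            # Epsilon-greedy: usually exploit the best-known action, sometimes explore.
            a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
            s2, r, done = step(s, a)
            target = r if done else r + gamma * Q[s2].max()
            Q[s, a] += alpha * (target - Q[s, a])     # temporal-difference update
            s = s2
            if done:
                break
    return Q


Q = q_learning()
policy = Q[1:].argmax(axis=1)   # greedy action for each contaminated level 1..5
print("greedy action per level 1-5 (1 = treat):", policy)
```

In a real system, the state would be derived from monitoring data and the reward from site-specific cost and risk criteria, which are far harder to specify than in this toy example.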
3.5. Articial neural networks
Articial neural networks (ANNs) are the algorithms that reect the
structure and function of the human brain, allowing computer programs
to recognize patterns for problem solving (Aggarwal, 2018). They are
composed of interconnected nodes, also known as neurons, which are
organized in layers. Each neuron takes input from other neurons, applies
a transformation to it, and outputs a result that is passed on to other
neurons. All the nodes are connected, and each neuron is associated with
corresponding weight and threshold value. A certain node will be acti-
vated only if the output from the neuron is greater than its threshold
value.
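The thresholded neuron described above can be written in a few lines. In this sketch, the neuron computes the weighted sum of its inputs and outputs 1 only if that sum exceeds its threshold; the weights and threshold are hypothetical values chosen so that the neuron reproduces a logical AND.

```python
import numpy as np


def threshold_neuron(inputs, weights, threshold):
    # Weighted sum of the incoming signals.
    z = float(np.dot(inputs, weights))
    # The neuron is activated (outputs 1) only if the weighted sum exceeds its threshold.
    return 1 if z > threshold else 0


# Hypothetical weights and threshold that make the neuron behave like a logical AND.
weights, threshold = [1.0, 1.0], 1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", threshold_neuron(x, weights, threshold))
```

Modern networks usually replace the hard threshold with a smooth activation function so that the weights can be learned by gradient-based training.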
The general structure of a neural network comprises nodes, also called neurons, arranged in an input layer, hidden layers, and an output layer (Goodfellow et al., 2016). The input layer comprises an array of neurons that store the features of the input variables; it is the initial layer, which receives the input data and passes them forward to the hidden layers. The hidden layers follow the input layer and receive the data on which computations are performed. Because the values of the neurons in these layers are unobservable and inaccessible from outside the network, they are termed hidden layers. The hidden layers generally perform tasks such as extracting features of the input data, applying mathematical functions, and generating output. The output of each neuron in a hidden layer is a non-linear transformation of its input, which allows the neural network to learn complex relationships between inputs and outputs (Gurney, 1997). The number of neurons in the hidden layer is a hyperparameter that can be tuned to optimize the performance of the neural network. Too few neurons in the hidden layer can lead to underfitting, where the model is unable to capture the complexity of the data, while too many neurons can lead to overfitting, where the model identifies nonexistent patterns in an attempt to fit the training data closely, leaving it unable to generalize to new data. The number of hidden layers in a neural network depends on the dimensions and features of the input data (Goodfellow et al., 2016). If the input data are linearly separable, no hidden layers are required for the analysis. Most engineering problems require 3 to 5 hidden layers for an accurate analysis. However, choosing a higher number of hidden layers for relatively simple datasets increases the complexity of the model and may result in overfitting. Hence, care must be taken when tuning the hyperparameters of neural networks in particular, or of any other ML model in general.
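The sketch below shows a forward pass through a fully connected network with an input layer, several hidden layers, and an output layer, and counts the trainable weights and biases. It illustrates why adding hidden layers or neurons increases model complexity: every extra layer adds a weight matrix and a bias vector. The layer sizes, the ReLU hidden activation, and the He-style random initialization are assumptions made for illustration, and the network is untrained.

```python
import numpy as np


def init_mlp(layer_sizes, seed=0):
    rng = np.random.default_rng(seed)
    # One (weights, biases) pair per connection between consecutive layers,
    # with He-style random weights scaled by the number of incoming neurons.
    return [(rng.normal(0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


def forward(params, X):
    h = X
    for i, (W, b) in enumerate(params):
        z = h @ W + b
        # Non-linear (ReLU) activation in hidden layers, linear activation in the output layer.
        h = np.maximum(z, 0.0) if i < len(params) - 1 else z
    return h


# Hypothetical network: 4 input features, 3 hidden layers of 16 neurons, 1 output.
params = init_mlp([4, 16, 16, 16, 1])
X = np.random.default_rng(1).normal(size=(10, 4))
y_hat = forward(params, X)
n_params = sum(W.size + b.size for W, b in params)
print("output shape:", y_hat.shape, "trainable parameters:", n_params)
```

Doubling the width of each hidden layer here would roughly quadruple the hidden-to-hidden weights, which is one reason oversized networks overfit small datasets.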
The output layer is the last layer of the neural network and produces the final output of the model. The type of output layer depends on the problem being solved with the neural network. The activation function is a crucial component of the hidden and output layers of neural networks (Sharma et al., 2017). The activation function operates on the output of each neuron in a layer, transforming it (typically non-linearly) into a form that can be passed on to the next layer or used as the final output of the network.
In the hidden layers, the activation function plays a crucial role in introducing non-linearity into the network, thereby enabling it to model complex patterns in data (Sharma et al., 2017). The choice of activation function depends on both the problem being solved and the architecture of the neural network. In the output layer, the activation function is selected based on the nature of the problem at hand. For instance, in binary classification tasks, a sigmoid activation function may be employed to generate a probability output between 0 and 1. Alternatively, in regression tasks, a linear activation function can be employed to produce a continuous output (Sharma et al., 2017).
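The activation functions mentioned above can be written directly in NumPy. The sigmoid and linear functions correspond to the output-layer choices described in the text; ReLU, a widely used hidden-layer activation, is included as an additional example.

```python
import numpy as np


def sigmoid(z):
    # Maps any real input into (0, 1): suitable for binary-classification probabilities.
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    # Rectified linear unit, a common hidden-layer choice: max(0, z) element-wise.
    return np.maximum(z, 0.0)


def linear(z):
    # Identity activation: leaves the output unbounded, as needed for regression.
    return z


z = np.array([-2.0, 0.0, 3.0])
print("sigmoid:", sigmoid(z))
print("relu:   ", relu(z))
print("linear: ", linear(z))
```
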
Based on the direction of information flow, neural networks can be divided into feed-forward networks, in which information flows in only one direction without any loops, and recurrent neural networks (RNNs), which contain feedback loops. Feed-forward networks are commonly trained with the back-propagation algorithm, which adjusts the weights to reduce the error in the predicted output (Svozil et al., 1997), while RNNs use their feedback loops to retain information from previous inputs and are trained with back-propagation through time, an extension of back-propagation to sequences (Salehinejad et al., 2017).
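A minimal sketch of back-propagation for a small feed-forward network is given below. A forward pass computes the prediction, and a backward pass applies the chain rule to obtain the gradient of the loss with respect to every weight, which is then used in a gradient-descent update. The XOR task, the network size, the tanh and sigmoid activations, the cross-entropy loss, and the learning rate are illustrative assumptions.

```python
import numpy as np


def train_xor(hidden=8, lr=0.5, epochs=5000, seed=0):
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR: not linearly separable
    W1, b1 = rng.normal(0, 1, (2, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 1, (hidden, 1)), np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, sigmoid output layer.
        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        pc = np.clip(p, 1e-12, 1 - 1e-12)                  # avoid log(0) in the loss
        losses.append(float(-np.mean(y * np.log(pc) + (1 - y) * np.log(1 - pc))))
        # Backward pass: apply the chain rule layer by layer, from output to input.
        d_out = (p - y) / len(X)                            # d(mean cross-entropy)/d(output pre-activation)
        dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
        d_h = (d_out @ W2.T) * (1 - h ** 2)                 # tanh'(z) = 1 - tanh(z)^2
        dW1, db1 = X.T @ d_h, d_h.sum(axis=0)
        # Gradient-descent step on every weight and bias.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return p, losses


p, losses = train_xor()
print("initial loss:", losses[0], "final loss:", losses[-1])
print("predictions:", p.ravel().round(2))
```

Deep learning frameworks automate the backward pass, but the underlying computation is the same layer-by-layer application of the chain rule.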
With this brief introduction on basic working principles of neural
networks, they can be divided into two types: shallow learning networks
and DL networks (Aggarwal, 2018). Although there is no denite
consensus among the ML-community, networks with only one or two
hidden layers are generally termed as shallow learning networks, and
neural networks with more than two layers are termed as DL networks. A
single layer perceptron is the simplest form of neural network consisting
of an input layer, an activation function and an output (Auer et al.,
2008). Extreme learning machines (ELM) on the other hand are ANNs
with one input layer, one hidden layer, and one output layer (Huang
et al., 2006). In an ELM, the hidden-layer weights are randomly generated and are not iteratively adjusted; only the output-layer weights are learned, in a single least-squares step, which makes training these models extremely rapid. Conversely, DL utilizes neural networks containing multiple hidden layers.
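The speed of ELM training follows from the fact that only the output weights are solved for. The sketch below is a minimal ELM, assuming a tanh hidden activation, 50 hidden neurons, and a Moore-Penrose pseudo-inverse for the output weights; the 1-D sine-shaped target is a hypothetical toy problem, not remediation data.

```python
import numpy as np

def train_elm(x, y, n_hidden=50, seed=0):
    """Extreme learning machine: random hidden layer, least-squares output layer."""
    rng = np.random.default_rng(seed)
    # Hidden-layer weights and biases are drawn at random and never updated
    w = rng.normal(size=(x.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    h = np.tanh(x @ w + b)               # hidden-layer output matrix H
    # Only the output weights are learned, in one step, via the pseudo-inverse of H
    beta = np.linalg.pinv(h) @ y
    return w, b, beta

def predict_elm(x, w, b, beta):
    return np.tanh(x @ w + b) @ beta

# Hypothetical 1-D regression problem used only for illustration
x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y = np.sin(3.0 * x).ravel()
w, b, beta = train_elm(x, y)
rmse = np.sqrt(np.mean((predict_elm(x, w, b, beta) - y) ** 2))
```

There is no iterative loop at all: a single linear solve replaces the many gradient-descent epochs a back-propagation network would need.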
Overall, neural networks are versatile models with the ability to
perform supervised, unsupervised, semi-supervised, and reinforcement
learning tasks, making their applications extensive. Even in the context
of site remediation, both shallow and deep neural networks have found
utility in diverse projects, encompassing tasks such as site character-
ization, contaminant source identication, and remediation design and
optimization (Srivastava and Singh, 2014; Zhao et al., 2020; Zheng
et al., 2022). In particular, DL networks, a subset of artificial neural networks (ANNs), have surged in popularity owing to their ability to process complex forms of data. This surge has led to the development of numerous DL models, many of which have also been used in the site remediation literature. A very brief overview of these DL models is therefore provided in the next section.
4. Deep learning
Deep learning (DL) is a subset of ML that deals with data of increased
complexity, where conventional ML models may not be adequate and
can result in inaccurate analyses. DL typically employs neural networks
with multiple hidden layers in order to achieve accurate estimation of
the output. These neural networks are designed to learn and extract
hierarchical representations of data from multiple layers of abstraction,
which can then be used to make predictions or decisions (Goodfellow
et al., 2016). The architecture of DL models is loosely inspired by the way neurons in the human brain process signals. Convolutional
neural networks (CNN) and recurrent neural networks (RNN) are two of
the widely used and more sophisticated algorithms to implement DL. In
addition to these, various other types of existing DL models commonly
employed in various applications are presented in Fig. 4. A fully connected multi-layer neural network is commonly referred to as a multi-layer perceptron (MLP), which is generally classified as a DL model.
Further, several other models, such as Boltzmann machines, autoencoders, and deep belief networks, were developed by altering the way information flows between nodes in different layers, so that each excels at specific tasks. For example, autoencoders consist of two major components, an encoder and a decoder, and are trained to approximately copy the input features to the output, which makes them well suited to dimensionality reduction and feature learning applications (Goodfellow et al., 2016). Similarly, numerous DL network architectures have been and continue to be developed to improve the way these neural networks process information. However, the optimal choice of network architecture depends on the unique features of the data
J.K. Janga et al.
Chemosphere 345 (2023) 140476
7
being processed and the specic problem being addressed. Brief de-
scriptions of some techniques generally used to implement DL in site
remediation are presented below.
4.1. Recurrent neural networks
Recurrent Neural Networks (RNNs) are a category of neural networks that are specifically designed to handle sequential data, such as
natural language text or time-series data. RNNs utilize feedback con-
nections, forming loops that enable information to persist and propagate
through the network over time (Yu et al., 2019). This feedback mech-
anism enables RNNs to model dynamic temporal dependencies in the
input data, making them well suited for applications such as speech
recognition, natural language processing, and video analysis.
In RNNs, the connections between nodes form a loop, allowing the output of some nodes to affect the subsequent input to the same nodes. This loop structure enables RNNs to exhibit temporal dynamic behavior. The hidden state of an RNN has a recurrent connection, which allows the network to capture sequential information in the input data, such as the dependencies between words in a text, when making predictions. Furthermore, RNNs use parameter sharing, reusing the same weights at every time step, to minimize the number of parameters that must be learned. This enhances their efficiency in handling sequential data. Long short-term memory networks (LSTMs) and gated recurrent units (GRUs) are two common types of RNN architecture that can be especially efficient in processing sequential data (Chung et al., 2014).
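The recurrent loop and parameter sharing described above can be seen in a few lines. The sketch below assumes a vanilla (Elman) RNN with a tanh activation; the weights and the six-step input sequence are random placeholders standing in for, e.g., consecutive monitoring readings.

```python
import numpy as np

def rnn_forward(inputs, w_x, w_h, b):
    """Run a vanilla (Elman) RNN over a sequence.

    The same weights (w_x, w_h, b) are reused at every time step (parameter sharing),
    and the hidden state h carries information from earlier steps (recurrent loop).
    """
    h = np.zeros(w_h.shape[0])
    states = []
    for x_t in inputs:                          # iterate over time steps
        h = np.tanh(w_x @ x_t + w_h @ h + b)    # new state depends on input and previous state
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(1)
n_in, n_hidden, n_steps = 2, 4, 6
w_x = rng.normal(scale=0.5, size=(n_hidden, n_in))
w_h = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)
sequence = rng.normal(size=(n_steps, n_in))     # hypothetical six-step input sequence
states = rnn_forward(sequence, w_x, w_h, b)
```

The number of learned parameters (`w_x`, `w_h`, `b`) does not grow with sequence length, yet the final state still depends on the very first input through the recurrent connection.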
4.2. Long Short-Term Memory
Long Short-Term Memory (LSTM) is a type of recurrent neural
network (RNN) that addresses the issue of vanishing gradients that is
common in traditional RNNs (Sherstinsky, 2020). The LSTM architec-
ture incorporates specialized memory cells that can retain information
over extended periods, enabling them to capture long-term de-
pendencies in sequential data (Hochreiter and Schmidhuber, 1997).
These memory cells are equipped with gates that regulate the flow of information, allowing the network to selectively store or forget past information. During training, the connection weights and biases in the network are updated, analogous to the way physiological changes in synaptic strengths store long-term memories (Sherstinsky, 2020). At each time step, the activation patterns in the network change, similar to how changes in electrical firing patterns in the brain store short-term memories. Because of its effectiveness in modeling sequential data with long-term dependencies, LSTM has gained popularity in diverse fields and has also been applied in site remediation tasks (Qiu et al., 2023; Li et al., 2021).
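To show how the gates "selectively store or forget" information, the sketch below implements one step of a standard LSTM cell (input, forget, and output gates plus a candidate update). The stacked parameter layout and the random weights are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W (4n x d), U (4n x n) and b (4n,) hold the stacked parameters of the
    input (i), forget (f), output (o) gates and the candidate cell update (g).
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:n])            # input gate: how much new information to store
    f = sigmoid(z[n:2 * n])        # forget gate: how much of the old memory to keep
    o = sigmoid(z[2 * n:3 * n])    # output gate: how much memory to expose
    g = np.tanh(z[3 * n:4 * n])    # candidate values for the memory cell
    c_t = f * c_prev + i * g       # long-term memory (cell state)
    h_t = o * np.tanh(c_t)         # short-term output (hidden state)
    return h_t, c_t

rng = np.random.default_rng(2)
d, n = 3, 5
W = rng.normal(scale=0.3, size=(4 * n, d))
U = rng.normal(scale=0.3, size=(4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x_t in rng.normal(size=(10, d)):   # hypothetical 10-step input sequence
    h, c = lstm_step(x_t, h, c, W, U, b)
```

Because the cell state is updated additively (`f * c_prev + i * g`), a forget gate close to 1 and an input gate close to 0 leave the stored memory essentially unchanged, which is what lets LSTMs carry information over long sequences.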
4.3. Convolutional neural networks
Convolutional Neural Networks (CNNs) are neural networks specif-
ically designed for tasks involving image recognition and processing.
They employ shared-weight architectures in their convolutional layers,
where kernels slide over input features, generating feature maps that
capture various patterns and objects within images. This unique
approach enables CNNs to excel in tasks related to visual data analysis
(Gu et al., 2018). Kernels are the building blocks of a CNN and are used to extract the relevant features of the input through the convolution operation. In this way, a CNN captures the spatial features of an image, which helps in accurately identifying an object, its location, and its relationship with other objects in the image.
CNNs employ convolutional layers to extract features from input
data, followed by pooling layers that decrease the dimensionality of
feature maps (Gu et al., 2018). The output of the convolutional and
pooling layers is then passed through one or more fully connected layers
to produce the nal output. CNNs are particularly effective for image
classication tasks because they are able to learn features directly from
the raw pixel data, rather than requiring handcrafted features. They are
also capable of handling variations in the input image such as changes in
lighting, rotation, and scale, making them very efcient. Owing to this
exceptional ability of CNNs, they can be used in site characterization
tasks to analyze spatial contaminant distribution patterns based on
various imaging techniques. Studies in the past have used CNN for
estimating contaminant concentrations based on visible and near
infrared spectroscopy (imaging) (Pyo et al., 2020).
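The convolution-then-pooling pipeline described above can be sketched directly in NumPy. This is a minimal illustration, assuming a single hand-set 2x2 edge kernel (in a real CNN the kernel weights are learned), stride 1, a ReLU after the convolution, and 2x2 max pooling; the 6x6 "image" is hypothetical.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most DL libraries) with stride 1."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # The same kernel weights are shared across every position of the image
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; halves each spatial dimension for size=2."""
    h, w = feature_map.shape[0] // size, feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

# Hypothetical 6x6 "image" with a vertical boundary between low and high values
image = np.zeros((6, 6))
image[:, 3:] = 1.0
edge_kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])       # responds to left-to-right increases
features = np.maximum(0.0, conv2d(image, edge_kernel))   # ReLU after convolution
pooled = max_pool(features)
```

Here the feature map lights up only along the boundary, and pooling keeps that response while reducing the 5x5 map to 2x2, which is the dimensionality reduction the pooling layer provides.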
4.4. Autoencoders
Autoencoders are a special class of feedforward neural networks that
are trained to copy the input features to the output layer. Autoencoders
consist of two main parts: the encoder, which maps the input data to a
latent space (the compressed representation), and the decoder, which
reconstructs the input data from this compressed representation (Bank
et al., 2023). The encoder and decoder are both neural networks, and
they work together to minimize the difference between the input and the
output (reconstruction loss). Autoencoders are typically trained using backpropagation and gradient descent to minimize the reconstruction loss, in the same way as feed-forward neural networks. However, autoencoders are deliberately designed to copy the input imperfectly.
By forcing the network to learn a compressed representation, they
capture the essential features of the input data while discarding the
non-essential details, which makes them useful for dimensionality
reduction and feature extraction applications (Bank et al., 2023). Pyo
et al. (2020) used a convolutional autoencoder for dimensionality
reduction of visible and near-infrared spectroscopy (VNIRS) data for
prediction of heavy metal contamination.
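The encoder/decoder structure and reconstruction-loss training can be sketched with the simplest possible case: a linear autoencoder with a one-dimensional latent space, trained by hand-coded backpropagation and gradient descent. This is an assumption-laden toy (no non-linearities, synthetic 3-feature data lying near a line), not the convolutional autoencoder used by Pyo et al. (2020).

```python
import numpy as np

def train_linear_autoencoder(x, latent_dim=1, lr=0.05, epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = rng.normal(scale=0.1, size=(d, latent_dim))   # encoder: input -> latent code
    w_dec = rng.normal(scale=0.1, size=(latent_dim, d))   # decoder: latent code -> reconstruction
    losses = []
    for _ in range(epochs):
        z = x @ w_enc                          # compressed representation
        x_hat = z @ w_dec                      # reconstruction of the input
        err = x_hat - x
        losses.append(np.mean(err ** 2))       # reconstruction loss (MSE)
        grad_out = 2.0 * err / err.size        # dLoss/dx_hat
        grad_dec = z.T @ grad_out              # backpropagate to decoder weights
        grad_enc = x.T @ (grad_out @ w_dec.T)  # ...and through the decoder to encoder weights
        w_dec -= lr * grad_dec
        w_enc -= lr * grad_enc
    return w_enc, w_dec, losses

# Hypothetical data: 3 correlated features driven by one underlying factor plus noise
rng = np.random.default_rng(0)
t = rng.normal(size=200)
x = np.column_stack([t, 2.0 * t, -t]) + 0.05 * rng.normal(size=(200, 3))
w_enc, w_dec, losses = train_linear_autoencoder(x)
```

Forcing three features through a single latent value makes the copy imperfect, so the network keeps the dominant shared pattern and discards the small noise, which is the behavior the text describes.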
Recent developments have made autoencoders suitable for generative modeling, where they are trained to generate new data samples that are similar to the training data (Goodfellow et al., 2016). Variational Autoencoders (VAEs) are a specific type of autoencoder designed for generative tasks. VAEs incorporate probabilistic techniques to generate new data points (Doersch, 2016). Kang et al. (2021) used a convolutional VAE for monitoring the contamination source zone during remediation.
While DL networks such as restricted Boltzmann machines, deep belief networks, and generative adversarial networks are recognized in diverse fields, they have not been widely explored for applications in the contaminated site remediation process. Hence, the implementation and principles behind these algorithms are not discussed in this section. For more on the architectures and working principles of these models, refer to Goodfellow et al. (2016).
Fig. 4. A few commonly employed algorithms to implement deep learning (DL).
5. Special techniques
5.1. Fuzzy logic
Fuzzy logic is a mathematical framework that allows for reasoning
with uncertain or imprecise information. This approach is used for
computational work based on degrees of truth rather than the usual true
or false (1 or 0). Fuzzy logic includes 0 and 1 as extreme cases of truth
but with various intermediate degrees of truth (Zadeh, 1973). This allows for more nuanced and flexible reasoning, particularly in complex systems where multiple factors may influence an outcome. Fuzzy logic proves valuable in engineering scenarios characterized by ambiguity and uncertainty, or when handling imprecise data, as in natural language processing technologies. Moreover, it excels at governing and managing machine outputs in response to diverse input variables. Fuzzy logic based models, such as fuzzy c-means clustering and fuzzy-based optimization, have been employed in past studies related to contaminated site remediation (Chen et al., 2023; Hu and Chan, 2015).
5.2. Sugeno Fuzzy Logic
The Sugeno fuzzy inference, also known as Takagi-Sugeno-Kang
fuzzy inference, utilizes singleton output membership functions, which
can be either a constant or a linear function of the input values (Takagi
and Sugeno, 1985; Sugeno and Kang, 1988). In comparison to centroid-based defuzzification (Runkler, 1996), the defuzzification process for Sugeno systems is computationally more efficient: the crisp output is obtained as a weighted average (or weighted sum) of the individual rule outputs, with each rule weighted by its firing strength.
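The weighted-average defuzzification can be illustrated with a toy first-order Sugeno system. The two rules, the Gaussian membership functions, and all numeric constants below are hypothetical choices made only to show the mechanics; they are not taken from any cited study.

```python
import numpy as np

def gaussian_mf(x, center, sigma):
    # Degree of membership in [0, 1]; equals 1 at the centre of the fuzzy set
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def sugeno_inference(x):
    """First-order Sugeno (TSK) system with two illustrative rules.

    Rule 1: IF x is LOW  THEN y1 = 0.5 * x + 1
    Rule 2: IF x is HIGH THEN y2 = 2.0 * x - 1
    """
    w1 = gaussian_mf(x, center=0.0, sigma=2.0)    # firing strength of rule 1
    w2 = gaussian_mf(x, center=10.0, sigma=2.0)   # firing strength of rule 2
    y1 = 0.5 * x + 1.0                            # linear rule consequents
    y2 = 2.0 * x - 1.0
    # Weighted-average defuzzification: no centroid integral is required
    return (w1 * y1 + w2 * y2) / (w1 + w2)
```

For example, at x = 5 both rules fire equally, so the output is the plain average of y1 = 3.5 and y2 = 9, i.e. 6.25. Each evaluation costs only a few arithmetic operations, which is the efficiency advantage over centroid defuzzification.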
Sugeno fuzzy logic (SFL) nds widespread use in applications
requiring precise control, such as industrial automation, control sys-
tems, and robotics. It is also frequently employed in decision-making
systems like expert systems and rule-based systems. The principal
advantage of SFL is its capacity to model complex nonlinear systems
with accuracy and efciency while retaining easy interpretability of the
fuzzy rules. Sadeghfam et al. (2019) used SFL based surrogate models
coupled with GA-based optimization to optimize the pumpage schedule
while remediating excessive total dissolved solids (TDS) in groundwater,
using pump, treat, and inject (PTI) technology. They observed that SFL
models in comparison to ANN based models can yield better computa-
tional efciency while reporting similar accuracy.
5.3. Surrogate models
Surrogate models, also referred to as metamodels or emulators, imitate simulation models with high accuracy while requiring fewer computational resources (Cozad et al., 2014; Asher et al., 2015). These models are usually created using a smaller dataset generated by resource-intensive simulations or experiments. This dataset is used to build mathematical or statistical models that can quickly and accurately predict how a system or process behaves. Surrogate models are built using a data-driven approach, with sample simulation outputs obtained at strategically selected points in the design parameter space. For each of these points, a full simulation is run to calculate the corresponding output.
The pairs of input (design parameters) and output values are collected into a training dataset, which is used to construct a statistical model. Unlike traditional methods that rely on a predetermined dataset, surrogate models can use active learning (adaptive sampling) to gradually expand their training data, which can significantly improve both the efficiency and the accuracy of the training process. When a new sample point is identified, a new simulation is performed to calculate its corresponding output value, and the surrogate model is updated with this new information. This process is repeated until the surrogate model's accuracy meets the desired level. Surrogate models are widely applied across various fields, such as engineering, physics, chemistry, and finance. They effectively reduce the computational expense associated with complex simulations, enabling more efficient optimization and design procedures. Surrogate models have also been widely used by researchers in groundwater and subsurface modeling, both to optimize remediation design parameters and for contaminant source identification.
5.4. Ensemble learning
Ensemble learning involves creating a prediction model by har-
nessing the strengths of several simpler base models (Polikar, 2012).
Ensemble learning can be divided into two main tasks: first, developing a population of base learners from the training data, and then combining these learners to create a composite predictor. Tree-based ML algorithms such as gradient boosting and random forest belong to the class of ensemble learning techniques. These algorithms learn from multiple decision trees (base learners) to enhance prediction or classification accuracy. Any number of ML models can be combined to build an ensemble learning model, provided that the combination is not detrimental to the final prediction accuracy. Several studies have
employed an ensemble of surrogate models for predictions in the context
of contaminated site remediation (Chu and Lu, 2015; Hou et al., 2017;
Ouyang et al., 2017b; Xing et al., 2019; Qiu et al., 2023).
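The two ensemble tasks (build a population of base learners, then combine them) can be sketched as follows. As assumptions for illustration, the base learners are polynomial regressions of degree 1 to 3, they are combined by weights inversely proportional to their validation mean squared error, and the data are synthetic; the cited studies used their own base models and weighting schemes.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    coeffs = np.polyfit(x, y, degree)
    return lambda q: np.polyval(coeffs, q)

def weighted_ensemble(x_train, y_train, x_val, y_val, degrees=(1, 2, 3)):
    """Combine several base regressors, weighting each by its inverse validation MSE."""
    models = [fit_polynomial(x_train, y_train, d) for d in degrees]   # population of base learners
    val_mse = np.array([np.mean((m(x_val) - y_val) ** 2) for m in models])
    inv = 1.0 / (val_mse + 1e-12)       # better base models receive larger weights
    weights = inv / inv.sum()

    def predict(q):
        return sum(w * m(q) for w, m in zip(weights, models))   # composite predictor
    return predict, weights, val_mse

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 3.0, 60)
y = np.exp(0.5 * x) + rng.normal(scale=0.1, size=x.size)   # hypothetical response
predict, weights, base_mse = weighted_ensemble(x[:40], y[:40], x[40:], y[40:])
ensemble_mse = np.mean((predict(x[40:]) - y[40:]) ** 2)
```

Because the composite prediction is a convex combination of the base predictions, its squared error can never exceed that of the worst base model on the same data. In practice the final accuracy should still be checked on a separate test set.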
5.5. Decision making
The process of decision-making involves selecting the optimal course
of action among a range of available alternatives. It requires the identification and evaluation of various options, considering potential outcomes and consequences, and ultimately choosing a path based on the available information, preferences, and goals.
To arrive at a conclusion, decision-making necessitates analyzing
data from multiple sources with varying levels of certainty, merging the
information by weighting certain data sources over others (Kochender-
fer, 2015). An agent, which acts based on observations of its environ-
ment, interacts with the environment through an observe-act cycle. At
time $t$, the agent receives an observation of the environment, represented as $O_t$, and then chooses an action $a_t$ through the decision-making process.
Given the past sequence of observations $O_1, O_2, \ldots, O_t$ and knowledge of the environment, the agent must choose an action that best achieves its objectives, considering various sources of uncertainty. Intelligent decision-making systems can be useful in making real-time decisions in contaminated site remediation, especially in the age of big data.
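The observe-act cycle can be made concrete with a toy loop. Everything below is hypothetical: the environment is a concentration that decays by 15% per treatment step and is observed with noise, and the policy (an exponentially weighted estimate from $O_1, \ldots, O_t$ compared with a cleanup target) is an illustrative rule, not a method from the cited literature.

```python
import numpy as np

def choose_action(observations, target):
    """Hypothetical policy: estimate the current state from all observations O_1..O_t."""
    # Weight recent observations more heavily (exponentially weighted average)
    weights = 0.5 ** np.arange(len(observations))[::-1]
    estimate = np.dot(weights, observations) / weights.sum()
    return "stop_treatment" if estimate <= target else "continue_treatment"

rng = np.random.default_rng(4)
true_concentration, target = 10.0, 2.0      # hypothetical values (e.g. mg/L)
observations, actions = [], []
for t in range(30):
    o_t = true_concentration + rng.normal(scale=0.5)      # noisy observation O_t
    observations.append(o_t)
    a_t = choose_action(np.array(observations), target)   # decision a_t
    actions.append(a_t)
    if a_t == "stop_treatment":
        break
    true_concentration *= 0.85              # environment responds to continued treatment
```

Averaging over several observations is one simple way of handling the observation uncertainty mentioned above; a single noisy reading below the target would not, on its own, trigger the decision to stop.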
6. Model development
Model selection is a crucial task in the domain of AI, which involves the identification of the most appropriate algorithm or model for a given problem. Optimal model selection can significantly influence the accuracy and efficiency of prediction or classification tasks.
To select the most suitable model, various factors such as accuracy,
computational complexity, interpretability, robustness, and generalization performance must be considered. The model development process typically involves dividing the available data into training, validation, and test sets. The training set is used to train different models, the validation set to select the best model, and the test set to evaluate the final model's performance. Several techniques can be employed for model selection, including cross-validation, grid search, and Bayesian optimization (James et al., 2023). Cross-validation involves iteratively dividing the available data into training and validation sets and evaluating the model's performance in each iteration. Grid search involves trying various combinations of hyperparameters to identify the optimal model, whereas Bayesian optimization uses probabilistic approaches to search for the most effective hyperparameters.
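Cross-validation and grid search combine naturally. The sketch below uses k-fold cross-validation to score each value in a grid for the regularization strength of a ridge regression. Ridge regression, five folds, the candidate grid, and the synthetic data are all illustrative assumptions; Bayesian optimization is not shown.

```python
import numpy as np

def ridge_fit(x, y, lam):
    # Closed-form ridge regression; lam is the hyperparameter being tuned
    d = x.shape[1]
    return np.linalg.solve(x.T @ x + lam * np.eye(d), x.T @ y)

def k_fold_cv_mse(x, y, lam, k=5, seed=0):
    """Average validation MSE over k folds (each sample is validated exactly once)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge_fit(x[train], y[train], lam)
        scores.append(np.mean((x[val] @ w - y[val]) ** 2))
    return float(np.mean(scores))

def grid_search(x, y, grid):
    # Try every candidate hyperparameter value; keep the one with the lowest CV error
    results = {lam: k_fold_cv_mse(x, y, lam) for lam in grid}
    return min(results, key=results.get), results

rng = np.random.default_rng(5)
x = rng.normal(size=(100, 5))
y = x @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=100)
best_lam, cv_scores = grid_search(x, y, grid=[0.01, 0.1, 1.0, 10.0, 100.0])
```

The selected `best_lam` would then be used to retrain on all training data, and the untouched test set would give the final performance estimate.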
6.1. Steps involved in model development
The following are typical steps for AI-based data-driven model development (James et al., 2023):
Data collection: It is imperative to use data from a trustworthy source, since the quality of the data directly influences the model's outcome. High-quality data is relevant to the problem being addressed, contains minimal missing and duplicated values, and represents the various subcategories/classes appropriately. It is commonly believed that more data results in a better and more accurate model. However, data quality is as important as quantity: a large dataset of poor quality may not improve the model's performance. Thus, it is essential to verify data quality before using the data to develop models, including checking for missing and duplicated values and confirming that all subcategories/classes are represented. Using high-quality data makes the model more accurate, reliable, and useful for addressing various problems.
Data preprocessing: This is a critical step in developing accurate and reliable models. Randomizing (shuffling) the data removes any ordering in how the records were collected, so that the subsets created later are representative and bias in the model is reduced. Data cleaning is also essential to remove unwanted data, such as missing values, incomplete rows and columns, and duplicates, and to convert data types if necessary. Once the data is cleaned, it is split into a training set and a testing set (and, where model selection is required, a validation set). The training set is what the model learns from, while the testing set is used to check the accuracy of the model's learning after training. Proper data preparation ensures that the model is trained on high-quality data, so that it can learn effectively and produce accurate predictions on unseen data.
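The cleaning, shuffling, and splitting steps above can be sketched in a few lines. The column layout (depth, pH, concentration), the values, and the 80/20 split are hypothetical; real projects would typically also handle outliers, unit conversions, and a separate validation set.

```python
import numpy as np

def clean_and_split(data, test_fraction=0.2, seed=0):
    """Drop incomplete and duplicated rows, shuffle, then split into train/test sets."""
    data = data[~np.isnan(data).any(axis=1)]        # remove rows with missing values
    data = np.unique(data, axis=0)                  # remove exact duplicate rows (also sorts them)
    rng = np.random.default_rng(seed)
    data = data[rng.permutation(len(data))]         # shuffle so the split is not order-dependent
    n_test = int(round(test_fraction * len(data)))
    return data[n_test:], data[:n_test]             # (training set, testing set)

# Hypothetical samples: [depth_m, pH, concentration_mg_per_kg]
raw = np.array([
    [0.5, 6.8, 120.0],
    [1.0, 7.1, 95.0],
    [1.0, 7.1, 95.0],       # duplicate row
    [1.5, np.nan, 80.0],    # missing pH
    [2.0, 7.4, 60.0],
    [2.5, 7.2, 45.0],
    [3.0, 7.0, 30.0],
])
train, test = clean_and_split(raw, test_fraction=0.2)
```

Shuffling after de-duplication matters here because `np.unique` returns the rows in sorted order, which would otherwise put all the deepest samples in one subset.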
Model selection: This is a crucial step in developing effective AI/ML/DL models that can accurately solve the task at hand. The chosen model must be relevant to the specific task, which involves assessing whether it is suited to numerical or categorical data and whether it is appropriate for the specific problem, since different models have different strengths and weaknesses. The complexity of the model should also be considered, as overly complex models can lead to overfitting, while overly simple models may not capture the complexity of the problem. In short, model selection requires careful consideration of the data type, the problem complexity, and the strengths and weaknesses of different models, so that the chosen model solves the problem accurately and yields reliable results. The model accuracy can be evaluated using various performance metrics.
Model Training: In model training, the prepared data is used to teach
the machine-learning model to recognize patterns and make predictions.
By training on the data, the model learns to accomplish the tasks set out
for it. As training progresses, the model becomes better at predicting
outcomes, resulting in improved prediction accuracy.
Model Testing: To evaluate the performance of a model, it is neces-
sary to test it on previously unseen data. The dataset used for testing is
separate from the one used for training and is referred to as the testing
set. The testing set allows for an objective evaluation of the model's
generalization ability to unseen data.
Parameter tuning: This is a crucial step in improving the accuracy of a model once it has been selected. The process involves adjusting the values of the model's hyperparameters (settings fixed before training, such as the learning rate or the number of hidden layers) to achieve optimal performance. By fine-tuning these values, the model can be tailored to perform optimally for a specific task, and its accuracy can often be significantly improved.
Making predictions: This is the nal step in a typical ML workow.
After the model has been trained and tested with unseen data, it can be
used to make predictions on new data. It is important to ensure that the
data used for making predictions is of the same quality as the data used
for training and testing. This will help ensure that the model makes
accurate predictions and performs well in real-world applications.
Performance metrics: Performance metrics are used to evaluate the effectiveness of AI models in solving specific problems. Different types of metrics are available depending on the purpose for which the models are used (classification/regression/clustering). In contaminated site remediation studies, regression metrics (MSE, MAE, etc.), classification metrics (precision, accuracy, recall, etc.), correlation metrics (R², r, etc.), and rank metrics (Spearman's rank correlation coefficient) have generally been used. Formulae for these model performance evaluation metrics are presented in Table 1.
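Several of the metrics listed in Table 1 can be computed directly from their formulae. The sketch below covers a representative subset (RMSE, MAE, R², RPIQ, precision, recall, accuracy, F1, Cohen's kappa, and Spearman's ρ). The Spearman function assumes there are no tied values, which is the case the table's formula covers.

```python
import numpy as np

def regression_metrics(y_obs, y_pred):
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    err = y_obs - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    iq = np.percentile(y_obs, 75) - np.percentile(y_obs, 25)   # IQ = Q3 - Q1
    return {"RMSE": rmse, "MAE": np.mean(np.abs(err)),
            "R2": 1.0 - ss_res / ss_tot, "RPIQ": iq / rmse}

def classification_metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "F1": 2 * precision * recall / (precision + recall),
        "kappa": 2 * (tp * tn - fn * fp) / ((tp + fp) * (fp + tn) + (tp + fn) * (fn + tn)),
    }

def spearman_rho(a, b):
    # Valid when there are no tied values: rho = 1 - 6 * sum(d_i^2) / (n (n^2 - 1))
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    d = rank_a - rank_b
    n = len(a)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

reg = regression_metrics([1, 2, 3, 4], [1.5, 2, 2.5, 4])
cls = classification_metrics(tp=40, tn=45, fp=5, fn=10)
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

For these small examples the values can be checked by hand: R² = 1 - 0.5/5 = 0.9, kappa = 3500/5000 = 0.7, and ρ = 1 - 24/120 = 0.8.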
7. Integration of AI/ML/DL into contaminated site remediation
7.1. Bibliometric analysis
A bibliometric analysis was conducted on the scientific literature available regarding the application of AI/ML/DL in contaminated site remediation, using the Scopus database developed by Elsevier. The search query used to find relevant literature was as follows: (Groundwater OR Ground water OR Soil) AND (Remediation) AND (Artificial Intelligence OR Machine Learning OR Deep Learning OR Neural Networks OR Genetic Algorithm OR Optimization Algorithms) AND NOT (wastewater OR waste water). The search yielded a total of 427 documents, including technical articles, review articles, book chapters, and conference papers.
Fig. 5a illustrates the distribution of published papers over the years. It shows that AI has been used in site remediation since the late 1990s. Initially, research mainly focused on building decision-support tools, largely using genetic algorithms, and on optimizing various parameters using AI-based models. After 2010, however, research diversified into different stages of site remediation. These include predicting contaminant concentrations and the spatial distribution of contaminants, employing DL models to create accurate surrogates for traditional process-based simulation models, conducting simulation-optimization of subsurface models to optimize remediation techniques, using simulation-optimization techniques for contamination source identification, and developing robust decision support tools in data-rich fields.
Moreover, Fig. 5a indicates that the use of AI/ML/DL in site remediation has grown significantly in the past five years. To understand the key research areas where these technologies are applied, the co-occurrence of indexed and authors' keywords was analyzed using VOSviewer, a freely available software tool for constructing and visualizing bibliometric networks. The resulting keyword network is presented in Fig. 5b, and the nations of origin of the articles' co-authors are presented in Fig. 5c.
The analysis reveals that a considerable amount of research has been
devoted to groundwater contamination and remediation, while soil
contamination and remediation have been relatively less explored.
Additionally, both ML and DL models were extensively employed for
various applications in contaminated site remediation. Early research
focusing on the use of GA for optimizing remedial strategies and
building decision support tools, along with continued usage of GA for
various optimization practices, has contributed to its higher relevance in
the bibliography. In addition, as observed in Fig. 5c, most of the research
regarding the integration of AI into site remediation is concentrated in
the USA, and China followed by India, Iran, and the UK.
7.2. Site characterization and risk assessment
Site characterization is a crucial part of contaminated site remedia-
tion. It provides the necessary information for risk assessments and
designing effective remediation systems. The primary objectives of site
characterization are to identify the nature and extent of contamination.
This includes determining the types of contaminants, their quantities,
locations, and the phases in which they occur. Owing to the inherent
nature of contaminated sites, rigorous groundwater and soil sampling
for laboratory testing and use of data intensive statistical techniques for
this purpose is common. However, with the emergence of AI-based technologies, this procedure can become significantly less cumbersome while maintaining the accuracy of the predictions (Hu et al., 2003). Table 2 displays a variety of studies centered on predicting contaminant concentrations, spatial distributions, or both, along with risk assessment activities, and provides a detailed picture of how researchers across the world have employed AI-based technologies to address problems in site characterization. A brief description of how these technologies can enhance site characterization and risk assessment activities is provided below:
Spatial distribution of contamination: Numerous studies have re-
ported the use of AI/ML/DL in estimating the spatial distribution of
contaminants (Kanevski, 1999; Shaker et al., 2010; Zhang et al., 2023).
The input data required for such predictions can be either sparsely sampled groundwater or soil monitoring data, or feature-based microscopic or drone images. Specific applications include feature extraction from aerial or spectral images and prediction of contaminant concentrations from the relationship between the extracted features and the measured concentrations (Jia et al., 2021b). Another way of predicting the spatial distribution is to use data from groundwater monitoring wells or soil sampling, together with subsequent laboratory testing, to spatially interpolate contaminant concentrations. A brief methodology flowchart for determining the spatial distribution of pollutants using various imaging techniques, laboratory investigations, and ML/DL models is depicted in Fig. 6. Table 2 briefly summarizes various studies reporting the application of different models to predict the spatial distribution of contaminants.
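The interpolation route described above (sparse sampling points, then predictions at unsampled locations) can be sketched with a k-nearest-neighbour regressor weighted by inverse distance. This is only one simple illustrative choice among the ML models summarized in Table 2; the coordinates and concentrations below are hypothetical.

```python
import numpy as np

def knn_predict(xy_sampled, conc_sampled, xy_query, k=3):
    """Predict concentrations at unsampled locations from the k nearest samples.

    Neighbours are weighted by inverse distance, so closer sampling points count more.
    """
    xy_sampled = np.asarray(xy_sampled, float)
    conc_sampled = np.asarray(conc_sampled, float)
    preds = []
    for q in np.asarray(xy_query, float):
        dist = np.linalg.norm(xy_sampled - q, axis=1)
        nearest = np.argsort(dist)[:k]
        if dist[nearest[0]] == 0.0:               # query coincides with a sampled point
            preds.append(conc_sampled[nearest[0]])
            continue
        w = 1.0 / dist[nearest]
        preds.append(np.sum(w * conc_sampled[nearest]) / np.sum(w))
    return np.array(preds)

# Hypothetical sampling points at the corners of a 10 m x 10 m plot (mg/kg)
xy = [[0, 0], [10, 0], [0, 10], [10, 10]]
conc = [10.0, 20.0, 30.0, 40.0]
estimates = knn_predict(xy, conc, [[5, 5], [0, 0]], k=4)
```

The centre of the plot is equidistant from all four samples and so receives their mean (25 mg/kg), while a query at a sampled location returns the measured value. Evaluating such a model on a dense grid yields a contaminant distribution map of the kind discussed above.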
Risk assessment: This is a critical phase following the spatial delin-
eation of contaminant distribution, as described above. This spatial
delineation provides valuable insights that can be harnessed to pinpoint
areas of elevated risk. By identifying these risk zones, it becomes feasible to tailor an appropriate and targeted remediation strategy. This step ensures that resources are allocated efficiently and that interventions are designed to mitigate potential hazards based on the specific contamination patterns identified. In addition, ML-based frameworks can be used to identify the hotspots and drivers of contamination in soils, resulting in a comprehensive understanding of the associated risks (Yang et al., 2021).
7.3. Selection of remediation technology (decision-making)
Availability of data is of paramount importance in engineering-based fields, particularly given the growing number of contaminated sites and types of contaminants worldwide. Identifying contamination and selecting suitable remediation techniques can be challenging, but ML models offer a valuable solution by leveraging the vast amounts of pollutant and remediation data accumulated over decades of contaminated site remediation experience. For instance, Li et al. (2022a) employed a decision tree classifier on data from the CERCLA database to classify common pollutants and associated remediation techniques across 144 contaminated sites in four US states. The study revealed a decline in the growth of contaminated sites over the past decade, with physical remediation technologies being the most frequently employed.
In addition to using AI-based technologies for data analysis regarding
prevalent remediation technologies, they can also be utilized to evaluate
the most suitable remediation technique based on in-situ site charac-
teristics (Wijaya et al., 2023). These characteristics may include soil
microbial data, physicochemical properties of the soil, initial form of the
contaminant, and the extent of contamination. By employing data
analysis techniques on such information, one can assess which remediation technology best aligns with the prevailing site-specific conditions.
Consequently, by leveraging past remediation project data and in-situ
site characteristics, AI-based technologies empower decision-makers to
make informed choices when selecting the appropriate remediation
technology for a given scenario (Li et al., 2022b).
AI-based decision-support tools to select and implement an appro-
priate remedial action have been studied since the early 2000s (Chen
et al., 2003; García et al., 2006; Dunea et al., 2014). A general
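Table 1
Various model-performance evaluation metrics used to evaluate AI/ML/DL based models.
Metric | Formula
Coefficient of determination, $R^2$ | $R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$
Root mean square error (RMSE) | $\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n}}$
Mean absolute error (MAE) | $\mathrm{MAE} = \frac{\sum_{i=1}^{n}|y_i - \hat{y}_i|}{n}$
Mean absolute percentage error (MAPE) | $\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%$
Mean square error (MSE) | $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
Willmott's index of agreement, d | $d = 1 - \frac{\sum_{i=1}^{n}|y_i - \hat{y}_i|}{2 \times \sum_{i=1}^{n}|y_i - \bar{y}|}$
Relative absolute error (RAE) | $\mathrm{RAE} = \frac{\sum_{i=1}^{n}|y_i - \hat{y}_i|}{\sum_{i=1}^{n}|y_i - \bar{y}|}$
Relative error (RE) | $\mathrm{RE} = \sum_{i=1}^{n}\frac{y_i - \hat{y}_i}{y_i} \times 100\%$
Precision | $\frac{TP}{TP + FP}$
Recall | $\frac{TP}{TP + FN}$
Accuracy | $\frac{TP + TN}{TP + FP + TN + FN}$
Cohen's kappa coefficient | $\kappa = \frac{2 \times (TP \times TN - FN \times FP)}{(TP + FP) \times (FP + TN) + (TP + FN) \times (FN + TN)}$
F1 score | $F_1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$
AUC | Area under the ROC curve
IQ | $IQ = Q_3 - Q_1$
RPIQ | $\mathrm{RPIQ} = \frac{IQ}{\mathrm{RMSE}}$
$\alpha$ | Probability of FP
$\beta$ | Probability of FN
TE rate | $\mathrm{TE} = \alpha + \beta$
Spearman's rank correlation coefficient | $\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$
Notes: $\hat{y}_i$ - predicted value, $y_i$ - actual observed value, $\bar{y}$ - mean of observed values, $n$ - total number of test values, TP - true positives, TN - true negatives, FP - false positives, FN - false negatives, ROC - receiver operating characteristic curve, $Q_1$ - value below which 25% of samples are found, $Q_3$ - value below which 75% of samples are found, $d_i$ - difference between the ranks given to the two variable values for each item of the data.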
J.K. Janga et al.
Chemosphere 345 (2023) 140476
11
theoretical approach to selection of remediation technology is presented
in Fig. 7. Initially when computational infrastructure was limited,
decision-support tools were developed using expert inputs (Geng et al.,
2001), later such tools were converted into probabilistic tools providing
the probability of achieving required remediation efciency if a reme-
diation technology is adopted (He et al., 2006; Qin et al., 2007). Later,
weighing systems were developed to weigh the preferences of different
stakeholders so that the decision support tools can weigh in the same
before choosing the remediation technology (Balasubramaniam et al.,
2007). With the growing concerns of climate change, due to unsus-
tainable and environmentally unfriendly practices, including
sustainability and resiliency in decision-making with respect to site
remediation is of paramount importance (Reddy et al., 2019a, 2019b).
More recently, research on decision support tools started including
sustainability and resiliency, along with stakeholder feedback, of the
remediation technologies (Li et al., 2022b; Huysegoms and Cappuyns,
2017).
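To make the definitions in Table 1 concrete, the following minimal Python sketch implements a subset of the metrics directly from those formulas. The function names and the small example arrays are illustrative assumptions, not taken from any of the reviewed studies; the Spearman function assumes there are no tied values, and the Willmott index follows the form given in Table 1.

```python
import math


def rmse(y_obs, y_pred):
    """Root mean square error (Table 1)."""
    n = len(y_obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)


def mae(y_obs, y_pred):
    """Mean absolute error (Table 1)."""
    n = len(y_obs)
    return sum(abs(o - p) for o, p in zip(y_obs, y_pred)) / n


def r_squared(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def willmott_d(y_obs, y_pred):
    """Willmott's index of agreement in the form written in Table 1."""
    mean_obs = sum(y_obs) / len(y_obs)
    num = sum(abs(o - p) for o, p in zip(y_obs, y_pred))
    den = 2.0 * sum(abs(o - mean_obs) for o in y_obs)
    return 1.0 - num / den


def cohen_kappa(tp, tn, fp, fn):
    """Cohen's kappa for a binary confusion matrix (Table 1 form)."""
    return 2.0 * (tp * tn - fn * fp) / ((tp + fp) * (fp + tn) + (tp + fn) * (fn + tn))


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def spearman_rho(x, y):
    """Spearman's rank correlation; assumes no tied values."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))


# Hypothetical observed vs. predicted concentrations (mg/kg).
obs = [2.0, 4.0, 6.0, 8.0]
pred = [2.5, 3.5, 6.5, 7.5]
print(rmse(obs, pred), mae(obs, pred), r_squared(obs, pred), willmott_d(obs, pred))
```

In practice, established libraries provide most of these metrics; writing them out once is mainly useful for checking that a reported value matches the definition a study claims to use.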
Fig. 5. Results of bibliometric analysis: (a) Annual scientific production of papers between 2000 and August 2023, (b) Visualization of the co-occurrence of authors' and indexed keywords, constructed using full counting, and (c) Visualization of the network of co-authors' countries of origin in the literature analyzed.
Table 2
Various studies using AI/ML/DL based models for site characterization and risk assessment.
Contaminant(s) under study | Model(s) used | Model accuracy parameters | Objective | Result | Reference
1,2,3-Trichloropropane (TCP) | CART, RF, and BRT | R², RMSE, and MAE | To predict the spatial distribution of TCP in the absence of groundwater monitoring, using influencing parameters such as historical land use, dissolved oxygen, and co-contaminant nitrate data, and to compare the performance of the models | BRT and RF were the better-performing models, with R² values of 0.44 and 0.41, respectively, compared to CART (R² = 0.020). RF was successfully used to predict the spatial distribution and to identify the parameters affecting TCP concentration, i.e., precipitation, dissolved oxygen, and nitrate concentration. | Hauptman et al. (2023)
DEHP | Bi-LSTM, kNN, and RF | Confusion matrix | To develop a novel DL-based model to predict the spatial variation of DEHP in the study area | The LSTM-based model proposed in the study performed better than the other two models in predicting the spatial variation of DEHP. | Zheng et al. (2022)
PFAS | RF and LR | Spearman's correlation coefficient and AUC | To build an ML model that accurately predicts PFAS contamination using features such as co-contaminant fingerprint, proximity to airports and military installations, and other surface and subsurface features, so that groundwater testing can be prioritized | The ML model predicted PFAS concentrations with high accuracy even with limited co-contaminant data. The model was projected to reduce the number of groundwater wells to be tested by 70% compared to a traditional random sampling approach. | George and Dixit (2021)
PAHs | SVR; GA for optimization | RMSE, MAE, and MAPE | To build an SVR model to predict PAHs using TPHs as descriptors, and to compare optimization techniques for selecting hyperparameters to improve model accuracy | The SVR model using the GA approach with Gaussian kernel functions yielded the most accurate predictions compared to the other optimization approaches in the study. The study suggests that TPH concentrations can serve as a reliable means to predict PAH concentrations with the developed model without compromising prediction accuracy. | Akinpelu et al. (2020)
Arsenic | RF, ERF, SVM, and MLP | Accuracy, precision, recall, F1 score, and Cohen's kappa coefficient | Mapping the risk level of soil As pollution using high-resolution aerial imaging (HRAI) and ML | ERF gave better predictions than the other three ML algorithms, with an average classification accuracy of 0.87. | Jia et al. (2021a)
Chromium | GRNN and MLP, with and without kriging | MAE, RMSE, and Spearman's rank correlation coefficient | To predict the concentration of chromium (Cr), which is abnormally distributed, using hybrid models consisting of ML models and residual kriging | Estimating ML model residuals by residual kriging helps smooth out abnormally high and low predictions and can be superior to purely ML-based models. | Tarasov et al. (2018)
Cr(VI) | XGBoost; kNN for imputation of missing data | R², MSE, and RMSE | To predict long-term groundwater contamination with pollutants such as Cr(VI) using an XGBoost model optimized with a Bayesian search cross-validation approach | The optimized XGBoost model can predict the contamination with good accuracy (R² = 0.99 during training, R² = 0.85 during testing). | Mazumdar et al. (2022)
Copper, manganese, and nickel | GRNN and MLP, with and without kriging | R², RMSE, Spearman's correlation coefficient, Willmott's index of agreement, and IQ | To predict the spatial distribution of heavy metal contamination using hybrid models constructed from ML models and geostatistical kriging | The hybrid approaches improved prediction accuracy compared to basic ML models and to the commonly used universal kriging for spatial correlation. | Sergeev et al. (2019)
Arsenic, copper, and lead | CACNN, CNN, ANN, and RFR; PCA for dimensionality reduction | R² and RMSE | Estimation of heavy metal contamination from VNIRS data using DL-based models | CACNN provided reasonably good estimates of all three elements, while ANN and RFR could not accurately estimate all three elements at the same time. | Pyo et al. (2020)
Iron, manganese, and zinc | MLP | Sum of squares error, RMSE, and relative error | Estimation and prediction of heavy metal levels using macro-element and altitude data at Mount Ida National Park | ANN can be effective in predicting contaminant concentrations from various known soil parameters. | Sari et al. (2022)
Heavy metals | DL with nearest neighbor neural network | RMSE, α, β, and TE rate | To build a DL-based spatial interpolation model using a residual network (ResNet) architecture and to compare it with traditional kriging-based spatial interpolation models | DL algorithms provide a robust alternative to kriging-based interpolation, offering higher accuracy for spatial interpolation of contaminant concentrations. | Man et al. (2021)
Heavy metals | RF | R², ME, MSE, and RMSE | To determine factor importance in predicting heavy metal distribution in coastal reclaimed soils | The RF model was successfully used to establish the importance of factors such as soil mineral composition, soil organic content, and chemical properties such as pH in determining the distribution dynamics of various heavy metals. | Zhang et al. (2021)
Heavy metals | ANN and BP-FFNN | R², MSE, MAE, and RMSE | To determine the best pollution-indexing approach for assessing groundwater pollution through ML- and DL-based approaches | The DL-based (BP-FFNN) approach was more appropriate than the ML-based (ANN) approach for determining the effectiveness of the pollution indices in assessing groundwater pollution. | Singha et al. (2020)
Heavy metals | MLR1, and RF with fuzzy c-means | R² and RMSE | To predict the spatial variation of heavy metal pollution, identify the key influencing factors, determine risk levels, and delineate risk zones | The RF model proved effective in assessing the spatial distribution of contamination and identifying crucial influencing factors. Moreover, fuzzy c-means was successfully applied to detect and outline risk zones, which proved valuable in visualizing and pinpointing areas of concern. | Chen et al. (2023)
Soil microplastics (MPs) | SVR-RBF, BPNN, RF, RBFN, LSTM, XGBoost, CART, RR, and LASSO regression | R², RMSE, and MAE | To assess and compare the accuracy and applicability of different ML models in predicting soil MP abundance, and to predict the spatial distribution of MPs with the most accurate model | The SVR-RBF model was the most accurate in predicting soil MP abundance. The RF-based ensemble model was the best at explaining the environmental factors affecting MP distribution. | Qiu et al. (2023)
Boron | BPNN, SVM, and LR | MAE and RMSE | To predict the concentration of geothermally derived boron in the study area using traditional ML models (SVM and linear regression) and a DL-based model | The DNN predicted boron concentrations in groundwater surrounding the geothermal wells in the study area with better accuracy than the other two traditional models. | Tut Haklidir and Haklidir (2020)
Nitrate | BRT, SVM, and MDA | AUC, Kappa, and MSE | To produce contamination risk maps for the study area using predictions from ML-based models | All three models, and an ensemble of the three weighted by performance, performed well in predicting groundwater contamination (AUC > 80%) along with risk levels. | Sajedi-Hosseini et al. (2018)
Fluoride | ELM, MLP, and SVM | R², RMSE, and MAE | To investigate the ability of ELMs with various activation functions to predict fluoride contamination in groundwater, compared to MLP and SVM | ELM models outperformed MLP and SVM models in predicting fluoride contamination of groundwater. ELM based on RBF gave the best results compared to linear, polynomial, and sigmoid kernel functions. ELM models were also computationally more efficient. | Barzegar et al. (2017)
Lead, magnesium, iron, zinc, and ammonia | MLR2 and RF | Confusion matrix, F1 score, accuracy, precision, and recall | To predict groundwater pollution and delineate the spatial distribution of risk zones using the ammonia index and other relevant variables | The RF model performed better than MLR, with an accuracy of 93%. The study demonstrated the effectiveness of ML models in predicting groundwater contamination using the ammonia index and other relevant variables. | Madani et al. (2022)
Radium | ANN and SVM | RMSE [b] | To develop an optimized detection system based on various detector-algorithm combinations to improve the accuracy of detecting 'hot particles' [a] | ML algorithms can improve detection limits compared to conventional count-rate algorithms by concentrating on changes in spectral shape. ANN in particular gave better results than SVM. | Varley et al. (2015)
Note: Abbreviations - DEHP - Di(2-ethylhexyl) phthalate, PFAS - Per- and polyfluoroalkyl substances, PAHs - Polycyclic Aromatic Hydrocarbons; CART - Classification and Regression Tree, RF - Random Forest, BRT - Boosted Regression Trees, LSTM - Long Short-Term Memory, Bi-LSTM - Bidirectional LSTM, kNN - k-Nearest Neighbor, LR - Linear Regression, SVR - Support Vector Regression, GA - Genetic Algorithm, SVM - Support Vector Machine, ERF - Extreme RF, MLP - Multi-Layer Perceptron, GRNN - Generalized Regression Neural Networks, CNN - Convolutional Neural Network, CACNN - Convolutional Autoencoder CNN, ANN - Artificial Neural Networks, RFR - Random Forest Regressor, PCA - Principal Component Analysis, BP-FFNN - Back-Propagated Feed-Forward Neural Network, MLR1 - Multiple Linear Regression, RBF - Radial Basis Function, SVR-RBF - SVR with RBF as kernel function, BPNN - Back-Propagated Neural Network, RBFN - RBF Network, XGBoost - Extreme Gradient Boosting, RR - Ridge Regression, LASSO - Least Absolute Shrinkage and Selection Operator, MDA - Multivariate Discriminant Analysis, ELM - Extreme Learning Machine, MLR2 - Multivariate Logistic Regression; VNIRS - Visible and Near-Infrared Spectroscopy; TPH - Total Petroleum Hydrocarbons.
[a] 'Hot particles': small, highly radioactive items.
[b] RMSE was used as a general-purpose metric to improve the prediction accuracy of the models. However, the actual field performance of the deployed detectors was evaluated using the following metrics: overall detection rate (ODR), maximum detection rate (MDR), and false alarm rate (FAR).
In summary, the integration of AI-based techniques and data analysis in contaminated site remediation can revolutionize the way environmental challenges are addressed. By harnessing data, AI-based technologies offer a more informed and effective approach to identifying contamination, understanding its distribution, and selecting the most appropriate remediation technique for each scenario with the help of appropriately built decision-support tools.
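As an illustration of the stakeholder weighting systems mentioned earlier in this section, the sketch below ranks candidate technologies with a simple additive (weighted-sum) score. This is only one of many multi-criteria schemes and is not claimed to be the method of any cited study; the technologies, criterion scores, and stakeholder weights are all hypothetical, and averaging the groups' weights is an assumption made for brevity.

```python
# Hypothetical criterion scores on a 0-1 scale (1 = most favourable).
# For "cost", a higher score means a cheaper option.
CRITERIA = ["efficiency", "cost", "sustainability", "resilience"]
SCORES = {
    "pump_and_treat": [0.6, 0.4, 0.3, 0.5],
    "bioremediation": [0.5, 0.8, 0.9, 0.6],
    "soil_washing": [0.9, 0.3, 0.4, 0.7],
}
# Hypothetical weights elicited from two stakeholder groups (each row sums to 1).
STAKEHOLDER_WEIGHTS = {
    "regulator": [0.4, 0.1, 0.3, 0.2],
    "community": [0.2, 0.2, 0.4, 0.2],
}


def aggregate_weights(weight_sets):
    """Average the weight vectors of all groups and renormalise them to sum to 1."""
    vectors = list(weight_sets.values())
    avg = [sum(col) / len(vectors) for col in zip(*vectors)]
    total = sum(avg)
    return [a / total for a in avg]


def rank_technologies(scores, weights):
    """Weighted-sum score per technology, sorted from best to worst."""
    totals = {
        tech: sum(w * s for w, s in zip(weights, values))
        for tech, values in scores.items()
    }
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


weights = aggregate_weights(STAKEHOLDER_WEIGHTS)
ranking = rank_technologies(SCORES, weights)
for tech, score in ranking:
    print(f"{tech:15s} {score:.3f}")
```

In an AI-based tool, the criterion scores themselves (for example, predicted remediation efficiency) would come from trained models rather than being entered by hand.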
7.4. Design and optimization of remediation technologies
Following the comprehensive characterization of the site and the
careful selection of the remediation technology to be implemented, a
pivotal next step is the efficient design of the chosen remediation
approach. The primary objective of this design phase is to enhance the
efficacy of the remediation process while minimizing both
short-term and long-term costs (Sharma and Reddy, 2004). This optimization
is achieved through a strategic balance that maximizes the
efficiency of the remediation while ensuring its successful implementation.
Various studies reporting the application of AI/ML/DL
models to design or optimize the selected remediation technique are presented
in Table 3. These techniques can be harnessed to design and
optimize various widely used remediation technologies (Wang et al.,
2022). While Table 3 offers insights into the current application of AI in
this context, the following paragraphs provide a concise overview of
how these technologies are currently used and their potential future
applications in the design and optimization of various remediation
methods.
Fig. 6. A general approach to prediction of soil contamination using limited chemical analyses and spectral imaging for site characterization.
Pumping-based remediation: Techniques such as pump, treat,
and inject, or surfactant-enhanced aquifer remediation, are usually
employed to remediate commonly occurring groundwater contaminants
such as petroleum hydrocarbons, DNAPLs, and other water-soluble
chemicals (Sharma and Reddy, 2004). Designing such systems
optimally requires numerous simulations, which
can become computationally expensive. For this reason, ML/DL-based
surrogates, which have proven reasonably accurate, are used to reduce the
computational burden of such simulation-optimization problems
(Sreekanth and Datta, 2011; Luo et al., 2013; Luo and Lu, 2014; Chu and
Lu, 2015; Hou et al., 2015, 2016; Ouyang et al., 2017a, 2017b). In
addition, algorithms such as GA and particle swarm optimization (PSO) can be used
to optimize parameters such as the pumping schedule and remediation
duration. A general approach to applying AI-based techniques to
simulation-optimization problems, as shown in Fig. 8, consists of two
main steps: first, building an appropriate surrogate model, and
second, selecting and implementing the optimization algorithm
that best fits the purpose. Ensemble learning, as
shown in Fig. 8 and described earlier, can be employed as necessary
to improve the prediction accuracy and computational efficiency of
surrogate models.
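The two-step workflow described above (fit a surrogate, then optimize on it) can be sketched as follows. Everything here is an assumption for illustration: `expensive_simulator` is a hypothetical closed-form stand-in for a flow-and-transport model run, the two-well setup, costs, and clean-up target are invented, the surrogate is a quadratic response surface (one of the surrogate types in Table 3) rather than an ANN, and the GA is a minimal real-coded variant with a static constraint penalty.

```python
import numpy as np

TARGET = 20.0                    # hypothetical clean-up target (mg/L)
COST = np.array([1.0, 1.2])      # hypothetical cost per unit pumping rate, wells 1 and 2


def expensive_simulator(q):
    """Hypothetical stand-in for a numerical model run (e.g., MODFLOW/MT3DMS):
    residual concentration after pumping at rates q = (q1, q2)."""
    q1, q2 = q
    return 100.0 * np.exp(-0.02 * q1 - 0.03 * q2 + 1e-4 * q1 * q2)


def features(q):
    """Full quadratic basis in (q1, q2)."""
    q = np.atleast_2d(q)
    q1, q2 = q[:, 0], q[:, 1]
    return np.column_stack([np.ones_like(q1), q1, q2, q1 ** 2, q2 ** 2, q1 * q2])


def fit_surrogate(n_runs, rng):
    """Step 1: run the 'simulator' a limited number of times and fit a
    quadratic response surface to the log of the residual concentration."""
    samples = rng.uniform(0.0, 100.0, size=(n_runs, 2))
    log_c = np.log([expensive_simulator(s) for s in samples])
    coef, *_ = np.linalg.lstsq(features(samples), log_c, rcond=None)
    return lambda q: np.exp(features(q) @ coef)


def genetic_algorithm(surrogate, rng, pop_size=40, generations=60, penalty=1000.0):
    """Step 2: minimise pumping cost subject to surrogate residual <= TARGET."""
    def fitness(pop):
        excess = np.maximum(0.0, surrogate(pop) - TARGET)
        return pop @ COST + penalty * excess

    pop = rng.uniform(0.0, 100.0, size=(pop_size, 2))
    for _ in range(generations):
        fit = fitness(pop)
        elite = pop[np.argmin(fit)].copy()
        # Binary tournament selection.
        i, j = rng.integers(0, pop_size, size=(2, pop_size))
        parents = np.where((fit[i] < fit[j])[:, None], pop[i], pop[j])
        # Arithmetic (blend) crossover with a randomly shuffled mate.
        alpha = rng.uniform(size=(pop_size, 1))
        children = alpha * parents + (1 - alpha) * parents[rng.permutation(pop_size)]
        # Gaussian mutation, clipped to the allowed pumping range.
        children += rng.normal(0.0, 2.0, size=children.shape)
        pop = np.clip(children, 0.0, 100.0)
        pop[0] = elite  # elitism: keep the best solution found so far
    return pop[np.argmin(fitness(pop))]


rng = np.random.default_rng(0)
surrogate = fit_surrogate(n_runs=15, rng=rng)
best_q = genetic_algorithm(surrogate, rng)
print("pumping rates:", best_q.round(1),
      "cost:", round(float(best_q @ COST), 1),
      "simulated residual:", round(float(expensive_simulator(best_q)), 2))
```

The design point is that the optimizer evaluates thousands of candidate schedules on the cheap surrogate, while the expensive model is run only 15 times; in a real study, the final candidate would be re-checked with the full simulator.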
Immobilization or containment of contaminants: In-situ remediation
methods such as immobilization or containment of contaminants require
specific materials, such as nanoscale zero-valent
iron (Manfron et al., 2020) or biochar (Kamdar et al., 2023; Zhang et al.,
2022). These materials should possess desirable attributes, including
high surface area, enhanced reactivity, and limited biodegradability,
to achieve the desired remediation outcomes. AI-based techniques can
be used to determine the suitability of such materials and to predict the remediation
efficiency, along with the factors influencing it.
Phytoremediation: The uptake of hazardous contaminants by plants,
leading to a reduction in contaminant concentration, is termed
phytoremediation (Reddy et al., 2020; Reddy and Amaya-Santos, 2017).
Plants often do not survive in the presence of hazardous pollutants
such as heavy metals, or they need soil amendments with materials such as
sewage sludge, biochar, or compost for optimal performance. In this case,
AI-based techniques can be used to predict suitable plant species
for a contaminant type and to determine optimal
amendment conditions for effective phytoremediation from limited
experimental results.
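Determining an optimal amendment condition from a handful of pot experiments amounts to fitting a response curve and locating its maximum. The sketch below uses a one-factor quadratic fit, i.e., the response-surface-style baseline that ANN models have been compared against (Table 3), not an ANN itself. The dose and uptake values are hypothetical.

```python
import numpy as np

# Hypothetical pot-experiment results: sewage sludge dose (% w/w) vs. Cd uptake (mg/kg shoot).
dose = np.array([0.0, 2.5, 5.0, 7.5, 10.0])
uptake = np.array([1.20, 1.91, 2.25, 2.21, 1.80])

# Fit uptake ~ a*dose**2 + b*dose + c (coefficients returned highest power first).
a, b, c = np.polyfit(dose, uptake, deg=2)
if a >= 0:
    raise ValueError("Fitted curve is not concave, so there is no interior optimum")

# Vertex of the parabola, kept inside the experimentally tested range.
best_dose = float(np.clip(-b / (2 * a), dose.min(), dose.max()))
predicted = float(np.polyval([a, b, c], best_dose))
print(f"suggested dose ~ {best_dose:.1f}%, predicted uptake ~ {predicted:.2f} mg/kg")
```

Clipping to the tested range reflects a general caution: data-driven models, whether quadratic or neural, should not be trusted to extrapolate beyond the experimental conditions they were trained on.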
Bioremediation: Enhancing the in-situ degradation of organic contaminants
by pumping in the necessary nutrients or microbes and stimulating
favorable conditions is termed bioremediation (Decesaro et al., 2017).
AI can be harnessed to optimize bioremediation by identifying favorable
reaction conditions from available experimental data or data
from past remediation projects (Jalali et al., 2023; Stef et al., 2022;
Mohammadi et al., 2021).
Natural attenuation: The natural degradation of organic compounds
without human intervention is termed natural attenuation (Sharma
and Reddy, 2004). In this case, AI-based techniques can be employed to
determine whether the in-situ conditions are suitable for natural attenuation
and to predict the duration of remediation.
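Predicting the duration of natural attenuation is commonly anchored to a first-order decay model, $C(t) = C_0 e^{-kt}$, which gives the time to reach a target concentration as $t = \ln(C_0 / C_{target}) / k$. The sketch below fits $k$ by log-linear least squares. It is a conventional regression baseline rather than an AI model; a data-driven approach would instead predict $k$ (or the duration directly) from site covariates. The monitoring record and the 5 µg/L target are hypothetical.

```python
import math

# Hypothetical monitoring record at one well: time (years) and concentration (ug/L).
t = [0.0, 1.0, 2.0, 3.0, 4.0]
conc = [400.0, 300.0, 225.0, 170.0, 127.0]


def fit_first_order(t, c):
    """Least-squares fit of ln C = ln C0 - k t; returns (C0, k)."""
    n = len(t)
    y = [math.log(v) for v in c]
    t_mean, y_mean = sum(t) / n, sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), -slope


def time_to_target(c0, k, target):
    """Years needed for C0 to decay to the target under first-order kinetics."""
    return math.log(c0 / target) / k


c0, k = fit_first_order(t, conc)
years = time_to_target(c0, k, target=5.0)
print(f"k = {k:.3f} 1/yr, estimated time to target = {years:.1f} years")
```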
Electro-kinetic enhanced bioremediation: Bio-based remediation
techniques such as phytoremediation and microbial degradation
(bioremediation) can be less effective, and in many cases
such remediation alone is not sufficient to remove the contaminants
completely. Hence, electro-kinetic remediation, in which an applied electric
field drives the movement of charged contaminants or additives, can
be used to enhance the remedial efficiency of bioremediation techniques
(Cameselle and Reddy, 2022). However, because of the various physical,
chemical, and biological processes involved, it is difficult to predict the
performance of such techniques. Coupled process-based fate and transport
models, which are computationally intensive, are usually employed to predict
the performance of electro-kinetic remediation.
AI-based simulation-optimization, as described earlier under Pumping-based
remediation, can be used in this case to optimize the design and
operation of electro-kinetic remediation.
Soil washing: Soil washing is one of the oldest ex-situ remediation
technologies, in which contaminated soil is excavated and washed with
aqueous chemical solutions to leach the contaminants out of the soil.
Selecting an appropriate washing agent to remove multiple contaminants
is a challenge (Sharma and Reddy, 2004). AI can be beneficial in
making this selection (Zhang et al., 2022).
Past studies have also employed models such as k-nearest
neighbors (kNN) to predict the Cd-ion removal efficiency of soil
washing (Muazu and Olatunji, 2023).
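The kNN idea mentioned above is simple enough to show in full: a new washing condition is assigned the average removal efficiency of the k most similar past runs. The sketch below is a minimal kNN regressor with min-max feature scaling; the three input variables and all run data are hypothetical and are not taken from Muazu and Olatunji (2023).

```python
import math

# Hypothetical soil-washing runs:
# (washing-agent concentration mol/L, pH, contact time h) -> Cd removal (%).
RUNS = [
    ((0.05, 3.0, 1.0), 42.0),
    ((0.10, 3.0, 2.0), 58.0),
    ((0.10, 5.0, 2.0), 47.0),
    ((0.20, 3.0, 4.0), 74.0),
    ((0.20, 5.0, 4.0), 63.0),
    ((0.30, 4.0, 6.0), 80.0),
]


def knn_predict(train, query, k=3):
    """Average target of the k nearest training runs in min-max scaled feature space."""
    feats = [x for x, _ in train]
    lo = [min(col) for col in zip(*feats)]
    hi = [max(col) for col in zip(*feats)]

    def scale(x):
        # Scaling keeps variables with large units (e.g., hours) from dominating distances.
        return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(x, lo, hi)]

    q = scale(query)
    neighbours = sorted((math.dist(scale(x), q), y) for x, y in train)[:k]
    return sum(y for _, y in neighbours) / len(neighbours)


prediction = knn_predict(RUNS, (0.15, 3.0, 3.0), k=3)
print(f"predicted Cd removal ~ {prediction:.1f}%")
```

Because kNN has no training step, new washing experiments can be appended to the run list and used immediately; the trade-off is that predictions are only as good as the coverage of past runs near the query condition.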
Although AI/ML/DL is already widely used in the design and optimization of various technologies, its integration into remediation endeavors can be
further developed to employ more efficient DL models and the latest optimization
techniques. Also, as far as the authors are aware, the integration of AI/ML/DL into other
commonly used soil and groundwater remediation technologies, such as
vitrification, soil vapor extraction, and soil fracturing, has not yet been
investigated.
Fig. 7. A theoretical approach to building a decision-support tool for selecting a remediation technology.
Table 3
Various studies demonstrating the use of AI/ML/DL based models for remediation design and optimization.
Remediation technology optimized | Remediation of | Model(s) used | Model performance metrics | Objective | Dataset | Outcomes | Reference
Pump, treat, and inject (PTI) | TDS in groundwater | SFL and ANN models as surrogates and GA for optimization | R² and RMSE | To optimize the pumping schedule in order to reduce TDS concentrations while minimizing costs | MODFLOW-2000 and MT3DMS | Using SFL instead of ANNs does not significantly affect the pumping schedule but significantly reduces model runtime. Such AI-based optimization driven by ML-based surrogates can be effective in optimizing remediation strategies. | Sadeghfam et al. (2019)
Pump and treat | Groundwater contaminated with CCl4 | 3-D CNN | Precision, accuracy, sensitivity, and specificity | To predict future well performance based on past performance data and numerical simulations | Field data and simulation data from a custom-made fate and transport model | (1) DL models can be effective in predicting contaminant plume distribution to aid the decision-making process. (2) The 3-D CNN architecture is flexible and can be readily extended to include numerous variables. | Song et al. (2023)
SEAR | DNAPL-contaminated aquifer | PRS, RBFN, SVR, GP, and kriging | R² and RMSE | (1) To create an efficient ensemble surrogate model from five different surrogate models and verify its accuracy. (2) To verify the application of adaptive sequential sampling in the optimization of the remedial strategy | Simulations using the UTCHEM model | (1) An ensemble surrogate can improve accuracy compared to stand-alone surrogate models if the surrogates are chosen properly. (2) Adaptive sequential sampling can improve the reliability of the results obtained. | Ouyang et al. (2017c)
SEAR | DNAPL-contaminated aquifer | RBFN and SVR, with GA to solve the optimization problem | R², absolute errors, and relative errors (mean and maximum) | Use of set pair analysis to build an ensemble surrogate model to optimize the remediation strategy | Simulations using the UTCHEM model | Set pair analysis can be an effective method for improving ensemble surrogate model selection, building, and accuracy. | Hou et al. (2017)
SEAR | DNAPL-contaminated aquifer | 3-D CNN | R² and SSIM | To employ a CNN model to accurately predict DNAPL removal rates considering aquifer heterogeneity, and to identify risk zones after remediation to aid decision-making | Simulations using the UTCHEM model | The proposed optimization strategy, using multiple realizations of k values and source-zone architecture, can accurately identify the optimal solutions with a 99.8% speed-up compared to conventional simulation-optimization. The approach also allows risk zones to be delineated based on the NAPL left in the aquifer after remediation. | Du et al. (2022)
In-situ bioremediation | Petroleum-contaminated site | Fuzzy-based optimization | - | To develop a fuzzy rule-based predictive control system to optimize the in-situ bioremediation process, and to demonstrate it using a case study | Simulations using the UTCHEM model | The developed fuzzy-based predictive control system allows online, real-time, cost-effective, and optimized control of the in-situ bioremediation process during the entire cleanup. Its primary advantage is its real-time handling of uncertainties in the simulation model. The case study highlights the crucial role of simulation model accuracy in developing an effective control system. | Hu and Chan (2015)
In-situ bioremediation | Chlorinated ethenes | CART | AUC | To develop and demonstrate the value of a data-mining approach in identifying the most promising in-situ remediation strategy, and to identify the parameters affecting in-situ reductive dechlorination potential | Groundwater monitoring well data | The representative CART model effectively classified the 3-month-ahead dechlorination potential, with true positive rates of 75.8% and 69.5% for the training and test sets, respectively. The study demonstrated the use of data mining to determine the factors influencing in-situ dechlorination potential. | Lee et al. (2016)
Electro-kinetic enhanced bioremediation | Chlorinated solvents in low-permeability porous media | ANN | R² | To build a surrogate for a process-based numerical model simulating electro-kinetic enhanced bioremediation, and to perform sensitivity and uncertainty analyses | Simulation results from the process-based numerical model | The surrogate model exhibited robust performance (R² > 0.99) in predicting the relative area of distribution and relative mass, and was then successfully used to perform sensitivity and uncertainty analyses. | Sprocati and Rolle (2021)
Phytoremediation | PAH-contaminated soil | BP-FFNN | R² and RMSE | To predict the ideal conditions for maximum uptake of PAHs by Melilotus alba | Experimental dataset obtained from pot experiments | (1) The ANN model accurately predicted PAH levels in plant roots using several soil properties. (2) Based on the ANN output, different soil amendments were recommended for different soil pH conditions. | Olawoyin (2016)
Phytoremediation | Cadmium-contaminated soil | BP-FFNN | R², MSE, MAE, and correlation coefficient | To predict changes in the cadmium uptake ability of Sinapis alba L. with sewage sludge modification | Experimental dataset obtained from pot experiments | The ANN-based predictive model had significantly higher accuracy than response surface methodology in predicting the cadmium removal rate. Such models can be used to determine the percentage of sewage sludge modification for optimal remediation. | Jaskulak et al. (2020)
Phytoremediation | Soil contaminated with heavy metals | XGBoost | R², RMSE, F1 score, precision, recall, and accuracy | To predict the factors affecting phytoextraction of heavy metals by hyperaccumulators using an ML-based model | Data from literature | Output parameters such as HM concentration in shoots, shoot yield, bioconcentration factor, metal extraction ratio, and remediation time were accurately predicted from input parameters comprising soil, HM, and plant properties. | Shi et al. (2023)
Monitored natural attenuation | PAH-contaminated soil | RF and LDA | Accuracy | To predict PAH degradation in soil using both models, and to assess the importance of different factors in PAH degradation | Experimental mesocosm trials | The RF model showed better accuracy than the LDA model in predicting the degradation of the investigated PAHs. The correlations between variables and degradation obtained using RF helped in understanding the factors affecting degradation. | Picariello et al. (2022)
ZVI-based permeable reactive barriers | Chlorinated organic compounds | XGBoost | RMSE, MAE, and MAPE | To use an ML-based model to select the optimal iron-based reactive materials for use in permeable reactive barriers | Data from literature | An XGBoost model was developed with descriptors such as the particle size, surface area, and pore size of ZVI-based materials, and the reaction conditions, to predict the kinetic reaction constant of ZVI-based materials. Results indicated that specific surface area is one of the most important factors in determining the reaction rate. | Ren et al. (2023)
In-situ immobilization using biochar | Heavy metal-contaminated soil | SVR, RF, and ANN | R² and RMSE | To predict the heavy metal immobilization efficiency of biochar-amended soil from data on the properties of the soil, biochar, and heavy metal that can affect immobilization efficiency | Data from literature | The RF-based model was more accurate than the other two models in predicting immobilization efficiency. An updated RF model with redundant variables removed was even more accurate. A GUI developed from the updated RF model performed reasonably well, with error < 30% even on data outside the training dataset. | Palansooriya et al. (2022)
In-situ immobilization using biochar | Heavy metal-contaminated soil | BP-FFNN and RF | R² and RMSE | To predict the remediation efficiency of five heavy metals from biochar and soil characteristics, incubation conditions, and initial HM conditions | Data from literature | Both ANN and RF models showed excellent performance, with RF being more tolerant of missing data. Analysis of influential factors revealed that the type of heavy metal, biochar pH, dosage, and remediation time were crucial for the remediation process. | Sun et al. (2022)
In-situ immobilization using nanoscale ZVI | Arsenic-contaminated soil | ANN | Correlation coefficient | To predict the arsenic immobilization efficiency of nanoscale zero-valent iron particles | Experimental data | Results indicated a good correlation between the input parameters and output efficiency using ANN. Further experimental results are needed to simulate diverse conditions in future studies. | Han et al. (2021)
Note: Abbreviations - SEAR - Surfactant-Enhanced Aquifer Remediation, ZVI - Zero-Valent Iron; TDS - Total Dissolved Solids, DNAPL - Dense Non-Aqueous Phase Liquids, PAH - Polycyclic Aromatic Hydrocarbons; SFL - Sugeno Fuzzy Logic, ANN - Artificial Neural Networks, GA - Genetic Algorithm, CNN - Convolutional Neural Networks, 3D CNN - Three-Dimensional CNN, PRS - Polynomial Response Surface, RBFN - Radial Basis Function Networks, SVR - Support Vector Regression, CART - Classification and Regression Tree, BP-FFNN - Back-Propagated Feed-Forward Neural Network, XGBoost - Extreme Gradient Boosting, RF - Random Forest, LDA - Linear Discriminant Analysis; Mean Error = (ΣError)/n, SSIM - Structural Similarity Index Metric, which quantifies the structural similarity between two 2-D images; HM - Heavy Metal; UTCHEM - University of Texas Chemical Compositional Simulator; GUI - Graphical User Interface.
7.5. Contaminant source identification, apportionment and monitoring
Source identification and monitoring of source zones are crucial for
optimizing remedial efforts and avoiding further contamination of
groundwater. A few studies have performed source identification of
contaminated aquifers using various types of surrogate models (Rao,
2006; Srivastava and Singh, 2014, 2015; Zhao et al., 2016; Hou and Lu,
2018; Xing et al., 2019; Kang et al., 2021). However, the use of AI-based
techniques for monitoring contaminants is relatively less
explored. As represented in Table 4, monitoring of source zones or
contaminated areas can be performed using various data types, such as
simulation data, remote sensing or aerial imaging data, or spectral imaging
data. Meray et al. (2022) developed PyLEnM, a machine-learning-based
framework for long-term contamination monitoring strategies. Such frameworks can
contribute to the efficient supervision of contamination sources by
determining the optimum number of monitoring wells needed and
minimizing the use of other resources, thereby lowering monitoring costs.
Various types of studies were found in the field of source identification
or apportionment and monitoring, and a brief overview of them is
presented in Table 4. As Table 4 shows, increased contamination in the
soil matrix can be apportioned to various sources based on geographical
data on enterprises, irrigation history, and other urbanization factors
with the help of ML-based classification and clustering models. However,
further research on source identification or apportionment and
monitoring is needed, as the risk of recontamination always exists if the
sources of contamination are not properly managed.
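Clustering-based source apportionment groups samples with similar contaminant "fingerprints", so that each cluster can be interpreted as a candidate source. The sketch below runs k-means (Lloyd's algorithm with a deterministic farthest-point initialization) on log-transformed, standardized concentrations. The three source signatures and all sample data are synthetic assumptions; real apportionment studies would also bring in the geographical and land-use covariates described above.

```python
import numpy as np


def farthest_point_init(X, k):
    """Deterministic seeding: start from the first sample, then repeatedly add
    the sample farthest from all centres chosen so far."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :], axis=2).min(axis=1)
        centers.append(X[dist.argmax()])
    return np.array(centers)


def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm: alternate nearest-centre assignment and centre update."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


rng = np.random.default_rng(1)
# Hypothetical source signatures (mg/kg): Cd, Pb, Cu, Zn.
signatures = np.array([
    [2.0, 40.0, 30.0, 250.0],   # "industrial" (hypothetical)
    [0.3, 120.0, 25.0, 90.0],   # "traffic" (hypothetical)
    [0.8, 30.0, 80.0, 110.0],   # "agricultural" (hypothetical)
])
X_raw = np.vstack([sig * rng.lognormal(0.0, 0.1, size=(30, 4)) for sig in signatures])
true_group = np.repeat([0, 1, 2], 30)

# Log-transform and standardise so that each element contributes comparably.
log_x = np.log(X_raw)
X = (log_x - log_x.mean(axis=0)) / log_x.std(axis=0)

labels, _ = kmeans(X, k=3)
for j in range(3):
    print(f"cluster {j}: mean profile (mg/kg) {X_raw[labels == j].mean(axis=0).round(1)}")
```

The cluster mean profiles, reported in original units, are what an analyst would inspect to attach a plausible source label to each group.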
8. Limitations of AI-based tools
8.1. Overfitting
Overfitting occurs when a model performs well on the training data but poorly on the testing data, meaning that it fails to generalize to unseen data. This can happen because of high variance and low bias, excessive model complexity, or an inadequate training data size. To reduce overfitting, techniques such as increasing the training data size, reducing model complexity, and shrinkage through regularization (Ridge and Lasso) can be used (James et al., 2023; Jabbar and Khan, 2015). In neural networks, dropout can also be used to tackle overfitting.
8.2. Under tting
An undert model results in high prediction errors for both training
and test data. The reasons for undertting include high bias and low
variance, the model being too simple, the size of the training data being
insufcient, and the training data not being cleaned and containing
noise. Techniques to reduce undertting include increasing model
complexity, increasing the number of features by performing feature
engineering, removing noise from the data, and increasing the number
of epochs or the duration of training to get better results (Jabbar and
Khan, 2015).
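Two of the remedies named above, a more expressive model (here obtained through an engineered x² feature) and longer training, can be illustrated with a minimal plain-Python sketch. A linear regression trained by full-batch gradient descent stands in for the neural networks the text has in mind. The data, learning rate, epoch counts, and helper names (`gradient_descent`, `mse`) are hypothetical choices.

```python
def gradient_descent(features, y, epochs, lr=0.1):
    """Full-batch gradient descent on the mean squared error, from zero weights."""
    n, p = len(y), len(features[0])
    w = [0.0] * p
    for _ in range(epochs):
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) - yi
                     for row, yi in zip(features, y)]
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(residuals, features))
                for j in range(p)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def mse(features, y, w):
    preds = [sum(wj * xj for wj, xj in zip(w, row)) for row in features]
    return sum((pi - yi) ** 2 for pi, yi in zip(preds, y)) / len(y)


# Hypothetical noise-free data with a curved (quadratic) trend on [0, 1]
x = [i / 10 for i in range(11)]
y = [xi ** 2 for xi in x]
linear = [[1.0, xi] for xi in x]              # too simple for a curved trend
quadratic = [[1.0, xi, xi ** 2] for xi in x]  # engineered x^2 feature

for label, feats, epochs in [("linear, 5 epochs", linear, 5),
                             ("linear, 10000 epochs", linear, 10000),
                             ("quadratic, 10000 epochs", quadratic, 10000)]:
    w = gradient_descent(feats, y, epochs)
    print(f"{label}: training MSE = {mse(feats, y, w):.5f}")
```

Stopping after a few epochs leaves the training error high, which is underfitting caused by insufficient training. Once training has converged, the linear model's error stops falling because a straight line cannot represent the curve, whereas the added feature removes that limitation.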
8.3. Data unavailability
One of the main requirements for data-driven models is an abundance of data from which the relationships among variables can be learned accurately. However, data on contaminated site remediation are poorly organized and are generally not available in open-source databases, and incomplete data can lead to less accurate predictions. Therefore, to take full advantage of the predictive capabilities of ML-based models, efforts should be made to enhance data collection and sharing through collaborations and open-data initiatives. Further research and development of ML-based models and algorithms, and their use in remediation activities, should be encouraged, and these models should be trained on larger and more diverse datasets. Moreover, integrating real-time monitoring and sensor technologies can enable timely identification of contamination and prompt remediation actions. Additionally, continuous evaluation and adaptation of remediation strategies, international collaboration, and knowledge exchange are crucial for advancing the field. By implementing these recommendations, contamination patterns can be better understood, and remediation techniques can be effectively designed and implemented, leading to improved environmental outcomes.
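One practical response to incomplete data is to quantify, before any model is trained, how complete each input field is and how many records are complete cases. The following minimal plain-Python sketch illustrates this. The field names, which loosely echo influential factors such as pH, dosage, and remediation time reported in Table 3, and all values are hypothetical, and `None` marks a missing value.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def completeness(records: List[Record], fields: List[str]) -> Dict[str, float]:
    """Fraction of records holding a non-missing value for each field."""
    if not records:
        return {f: 0.0 for f in fields}
    return {f: sum(r.get(f) is not None for r in records) / len(records)
            for f in fields}


def complete_cases(records: List[Record], fields: List[str]) -> List[Record]:
    """Records usable by models that cannot handle missing inputs."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


# Hypothetical records compiled from several remediation case reports
fields = ["soil_pH", "initial_conc_mg_kg", "dosage_pct", "duration_days",
          "efficiency_pct"]
records = [
    {"soil_pH": 6.8, "initial_conc_mg_kg": 120.0, "dosage_pct": 2.0,
     "duration_days": 30, "efficiency_pct": 55.0},
    {"soil_pH": None, "initial_conc_mg_kg": 85.0, "dosage_pct": 5.0,
     "duration_days": 60, "efficiency_pct": 71.0},
    {"soil_pH": 7.4, "initial_conc_mg_kg": None, "dosage_pct": 1.0,
     "duration_days": None, "efficiency_pct": 40.0},
    {"soil_pH": 5.9, "initial_conc_mg_kg": 300.0, "dosage_pct": 3.0,
     "duration_days": 90, "efficiency_pct": 63.0},
]
for field, share in completeness(records, fields).items():
    print(f"{field}: {share:.0%} complete")
print(f"complete cases: {len(complete_cases(records, fields))} of {len(records)}")
```

Fields with low completeness indicate where data collection or sharing would most improve a training set. Dropping incomplete records, as `complete_cases` does, is only one option. Table 3 notes, for example, that RF was more tolerant of missing data than ANN.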
Fig. 8. A general approach to performing AI-based simulation-optimization for contaminated site remediation modeling.

Table 4
Various studies reporting applications of AI/ML/DL for source identification, source apportionment, and monitoring.

| Approach | Model used | Case | Model accuracy parameters | Objective | Outcome(s) | Reference |
|---|---|---|---|---|---|---|
| Source identification: inverse simulations for plume development modeling | Surrogate: KELM; Optimization: PSO, GA, QPSO, QGA | Hypothetical and actual | R² and RE | To build an accurate surrogate model and optimization algorithms that work for both hypothetical (simulation-based) and actual (monitoring data) cases | (1) KELM could accurately predict the simulation results (R² of 0.9990) and reasonably predict the actual data (R² > 0.99 in 9 of 15 wells). (2) QGA and QPSO had better source identification accuracy in the hypothetical case, while all four algorithms performed similarly in the actual case. | Zhao et al. (2020) |
| | Surrogates: LSTM, Kriging, KELM, RBFN; Optimization: GA | Hypothetical | R², RE, and RMSE | To employ a DL-based model (LSTM) as a surrogate for a traditional physics-based simulation model and compare its performance with that of other surrogate models | LSTM had the highest accuracy, followed by RBFN, KELM, and kriging, whereas the total training time was lowest for kriging, followed by KELM, RBFN, and LSTM. Compared with the traditional simulation-optimization approach, all four surrogate models saved >99% of the computational time. | Li et al. (2021) |
| Spatial distribution prediction and corresponding land use, soil, and water feature extraction for source apportionment | RF, SVM, and BP-FFNN | Actual | R², MAE, MSE, and RMSE | To employ ML-based models to identify the key factors causing heavy metal pollution in groundwater | Outcomes suggested that rapid urbanization is one of the leading causes of increased heavy metal concentrations in the subsurface. Prediction accuracy varied by contaminant and model; overall, RF and SVM performed better than the ANN (BP-FFNN). | Zhang et al. (2020) |
| | RF and NB for classification; BLMI for source apportionment | Actual | Precision, Recall, and F1 score | To use the proposed combination of models to predict the spatial variation of heavy metals, identify affecting factors, and correlate them using spatial clustering | The NB classifier was effectively used to identify 250 contaminating enterprises and 25 contributing medium industry types in the study area. The RF model was effective in predicting contaminant concentrations and in quantitatively determining the responsible factors. BLMI successfully clustered the risk zones using the contaminant risk levels and contributing factors. | Huang et al. (2022) |
| | Classification: NB, ANN, and SVM; source apportionment: BLMI | Actual | Accuracy and Kappa coefficient | To classify enterprise types using ML models and then perform source apportionment using the BLMI technique | NB performed slightly better in classifying the enterprises. Source apportionment using BLMI indicated that increased Cd concentrations were mainly caused by excessive fertilization and by coal mining and metal industries, while chemical industries were the main source of Hg pollution. | Jia et al. (2019) |
| Monitoring of source zone during remediation | CVAE-EnKF | Hypothetical | NRMSE | To determine the variation in DNAPL saturation at the source zone with changing source zone architecture during remediation using a DL-based model (autoencoder) | The proposed CVAE-EnKF framework offers a physically based prior distribution for DNAPL saturations, improving estimates of the evolving DNAPL source zone architecture and associated remediation metrics throughout the remediation process. CVAE-EnKF therefore shows promise as a high-resolution method for monitoring DNAPL remediation. | Kang et al. (2021) |
| Monitoring contaminated area using remote sensing | RT, SVM, and RF | Actual | MAE, RAE, Precision, (continued) | To monitor contamination due to oil spills in the study area using remote sensing data | Spectral indices obtained from remote sensing data can be a great way to differentiate between (continued) | Kaplan et al. (2022) |

(Table 4 is continued below.)

9. Concluding remarks
The primary aim of this investigation was to perform an exhaustive literature review encompassing the existing research landscape concerning the utilization of AI/ML/DL across diverse dimensions of contaminated site remediation. In its initial phase, the study provided a
succinct introduction to distinct categories of AI/ML/DL models while introducing an array of metrics employed to assess model performance. Subsequently, a thorough bibliometric analysis was conducted to examine the prevalence and deployment of AI/ML/DL within the domain of contaminated site remediation. The findings indicated an increasing prevalence of research related to AI/ML/DL in site remediation over the past five years, with more research dedicated to employing these techniques in groundwater remediation than in soil remediation. Furthermore, a thorough review of the literature was conducted concerning the use of these technologies at different stages of site remediation, including site characterization, risk assessment, decision support tools (DST), remediation design and optimization, and source identification and monitoring. This review showed that a significant portion of research has focused on specific types of problems, such as ML/DL-based models in simulation-optimization problems for contaminated aquifers, DSTs for petroleum-contaminated sites, ML/DL-based models for spatial interpolation of contaminant concentration, and groundwater contaminant source identification. However, studies on AI/ML/DL applications for optimizing diverse remediation technologies and for post-remediation monitoring were relatively scarce, possibly because of limited data availability. Nevertheless, the studies found in these areas demonstrated the considerable potential of AI/ML/DL-based tools in remediation optimization and monitoring. To fully harness the advantages of these cutting-edge computing tools in site remediation, future research should prioritize building robust databases encompassing diverse remediation technologies and monitoring statistics, so that the remediation process can be significantly enhanced through the integration of AI/ML/DL. With an emphasis on data availability and innovative research approaches, the field of contaminated site remediation can continue to progress and benefit from these emerging technologies.
CRediT author statement
Jagadeesh Kumar Janga: Conceptualization, Methodology, Investigation, Resources, Writing - Original Draft; Krishna R. Reddy: Conceptualization, Methodology, Investigation, Resources, Supervision, Project administration, Funding acquisition, Writing - Review & Editing; KVNS Raviteja: Conceptualization, Methodology, Investigation, Resources, Writing - Review & Editing.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability
No data was used for the research described in the article.
References
Aggarwal, C.C., 2018. Neural Networks and Deep Learning. Springer, Cham.
Akinpelu, A.A., Ali, M.E., Owolabi, T.O., Johan, M.R., Saidur, R., Olatunji, S.O., Chowdbury, Z., 2020. A support vector regression model for the prediction of total polyaromatic hydrocarbons in soil: an artificial intelligent system for mapping environmental pollution. Neural Comput. Appl. 32, 14899–14908.
Alloghani, M., Al-Jumeily, D., Mustafina, J., Hussain, A., Aljaaf, A.J., 2020. A systematic review on supervised and unsupervised machine learning algorithms for data science. In: Supervised and Unsupervised Learning for Data Science, pp. 3–21.
Asher, M.J., Croke, B.F., Jakeman, A.J., Peeters, L.J., 2015. A review of surrogate models and their application to groundwater modeling. Water Resour. Res. 51 (8), 5957–5973.
Auer, P., Burgsteiner, H., Maass, W., 2008. A learning rule for very simple universal approximators consisting of a single layer of perceptrons. Neural Network. 21 (5), 786–795.
Baecher, G.B., 2023. 2021 Terzaghi lecture: geotechnical systems, uncertainty, and risk. J. Geotech. Geoenviron. Eng. 149 (3), 03023001.
Balasubramaniam, A., Boyle, A.R., Voulvoulis, N., 2007. Improving petroleum contaminated land remediation decision-making through the MCA weighting process. Chemosphere 66 (5), 791–798.
Bank, D., Koenigstein, N., Giryes, R., 2023. Autoencoders. Machine Learning for Data Science Handbook: Data Mining and Knowledge Discovery Handbook, pp. 353–374.
Barzegar, R., Asghari Moghaddam, A., Adamowski, J., Fijani, E., 2017. Comparison of machine learning models for predicting fluoride contamination in groundwater. Stoch. Environ. Res. Risk Assess. 31, 2705–2718.
Bayraktar, Z., Komurcu, M., Werner, D.H., 2010. Wind Driven Optimization (WDO): a novel nature-inspired optimization algorithm and its application to electromagnetics. In: 2010 IEEE Antennas and Propagation Society International Symposium. IEEE, pp. 1–4.
Cameselle, C., Reddy, K.R., 2022. Electrobioremediation: combined electrokinetics and bioremediation technology for contaminated site remediation. Indian Geotech. J. 52 (5), 1205–1225.
Chen, Z., Huang, G.H., Chan, C.W., Geng, L.Q., Xia, J., 2003. Development of an expert system for the remediation of petroleum-contaminated sites. Environ. Model. Assess. 8, 323–334.
Chen, D., Wang, X., Luo, X., Huang, G., Tian, Z., Li, W., Liu, F., 2023. Delineating and identifying risk zones of soil heavy metal pollution in an industrialized region using machine learning. Environ. Pollut. 318, 120932.
Chu, H., Lu, W., 2015. Optimization design based on ensemble surrogate models for DNAPLs-contaminated groundwater remediation. J. Water Supply Res. Technol. - Aqua 64 (6), 697–707.
Chung, J., Gulcehre, C., Cho, K., Bengio, Y., 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv preprint arXiv:1412.3555.
Cozad, A., Sahinidis, N.V., Miller, D.C., 2014. Learning surrogate models for simulation-based optimization. AIChE J. 60 (6), 2211–2227.
Table 4 (continued)

| Approach | Model used | Case | Model accuracy parameters | Objective | Outcome(s) | Reference |
|---|---|---|---|---|---|---|
| Monitoring contaminated area using remote sensing (continued) | | | Recall, and F1 score | | contaminated and uncontaminated areas. ML-based models could effectively classify the site as contaminated or uncontaminated based on the differences in spectral values of clean and contaminated areas. Such technologies have scope to further aid in monitoring contaminated sites. | |

Note: Abbreviations: KELM, Kernel Extreme Learning Machine; PSO, Particle Swarm Optimization; GA, Genetic Algorithm; QPSO, Quantum PSO; QGA, Quantum GA; LSTM, Long Short-Term Memory; RBFN, Radial Basis Function Network; RF, Random Forest; SVM, Support Vector Machine; BP-FFNN, Back-Propagated Feed-Forward Neural Network; NB, Naïve Bayes Classifier; BLMI, Bivariate Local Moran's I; ANN, Artificial Neural Network; CVAE, Convolutional Variational Auto-Encoder; EnKF, Ensemble Kalman Filter; RT, Random Tree; NRMSE, Normalized Root Mean Square Error, $\mathrm{NRMSE} = \dfrac{\mathrm{RMSE}}{x_{\max} - x_{\min}}$.
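The accuracy parameters listed in Table 4 can be computed directly from observed and predicted values. The following minimal plain-Python sketch implements NRMSE as defined in the note (RMSE divided by the range of the observations), together with accuracy, precision, recall, F1 score, and Cohen's kappa for a two-class contaminated/uncontaminated labelling. The function names, labels, and values are hypothetical, and precision, recall, and F1 are computed for a single designated positive class.

```python
import math
from collections import Counter


def nrmse(observed, predicted):
    """RMSE normalized by the range of the observations (x_max - x_min)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (max(observed) - min(observed))


def classification_metrics(actual, predicted, positive):
    """Accuracy, precision/recall/F1 for one positive class, and Cohen's kappa."""
    n = len(actual)
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    accuracy = sum(a == p for a, p in pairs) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from the marginal class frequencies
    count_a, count_p = Counter(actual), Counter(predicted)
    chance = sum(count_a[c] * count_p[c] for c in count_a) / (n * n)
    # chance == 1 only when both label sets are one identical class (perfect agreement)
    kappa = (accuracy - chance) / (1 - chance) if chance < 1 else 1.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "F1": f1, "kappa": kappa}


# Hypothetical labels for ten sampling cells (C = contaminated, U = uncontaminated)
actual = ["C", "C", "C", "U", "U", "U", "U", "C", "U", "U"]
predicted = ["C", "C", "U", "U", "U", "C", "U", "C", "U", "U"]
print(classification_metrics(actual, predicted, positive="C"))
print(nrmse([0.1, 0.4, 0.3, 0.8], [0.15, 0.35, 0.3, 0.7]))
```

Kappa is usually lower than raw accuracy because it discounts the agreement expected from class proportions alone. This matters when contaminated cells are rare and a classifier could score high accuracy by always predicting "uncontaminated".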
Decesaro, A., Rampel, A., Machado, T.S., Thomé, A., Reddy, K.R., Margarites, A.C., Colla, L.M., 2017. Bioremediation of soil contaminated with diesel and biodiesel fuel using biostimulation with microalgae biomass. J. Environ. Eng. 143 (4), 04016091.
Doersch, C., 2016. Tutorial on Variational Autoencoders. arXiv:1606.05908.
Dorigo, M., Birattari, M., Stutzle, T., 2006. Ant colony optimization. IEEE Comput. Intell. Mag. 1 (4), 28–39.
Du, J., Shi, X., Mo, S., Kang, X., Wu, J., 2022. Deep learning based optimization under uncertainty for surfactant-enhanced DNAPL remediation in highly heterogeneous aquifers. J. Hydrol. 608, 127639.
Dunea, D., Iordache, S., Pohoata, A., Neagu Frasin, L.B., 2014. Investigation and selection of remediation technologies for petroleum-contaminated soils using a decision support system. Water, Air, Soil Pollut. 225, 1–18.
Farmer, J.D., Packard, N.H., Perelson, A.S., 1986. The immune system, adaptation, and machine learning. Phys. Nonlinear Phenom. 22 (1–3), 187–204.
García, M., López, E., Kumar, V., Valls, A., 2006. A multicriteria fuzzy decision system to sort contaminated soils. In: Modeling Decisions for Artificial Intelligence: Third International Conference. Springer Berlin Heidelberg, Tarragona, Spain, pp. 105–116.
Gautam, K., Sharma, P., Dwivedi, S., Singh, A., Gaur, V.K., Varjani, S., et al., 2023. A Review on Control and Abatement of Soil Pollution by Heavy Metals: Emphasis on Artificial Intelligence in Recovery of Contaminated Soil. Environmental Research, 115592.
Geem, Z.W., Kim, J.H., Loganathan, G.V., 2001. A new heuristic optimization algorithm: harmony search. Simulation 76 (2), 60–68.
Gen, M., Cheng, R., 1999. Genetic Algorithms and Engineering Optimization, vol. 7. John Wiley & Sons, Inc.
Geng, L., Chen, Z., Chan, C.W., Huang, G.H., 2001. An intelligent decision support system for management of petroleum-contaminated sites. Expert Syst. Appl. 20 (3), 251–260.
George, S., Dixit, A., 2021. A machine learning approach for prioritizing groundwater testing for per- and polyfluoroalkyl substances (PFAS). J. Environ. Manag. 295, 113359.
Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep Learning. MIT Press.
Gurney, K., 1997. An Introduction to Neural Networks, first ed. CRC Press.
Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang, X., Wang, G., Cai, J., Chen, T., 2018. Recent advances in convolutional neural networks. Pattern Recogn. 77, 354–377.
Han, Z., Salawu, O.A., Zenobio, J.E., Zhao, Y., Adeleye, A.S., 2021. Emerging investigator series: immobilization of arsenic in soil by nanoscale zerovalent iron: role of sulfidation and application of machine learning. Environ. Sci.: Nano 8 (3), 619–633.
Hanoon, M.S., Ahmed, A.N., Fai, C.M., Birima, A.H., Razzaq, A., Sherif, M., et al., 2021. Application of artificial intelligence models for modeling water quality in groundwater: comprehensive review, evaluation and future trends. Water, Air, Soil Pollut. 232, 1–41.
Hauptman, B.H., Naughton, C.C., Harmon, T.C., 2023. Using Machine Learning to Predict 1,2,3-trichloropropane Contamination from Legacy Non-point Source Pollution of Groundwater in California's Central Valley, vol. 22. Groundwater for Sustainable Development, 100955.
He, L., Chan, C.W., Huang, G.H., Zeng, G.M., 2006. A probabilistic reasoning-based decision support system for selection of remediation technologies for petroleum-contaminated sites. Expert Syst. Appl. 30 (4), 783–795.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Comput. 9 (8), 1735–1780.
Hou, Z., Lu, W., Chu, H., Luo, J., 2015. Selecting parameter-optimized surrogate models in DNAPL-contaminated aquifer remediation strategies. Environ. Eng. Sci. 32 (12), 1016–1026.
Hou, Z., Lu, W., Chen, M., 2016. Surrogate-based sensitivity analysis and uncertainty analysis for DNAPL-contaminated aquifer remediation. J. Water Resour. Plann. Manag. 142 (11), 04016043.
Hou, Z., Lu, W., Xue, H., Lin, J., 2017. A comparative research of different ensemble surrogate models based on set pair analysis for the DNAPL-contaminated aquifer remediation strategy optimization. J. Contam. Hydrol. 203, 28–37.
Hou, Z., Lu, W., 2018. Comparative study of surrogate models for groundwater contamination source identification at DNAPL-contaminated sites. Hydrogeol. J. 26 (3).
Hu, Z., Chan, C.W., 2015. In-situ bioremediation for petroleum contamination: a fuzzy rule-based model predictive control system. Eng. Appl. Artif. Intell. 38, 70–78.
Hu, Z., Chan, C.W., Huang, G.H., 2003. A fuzzy expert system for site characterization. Expert Syst. Appl. 24 (1), 123–131.
Huang, G., Wang, X., Chen, D., Wang, Y., et al., 2022. A hybrid data-driven framework for diagnosing contributing factors for soil heavy metal contaminations using machine learning and spatial clustering analysis. J. Hazard Mater. 437, 129324.
Huang, G.B., Zhu, Q.Y., Siew, C.K., 2006. Extreme learning machine: theory and applications. Neurocomputing 70 (1–3), 489–501.
Huysegoms, L., Cappuyns, V., 2017. Critical review of decision support tools for sustainability assessment of site remediation options. J. Environ. Manag. 196, 278–296.
Jabbar, H., Khan, R.Z., 2015. Methods to avoid over-fitting and under-fitting in supervised machine learning (comparative study). Computer Science, Communication and Instrumentation Devices 70 (10), 978981, 3850.
Jalali, F.M., Chahkandi, B., Gheibi, M., Eftekhari, M., Behzadian, K., Campos, L.C., 2023. Developing a smart and clean technology for bioremediation of antibiotic contamination in arable lands. Sustainable Chemistry and Pharmacy 33, 101127.
James, G., Witten, D., Hastie, T., Tibshirani, R., Taylor, J., 2023. An Introduction to Statistical Learning: with Applications in Python. Springer International Publishing, New York, USA.
Jaskulak, M., Grobelak, A., Vandenbulcke, F., 2020. Modeling and optimizing the removal of cadmium by Sinapis alba L. from contaminated soil via Response Surface Methodology and Artificial Neural Networks during assisted phytoremediation with sewage sludge. Int. J. Phytoremediation 22 (12), 1321–1330.
Jia, X., Hu, B., Marchant, B.P., Zhou, L., Shi, Z., Zhu, Y., 2019. A methodological framework for identifying potential sources of soil heavy metal pollution based on machine learning: a case study in the Yangtze Delta, China. Environ. Pollut. 250, 601–609.
Jia, X., Cao, Y., O'Connor, D., Zhu, J., Tsang, D.C., Zou, B., Hou, D., 2021a. Mapping soil pollution by using drone image recognition and machine learning at an arsenic-contaminated agricultural field. Environ. Pollut. 270, 116281.
Jia, X., O'Connor, D., Shi, Z., Hou, D., 2021b. VIRS based detection in combination with machine learning for mapping soil pollution. Environ. Pollut. 268, 115845.
Kamdar, B.A., Solanki, C.H., Reddy, K.R., 2023. Moringa seed cake biochar: a novel binder for sustainable remediation of lead-contaminated soil. J. Environ. Eng. 149 (10), 04023059.
Kanevski, M.F., 1999. Spatial predictions of soil contamination using general regression neural networks. Syst. Res. Inf. Sci. 8, 241–256.
Kang, X., Kokkinaki, A., Power, C., Kitanidis, P.K., Shi, X., Duan, L., et al., 2021. Integrating deep learning-based data assimilation and hydrogeophysical data for improved monitoring of DNAPL source zones during remediation. J. Hydrol. 601, 126655.
Kaplan, G., Aydinli, H.O., Pietrelli, A., Mieyeville, F., Ferrara, V., 2022. Oil-contaminated soil modeling and remediation monitoring in arid areas using remote sensing. Rem. Sens. 14 (10), 2500.
Karaboga, D., 2005. An Idea Based on Honey Bee Swarm for Numerical Optimization. Technical Report-TR06, Erciyes University, Engineering Faculty, Computer Engineering Department.
Kennedy, J., Eberhart, R., 1995. Particle swarm optimization. In: Proceedings of ICNN'95 - International Conference on Neural Networks, vol. 4. IEEE, pp. 1942–1948.
Khan, F.I., Husain, T., Hejazi, R., 2004. An overview and analysis of site remediation technologies. J. Environ. Manag. 71 (2), 95–122.
Kirkpatrick, S., Gelatt Jr., C.D., Vecchi, M.P., 1983. Optimization by simulated annealing. Science 220 (4598), 671–680.
Kochenderfer, M.J., 2015. Decision Making under Uncertainty: Theory and Application. MIT Press.
Laha, S., Mukherjee, S., Nebhrajani, S.R., 2000. Information management system for site remediation efforts. Environ. Manag. 25, 513–523.
Lambora, A., Gupta, K., Chopra, K., 2019. Genetic algorithm-A literature review. In: 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon). IEEE, pp. 380–384.
Lee, J., Im, J., Kim, U., Löffler, F.E., 2016. A data mining approach to predict in situ detoxification potential of chlorinated ethenes. Environ. Sci. Technol. 50 (10), 5181–5188.
Lehr, J.H., Hyman, M., Gass, T., Seevers, W.J., 2002. Handbook of Complex Environmental Remediation Problems. McGraw-Hill Education.
Li, J., Lu, W., Luo, J., 2021. Groundwater contamination sources identification based on the Long-Short Term Memory network. J. Hydrol. 601, 126670.
Li, H., Zhou, Z., Long, T., Wei, Y., Xu, J., Liu, S., Wang, X., 2022a. Big-data analysis and machine learning based on oil pollution remediation cases from CERCLA database. Energies 15 (15), 5698.
Li, X., Yi, S., Cundy, A.B., Chen, W., 2022b. Sustainable decision-making for contaminated site risk management: a decision tree model using machine learning algorithms. J. Clean. Prod. 371, 133612.
Lu, Y., 2019. Artificial intelligence: a survey on evolution, models, applications and future trends. Journal of Management Analytics 6 (1), 1–29.
Luo, J., Lu, W., Xin, X., Chu, H., 2013. Surrogate model application to the identification of an optimal surfactant-enhanced aquifer remediation strategy for DNAPL-contaminated sites. J. Earth Sci. 24 (6), 1023–1032.
Luo, J., Lu, W., 2014. A mixed-integer non-linear programming with surrogate model for optimal remediation design of NAPLs contaminated aquifer. Int. J. Environ. Pollut. 54 (1), 1–16.
Madani, A., Hagage, M., Elbeih, S.F., 2022. Random Forest and Logistic Regression algorithms for prediction of groundwater contamination using ammonia concentration. Arabian J. Geosci. 15 (20), 1619.
Man, J., Zeng, L., Luo, J., Gao, W., Yao, Y., 2021. Application of the deep learning algorithm to identify the spatial distribution of heavy metals at contaminated sites. ACS ES&T Engineering 2 (2), 158–168.
Manfron, S., Thomé, A., Cecchim, I., Reddy, K.R., 2020. Application of zero-valent iron nanoparticles (nZVI) on the remediation of contaminated soil and groundwater: a review. Quím. Nova 43, 623–631.
Mazumdar, H., Murphy, M.P., Bhatkande, S., Emerson, H.P., Kaplan, D.I., Gohel, H.A., 2022. Optimized machine learning model for predicting groundwater contamination. In: 2022 IEEE MetroCon. IEEE, pp. 1–3.
Meray, A.O., Sturla, S., Siddiquee, M.R., Serata, R., Uhlemann, S., Gonzalez-Raymat, H., et al., 2022. Pylenm: a machine learning framework for long-term groundwater contamination monitoring strategies. Environ. Sci. Technol. 56 (9), 5973–5983.
Mirjalili, S., Mirjalili, S.M., Lewis, A., 2014. Grey wolf optimizer. Adv. Eng. Software 69, 46–61.
Mohammadi, M., Gheibi, M., Fathollahi-Fard, A.M., Eftekhari, M., Kian, Z., Tian, G., 2021. A hybrid computational intelligence approach for bioremediation of amoxicillin based on fungus activities from soil resources and aflatoxin B1 controls. J. Environ. Manag. 299, 113594.
Muazu, N.D., Olatunji, S.O., 2023. K-nearest neighbor based computational intelligence and RSM predictive models for extraction of Cadmium from contaminated soil. Ain Shams Eng. J. 14 (4), 101944.
Olawoyin, R., 2016. Application of backpropagation articial neural network prediction
model for the PAH bioremediation of polluted soil. Chemosphere 161, 145150.
Ouyang, Q., Lu, W., Lin, J., Deng, W., Cheng, W., 2017a. Conservative strategy-based
ensemble surrogate model for optimal groundwater remediation design at DNAPLs-
contaminated sites. J. Contam. Hydrol. 203, 18.
Ouyang, Q., Lu, W., Hou, Z., Zhang, Y., Li, S., Luo, J., 2017b. Chance-constrained multi-
objective optimization of groundwater remediation design at DNAPLs-contaminated
sites using a multi-algorithm genetically adaptive method. J. Contam. Hydrol. 200,
1523.
Ouyang, Q., Lu, W., Miao, T., Deng, W., Jiang, C., Luo, J., 2017c. Application of ensemble
surrogates and adaptive sequential sampling to optimal groundwater remediation
design at DNAPLs-contaminated sites. J. Contam. Hydrol. 207, 3138.
Palansooriya, K.N., Li, J., Dissanayake, P.D., Suvarna, M., et al., 2022. Prediction of soil
heavy metal immobilization by biochar using machine learning. Environ. Sci.
Technol. 56 (7), 41874198.
Perera, A.T.D., Kamalaruban, P., 2021. Applications of reinforcement learning in energy
systems. Renew. Sustain. Energy Rev. 137, 110618.
Picariello, E., Baldantoni, D., De Nicola, F., 2022. Investigating natural attenuation of
PAHs by soil microbial communities: insights by a machine learning approach.
Restor. Ecol. 30 (8), e13655.
Polikar, R., 2012. Ensemble learning. Ensemble Machine Learning: Methods and
Applications 134.
Pyo, J., Hong, S.M., Kwon, Y.S., Kim, M.S., Cho, K.H., 2020. Estimation of heavy metals
using deep neural network with visible and infrared spectroscopy of soil. Sci. Total
Environ. 741, 140162.
Qin, X.S., Huang, G.H., Huang, Y.F., Zeng, G.M., Chakma, A., Li, J.B., 2007. NRSRM: a
decision support system and visualization software for the management of
petroleum-contaminated sites. Energy Sources, Part A. 28 (3), 199220.
Qiu, Y., Zhou, S., Zhang, C., Qin, W., Lv, C., Zou, M., 2023. Identication of potentially
contaminated areas of soil microplastic based on machine learning: a case study in
Taihu Lake region, China. Sci. Total Environ. 877, 162891.
Rao, S.V.N., 2006. A computationally efcient technique for source identication
problems in three-dimensional aquifer systems using neural networks and simulated
annealing. Environ. Forensics 7 (3), 233240.
Raviteja, K.V.N.S., Reddy, K.R., 2023. Application of articial intelligence, machine
learning, and deep learning in contaminated site remediation. In: Al Khaddar, R.,
et al. (Eds.), Recent Developments in Energy and Environmental Engineering,
TRACE 2022, Lecture Notes in Civil Engineering, vol. 333. Springer, Singapore.
Reddy, K.R., Amaya-Santos, G., 2017. Effects of variable site conditions on
phytoremediation of mixed contaminants: eld-scale investigation at big marsh site.
J. Environ. Eng. 143 (9), 04017057.
Reddy, K.R., Cameselle, C., Adams, J.A., 2019a. Sustainable Engineering: Drivers,
Metrics, Tools, and Applications. John Wiley &Sons, Inc.
Reddy, K.R., Kumar, G., Du, Y.J., 2019b. Risk, sustainability, and resiliency
considerations in polluted site remediation. In: Zhan, L., Chen, Y., Bouazza, A. (Eds.),
8th International Congress on Environmental Geotechnics Volume 1, Environmental
Science and Engineering. Springer, Singapore.
Reddy, K.R., Chirakkara, R.A., Martins Ribeiro, L.F., 2020. Effects of elevated
concentrations of co-existing heavy metals and PAHs in soil on phytoremediation.
J. Hazardous, Toxic, and Radioactive Waste 24 (4), 04020035.
Ren, Y., Cui, M., Zhou, Y., Lee, Y., Ma, J., Han, Z., Khim, J., 2023. Zero-valent iron based
materials selection for permeable reactive barrier using machine learning. J. Hazard
Mater. 453, 131349.
Runkler, T.A., 1996. Extended defuzzication methods and their properties. In:
Proceedings of IEEE 5th International Fuzzy Systems, vol. 1. IEEE, pp. 694700.
Sadeghfam, S., Hassanzadeh, Y., Khatibi, R., Nadiri, A.A., Moazamnia, M., 2019.
Groundwater remediation through pump-treat-inject technology using optimum
control by articial intelligence (OCAI). Water Resour. Manag. 33, 11231145.
Sajedi-Hosseini, F., Malekian, A., Choubin, B., Rahmati, O., Cipullo, S., Coulon, F.,
Pradhan, B., 2018. A novel machine learning-based approach for the risk assessment
of nitrate groundwater contamination. Sci. Total Environ. 644, 954962.
Salehinejad, H., Sankar, S., Barfett, J., Colak, E., Valaee, S., 2017. Recent Advances in
Recurrent Neural Networks arXiv preprint arXiv:1801.01078.
Sari, M., Cosgun, T., Yalcin, I.E., Taner, M., Ozyigit, I.I., 2022. Deciding heavy metal
levels in soil based on various ecological information through articial intelligence
modeling. Appl. Artif. Intell. 36 (1), 2014189.
Sarker, I.H., 2021. Machine learning: algorithms, real-world applications and research
directions. SN Computer Science 2 (3), 160.
Sergeev, A.P., Buevich, A.G., Baglaeva, E.M., Shichkin, A.V., 2019. Combining spatial
autocorrelation with machine learning increases prediction accuracy of soil heavy
metals. Catena 174, 425435.
Shaker, R., Tofan, L., Bucur, M., Costache, S., Sava, D., Ehlinger, T., 2010. Land coverand
landscape as predictors of groundwater contamination: a neural-network modelling
approach applied to Dobrogea, Romania. J. Environ. Protect. Ecology 11 (1),
337348.
Sharma, H.D., Reddy, K.R., 2004. Geoenvironmental Engineering: Site Remediation,
Waste Containment, and Emerging Waste Management Technologies. John Wiley &
Sons, Inc.
Sharma, S., Sharma, S., Athaiya, A., 2017. Activation functions in neural networks. Int. J.
Eng. Appl. Sci. Technol 6 (12), 310316.
Sherstinsky, A., 2020. Fundamentals of recurrent neural network (RNN) and long short-
term memory (LSTM) network. Phys. Nonlinear Phenom. 404, 132306.
Shi, L., Li, J., Palansooriya, K.N., Chen, Y., et al., 2023. Modeling phytoremediation of
heavy metal contaminated soils through machine learning. J. Hazard Mater. 441,
129904.
Singh, B.K., Naidu, R., 2012. Cleaning contaminated environment: a growing challenge.
Biodegradation 23, 785786. https://doi.org/10.1007/s10532-012-9590-5, 2012.
Singha, S., Pasupuleti, S., Singha, S.S., Kumar, S., 2020. Effectiveness of groundwater
heavy metal pollution indices studies by deep-learning. J. Contam. Hydrol. 235,
103718.
Song, X., Ren, H., Hou, Z., Lin, X., Karanovic, M., Tonkin, M., et al., 2023. Predicting
future well performance for environmental remediation design using deep learning.
J. Hydrol. 617, 129110.
Sprocati, R., Rolle, M., 2021. Integrating process-based reactive transport modeling and
machine learning for electrokinetic remediation of contaminated groundwater.
Water Resour. Res. 57 (8), e2021WR029959.
Sreekanth, J., Datta, B., 2011. Coupled simulation-optimization model for coastal aquifer
management using genetic programming-based ensemble surrogate models and
multiple-realization optimization. Water Resour. Res. 47 (4).
Srivastava, D., Singh, R.M., 2014. Breakthrough curves characterization and
identication of an unknown pollution source in groundwater system using an
articial neural network (ANN). Environ. Forensics 15 (2), 175189.
Srivastava, D., Singh, R.M., 2015. Groundwater system modeling for simultaneous
identication of pollution sources and parameters with uncertainty characterization.
Water Resour. Manag. 29, 46074627.
Stef, P.F., Thirumalaiyammal, B., Anburaj, R., Mishel, P.F., 2022. Articial intelligence
in bioremediation modelling and clean-up of contaminated sites: recent advances,
challenges and opportunities. In: Kumar, V., Thakur, I.S. (Eds.), Omics Insights in
Environmental Bioremediation. Springer, Singapore, pp. 683702.
Sugeno, M., Kang, G.T., 1988. Structure identification of fuzzy model. Fuzzy Set Syst. 28
(1), 15–33.
Sun, Y., Zhang, Y., Lu, L., Wu, Y., Zhang, Y., Kamran, M.A., Chen, B., 2022. The
application of machine learning methods for prediction of metal immobilization
remediation by biochar amendment in soil. Sci. Total Environ. 829, 154668.
Sutton, R.S., Barto, A.G., 2018. Reinforcement Learning: an Introduction. MIT Press.
Svozil, D., Kvasnicka, V., Pospichal, J., 1997. Introduction to multi-layer feed-forward
neural networks. Chemometr. Intell. Lab. Syst. 39 (1), 43–62.
Takagi, T., Sugeno, M., 1985. Fuzzy identification of systems and its applications to
modeling and control. IEEE Transactions on Systems, Man, and Cybernetics (1),
116–132.
Tao, H., Liao, X., Cao, H., Zhao, D., Hou, Y., 2022. Three-dimensional delineation of soil
pollutants at contaminated sites: progress and prospects. J. Geogr. Sci. 32 (8),
1615–1634.
Tarasov, D.A., Buevich, A.G., Sergeev, A.P., Shichkin, A.V., 2018. High variation topsoil
pollution forecasting in the Russian Subarctic: using artificial neural networks
combined with residual kriging. Appl. Geochem. 88, 188–197.
Tariq, S.R., Shah, M.H., Shaheen, N., Jaffar, M., Khalique, A., 2008. Statistical source
identication of metals in groundwater exposed to industrial contamination.
Environ. Monit. Assess. 138, 159165.
Tut Haklidir, F.S., Haklidir, M., 2020. Prediction of geothermal originated boron
contamination by deep learning approach: at Western Anatolia Geothermal Systems
in Turkey. Environ. Earth Sci. 79, 116.
Varley, A., Tyler, A., Smith, L., Dale, P., Davies, M., 2015. Remediating radium
contaminated legacy sites: advances made through machine learning in routine
monitoring of hot particles. Sci. Total Environ. 521, 270–279.
Vesselinov, V.V., Alexandrov, B.S., O'Malley, D., 2018. Contaminant source
identification using semi-supervised machine learning. J. Contam. Hydrol. 212,
134–142.
Wang, X., Li, R., Tian, Y., Zhang, B., Zhao, Y., Zhang, T., Liu, C., 2022. A computational
framework for design and optimization of risk-based soil and groundwater
remediation strategies. Processes 10 (12), 2572.
Wijaya, J., Byeon, H., Jung, W., Park, J., Oh, S., 2023. Machine learning modeling using
microbiome data reveal microbial indicator for oil-contaminated groundwater.
J. Water Proc. Eng. 53, 103610.
Xing, Z., Qu, R., Zhao, Y., Fu, Q., Ji, Y., Lu, W., 2019. Identifying the release history of a
groundwater contaminant source based on an ensemble surrogate model. J. Hydrol.
572, 501–516.
Yang, X.S., 2014. Nature-inspired Optimization Algorithms, first ed. Elsevier,
Amsterdam; Boston.
Yang, X.S., He, X., 2016. Nature-inspired optimization algorithms in engineering:
overview and applications. Nature-Inspired Comput. Engin. 120.
Yang, S., Taylor, D., Yang, D., He, M., Liu, X., Xu, J., 2021. A synthesis framework using
machine learning and spatial bivariate analysis to identify drivers and hotspots of
heavy metal pollution of agricultural soils. Environ. Pollut. 287, 117611.
Yaseen, Z.M., 2021. An insight into machine learning models era in simulating soil, water
bodies and adsorption heavy metals: review, challenges and solutions. Chemosphere
277, 130126.
Yu, Y., Si, X., Hu, C., Zhang, J., 2019. A review of recurrent neural networks: LSTM cells
and network architectures. Neural Comput. 31 (7), 1235–1270.
Zadeh, L.A., 1973. Outline of a new approach to the analysis of complex systems and
decision processes. IEEE Transact. Systems, Man, and Cybernetics (1), 28–44.
Zhang, H., Yin, A., Yang, X., Fan, M., et al., 2021. Use of machine-learning and receptor
models for prediction and source apportionment of heavy metals in coastal
reclaimed soils. Ecol. Indicat. 122, 107233.
Zhang, H., Yin, S., Chen, Y., Shao, S., et al., 2020. Machine learning-based source
identication and spatial prediction of heavy metals in soil in a rapid urbanization
area, eastern China. J. Clean. Prod. 273, 122858.
Zhang, Y., Ren, M., Tang, Y., Cui, X., Cui, J., Xu, C., et al., 2022. Immobilization on
anionic metal(loid)s in soil by biochar: a meta-analysis assisted by machine
learning. J. Hazard Mater. 438, 129442.
J.K. Janga et al.
Chemosphere 345 (2023) 140476
24
Zhang, Y., Lei, M., Li, K., Ju, T., 2023. Spatial prediction of soil contamination based on
machine learning: a review. Front. Environ. Sci. Eng. 17 (8), 93.
Zhao, Y., Lu, W., Xiao, C., 2016. A Kriging surrogate model coupled in
simulation–optimization approach for identifying release history of groundwater
sources. J. Contam. Hydrol. 185, 51–60.
Zhao, Y., Qu, R., Xing, Z., Lu, W., 2020. Identifying groundwater contaminant sources
based on a KELM surrogate model together with four heuristic optimization
algorithms. Adv. Water Resour. 138, 103540.
Zheng, S., Wang, J., Zhuo, Y., Yang, D., Liu, R., 2022. Spatial distribution model of DEHP
contamination categories in soil based on Bi-LSTM and sparse sampling. Ecotoxicol.
Environ. Saf. 229, 113092.
Zhong, S., Zhang, K., Bagheri, M., Burken, J.G., Gu, A., Li, B., Ma, X., Marrone, B.L.,
Ren, Z.J., Schrier, J., Shi, W., 2021. Machine learning: new ideas and tools in
environmental science and engineering. Environ. Sci. Technol. 55 (19),
12741–12754.
Zhou, Z.H., 2021. Machine Learning. Springer Nature. https://doi.org/10.1007/978-981-15-1967-3.