Artificial Intelligence Review (2024) 57:88
https://doi.org/10.1007/s10462-024-10706-5
Reinforcement learning applications inenvironmental
sustainability: areview
MaddalenaZuccotto1· AlbertoCastellini1· DavideLaTorre2· LapoMola3,4·
AlessandroFarinelli1
Published online: 12 March 2024
© The Author(s) 2024
Abstract
Environmental sustainability is a worldwide key challenge attracting increasing attention due to climate change, pollution, and biodiversity decline. Reinforcement learning, initially employed in gaming contexts, has been recently applied to real-world domains, including the environmental sustainability realm, where uncertainty challenges strategy learning and adaptation. In this work, we survey the literature to identify the main applications of reinforcement learning in environmental sustainability and the predominant methods employed to address these challenges. We analyzed 181 papers and answered seven research questions, e.g., “How many academic studies have been published from 2003 to 2023 about RL for environmental sustainability?” and “What were the application domains and the methodologies used?”. Our analysis reveals an exponential growth in this field over the past two decades, with a rate of 0.42 in the number of publications (from 2 papers in 2007 to 53 in 2022), a strong interest in sustainability issues related to energy fields, and a preference for single-agent RL approaches to deal with sustainability. Finally, this work provides practitioners with a clear overview of the main challenges and open problems that should be tackled in future research.
Keywords Reinforcement learning· Environmental sustainability· Reinforcement learning
applications· Sustainable development· Artificial intelligence
* Maddalena Zuccotto
maddalena.zuccotto@univr.it
1 Department ofComputer Science, University ofVerona, Strada Le Grazie 15, 37134Verona, Italy
2 SKEMA Business School, Université Côte d’Azur, Sophia Antipolis, 60 Rue Fedor Dostoïevski,
06902Valbonne, France
3 Department ofManagement, University ofVerona, Via Cantarane 24, 37129Verona, Italy
4 SKEMA Business School, Université Côte d’Azur (GREDEG), Sophia Antipolis, 60 Rue Fedor
Dostoïevski, 06902Valbonne, France
1 Introduction
Artificial Intelligence (AI) is taking an increasingly important role in industry and society. AI techniques have recently been introduced in autonomous driving, personalized shopping, and fraud prevention, to name a few examples. A key challenge faced by today’s society, and one to which AI can bring important advances, is environmental sustainability. Climate change, pollution, biodiversity decline, poor health, and poverty have led governments and companies, in recent years, to focus ever more of their efforts and investments on solutions to environmental sustainability problems, which are usually characterized by an inefficient and increasing use of resources. Environmental sustainability can be defined as a set of constraints regarding the use of renewable and nonrenewable resources on the one hand, and pollution and waste assimilation on the other (Goodland 1995). In this regard, in 2015, the United Nations published the “2030 Agenda for Sustainable Development”, the centerpiece of which is a set of 17 Sustainable Development Goals (United Nations 2015) to be fully achieved by 2030 to attain sustainable development in the economic, social, and environmental contexts and to eliminate all forms of poverty.
AI-based algorithms can control autonomous drones used in water monitoring (Steccanella et al. 2020; Marchesini et al. 2021; Bianchi et al. 2023), extract new insights about environmental conditions from acquired data (Castellini et al. 2020; Azzalini et al. 2020), improve the healthiness of indoor environments (Capuzzo et al. 2022), or forecast demand in district heating networks (Bianchi et al. 2019; Castellini et al. 2021, 2022). Several AI techniques have been employed to address various environmental sustainability challenges. These approaches enable the efficient management of distributed resources within smart grids (Roncalli et al. 2019; Orfanoudakis and Chalkiadakis 2023), improve the power flow in DC grids (Blij et al. 2020), increase the utilization of renewable resources for electric vehicle charging (Koufakis et al. 2020), and mitigate carbon emissions in urban transportation by fostering ridesharing and reducing traffic congestion (Bistaffa et al. 2021, 2017). Furthermore, a crucial aspect of climate change prevention involves optimizing the energy consumption associated with heating and cooling residential properties. To tackle this issue, AI-based approaches have been developed to enhance the efficiency of home systems (Panagopoulos et al. 2015; Auffenberg et al. 2017) and to quantify the thermal efficiency of residences (Brown et al. 2021). Among the broad spectrum of AI techniques, in this survey we focus on Reinforcement Learning (RL) (Sutton and Barto 2018), which has recently obtained impressive success, achieving human-level performance in several tasks, such as games (Silver et al. 2016, 2017).
One of the most important and interesting challenges in today’s RL research is the application of RL algorithms to real-world domains, where uncertainty makes strategy learning and adaptation much more complex than in game environments. In particular, the application of RL to environmental sustainability has attracted, in the last decade, strong interest from both the computer science community and the communities of environmental sciences and business. Reducing carbon emissions requires increasing the use of renewable resources, such as solar and wind power. While these resources are economically efficient, their stochastic and intermittent nature poses challenges when they replace nonrenewable energy sources within energy networks. RL, through systematic trial-and-error interaction with dynamic environments, offers a promising approach for learning optimal policies that can adapt to changing system dynamics and effectively manage environmental uncertainty.
Thus, an RL agent is capable of handling variations in operating conditions, for instance,
due to a change in resource availability or weather conditions.
This work surveys the recent use of RL to improve environmental sustainability. It pro-
vides a comprehensive overview of the different application domains where RL has been
used, such as energy and water resource management, and traffic management. The goal is
to show practitioners the state-of-the-art RL methods that are currently used to solve envi-
ronmental sustainability problems in each of these domains. For each paper analyzed, we
consider
The problem tackled,
The RL approach used,
The challenges faced,
The formalization of the RL problem (i.e., type of state/action space, type of transition
model, type of RL method, performance measures used to evaluate the results).
The paper is structured as follows. Section2 presents the surveys already available on top-
ics close to RL and environmental sustainability. Section3 presents the basic concepts of
RL as well as a formalization of the main concepts. In Sect.4, we present the research
methodology used in our survey. Section5 describes the results of our research, consider-
ing different levels of detail. In particular, in Sects.5.1.1, 5.1.2 and 5.1.3, we provide a
quantitative analysis of the state-of-the-art related to the application of RL in environmen-
tal sustainability over the last two decades. Then, Sect.5.1.4 outlines domains where RL
techniques are applied and the RL-based approaches employed to address environmental
sustainability. In Sect.5.2, our focus shifts to a subset of 35 main papers, for which we
analyze the application domains of proposed RL techniques, provide technical insights into
problem formalization, discuss the performance metrics used for evaluation, and consider
the challenges addressed. Section5.3 provides an in-depth analysis of each of these main
papers. Finally, in Sect.6 we discuss our findings, and in Sect.7 we draw conclusions and
summarize future directions.
2 Related work
The literature already provides some surveys on the application of RL to problems related to environmental sustainability, but all these works either focus only on specific aspects of environmental sustainability or also consider AI methods other than RL. For instance, Ma et al. (2020) focus on Energy-Harvesting Internet of Things (IoT) devices, offering insights into recent advancements addressing challenges in commercialization, standards development, context sensing, intermittent computing, and communication strategies. Charef et al. (2023) conduct a study considering various AI techniques, including RL, to enhance energy sustainability within IoT networks. They categorize studies based on the challenges they address, establishing connections between challenges and AI-based solutions while delineating the performance metrics used for evaluation. Within the domain of Architecture, Engineering, Construction, and Operation, Rampini and Re Cecconi (2022) concentrate on the application of AI techniques, including RL, in Asset Management. Their work reviews studies related to several aspects such as energy management, condition assessment, operations, risk, and project management, identifying key points for future development in this context. Alanne and
Sierla (2022) shift their focus to smart buildings, discussing the learning capabilities of
intelligent buildings and categorizing learning application domains based on objectives.
They also survey the application of RL and Deep Reinforcement Learning (DRL) in
decision-making and energy management, encompassing aspects like control of heating
and cooling systems and lighting systems. Within the context of smart buildings and
smart grids, Mabina et al. (2021) examine the utilization of Machine Learning (ML),
including RL, for optimizing energy consumption and electric water heater schedul-
ing, emphasizing the advantages of these approaches in Demand Response (DR) due
to their interaction with the environment. Himeur etal. (2022) investigate the integra-
tion of AI-big data analytics into various tasks such as load forecasting, water man-
agement, and indoor environmental quality monitoring, focusing on the role of RL and
DRL in optimizing occupant comfort and energy consumption. Yang etal. (2020) focus
on the application of RL and DRL techniques to sustainable energy and electric sys-
tems, addressing issues such as optimization, control, energy markets, cyber security,
and electric vehicle management.
In the realm of transportation systems, Li etal. (2023) explore various topics, includ-
ing cooperative mobility-on-demand systems, driver assistance systems, autonomous
vehicles (AVs), and electric vehicles (EVs). Sabet and Farooq (2022) study the state-
of-the-art in the context of Green Vehicle Routing Problems, which involve reducing
greenhouse gas (GHG) emissions and addressing issues like charging activities, pickup
and delivery operations, and energy consumption. Moreover, the authors note that most of the works leverage metaheuristics, while the use of RL methods is uncommon. Chen et al.
(2019) tackle sustainability concerns within the Internet of Vehicles, leveraging 5th
generation mobile network (5G) technology, Mobile Edge Computing architecture, and
DRL to optimize energy consumption and resource utilization. Rangel-Martinez etal.
(2021) assess the application of ML techniques, including RL, in manufacturing, with
a focus on energy-related fields impacting environmental sustainability. Sivamayil etal.
(2023) explore a wide range of RL applications (e.g., Natural Language Processing,
health care, etc.) emphasizing Energy Management Systems with an environmental sus-
tainability perspective. Mischos etal. (2023) investigate Intelligent Energy Management
Systems across diverse building environments, considering control types and optimiza-
tion approaches, including ML, DL, and DRL. Yao etal. (2023) discuss the application
of Agent-Based Modeling and Multi-Agent System modeling in the transition to Multi-
Energy Systems, highlighting RL and suggesting future research directions in Multi-
Agent Reinforcement Learning (MARL) for energy systems.
While these works address specific aspects of environmental sustainability using RL
methods, our review takes a comprehensive approach, analyzing all contexts in which
RL techniques have recently contributed to enhancing environmental sustainability. Our
goal is to provide practitioners with insights into state-of-the-art methods for addressing
environmental sustainability challenges across various application domains, including
energy and water resource management and traffic management. In summary, the main
contribution of this survey consists of offering an overview of RL application domains
within the context of environmental sustainability.
3 Reinforcement learning: preliminaries andmain denitions
In this section, we present the basic concepts of RL as well as a formalization of the main concepts. RL, a prominent machine learning paradigm, focuses on learning a policy that maximizes cumulative rewards, i.e., on which action should be selected, given the environment configuration, to achieve the best possible outcome. The key elements of RL are listed in the following:
The agent is the entity that makes decisions and performs actions in the environment;
The environment represents the system with which the agent interacts and provides the agent with feedback on the performed action;
The policy is a function that defines the agent’s behavior considering the environment configuration (i.e., a map between what the agent observes and what the agent should do);
The reward is a numerical signal that provides feedback on the action performed by the agent;
The value function specifies state values, namely, how valuable it is to reach a state, considering also the future states reachable from it;
The model of the environment (optional) is a stochastic function providing the next-state probability given the current state and action; it allows simulating the behavior of the environment in response to the agent’s actions.
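To make these elements concrete, the following minimal Python sketch shows how they interact in a generic episode loop; the toy environment, its reward, and the epsilon-greedy policy are illustrative assumptions, not taken from any surveyed work.

```python
# Minimal sketch of the agent-environment interaction loop built from the
# elements listed above. "ChainEnv" and the epsilon-greedy policy are
# illustrative assumptions, not taken from any surveyed paper.
import random

class ChainEnv:
    """Toy environment: move left/right on states 0..4, the goal is state 4."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action: 0 = left, 1 = right
        self.state = max(0, min(4, self.state + (1 if action == 1 else -1)))
        reward = 1.0 if self.state == 4 else -0.1   # reward: numerical feedback signal
        return self.state, reward, self.state == 4

def policy(q_values, state, epsilon=0.1):
    """Epsilon-greedy policy: map the observed state to an action."""
    if random.random() < epsilon:
        return random.randrange(2)
    return max((0, 1), key=lambda a: q_values.get((state, a), 0.0))

env, q_values = ChainEnv(), {}        # q_values plays the role of the value function
state = env.reset()
for t in range(100):                  # one (truncated) episode of interaction
    action = policy(q_values, state)
    state, reward, done = env.step(action)   # environment feedback on the action
    if done:
        break
```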
RL methods (Sutton and Barto 2018) can be categorized into two main groups: model-free and model-based (Moerland et al. 2020). Over the past two decades, model-free methods have demonstrated significant success. Meanwhile, model-based approaches have become a focal point in current research due to their potential to enhance sample efficiency, i.e., to reduce the number of interactions with the environment. This efficiency is achieved by explicitly representing the model of the environment and incorporating relevant prior knowledge (Castellini et al. 2019; Zuccotto et al. 2022a, b). Additionally, model-based methods offer the advantage of addressing the risks associated with taking actions in partially observable environments (Mazzi et al. 2021, 2023; Simão et al. 2023) or partially known environments (Castellini et al. 2023).
A common framework to formalize the RL problem is the Markov Decision Process (MDP) (Puterman 1994). An MDP is a tuple $(S, A, T, R, \gamma)$ where $S$ is a finite set of states, $A$ is a finite set of actions, $T: S \times A \rightarrow \Pi(S)$ is the transition model, where $\Pi(S)$ is the space of probability distributions over states, $R: S \times A \rightarrow \mathbb{R}$ is the reward function, and $\gamma \in [0, 1)$ is the discount factor. The agent’s goal is to maximize the expected discounted return $\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\right]$ by acting optimally, namely, choosing in each state $s_t$, at time $t$, the action $a_t$ with the highest expected reward. The solution of an MDP is an optimal policy, namely, a function that optimally maps states into actions. A policy is optimal if it maximizes the expected discounted return. The discount factor $\gamma$ reduces the weight of long-term rewards, guaranteeing convergence. In the case of partially observable environments, an extension of the MDP framework, namely the POMDP (Kaelbling et al. 1998), can be used. A POMDP is a tuple $(S, A, O, T, \Omega, R, \gamma)$ where the elements shared with the MDP are augmented by $\Omega$, a finite set of observations, and $O: S \times A \rightarrow \Pi(\Omega)$, the observation model. In contrast to MDPs, in POMDPs the agent is not able to directly observe the current state $s_t$ but it maintains a probability distribution over the states $S$, called belief, which is updated at each time step. The belief summarizes the agent’s previous experiences, i.e., the sequence of actions and observations that the agent took from an initial belief $b_0$ to the current belief $b$. The solution
of a POMDP is an optimal policy, namely, a function that optimally maps belief states into
actions. In the following, we will survey applications of RL to environmental sustainability,
hence we will investigate how the elements described in this section (e.g., MDP modeling
framework, RL algorithms, etc.) have been used so far to solve problems related to environ-
mental sustainability.
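As a concrete illustration of these definitions, the following sketch runs tabular Q-Learning, the method that turns out to be the most used in the surveyed papers (see Sect. 5.1.4), on a small randomly generated MDP $(S, A, T, R, \gamma)$; the toy transition and reward values are assumptions made for the example, not data from any surveyed application.

```python
# Hedged sketch: tabular Q-Learning on a small random MDP (S, A, T, R, gamma).
# The transition matrix T and rewards R below are made-up toy values.
import numpy as np

n_states, n_actions, gamma, alpha, epsilon = 3, 2, 0.95, 0.1, 0.1
rng = np.random.default_rng(0)

T = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # T(s' | s, a)
R = rng.normal(size=(n_states, n_actions))                        # R(s, a)
Q = np.zeros((n_states, n_actions))

s = 0
for _ in range(10_000):
    # Epsilon-greedy action selection over the current value estimates.
    a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
    s_next = rng.choice(n_states, p=T[s, a])                      # sample next state from T
    # Temporal-difference update: move Q(s,a) toward r + gamma * max_a' Q(s',a'),
    # the one-step estimate of the expected discounted return.
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next
```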
4 Review methodology
In this section, we outline the research methodology we used for this study. It consists of 5 steps: (i) the definition of the research questions, (ii) the paper collection process, (iii) the definition of inclusion and exclusion criteria, (iv) the identification of relevant studies based on the inclusion and exclusion criteria, and (v) data extraction and analysis.
Research questions. The first step involves defining the research questions we want
to answer on the application of RL techniques for environmental sustainability. The goal
of our questions is twofold: to offer a quantitative analysis of the state of the art related to
the application of RL to environmental sustainability and to analyze the use of these tech-
niques focusing on sustainability. Specifically, we aim to answer the following questions:
RQ1: How many academic studies have been published from 2003 to 2023 about RL
for environmental sustainability?
RQ2: What were the most relevant publication channels used?
RQ3: In which countries were the most active research centers located?
RQ4: What were the application domains and the methodologies used?
RQ5: How was the RL problem formalized (i.e., type of state/action space, type of tran-
sition model, and type of dataset used)?
RQ6: Which evaluation metrics were used to assess the performance?
RQ7: What were the challenges addressed?
The databases we use to collect papers are those of the search engines Scopus and Web of Science. To limit the scope of the search to the application of RL approaches for environmental sustainability, we define the following search strings:
“reinforcement learning AND sustainable AND environment”;
“reinforcement learning AND environmental AND sustainability”;
“reinforcement learning AND environment AND sustainability”;
“reinforcement learning AND environmental AND sustainable”.
The search on the two databases led to a total of 375 papers, 236 collected from Scopus
and 139 from Web of Science.
Selection criteria for the initial set of (181) papers. To refine the results of the search,
we outline the following inclusion and exclusion criteria.
Inclusion criteria. To determine studies eligible for inclusion in this work, we consider
the following criteria:
It is written in English;
It is clearly focused on RL for environmental sustainability;
In the case of duplicate articles, the most recent version is included.
Exclusion criteria. To further refine our search, we apply the following exclusion criteria:
the study is an editorial, a conference review, or a book chapter.
Following these criteria, we found 181 papers (104 articles, 70 conference papers,
and 7 reviews). We combine the information in the index keywords of these papers
with their number of citations and the publication year. In particular, we compute
the number of occurrences of each keyword to identify the application domains and
methodologies most used in the literature. To this aim, we standardize the keywords
to avoid spelling variations. Then, we combine these values with the number of cita-
tions and the publication year to identify the most recent and relevant studies. In cases
where index keywords are missing, we use author keywords. For the only three papers that have neither author nor index keywords, we use the title as the source of keywords.
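As an illustration of this step, the sketch below shows one possible way to standardize keywords and count their occurrences; the normalization rules and the example records are assumptions made for the example, since the exact cleaning procedure is not reported here.

```python
# Illustrative sketch of keyword standardization and occurrence counting.
# The normalization rules and the example keyword lists are assumptions.
from collections import Counter

def normalize(keyword: str) -> str:
    """Collapse simple spelling variations (case, hyphens, common variants)."""
    k = keyword.lower().strip().replace("-", " ")
    return {"q learning": "q-learning", "modelling": "modeling"}.get(k, k)

papers = [
    {"index_keywords": ["Reinforcement Learning", "Smart grid"], "author_keywords": []},
    {"index_keywords": [], "author_keywords": ["reinforcement-learning", "Energy"]},
]

counts = Counter()
for p in papers:
    # Fall back to author keywords when index keywords are missing.
    keywords = p["index_keywords"] or p["author_keywords"]
    counts.update(normalize(k) for k in keywords)

print(counts.most_common())
```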
Selection criteria for the set of (35) main papers. To identify papers for the in-
depth analysis, we applied the following criteria that consider the most important
keyword occurrences (i.e., the most frequent keywords), the publication year, and the
number of citations based on publication year.
Presence of at least one keyword with no less than 10 occurrences;
Publication year from 2013 to 2023;
Number of citations:
Papers published in 2022–2021, at least 3 citations;
Papers published in 2020–2019, at least 10 citations;
Papers published in 2018–2013, at least 20 citations.
Following these criteria, we selected 35 studies that have been explored in-depth, and
answers to the research questions defined above have been reported.
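The selection rule above can be summarized by a simple filter, sketched below; the paper records and the frequent-keyword set are dummy examples, and the absence of a citation threshold for 2023 papers is an assumption, since the criteria above do not state one.

```python
# Sketch of the selection rule for the 35 main papers, translated from the
# criteria above. Only the year window and citation thresholds come from the
# text; the example records and keyword set are dummies.
FREQUENT_KEYWORDS = {"reinforcement learning", "energy", "smart grid"}  # >= 10 occurrences

def min_citations(year: int) -> int:
    if year == 2023:
        return 0        # assumption: no citation threshold is stated for 2023 papers
    if year >= 2021:
        return 3        # 2021-2022
    if year >= 2019:
        return 10       # 2019-2020
    return 20           # 2013-2018

def is_main_paper(paper: dict) -> bool:
    in_window = 2013 <= paper["year"] <= 2023
    has_keyword = bool(FREQUENT_KEYWORDS & set(paper["keywords"]))
    cited_enough = paper["citations"] >= min_citations(paper["year"])
    return in_window and has_keyword and cited_enough

example = {"year": 2020, "keywords": {"smart grid"}, "citations": 12}
print(is_main_paper(example))   # True
```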
In the following sections, we first consider the initial 181 papers found using the
search strings defined above and applying inclusion/exclusion criteria. In Sect.5.1.1
we answer question RQ1 for those papers, in Sect. 5.1.2 we answer question RQ2,
in Sect.5.1.3 we answer question RQ3 and in Sect.5.1.4 we answer question RQ4.
Namely, we first analyze the number of papers that focus on RL for sustainability pub-
lished in the last 20 years, then we identify the main international conferences, work-
shops, and journals used to disseminate research, subsequently, we find the research
centers that are particularly active in this research/application topic, and finally, we
analyze the application domains and RL methodologies used. From Sect.5.2, we start
focusing only on the main 35 papers identified using main papers selection criteria.
In particular, we answer question RQ4 in Sect.5.2.1, question RQ5 in Sect. 5.2.2,
question RQ6 in Sect.5.2.3, and question RQ7 in Sect.5.2.4. Namely, for these main
papers, we first analyze the application domains of RL techniques and the RL-based
approaches used to tackle environmental sustainability; then we analyze the way in
which the problem has been formalized; subsequently, we investigate the evaluation
measures used; finally, we identify the main challenges addressed. Notice that ques-
tions RQ1, RQ2, and RQ3 have not been answered considering only the main 35
papers because these questions aim to provide a quantitative analysis of the state of the
art as a whole, and this subset of articles is part of the 181 papers used to answer these
three questions.
5 Results ofthereview
This section reports the results of the analysis provided in this survey, first for the ini-
tial set of 181 papers, then for the subset of the main 35 papers.
5.1 Analysis oftheinitial set of181 papers
The initial set of papers, selected using the search strings of Sect.4, is analyzed by answer-
ing questions RQ1, RQ2, RQ3, and RQ4.
5.1.1 RQ1: How many academic studies have been published from2003 to2023
aboutRL forenvironmental sustainability?
This research question aims to quantify the interest of the international scientific community in applying RL methods to environmental sustainability problems over the last 20 years. As shown in Fig. 1, the number of publications (pink dots) remained relatively low until 2018, with fewer than five publications per year. Since 2019, there has been rapid growth, up to 53 papers in 2022, showing the increasing interest in this topic during the last few years. It is important to notice that the data for the year 2023 are updated to April 2023 and do not represent a decrease in the number of studies published. The application of the inclusion and exclusion criteria leaves no publications in the years 2004, 2005, 2010, and 2011. In Fig. 1, we also show that the increase in the number of publications fits an exponential pattern (green line) with a growth rate of 0.42 (from 2 papers in 2007 to 53 in 2022). We do not consider 2023 when computing the regression model since its information is partial.
Fig. 1 Academic studies published from 2003 to 2023. Pink dots represent the number of publications per
year used to compute the regression model represented by the green line
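For readers who want to reproduce this kind of trend estimate on their own data, the following sketch fits an exponential pattern by log-linear regression; the yearly counts used here are placeholders (only the 2007 and 2022 values are reported in the text), so the printed rate will not match the 0.42 obtained on the real series of Fig. 1.

```python
# Sketch of a log-linear (exponential) fit of publication counts over years:
# regressing log(count) on the year gives the growth rate r of count ~ exp(a + r * year).
# The yearly counts below are placeholders, not the real series of Fig. 1.
import numpy as np

years = np.array([2007, 2012, 2016, 2019, 2020, 2021, 2022])
counts = np.array([2, 3, 4, 10, 22, 35, 53])            # placeholder values

r, a = np.polyfit(years, np.log(counts), 1)             # slope r = exponential growth rate
print(f"estimated growth rate: {r:.2f} per year")       # the survey reports 0.42 on the real data
```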
5.1.2 RQ2: What were themost relevant publication channels used?
With this research question, we aim to show the main channels used to disseminate research on the application of RL techniques to environmental sustainability problems. In Table 1, we show the journals and conferences with at least 2 publications.
As can be seen, the topics of the journals and conferences are very varied. In particular,
some of these communication channels are specific for sustainability, e.g., “Sustainabil-
ity (Switzerland)” and “Sustainable Cities and Society”, and many are related to envi-
ronmental aspects such as “IOP Conference Series: Earth and Environmental Science”
Table 1  Journals and conferences with at least two publications. Columns: venue | publications | scope

Conference
  Lecture notes in computer science (*) | 5 | CS
  IEEE conference on intelligent transportation systems | 3 | CS + APP
  International conference on autonomous agents and multiagent systems | 2 | CS
  IEEE international conference on distributed computing systems | 2 | CS
  International conference on mobility, sensing and networking | 2 | CS + APP
  IOP conference series: earth and environmental science | 2 | APP
  Land, water and environmental management: integrated systems for sustainability | 2 | APP

Journal
  IEEE access | 7 | APP
  IEEE internet of things journal | 5 | CS + APP
  Sustainability (Switzerland) | 5 | CS + APP
  IEEE transactions on intelligent transportation systems | 4 | CS + APP
  Sustainable cities and society | 4 | CS + APP
  Energies | 3 | APP
  IEEE transactions on green communications and networking | 3 | CS + APP
  IEEE transactions on vehicular technology | 3 | CS + APP
  Journal of cleaner production | 3 | APP
  Applied energy | 2 | APP
  Applied sciences (Switzerland) | 2 | APP
  Electronics (Switzerland) | 2 | CS + APP
  Energy and buildings | 2 | APP
  IEEE sensors journal | 2 | APP
  IEEE transactions on network and service management | 2 | CS + APP
  IEEE wireless communications | 2 | APP
  Journal of hydrology | 2 | APP
  Resources, conservation and recycling | 2 | APP
  Sensors | 2 | APP
  Sustainable energy technologies and assessments | 2 | APP

In the "Scope" column, "CS" and "APP" indicate a technical/informatics or an application-oriented perspective of the conference/journal, respectively, while "CS + APP" denotes a combination of them.
(*) Including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics
and “IEEE Transactions on Green Communications and Networking”. Moreover, in the
third column of Table1, we provide an overview of the scope of the publication chan-
nels. To this aim, we analyze the information presented on the website of each con-
ference and journal about its scope, indicating whether it has a technical/informatics
or application-oriented perspective (“CS” or “APP”, respectively) or a combination of
them (“CS + APP”). As can be seen, most of the publication channels are application-oriented (2 conferences + 12 journals), followed by those with a combined scope (2 conferences + 8 journals); finally, a few of them (3 conferences) have a more technical/informatics perspective.
5.1.3 RQ3: In which country were located themost active research centers?
This research question aims to identify the countries whose research centers are most concerned with the application of RL methods to environmental sustainability issues. With this in mind, we leverage the information in the Scopus and Web of Science databases
about the 181 papers that were not excluded by the application of inclusion and exclu-
sion criteria. In Fig.2, we show only the countries with at least 5 publications and, as we
can see, the highest number of papers comes from research centers located in China (33
papers), followed by the United States (29 papers), and the United Kingdom (17 papers).
It is important to note that most of these works are developed in collaboration between
research centers in multiple countries, so we count the paper for each collaborating coun-
try. To show co-author relationships, in Fig.3, we represent only countries with at least
5 occurrences among analyzed documents. Each country is depicted as a circle, a link
between 2 circles represents a co-authorship relation, and the line weight is proportional to
the number of papers in the co-authorship relationship. As we can see, the countries with
Fig. 2 Number of publications per Country on RL approaches for environmental sustainability
more links are the United States (9 links), followed by Australia (7 links), China, and India
(6 links).
5.1.4 RQ4: What were theapplication domains andthemethodologies used?
This research question aims to analyze the application domains and the RL methodologies
used for tackling issues related to environmental sustainability. To this aim, we analyze the
index keywords of the 181 papers that were not excluded by applying the inclusion and
exclusion criteria and the authors’ keywords for works with no index keywords. In Fig.4,
Fig. 3 Co-author relationships with Country as a unit of analysis. Nodes represent states, and links depict
co-authorship relationships. The thickness of the link is proportional to the number of papers in the co-
authorship relationship
Fig. 4 Overview of application domains. For each application domain (y-axis), we show the number of
occurrences of keywords belonging to its macro-area (x-axis)
we show the application domains with more than 10 index keyword occurrences. We group the keywords into macro areas: for instance, in “Energy” we include keywords like “energy”, “energy conservation”, “energy consumption”, etc., while in “Electric energy” we group keywords such as “electric energy storage”, “electric load dispatching”, “smart grid”, etc. (see Appendix for details). The figure clearly shows that there is a wide variety of application domains, but most of the applications deal with sustainability issues related to energy fields.
Regarding the proposed approaches, we follow the same procedure as previously described for application domains, grouping keywords that refer to the same method. For
example, in “Actor-Critic” we group keywords such as “actor critic”, “advantage actor-
critic (A2C)”, and “soft actor critic”. As we can see in Fig.5, the most widely used RL
method for dealing with environmental sustainability in different application domains is
a state-of-the-art model-free algorithm, namely Q-Learning (Watkins 1989). It is impor-
tant to note that, in the image, we show only RL approaches, but there are also index
keywords related to other approaches like “genetic algorithm”, “simulated annealing”,
etc.
Moreover, we perform a bibliometric analysis on the co-occurrence of index keywords by using VOSviewer (Perianes-Rodriguez et al. 2016). A co-occurrence means that 2 keywords occur in the same work. After a data cleaning process, VOSviewer detects 17 clusters by considering keywords with at least 3 occurrences. In Fig. 6, each cluster corresponds to a color, and each element of the cluster, namely a keyword, is depicted by a circle in the cluster color. For instance, the blue cluster is made of several blue nodes, each of which contains a keyword (e.g., electric vehicles, charging (batteries)) belonging to the cluster. The size of the circle and of the circle label depends on the number of occurrences of the related keyword. Lines between items depict co-occurrences of keywords in a paper. Each cluster groups keywords identifying an application domain and/or the approaches used to tackle related environmental sustainability issues. For example, cluster 1 (red, on the top-right) is somewhat related to traffic signal control for traffic management through the application of control strategies. Cluster 2 (green, on the left) is related to power management and energy harvesting in wireless sensor networks.
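The co-occurrence counting behind such a map can be sketched as follows; the keyword sets are invented examples, and VOSviewer performs this step (together with the clustering) internally.

```python
# Sketch of keyword co-occurrence counting: two keywords co-occur when they
# are attached to the same paper. The keyword sets below are invented examples.
from collections import Counter
from itertools import combinations

papers_keywords = [
    {"reinforcement learning", "smart grid", "energy"},
    {"reinforcement learning", "electric vehicles", "charging (batteries)"},
    {"smart grid", "energy", "demand response"},
]

cooccurrence = Counter()
for keywords in papers_keywords:
    for pair in combinations(sorted(keywords), 2):   # unordered keyword pairs
        cooccurrence[pair] += 1

# Keep only pairs whose keywords reach a minimum number of occurrences,
# mirroring the minimum-occurrence threshold (3) used for the clusters of Fig. 6;
# a threshold of 2 is used here only because the toy data set is tiny.
occurrences = Counter(k for kws in papers_keywords for k in kws)
frequent_pairs = {p: c for p, c in cooccurrence.items()
                  if all(occurrences[k] >= 2 for k in p)}
print(frequent_pairs)
```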
Fig. 5 Overview of RL methods used. For each RL method (y-axis), we show the number of occurrences of
corresponding keywords (x-axis)
5.2 Analysis ofthe35 main papers
In this section, we focus on the 35 papers chosen using the selection criteria for the main
papers (see Sect.4). First, we provide a high-level analysis of the application domain and
the RL approaches used to address environmental sustainability issues (research question
RQ4). Then, we give an overview of the RL problem formalization (i.e., type of state/
action space, type of transition model, type of RL method) (research question RQ5). Sub-
sequently, we analyze the performance measures used to evaluate the results (research
question RQ6). Finally, we evaluate the main challenges faced (research question RQ7).
5.2.1 RQ4: What were theapplication domains andthemethodologies used?
In Table2, we summarize the application domains and the RL approaches used in the
selected works. First, we group the 35 main works according to their main related appli-
cation domains (first column). It is important to note that application domains may
overlap consequently, we report all application domains common to all papers in the
same group. Then, we indicate for each paper (second column) the method behind the
proposed technique (third column). The selected papers tackle environmental sustain-
ability issues in the application domains shown in Fig.4. In particular, the most relevant
Fig. 6 Bibliometrics analysis on the co-occurrence of index keywords. Each color outlines a cluster, and
each circle of the cluster color represents a keyword, while edges represent co-occurrences of keywords in
the same work
Table 2  Technical information about the selected works. The "Method" column reports the RL methodology used. Columns: Paper | Method | State | Action | Tr. model | Dataset

Electric vehicles, batteries, energy
  Sultanuddin et al. (2023) | DDQN | N/A | D | St. | S
  Zhang et al. (2021a) | MA AC | N/A | D* | Det.* | R
IoT
  Ajao and Apeh (2023) | Q-Learning | D* | D* | St. | S
  Zhang et al. (2021b) | Adaptive regression | D* | C* | Det.* | S
  Han et al. (2020) | MAB | N/A | D* | St. | S
Water resources
  Emamjomehzadeh et al. (2023) | Q-Learning | N/A | N/A | N/A | S
  Skardi et al. (2020) | Q-Learning | N/A | N/A | Det., St. | R + S
Emissions/pollution
  Chen et al. (2021) | MADDPG | C | C | N/A | S
  Huo et al. (2023) | Multi-agent Q-learning | D* | D | N/A | S
Agriculture
  Elavarasan and Durairaj Vincent (2020) | DRQN | D* | N/A | N/A | R + S
Data, energy
  Shaw et al. (2022) | Q-Learning, SARSA | D* | D* | St. | S (R based)
  Venkataswamy et al. (2023) | AC | D* | D | N/A | S, R
Urban traffic and transportation
  Ounoughi et al. (2022) | DQN | N/A | D* | N/A | R
  Alizadeh Shabestray and Abdulhai (2019) | DQN | D* | D | St. | S
  Aziz et al. (2018) | RMART | D* | D* | St.* | S
  Khalid et al. (2023) | DQN | D* | D* | Det.* | S
Buildings, energy
  Kathirgamanathan et al. (2021) | SAC | C | N/A | N/A | R + S
  De Gracia et al. (2015) | SARSA(λ) | D* | D* | Det. | S
Manufacturing
  Wang and Wang (2022) | Policy network | D* | D* | Det.* | S*
  Leng et al. (2021) | Q-Learning* | N/A | D* | St.* | S (R based)
Mobile and wireless communication, energy, renewable/sustainable energies
  Liu et al. (2021) | DDPG | C | C | N/A | S
  Miozzo et al. (2015) | Distributed Q-Learning | D* | D | N/A | S
  Miozzo et al. (2017) | Distributed Q-Learning | D* | D | N/A | S
  Giri and Majumder (2022) | Deep Q-Learning | C* | D* | St.* | S
Mobile and wireless communication
  Al-Jawad et al. (2021) | Q-Learning | D* | D* | N/A | S
Energy, electric energy
  Sheikhi et al. (2016) | Q-Learning | N/A | N/A | St. | S
  Harrold et al. (2022) | Rainbow DQN | C* | D | N/A | R + S
  Jendoubi and Bouffard (2022) | MADDPG | N/A | C | St.* | S*
Energy
  Gao et al. (2023) | Q-Learning | N/A | D* | St.* | R
Wireless sensor network, energy, renewable/sustainable energies
  Hsu et al. (2014) | Q-Learning* | D* | D* | N/A | S
  Chen et al. (2016) | Q-Learning | D* | D* | St.* | R + S
  Feng et al. (2023) | DDPG | C | C | St.* | S
Autonomous vehicles
  Bouhamed et al. (2020) | DDPG / Q-Learning | C* / D* | C / D* | N/A | S
  Sacco et al. (2021) | AC | C* | D* | N/A | R + S
  Gu et al. (2023) | Policy gradient | C* | D | St.* | S

In the "State" and "Action" columns, "C" and "D" indicate continuous and discrete state/action spaces, respectively. In the "Tr. model" column, "Det." and "St." denote deterministic and stochastic transition models, respectively. In the "Dataset" column, "S" and "R" indicate synthetic and real-world datasets, respectively.
application domain corresponds to the macro area of “Energy”. Indeed, it involves more than half of the papers in the table, considering both the works in which it represents the main application domain and those in which it is related to the main application domain.
In Table 2, we also show that 16 out of the 35 selected papers use DRL approaches such as Deep Q-Network (DQN) (Mnih et al. 2015) and Double Deep Q-Network (DDQN) (van Hasselt et al. 2016), and another 2 rely on DRL techniques in multi-agent contexts, such as Multi-Agent Deep Deterministic Policy Gradient (MADDPG) (Lowe et al. 2017). RL techniques are used by 10 articles, where the most used method is Q-Learning, and 7 apply RL approaches in a multi-agent context. Finally, only 1 paper adopts a Genetic Algorithm-based RL (GARL) approach.
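For reference, the following sketch shows the Double DQN target computation (van Hasselt et al. 2016) that distinguishes DDQN from standard DQN; the network sizes and tensors are illustrative assumptions and do not reproduce any specific surveyed implementation.

```python
# Hedged sketch of the Double DQN target: the online network selects the next
# action, the target network evaluates it, reducing Q-value overestimation
# compared with standard DQN. Shapes and data below are illustrative only.
import torch
import torch.nn as nn

online_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(online_net.state_dict())

gamma = 0.99
state, action = torch.randn(32, 4), torch.randint(0, 2, (32, 1))
reward, next_state = torch.randn(32, 1), torch.randn(32, 4)
done = torch.zeros(32, 1)

with torch.no_grad():
    next_action = online_net(next_state).argmax(dim=1, keepdim=True)   # action selection
    next_q = target_net(next_state).gather(1, next_action)             # action evaluation
    target = reward + gamma * (1 - done) * next_q

q = online_net(state).gather(1, action)
loss = nn.functional.mse_loss(q, target)   # minimized by the usual optimizer step
```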
5.2.2 RQ5: How wastheRL problem formalized (i.e., type ofstate/action space, type
oftransition model, andtype ofdataset used)?
This research question takes a technical point of view, which we think may help practitioners get an overview of the environments considered by the authors in developing the proposed methods. In Table 2, we summarize the information related to problem formulation that we found in the selected papers. For each paper, we point out whether the state and action spaces are continuous or discrete and whether the transition model is deterministic or stochastic. Finally, we provide information on the dataset used in the experiments, outlining whether real-world or synthetic data are used. It is important to note that not all papers explicitly provide this information. Thus, we mark with “*” all information inferred from reading the article. On the other hand, “N/A” specifies that the information available was not enough to infer the required data.
In the selected papers, most of the state and action spaces are discrete. Indeed, only 9 approaches use a continuous state space (“State” column of Table 2), and 6 use a continuous action space (“Action” column of Table 2). Regarding the transition model, we can see that the model is stochastic in most cases where the information is available. In De Gracia et al. (2015), “Det.” and “St.” are both reported because the authors test the proposed methodology on both models. Finally, in the last column, we note that most of the experiments are performed on synthetic datasets. In fact, only 9 papers use real-world data, 6 of which combine them with synthetic data (“R + S” in the table), while 2 others use the real data to generate larger datasets from them (“S (R based)” in the table). Only Venkataswamy et al. (2023) test the proposed approach on both dataset types (“S, R” in the table).
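To make the notation of Table 2 concrete, the sketch below shows how a continuous state space ("C"), a discrete action space ("D"), and a stochastic transition model ("St.") are typically declared when an environment is implemented; the Gymnasium API and the toy charging dynamics are assumptions of this example, not elements of the surveyed papers.

```python
# Illustrative sketch of the discrete/continuous distinction of Table 2,
# expressed with the Gymnasium API (an assumption of this example).
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ChargingEnv(gym.Env):
    """Toy EV-charging-style environment: continuous state, discrete action."""

    def __init__(self):
        # Continuous state ("C"): e.g. battery level and electricity price in [0, 1].
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(2,), dtype=np.float32)
        # Discrete action ("D"): 0 = idle, 1 = charge, 2 = discharge.
        self.action_space = spaces.Discrete(3)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.state = self.observation_space.sample()
        return self.state, {}

    def step(self, action):
        # Stochastic transition ("St."): both state components evolve with noise.
        noise = self.np_random.normal(0.0, 0.05, size=2).astype(np.float32)
        self.state = np.clip(self.state + noise, 0.0, 1.0)
        reward = float(action == 1) * (1.0 - self.state[1])   # cheap charging is rewarded
        return self.state, reward, False, False, {}
```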
5.2.3 RQ6: Which evaluation metrics were used toassess theperformance?
This research question aims to provide an overview of the authors’ performance meas-
ure choices to evaluate the proposed approaches in the 35 selected papers. In the second
column of Table3, we report information about the metrics found in the articles, which
are also indicated in the in-depth analysis of each paper in Section5.3. As we can see in
Table3, the performance measures vary widely depending on the application domain and
the goal of the method proposed in each paper. For example, reward is used as a metric in
9 articles but is computed differently depending on the context. Concerning electric vehi-
cles, in Sultanuddin etal. (2023), the reward corresponds to a penalty function consider-
ing the cost of charging and a departure incentive. Instead, in wastewater treatment plants
(WWTPs), Chen etal. (2021) use a reward function that takes into account the operational
Table 3  We summarize the performance measures used by the authors to evaluate the proposed approaches in the second column and the challenges they address in the third column. Columns: Paper | Performance measures | Challenges

Sultanuddin et al. (2023) | Reward, voltage levels, load curves, charging/discharging curves | Grid overload prevention, driving pattern uncertainty, dimensionality
Zhang et al. (2021a) | MCWT, MCP, TSF, CFR | Dimensionality, coordination and cooperation among agents, charging requests competitiveness, joint optimization of multiple optimization objectives
Ajao and Apeh (2023) | Detection accuracy, recall, precision, specificity, F-measure | Security threat to sustainability functionality
Zhang et al. (2021b) | Operational logs, power wastage, power requirements, average failure ratio | Energy requirements management, smart power allocation
Han et al. (2020) | Number of ready nodes, throughput | Spatial uncertainty
Emamjomehzadeh et al. (2023) | Water table level, nitrate concentration, energy usage, GHG emissions | WEF nexus modeling and management for an urban area integrated water resource management
Skardi et al. (2020) | Water and wastewater allocation, water and groundwater level, nitrate concentration | Social attachments quantification and consideration, cooperation among agents
Chen et al. (2021) | Reward, Q-values, influents, inflow rate, DO and dosage values, energy consumption, cost, EP, and GHG emissions | WWTP impact optimization
Huo et al. (2023) | Productivity, operational mistakes, GHG emissions, queuing time | Operational randomness and uncertainties
Elavarasan and Durairaj Vincent (2020) | R2, MAE, MSE, RMSE, MedAE, MSLE, MAPE, PDF, explained variance score, accuracy | Mapping between raw data and crop yield values, effectiveness dependence on extracted features quality
Shaw et al. (2022) | Energy consumption, SLAV, number of migrations, ESV | Energy awareness, slow convergence to optimal policy
Venkataswamy et al. (2023) | Monetary job value | Intermittent power, environment nonuniformity, system design and configuration effects, learning and improving available heuristic policies
Ounoughi et al. (2022) | MSE, MAE, noise levels, CO2 emission, fuel consumption | Sustainability and proactivity integration
Alizadeh Shabestray and Abdulhai (2019) | Average of intersection travel time, queue time, and network travel time, weighted average intersection person travel time | Regular and transit vehicles consideration
Aziz et al. (2018) | Average delay, stopped delay, number of stops, and network-wide delay, GHG emissions | Traffic congestion information sharing, reward function dynamic adaptation
Khalid et al. (2023) | Execution time, reward, path planned, distance | Quality of experience assurance, optimal order user serving, distance minimization
Kathirgamanathan et al. (2021) | Energy purchased and cost, discomfort, reward, temperature, power demand | Robustness, scalability, lack of well-established environments
De Gracia et al. (2015) | Electrical energy saving | Energy saving maximization, thermal energy storage optimization
Wang and Wang (2022) | Overall Nondominated Vector Generation, C Metric, Hypervolume, D1_R | Energy awareness with simultaneous makespan and energy minimization
Leng et al. (2021) | MSE, MSLE, RMSE, R2, unit and total profit, acceptance rate | Demand uncertainty, order customization
Liu et al. (2021) | Achievable rate, transmission power | Effective communication, IRS phase optimization
Miozzo et al. (2015) | Throughput gain, traffic drop rate, energy efficiency, energy efficient improvement, traffic demand, harvested energy, battery level, policy, normalized load at the macro bs, total energy spent, average load, average cell load for the macro bs, battery outage, Jain's fairness index | Energy harvesting
Miozzo et al. (2017) | Switch-off rate, battery level, excess energy |
Giri and Majumder (2022) | Reward, capacity, network lifetime, average delay | Dimensionality, efficient use of collected energy for QoS
Al-Jawad et al. (2021) | Throughput, packet loss, rejected flows, PSNR, MOS | Sustainable QoS
Sheikhi et al. (2016) | Storage charge level, operational cost, primary energy involved | Energy system parameter variability or stochasticity, smart grid architecture issues
Harrold et al. (2022) | MAPE, energy cost savings, relative savings, episodic rewards, and value distribution | Energy arbitrage, renewables usage improvement, limited data availability
Jendoubi and Bouffard (2022) | Annual total cost, daily operation cost, PV production, aggregated demand, power to be charged/discharged, generator provided power, provider delivered electricity, PAR | Energy dispatch
Gao et al. (2023) | Working and non-working energy consumption, the ratio between working and non-working energy consumption | Energy consumption minimization
Hsu et al. (2014) | RBE, EDC, OTRT, ToD achievability | Simultaneous achievement of throughput on demand satisfaction and power consumption reduction
Chen et al. (2016) | Nodes potential energy, network lifetime, area coverage ratio, number of residual alive nodes versus the network lifetime, recharging cycle | Simultaneous area coverage and energy balancing
Feng et al. (2023) | Distribution of the ratio between the expected per-slot harvested energy and the MS-to-sink distance within the network, moving trajectories and steps, reward, actor-loss, battery level, accuracy, convergence, training time | Lack of energy-related information, energy harvesting and data transmission trade-off
Bouhamed et al. (2020) | Path followed, reward, battery level, completion time of the tour against the ground unit transmission power | Limited battery capacity, obstacle awareness
Sacco et al. (2021) | Task completion time, utility, average node-antenna distance, energy consumption, task completion time against the average computing workload, CDF and utility evolution | Task completion time reduction, energy efficiency
Gu et al. (2023) | Energy loss, collision loss, reward, lane changes | Efficient response to environmental observations

The papers are grouped according to the main related application domains, as in Table 2. Performance measures written in italics are used in both Miozzo et al. (2015) and Miozzo et al. (2017)
cost, consisting of multiple components, such as energy cost and biogas price, and several
indicators, like energy consumed by the aeration and sludge treatment processes and GHG
emissions. Another performance measure common to multiple application domains is, for
example, energy consumption. Indeed, it is used in contexts such as water resources man-
agement(Emamjomehzadeh etal. 2023), WWTPs(Chen etal. 2021), data centers(Shaw
etal. 2022), and AVs(Sacco etal. 2021). Even approaches related to the same application
domain may differ in terms of performance measures depending on their objective. Con-
sidering, for example, the water resources context, both Emamjomehzadeh et al. (2023)
and Skardi etal. (2020) evaluate their proposed approaches using resource level and nitrate
concentration. However, in (Emamjomehzadeh etal. 2023), energy consumption and GHG
emissions are also considered, while in (Skardi etal. 2020), resource allocation is used.
5.2.4 RQ7: What were thechallenges addressed?
This research question aims to offer an overview of the issues that the authors have tackled
within the 35 selected papers. In the third column of Table3, we summarize information
about the challenges addressed in the articles, which are also indicated in the in-depth anal-
ysis of each paper in Section5.3. As with the performance measures, we can see in Table3
that the challenges faced vary greatly depending on the application context and the goal
of the method proposed in each paper. As an example, considering the domain of electric
vehicles, Sultanuddin etal. (2023) address several challenges, like avoiding network energy
overload at peak times, considering the uncertainty of driving patterns, and managing
large state spaces. On the other hand, in addition to the challenge related to dimensionality,
Zhang etal. (2021a) also address issues related to coordination and collaboration among
agents, the competitiveness of charging demands, and joint optimization of multiple objec-
tive functions. However, although not explicitly stated by the authors, the challenge that
unites these papers is the development of approaches capable of adapting to changes in a
dynamic environment and managing the uncertainty associated with the environment that,
in many cases, arises from the use of renewable resource sources, which have a stochastic
and intermittent nature whose management adds further complexity to the problem.
5.3 Analysis ofsingle papers (grouped byapplication domain)
In this section, we group the 35 main papers by application domain and analyze each paper, answering research questions RQ4, RQ5, RQ6, and RQ7. This provides the reader interested in a specific application domain with detailed knowledge of the main features of
these papers. Notice that in answering RQ5, we use the information available in Table 2
and report in the text a “(*)” for all information inferred from reading the article.
5.3.1 Electric vehicles, batteries, energy
The transportation system is characterized by an increasing presence of EVs due to their eco-friendly features. Sultanuddin et al. (2023) propose a DDQN-based approach to provide a smart, scalable charging strategy for EV fleets that ensures all cars have sufficient charge for their trips without exceeding the maximum energy threshold of the power grid. The charging management system combines information on the current state of the network and vehicles with historical data, and it is able to schedule charging at least 24
hours in advance. In developing the proposed approach, an environment with discrete actions and a stochastic transition model is considered. The experimental evaluation is performed on a synthetic dataset using as metrics the reward, the voltage levels, the load curves, and the charging/discharging curves. The rapid growth in the popularity of EVs subjects the power grid infrastructure to challenges, such as preventing grid overload at peak times. Moreover, the authors address issues related to driving pattern uncertainty and the handling of large state spaces.
Zhang et al. (2021a) propose a framework for charging recommendations based on
MARL, called Multi-Agent Spatio-Temporal Reinforcement Learning (MASTER). By lev-
eraging a multi-agent actor-critic framework with Centralized Training and Decentralized
Execution (CTDE), the proposed approach increases the collaboration and cooperation
among agents, and it can make use of information about possible future charging competi-
tion through the use of a delayed access strategy. The framework is further extended with multiple critics to address multi-objective optimization. MASTER works in environ-
ments characterized by discrete actions (*) and a deterministic transition model (*), and it
has been tested on a real-world dataset. To evaluate its performance, the Mean Charging
Wait Time (MCWT), Mean Charging Price (MCP), Total Saving Fee (TSF), and Charging
Failure Rate (CFR) are used as performance measures. In the development of the proposed
charging recommendation approach, the authors face several challenges such as dealing
with large state and action space, coordination and cooperation among agents in a large-
scale system, potential competitiveness of future charging requests, and the joint optimiza-
tion of multiple optimization objectives.
5.3.2 IoT
Recent years have seen rapid advances in IoT technology enabling the development of
smart services such as smart cities, buildings, and oceans. Regarding smart cities, Ajao
and Apeh (2023) consider the Industrial Internet of Things and present a framework for
edge computing vulnerabilities. Indeed, edge computing security threatens the sustainabil-
ity functionality of urban infrastructure with various attacks, such as Man-in-the-Middle
and denial of service. In particular, to tackle authentication and privacy violation problems,
this work proposes a secure framework modeling in Petri Net, namely Secure Trust-Aware
Philosopher Privacy and Authentication (STAPPA), on which a Distributed Authorization
Algorithm is implemented. Moreover, a GARL approach is developed to optimize the net-
work during learning, detect anomalies, and optimize routing. This work regards an envi-
ronment characterized by discrete state and action spaces (*), and a stochastic transition
model. The authors test the proposed approach on a synthetic dataset and assess the perfor-
mance in anomaly detection and detection accuracy by using the popular detection accu-
racy, recall, precision, specificity, and F-measure. Ajao and Apeh (2023) deal with security
challenges, in particular authentication and privacy violation problems.
Zhang etal. (2021b) propose an IoT-based Smart Green Energy (IoT-SGE) management
system for improving the energy management of power grids allowed by DRL. The pro-
posed approach is able to balance power availability and demand by keeping grid states
steady, thus reducing power wastage. In developing IoT-SGE, the authors consider an environment with discrete states (*), continuous actions (*), and a deterministic transition model (*). The proposed approach has been evaluated on a synthetic dataset using oper-
ational logs, power wastage and requirement, and average failure ratio as metrics. The
authors address an energy sustainability issue, in particular, they aim to manage energy
requirements and allocate smart power systems.
In the context of smart ocean systems, Han etal. (2020) present an analytical model to
evaluate the performance of an Internet of Underwater Things network with energy har-
vesting capabilities. The goal of this work is the maximization of IoT nodes throughput
by optimally selecting the window size. To this aim, the authors propose an RL approach
and leverage the Branch and Bound method to solve the optimization problem by autono-
mously adapting random access parameters through interaction with the network environ-
ment. Considering a realistic scenario, a MARL approach is proposed to deal with the
lack of network information. In this case, random access parameters autonomously adapt
by using a distributed Multi-Armed Bandit (MAB)-based algorithm for each node. The
environment considered in this work is characterized by deterministic actions (*) and a
stochastic transition model. The authors test the proposed approach on a synthetic dataset,
evaluating its performance in channel access regulation in relation to the number of ready
nodes per time slot and throughput. Finally, this work addresses a fairness issue due to
spatial uncertainty in underwater acoustic communication; to address it, the authors formalize an optimization problem that maximizes the throughput of the IoT network nodes.
5.3.3 Water resources
Water resource management is a key aspect of sustainable development and usually does
not include social aspects. Emamjomehzadeh etal. (2023) propose a novel urban water
metabolism model that combines urban metabolism with the Water, Energy, and Food
(WEF)(Radini etal. 2021) nexus and thus it can consider interconnections among water,
energy, food, material, and GHG emissions. Moreover, this work proposes a physical-
behavioral model that relates the proposed approach to a MARL agent-based model nei-
ther fully cooperative nor fully competitive developed using Q-Learning. In this case, the
only technical information available concerns the use of a synthetic dataset. The proposed
approach is evaluated in terms of water table level, nitrate density, energy usage, and GHG
emissions. Considering water resource management challenges related to sustainability, the
authors aim to model and manage the WEF nexus for Integrated Water Resource Manage-
ment in an urban area, taking into account stakeholders’ characteristics.
Skardi etal. (2020) propose, instead, an approach for quantifying and including social
attachments in water and wastewater allocation tasks. This work proposes a paired physi-
cal-behavioral model, and the authors leverage Q-Learning to include social and behavioral
aspects in the decision-making process. Specifically, the authors use the approach proposed by Bazzan et al. (2011) to integrate Social Analysis in Q-Learning, choosing between individual and social behavior through the use of specific reward functions. In developing the proposed method, both a deterministic and a stochastic transition model are considered. Tests
are performed on a dataset that combines real-world and synthetic data, and the perfor-
mance evaluation is conducted considering water and treated wastewater allocation to the
agents, water and groundwater level, and the concentration of nitrates to measure ground-
water quality. Using Social Network Analysis, the authors tackle a key challenge in com-
mon resource management, i.e., the cooperation among agents. Also, they aim to quantify
and include social attachments in water resource management.
5.3.4 Emissions/pollution
The development of WWTPs has a positive impact on environmental protection by reduc-
ing pollution but, at the same time, they consume resources and produce GHG emissions
as well as residual sludge. With this in mind, Chen etal. (2021) propose an approach based
on MADDPG to control Dissolved Oxygen (DO) and chemical dosage at once and improve
sustainability accordingly. Specifically, the proposed approach uses two agents, one to con-
trol DO and one to control chemical dosage. Moreover, the two reward functions are designed based on life cycle cost and on various Life Cycle Assessment mid-point indicators, respectively.
The proposed approach is developed considering an environment with continuous state
and action spaces and tested on a synthetic dataset. To evaluate the training process, the
reward and the Q-values determined by trained critic networks are used as metrics, while
to analyze the variation of the influents and control parameters, the authors leverage the
influents (COD, TN, TP, and NH3-N), inflow rate, DO, and dosage values. Finally, energy consumption, cost, Eutrophication Potential (EP), and GHG emissions are used to assess the impact of the proposed approach. WWTPs have a positive impact on environmental pro-
tection since they reduce contaminants and environmental pollution. However, at the same
time, WWTPs consume resources and produce GHG emissions as well as residual sludge,
thus the authors seek to optimize their impact on environmental sustainability.
Intelligent fleet management is crucial in mitigating direct GHG emissions in open-pit
mining operations. In this context, Huo etal. (2023) propose a MARL-based dispatching
system for reducing GHG emissions. To this aim, this work presents an environment for
haulage simulation that integrates a component for real-time computing of GHG emis-
sions. Then, Q-Learning is leveraged to improve fleet productivity and reduce trucks’ emis-
sions by decreasing their waiting time. In the development of the proposed approach, an
environment characterized by discrete state (*) and action spaces is considered. Tests are
performed on a synthetic dataset and productivity, number of operational mistakes, GHG
emissions, and time spent in queue are used as evaluation metrics. In this work, the authors
tackle operational randomness and uncertainties in fleet management for reducing haul
trucks’ GHG emissions in open-pit mining operations.
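Several of the reviewed works, including the dispatching system above, rely on tabular Q-Learning. As a reference for readers, the following minimal sketch shows the standard update and an ε-greedy action selection; the state and action encodings, as well as the hyperparameters, are hypothetical and are not taken from the cited paper.

```python
import random
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One tabular Q-Learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def epsilon_greedy(Q, s, actions, epsilon=0.1):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

Q = defaultdict(float)  # hypothetical table over (state, action) pairs
```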
5.3.5 Agriculture
In the context of sustainable agriculture, one of the key aspects of food security is crop
yield prediction. Elavarasan and DurairajVincent (2020) tackle this problem by using a
DRL approach, specifically a Deep Recurrent Q-Network (DRQN) (Hausknecht and Stone
2015) model. It consists of a Recurrent Neural Network (RNN) (Rumelhart et al. 1986)
on top of the DQN. The proposed approach sequentially stacks the RNN layers, feeds the
network with pre-trained parameters, and adds a linear layer to map the RNN output into
Q-values. The Q-Learning network builds a crop yield prediction environment as a ‘yield
prediction game’ that leverages both parametric feature combinations and thresholds use-
ful in agricultural production. The authors consider an environment with discrete states
(*) and test their approach on a dataset combining real-world and synthetic data, evaluat-
ing the performance by using the following metrics: Determination Coefficient (R2), Mean
Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE),
Median Absolute Error (MedAE), Mean Squared Logarithmic Error (MSLE), Mean Abso-
lute Percentage Error (MAPE), Probability Density function (PDF), Explained Variance
Score, and accuracy. Finally, Elavarasan and DurairajVincent (2020) address issues related
to the application of Deep Learning (DL) methods to crop yield prediction for increas-
ing food production. Specifically, the authors tackle the incapability of DL approaches
to directly map, linearly or non-linearly, raw data with crop yield values and the strong
dependence of their effectiveness on the quality of features extracted from data.
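To make the DRQN architecture concrete, the following minimal PyTorch sketch stacks recurrent layers and a linear head that maps the last hidden state to Q-values, as described above; the layer sizes and names are illustrative assumptions rather than the cited model.

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """Minimal Deep Recurrent Q-Network: recurrent layers followed by a
    linear head mapping the hidden state to one Q-value per action."""
    def __init__(self, n_features, n_actions, hidden_size=64):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden_size, num_layers=2, batch_first=True)
        self.q_head = nn.Linear(hidden_size, n_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, n_features)
        out, hidden = self.rnn(obs_seq, hidden)
        q_values = self.q_head(out[:, -1])  # Q-values from the last time step
        return q_values, hidden
```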
5.3.6 Data, energy
Data centers are among the largest consumers of energy. Shaw et al. (2022) propose an RL-based Virtual Machine (VM) consolidation algorithm named Advanced Reinforcement Learning Consolidation Agent (ARLCA), whose aim consists of simultaneously improving energy efficiency and service delivery guarantees. In this work, a global resource manager
constantly monitors the state of the system and identifies hosts that may be overloaded due
to the resource demand change over time. The proposed approach rebalances the VM dis-
tribution and avoids the rapid overloading of hosts while ensuring efficient operation. This
work presents two implementations of ARLCA based on two RL methods, i.e., Q-Learning
and SARSA, and it tests two different approaches to balance the exploration-exploitation tradeoff, namely ε-greedy and softmax. Finally, the authors leverage the Potential Based Reward Shaping (Ng et al. 1999) technique to include domain knowledge in the reward
structure and speed up the learning process. ARLCA works in an environment with dis-
crete state and action spaces (*) and a stochastic transition model. Its performance is evalu-
ated on a synthetic dataset (real-world-based). To evaluate the proposed VM consolidation
algorithms, energy consumption, Service Level Agreement Violations (SLAV), number
of migrations, and Energy Service Level Agreement Violations (ESV) are used as perfor-
mance measures. In this work, the authors tackle a key challenge for cloud computing ser-
vices, namely energy awareness. Further, they also face the slow convergence to the opti-
mal policy of conventional RL algorithms.
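Potential-based reward shaping, used in ARLCA to inject domain knowledge, adds the term γΦ(s') − Φ(s) to the environment reward and provably preserves the optimal policy (Ng et al. 1999). A minimal sketch follows; the potential function shown is a hypothetical example and is not the one used in the cited work.

```python
def shaped_reward(reward, state, next_state, potential, gamma=0.99):
    """Potential-based reward shaping (Ng et al. 1999):
    r' = r + gamma * Phi(s') - Phi(s), which preserves the optimal policy."""
    return reward + gamma * potential(next_state) - potential(state)

def potential(state):
    """Hypothetical potential: prefer states with fewer overloaded hosts."""
    return -float(state["num_overloaded_hosts"])
```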
Renewable energy Aware Resource management (RARE), a DRL approach for job
scheduling in a green data center, is presented in Venkataswamy etal. (2023). This work
proposes a customized actor-critic method in which the authors use three Deep Neural Net-
works (DNNs): the encoder, the actor, and the critic. The encoder summarizes information
about the state of the environment into a compact representation of it, used as input for
both the actor and the critic. The actor returns the probability of choosing each schedul-
ing action, while the critic estimates, for each action, the total expected value achieved by
starting in the current state and applying a specific action. Moreover, since DRL requires
a significant amount of interactions with the environment to explore it and then to adapt
a randomly initialized DNN policy, the authors leverage an offline learning algorithm,
namely, Behavioral Cloning, to learn a policy based on existing heuristic policy data used
as prior experience. In particular, the actor network is trained to imitate the action selec-
tion process of data within the replay memory. In developing RARE, it is considered an
environment characterized by discrete states (*) and actions and tested the performance
on both synthetic and real-world datasets by using the total job economic value as metrics.
In this work, the authors tackle several challenges related to the application of RL tech-
niques to the context of green datacenters. The first issue relates to the environment. The
dynamic of green data center environments makes the scheduling process difficult as it has
to consider and manage the intermittent and variable nature of renewable energy sources.
Moreover, the lack of uniformity in the environments makes it challenging to compare dif-
ferent approaches. The second challenge is the lack of discussion regarding the effect of system design choices (e.g., the planning horizon size), which does not help to
clarify the reasons for the better performance of the RL scheduler over heuristic policies.
Furthermore, the authors discuss the tendency to employ RL schedulers as black boxes, without considering different configurations, such as the size of the neural network, which could lead to improved performance. Finally, the last challenge highlights that existing RL schedulers do
not focus on learning and improving available heuristic policies.
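The offline pre-training step described above amounts to behavioral cloning: the actor is fitted, by supervised learning, to state-action pairs generated by an existing heuristic scheduling policy. The following minimal PyTorch sketch illustrates this idea; the dataset and network objects are hypothetical and do not reproduce the cited implementation.

```python
import torch
import torch.nn as nn

def pretrain_actor(actor, heuristic_dataset, epochs=10, lr=1e-3):
    """Behavioral cloning: fit the actor to (state, action) pairs collected
    from a heuristic policy before any RL fine-tuning."""
    opt = torch.optim.Adam(actor.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        # hypothetical iterable of (state batch, integer action-index batch)
        for states, heuristic_actions in heuristic_dataset:
            logits = actor(states)                 # (batch, n_actions)
            loss = loss_fn(logits, heuristic_actions)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return actor
```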
5.3.7 Urban traffic and transportation
In recent years, the traffic congestion level has increased significantly with a conse-
quent negative impact on the environment. Ounoughi etal. (2022) present EcoLight,
an approach for controlling traffic signals based on DRL, which aims to reduce noise
pollution, CO2 emissions, and fuel consumption. The proposed method combines the
Sequence to Sequence Long Short Term Memory (SeqtoSeq-LSTM) prediction model
with the DQN algorithm. SeqtoSeq-LSTM is used to forecast the traffic noise level that
is part of the traffic information given as input to the DQN to determine the action to
perform. EcoLight works in environments with discrete actions (*) and has been tested
on a real-world dataset. The performance of EcoLight is evaluated by using the MSE,
MAE, noise levels, CO2 emission, and fuel consumption as metrics. In this work, the
authors tackle the issue of developing a control method that considers not only mobility
and current traffic conditions but also integrates sustainability and proactivity.
On the other hand, Alizadeh Shabestray and Abdulhai (2019) present Multimodal
iNtelligent Deep (MiND), a DRL-based traffic signal controller that considers both
regular vehicles and public transit and leverages sensors’ information, like occupancy,
position, and speed, to optimize the flow of people through an intersection by using
DQN. In developing MiND, the authors regard an environment characterized by discrete
states (*) and actions and a stochastic transition model, and test the proposed approach
on a synthetic dataset. To assess the performance of the proposed approach the follow-
ing measures are used: average intersection travel time, average in queue time, average
network travel time, and weighted average intersection person travel time. In this work,
the authors have to fulfill some important requirements to develop a real-time adaptive
traffic signal controller. Indeed, the controller has to consider both regular vehicles and
public transit traffic, and leverage sensors’ data on vehicle speed, position, and occu-
pancy; moreover, the decision-making process should be fast.
Aziz et al. (2018) present an RL-based approach to control traffic signals in con-
nected vehicle environments for reducing travel delays and GHG emissions. The pro-
posed method, the R-Markov Average Reward Technique (RMART), leverages conges-
tion information sharing among neighbor signal controllers and a multi-reward structure
that can dynamically adapt the reward function according to the level of congestion at
intersections. The considered environment presents discrete state (*) and action spaces
(*) and a stochastic (*) transition model. The authors test RMART on a synthetic data-
set and to evaluate its performance they use as metrics the average delay, stopped delay,
number of stops, and network-wide delay, while to assess the performance from a sus-
tainability point of view they leverage GHG emissions, i.e., CO, CO2, NOx, VOC,
PM10. Finally, this work deals with the traffic signal control problem to reduce travel
delays and GHG emissions by addressing the following issues: the sharing of conges-
tion information among neighbor signal controllers and the dynamic adaptation of the
reward function on the base of congestion level.
Reducing the number of drivers who commute in search of car parking in urban centers
has a positive impact on environmental sustainability. In this context, Khalid etal. (2023)
propose a Long-range Autonomous Valet Parking framework that optimizes the path plan-
ning of AVs to minimize distance while serving all users by picking them up and dropping
them off at their required spots. The authors propose two learning-based solutions: Double-
Layer Ant Colony Optimization (DL-ACO) and DQN-based algorithms. DL-ACO can be
applied in new or unfamiliar environments, while DQN can be used in familiar environ-
ments to make efficient and fast decisions since it is pre-trainable. The DL-ACO approach
determines the most efficient path between pairs of spots and subsequently establishes the
optimal order in which users can be served. To deal with dynamic environments, a DQN-based algorithm is proposed in which the agent learns to solve the task by interacting with the environment, using an experience replay memory and a target network. The
proposed techniques aim to improve the carpool and parking experience while reducing
the congestion rate. In this work, the environment considered is characterized by discrete
states (*) and actions (*), and a deterministic (*) transition model. The proposed approach
is tested on a synthetic dataset and execution time, reward, path planned, and distance are
used as performance measures. In this work, the authors deal with path planning problems
in dynamic environments while ensuring the quality of experience for each user, optimiz-
ing the order of user pick-up and drop-off, and finally minimizing the overall distance.
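Experience replay and a target network, mentioned above, are standard components of DQN-style training. For reference, a minimal replay buffer could look as follows; the capacity and batch size are illustrative choices, not values from the cited work.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience replay memory for off-policy training."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```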
5.3.8 Buildings
Buildings are interesting from a DR and Demand Side Management point of view. In this
context, Kathirgamanathan etal. (2021) leverage a DRL algorithm, namely Soft Actor-
Critic (SAC), to automate energy management and harness energy flexibility
by controlling the cooling set point in a commercial building environment. In developing
the proposed approach, the authors regard an environment with continuous states, and they
evaluate the performance on a dataset that combines real-world and synthetic data using as
evaluation metrics the energy purchased, energy cost, discomfort, total reward, tempera-
ture evolution, and power demand. Kathirgamanathan et al. (2021) tackle the application of DRL methods to automate DR without the need for a specific building model, as well as their robustness to different operating environments and their scalability. Moreover, the authors point
out that the lack of well-established environments makes it challenging to compare RL
algorithms over different buildings.
De Gracia etal. (2015) instead consider Thermal Energy Storage, and in particular
latent heat, techniques to maximize energy savings by leveraging a Ventilated Double
Skin Facade (VDSF) with Phase Change Material (PCM) used as a cold energy storage
system. By using an RL approach, i.e., SARSA(
𝜆
), the authors control the VDSF to opti-
mally schedule the solidification of PCM through mechanical ventilation during nighttime
and the stored cold release into the indoor environment at peak demand time, considering
weather and indoor conditions. The environment considered in this work presents discrete
states (*), discrete actions (*), and a deterministic transition model. Moreover, the pro-
posed approach is evaluated on a synthetic dataset considering electrical energy savings.
This work aims to maximize energy savings by considering both the benefit of VDSF and
the energy used in the solidification process. Therefore, it is crucial to determine the best
time for the charging process to solidify the PCM and store coldness.
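For readers unfamiliar with SARSA(λ), the following minimal sketch shows one tabular update with accumulating eligibility traces, the mechanism that propagates the temporal-difference error to recently visited state-action pairs; the hyperparameters and state encoding are illustrative and are not those of the cited study.

```python
from collections import defaultdict

def sarsa_lambda_update(Q, E, s, a, r, s_next, a_next,
                        alpha=0.1, gamma=0.95, lam=0.9):
    """One SARSA(lambda) step with accumulating eligibility traces:
    all recently visited (state, action) pairs share the TD error."""
    td_error = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    E[(s, a)] += 1.0                      # accumulate the trace for the current pair
    for key in list(E.keys()):
        Q[key] += alpha * td_error * E[key]
        E[key] *= gamma * lam             # decay all traces

Q = defaultdict(float)  # action-value table
E = defaultdict(float)  # eligibility traces
```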
5.3.9 Manufacturing
Manufacturing industries are among the largest energy consumers, so it is crucial to
develop approaches that make them more energy efficient. In this regard, Wang and Wang
(2022) tackle the Energy-Aware Distributed Hybrid Flow-shop Scheduling Problem (EAD-
HFSP). The goal of this work consists of simultaneously minimizing two conflicting
objectives: makespan and Total Energy Consumption. To this aim, the authors formulate
a mixed-integer linear programming model of the EADHFSP and combine a Coopera-
tive Memetic Algorithm with an RL-based agent to solve the problem. The authors com-
bine two heuristics to initialize the population with various solutions and finally propose
an improvement scheme in which solutions are refined by using the appropriate operator
determined by a policy agent, while the solution selection is performed through the use of
a decomposition strategy for balancing convergence and diversity. This work considers an environment characterized by discrete state (*) and action (*) spaces and a deterministic (*) transition model, and the performance of the presented approach is tested on
a synthetic dataset considering the Overall Nondominated Vector Generation, C Metric,
Hypervolume, and D1_R as evaluation metrics. This work addresses the EADHFSP with the
minimization of makespan and total energy consumption, a challenging problem due to the
simultaneous optimization of two conflicting objectives.
Leng etal. (2021) focus on Printed Circuit Board (PCB) manufacturing and propose a
Loosely-Coupled Deep Reinforcement Learning (LCDRL) model for energy-efficient order
acceptance decisions. The authors leverage DL, specifically a Convolutional Neural Net-
work(LeCun 1989), to obtain an accurate prediction of the production cost, makespan,
and carbon consumption of each order by considering historical order labeled data. Then,
the proposed approach combines the forecasted data with order features to decide whether
to accept the order and determine the optimal acceptance sequence by using a reinforce-
ment learning approach based on Q-Learning. The authors regard an environment with
discrete actions (*) and a stochastic transition model, and they test the proposed method
on a synthetic dataset (real-world-based). As performance measures, the metrics MSE,
MSLE, RMSE, and R2 are used to evaluate the prediction accuracy of LCDRL, while the
performance of the approach is assessed in terms of unit profit, total profit, and accept-
ance rate. This work tackles the problem of order acceptance in PCB manufacturing to
achieve energy efficiency, reduce carbon emissions, and improve material usage. Two criti-
cal aspects of PCB manufacturing are demand uncertainty and order customization, which
can lead to different profits, energy consumption, and carbon emissions. These two factors
have to be considered in production planning under production constraints.
5.3.10 Mobile and wireless communication
Sustainable energy infrastructures need high-quality communication systems to connect
user facilities and power plants and to support information exchange. In this context, Liu
etal. (2021) propose the use of a 6G network and Intelligent Reflective Surface (IRS) tech-
nology to create a wireless networking platform and suggest a DRL method to optimize the
phase shift of IRS and therefore improve the communication quality. Combining the 6G
Network with the IRS technology, the authors provide high-quality coverage while gain-
ing energy-saving benefits. In particular, this work proposes the application of the Deep
Deterministic Policy Gradient (DDPG) (Lillicrap etal. 2016) algorithm to configure the
IRS phase shift for enhancing system coverage. The authors consider an environment
characterized by continuous state and action spaces. The performance of the proposed
approach is assessed on a synthetic dataset using two reflection units as metrics: the achiev-
able rate to measure the service quality and the transmission power. Developing sustaina-
ble energy infrastructure is challenging from several points of view. Liu etal. (2021) tackle
the need for an effective global covering communication system using the IRS technology,
whose phase shift configuration is itself challenging.
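DDPG, used here to tune the IRS phase shifts, trains a critic on a bootstrapped target and an actor along the deterministic policy gradient. The following minimal PyTorch sketch of the two losses is purely illustrative; the network objects and batch format are assumptions, not the cited implementation.

```python
import torch

def ddpg_losses(batch, actor, critic, target_actor, target_critic, gamma=0.99):
    """DDPG (Lillicrap et al. 2016): the critic regresses a bootstrapped target,
    the actor is updated to maximize the critic's value of its own actions."""
    s, a, r, s_next, done = batch            # hypothetical tensors of shape (batch, ...)
    with torch.no_grad():
        a_next = target_actor(s_next)        # deterministic next action
        y = r + gamma * (1.0 - done) * target_critic(s_next, a_next).squeeze(-1)
    critic_loss = ((critic(s, a).squeeze(-1) - y) ** 2).mean()
    actor_loss = -critic(s, actor(s)).mean() # deterministic policy gradient
    return critic_loss, actor_loss
```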
In the context of two-tier urban Heterogeneous Networks (HetNets), Miozzo et al.
(2015) model the Small Cell (SC) network as a decentralized multi-agent system. The
authors’ goal consists of improving system performance and self-sustainability of the SCs
in terms of energy consumption. To this aim, they leverage the distributed Q-Learning
algorithm so that every agent learns an appropriate Radio Resource Management (RRM)
policy. Miozzo etal. (2015) is extended in Miozzo etal. (2017). Here, it is proposed to train
offline the algorithm to compute Q-values with which initialize the Q-tables of the SCs
that will be used in the online method. In both approaches, the environment presents dis-
crete states (*) and actions (*) and the dataset used is synthetic. In both works, the authors
evaluate the proposed approaches in terms of network performance by using the through-
put gain and traffic drop rate and their energy performance in terms of energy efficiency
and energy efficiency improvement. Moreover, Miozzo etal. (2015) analyze the behavior
of the HetNet considering traffic demand, harvested energy, battery level, policy, and nor-
malized load at the macro Base Station (BS). Also, the authors consider as performance
metrics the total amount of energy the system spent, the average load, the average cell load
for the macro BS battery outage, and Jain’s fairness index to assess the Quality of Service
(QoS) improvement. Finally, Miozzo etal. (2017) assess the computed policy by leverag-
ing the switch-off rate as a performance measure and use the battery level to analyze the
convergence of the online algorithm and evaluate the excess energy over storage capacity.
Both works address the problem of introducing energy harvesting into the computation of
sleeping strategies to achieve energy efficiency. This is challenging due to the irregular and
intermittent nature of renewable energies.
Giri and Majumder (2022), instead, leverage a Deep Q-Learning algorithm for optimiz-
ing resource allocation in energy-harvested cognitive radio networks, where primary users
networks share channel resources with secondary users and nodes can harvest energy from
the environment, such as solar or wind. The proposed approach addresses the dynamic allo-
cation of resources to achieve optimal network and throughput capacity, considering QoS,
energy constraints, and interference limitations. Moreover, the authors utilize both linear
and non-linear energy-harvested models, proposing a novel reward function that incorpo-
rates the non-linear model. The proposed approach works in environments character-
ized by continuous states (*), discrete actions (*), and a stochastic transition model (*) and
it has been tested on a synthetic dataset by using reward, capacity, network lifetime, and
average delay as performance measures. Giri and Majumder (2022) address the limitations
of Q-Learning-based allocation methods, thus allowing the approach to deal with high-dimensional problems, improve convergence performance, and efficiently harness the collected energy to meet the network’s QoS requirements.
Internet traffic has increased in recent years, and in the development of next-genera-
tion networks, it is important to address the QoS issue sustainably. In this context, Al-
Jawad etal. (2021) propose an RL-based algorithm to solve routing problems in a Software
Defined Network (SDN) environment, named Reinforcement lEarning-based Dynamic
rOuting (REDO). Indeed, the proposed approach leverages Q-Learning to handle traffic
flows by determining the most appropriate routing strategy among a set of conventional
routing algorithms with the aim of maximizing the flows meeting the Service Level Agreement
in terms of throughput, packet loss, and rejection rate. In developing REDO, the authors con-
sider an environment with discrete state (*) and action spaces (*). The performance of the
proposed approach is evaluated on a synthetic dataset in terms of throughput, packet loss,
rejected flows, PSNR, and Mean Opinion Score (MOS). In the development of next-gen-
eration networks like SDN, Al-Jawad etal. (2021) address the problem of providing QoS
sustainably through the solution of a traffic flow routing problem.
5.3.11 Electric energy
One way to increase environmental sustainability is to improve the energy efficiency of
smart hubs. To this aim, Sheikhi et al. (2016) present a new Smart Energy Hub framework that models distinct energy infrastructures in a unified way. The authors’
goal consists of optimizing the electrical and natural gas consumption of a residential
customer through the use of Q-Learning. Moreover, to improve and support information
management among users and utility service providers, the proposed framework lever-
ages Cloud Computing based systems. In this case, the only technical information available concerns the use of a stochastic transition model and a synthetic dataset. To evaluate
the performance of the proposed approach, the metrics used are the storage charge level,
the operational cost, and the primary energy involved. As regards dynamic load manage-
ment in smart hubs, the authors tackle two issues. The first one is related to energy system
parameters which are often assumed to be constant but can vary with time or be stochastic
in practice. The second one is, instead, related to the conventional smart grid architecture,
which has several reported issues, including exposure to cyber-attacks, single failure prob-
lems, limited memory and storage capacity in the energy management system, and difficul-
ties in implementing real-time early warning systems due to limited energy and bandwidth
resources.
In the context of smart energy networks, Harrold etal. (2022) consider a microgrid
environment and leverage DRL to control a battery for energy arbitrage and increased
use of renewable energies, namely solar and wind energy. Specifically, the authors apply
the Rainbow Deep Q-Network (Hessel et al. 2018) algorithm and add predicted values
for demand, Renewable Energy Source (RES), and energy price to agents’ information
by leveraging an Artificial Neural Network. In this work, the environment considered is
characterized by continuous states (*) and discrete actions. The authors test the proposed
approach on a dataset that considers both real-world and synthetic data and assess the pre-
diction accuracy using the MAPE. Also, they evaluate the performance of the proposed
approach through energy cost savings, relative savings, episodic rewards, and value distri-
bution. This work tackles the problem of controlling an Energy Storage System in a micro-
grid with its demand, RES, and dynamic energy pricing to perform energy arbitrage and
improve the use of RES leading to reduced energy cost. Finally, the authors point out the
limited availability of data that requires an efficient algorithm training procedure.
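A recurring pattern in this and similar works is augmenting the agent's observation with forecasts. The sketch below illustrates the idea of concatenating predicted demand, renewable generation, and price to the raw state; the forecaster interface is a hypothetical assumption and does not correspond to the ANN used in the cited paper.

```python
import numpy as np

def augmented_observation(raw_state, forecaster, horizon=12):
    """Concatenate the raw microgrid state with predicted demand, renewable
    generation, and price over the next `horizon` steps (hypothetical forecaster)."""
    demand_hat, res_hat, price_hat = forecaster.predict(raw_state, horizon)
    return np.concatenate([raw_state, demand_hat, res_hat, price_hat])
```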
A key aspect of sustainability and cost-effectiveness in grid operation is optimal energy
dispatch. Jendoubi and Bouffard (2022) address a multi-dimensional power dispatch prob-
lem within a power system by leveraging MARL, specifically the MADDPG algorithm.
The proposed control framework performs CTDE to improve the coordination among
dispatchable units without communication needed, and thus it mitigates data privacy and
communication issues. In developing the presented approach, the authors consider an environment with continuous actions and a stochastic transition model (*). The dataset used to
evaluate the performance is synthetic and the proposed method is evaluated in terms of the
annual total cost, variation of daily operation cost, photovoltaics (PV) production, aggre-
gated demand, amount of power to be charged/discharged, amount of power provided by
a diesel generator, amount of electricity delivered by the electricity provider, the differ-
ence in the amount of electricity delivered by the electricity provider between two consecu-
tive time steps and Peak-to-Average Ratio (PAR). The authors address the energy dispatch
aspects related to the development of distributed energy resources control strategies in grid
operation to simultaneously reduce costs and delays and allow local coordination among
energy resources.
5.3.12 Energy
In recent years, international trade and container handling at port terminals have increased
greatly. Improving sustainability in port operations closely relates to the energy consump-
tion at Automated Container Terminals, where Automatic Stacking Cranes (ASCs) are
used to load, unload, and pile containers. In this context, Gao etal. (2023) propose a digi-
tal twin-based approach for container yard management. Specifically, this work focuses on
determining the optimal allocation of container tasks and scheduling of ASCs to reduce
the energy consumption of ASCs while maintaining efficient loading and unloading opera-
tions. The proposed approach leverages a virtual container yard to simulate the operating
plan and a mixed integer programming model to optimize the scheduling problem, taking
into account the energy consumption. Finally, the authors use the Q-Learning algorithm
to determine the optimal scheduling plan and minimize energy consumption. The envi-
ronment considered in this work presents discrete actions (*) and a stochastic (*) transi-
tion model. The performance of the proposed approach is evaluated on a real-world dataset
using working and non-working energy consumption and the ratio between them as met-
rics. To improve the sustainability of port operations, in this work the problem of optimiz-
ing container yard operations to minimize energy consumption is addressed. Indeed, sev-
eral factors can introduce randomness and uncertainty into these operations, and incorrect
distribution of tasks can lead to suboptimal utilization of ASCs.
5.3.13 Wireless sensor network
Regarding embedded systems powered by a renewable energy source like an Energy Har-
vesting Wireless Sensor Node (EHWSN), Hsu etal. (2014) present a method called Rein-
forcement Learning-based throughput on-demand provisioning dynamic power management
(RLTDPM). By leveraging the Q-Learning algorithm, the proposed approach allows the
EHWSN to adapt the operational duty cycle to satisfy both the energy neutrality condition
and the throughput on-demand (ToD) requirement, ensuring perpetual operation. In devel-
oping RLTDPM, the authors regard an environment characterized by discrete state (*) and
actions (*) and evaluate the performance on a synthetic dataset by considering the residual
battery energy (RBE), exercised duty cycle (EDC), offset to the required ToD (OTRT), and
ToD achievability. In this work, the authors address the problem of simultaneously achiev-
ing two mutually conflicting goals, i.e., satisfying ToD and reducing power consumption.
Energy-Harvesting Wireless Sensor Networks (WSNs) are widely used in energy-con-
strained operation problems. In particular, Chen etal. (2016) focus on Solar-Powered Wire-
less Sensor Networks (SPWSNs) and present an RL-based Sleep Scheduling for Coverage
algorithm to improve the sustainability of SPWSN’s operations. The proposed approach
leverages a precedence operator in the group formation algorithm to prioritize sensors in
sparsely covered areas ensuring the desired coverage distribution. Then, the authors pro-
pose a multi-sensor cooperation Q-Learning group model to properly choose nodes’ working
modes by leveraging the developed learning and action selection strategies. The whole group
learns the sleep schedule by changing the role of the active node. The environment consid-
ered in this work presents discrete state (*) and action (*) spaces and a stochastic transition
model. The proposed approach is tested on a dataset that combines real-world and synthetic
data, and its performance is evaluated in terms of energy balancing between group members
(using the potential energy of nodes as a metric), network lifetime, area coverage ratio, num-
ber of residual alive nodes versus the network lifetime, and the recharging cycle. In this work,
the authors tackle a sleep scheduling problem to simultaneously achieve the desired area cov-
erage and energy balance between group nodes to extend the network lifetime.
On the other hand, Feng etal. (2023) propose an RL-based approach to maximize data
throughput in self-sustainable WSNs. The authors consider a Mobile Sensor (MS) that col-
lects and transmits data to a fixed sink while moving within the network and harvesting
energy from the environment. By leveraging DDPG, the MS can determine the optimal
trajectory to optimize the EH performance and data transmission dealing with unknown
energy supply dynamics. The environment considered in this work is characterized by
continuous states and actions, and a stochastic transition model (*). Moreover, the perfor-
mance of the proposed approach is assessed on a synthetic dataset considering as evalua-
tion metrics the distribution of the ratio between the expected per-slot harvested energy and
the MS-to-sink distance within the network, moving trajectories of the MS, reward, actor-
loss, battery level, accuracy, moving steps, convergence, and training time. The authors
tackle two main challenges concerning the MS’s trajectory optimization to maximize data
throughput. The first relates to the lack of energy-related information such as the energy
sources’ placement, future energy harvesting potential, and statistical parameters like the
average energy harvesting rate, which makes the problem challenging. The second consists
of the tradeoff between energy harvesting and data transmission. Indeed, moving closer
to energy sources allows the MS to increase the energy harvesting amount. However, this
may lead to decreasing data transmission power due to a possible increase in the distance
between the MS and the sink.
5.3.14 Autonomous vehicles
In the last decade, Unmanned Aerial Vehicles (UAVs), i.e., drones, have been used in vari-
ous scenarios such as rapid disaster response, Search-And-Rescue, environmental monitor-
ing, etc., where humans are unable to operate in a timely and efficient manner, for example,
due to the presence of physical obstacles. Bouhamed etal. (2020) consider the application
of UAVs as mobile data collection units in delay-tolerant WSNs. The authors propose a
mechanism that exploits two RL techniques, namely DDPG and Q-Learning algorithms.
The proposed approach uses DDPG to determine the best trajectory for a UAV to reach the
target destination while avoiding obstacles in the environment. Q-Learning, on the other
hand, is used to schedule the best order of visiting nodes to minimize the time needed to
collect data within a predefined time limit. In this work, the environment presents con-
tinuous state (*) and action spaces for the DDPG-based part of the approach while discrete
states (*) and actions (*) for the Q-Learning-based one. The proposed mechanism is tested
on a synthetic dataset and to evaluate its obstacle avoidance and scheduling performance,
the authors analyze the path followed by the UAV, the reward collected, the UAV’s bat-
tery level, and the completion time of the tour against the ground unit transmission power.
This work addresses issues related to the limited battery capacity of UAVs and challenges
related to navigating in obstacle-prone environments to enable communication between the
UAV and low transmission power sensors.
Sacco etal. (2021) propose a MARL approach based on the actor-critic framework to
tackle task offloading problems from UAV swarms in edge computing environments to
simultaneously reduce task completion time and improve energy efficiency. The proposed
approach determines a distributed decision strategy through the collaboration among the
system’s mobile nodes that share information about the overall system state. This informa-
tion is then used by the agents to decide whether to compute a task locally or offload it to the
edge cloud; in the latter case, the proposed technique chooses the best transmission technol-
ogy among Wi-Fi access points and mobile network. In developing the proposed approach,
the environment considered presents continuous states (*) and discrete actions (*), and the
dataset used for testing combines real-world and synthetic data. The performance of the pre-
sented techniques is assessed in terms of task completion time and utility against a varying
number of agents, and average node-antenna distance. Then, the authors evaluate the energy
consumption necessary to complete the task by varying the average node-antenna distance
and computing workload and assess the task completion time against the average comput-
ing workload. In addition, the cumulative distribution function (CDF) and utility evolution
through episodes are considered to analyze the variability of performance among nodes and
convergence performance, respectively. Finally, in this work, the authors tackle the problem
of reducing the time necessary for task completion of UAV swarms by leveraging task off-
loading to the edge cloud while improving energy efficiency.
In the context of autonomous driving, Gu et al. (2023) tackle the application of RL
methods focusing on energy-saving and environmentally friendly driving strategies within
a cooperative adaptive cruise control platoon. More precisely, the goal of this work consists
of training platoon member vehicles to react effectively when the leading vehicle faces a
severe collision. The authors leverage the Policy Gradient algorithm for training an RL
agent to minimize the energy consumption in inter-vehicle communication for decision-
making while avoiding collisions or minimizing the resulting damage. To this aim, two
different loss functions are used, i.e., collision loss and energy loss. Moreover, a specific reward function is used to both ensure the vehicle’s safety and account for the fuel consumption resulting from the action performed by the vehicle. This work considers an envi-
ronment characterized by continuous states (*), discrete actions, and a stochastic transition
model (*). The proposed approach has been tested on a synthetic dataset using energy loss,
collision loss, reward, and lane changes as metrics. A key challenge of green autonomous
driving addressed in this work is the development of effective strategies that can respond to
environmental observations by automatically generating appropriate control signals.
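The combination of a policy-gradient update with safety and energy terms can be sketched as follows: a REINFORCE-style loss over an episode and a hypothetical per-step reward trading off a collision penalty against energy use. Both the weighting and the function names are illustrative assumptions, not the formulation of the cited work.

```python
import torch

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """REINFORCE objective: weight log action probabilities by normalized
    discounted returns-to-go and maximize the expected return."""
    returns, g = [], 0.0
    for r in reversed(rewards):          # discounted returns-to-go
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # simple baseline
    return -(torch.stack(log_probs) * returns).sum()

def step_reward(collision_penalty, energy_used, w_safety=1.0, w_energy=0.1):
    """Hypothetical combined reward balancing safety and energy consumption."""
    return -w_safety * collision_penalty - w_energy * energy_used
```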
6 Discussion
The analysis of the literature performed in this work shows that most of the works about RL for
environmental sustainability concern the energy application domain, followed by urban traffic
and transportation. The main RL technique used in the reviewed manuscripts is Q-Learning.
Concerning the 35 selected articles, we observe that energy-related issues involve most of the
papers, and about half of them leverage DRL approaches, such as DQN and DDQN. In devel-
oping the proposed methods, the authors mainly consider domains with discrete state and action
spaces, and stochastic transition models, using synthetic datasets to evaluate the performance.
Problems related to environmental sustainability were traditionally tackled with
optimization techniques in which the concept of adaptability has to be introduced
explicitly. In contrast, one of the strengths of RL is its natural way of dealing with
adaptability to changing or different environments, a crucial feature in environmen-
tal sustainability problems since in this context the agent has to handle variations in
operating conditions due to, for example, changes in resource availability or weather
conditions. For instance, Chen et al. (2016) introduce an RL-based Sleep Scheduling
for Coverage (RLSSC) approach to ensure sustainable time-slotted operations in solar-
powered wireless sensor networks. This algorithm is compared to LEACH (Heinzel-
man et al. 2002), a high-energy-efficient hierarchical routing protocol, wherein the
node chosen to be active in the current round is ineligible for selection in the subse-
quent round, and a random algorithm that randomly determines active nodes within a
group. Among the various aspects considered, a crucial criterion for evaluating algo-
rithm effectiveness lies in maintaining equilibrium in energy levels, as significant dis-
parities in current residual energy arise when a node receives an energy supplement.
RLSSC initially exhibits fluctuations but eventually converges through iterative learn-
ing, exhibiting slight oscillations up and down in response to varying solar strength
throughout the day. Moreover, the proposed approach demonstrates real-time energy
balancing among sensor nodes. In contrast, non-RL-based methods lack the capacity
to adapt to the dynamic environment. Another aspect to consider is network lifetime,
where RLSSC excels in adapting to uncertainties associated with harvesting time and
the amount of acquired energy. This adaptability enables RLSSC to dynamically adjust
its scheme in real-time, effectively extending the overall network lifetime. This is only
one of several examples showing that RL can provide a strong advantage in solving
problems related to environmental sustainability because of its natural capability to
deal with uncertainty and adaptation in sequential decision-making.
However, we identify several open problems in the application of RL techniques to
environmental sustainability. These concern scalability, data efficiency, and the necessity
to deal with large data volumes, often posing cost challenges. In future developments, it
is crucial to improve pre-training methods that allow the generation of initial policies by
simulation and leverage knowledge acquired by solving a related task. RL methods are
also sensitive to the reward function; therefore, reward engineering is important to avoid a nega-
tive impact on performance. Moreover, in dealing with environmental sustainability prob-
lems in specific contexts like IoT, it is of particular importance to consider the presence of
computational limitations and to optimize the computational complexity of the method.
Finally, we note that most of the approaches involve single-agent systems. Extending the
proposed approaches to the multi-agent context would allow the cooperative computa-
tion of optimal policies accounting for common performance objectives to improve shared
resources management and environmental sustainability.
7 Conclusions
This review focuses on the application of RL techniques to address environmental sus-
tainability challenges, a topic of increasing interest in the international scientific commu-
nity. We have examined several contexts where RL techniques have been recently used to
enhance environmental sustainability, offering practitioners insights into state-of-the-art
methodologies across diverse application domains. RL has found practical application in
environmental sustainability because the inherent uncertainty of this domain poses chal-
lenges to strategy learning and adaptation that can be naturally tackled by RL. The review of
the literature performed in this survey has identified the most common applications of RL in
environmental sustainability and the most popular methods used to address these challenges
in the last two decades. We have first provided a quantitative analysis of the state-of-the-
art related to the application of RL in environmental sustainability and then analyzed the
use of these techniques, focusing on sustainability concerns. In particular, we have provided
an overview of the application domains of the proposed RL techniques and the approaches
used for environmental sustainability issues. Moreover, we have narrowed our attention to
35 selected papers and provided technical information on the formalization of the RL prob-
lem, the performance measures adopted for evaluation, and the challenges addressed.
Keywords mapping into macro areas
See Tables4, 5, 6, 7 , 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20.
Table 4 Keywords grouped in the “Energy” macro area
Energy
Energy Energy aware Energy conservation
Energy consumption Energy distributions Energy efficiency
Energy harvesting Energy management Energy management systems
Energy resource Energy source Energy storage
Energy sustainability Energy systems Energy utilization
Distributed energies Dynamic energy Energy allocations
Energy arbitrages Energy availability Energy consumption balances
Energy flexibility Energy infrastructures Energy management strategy
Energy market Energy neutrality Energy optimal scheduling
Energy routing Energy savings Energy storage system
Energy theft Energy transfer Energy usage
Energy use Energy-awareness Energy-constrained networks
Energy-saving strategies Harvesting energies Intelligent energy management
Non-renewable energy Total energy consumption Wireless energy transfers
Energy trading
Table 5 Keywords grouped in the “Electric energy” macro area
Electric energy
Electric energy storage Electric load dispatching
Electric power transmission networks Electric power utilization
Dynamic energy managements Dynamic loads
Dynamic power management Electric load management
Electric loads Electric power system control
Electric power transmission Electrical networks
Electricity demands Electricity loss
Micro grid Smart grid
Smart power grids Electricity grids
Grid resilience Power grids
Electricity grids Smart grid communications
Grid resilience Distributed power generation
Power management Power supply
Low power electronics Dynamic power management
Electric power system control Electric power transmission
inductive power transmission Low power
Low power and lossy network (lln) Low power networks
Low-power consumption Low-power devices
Power Power allocations
Power dispatch Power grids
Power harvesting Power limitations
Power system simulator for engineering Power system simulators
Power traces Power control
Power transmission systems Reactive power
Reactive power output Power management (telecommunication)
Table 6 Keywords grouped in the “Urban Traffic and Transportation” macro area
Urban traffic and transportation
street traffic control Sustainable mobility Traffic congestion
Traffic emission Traffic flow Traffic light control
Traffic management Traffic signal control Traffic signals
Adaptive traffic signal control Intelligent traffic controls Optimal traffic control
Traffic Traffic conditions Traffic environment
Traffic light Traffic management strategies Traffic scheduling
Urban traffic Highway administration Motor transportation
Transportation Transportation system Urban transportation
Bus bunching Bus transportation Sustainable transportation
Intelligent transportation Transport systems Transportation network
Transportation planning
Table 7 Keywords grouped in the “Water resources” macro area
Water resources
Aquifer Ground water Wastewater
Wastewater treatment Water management Water quality
Water resources Water resources management Water supply
Water treatment Surface water Sustainable wastewater treatments
Waste water management Waste water recycling Wastewater treatment plant
Water metabolism Water purification Water resources systems
Water treatment plants Groundwater resources Water level
Water quantity Watersheds
Table 8 Keywords grouped in the “Renewable/sustainable energies” macro area
Renewable/sustainable energies
Renewable energies Renewable energy resources Renewable energy source
Renewable resource Renewables Alternative energy
Solar energy Green energy Smart renewable energy
Use of renewable energies Sustainable energy Solar power generation
Renewable power generation Tidal power Wind power
Hydroelectric power plants Hydropower
Table 9 Keywords grouped in the “Emissions/Pollution” macro area
Emissions/Pollution
Carbon emission Carbon footprint Emission control
Gas emissions Greenhouse gas Greenhouse gas emissions
Acoustic noise Air pollution monitoring Atmospheric pollution
Carbon abatement strategy Carbon sequestration Co2 emissions
Greenhouse emissions Greenhouse gas emission reduction Groundwater pollution
Noise pollution Pollution control Vehicular emission
Water pollution Air quality
Table 10 Keywords grouped in the “Mobile and Wireless communication” macro area
Mobile and wireless communication
5G mobile communication systems Mobile telecommunication systems
6g mobile communication Mobile communications
Base stations Small cells
Wireless communication links Wireless communications
Wireless telecommunication systems Sustainable wireless communication network
Wireless communications networks Communication network
Heterogeneous networks Wireless powered communication network
Table 11 Keywords grouped in the “Data” macro area
Data
Data acquisition Data handling Data transfer
Digital storage Big data Data aggregation
Data aggregation and fusion Data analytics Data distribution
Data logger Data mining Data sensing
Data-driven approach Data-driven design Database technology
Datacenter Distributed database Next generation data centers
Data transfer Data-communication Real-time data
Table 12 Keywords grouped in the “Wireless sensor network” macro area
Wireless sensor network
Wireless sensor network Wireless sensor node
Heterogeneous wireless sensor networks Solar-powered wireless sensor networks
Rechargeable sensor networks Wireless smart sensors
Sensor nodes Integrating sensors
Sensor networks Wireless smart sensors
Sensor networks Smart sensors
Adaptive sensor selections Mobile sensors
Sensor Sensor payloads
Sleep scheduling
Table 13 Keywords grouped in the “Autonomous vehicles” macro area
Autonomous vehicles
Autonomous driving Autonomous navigation
Autonomous vehicles Unmanned aerial vehicles (uav)
Autonomous unmanned aerial vehicles Uav networks
Unmanned aerial vehicle Uav positioning
Autonomous vehicle control Autonomous ship
Auto-navigation Automated vehicles
Table 14 Keywords grouped in the “Batteries” macro area
Batteries
Battery energy storage systems Battery management systems Battery storage
Charging (batteries) Secondary batteries Battery operation
Electric batteries Battery capacity Lead acid batteries
Lithium batteries Residual battery
Table 15 Keywords grouped in the “Manufacturing” macro area
Manufacturing
Manufacture Manufacturing
Production control Supply chains
Sustainable manufacturing Distributed manufacturing systems
Manufacturing environments Large scale manufacturing systems
Manufacturing industries Manufacturing sector
Printed circuit board manufac-
turing
Process manufacturing
Sustainable manufacturing engineering and resource-efficient
production
Database search results
In this section, we report the results of the search performed on the two databases used, namely Scopus and Web of Science. In the following table, for each paper, we indicate the authors, title, source title, and publication year, together with the database in which a corresponding record is present. More specifically, in the “Database” column we use the letters “S” and “W” to indicate that the paper is indexed in Scopus and in Web of Science, respectively, while “S, W” indicates that the paper is present in both databases.
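To make the “Database” labels concrete, the sketch below shows one possible way to merge the Scopus and Web of Science exports and tag each record with “S”, “W”, or “S, W”. It is a minimal Python illustration, not the pipeline used for this survey: the record fields and the normalized-title matching key are assumptions made only for the example.

def normalize_title(title: str) -> str:
    # Lowercase and collapse whitespace so near-identical titles match.
    return " ".join(title.lower().split())

def tag_databases(scopus_records, wos_records):
    # Return (record, tag) pairs with tag in {"S", "W", "S, W"}.
    merged = {}
    for rec in scopus_records:
        merged[normalize_title(rec["title"])] = (rec, "S")
    for rec in wos_records:
        key = normalize_title(rec["title"])
        if key in merged:
            merged[key] = (merged[key][0], "S, W")  # record found in both databases
        else:
            merged[key] = (rec, "W")
    return list(merged.values())

# Toy usage with illustrative records (not taken from the table below):
scopus = [{"title": "Example Paper A", "year": 2023}]
wos = [{"title": "example paper a", "year": 2023},
       {"title": "Example Paper B", "year": 2022}]
for rec, tag in tag_databases(scopus, wos):
    print(rec["title"], rec["year"], tag)

In practice, matching on titles alone can miss records whose titles differ slightly between the two databases, so a DOI-based key would be a more robust choice when available.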
Table 16 Keywords grouped in the “Electric vehicles” macro area
Charging station, Electric vehicle charging, Electric vehicle charging station, Electric vehicles, Electric vehicle charging station recommendation, E mobilities, Fuel cell hybrid electric vehicles, Vehicle-to-grid
Table 17 Keywords grouped in the “IoT” macro area
Internet of things, Internet of underwater things, Green internet of thing, Industrial internet of thing
Table 18 Keywords grouped in the “Agriculture” macro area
Agricultural robots, Agriculture, Crops, Agricultural productions, Agricultural products, Crop productivity, Smart agricultures, Sustainable agricultural, Sustainable agricultural system, Sustainable agriculture
Table 19 Keywords grouped in the “Vehicles” macro area
Intelligent vehicle highway systems, Vehicles, Transit vehicles, Vehicle dispatch, Vehicle fleets
Table 20 Keywords grouped in the “Buildings” macro area
Building, Building coverage ratios, Building energy, Building energy flexibility, Building energy management systems, Building energy managements, Building stocks, College buildings, Intelligent buildings, Office buildings
Authors Title Source title Year Database
Sultanuddin S.J., Vibin R.,
Rajesh Kumar A., Behera
N.R., Pasha M.J., Baseer
K.K.
Development of improved
reinforcement learning
smart charging strategy
for electric vehicle fleet
Journal of Energy Storage 2023 S, W
Ajao L.A., Apeh S.T. Secure edge computing vul-
nerabilities in smart cities
sustainability using petri
net and genetic algorithm-
based reinforcement
learning
Intelligent Systems with
Applications
2023 S
Wang J., Sun L. Robust dynamic bus
control: a distributional
multi-agent reinforcement
learning approach
IEEE Transactions on
Intelligent Transportation
Systems
2023 S, W
Szoke L., Aradi S., Bécsi T. Traffic signal control with
successor feature-based
deep reinforcement learn-
ing agent
Electronics (Switzerland) 2023 S, W
Ali M.Y., Alsaeedi A., Shah
S.A.A., Yafooz W.M.S.,
Malik A.W.
Energy efficient data dis-
semination for large-scale
smart farming using
reinforcement learning
Electronics (Switzerland) 2023 S, W
Yao R., Hu Y., Varga L. Applications of agent-based
methods in multi-energy
systems-a systematic
literature review
Energies 2023 S, W
Kazemeini A., Swei O. Identifying environmentally
sustainable pavement
management strategies
via deep reinforcement
learning
Journal of Cleaner Produc-
tion
2023 S, W
Emamjomehzadeh O., Kera-
chian R., Emami-Skardi
M.J., Momeni M.
Combining urban metabo-
lism and reinforcement
learning concepts for sus-
tainable water resources
management: A nexus
approach
Journal of Environmental
Management
2023 S
Charef N., Ben Mnaouer A.,
Aloqaily M., Bouachir O.,
Guizani M.
Artificial intelligence
implication on energy
sustainability in Internet
of Things: A survey
Information Processing and
Management
2023 S, W
Savazzi S., Rampa V.,
Kianoush S., Bennis M.
An energy and carbon
footprint analysis of
distributed and federated
learning
IEEE Transactions on
Green Communications
and Networking
2023 S, W
Naseer F., Khan M.N.,
Altalbe A.
Telepresence robot with
DRL assisted delay com-
pensation in iot-enabled
sustainable healthcare
environment
Sustainability (Switzerland) 2023 S
Kolat M., Kővári B., Bécsi
T., Aradi S.
Multi-Agent reinforcement
learning for traffic signal
control: a cooperative
approach
Sustainability (Switzerland) 2023 S, W
Sivamayil K., Rajasekar E.,
Aljafari B., Nikolovski
S., Vairavasundaram S.,
Vairavasundaram I.
A systematic study on rein-
forcement learning based
applications
Energies 2023 S, W
Khalid M., Wang L., Wang
K., Aslam N., Pan C.,
Cao Y.
Deep reinforcement
learning-based long-range
autonomous valet parking
for smart cities
Sustainable Cities and
Society
2023 S
Gao Y., Chang D., Chen
C.-H.
A digital twin-based
approach for optimiz-
ing operation energy
consumption at automated
container terminals
Journal of Cleaner Produc-
tion
2023 S
Badakhshan S., Jacob R.A.,
Li B., Zhang J.
Reinforcement learning for
intentional islanding in
resilient power transmis-
sion systems
2023 IEEE Texas Power
and Energy Conference,
TPEC 2023
2023 S
Zhang W., Valencia A.,
Chang N.
Fingerprint networked
reinforcement learning
via multiagent modeling
for improving decision
making in an urban food-
energy-water nexus
IEEE Transactions on
Systems, Man, and Cyber-
netics: Systems
2023 S, W
Li C., Bai L., Yao L., Waller
S.T., Liu W.
A bibliometric analysis and
review on reinforcement
learning for transportation
applications
Transportmetrica B 2023 S, W
Venkataswamy V., Grigsby
J., Grimshaw A., Qi Y.
RARE: renewable energy
aware resource manage-
ment in datacenters
Lecture Notes in Com-
puter Science (including
subseries Lecture Notes
in Artificial Intelligence
and Lecture Notes in
Bioinformatics)
2023 S, W
No author name available
[Conference Review]
9th International Confer-
ence on Sustainable
Design and Manufactur-
ing, SDM 2022
Smart Innovation, Systems
and Technologies
2023 S
Koch L., Picerno M.,
Badalian K., Lee S.-Y.,
Andert J.
Automated function
development for emission
control with deep rein-
forcement learning
Engineering Applications of
Artificial Intelligence
2023 S, W
Huo D., Sari Y.A., Kealey
R., Zhang Q.
Reinforcement learning-
based fleet dispatching for
greenhouse gas emission
reduction in open-pit min-
ing operations
Resources, Conservation
and Recycling
2023 S
Feng Y., Zhang X., Jia R.,
Lin F., Lu J., Zheng Z.,
Li M.
Intelligent trajectory design
for mobile energy harvest-
ing and data transmission
IEEE Internet of Things
Journal
2023 S, W
Chen M., Li Y., Zhang X.,
Liao R., Wang C., Bi X.
Optimization of river envi-
ronmental management
based on reinforcement
learning algorithm: a case
study of the Yellow River
in China
Environmental Science and
Pollution Research
2023 S, W
Gu Z., Liu Z., Wang Q.,
Mao Q., Shuai Z., Ma Z.
Reinforcement learning-
based approach for
minimizing energy loss of
driving platoon decisions
Sensors 2023 W
Baba-Nalikant M., Syed-
Mohamad S. M., Husin
M. H., Abdullah N. A.,
Saleh M. S. M., Rahim
A. A.
A zero-waste campus
framework: perceptions
and practices of university
campus community in
Malaysia
Recycling 2023 W
Daradkeh M. Lurkers versus contributors:
an empirical investigation
of knowledge contribution
behavior in open innova-
tion communities
Journal of Open Innovation:
Technology, Market, and
Complexity
2022 S
Jendoubi I., Bouffard F. Data-driven sustainable dis-
tributed energy resources’
control based on multi-
agent deep reinforcement
learning
Sustainable Energy, Grids
and Networks
2022 S
Tomin N., Shakirov V.,
Kurbatsky V., Muzychuk
R., Popova E., Sidorov D.,
Kozlov A., Yang D.
A multi-criteria approach to
designing and managing
a renewable energy com-
munity
Renewable Energy 2022 S, W
Adetunji K.E., Hofsajer
I.W., Abu-Mahfouz A.M.,
Cheng L.
A novel dynamic planning
mechanism for allocating
electric vehicle charg-
ing stations considering
distributed generation and
electronic units
Energy Reports 2022 S
Li R., Zhang X., Jiang L.,
Yang Z., Guo W.
An adaptive heuristic algo-
rithm based on reinforce-
ment learning for ship
scheduling optimization
problem
Ocean and Coastal Manage-
ment
2022 S, W
Zhang W., Xie M., Scott C.,
Pan C.
Sparsity-aware intelligent
spatiotemporal data sens-
ing for energy harvesting
IoT system
IEEE Transactions on
Computer-Aided Design
of Integrated Circuits and
Systems
2022 S, W
Yao L., Leng Z., Jiang J.,
Ni F.
Large-scale maintenance
and rehabilitation optimi-
zation for multi-lane high-
way asphalt pavement: a
reinforcement learning
approach
IEEE Transactions on
Intelligent Transportation
Systems
2022 S, W
Adetunji K.E., Hofsajer
I.W., Abu-Mahfouz A.M.,
Cheng L.
An optimization planning
framework for allocat-
ing multiple distributed
energy resources and
electric vehicle charging
stations in distribution
networks
Applied Energy 2022 S, W
Mahmud S., Abbasi A.,
Chakrabortty R.K., Ryan
M.J.
A self-adaptive hyper-
heuristic based multi-
objective optimisation
approach for integrated
supply chain scheduling
problems
Knowledge-Based Systems 2022 S, W
Musaddiq A., Ali R., Kim
S.W., Kim D.-S.
Learning-based resource
management for low-
power and lossy IoT
networks
IEEE Internet of Things
Journal
2022 S
Giri M.K., Majumder S. Deep Q-learning based
optimal resource alloca-
tion method for energy
harvested cognitive radio
networks
Physical Communication 2022 S
Selukar M., Jain P., Kumar
T.
Inventory control of
multiple perishable goods
using deep reinforcement
learning for sustainable
environment
Sustainable Energy Tech-
nologies and Assessments
2022 S, W
No author name available
[Conference Review]
IFAC Workshop on Control
for Smart Cities, CSC
2022 - Proceedings
IFAC-PapersOnLine 2022 S
Alibabaei K., Gaspar P.D.,
Assunção E., Alire-
zazadeh S., Lima T.M.,
Soares V.N.G.J., Caldeira
J.M.L.P.
Comparison of on-policy
deep reinforcement learning
A2C with off-policy
DQN in irrigation optimi-
zation: a case study at a
site in Portugal
Computers 2022 S, W
Raza A., Shah M.A., Khat-
tak H.A., Maple C., Al-
Turjman F., Rauf H.T.
Collaborative multi-agents
in dynamic industrial
internet of things using
deep reinforcement
learning
Environment, Development
and Sustainability
2022 S, W
Shaw R., Howley E., Bar-
rett E.
Applying reinforcement
learning towards automat-
ing energy efficient virtual
machine consolidation in
cloud data centers
Information Systems 2022 S, W
Jurj S.L., Werner T., Grundt
D., Hagemann W., Möhl-
mann E.
Towards safe and sustain-
able autonomous vehicles
using environmentally-
friendly criticality metrics
Sustainability (Switzerland) 2022 S
Oubbati O.S., Atiquzzaman
M., Lim H., Rachedi A.,
Lakas A.
Synchronizing UAV teams
for timely data collec-
tion and energy transfer
by deep reinforcement
learning
IEEE Transactions on
Vehicular Technology
2022 S, W
Xu G., Guo F. Sustainability-oriented
maintenance manage-
ment of highway bridge
networks based on
Q-learning
Sustainable Cities and
Society
2022 S
Wang J.-J., Wang L. A cooperative memetic
algorithm with learning-
based agent for energy-
aware distributed hybrid
flow-shop scheduling
IEEE Transactions on Evo-
lutionary Computation
2022 S, W
Zhang T., Gou Y., Liu J.,
Yang T., Cui J.-H.
UDARMF: An underwater
distributed and adaptive
resource management
framework
IEEE Internet of Things
Journal
2022 S, W
Zhang M., Lu Y., Hu Y.,
Amaitik N., Xu Y.
Dynamic scheduling
method for job-shop
manufacturing systems by
deep reinforcement learn-
ing with proximal policy
optimization
Sustainability (Switzerland) 2022 S, W
Ma Y., Kassler A., Ahmed
B.S., Krakhmalev P.,
Thore A., Toyser A., Lind-
bäck H.
Using deep reinforcement
learning for zero defect
smart forging
Advances in Transdiscipli-
nary Engineering
2022 S
Jang J., Yang H.J. Deep learning-aided user
association and power
control with renewable
energy sources
IEEE Transactions on Com-
munications
2022 S, W
Danassis P., Erden Z.D.,
Faltings B.
Exploiting environmental
signals to enable policy
correlation in large-scale
decentralized systems
Autonomous Agents and
Multi-Agent Systems
2022 S
Manchella K., Haliem M.,
Aggarwal V., Bhargava B.
PassGoodPool: joint
passengers and goods
fleet management with
reinforcement learning
aided pricing, matching,
and route planning
IEEE Transactions on
Intelligent Transportation
Systems
2022 S, W
Yk S., Wu J., Song S. Research on autonomous
driving decision based on
improved deep determin-
istic policy algorithm
SAE Technical Papers 2022 S
Neumann M., Palkovits D.S. Reinforcement learning
approaches for the optimi-
zation of the partial oxida-
tion reaction of methane
Industrial and Engineering
Chemistry Research
2022 S, W
Luo M., Du B., Klemmer
K., Zhu H., Wen H.
Deployment optimization
for shared e-mobility
systems with multi-agent
deep neural search
IEEE Transactions on
Intelligent Transportation
Systems
2022 S, W
Dey S., Saha S., Singh A.K.,
McDonald-Maier K.
SmartNoshWaste: using
blockchain, machine
learning, cloud computing
and QR code to reduce
food waste in decentral-
ized web 3.0 enabled
smart cities
Smart Cities 2022 S
Kim J., Park J., Cho K. Continuous autonomous
ship learning framework
for human policies on
simulation
Applied Sciences (Swit-
zerland)
2022 S, W
Tuli S., Gill S.S., Xu M.,
Garraghan P., Bahsoon R.,
Dustdar S., Sakellariou R.,
Rana O., Buyya R., Casale
G., Jennings N.R.
HUNTER: AI based holistic
resource management
for sustainable cloud
computing
Journal of Systems and
Software
2022 S, W
Wang T., Liu L., Ding T. Optimized sustainable strat-
egy in aerial terrestrial
IoT network
Proceedings - 2022 18th
International Conference
on Mobility, Sensing and
Networking, MSN 2022
2022 S, W
Hoover W., Guerra-Zubiaga
D.A., Banta J., Wandene
K., Key K., Gonzalez-
Badillo G.
Industry 4.0 trends in intel-
ligent manufacturing auto-
mation exploring machine
learning
ASME International
Mechanical Engineering
Congress and Exposition,
Proceedings (IMECE)
2022 S
No author name available
[Conference Review]
Proceedings - 22nd IEEE
International Conference
on Data Mining Work-
shops, ICDMW 2022
IEEE International Confer-
ence on Data Mining
Workshops, ICDMW
2022 S
Heo S., Mayer P., Magno M. Predictive energy-aware
adaptive sampling with
deep reinforcement
learning
ICECS 2022 - 29th IEEE
International Conference
on Electronics, Circuits
and Systems, Proceedings
2022 S, W
No author name available
[Conference Review]
20th international confer-
ence on service-oriented
computing, ICSOC 2022
Lecture Notes in Com-
puter Science (including
subseries Lecture Notes
in Artificial Intelligence
and Lecture Notes in
Bioinformatics)
2022 S
Rampini L., Re Cecconi F. Artificial intelligence
in construction asset
management: a review of
present status, challenges
and future opportunities
Journal of Information
Technology in Construc-
tion
2022 S, W
Wang K., Yang R., Liu C.,
Samarasinghalage T.,
Zang Y.
Extracting electricity
patterns from high-
dimensional data: a com-
parison of K-Means and
DBSCAN algorithms
IOP Conference Series:
Earth and Environmental
Science
2022 S
No author name available
[Conference Review]
Proceedings - 2022 IEEE
international conference
on autonomic comput-
ing and self-organizing
systems companion,
ACSOS-C 2022
Proceedings - 2022 IEEE
International Conference
on Autonomic Comput-
ing and Self-Organizing
Systems Companion,
ACSOS-C 2022
2022 S
Dusparic I. Reinforcement learning for
sustainability: adapting in
large-scale heterogeneous
dynamic environments
Proceedings - 2022 IEEE
International Conference
on Autonomic Comput-
ing and Self-Organizing
Systems Companion,
ACSOS-C 2022
2022 S, W
Amadi K.W., Iyalla I.,
Radhakrishna P., Al Saba
M.T., Waly M.M.
Continuous dynamic drill-
off test whilst drilling
using reinforcement learn-
ing in autonomous rotary
drilling system
Society of Petroleum Engi-
neers - ADIPEC 2022
2022 S
No author name available
[Conference Review]
Proceedings - 2022 IEEE
5th international confer-
ence on artificial intel-
ligence and knowledge
engineering, AIKE 2022
Proceedings - 2022 IEEE
5th International Confer-
ence on Artificial Intel-
ligence and Knowledge
Engineering, AIKE 2022
2022 S
Wu L., Guo S., Liu Y.,
Hong Z., Zhan Y., Xu W.
Sustainable federated learn-
ing with long-term online
VCG Auction Mechanism
Proceedings - International
Conference on Distributed
Computing Systems
2022 S, W
Baumgart U., Burger M. Optimal control of traffic
flow based on reinforce-
ment learning
Communications in Com-
puter and Information
Science
2022 S
Sabet S., Farooq B. Green vehicle routing prob-
lem: state of the art and
future directions
IEEE Access 2022 S, W
No author name available
[Conference Review]
Proceedings of International
Conference on Comput-
ing, Communication,
Security and Intelligent
Systems, IC3SIS 2022
Proceedings of International
Conference on Comput-
ing, Communication,
Security and Intelligent
Systems, IC3SIS 2022
2022 S
Andersen P.-A., Goodwin
M., Granmo O.-C.
CaiRL: a high-performance
reinforcement learning
environment toolkit
IEEE Conference on Com-
putatonal Intelligence and
Games, CIG
2022 S
Korecki M., Helbing D. Analytically guided rein-
forcement learning for
green it and fluent traffic
IEEE Access 2022 S
No author name available
[Conference Review]
3rd international confer-
ence on resources and
environmental research,
ICRER 2021
IOP Conference Series:
Earth and Environmental
Science
2022 S
Ounoughi C., Touibi G.,
Yahia S.B.
EcoLight: eco-friendly traf-
fic signal control driven
by urban noise prediction
Lecture Notes in Com-
puter Science (including
subseries Lecture Notes
in Artificial Intelligence
and Lecture Notes in
Bioinformatics)
2022 S, W
Tang Y., Deng X., Yi L.,
Xia Y., Yang L.T., Tang
A.X.
Collaborative intelligent
confident information cov-
erage node sleep schedul-
ing for 6G-empowered
green IoT
IEEE Transactions on
Green Communications
and Networking
2022 S, W
Paul S., Chowdhury S. A graph-based reinforce-
ment learning framework
for urban air mobility fleet
scheduling
AIAA Aviation 2022 Forum 2022 S
No author name available
[Conference Review]
7th international scientific-
technical conference,
MANUFACTURING
2022
Lecture Notes in Mechani-
cal Engineering
2022 S
No author name available
[Conference Review]
2nd conference of innova-
tive product design and
intelligent manufacturing
system, IPDIMS 2020
Lecture Notes in Mechani-
cal Engineering
2022 S
No author name available
[Conference Review]
7th international scientific-
technical conference,
MANUFACTURING
2022
Lecture Notes in Mechani-
cal Engineering
2022 S
Isufaj R., Sebastia D.A.,
Piera M.A.
Toward conflict resolution
with deep multi-agent
reinforcement learning
Journal of Air Transporta-
tion
2022 S
Yuan M., Pun M., Wang D. Rényi state entropy maxi-
mization for exploration
acceleration in reinforce-
ment learning
IEEE Transactions on Arti-
ficial Intelligence
2022 S
Eriksson K., Ramasamy
S., Zhang X., Wang Z.,
Danielsson F.
Conceptual framework of
scheduling applying dis-
crete event simulation as
an environment for deep
reinforcement learning
Procedia CIRP 2022 S
Zhang W., Liu H., Xiong H.,
Xu T., Wang F., Xin H.,
Wu H.
RLCharge: imitative multi-
agent spatiotemporal
reinforcement learning for
electric vehicle charging
station recommendation
IEEE Transactions on
Knowledge and Data
Engineering
2022 S, W
Zhang W., Zhang J., Xie M.,
Liu T., Wang W., Pan C.
M2M-routing: environmen-
tal adaptive multi-agent
reinforcement learning
based multi-hop routing
policy for self-powered
IoT systems
Proceedings of the 2022
Design, Automation and
Test in Europe Confer-
ence and Exhibition,
DATE 2022
2022 S, W
No author name available
[Conference Review]
7th international scientific-
technical conference,
MANUFACTURING
2022
Lecture Notes in Mechani-
cal Engineering
2022 S
No author name available
[Conference Review]
7th international scientific-
technical conference,
MANUFACTURING
2022
Lecture Notes in Mechani-
cal Engineering
2022 S
Mao Z., Fang Z., Li M.,
Fan Y.
EvadeRL: evading PDF
malware classifiers with
deep reinforcement
learning
Security and Communica-
tion Networks
2022 S
Lin B., Duan J., Han M.,
Cai L.X.
Decentralized reinforcement
learning-based access
control for energy sustain-
able underwater acoustic
sub-network of MWCN
Wireless Networks (United
Kingdom)
2022 S
No author name available
[Conference Review]
7th international scientific-
technical conference,
MANUFACTURING
2022
Lecture Notes in Mechani-
cal Engineering
2022 S
Maree C., Omlin C.W. Balancing profit, risk, and
sustainability for portfolio
management
2022 IEEE Symposium on
Computational Intelli-
gence for Financial Engi-
neering and Economics,
CIFEr 2022 - Proceedings
2022 S, W
Lee J., Sun Y.G., Sim I.,
Kim S.H., Kim D.I., Kim
J.Y.
Non-technical loss detection
using deep reinforcement
learning for feature cost
efficiency and imbalanced
dataset
IEEE Access 2022 S, W
Liu Y., Yang M., Guo Z. Reinforcement learning
based optimal decision
making towards product
lifecycle sustainability
International Journal of
Computer Integrated
Manufacturing
2022 S, W
Alanne K., Sierla S. An overview of machine
learning applications for
smart buildings
Sustainable Cities and
Society
2022 S
Samuel O., Javaid N.,
Alghamdi T.A., Kumar N.
Towards sustainable smart
cities: a secure and scal-
able trading system for
residential homes using
blockchain and artificial
intelligence
Sustainable Cities and
Society
2022 S, W
Harrold D.J.B., Cao J.,
Fan Z.
Data-driven battery opera-
tion for energy arbitrage
using rainbow deep
reinforcement learning
Energy 2022 S, W
No author name available
[Conference Review]
Sustainable smart cities and
territories international
conference, SSCT 2021
Lecture Notes in Networks
and Systems
2022 S
Dhiman S., Lallotra B., Replacing of steel with
bamboo as reinforcement
with addition of sisal fiber
2022 W
Bekdas G., Yucel M.,
Nigdeli S. M.
Generation of eco-friendly
design for post-tensioned
axially symmetric rein-
forced concrete cylindri-
cal walls by minimizing
of CO2 emission
Structural Design of Tall
and Special Buildings
2022 W
Vandaele M., Stalhammar S. Hope dies, action begins?
The role of hope for
proactive sustainability
engagement among uni-
versity students
International Journal of
Sustainability in Higher
Education
2022 W
Sagar K. V., Jerald J. Real-time automated guided
vehicles scheduling with
markov decision process
and double Q-Learning
algorithm
Materials Today-Proceed-
ings
2022 W
Kalinin M., Ovasapyan T.,
Poltavtseva M.
Application of the learning
automaton model for
ensuring cyber resiliency
Symmetry-Basel 2022 W
Liu Q., Sun S., Rong B.,
Kadoch M.
Intelligent reflective surface
based 6G communications
for sustainable energy
infrastructure
IEEE Wireless Communica-
tions
2021 S, W
Muhammad G., Hossain
M.S.
Deep-reinforcement-
learning-based sustainable
energy distribution for
wireless communication
IEEE Wireless Communica-
tions
2021 S
Grzelczak M., Duch P. Deep reinforcement learn-
ing algorithms for path
planning domain in grid-
like environment
Applied Sciences (Swit-
zerland)
2021 S
Gao A., Wang Q., Liang W.,
Ding Z.
Game combined multi-
agent reinforcement learn-
ing approach for UAV
assisted offloading
IEEE Transactions on
Vehicular Technology
2021 S, W
Atli İ., Ozturk M., Valastro
G.C., Asghar M.Z.
Multi-objective uav
positioning mechanism
for sustainable wireless
connectivity in environ-
ments with forbidden
flying zones
Algorithms 2021 S
Guo L., Li Z., Outbib R. Reinforcement learning
based energy manage-
ment for fuel cell hybrid
electric vehicles
IECON Proceedings
(Industrial Electronics
Conference)
2021 S, W
Kővári B., Szőke L., Bécsi
T., Aradi S., Gáspár P.
Traffic signal control via
reinforcement learning for
reducing global vehicle
emission
Sustainability (Switzerland) 2021 S, W
Rangel-Martinez D., Nigam
K.D.P., Ricardez-Sandoval
L.A.
Machine learning on sus-
tainable energy: A review
and outlook on renewable
energy systems, catalysis,
smart grid and energy
storage
Chemical Engineering
Research and Design
2021 S, W
Choi J.-H., Yang B., Yu
C.W.
Artificial intelligence as
an agent to transform
research paradigms in
building science and
technology
Indoor and Built Environ-
ment
2021 S
Shani P., Chau S., Swei O. All roads lead to sustain-
ability: opportunities
to reduce the life-cycle
cost and global warming
impact of U.S. roadways
Resources, Conservation
and Recycling
2021 S
Jia R., Zhang X., Feng Y.,
Wang T., Lu J., Zheng Z.,
Li M.
Long-term energy collec-
tion in self-sustainable
sensor networks: a deep
Q-learning approach
IEEE Internet of Things
Journal
2021 S, W
Zhao J., Rodriguez M.A.,
Buyya R.
A deep reinforcement
learning approach to
resource management in
hybrid clouds harnessing
renewable energy and task
scheduling
IEEE International Confer-
ence on Cloud Comput-
ing, CLOUD
2021 S, W
Kathirgamanathan A., Man-
gina E., Finn D.P.
Development of a soft actor
critic deep reinforcement
learning approach for har-
nessing energy flexibility
in a large office building
Energy and AI 2021 S, W
Mabina P., Mukoma P.,
Booysen M.J.
Sustainability matchmak-
ing: linking renewable
sources to electric water
heating through machine
learning
Energy and Buildings 2021 S
Chen K., Wang H.,
Valverde-Pérez B., Zhai
S., Vezzaro L., Wang A.
Optimal control towards
sustainable wastewater
treatment plants based on
multi-agent reinforcement
learning
Chemosphere 2021 S, W
Sacco A., Flocco M.,
Esposito F., Marchetto G.
Supporting sustainable
virtual network mutations
with mystique
IEEE Transactions on
Network and Service
Management
2021 S
Munir M.S., Tran N.H.,
Saad W., Hong C.S.
Multi-agent meta-reinforce-
ment learning for self-
powered and sustainable
edge computing systems
IEEE Transactions on
Network and Service
Management
2021 S
Pérez-Pons M.E., Alonso
R.S., García O., Marreiros
G., Corchado J.M.
Deep q-learning and prefer-
ence based multi-agent
system for sustainable
agricultural market
Sensors 2021 S
Zhang X., Manogaran G.,
Muthu B.
IoT enabled integrated
system for green energy
into smart cities
Sustainable Energy Tech-
nologies and Assessments
2021 S, W
Emami-Skardi M.J.,
Momenzadeh N., Kera-
chian R.
Social learning diffusion
and influential stake-
holders identification
in socio-hydrological
environments
Journal of Hydrology 2021 S, W
Almalki A.J., Alsofyani M.,
Alghuried A., Wocjan P.,
Wang L.
Model-based variational
autoencoders with autore-
gressive flows
Proceedings of the 2021
5th World Conference on
Smart Trends in Systems
Security and Sustainabil-
ity, WorldS4 2021
2021 S
Liu B., Han W., Wang E.,
Ma X., Xiong S., Qiao C.,
Wang J.
An efficient message dis-
semination scheme for
cooperative drivings via
multi-agent hierarchical
attention reinforcement
learning
Proceedings - International
Conference on Distributed
Computing Systems
2021 S, W
Ghosh S., De S., Chatterjee
S., Portmann M.
Learning-based adaptive
sensor selection frame-
work for multi-sensing
WSN
IEEE Sensors Journal 2021 S
Li L., Luo Y., Pu L. Q-learning enabled intel-
ligent energy attack in
sustainable wireless com-
munication networks
IEEE International Confer-
ence on Communications
2021 S, W
Sacco A., Esposito F., Mar-
chetto G., Montuschi P.
Sustainable task offload-
ing in UAV networks via
multi-agent reinforcement
learning
IEEE Transactions on
Vehicular Technology
2021 S
Razack A.J., Ajith V., Gupta
R.
A deep reinforcement learn-
ing approach to traffic
signal control
2021 IEEE Conference on
Technologies for Sustain-
ability, SusTech 2021
2021 S
Zhang W., Liu H., Wang F.,
Xu T., Xin H., Dou D.,
Xiong H.
Intelligent electric vehicle
charging recommenda-
tion based on multi-agent
reinforcement learning
The Web Conference 2021 -
Proceedings of the World
Wide Web Conference,
WWW 2021
2021 S, W
Park J., Lee J., Kim T., Ahn
I., Park J.
Co-evolution of predator-
prey ecosystems by
reinforcement learning
agents
Entropy 2021 S, W
Eyni A., Skardi M.J.E.,
Kerachian R.
A regret-based behavioral
model for shared water
resources management:
Application of the corre-
lated equilibrium concept
Science of the Total Envi-
ronment
2021 S, W
Raeisi M., Mahboob A.S. Intelligent control of urban
intersection traffic light
based on reinforcement
learning algorithm
26th International Computer
Conference, Computer
Society of Iran, CSICC
2021
2021 S, W
Piovesan N., Lopez-Perez
D., Miozzo M., Dini P.
Joint load control and
energy sharing for renew-
able powered small base
stations: a machine learn-
ing approach
IEEE Transactions on
Green Communications
and Networking
2021 S
Yin H., Wei J., Zhao H.,
Xiong J., Mei K., Zhang
L., Ren B., Ma D.
An intelligent adaptative
architecture for wireless
communication in com-
plex scenarios
Scientia Sinica Informa-
tionis
2021 S
Leng J., Ruan G., Song Y.,
Liu Q., Fu Y., Ding K.,
Chen X.
A loosely-coupled deep
reinforcement learning
approach for order accept-
ance decision of mass-
individualized printed
circuit board manufactur-
ing in industry 4.0
Journal of Cleaner Produc-
tion
2021 S, W
Baumgart U., Burger M. A reinforcement learn-
ing approach for traffic
control
International Conference
on Vehicle Technology
and Intelligent Transport
Systems, VEHITS - Pro-
ceedings
2021 S, W
Dinh T.H.L., Kaneko M.,
Wakao K., Kawamura K.,
Moriyama T., Takatori Y.
Towards an energy-efficient
DQN-based user asso-
ciation in Sub6GHz/mm
wave integrated networks
Proceedings - 2021 17th
International Conference
on Mobility, Sensing and
Networking, MSN 2021
2021 S
Barth A., Zhang L., Ma O. Cooperation of a team of
heterogeneous swarm
robots for space explora-
tion
Proceedings of the Inter-
national Astronautical
Congress, IAC
2021 S
Al-Jawad A., Comsa I.-S.,
Shah P., Gemikonakli O.,
Trestian R.
REDO: a reinforcement
learning-based dynamic
routing algorithm selec-
tion method for SDN
2021 IEEE Conference on
Network Function Vir-
tualization and Software
Defined Networks, NFV-
SDN 2021 - Proceedings
2021 S, W
Cloud J.M., Nieves R.J.,
Duke A.K., Muller T.J.,
Janmohamed N.A., Buck-
les B.C., Dupuis M.A.
Towards autonomous lunar
resource excavation
via deep reinforcement
learning
Accelerating Space Com-
merce, Exploration, and
New Discovery confer-
ence, ASCEND 2021
2021 S
Isufaj R., Sebastia D.A.,
Piera M.A.
Towards conflict resolution
with deep multi-agent
reinforcement learning
14th USA/Europe Air Traf-
fic Management Research
and Development Semi-
nar, ATM 2021
2021 S
No author name available
[Conference Review]
7th International Confer-
ence on Life System
Modeling and Simulation,
LSMS 2021, and the 7th
International Conference
on Intelligent Computing
for Sustainable Energy
and Environment, ICSEE
2021
Communications in Com-
puter and Information
Science
2021 S
No author name available
[Conference Review]
18th International Confer-
ence on Mobile Systems
and Pervasive Computing,
MobiSPC 2021, the 16th
International Confer-
ence on Future Networks
and Communications,
FNC 2021 and the 11th
International Conference
on Sustainable Energy
Information Technology,
SEIT 2021
Procedia Computer Science 2021 S
Gambin A.F., Angelats E.,
Gonzalez J.S., Miozzo M.,
DIni P.
Sustainable marine ecosys-
tems: deep learning for
water quality assessment
and forecasting
IEEE Access 2021 S, W
No author name available
[Conference Review]
AHFE conferences on
human factors in software
and systems engineering,
artificial intelligence and
social computing, and
energy, 2021
Lecture Notes in Networks
and Systems
2021 S
Serrano J.C., Mula J., Poler
R.
Digital twin for supply
chain master planning in
zero-defect manufacturing
IFIP Advances in Informa-
tion and Communication
Technology
2021 S
Chaudhuri R., Mukherjee
K., Narayanam R., Vallam
R.D.
Collaborative reinforce-
ment learning framework
to model evolution of
cooperation in sequential
social dilemmas
Lecture Notes in Com-
puter Science (including
subseries Lecture Notes
in Artificial Intelligence
and Lecture Notes in
Bioinformatics)
2021 S, W
Danassis P., Erden Z.D.,
Faltings B.
Improved cooperation by
exploiting a common
signal
Proceedings of the Interna-
tional Joint Conference
on Autonomous Agents
and Multiagent Systems,
AAMAS
2021 S
Daneshvar M., Asadi S.,
Mohammadi-Ivatloo B.
Energy trading possibilities
in the modern multi-car-
rier energy networks
Power Systems 2021 S
Zheng Z., Yan P., Chen Y.,
Cai J., Zhu F.
Increasing crop yield using
agriculture sensing data in
smart plant factory
Lecture Notes in Com-
puter Science (including
subseries Lecture Notes
in Artificial Intelligence
and Lecture Notes in
Bioinformatics)
2021 S
Musaddiq A., Ali R., Choi
J.-G., Kim B.-S., Kim
S.-W.
Collision observation-based
optimization of low-power
and lossy IoT network
using reinforcement
learning
Computers, Materials and
Continua
2021 S, W
Ballis H., Dimitriou L. Evaluating the performance
of reinforcement learning
signalling strategies for
sustainable urban road
networks
Advances in Intelligent
Systems and Computing
2021 S
Surovik D., Wang K.,
Vespignani M., Bruce J.,
Bekris K.E.
Adaptive tensegrity
locomotion: Controlling
a compliant icosahedron
with symmetry-reduced
reinforcement learning
International Journal of
Robotics Research
2021 S, W
Mandhare P., Yadav J.,
Kharat V., Patil C. Y.
Control and coordination of
self-adaptive traffic signal
using deep reinforcement
learning
International Journal of
Next-Generation Comput-
ing
2021 W
Hamutoglu N. B., Unveren-
Bilgic E. N., Salar H. C.,
Sahin Y. L.
The effect of E-learning
experience on readiness,
attitude, and self-control/
self-management
Journal of Information
Technology Education-
Innovations in Practice
2021 W
de Oliveira A. M. L.,
Marques C. V., Field’s
K. A. P.
Application of mathemati-
cal modeling in the con-
struction of the ecological
house and its perspectives
in teaching
Cadernos Educacao Tecno-
logia e Sociedade
2021 W
Tiwari T., Shastry N., Nandi
A.
Deep learning based lateral
control system
Proceedings - 2020 IEEE
International Symposium
on Sustainable Energy,
Signal Processing and
Cyber Security, iSSSC
2020
2020 S
No author name available
[Conference Review]
Proceedings - 2020 Inter-
national Conference on
Pervasive Artificial Intel-
ligence, ICPAI 2020
Proceedings - 2020 Inter-
national Conference on
Pervasive Artificial Intel-
ligence, ICPAI 2020
2020 S
Pang G., Zhu X., Lu K.,
Peng Z., Deng W.
A simulator for reinforce-
ment learning training in
the recommendation field
Proceedings - 2020 IEEE
International Symposium
on Parallel and Distrib-
uted Processing with
Applications, 2020 IEEE
International Conference
on Big Data and Cloud
Computing, 2020 IEEE
International Symposium
on Social Computing
and Networking and
2020 IEEE International
Conference on Sustain-
able Computing and
Communications, ISPA-
BDCloud-SocialCom-
SustainCom 2020
2020 S
Chemingui Y., Gastli A.,
Ellabban O.
Reinforcement learning-
based school energy
management system
Energies 2020 S, W
Krejci S.E., Ramroop-Butts
S., Torres H.N., Isokpehi
R.D.
Visual literacy intervention
for improving under-
graduate student critical
thinking of global sustain-
ability issues
Sustainability (Switzerland) 2020 S, W
Ballis H., Dimitriou L. Evaluation of reinforcement
learning traffic signalling
strategies for alternative
objectives: implementa-
tion in the network of
nicosia, cyprus
Transport and Telecommu-
nication
2020 S, W
Guliyev H.B., Tomin N.V.,
Ibrahimov F.S.
Methods of intelligent pro-
tection from asymmetrical
conditions in electric
networks
E3S Web of Conferences 2020 S
Wölfle D., Vishwanath A.,
Schmeck H.
A guide for the design of
benchmark environments
for building energy opti-
mization
BuildSys 2020 - Proceed-
ings of the 7th ACM
International Conference
on Systems for Energy-
Efficient Buildings, Cities,
and Transportation
2020 S
Liu H., Zhang C., Guo Q. Data-driven robust voltage/
var control using PV
inverters in active distri-
bution networks
Proceedings - 2020 Inter-
national Conference on
Smart Grids and Energy
Systems, SGES 2020
2020 S, W
Nakamoto Y., Kumalija E.,
Zhang M.
Toward autonomous adap-
tive embedded systems for
sustainable services using
reinforcement learning
(WiP report)
Proceedings - 2020 8th
International Sympo-
sium on Computing and
Networking Workshops,
CANDARW 2020
2020 S
No author name available
[Conference Review]
Proceedings - 2020 8th
International Sympo-
sium on Computing and
Networking Workshops,
CANDARW 2020
Proceedings - 2020 8th
International Sympo-
sium on Computing and
Networking Workshops,
CANDARW 2020
2020 S
Han M., Del Castillo L.A.,
Khairy S., Chen X., Cai
L.X., Lin B., Hou F.
Multi-agent reinforcement
learning for green energy
powered IoT networks
with random access
IEEE Vehicular Technology
Conference
2020 S, W
Henderson P., Hu J., Romoff
J., Brunskill E., Jurafsky
D., Pineau J.
Towards the systematic
reporting of the energy
and carbon footprints of
machine learning
Journal of Machine Learn-
ing Research
2020 S
Tan Z., Karakose M. Comparative study for deep
reinforcement learning
with CNN, RNN, and
LSTM in autonomous
navigation
2020 International Confer-
ence on Data Analytics
for Business and Industry:
Way Towards a Sustain-
able Economy, ICDABI
2020
2020 S
Lee S., Cho Y., Lee Y.H. Injection mold production
sustainable scheduling
using deep reinforcement
learning
Sustainability (Switzerland) 2020 S, W
Han M., Duan J., Khairy S.,
Cai L.X.
Enabling sustainable
underwater IoT networks
with energy harvesting: a
decentralized reinforce-
ment learning approach
IEEE Internet of Things
Journal
2020 S, W
Opalic S.M., Goodwin M.,
Jiao L., Nielsen H.K., Lal
Kolhe M.
A deep reinforcement learn-
ing scheme for battery
energy management
2020 5th International
Conference on Smart and
Sustainable Technologies,
SpliTech 2020
2020 S
Dawn S., Saraogi U.,
Thakur U.S.
Agent-based learning for
auto-navigation within the
virtual city
2020 International Confer-
ence on Computational
Performance Evaluation,
ComPE 2020
2020 S, W
Xi F., Ruan X. Influence of intelligent
environmental art based
on reinforcement learn-
ing on the regionality of
architectural design
Proceedings of the Inter-
national Conference on
Electronics and Sustain-
able Communication
Systems, ICESC 2020
2020 S
Skardi M.J.E., Kerachian R.,
Abdolhay A.
Water and treated waste-
water allocation in urban
areas considering social
attachments
Journal of Hydrology 2020 S, W
Banerjee P.S., Mandal S.N.,
De D., Maiti B.
RL-sleep: temperature
adaptive sleep schedul-
ing using reinforcement
learning for sustainable
connectivity in wireless
sensor networks
Sustainable Computing:
Informatics and Systems
2020 S, W
Piovesan N., Miozzo M.,
Dini P.
Modeling the environment
in deep reinforcement
learning: The case of
energy harvesting base
stations
ICASSP, IEEE International
Conference on Acoustics,
Speech and Signal Pro-
cessing - Proceedings
2020 S, W
Radenkovic M., Ha Huynh
V.S.
Energy-aware opportunistic
charging and energy dis-
tribution for sustainable
vehicular edge and fog
networks
2020 5th International
Conference on Fog and
Mobile Edge Computing,
FMEC 2020
2020 S, W
Ma D., Lan G., Hassan M.,
Hu W., Das S.K.
Sensing, computing, and
communications for
energy harvesting IoTs: a
survey
IEEE Communications
Surveys and Tutorials
2020 S, W
Miozzo M., Piovesan N.,
Dini P.
Coordinated load control of
renewable powered small
base stations through
layered learning
IEEE Transactions on
Green Communications
and Networking
2020 S
No author name available
[Conference Review]
Proceedings of the 19th
international conference
on autonomous agents
and multiagent systems,
AAMAS 2020
Proceedings of the Interna-
tional Joint Conference
on Autonomous Agents
and Multiagent Systems,
AAMAS
2020 S
Yu K.-H., Jaimes E., Wang
C.-C.
Ai based energy optimiza-
tion in association with
class environment
ASME 2020 14th Inter-
national Conference on
Energy Sustainability, ES
2020
2020 S
Kazmi H., Driesen J. Automated demand side
management in buildings
Artificial Intelligence
Techniques for a Scal-
able Energy Transition:
Advanced Methods,
Digital Technologies,
Decision Support Tools,
and Applications
2020 S
No author name available
[Conference Review]
18th international con-
ference on practical
applications of agents
and multi-agent systems,
PAAMS 2020
Lecture Notes in Com-
puter Science (including
subseries Lecture Notes
in Artificial Intelligence
and Lecture Notes in
Bioinformatics)
2020 S
Bouhamed O., Ghazzai H.,
Besbes H., Massoud Y.
A UAV-assisted data col-
lection for wireless sensor
networks: autonomous
navigation and scheduling
IEEE Access 2020 S, W
Elavarasan D., Durairaj
Vincent P.M.
Crop yield prediction using
deep reinforcement learn-
ing model for sustainable
agrarian applications
IEEE Access 2020 S
Yang T., Zhao L., Li W.,
Zomaya A.Y.
Reinforcement learning in
sustainable energy and
electric systems: a survey
Annual Reviews in Control 2020 S, W
Perera-Villalba J. J.,
Martinez-Borreguero G.,
Naranjo-Correa F. L.,
Mateos-Nunez M.
Validation of a didactic
intervention based on
video games for the
teaching of sustainability
contents in secondary
education
14th International Tech-
nology, Education and
Development Conference
(INTED2020)
2020 W
Worum H., Lillekroken D.,
Ahlsen B., Roaldsen K. S.,
Bergland A.
Otago exercise programme-
from evidence to practice:
a qualitative study of
physiotherapists’ percep-
tions of the importance
of organisational factors
of leadership, context and
culture for knowledge
translation in Norway
BMC Health Services
Research
2020 W
Temesgene D.A., Miozzo
M., Dini P.
Dynamic control of func-
tional splits for energy
harvesting virtual small
cells: A distributed
reinforcement learning
approach
Computer Communications 2019 S, W
Lin K., Lin B., Chen X., Lu
Y., Huang Z., Mo Y.
A time-driven workflow
scheduling strategy for
reasoning tasks of autono-
mous driving in edge
environment
Proceedings - 2019 IEEE
Intl Conf on Parallel and
Distributed Processing
with Applications, Big
Data and Cloud Comput-
ing, Sustainable Comput-
ing and Communications,
Social Computing and
Networking, ISPA/
BDCloud/SustainCom/
SocialCom 2019
2019 S
Bhargavi K., Sathish Babu
B.
Load balancing scheme for
the public cloud using
reinforcement learning
with raven roosting opti-
mization policy (RROP)
CSITSS 2019 - 2019 4th
International Confer-
ence on Computational
Systems and Information
Technology for Sustaina-
ble Solution, Proceedings
2019 S
Strnad F.M., Barfuss W.,
Donges J.F., Heitzig J.
Deep reinforcement
learning in World-Earth
system models to discover
sustainable management
strategies
Chaos 2019 S, W
Firdausiyah N., Taniguchi
E., Qureshi A.G.
Impacts of urban con-
solidation centres for
sustainable city logistics
using adaptive dynamic
programming based
multi-agent simulation
IOP Conference Series:
Earth and Environmental
Science
2019 S
Xu T., Wang N., Lin H.,
Sun Z.
UAV autonomous recon-
naissance route planning
based on deep reinforce-
ment learning
Proceedings of the 2019
IEEE International
Conference on Unmanned
Systems, ICUS 2019
2019 S
Alizadeh Shabestray S.M.,
Abdulhai B.
Multimodal iNtelligent
Deep (MiND) traffic
signal controller
2019 IEEE Intelligent
Transportation Systems
Conference, ITSC 2019
2019 S, W
Ebell N., Gütlein M., Pruck-
ner M.
Sharing of energy among
cooperative households
using distributed multi-
agent reinforcement
learning
Proceedings of 2019 IEEE
PES Innovative Smart
Grid Technologies
Europe, ISGT-Europe
2019
2019 S, W
Vo N.N.Y., He X., Liu S.,
Xu G.
Deep learning for decision
making and the optimiza-
tion of socially respon-
sible investments and
portfolio
Decision Support Systems 2019 S, W
Chang S., Saha N., Castro-
Lacouture D., Yang P.P.-J.
Multivariate relationships
between campus design
parameters and energy
performance using rein-
forcement learning and
parametric modeling
Applied Energy 2019 S, W
Qin F.-B., Xu D. Review of robot manipula-
tion skill models
Zidonghua Xuebao/Acta
Automatica Sinica
2019 S
Shabana Anjum S., Md
Noor R., Ahmedy I., Anisi
M.H.
Energy optimization of sus-
tainable Internet of Things
(IoT) systems using an
energy harvesting medium
access protocol
IOP Conference Series:
Earth and Environmental
Science
2019 S
Liu Q., Liu Z., Xu W., Tang
Q., Zhou Z., Pham D.T.
Human-robot collaboration
in disassembly for sustain-
able manufacturing
International Journal of
Production Research
2019 S, W
Shi B., Yuan H., Shi R. Pricing cloud resource
based on multi-agent rein-
forcement learning in the
competing environment
Proceedings - 16th IEEE
International Symposium
on Parallel and Distrib-
uted Processing with
Applications, 17th IEEE
International Conference
on Ubiquitous Comput-
ing and Communications,
8th IEEE International
Conference on Big Data
and Cloud Computing,
11th IEEE Interna-
tional Conference on
Social Computing and
Networking and 8th IEEE
International Conference
on Sustainable Comput-
ing and Communications,
ISPA/IUCC/BDCloud/
SocialCom/SustainCom
2018
2019 S
Do Q.V., Koo I. Dynamic bandwidth alloca-
tion scheme for wireless
networks with energy
harvesting using actor-
critic deep reinforcement
learning
1st International Conference
on Artificial Intelligence
in Information and Com-
munication, ICAIIC 2019
2019 S, W
Blad C., Koch S., Gane-
swarathas S., Kallesøe
C.S., Bøgh S.
Control of HVAC-systems
with slow thermodynamic
using reinforcement
learning
Procedia Manufacturing 2019 S
Mikhail M., Yacout S.,
Ouali M.-S.
Optimal preventive main-
tenance strategy using
reinforcement learning
Proceedings of the Inter-
national Conference on
Industrial Engineering
and Operations Manage-
ment
2019 S
Chen H., Zhao T., Li C.,
Guo Y.
Green internet of vehicles:
architecture, enabling
technologies, and applica-
tions
IEEE Access 2019 S, W
Chaudhuri R., Vallam R.D.,
Garg S., Mukherjee K.,
Kumar A., Singh S.,
Narayanam R., Mathur A.,
Parija G.
Collaborative reinforce-
ment learning model for
sustainability of coopera-
tion in sequential social
dilemmas
Proceedings of the Interna-
tional Joint Conference
on Autonomous Agents
and Multiagent Systems,
AAMAS
2019 S, W
Park J.Y., Nagy Z. The influence of building
design, sensor placement,
and occupant preferences
on occupant centered
lighting control
Computing in Civil
Engineering 2019: Smart
Cities, Sustainability,
and Resilience - Selected
Papers from the ASCE
International Conference
on Computing in Civil
Engineering 2019
2019 S
No author name available
[Conference Review]
2nd International Confer-
ence on Intelligent Human
Systems Integration, IHSI
2019
Advances in Intelligent
Systems and Computing
2019 S
McLauchlan A., Joao E. Recognising learning’ as an
uncertain source of SEA
effectiveness
Impact Assessment and
Project Appraisal
2019 W
Halim D. A., Karyanto P.,
Sarwono
Education for sustainable
development: student’s
biophilia and the emome
model as an alternative
efforts of enhancement
in the perspectives of
education
2nd International Confer-
ence on Science, Math-
ematics, Environment,
and Education, 2019
2019 W
Saifuddin M. R. B., Logen-
thiran T., Naayagi R. T.,
Woo W. L.
A nano-biased energy man-
agement using reinforced
learning multi-agent on
layered coalition model:
consumer sovereignty
IEEE Access 2019 W
Temesgene D.A., Miozzo
M., Dini P.
Dynamic functional split
selection in energy har-
vesting virtual small cells
using temporal difference
learning
IEEE International Sympo-
sium on Personal, Indoor
and Mobile Radio Com-
munications, PIMRC
2018 S, W
No author name available
[Conference Review]
6th IEEE international
conference on advanced
logistics and transport,
ICALT 2017 - proceed-
ings
6th IEEE International
Conference on Advanced
Logistics and Transport,
ICALT 2017 - Proceed-
ings
2018 S
Prauzek M., Mourcet
N.R.A., Hlavica J.,
Musilek P.
Q-learning algorithm for
energy management in
solar powered embedded
monitoring systems
2018 IEEE Congress on
Evolutionary Computa-
tion, CEC 2018 - Proceed-
ings
2018 S, W
Ghanshala K.K., Sharma
S., Mohan S., Nautiyal L.,
Mishra P., Joshi R.C.
Self-organizing sustainable
spectrum management
methodology in cognitive
radio vehicular Adhoc
network (CRAVENET)
environment: a reinforce-
ment learning approach
ICSCCC 2018 - 1st Inter-
national Conference on
Secure Cyber Computing
and Communications
2018 S, W
Aziz H.M.A., Zhu F., Ukku-
suri S.V.
Learning-based traffic
signal control algorithms
with neighborhood
information sharing: an
application for sustainable
mobility
Journal of Intelligent
Transportation Systems:
Technology, Planning,
and Operations
2018 S
Laubis K., Knöll F., Zeidler
V., Simko V.
Crowdsensing-based road
condition monitoring ser-
vice: an assessment of its
managerial implications
to road authorities
Lecture Notes in Business
Information Processing
2018 S
Hwangbo S., Yoo C. A methodology of a hybrid
hydrogen supply network
(HHSN) under alternative
energy resources (AERs)
of hydrogen footprint
constraint for sustainable
energy production (SEP)
Computer Aided Chemical
Engineering
2018 S
Ganapathi Subramanian S.,
Crowley M.
Combining MCTS and A3C
for prediction of spatially
spreading processes in
forest wildfire settings
Lecture Notes in Com-
puter Science (including
subseries Lecture Notes
in Artificial Intelligence
and Lecture Notes in
Bioinformatics)
2018 S
Serrat R., Alcala M.,
Delgado-Aguilar M.,
Tarres J., Oliver-Ortega
H., Mutje P.
Case study: development
of biodegradable hybrid
materials as a substitute
for glass fiber reinforced
composites
EDULEARN18: 10th
International Conference
on Education and New
Learning Technologies
2018 W
Huang J., Gao Y., Lu S.,
Zhao X. B., Deng Y. D.,
Gu M.
Energy-efficient automatic
train driving by learning
driving patterns
2018 W
Miozzo M., Giupponi L.,
Rossi M., Dini P.
Switch-On/off policies for
energy harvesting small
cells through distributed
Q-learning
2017 IEEE Wireless Com-
munications and Network-
ing Conference Work-
shops, WCNCW 2017
2017 S, W
Lindkvist E., Ekeberg Ö.,
Norberg J.
Strategies for sustainable
management of renewable
resources during environ-
mental change
Proceedings of the Royal
Society B: Biological
Sciences
2017 S
Perolat J., Leibo J.Z., Zam-
baldi V., Beattie C., Tuyls
K., Graepel T.
A multi-agent reinforce-
ment learning model of
common-pool resource
appropriation
Advances in Neural
Information Processing
Systems
2017 S, W
Dos Santos Mignon A., De
Azevedo Da Rocha R.L.
An adaptive implementation of
ε-greedy in reinforcement learning
Procedia Computer Science 2017 S
Ciric D., Todorovic V.,
Lalic B.
Boosting student entrepre-
neurship through idealab
concept in Western
Balkan countries
9th International Confer-
ence on Education and
New Learning Technolo-
gies (EDULEARN17)
2017 W
Sheikhi A., Rayati M.,
Ranjbar A.M.
Dynamic load management
for a residential customer;
reinforcement learning
approach
Sustainable Cities and
Society
2016 S, W
Chen H., Li X., Zhao F. A reinforcement learning-
based sleep scheduling
algorithm for desired area
coverage in solar-powered
wireless sensor networks
IEEE Sensors Journal 2016 S, W
Mathlouthi S., Trabelsi
F.B.F., Zribi C.B.O.
A novel approach based on
reinforcement learning
for anaphora resolution in
Arabic texts
Proceedings of the 28th
International Business
Information Management
Association Conference -
Vision 2020: Innovation
Management, Develop-
ment Sustainability, and
Competitive Economic
Growth
2016 S
Kaiser, A., Kragulj, F.,
Grisold, T.
Identifying human needs in
organizations to develop
sustainable intellectual
capital - reflections on
best practices
Proceedings of the 8th
European Conference on
Intellectual Capital (ECIC
2016)
2016 W
Atesok K., Satava R. M.,
Van Heest A., Hogan M.
V., Pedowitz R. A., Fu F.
H., Sitnikov I., Marsh J.
L., Hurwitz S. R.
Retention of skills after
simulation-based training
in orthopaedic surgery
Journal of the American
Academy of Orthopaedic
Surgeons
2016 W
De Gracia A., Fernández
C., Castell A., Mateu C.,
Cabeza L.F.
Control of a PCM ventilated
facade using reinforce-
ment learning techniques
Energy and Buildings 2015 S
Soares I.B., De Hauwere
Y.-M., Januarius K., Brys
T., Salvant T., Nowe A.
Departure MANagement
with a reinforcement
learning approach:
respecting CFMU slots
IEEE Conference on Intel-
ligent Transportation
Systems, Proceedings,
ITSC
2015 S
Jin J., Ma X. | Adaptive group-based signal control using reinforcement learning with eligibility traces | IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC | 2015 | S
Hsu R.C., Lin T.-H., Chen S.-M., Liu C.-T. | Dynamic energy management of energy harvesting wireless sensor nodes using fuzzy inference system with reinforcement learning | Proceeding - 2015 IEEE International Conference on Industrial Informatics, INDIN 2015 | 2015 | S, W
Miozzo M., Giupponi L., Rossi M., Dini P. | Distributed Q-learning for energy harvesting heterogeneous networks | 2015 IEEE International Conference on Communication Workshop, ICCW 2015 | 2015 | S
Vankov P., Vankova D. | Sustainable Educational & Emotional Model - An Experience from Bulgaria | EDULEARN15: 7th International Conference on Education and New Learning Technologies | 2015 | W
Morales R.C., Sotomayor J.J., Hochstetter J., Figueroa D. | REFCOMTIC Interactive Manual for The Strengthen of Transversal Teaching Competences | INTED2015: 9th International Technology, Education and Development Conference | 2015 | W
Geller E.S. | Seven Life Lessons From Humanistic Behaviorism: How to Bring the Best Out of Yourself and Others | Journal of Organizational Behavior Management | 2015 | W
Comşa I.S., Aydin M., Zhang S., Kuonen P., Wagen J.-F., Lu Y. | Scheduling policies based on dynamic throughput and fairness tradeoff control in LTE-A networks | Proceedings - Conference on Local Computer Networks, LCN | 2014 | S, W
Hsu R.C., Liu C.-T., Wang H.-L. | A reinforcement learning-based ToD provisioning dynamic power management for sustainable operation of energy harvesting wireless sensor node | IEEE Transactions on Emerging Topics in Computing | 2014 | S
Urieli D., Stone P. | TacTex’13: A champion adaptive power trading agent | AAMAS 2014 Workshop on Adaptive and Learning Agents, ALA 2014 | 2014 | S, W
Urieli D., Stone P. | TacTex’13: A champion adaptive power trading agent | 13th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2014 | 2014 | S
Urieli D., Stone P. | TacTex’13: A champion adaptive power trading agent | Proceedings of the National Conference on Artificial Intelligence | 2014 | S, W
Lindkvist E., Norberg J. | Modeling experiential learning: The challenges posed by threshold dynamics for sustainable renewable resource management | Ecological Economics | 2014 | S
Crowley M. | Using equilibrium policy gradients for spatiotemporal planning in forest ecosystem management | IEEE Transactions on Computers | 2014 | S
Bielskis A.A., Guseinoviene E., Zutautas L., Drungilas D., Dzemydiene D., Gricius G. | Modeling of Ambient Comfort Affect Reward based on multi-agents in cloud interconnection environment for developing the sustainable home controller | 2013 8th International Conference and Exhibition on Ecological Vehicles and Renewable Energies, EVER 2013 | 2013 | S
Bielskis A.A., Guseinoviene E., Dzemydiene D., Drungilas D., Gricius G. | Ambient lighting controller based on reinforcement learning components of multi-agents | Elektronika ir Elektrotechnika | 2012 | S, W
No author name available [Conference Review] | APBITM 2011 - Proceedings 2011 IEEE International Summer Conference of Asia Pacific Business Innovation and Technology Management | APBITM 2011 - Proceedings 2011 IEEE International Summer Conference of Asia Pacific Business Innovation and Technology Management | 2011 | S
No author name available [Conference Review] | IEEE 2011 EnergyTech, ENERGYTECH 2011 | IEEE 2011 EnergyTech, ENERGYTECH 2011 | 2011 | S
Anyanwu L.O., Keengwe J., Arome G.A. | Scalable intrusion detection with recurrent neural networks | ITNG2010 - 7th International Conference on Information Technology: New Generations | 2010 | S
Sabbadin R., Spring D., Bergonnier E. | A reinforcement-learning application to biodiversity conservation in Costa-Rican forest | MODSIM07 - Land, Water and Environmental Management: Integrated Systems for Sustainability, Proceedings | 2007 | S
Chadès I., Martin T.G., Curtis J.M.R., Barreto C. | Managing interacting species: A reinforcement learning decision theoretic approach | MODSIM07 - Land, Water and Environmental Management: Integrated Systems for Sustainability, Proceedings | 2007 | S
Bielskis A.A., Denisovas V., Ramasauskas O. | Ambient Intelligence of e-possibilities perception for sustainable development | 4th International Conference Citizens and Governance for Sustainable Development | 2006 | W
Salden A.H., Kempen Ma. | Sustainable cybernetics systems: Backbones of ambient intelligent environments | Ambient Intelligence: A Novel Paradigm | 2005 | S
Chen L., Evans T., Anand S., Boufford J., Brown H., Chowdhury M., Cueto M., Dare L., Dussault G., Elzinga G., Fee E., Habte D., Hanvoravongchai P., Jacobs M., Kurowski C., Michael S., Pablos-Mendez A., Sewankambo N., Solimano G., Stilwell B., de Waal A., Wibulpolprasert S. | Human resources for health: overcoming the crisis | LANCET | 2004 | W
Salden A., de Heer J. | Natural anticipation and selection of attention within sustainable intelligent multimodal systems by collective intelligent agents | 8th World Multi-Conference on Systemics, Cybernetics and Informatics, Vol IX, Proceedings: Computer Science and Engineering: I | 2004 | W
Author contributions MZ: Conceptualization, Investigation, Methodology, Writing—Original Draft. AC,
DLT and AF: Supervision, Conceptualization, Investigation, Validation, Writing—Review and Editing. LM:
Conceptualization, Validation, Writing—Review and Editing.
Funding Open access funding provided by Università degli Studi di Verona within the CRUI-CARE
Agreement.
Declarations
Competing interests The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
Ajao L, Apeh S (2023) Secure edge computing vulnerabilities in smart cities sustainability using Petri net and genetic algorithm-based reinforcement learning. Intell Syst Appl. https://doi.org/10.1016/j.iswa.2023.200216
Al-Jawad A, Comşa I, Shah P, et al (2021) REDO: a reinforcement learning-based dynamic routing algorithm selection method for SDN. In: IEEE conference on network function virtualization and software defined networks (NFV-SDN), pp 54–59. https://doi.org/10.1109/NFV-SDN53031.2021.9665140
Alanne K, Sierla S (2022) An overview of machine learning applications for smart buildings. Sustain Cities Soc. https://doi.org/10.1016/j.scs.2021.103445
AlizadehShabestray SM, Abdulhai B (2019) Multimodal iNtelligent Deep (MiND) traffic signal controller.
In: IEEE intelligent transportation systems conference (ITSC), pp 4532–4539, https:// doi. org/ 10. 1109/
ITSC. 2019. 89174 93
Auffenberg F, Snow S, Stein S etal (2017) A comfort-based approach to smart heating and air conditioning.
ACM Trans Intell Syst Technol. https:// doi. org/ 10. 1145/ 30577 30
Aziz H, Zhu F, Ukkusuri S (2018) Learning-based traffic signal control algorithms with neighborhood
information sharing: an application for sustainable mobility. J Intell Trans Syst Technol Plan Operat.
https:// doi. org/ 10. 1080/ 15472 450. 2017. 13875 46
Azzalini D, Castellini A, Luperto M, et al (2020) HMMs for anomaly detection in autonomous robots.
In: Proceedings of the 2020 international conference on autonomous agents and multiagent systems,
AAMAS, p 105–113, https:// doi. org/ 10. 5555/ 33987 61. 33987 79
Bazzan ALC, Peleteiro-Ramallo A, Burguillo-Rial JC (2011) Learning to cooperate in the iterated pris-
oner’s dilemma by means of social attachments. J Braz Comput Soc 17(3):163–174. https:// doi. org/ 10.
1007/ s13173- 011- 0038-2
Bianchi F, Castellini A, Tarocco P, etal (2019) Load forecasting in district heating networks: Model com-
parison on a real-world case study. In: Machine learning, optimization, and data science: 5th interna-
tional conference, LOD 2019, proceedings. Springer-Verlag, p 553–565, https:// doi. org/ 10. 1007/ 978-3-
030- 37599-7_ 46
Bianchi F, Corsi D, Marzari L, etal (2023) Safe and efficient reinforcement learning for environmental
monitoring. In: Proceedings of Ital-IA 2023: 3rd National Conference on Artificial Intelligence, CEUR
Workshop Proceedings, vol 3486. CEUR-WS.org, pp 2610–615
Bistaffa F, Farinelli A, Chalkiadakis G etal (2017) A cooperative game-theoretic approach to the social
ridesharing problem. Artif Intell 246:86–117. https:// doi. org/ 10. 1016/j. artint. 2017. 02. 004
Bistaffa F, Blum C, Cerquides J etal (2021) A computational approach to quantify the benefits of rideshar-
ing for policy makers and travellers. IEEE Trans Intell Transport Syst 22(1):119–130. https:// doi. org/
10. 1109/ TITS. 2019. 29549 82
Blij NHVD, Chaifouroosh D, Cañizares CA, etal (2020) Improved power flow methods for DC grids. In:
29th IEEE international symposium on industrial electronics, ISIE. IEEE, pp 1135–1140, https:// doi.
org/ 10. 1109/ ISIE4 5063. 2020. 91525 70
Bouhamed O, Ghazzai H, Besbes H etal (2020) A UAV-assisted data collection for wireless sensor net-
works: Autonomous navigation and scheduling. IEEE Access. https:// doi. org/ 10. 1109/ ACCESS. 2020.
30025 38
Brown J, Abate A, Rogers A (2021) QUILT: quantify, infer and label the thermal efficiency of heating and
cooling residential homes. In: BuildSys ’21: The 8th ACM international conference on systems for
energy-efficient buildings, cities, and transportation. ACM, pp 51–60, https:// doi. org/ 10. 1145/ 34866
11. 34866 53
Capuzzo M, Zanella A, Zuccotto M, etal (2022) IoT systems for healthy and safe life environments. In:
IEEE forum on research and technologies for society and industry innovation (RTSI), pp 31–37,
https:// doi. org/ 10. 1109/ RTSI5 5261. 2022. 99051 93
Castellini A, Chalkiadakis G, Farinelli A (2019) Influence of state-variable constraints on partially observ-
able monte carlo planning. In: Proceedings of the twenty-eighth international joint conference on arti-
ficial intelligence, IJCAI 2019. International Joint Conferences on Artificial Intelligence Organization,
pp 5540–5546, https:// doi. org/ 10. 24963/ ijcai. 2019/ 769
Castellini A, Bicego M, Masillo F et al (2020) Time series segmentation for state-model generation of
autonomous aquatic drones: a systematic framework. Eng Appl Artif Intell. https:// doi. org/ 10. 1016/j.
engap pai. 2020. 103499
Castellini A, Bianchi F, Farinelli A (2021) Predictive model generation for load forecasting in district heat-
ing networks. IEEE Intell Syst 36(4):86–95. https:// doi. org/ 10. 1109/ MIS. 2020. 30059 03
Castellini A, Bianchi F, Farinelli A (2022) Generation and interpretation of parsimonious predictive models
for load forecasting in smart heating networks. Appl Intell 52(9):9621–9637. https:// doi. org/ 10. 1007/
s10489- 021- 02949-4
Castellini A, Bianchi F, Zorzi E, etal (2023) Scalable safe policy improvement via Monte Carlo tree search.
In: Proceedings of the 40th international conference on machine learning, proceedings of machine
learning research, vol 202. PMLR, pp 3732–3756
Charef N, Ben Mnaouer A, Aloqaily M etal (2023) Artificial intelligence implication on energy sustain-
ability in internet of things: a survey. Info Process Manag. https:// doi. org/ 10. 1016/j. ipm. 2022. 103212
Chen H, Li X, Zhao F (2016) A reinforcement learning-based sleep scheduling algorithm for desired area
coverage in solar-powered wireless sensor networks. IEEE Sensors Journal. https:// doi. org/ 10. 1109/
JSEN. 2016. 25170 84
Chen H, Zhao T, Li C et al (2019) Green internet of vehicles: Architecture, enabling technologies, and applications. IEEE Access. https://doi.org/10.1109/ACCESS.2019.2958175
Chen K, Wang H, Valverde-Pérez B et al (2021) Optimal control towards sustainable wastewater treatment plants based on multi-agent reinforcement learning. Chemosphere. https://doi.org/10.1016/j.chemosphere.2021.130498
De Gracia A, Fernández C, Castell A et al (2015) Control of a PCM ventilated facade using reinforcement learning techniques. Energy Build. https://doi.org/10.1016/j.enbuild.2015.06.045
Elavarasan D, Durairaj Vincent P (2020) Crop yield prediction using deep reinforcement learning model for sustainable agrarian applications. IEEE Access. https://doi.org/10.1109/ACCESS.2020.2992480
Emamjomehzadeh O, Kerachian R, Emami-Skardi M et al (2023) Combining urban metabolism and reinforcement learning concepts for sustainable water resources management: a nexus approach. J Environ Manag. https://doi.org/10.1016/j.jenvman.2022.117046
Feng Y, Zhang X, Jia R et al (2023) Intelligent trajectory design for mobile energy harvesting and data transmission. IEEE Internet Things J. https://doi.org/10.1109/JIOT.2022.3202252
Gao Y, Chang D, Chen CH (2023) A digital twin-based approach for optimizing operation energy consumption at automated container terminals. J Clean Prod. https://doi.org/10.1016/j.jclepro.2022.135782
Giri MK, Majumder S (2022) Deep Q-learning based optimal resource allocation method for energy harvested cognitive radio networks. Phys Commun. https://doi.org/10.1016/j.phycom.2022.101766
Goodland R (1995) The concept of environmental sustainability. Ann Rev Ecol Syst 26(1):1–24. https://doi.org/10.1146/annurev.es.26.110195.000245
Gu Z, Liu Z, Wang Q et al (2023) Reinforcement learning-based approach for minimizing energy loss of driving platoon decisions. Sensors. https://doi.org/10.3390/s23084176
Han M, Duan J, Khairy S et al (2020) Enabling sustainable underwater IoT networks with energy harvesting: a decentralized reinforcement learning approach. IEEE Internet Things J. https://doi.org/10.1109/JIOT.2020.2990733
Harrold D, Cao J, Fan Z (2022) Data-driven battery operation for energy arbitrage using rainbow deep reinforcement learning. Energy. https://doi.org/10.1016/j.energy.2021.121958
Hausknecht M, Stone P (2015) Deep recurrent Q-learning for partially observable MDPs. Preprint at https://arxiv.org/abs/1507.06527
Heinzelman W, Chandrakasan A, Balakrishnan H (2002) An application-specific protocol architecture for wireless microsensor networks. IEEE Trans Wireless Commun 1(4):660–670. https://doi.org/10.1109/TWC.2002.804190
Hessel M, Modayil J, van Hasselt H, et al (2018) Rainbow: combining improvements in deep reinforcement learning. In: Proceedings of the AAAI conference on artificial intelligence, pp 3215–3222
Himeur Y, Elnour M, Fadli F et al (2022) AI-big data analytics for building automation and management systems: a survey, actual challenges and future perspectives. Artif Intell Rev 56(6):4929–5021. https://doi.org/10.1007/s10462-022-10286-2
Hsu R, Liu CT, Wang HL (2014) A reinforcement learning-based ToD provisioning dynamic power management for sustainable operation of energy harvesting wireless sensor node. IEEE Trans Emerg Topics Comput. https://doi.org/10.1109/TETC.2014.2316518
Huo D, Sari Y, Kealey R et al (2023) Reinforcement learning-based fleet dispatching for greenhouse gas emission reduction in open-pit mining operations. Resour Conserv Recycl. https://doi.org/10.1016/j.resconrec.2022.106664
Jendoubi I, Bouffard F (2022) Data-driven sustainable distributed energy resources’ control based on multi-agent deep reinforcement learning. Sustain Energy Grids Netw. https://doi.org/10.1016/j.segan.2022.100919
Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochastic domains. Artif Intell 101(1–2):99–134. https://doi.org/10.1016/S0004-3702(98)00023-X
Kathirgamanathan A, Mangina E, Finn D (2021) Development of a soft actor critic deep reinforcement learning approach for harnessing energy flexibility in a large office building. Energy AI. https://doi.org/10.1016/j.egyai.2021.100101
Khalid M, Wang L, Wang K et al (2023) Deep reinforcement learning-based long-range autonomous valet parking for smart cities. Sustain Cities Soc. https://doi.org/10.1016/j.scs.2022.104311
Koufakis AM, Rigas ES, Bassiliades N et al (2020) Offline and online electric vehicle charging scheduling with V2V energy transfer. IEEE Trans Intell Transport Syst 21(5):2128–2138. https://doi.org/10.1109/TITS.2019.2914087
LeCun Y (1989) Generalization and network design strategies. Connect Perspect 19(143–155):18
Leng J, Ruan G, Song Y et al (2021) A loosely-coupled deep reinforcement learning approach for order acceptance decision of mass-individualized printed circuit board manufacturing in industry 4.0. J Clean Prod. https://doi.org/10.1016/j.jclepro.2020.124405
Li C, Bai L, Yao L et al (2023) A bibliometric analysis and review on reinforcement learning for transportation applications. Transportmetrica B. https://doi.org/10.1080/21680566.2023.2179461
Lillicrap TP, Hunt JJ, Pritzel A, et al (2016) Continuous control with deep reinforcement learning. In: International conference on learning representations, ICLR
Liu Q, Sun S, Rong B et al (2021) Intelligent reflective surface based 6G communications for sustainable energy infrastructure. IEEE Wireless Commun. https://doi.org/10.1109/MWC.016.2100179
Lowe R, Wu Y, Tamar A, et al (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Proceedings of the international conference on neural information processing systems, NIPS, p 6382–6393
Ma D, Lan G, Hassan M et al (2020) Sensing, computing, and communications for energy harvesting IoTs: a survey. IEEE Commun Surv Tutor 22(2):1222–1250. https://doi.org/10.1109/COMST.2019.2962526
Mabina P, Mukoma P, Booysen M (2021) Sustainability matchmaking: linking renewable sources to electric water heating through machine learning. Energy Build. https://doi.org/10.1016/j.enbuild.2021.111085
Marchesini E, Corsi D, Farinelli A (2021) Benchmarking safe deep reinforcement learning in aquatic navigation. In: IEEE/RSJ international conference on intelligent robots and systems, IROS. IEEE, pp 5590–5595. https://doi.org/10.1109/IROS51168.2021.9635925
Mazzi G, Castellini A, Farinelli A (2021) Rule-based shielding for partially observable monte-carlo planning. In: Proceedings of the international conference on automated planning and scheduling, pp 243–251. https://doi.org/10.1609/icaps.v31i1.15968
Mazzi G, Castellini A, Farinelli A (2023) Risk-aware shielding of partially observable monte carlo planning policies. Artif Intell 324:103987
Miozzo M, Giupponi L, Rossi M, et al (2015) Distributed Q-learning for energy harvesting heterogeneous networks. In: IEEE international conference on communication workshop (ICCW), pp 2006–2011. https://doi.org/10.1109/ICCW.2015.7247475
Miozzo M, Giupponi L, Rossi M, et al (2017) Switch-on/off policies for energy harvesting small cells through distributed Q-learning. In: IEEE wireless communications and networking conference workshops (WCNCW), pp 1–6. https://doi.org/10.1109/WCNCW.2017.7919075
Mischos S, Dalagdi E, Vrakas D (2023) Intelligent energy management systems: a review. Artif Intell Rev. https://doi.org/10.1007/s10462-023-10441-3
Mnih V, Kavukcuoglu K, Silver D et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533. https://doi.org/10.1038/nature14236
Moerland TM, Broekens J, Plaat A, et al (2020) Model-based reinforcement learning: a survey. arXiv abs/2206.09328. https://doi.org/10.48550/ARXIV.2206.09328
Ng A, Harada D, Russell SJ (1999) Policy invariance under reward transformations: theory and application to reward shaping. In: Proceedings of the international conference on machine learning, ICML, p 278–287
Orfanoudakis S, Chalkiadakis G (2023) A novel aggregation framework for the efficient integration of distributed energy resources in the smart grid. In: Proceedings of the 2023 international conference on autonomous agents and multiagent systems. AAMAS. ACM, pp 2514–2516. https://doi.org/10.5555/3545946.3598986
Ounoughi C, Touibi G, Yahia S (2022) EcoLight: eco-friendly traffic signal control driven by urban noise prediction. Lecture Notes Comput Sci. https://doi.org/10.1007/978-3-031-12423-5_16
Panagopoulos AA, Alam M, Rogers A, et al (2015) AdaHeat: a general adaptive intelligent agent for domestic heating control. In: Proceedings of the 2015 international conference on autonomous agents and multiagent systems, AAMAS. ACM, pp 1295–1303
Perianes-Rodriguez A, Waltman L, van Eck NJ (2016) Constructing bibliometric networks: a comparison between full and fractional counting. J Info 10(4):1178–1195. https://doi.org/10.1016/j.joi.2016.10.006
Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, Hoboken
Radini S, Marinelli E, Akyol Ç et al (2021) Urban water-energy-food-climate nexus in integrated wastewater and reuse systems: cyber-physical framework and innovations. Appl Energy 298:117268
Rampini L, Re Cecconi F (2022) Artificial intelligence in construction asset management: A review of present status, challenges and future opportunities. J Info Technol Construct. https://doi.org/10.36680/j.itcon.2022.043
Rangel-Martinez D, Nigam K, Ricardez-Sandoval L (2021) Machine learning on sustainable energy: a review and outlook on renewable energy systems, catalysis, smart grid and energy storage. Chem Eng Res Design. https://doi.org/10.1016/j.cherd.2021.08.013
Roncalli M, Bistaffa F, Farinelli A (2019) Decentralized power distribution in the smart grid with ancillary lines. Mobile Netw Appl 24(5):1654–1662. https://doi.org/10.1007/s11036-017-0893-y
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536. https://doi.org/10.1038/323533a0
Sabet S, Farooq B (2022) Green vehicle routing problem: State of the art and future directions. IEEE Access 10:101622–101642. https://doi.org/10.1109/ACCESS.2022.3208899
Sacco A, Esposito F, Marchetto G et al (2021) Sustainable task offloading in UAV networks via multi-agent reinforcement learning. IEEE Trans Vehicul Technol. https://doi.org/10.1109/TVT.2021.3074304
Shaw R, Howley E, Barrett E (2022) Applying reinforcement learning towards automating energy efficient virtual machine consolidation in cloud data centers. Info Syst. https://doi.org/10.1016/j.is.2021.101722
Sheikhi A, Rayati M, Ranjbar A (2016) Dynamic load management for a residential customer; reinforcement learning approach. Sustain Cities Soc. https://doi.org/10.1016/j.scs.2016.04.001
Silver D, Huang A, Maddison CJ et al (2016) Mastering the game of Go with deep neural networks and tree search. Nature. https://doi.org/10.1038/nature16961
Silver D, Schrittwieser J, Simonyan K et al (2017) Mastering the game of Go without human knowledge. Nature. https://doi.org/10.1038/nature24270
Simão TD, Suilen M, Jansen N (2023) Safe policy improvement for POMDPs via finite-state controllers. Proc AAAI Conf Artif Intell 37(12):15109–15117. https://doi.org/10.1609/aaai.v37i12.26763
Sivamayil K, Rajasekar E, Aljafari B et al (2023) A systematic study on reinforcement learning based applications. Energies. https://doi.org/10.3390/en16031512
Skardi M, Kerachian R, Abdolhay A (2020) Water and treated wastewater allocation in urban areas considering social attachments. J Hydrol. https://doi.org/10.1016/j.jhydrol.2020.124757
Steccanella L, Bloisi D, Castellini A et al (2020) Waterline and obstacle detection in images from low-cost autonomous boats for environmental monitoring. Robot Auton Syst 124:103346
Sultanuddin S, Vibin R, Rajesh Kumar A et al (2023) Development of improved reinforcement learning smart charging strategy for electric vehicle fleet. J Energy Storage. https://doi.org/10.1016/j.est.2023.106987
Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. A Bradford Book/MIT Press, Cambridge, MA
United Nations (2015) Transforming our world: the 2030 agenda for sustainable development
van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double Q-learning. In: Proceedings of the AAAI conference on artificial intelligence, pp 2094–2100. https://doi.org/10.1609/aaai.v30i1.10295
Venkataswamy V, Grigsby J, Grimshaw A et al (2023) RARE: renewable energy aware resource management in datacenters. Lecture Notes Comput Sci. https://doi.org/10.1007/978-3-031-22698-4_6
Wang JJ, Wang L (2022) A cooperative memetic algorithm with learning-based agent for energy-aware distributed hybrid flow-shop scheduling. IEEE Trans Evol Comput. https://doi.org/10.1109/TEVC.2021.3106168
Watkins CJCH (1989) Learning from delayed rewards. King’s College, Cambridge
Yang T, Zhao L, Li W et al (2020) Reinforcement learning in sustainable energy and electric systems: a survey. Ann Rev Control 49:145–163. https://doi.org/10.1016/j.arcontrol.2020.03.001
Yao R, Hu Y, Varga L (2023) Applications of agent-based methods in multi-energy systems—a systematic literature review. Energies. https://doi.org/10.3390/en16052456
Zhang W, Liu H, Wang F, et al (2021a) Intelligent electric vehicle charging recommendation based on multi-agent reinforcement learning. In: Proceedings of the web conference, WWW, p 1856–1867. https://doi.org/10.1145/3442381.3449934
Zhang X, Manogaran G, Muthu B (2021b) IoT enabled integrated system for green energy into smart cities. Sustain Energy Technol Assess. https://doi.org/10.1016/j.seta.2021.101208
Zuccotto M, Castellini A, Farinelli A (2022a) Learning state-variable relationships for improving POMCP performance. In: Proceedings of the 37th ACM/SIGAPP symposium on applied computing. Association for Computing Machinery, SAC, p 739–747
Zuccotto M, Piccinelli M, Castellini A et al (2022b) Learning state-variable relationships in POMCP: a framework for mobile robots. Front Robotics AI 2022:183
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.